The amazing Genetic Algorithms!Posted: 31 January, 2015
Why do we say Data Science? is this a part of the Science? if you think as an Statistics or Engineer it’s difficult to understand.
One of the most beautiful things in the earth is the nature. We don’t think about that but we are rounded about nature inspired objects, for example planes (like birds), buildings (like hives) or submarines (like whales). When we talk about the Computer World we have also looked at the nature world to learn how to find the best solutions to the most difficult problems.
In the Data Science Universe, more concretely in the algorithm side, we have interesting nature oriented solutions from fields like Neural Networks or Genetics or Swarm Intelligence.
Is really interesting thinking in how to find algorithms that emulate Neural Networks to solve daily problems we have, or look at the bees or ants to use Swarm Intelligence and replicate this behaviour to apply solutions in Healthcare or Public Administration to improve the quality of live of the people. This is my main objective try to give VALUE to the society and through the Data Science Universe I believe that it’s a reality!
Here you find examples of biological systems that have inspired computational algorithms.
Also in this blog you find more detail with examples around Algorithms in Nature.Turnig back to this post I focus in the Genetics Algorithm, in other posts I will talk about other Nature Algorithms.
Computer science and biology have enjoyed a long and fruitful relationship for decades. Biologists rely on computational methods to analyze and integrate large data sets, while several computational methods were inspired by the high-level design principles of biological systems.
Biologists have been increasingly relying on sophisticated computational methods, especially over the last two decades as molecular data have rapidly accumulated. Computational tools for searching large databases, including BLAST (Altschul et al, 1990), are now routinely used by experimentalists. Genome sequencing and assembly rely heavily on algorithms to speed up data accumulation and analysis (Gusfield, 1997; Trapnell and Salzberg, 2009; Schatz et al, 2010).
Computational methods have also been developed for integrating various types of functional genomics data and using them to create models of regulatory networks and other interactions in the cell (Alon, 2006; Huttenhower et al, 2009; Myers et al, 2009). Indeed, several computational biology departments have been established over the last few years that are focused on developing additional computational methods to aid in solving life science’s greatest mysteries.
After a short abstract I will continue with a little introduction about Genetics Algorithms: they are a part of evolutionary computing, which is a rapidly growing area of artificial intelligence. As you can guess, genetic algorithms are inspired by Darwin’s theory about evolution. Simply said, solution to a problem solved by genetic algorithms is evolved.
Reviewing a little bit of history, the idea of evolutionary computing was introduced in the 1960s by I. Rechenberg in his work “Evolution strategies”. His idea was then developed by other researchers. Genetic Algorithms were invented by John Holland and developed by him and his students and colleagues. This lead to Holland’s book “Adaption in Natural and Artificial Systems” published in 1975.
“Computer programs that “evolve” in ways that resemble natural selection can solve complex problems even their creators do not fully understand” by John H. Holland. In 1992 John Koza has used genetic algorithm to evolve programs to perform certain tasks. He called his method “genetic programming” (GP). LISP programs were used, because programs in this language can expressed in the form of a “parse tree”, which is the object the GA works on.
Here you will find what is Genetic Programming, how it works, sources of information about the field of GP, GA and the field Genetic and Evolutionary Computation (GEC), John Koza’s publications, and other interesting information. Here you will find his biography.
Now we will review the biological background learning what is a Chromosome, how this chromosome are mixed in the conception moment (reproduction), the use of this GA to solve NP-hard-Problems and the basics steps of GA.
All living organisms consist of cells. In each cell there is the same set of chromosomes. Chromosomes are strings of DNA and serves as a model for the whole organism. A chromosome consist of genes, blocks of DNA. Each gene encodes a particular protein. Basically can be said, that each gene encodes a trait, for example color of eyes. Possible settings for a trait (e.g. blue, brown) are called alleles. Each gene has its own position in the chromosome. This position is called locus.
Complete set of genetic material (all chromosomes) is called genome. Particular set of genes in genome is called genotype. The genotype is with later development after birth base for the organism’s phenotype, its physical and mental characteristics, such as eye color, intelligence etc.
During reproduction, first occurs recombination (or crossover). Genes from parents form in some way the whole new chromosome. The new created offspring can then be mutated. Mutation means, that the elements of DNA are a bit changed. This changes are mainly caused by errors in copying genes from parents.The fitness of an organism is measured by success of the organism in its life.
Example of difficult problems, which cannot be solved int “traditional” way, are NP problems.There are many tasks for which we know fast (polynomial) algorithms. There are also some problems that are not possible to be solved algorithmicaly. For some problems was proved that they are not solvable in polynomial time.But there are many important tasks, for which it is very difficult to find a solution, but once we have it, it is easy to check the solution.
This fact led to NP-complete problems. NP stands for nondeterministic polynomial and it means that it is possible to “guess” the solution (by some nondeterministic algorithm) and then check it, both in polynomial time. If we had a machine that can guess, we would be able to find a solution in some reasonable time.
Studying of NP-complete problems is for simplicity restricted to the problems, where the answer can be yes or no. Because there are tasks with complicated outputs, a class of problems called NP-hard problems has been introduced. This class is not as limited as class of NP-complete problems.
For NP-problems is characteristic that some simple algorithm to find a solution is obvious at a first sight – just trying all possible solutions. But this algorithm is very slow (usually O(2^n)) and even for a bit bigger instances of the problems it is not usable at all. Today nobody knows if some faster exact algorithm exists. Proving or disproving this remains as a big task for new researchers. Today many people think, that such an algorithm does not exist and so they are looking for some alternative methods – example of these methods are genetic algorithms.
Examples of the NP problems are satisfiability problem, travelling salesman problem or knapsack problem. Compendium of NP problems is available.
Other GA Applications:
– Control: Gas pipeline, missile evasion
– Design: Aircraft design, keyboard configuration, communication networks
– Game playing: Poker, checkers
– Security: Encryption and Decryption
– Robotics:Trajectory planning
The basics of GA:
[Start] Generate random population of n chromosomes (suitable solutions for the problem)
[Fitness] Evaluate the fitness f(x) of each chromosome x in the population
[New population] Create a new population by repeating following steps until the new population is complete
[Selection] Select two parent chromosomes from a population according to their fitness (the better fitness, the bigger chance to be selected)
[Crossover] With a crossover probability cross over the parents to form a new offspring (children). If no crossover was performed, offspring is an exact copy of parents.
[Mutation] With a mutation probability mutate new offspring at each locus (position in chromosome).
[Accepting] Place new offspring in a new population
[Replace] Use new generated population for a further run of algorithm
[Test] If the end condition is satisfied, stop, and return the best solution in current population
[Loop] Go to step 2
Example: the MAXONE problem: Suppose we want to maximize the number of ones in a string of l binary digits
Is it a trivial problem? It may seem so because we know the answer in advance However, we can think of it as maximizing the number of correct answers, each encoded by 1, to l yes/no difficult questions`
An individual is encoded (naturally) as a string of l binary digits
– The fitness f of a candidate solution to the MAXONE problem is the number of ones in its genetic code
– We start with a population of n random strings. Suppose that l = 10 and n = 6
We toss a fair coin 60 times and get the following initial population:
s1 = 1111010101 f (s1) = 7
s2 = 0111000101 f (s2) = 5
s3 = 1110110101 f (s3) = 7
s4 = 0100010011 f (s4) = 4
s5 = 1110111101 f (s5) = 8
s6 = 0100110000 f (s6) = 3
Step 1: Selection
We randomly (using a biased coin) select a subset of the individuals based on their fitness
Selected set Suppose that, after performing selection, we get the following population:
s1` = 1111010101 (s1)
s2` = 1110110101 (s3)
s3` = 1110111101 (s5)
s4` = 0111000101 (s2)
s5` = 0100010011 (s4)
s6` = 1110111101 (s5)
Step 2: crossover
– Next we mate strings for crossover. For each couple we first decide (using some pre-defined probability, for instance 0.6) whether to actually perform the crossover or not
– If we decide to actually perform crossover, we randomly extract the crossover points, for instance 2 and 5
s1` = 1111010101
s2` = 1110110101
s1“ = 1110110101
s2“ = 1111010101
Step 3: mutations
The final step is to apply random mutations: for each bit that we are to copy to the new population we allow a small probability of error (for instance 0.1)
s1“ = 1110110101
s2“ = 1111010101
s3“ = 1110111101
s4“ = 0111000101
s5“ = 0100011101
s6“ = 1110110011
s1“` = 1110100101
s2“` = 1111110100
s3“` = 1110101111
s4“` = 0111000101
s5“` = 0100011101
s6“` = 1110110001
And now, iterate … In one generation, the total population fitness changed from 34 to 37, thus improved by ~9% At this point, we go through the same process all over again, until a stopping criterion is met.
Benefits of Genetic Algorithms
– Concept is easy to understand
– Modular, separate from application
– Supports multi-objective optimization
– Always an answer; answer gets better with time.
– Easy to exploit previous or alternate solutions
– Flexible building blocks for hybrid applications.
Here you can read a List of genetics algorithm applications.
If you want to see detailed and coded examples:
– Again here you will find a very simple example with R: “You are going to spend a month in the wilderness. You’re taking a backpack with you, however, the maximum weight it can carry is 20 kilograms. You have a number of survival items available, each with its own number of “survival points”. You’re objective is to maximize the number of survival points.”
– Introduction to the gstudio package: “The gstudio package is a package created to make the inclusion of marker based population genetic data in the R workflow easy. An underlying motivation for this package is to provide a link between spatial analysis and graphing packages such that the user can be quickly and easily manipulate data in exploratory ways that aid in gaining biological inferences.”
– Genetic Algorithm: “Solving the Travelling Salesman Problem with a Genetic Algorithm in CoffeeScript. Since the Travelling Salesman problem is a NP-hard problem, it’s a good example on how to use a GA.In this case there are 15 cities and their distances are hard-coded in an array in the code.You can change the population’s size and the number of generations in the form to the right. After the maximum number of generations is reached, you will see a performance chart of the current run.”
– Iris Recognition with GA: “They have developed an iris recognition method based on genetic algorithms for the optimal features extraction. With the cost of eye-scanning technology coming down and the need for more secure systems going up, it’s time to take a close look at iris recognition for security applications.” This is a product that shows another interesting application of GA.
– Here you will find a Python GA example: “The problem solved is: For any given ‘target’ number, find an expression involving any legal combination of addition(+), subtraction(-), multiplication(*) and division(/) on digits that represents the given target number.e.g., for the target 15, “9 + 3 * 2″ is a solution.”
– Is remarcable people like Manu Sporny’s that has published all of him known genetic data as open source and released all him rights to the data.
– Finally I recommend you read this chapter of “Nature of the code” by Daniel Shiffman, in this chapter “The Evolution of Code” you will find a lot of Genetics explanations and coded examples like the rocket idea. In 2009, Jer Thorp released a genetic algorithms example on his blog entitled “Smart Rockets.” Jer points out that NASA uses evolutionary computing techniques to solve all sorts of problems, from satellite antenna design to rocket firing patterns. This inspired him to create a Flash demonstration of evolving rockets.
I will try to add more examples here because Genetics Algorithm are really important, if you do not agree, ask Charles Darwin!