Matt Neuburg is the author of REALbasic: The Definitive Guide, and a member of the editorial board of REALbasic Developer. This article was originally published in REALbasic Developer Issue 2.1 (Aug/Sept 2003).
Last month’s column introduced genetic algorithms, where we simulate Mendelian / Darwinian evolution as a way of solving some problem. We start with a population of individuals, each endowed with some DNA. We regard each individual’s DNA as a possible answer to our problem, and we rank it according to how good an answer it is. We take the individuals with the best DNA and mate them, generating a new population, repeating the process again and again.
The “problem” in this case, you may recall, was to arrive at a good sequence of bytes, where “goodness” is defined as follows. Every byte is translated into an alphanumeric character or a space, thus arriving at a series of words. A word like “9BJ6” is bad, and reduces the cumulative value of the DNA by multiplying it by 0.8. A word comprising entirely digits, like “46”, adds to the cumulative value of the DNA a number of points equal to its length. The word “M” doubles the cumulative value of the DNA. And a word like “AG12589” — the letter “A” followed by only non-digits followed by only digits — multiplies the cumulative value of the DNA by the numeric value of its digits part. Although this is not an earth-shattering problem, it makes for a good example, since it turns out that in 100 generations the computer is able to optimize its byte sequence very strongly with respect to these rules.
As I said last time, the heart of the algorithm is a loop that looks a lot like one’s idea of real-life evolution:
Core loop:
do evaluateFitness letTotallyUnfitDie selectParents breed mutate generation = generation + 1 loop until generation >= generation_max
Recall that we have a class called Individual; that an Individual has a DNA property, which is a memoryblock of size 64, and a Fitness (double) property, to hold the results of evaluating its DNA; and that there is a Population, which is an array of 1024 Individuals. EvaluateFitness is a simple process of expressing in REALbasic the rules we’ve already described verbally. LetTotallyUnfitDie eliminates all Individuals whose Fitness is below 0.1. SelectParents was described last time: based on each Individual’s relative Fitness, he gets a certain number of “tickets” each of which allots him the right to have sex once. These tickets appear as the value of each Individual’s Parent property (an integer). We are now ready to breed our current Population of Individuals and obtain the Population of the next generation.
My approach to breeding is extremely simple. I mate two Individuals chosen at random, and append the resulting Individual to an array called Children. I also decrement the Parent property of each of the two parent Individuals, and if it is zero, I remove that Individual from the Population. Thus, I’m eventually left with an empty Population array and a full Children array; I then move all the Children into the Population array. Thus, my approach to the notion of generations involves complete simultaneous replacement: a generation breeds, Individuals die the moment they become sterile, and the entire next generation moves in without ever meeting their parents. As explained, however, in the superb online article by Hartmut Pohlheim (see References), this step, technically known as reinsertion, can be performed in other ways. What I’m doing is “pure reinsertion”, which has the disadvantage that parents can be replaced by worse children; a better approach is to use some cut-off point to replace just the least fit parents by the most fit children, thus giving every Individual a chance to live and breed for as long as he continues to excel amongst the Population. Be that as it may, here’s my Breed routine:
Breed:
Sub breed() dim children(popsize) as individual redim children(popsize-1) dim i,u as integer dim parent1, parent2 as individual dim parent1index, parent2index as integer dim origPop as integer origPop = ubound(population) + 1 for i = ubound(population) downto 0 if population(i).sterile then population.remove i end next u = popsize-1 for i = 0 to u parent1index = ubound(population) parent2index = app.r.lessThan(parent1index) parent1 = population(parent1index) parent2 = population(parent2index) parent1.decreaseParent if parent1.sterile then population.remove parent1index end parent2.decreaseParent if parent2.sterile then population.remove parent2index end children(i) = parent1.breedWith(parent2) next redim population(u) for i = 0 to u population(i) = children(i) next End Sub
(The property app.r is an instance of REALbasic 5’s Random class; because of the very annoying way this is implemented, you have to maintain an instance of it if you want a coherent random sequence over your program’s lifetime.)
The Individual method BreedWith simply calls an Individual constructor that makes a new Individual (the child) given two other Individuals (the parents). This, of course, is all-important. How should the DNA of two Individuals be combined to make the DNA of a new Individual? My original approach was to choose at random, for each byte of the child’s DNA, the value from one of the parents. Upon reading Polheim, I learned that this is called “uniform crossover”, and I decided I didn’t like it; eventually I opted for “multi-point crossover”, where the DNA is divided randomly into some predetermined number of segments (I chose 10), and the random choice between the parents is performed for each segment.
Individual.Individual:
Sub individual(parent1 as individual, parent2 as individual) self.dna = newMemoryBlock(dna_byte_length) self.survived = true self.parent = 0 self.fitness = 0.0 dim segs(crossover_segments) as integer dim i as integer for i = 1 to crossover_segments segs(i) = app.r.lessThan(dna_byte_length) next segs.remove(0) segs.sort segs.append dna_byte_length dim pickAParent as double dim byte as integer for each i in segs pickAParent = app.r.number while byte < i if pickAParent < 0.5 then dna.byte(byte) = parent1.dna.byte(byte) else dna.byte(byte) = parent2.dna.byte(byte) end byte = byte + 1 wend next End Sub
The final step in the core loop is Mutate. This calls a Mutate method for every Individual of the Population; some small percentage of Individuals respond by altering their DNA in some way, thus contributing to the overall creativity of the Population.
Individual.Mutate:
Function mutate() As boolean dim shouldMutate as boolean shouldMutate = app.r.number < mut_rate if not shouldMutate then return false end dim byte as integer dim i,u as integer u = dna_byte_length - 1 for i = 0 to u byte = dna.byte(i) byte = bitwise.bitand(byte, app.r.lessThan(256)) byte = bitwise.bitor(byte, app.r.lessThan(256)) dna.byte(i) = byte next return true End Function
The entire algorithm that I ultimately arrived at is packaged as a REALbasic 5.1 project that shows each generation’s best DNA and graphs that DNA’s fitness value, along with several other interesting statistics, as the generations progress. Feel free to download it, experiment with it, and improve upon it. It would be interesting to implement various other selection, reinsertion, replacement and mutation algorithms, as described in Pohlheim’s article. I also would have liked to make the fitness function editable by the user, but bugs in RBScript prevented that idea from bearing fruit for the present.