The Genetic Algorithm is the most widely known Evolutionary Algorithm and can be applied to a wide range of problems. After explaining the basic principles, I will show how to apply the Genetic Algorithm to the so-called 0-1-KNAPSACK problem and come up with an implementation of a suggested configuration [1] for the algorithm in Ruby. Finally there will be a short investigation of the behaviour and performance of the algorithm.** **

## 0-1-KNAPSACK

In order to explain what the problem is about, I would like to quote the corresponding article from Wikipedia:

The

knapsack problemorrucksack problemis a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most useful items.

There are several formulations of the problem but we will deal just with the most common one, the 0-1-KNAPSACK. This means there are no copies of items allowed. Hence, an item can be just included zero or one times.

I have chosen this problem as an example because its optimal solution is easy to compute. Thus, we can reasonably evaluate the results of the Genetic Algorithm.

The following instance of the problem was taken from rosettacode.org. It is about a guy who wants to go to the mountains. Due to the fact that he can carry only a maximum amount of weight he wonders what to take along . He makes a list of items containing their weight and a value describing how useful each item will be. Notice that I have added two more items to the list and changed the max. weight to 500.

item | weight (dag) | value |
---|---|---|

map | 9 | 150 |

compass | 13 | 35 |

water | 153 | 200 |

sandwich | 50 | 160 |

glucose | 15 | 60 |

tin | 68 | 45 |

banana | 27 | 60 |

apple | 39 | 40 |

cheese | 23 | 30 |

beer | 52 | 10 |

suntan cream | 11 | 70 |

camera | 32 | 30 |

T-shirt | 24 | 15 |

trousers | 48 | 10 |

umbrella | 73 | 40 |

waterproof trousers | 42 | 70 |

waterproof overclothes | 43 | 75 |

note-case | 22 | 80 |

sunglasses | 7 | 20 |

towel | 18 | 12 |

socks | 4 | 50 |

book | 30 | 10 |

notebook | 90 | 1 |

tent | 200 | 150 |

knapsack | ?500 dag | ? |

The naive solution one would come up with would be to try every possible combination of items. This could be done by computing the power set of the list and checking each element afterwards. The suggested brute force solution works fine (but not fast) for the 22 original items. My adjusted list contains 24 items, which leads to a power set of size . Brute force eats up loads of memory (a few gigs) now and takes ages to finish. So this is obviously a bad idea.

While looking for a better solution I came across a dynamic programming solution in python. I’ve translated it to ruby and added it to rosettacode.org. By the way, you can find all the code mentioned in this post bundled up in this gist.

This is the result of the dynamic programming solution:

Found solution: apple, banana, camera, cheese, compass, glucose, map, note-case, sandwich, socks, sunglasses, suntan cream, water, waterproof overclothes, waterproof trousers

`With value: 1130`

` `

`Time for dynamic programming solution:`

0.020000 0.000000 0.020000 ( 0.014574)

That’s pretty fast, isn’t it? Let’s have a look at what the Genetic Algorithm can do.

## Basic Idea of the Genetic Algorithm

Like his brothers and sisters within the family of the Evolutionary Algorithms, the Genetic Algorithm is a population-based metaheuristic optimization algorithm, inspired by natural biological evolution. The following flowchart gives an overview of the steps the algorithm performs.

The population consists of individuals which are representing possible solutions for the problem to be solved and it is usually initialized randomly. A so-called fitness function describes the quality of the individuals and often corresponds to the objective function to be optimized. After the fitness has been evaluated a parent selection takes place. These parents are allowed to reproduce and generate an offspring, e.g. by doing a simple crossover. After this the offspring is mutated. Last but not least one has to decide which individuals will survive and make it into the next generation. Now the cycle starts over again with the new generation and proceeds until the termination condition is met. There are plenty of mechanisms and techniques to choose from which may or may not be suitable for the problem at hand. The chosen mechanisms to solve the 0-1-KNAPSACK problem will be covered in detail. For further reading and more in-depth explanations of other possible techniques I refer to [1].

## Genetic Algorithm to solve 0-1-KNAPSACK

The suggested solution by Eiben and Smith [1] can be summed up as follows (n is the number of items):

Representation |
Binary strings of length n |

Recombination |
One point crossover |

Recombination probability |
70% |

Mutation |
Flip each bit according to mutation probability |

Mutation probability |
1/n |

Parent selection |
Tournament selection with 2 combatants |

Survivor selection |
Generational |

Population size |
500 |

Termination condition |
No improvement in the last 25 generations |

### Representation

A valid representation has to be complete and correct. Complete means that it is able to represent every possible solution to the problem and correct means that every solution satisfies the constraints of the problem. In our case this would be the maximum weight.

The encoding of a solution forms the *genotype* of a solution whereas the possible solution within the context of the problem forms the *phenotype*.

Given `n`

items we could represent a solution with a binary string of length n where bit `i`

determines whether item `i`

is included or not. Hence, our *genotype space *would be the set of all binary strings of length `n`

. This representation is obviously complete but not correct. It is still possible to come up with combinations of items exceeding the maximum weight. We cannot map 1-to-1 from *genotype space *to *phenotype space*, because not all binary strings of length n lead to valid *phenotypes*. We map the *genotype *to the *phenotype* by taking the first `k`

bits with value 1 into account which do not exceed the maximum weight. The rest of the bit string is ignored.

### Parent selection

In order to chose a parent for recombination we use *deterministic tournament selection*. We pick two individuals randomly from the population. Out of these two the better one will be a parent to generate offspring. This strategy is easy to implement but still powerful. In general it prefers the individuals with higher fitness, but still preserves premature convergence. The randomly picked individuals could both have low fitness. However, they are possibly still valuable for exploring the search space into the right direction. In order to increase the preference for higher fitness one could just throw more than two individuals into the tournament.

### Recombination

We are going to use a simple one point crossover strategy as described here. After picking two parents the chance for recombination taking place is 70%. Thus, there is a 30% chance for just cloning them. The reason for this is simple. If we found a good solution there may be an even better one near by, so we don’t want to move away from it too early. Nevertheless, we still would like to explore a wider area of the search space, so recombination is the preferred way to go.

### Mutation

After recombination has taken place the offspring is mutated by flipping each of its bits with a given probability. There should be one mutation per offspring on average. Therefore we set the probability to where is the number of bits.

### Survivor selection

A *generational survival* strategy is applied here. Given a population size of 500 this means that we generate 500 children who will be the next generation. All parents are discarded. We do not keep any parents here, as we already have kept some of them due to our recombination strategy.

### Termination condition

The algorithm terminates if there was no improvement in the last 25 generations.

## How well does it work?

The implementation can be found in this gist.

Found solution: apple, banana, camera, cheese, compass, glucose, map, note-case, sandwich, socks, sunglasses, suntan cream, water, waterproof overclothes, waterproof trousers

` `

`with value 1130`

and 36 generations

` `

`Time for Genetic Algorithm:`

6.780000 0.010000 6.790000 ( 6.793232)

Awesome. It took a little longer than the dynamic programming solution (actually ~465 times longer). Nonetheless, it was able to find the optimal solution. Let’s have a look at how the algorithm behaves over time:

Notice how the population improves over time until best, worst and average converge to the optimum. Looks like we can gain some speed by tuning the termination condition to terminate if the algorithm already converged. This modification is left to the reader as an exercise.

Tuning the termination condition may yield some speed improvements. But what else could possibly enhance execution speed? The first question coming to my mind was: *Do we really need 500 individuals per population? *I set up the following investigation to figure this out. For each population size in 1…500 I executed the algorithm 100 times and tracked the best solution, the average fitness and the number of optimal solutions discovered during these 100 runs. This experiment took about 29 hours by the way. Here are the results (click on the picture to see a larger version):

There are some interesting effects this experiment reveals. The first graph indicates that for populations smaller than 50 the chance for premature convergence seems to be rather high. Those populations were not able to find the optimum because they probably got stuck in a local optimum. The second graph indicates that the algorithm can find quite good but not perfect solutions with a decent population size > 75, depending on how you define “good”. Last but not least graph no. 3 shows that the probability to find the optimum increases with the number of individuals in the population. A possible explanation could be, that more individuals are able to explore a larger area of the search space.

The solution of an exemplary run with population size 100:

Found solution: apple, banana, camera, compass, map, note-case, sandwich, socks, sunglasses, suntan cream, t-shirt, towel, water, waterproof overclothes, waterproof trousers

`with value 1067`

and with 42 generations

` `

`Time for Genetic Algorithm:`

1.310000 0.000000 1.310000 ( 1.313500)

The result is good but not perfect, just as expected.

## Conclusion

Genetic Algorithms require problem specific parameter tuning to yield good results. But once you figured out a good setup for your problem class, it seems to work well. Obviously the Genetic Algorithm performs better than the power set computation in the example above. However, the dynamic programming solution still outperforms the Genetic Algorithm. If there is an exact solution available you should of course use it. If there is no exact way to solve the problem at hand, then the Genetic Algorithm may be a good shot. A good solution is often enough and it is always better than no solution at all.

All of the code is in this gist. Put the files into one folder, create a subfolder named “logs” and have fun. The code works on MRI 1.9.2, I did not test it with the other rubies. I also included the matlab scripts to plot the logs and the log of the population size investigation. Have fun!

As always, comments are welcome!

## References

[1] A.E. Eiben, J.E. Smith. *Introduction to Evolutionary Computing. *Springer-Verlag, 2010. First Edition: Springer-Verlag, 2003