One of my favorite artificial intelligence techniques is Genetic Programming (GP). GP, along with its sibling Genetic Algorithms, belongs to a broader family known as Evolutionary Algorithms, which are inspired by biological processes.

Genetic programming isn’t talked about as much as other artificial intelligence techniques for a few reasons, and I’ll give my thoughts on that later, but it is very useful for tackling search and optimization problems from wide angles. The algorithm will come up with solutions that you may never have thought of, and that’s what’s most exciting about it.

## How Does It Work?

Before I give an example of what Genetic Programming can do, let’s first cover some basics on how it works.

1. Create an initial population of individuals.
2. Evaluate the fitness of each individual with a numerical value (whether higher is better or worse depends on the problem being solved).
3. “Evolve” the next generation of individuals by selecting individuals from the current population by fitness (there are many types of fitness-based selection methods) and applying the genetic operations to them.
4. The resulting individual(s) are then put into the next generation.
5. When the next generation reaches the desired population size, it replaces the current population.
6. Repeat steps 2 to 5 until an individual with the desired fitness value is found, or some other exit criterion is met (max generations, elapsed time, fitness convergence in the population, etc.).

As you can see from step 6, this runs in a loop, searching for the best solution. You could parallelize this algorithm very easily, either by splitting up the fitness evaluations or by splitting the population into demes (islands).
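The steps above can be sketched as a minimal loop. This is a toy sketch in Python, not my actual library: the individual representation, fitness function, and genetic operators are placeholders you would supply for your problem, and I've assumed tournament selection (one of the many selection methods mentioned above) with a typical 90% crossover rate.

```python
import random

def evolve(init_individual, fitness, crossover, mutate,
           pop_size=100, max_generations=50, target_fitness=1.0):
    """Minimal generational GP loop: evaluate, select, breed, replace."""
    population = [init_individual() for _ in range(pop_size)]
    for gen in range(max_generations):
        # Step 2: evaluate every individual (higher is better here).
        scored = [(fitness(ind), ind) for ind in population]
        best_score, best = max(scored, key=lambda s: s[0])
        if best_score >= target_fitness:              # exit criterion (step 6)
            return best, gen
        # Step 3: select parents by fitness via tournament selection.
        def tournament(k=3):
            return max(random.sample(scored, k), key=lambda s: s[0])[1]
        # Steps 4-5: breed until the next generation is full, then replace.
        next_gen = []
        while len(next_gen) < pop_size:
            if random.random() < 0.9:                 # crossover, typical rate
                next_gen.append(crossover(tournament(), tournament()))
            else:                                     # mutation
                next_gen.append(mutate(tournament()))
        population = next_gen
    return best, max_generations
```

With tree-shaped individuals, `crossover` would swap subtrees between parents and `mutate` would replace a subtree, but the loop itself is representation-agnostic.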

Some definitions from the steps above:

*Population*: The set of individuals the algorithm processes.

*Individual*: A single candidate solution to the problem, typically in the form of a tree data structure.

*Generation*: A single iteration of the algorithm. This involves: {Fitness evaluation -> Selection -> Breeding}.

*Fitness*: As in nature, individuals are more or less “fit” based on their environment. In Genetic Programming, the problem and its fitness function determine how fit an individual is, usually as a quantitative value used for comparison purposes.

## The Original Data Structure: Trees

In basic Genetic Programming, individuals are expressed as parse trees. This allows the individual to truly evolve each generation without being constrained to a fixed shape or size, like the arrays used in Genetic Algorithms. The individual’s tree will change to meet the fitness pressure of the algorithm.
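For intuition, here is one minimal way to represent and evaluate such a parse tree (my own sketch, not my library's actual representation): nested tuples, where the first element names an operator and the rest are subtrees, with terminal strings naming input bits, as in the parity example later in this article.

```python
# A parse tree as nested tuples: ("AND", left, right), ("NOT", child),
# or a terminal string like "D0" naming an input bit.
OPS = {
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "NOT":  lambda a: a ^ 1,
    "NAND": lambda a, b: (a & b) ^ 1,
    "XOR":  lambda a, b: a ^ b,
}

def evaluate(tree, inputs):
    """Recursively evaluate a parse tree against a dict of input bits."""
    if isinstance(tree, str):          # terminal: look up the input bit
        return inputs[tree]
    op, *children = tree
    return OPS[op](*(evaluate(c, inputs) for c in children))
```

Because the tree is just nested data, crossover can swap any subtree for any other and the result is still a valid program, which is exactly the flexibility a fixed-length array lacks.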

More advanced techniques add more functionality to the tree, such as ADFs (Automatically Defined Functions), or, via architecture-altering operations, branches that can be added and removed as the run progresses. These additional trees and data structures attach alongside the core tree (also known as the result-producing branch).

## What Type of Problems Is GP Good For?

There are many types of problems that Genetic Programming can be applied to, but it really shines on search and optimization problems. Problems that are not well understood, or for which no good solution is known to exist, are good examples of where GP has been applied and succeeded. Because GP is based on randomness, the algorithm can find solutions that a human may never have thought of.

Here are some examples of problems that Genetic Programming can be used for:

- Image processing
- Financial modeling
- Time series and regression analysis
- Industrial Process Control
- Computational Chemistry and drug modeling

## Human Competitive Results

John Koza listed in his books on Genetic Programming several examples of human-competitive results produced by Genetic Programming. I’ll refrain from listing them all, but a few examples are:

- Creation of a sorting network for seven items using only 16 steps.
- Creation of a competitive soccer-playing program for the RoboCup 1997 competition.
- An evolved antenna that was deployed on NASA’s Space Technology 5 mission.

## Real Examples

I have written my own Genetic Programming library and applied it to a few problems such as: n-bit multiplexer, n-bit even/odd parity, linear regression, stock trading strategies. I will likely post future articles that go deeper on some of these problems, but for now, I’ll show the n-bit parity problem.

A Boolean parity function is a function that returns 1 if the input vector has an odd number of 1’s. This makes it more specifically an “odd” parity function. If it returned a 1 for an even number of 1’s in the input vector, it would be an “even” parity function. See Wikipedia for more details: https://en.wikipedia.org/wiki/Parity_function
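In code, the target function itself is one line (a Python sketch):

```python
def odd_parity(bits):
    """Return 1 if the input vector contains an odd number of 1's."""
    return sum(bits) % 2

# odd_parity([1, 0, 0, 0, 0, 0]) -> 1   (one 1: odd)
# odd_parity([1, 1, 0, 0, 0, 0]) -> 0   (two 1's: even)
```

The challenge for GP, of course, is that the evolving programs don't get to count; they have to rediscover this behavior purely out of logic gates.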

I chose 6 bits for the input vector, and “odd” as the type of parity, so I’ll evolve a 6-bit odd parity function. The types of logic gates I’ll use are: AND, OR, NOT, NAND, XOR. NAND and XOR aren’t strictly needed once you have the first three, but I threw them in anyway to show that Genetic Programming can evolve a solution from the mix.

Here’s a screen clipping of my console output.

In the output above, you’ll see two different solutions. The top one appears after the text “New Best Individual. Gen(35)”: the program found a new best solution, based on fitness, at the then-current generation 35. D0..D5 are the 6 input bits.

After the individual’s expression tree is printed, it shows “Fitness:(Hits:Trials) : (63:64), Adjusted Fitness: 0.5, Standard Fitness: 1”. What this means is: there are 64 total fitness cases for the 6-bit parity problem, and this individual passed 63 of them. Standard fitness is often the error, so 64 - 63 = 1. Adjusted fitness is a simple formula based on standard fitness:

f(x) = 1 / (1 + g(x))

where f(x) is the Adjusted Fitness and g(x) is the Standardized Fitness, so 1 / (1 + 1) = 0.5.
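Computing these numbers for the 6-bit problem is straightforward. Here is a sketch (my own, not my library's code), where `individual_fn` stands in for whatever evaluates an evolved individual on one fitness case:

```python
from itertools import product

def parity_fitness(individual_fn, n_bits=6):
    """Score an individual against all 2**n_bits parity fitness cases."""
    cases = list(product([0, 1], repeat=n_bits))   # 64 cases for 6 bits
    hits = sum(1 for bits in cases
               if individual_fn(bits) == sum(bits) % 2)
    standardized = len(cases) - hits       # error: 0 is ideal
    adjusted = 1 / (1 + standardized)      # 1.0 is ideal
    return hits, standardized, adjusted
```

A perfect individual scores (64, 0, 1.0); one that misses a single case scores (63, 1, 0.5), matching the console output discussed above.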

The output goes on to say “Ideal individual found”. This is what I like to see: a 100% fit solution. The solution passed all 64 fitness cases, and based on the expression tree, it used all five of the logic gates I allowed it to use.

I recall from my college days that you could build any Boolean logic circuit out of only NAND gates. Let’s see if GP can find a solution:
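NAND is indeed functionally complete: NOT, AND, OR, and XOR can all be built from it. For instance, the classic four-gate construction of XOR (a quick Python sketch):

```python
def nand(a, b):
    """NAND gate on single bits."""
    return (a & b) ^ 1

def xor_from_nands(a, b):
    """XOR built from exactly four NAND gates."""
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))
```

Since 6-bit odd parity is essentially five chained XORs, an all-NAND tree pays roughly four gates per XOR, which helps explain why the NAND-only runs produce much larger trees.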

I let the program run to 500 generations, but it only reached an approximate solution (61 out of 64 test cases). I’m sure that if I let it run longer it would have found a 100% correct solution, but as you can see from the clip above, using only NAND gates makes the solution much larger and harder to find.

Let me know what types of problems you would like to see tackled with Genetic Programming in the comments below.

## References

http://www.genetic-programming.com/GPEM2010article.pdf

http://dces.essex.ac.uk/staff/rpoli/gp-field-guide/A_Field_Guide_to_Genetic_Programming.pdf