In [Wineberg and Oppacher, 2003] an inter-population diversity method is developed based on pair-wise distance, computed by counting the frequencies of symbols at each position in the genome. While a similar method could be devised for genetic programming syntax trees, the variable size of the trees and the size of the symbol sets would make this calculation more complex.
To reduce computation time here, an approximate population diversity measure is found by comparing each population member against a single tree only. Every individual in the population is compared with the best-fit individual found so far in the run. The resulting distances are summed and the total is divided by the population size, giving the average distance to the best-of-run individual.
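
The approximate measure described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: trees are assumed to be nested tuples of the form `(symbol, child, child, ...)`, and `overlay_distance` is a hypothetical stand-in for the edit distance measures, which are defined elsewhere. The names `tree_size`, `overlay_distance` and `approx_diversity` are introduced here for illustration only.

```python
from typing import Callable, Sequence, Tuple

# Assumed representation: a node is (symbol, child, child, ...); a leaf is (symbol,).
Tree = Tuple


def tree_size(tree: Tree) -> int:
    """Number of nodes in the tree."""
    return 1 + sum(tree_size(child) for child in tree[1:])


def overlay_distance(a: Tree, b: Tree) -> int:
    """Hypothetical stand-in distance (not the chapter's edit distance).

    Overlays the two trees from the root and counts nodes whose symbols
    differ; a subtree with no counterpart contributes all of its nodes.
    """
    dist = 0 if a[0] == b[0] else 1
    kids_a, kids_b = a[1:], b[1:]
    for child_a, child_b in zip(kids_a, kids_b):
        dist += overlay_distance(child_a, child_b)
    # At most one of these slices is non-empty: the unmatched children.
    for extra in kids_a[len(kids_b):] + kids_b[len(kids_a):]:
        dist += tree_size(extra)
    return dist


def approx_diversity(
    population: Sequence[Tree],
    best_of_run: Tree,
    distance: Callable[[Tree, Tree], int] = overlay_distance,
) -> float:
    """Sum of distances to the best-of-run tree, divided by population size."""
    if not population:
        raise ValueError("population must not be empty")
    return sum(distance(ind, best_of_run) for ind in population) / len(population)


best = ("+", ("x",), ("y",))
population = [
    best,
    ("+", ("x",), ("x",)),
    ("-", ("x",), ("*", ("x",), ("y",))),
]
diversity = approx_diversity(population, best)
```

Each individual is compared once against the reference tree, so the number of distance computations grows linearly with the population size, whereas a full pair-wise comparison would grow quadratically.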
Both edit distance One and edit distance Two are vulnerable to outliers, especially when the best-fit individual is itself the outlier. However, previous experimental results display two key properties that make these measures appropriate and representative. First, even if the best fitness is found in the initial population, an individual in the current generation is considered to be the best of the run if it is at least as good as the current best-of-run individual, so a tie in fitness replaces the reference individual with a more recent one. Second, with probabilistic selection based on fitness, the best individual is likely to contribute several offspring to the next generation and is therefore unlikely to remain an outlier for long. Later chapters also consider the best individual in the current generation for these measures. A further reason for using the best-of-run individual in this chapter is that researchers commonly consider this individual during analysis, rather than the best individual in the current generation.
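
The tie-handling rule for the best-of-run individual can be illustrated with a short sketch. It assumes that higher fitness is better and that each generation is available as a sequence of `(individual, fitness)` pairs; both are assumptions made for this example, as is the function name `update_best_of_run`.

```python
from typing import Any, Iterable, Optional, Tuple

Scored = Tuple[Any, float]  # (individual, fitness); higher fitness assumed better


def update_best_of_run(best: Optional[Scored], generation: Iterable[Scored]) -> Scored:
    """Return the best-of-run pair after seeing one more generation.

    The comparison uses >= rather than >, so an individual that merely
    ties the current best replaces it. Within a generation, the last
    tying individual wins (an arbitrary choice made for this sketch).
    """
    for individual, fitness in generation:
        if best is None or fitness >= best[1]:
            best = (individual, fitness)
    if best is None:
        raise ValueError("no individuals seen yet")
    return best


best = update_best_of_run(None, [("a", 5.0), ("b", 2.0)])
after_tie = update_best_of_run(best, [("c", 3.0), ("d", 5.0)])
after_worse = update_best_of_run(after_tie, [("e", 4.0)])
```

Here `after_tie` holds `"d"` although its fitness only equals that of `"a"`, while `after_worse` leaves the reference unchanged.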