next up previous contents
Next: 2 Controlling Diversity Up: 1 Diversity Measures and Previous: 1 Diversity Measures and   Contents

1 Measuring Diversity

Genetic programming typically employs a syntax-tree representation, often using binary trees. A population of trees exists in each generation. Function and terminal sets consist of anywhere from 1 to several terminals and usually 3 or more functions. The number of different binary tree shapes with $n$ internal nodes can be found using the Catalan number $C_n=(2n)!/(n!(n+1)!)$, [Lucas et al., 1993]. The beginning of this series, where each term describes the number of binary tree configurations with $n$ internal nodes, i.e. trees of size $2n+1$, consists of the following terms:

\begin{displaymath}
1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, \ldots
\end{displaymath}

Now, for every tree of size $2n+1$, the total number of unique trees given the function and terminals sets creates an enormous space including $C_n \times f^n \times t^{(n+1)}$ programs, where $f$ is the number of functions and $t$ is the number of terminals. Within this space there may be many equivalent trees, functionally and based on the fitness function. What aspect of tree shape, content or behaviour should diversity measures capture? What are the critical elements? Should measures concentrate on the node level, i.e. based on the number and types of function and terminal nodes [McPhee and Hopper, 1999], or should more problem specific measures be used that describe detailed solution behaviour [D'haeseleer and Bluming, 1994]?

The term diversity is often used without definition. The implicit assumption is that it refers to the solution representation. However, it is useful to also consider aspects of solution behaviour or fitness. In the following research, the term genotype refers to the shape and content of the solution, while phenotype refers to the fitness value. The term phenotype generally eludes to a more descriptive measure of solution behaviour. However, the standard practice in the literature uses fitness value and phenotype interchangeably. Later, a more complex definition of solution behaviour will be used that describes solutions in more detail than typical fitness functions.

Based on the type of diversity being measured, several measures of diversity can be defined. The number of unique genotypes was the traditional diversity measure [Koza, 1992,Langdon, 1998a]. The number of unique fitness values was also considered a diversity measure [Rosca, 1995b]. There are obviously many other possible definitions. Some of these definitions are described in Chapter 4.


next up previous contents
Next: 2 Controlling Diversity Up: 1 Diversity Measures and Previous: 1 Diversity Measures and   Contents
S Gustafson 2004-05-20