next up previous contents
Next: 1 Problems and Measures Up: 5 Genetic Lineages and Previous: 4 Remarks   Contents

5 Sampling of Unique Structures and Behaviours

Research into code growth and operator biases can be used to help understand the type of tree shapes sampled by genetic programming. Subtree crossover and the representation predispose solutions toward code growth and bloat [Soule and Foster, 1997,Langdon et al., 1999,Langdon, 1998b,Banzhaf and Langdon, 2002,Langdon, 2000a,Soule and Heckendorn, 2002,Luke, 2003]. While programs continue to grow, they tend to grow toward deeper and less-bushier trees. Also, the space of tree shapes visited during genetic programming search has been studied [Soule and Foster, 1997,Langdon, 2000a,Langdon, 1999,Daida, 2002], showing that there are types of shapes that are more easily sampled than others. If a problem's solution is not within the more easily sampled structures, the problem will be difficult for genetic programming [Daida et al., 2003b].

With respect to the growth of solutions, the subtree crossover operator is shown to be a more ``local'' operator, where the upper-portion of trees become fixed and variations mainly occur near the leaves [Rosca and Ballard, 1995,Igel and Chellapilla, 1999,Poli and Langdon, 1998a,D'haeseleer and Bluming, 1994]. Diversity research also demonstrates the effects of the convergence of structures [McPhee and Hopper, 1999], and in Chapter 4. However, it is also the content of trees (functions and terminals) that provide a solution to a given problem. By using stochastic sampling techniques, the proportion of solutions of increasing size was shown not to increase beyond a certain threshold [Langdon, 1999,Langdon and Poli, 2002]. That is, the space of increasingly larger solutions did not yield a higher proportion of solutions.

As the fitness function is typically a very coarse description of behaviour, it is more difficult to understand the type of solutions sampled by genetic programming. However, comparisons between genetic programming and hill-climbing methods using similar representations and operators can be helpful [Langdon and Poli, 2002,O'Reilly and Oppacher, 1994,O'Reilly and Oppacher, 1996,Juels and Wattenberg, 1995]. While genetic programming performed better and worse on various problem domains, comparisons emphasised the domains in which more hill-climbing or explorative search is beneficial. These results were particularly useful in explaining the effects of increased genetic diversity, as seen earlier in this chapter. In any case, much of the solutions' behaviour remains hidden behind the fitness function value.

To better understand the sampling of tree shapes and behaviours during search, this section examines the number of unique tree shapes and behaviours (an enriched definition of fitness) sampled. The structure aspect of the following study provides additional views of the search process, while the coarseness of fitness function values is addressed with problem-specific behaviour descriptions that reflect fitness but elucidate the behaviour of the solutions better.



Subsections
next up previous contents
Next: 1 Problems and Measures Up: 5 Genetic Lineages and Previous: 4 Remarks   Contents
S Gustafson 2004-05-20