The behaviour definition has enriched the description of the sampling of solutions by genetic programming. In the Ant problem, the geographic distribution of food pellets on the Ant trail may allow for many Ant behaviours that are able to collect 20 to 24 pellets, but unable to easily collect more. In this problem, the standard definition of fitness creates deception [Langdon and Poli, 2002] that has already been shown to negatively impair fitness improvement. However, the more gradual decrease in the ability to sample different behaviours of higher fitness explains why genetic programming often outperforms hill-climbing methods that search with poor solutions. The peaks of behaviours sampled at size 20 and 24 may represent local optima that trap search.
While the Regression problem used a behaviour that did not directly reflect its fitness value, the behaviour did reflect the complexity of solutions. The preference toward sampling behaviours of a small size and complexity could be caused by neutrality in the fitness values or the result of a biased structure sampling. In the latter case, the inability to increase structure size is hypothesized to be positively correlated with the inability to increase the complexity of solutions. This hypothesis seems plausible for the Regression domain, considering that subtree crossover typically functions near the leaves (terminals).
With respect to the sampling of unique structures of various sizes, all problems showed that while genetic programming is predisposed to code growth (and bloat), unique tree shapes of large sizes are sampled less. That is, a fewer number of big tree shapes are sampled. Remember that the number of different programs for a given tree shape increases exponentially with an increase in size. Thus, while fewer large and unique tree shapes are being sampled, the algorithm could be sampling more different programs of those shapes. The different structures of the same size that are sampled more frequently would appear to be attractive to genetic programming for one reason or another. The depth limit imposed here and the mechanics of subtree crossover are also possible causes.
Poli and McPhee (2001) examined the distributions of a variable-length linear genomes using standard crossover and a flat landscape. The authors found that the distribution of genome lengths sampled during search resembled a gamma distribution and that shorter genomes were sampled much more than longer ones. With respect to the results presented here, when the non-cumulative version of Figures 5.9 to 5.11 are examined, the distribution of unique tree shapes sampled in the Ant and Regression experiments also resembles a gamma distribution. While it is less clear for the Parity experiments, additional analysis into these distributions would provide further support of the results presented in [Poli and McPhee, 2001].
These results also suggest several possibilities to improve performance. The sizes of behaviours that were sampled in high numbers may represent areas of deception in the fitness space. This deception may be reduced by pressuring the search away from areas of neutral fitness. Previous research showing a positive correlation between high fitness-based diversity with good fitness could be the result of avoided deception leading to better fitness. Thus, when a population contains many different but equal in fitness behaviours, we may wish to move the search toward a less deceptive region of the search space. Leading the ant to follow a specific food trail reduced deception and improved fitness [Langdon and Poli, 2002]. For the Regression domain, using a component of complexity in the fitness function may also improve the performance of genetic programming. Similarly, the Parity fitness function could be amended to reflect which instances a solution solves correctly to reduce deception. Of course, exploration (global search) and exploitation (local search) ability will be affected when such methods are employed to reduce deception.