There are many elements of genetic programming which limit the scalability of the algorithm. First, evaluating a solution can be computationally expensive. While evaluating a single solution alone is not a problem, a very large population of solutions easily becomes a problem. A bias toward the increase in solution size compounds this problem by increasing the evaluation time and the memory used by the population. Also, it is not clear that genetic programming can be applied effectively to when the combinatorial search space is increased, nor is it clear that very simple implementation decisions, such as the population size, do not drastically change the difficulty of a problem instance.
A second major issue in genetic programming is the basic choice of representation, operators and fitness function. The elements define the fitness landscape. The representation of solutions by computer programs is straightforward for modelling a variety of problem domains. However, transforming one program to another, while maintaining some commonality of behaviour between the two is not, i.e. the property of strong causality. Thus, when the representation, operators and fitness function have the property of weak causality, the fitness landscape can be described as rough. Not only has the operator in genetic programming come under scrutiny, but several representation variations have been proposed to improve the algorithm, e.g. grammar-based genetic programming [Whigham, 1995] and linear genetic programming [Keller and Banzhaf, 1996].
Related to the issue of strong and weak causality in genetic programming is the issue of context and content in the representation and fitness function. The content of a syntax tree refers to its functions and terminals. The context of a group of functions and terminals in a particular syntax tree is the way in which these primitives contribute toward fitness, and is typically determined by the other elements in the tree. However, moving this content, or group of functions and terminals, to another syntax tree may change its context. Thus, selecting a portion of one solution that has good fitness and placing it in another solution with good fitness is not guaranteed to preserve the original context of the content. The issue of content and context in the genetic programming system is likely to effect the causality of the system.
Both of the previous two issues, scalability and fitness landscape, have strong links with the population and its dynamics. For instance, if the size of the population could be reduced without cost to performance, computation could be drastically reduced. As the population has a strong effect on guiding the search in the solution space, the issues of scalability and landscape roughness could be addressed to a degree by different population dynamics. To better understand genetic programming and related issues, this thesis will address population dynamics by studying the more specific issue of population diversity. Diversity describes the amount of variation in the population, and as shown in later chapters, has wide-reaching effects on many aspects of the search process.