The theoretical foundations of genetic programming are summarised in Langdon and Poli (2002). Theories of evolutionary algorithms use abstract representations of the solution space, called schem-ata, to describe various components and behaviours of the algorithm. Holland's (1975) notion of schema for genetic algorithms was extended by Koza (1992) to include syntax trees. Syntax trees are the common representation in genetic programming, and these schema were intended to represent trees that have common subtrees. In this case, subtrees represent functional code that is combined over time to create better programs [Altenberg, 1994,Rosca and Ballard, 1996]. O'Reilly (1995) extended the idea of schema for genetic programming trees to allow for subtrees to contain wildcards, giving a more flexible definition of schema. Rosca (1997) defined a similar schema based on rooted trees, also using wildcards to allow for schema to represent templates.
Theories based on parse tree schemata are often complex. This is mainly due to the many random decisions in the algorithm, requiring the modelling of many stochastic events. However, it is not always clear that schemata are selected and combined in a predictive way. That is, the fitness resulting from the combination of two schemata with high fitness is often unpredictable. That does not, however, lessen the importance of the development of theoretical models, which have produced many practical results. The rooted tree schema [Rosca, 1997b] provided an improved understanding of the effects of popular operators and representation, while Poli (2003) used theoretical results to justify a method to control growth in solution size. Langdon and Poli (2002) and Poli and McPhee (2003a,2003b) describe their recent developments of exact schema theories for genetic programming using a complete and exact description of the representation and operators. As noted in [Langdon and Poli, 2002], this approach is costly to compute since, in a sense, it simulates the actual execution of the algorithm with all its complexities and degrees of freedom. Thus, to know what exact schemata will be generated, one can either compute an enormous number of probabilities of events and their likely results, or one can execute the algorithm.