next up previous contents
Next: 2 The Genetic Programming Up: 3 Genetic Programming Previous: 3 Genetic Programming   Contents

1 Foundations

Genetic programming became a popular search technique in the early 1990s due to the work by Koza (1992). Angeline (1998) traced the historical foundations of genetic programming back to Friedberg (1958) and Friedberg, Dunham and North (1959) where iterative random changes made to computer programs induced better programs. Learning machines were proposed earlier by Turing (1950), where an automated process similar to evolution in combination with human interaction was thought to be a possible way for programs to acquire intelligent behaviour. The form of genetic programming used today is most closely related to work done by Cramer (1985), where a tree representation of programs was used in conjunction with subtree crossover to evolve a multiplication function. This work also employed the use of partial credit assignment for function activity, input dependence and iterative constructs, issues that are not traditionally considered in canonical genetic programming. Other background and foundational research of genetic programming and evolutionary algorithms can be found in [Banzhaf et al., 1998,Fogel, 1998,Bäck et al., 2000a,Koza, 1994]. More modern approaches to genetic programming and evolutionary algorithms are described in [Langdon and Poli, 2002,Bäck et al., 2000b,Koza et al., 2003]. Some of these approaches are discussed in later chapters.

The theoretical foundations of genetic programming are summarised in Langdon and Poli (2002). Theories of evolutionary algorithms use abstract representations of the solution space, called schem-ata, to describe various components and behaviours of the algorithm. Holland's (1975) notion of schema for genetic algorithms was extended by Koza (1992) to include syntax trees. Syntax trees are the common representation in genetic programming, and these schema were intended to represent trees that have common subtrees. In this case, subtrees represent functional code that is combined over time to create better programs [Altenberg, 1994,Rosca and Ballard, 1996]. O'Reilly (1995) extended the idea of schema for genetic programming trees to allow for subtrees to contain wildcards, giving a more flexible definition of schema. Rosca (1997) defined a similar schema based on rooted trees, also using wildcards to allow for schema to represent templates.

Theories based on parse tree schemata are often complex. This is mainly due to the many random decisions in the algorithm, requiring the modelling of many stochastic events. However, it is not always clear that schemata are selected and combined in a predictive way. That is, the fitness resulting from the combination of two schemata with high fitness is often unpredictable. That does not, however, lessen the importance of the development of theoretical models, which have produced many practical results. The rooted tree schema [Rosca, 1997b] provided an improved understanding of the effects of popular operators and representation, while Poli (2003) used theoretical results to justify a method to control growth in solution size. Langdon and Poli (2002) and Poli and McPhee (2003a,2003b) describe their recent developments of exact schema theories for genetic programming using a complete and exact description of the representation and operators. As noted in [Langdon and Poli, 2002], this approach is costly to compute since, in a sense, it simulates the actual execution of the algorithm with all its complexities and degrees of freedom. Thus, to know what exact schemata will be generated, one can either compute an enormous number of probabilities of events and their likely results, or one can execute the algorithm.


next up previous contents
Next: 2 The Genetic Programming Up: 3 Genetic Programming Previous: 3 Genetic Programming   Contents
S Gustafson 2004-05-20