next up previous contents
Next: 2 Research Perspective Up: 1 An Analysis of Previous: 1 An Analysis of   Contents

1 Introduction

Genetic programming uses a population of variable-length computer programs within the metaphor of evolution to guide search [Cramer, 1985,Koza, 1992]. Search is a framework for problem solving where alternative solutions are evaluated in a trial-and-error fashion. A goal of search is to find an ordering of alternative solutions that leads to good solutions. When enumeration over the entire space of alternative solutions is not possible, heuristics like genetic programming use rules-of-thumb to guide the search for alternative solutions. Genetic programming belongs to a class of algorithms which use the metaphor of Natural Selection [Darwin, 1859] to determine the ordering of alternative solutions during search.

Genetic programming is a simple and powerful technique which has been applied to a wide range of problems in combinatorial optimisation, automatic programming and model induction (see [Banzhaf et al., 1998] for a general introduction to genetic programming). The direct encoding of solutions as variable-length computer programs allows genetic programming to provide solutions that can be evaluated and also examined to understand their internal workings. In this way, a genetic programming solution can represent a single value after evaluation (as in optimisation), an iteratively built algorithm (as in automatic programming) or a data-driven model useful in data mining and knowledge discovery (as in model induction). While genetic programming is applied successfully in all three cases, each of which merits its own scientific discipline, the focus of this thesis is on the inner workings of the algorithm itself.

The issues of problem solving that motivate the use of genetic programming also represent some of the limits to its applicability. The increase in computer power, the wide range of applications and the ability to collect enormous amounts of scientific data all require a computational framework, such as genetic programming, capable of automatically generating solutions with a representation that is both powerful and understandable. However, the genetic programming algorithm has a high computational cost to run, has difficulty scaling to larger and harder problem instances, and uses a complex representation that can also limit its use.

At the heart of these issues is the use of a population of alternative solutions and the methods used to generate new alternatives. A candidate, or alternative, solution is evaluated by means of a fitness function that allows it to be compared with previously evaluated solutions. A stochastic selection method chooses better solutions from the population that then undergo stochastic variations to produce new alternatives. In this way, the population is responsible for guiding a parallel search through the solution space. However, the evaluation of each population member becomes increasingly computationally expensive and designing an effective process of variation on the direct and complex encoding of solutions is not intuitive. Also, the amount of data produced by a population-based search over an enormous space of possible programs makes the prediction and analysis of algorithm behaviour difficult.


next up previous contents
Next: 2 Research Perspective Up: 1 An Analysis of Previous: 1 An Analysis of   Contents
S Gustafson 2004-05-20