Next: 1 Population Measures: Entropy Up: 6 Effects of Population Previous: 1 Code Growth and   Contents

2 Regression Problems and Increased Difficulty

Building on the study by Daida et al. (2001), the Binomial-3 problem with varying ERC ranges is used to increase difficulty. As in Chapter 5, the Binomial-3 problem consists of approximating the function using the functions and the terminals . are ERCs in the ranges of [-1,1) (referred to lated as unity), [-10,10) (ten), and [-100,100) (`hundred'), where larger ranges create increasing difficulty. The Binomial-3 function is shown graphically over the interval [-1,0] in Figure 6.1 (top left). There are several solutions to this function which can be represented easily by many different combinations of functions and terminals. The easiest version of this problem would be expected to allow for many close-to-optimal initial solutions and more new optimal solutions to be acquired in every generation. Harder instances are likely to cause more difficulty in finding good solutions early and often.

To complement this problem, a method is used to generate random polynomial instances [Ekárt and Németh, 2002]. Random polynomials are generated up to degree 11 with non-zero coefficients and with real roots in the range . As higher degree polynomials have a larger number of roots close to zero, most of the values they take in the interval are close to zero. For genetic programming, it is easy to find relatively close fit solutions in the form of straight lines close to the X axis. However, it is difficult to improve on these solutions to find a really good approximation, especially if the relatively fit and simple individuals spread across the genetic programming population after a few generations.

The polynomials of increasing degree with non-zero coefficients and real roots in the interval constitute a class of increasing difficulty problem instances which are suitable for studying the behaviour of genetic programming, as they have the following properties:

• Minimum Required Depth. Higher degree polynomials can be approximated less easily with our function set. Each term of degree is formed by a sequence of . This, in general, will require larger solutions for a good approximation of a higher degree polynomial. A polynomial with real roots and non-zero coefficients can be expressed as a tree by using functions (i.e., multiplications and subtractions) and terminals. The minimum depth (thus complete) tree representing this polynomial is of depth .
• Maximum Allowed Depth. It follows from the above requirement that polynomials with higher degree are harder for genetic programming to approximate if allowing the same limited depth for trees, as there is less space for genetic programming to grow introns. Introns refer to non-functional code in the program. In these cases the trees must be more efficient in using more nodes to express the solution.
• Approximation Ability. Lastly, the Matlab Polyfit routine is used to assign a degree of approximation ability to the polynomials. Polyfit finds the best polynomial of given degree to fit a set of points. The polynomials of degrees 11, 7 and 3 (shown in Figure 6.1) were best fitted as polynomials of degrees 11, 8 and 3, respectively, with errors of order , the approximation error being the highest for the 11 degree polynomial. That is, there were no close approximate polynomials of degree to generated polynomials of degree with . Furthermore, as the generated polynomials are best fit with Polyfit polynomials of similar degree, a successful application of genetic programming on this problem should also find polynomials of similar degree.

The type of increased difficulty posed by the random polynomials and the Binomial-3 function is similar. The Binomial-3 problem must cope with ERC values spanning an increasing range. Likewise, for the random polynomials, when the degree increases, the fixed ERC range will become less appropriate as the polynomials contain variations of smaller magnitude.

For assessing the differences in behaviour between easy and difficult instances, two measures of diversity are used. The entropy measure is used for indicating the distribution of individuals over fitness values (and subsequently the level of selection pressure) and the edit distance measure is used for indicating the structural diversity of a population. These measures are described next.

Subsections

Next: 1 Population Measures: Entropy Up: 6 Effects of Population Previous: 1 Code Growth and   Contents
S Gustafson 2004-05-20