next up previous contents
Next: 2 Relevance to Other Up: 3 Genetic Outliers and Previous: 3 Genetic Outliers and   Contents

1 The Tree-String Problem

The Tree-String problem consists of the following components: The evolutionary algorithm attempts to minimise the objective function using subtree crossover, tournament selection and other standard genetic programming parameters.

In this study, the longest common subsequence is used as a measure of similarity between the candidate and target trees and strings [Pevzner, 2000]. However, other measures of similarity between trees and strings are possible, such as edit distance and sequence alignment measures from computational biology. There are other possible ways to specialise this problem for particular research agendas, where the decisions made here were designed to be consistent with common problem domains. The tunable nature of the Tree-String problem is most easily realised with the set $S$ and the method of tree growing. In the latter case, much research has demonstrated trees that are difficult for genetic programming to reach during search [Daida et al., 2003b,Langdon and Poli, 2002]. Additionally, increasing the size of the set $S$ will increase the difficulty of minimising the string objective.

An example target string with 7 nodes is:

\begin{displaymath}
A B B C A D D .
\end{displaymath}

This string could be realised over any tree with 7 nodes. An examples of binary tree structures that are traversed breadth-first with 7 nodes is $ n l n l n l l  $, which represents the following tree:

\begin{displaymath}
\Tree [.n(A) l(B) [.n(B) l(C) [.n(A) l(D) l(D) ] ] ]
\end{displaymath}

Another example is the tree structure $ n n n l l l l  $ that represents this tree:

\begin{displaymath}
\Tree [.n(A) [.n(B) l(B) l(C) ] [.n(A) l(D) l(D) ] ]
\end{displaymath}


next up previous contents
Next: 2 Relevance to Other Up: 3 Genetic Outliers and Previous: 3 Genetic Outliers and   Contents
S Gustafson 2004-05-20