Next: 2 Relevance to Other
Up: 3 Genetic Outliers and
Previous: 3 Genetic Outliers and
  Contents
The Tree-String problem consists of the following components:
- A target tree is defined with nodes labeled as leaf (l) and
non-leaf (n). In this study, these structures will be binary and
produced with a random tree growing method shown to be consistent
with standard genetic programming [Daida, 2002]. The method
is also similar to those used in designing tree-based hill-climbing
methods, e.g. [Juels and Wattenberg, 1995].
Pairs of nodes (representing two child nodes)
are repeatedly added to a tree, beginning with only
a root node, until a predefined size is reached. The point of attachment
of the child nodes is selected
by assigning each leaf node an equal probability prior to each
growth phase.
- A target string is defined consisting of symbols from the alphabet
with length equal to the number of nodes in the target tree.
The target string is generated by
iteratively selecting random symbols from
.
- Two objective functions are defined for the target tree and
target string as:
- Given a breadth-first traversal
of the target and candidate trees, with nodes
,
the longest common subsequence
is found between the resulting strings. The target tree size minus
this value is returned for the tree objective.
- Given a depth-first traversal of the
target and candidate trees, with nodes
,
the longest common subsequence
is found between these resulting strings. The target string size minus this
value is returned for the string objective.
- A pareto criterion is used with the two objective functions,
where the better-than and equivalent-to
relationship are used for selection
(
and
refer to the
string objective and the tree objective, respectively), and
The evolutionary algorithm attempts to minimise the
objective function using subtree
crossover, tournament selection and other standard genetic
programming parameters.
In this
study, the longest common subsequence is used as a measure of similarity
between the candidate and target trees and strings [Pevzner, 2000].
However, other measures of similarity between trees and strings are possible,
such as edit distance and sequence alignment measures from computational
biology.
There are other possible ways to specialise this problem for
particular research agendas, where the decisions made here were designed
to be consistent with common problem domains.
The tunable nature of the Tree-String problem is most easily
realised with the set
and the method of tree growing. In the
latter case, much research has demonstrated trees that are difficult
for genetic programming to reach during search [Daida et al., 2003b,Langdon and Poli, 2002].
Additionally, increasing the size of the set
will increase the difficulty
of minimising the string objective.
An example target string
with 7 nodes is:
This string could be realised over any tree with 7 nodes. An
examples of binary tree structures that are traversed breadth-first with
7 nodes is
, which represents the following tree:
Another example is the tree structure
that represents this tree:
Next: 2 Relevance to Other
Up: 3 Genetic Outliers and
Previous: 3 Genetic Outliers and
  Contents
S Gustafson
2004-05-20