- A target tree is defined with nodes labeled as leaf (l) and non-leaf (n). In this study, these structures will be binary and produced with a random tree growing method shown to be consistent with standard genetic programming [Daida, 2002]. The method is also similar to those used in designing tree-based hill-climbing methods, e.g. [Juels and Wattenberg, 1995]. Pairs of nodes (representing two child nodes) are repeatedly added to a tree, beginning with only a root node, until a predefined size is reached. The point of attachment of the child nodes is selected by assigning each leaf node an equal probability prior to each growth phase.
- A target string is defined consisting of symbols from the alphabet with length equal to the number of nodes in the target tree. The target string is generated by iteratively selecting random symbols from .
- Two objective functions are defined for the target tree and
target string as:
- Given a
*breadth-first*traversal of the target and candidate trees, with nodes , the longest common subsequence is found between the resulting strings. The target tree size minus this value is returned for the tree objective. - Given a
*depth-first*traversal of the target and candidate trees, with nodes , the longest common subsequence is found between these resulting strings. The target string size minus this value is returned for the string objective.

- Given a
- A pareto criterion is used with the two objective functions,
where the
*better-than*and*equivalent-to*relationship are used for selection ( and refer to the string objective and the tree objective, respectively), and

In this study, the longest common subsequence is used as a measure of similarity between the candidate and target trees and strings [Pevzner, 2000]. However, other measures of similarity between trees and strings are possible, such as edit distance and sequence alignment measures from computational biology. There are other possible ways to specialise this problem for particular research agendas, where the decisions made here were designed to be consistent with common problem domains. The tunable nature of the Tree-String problem is most easily realised with the set and the method of tree growing. In the latter case, much research has demonstrated trees that are difficult for genetic programming to reach during search [Daida et al., 2003b,Langdon and Poli, 2002]. Additionally, increasing the size of the set will increase the difficulty of minimising the string objective.

An example target string
with 7 nodes is:

This string could be realised over any tree with 7 nodes. An examples of binary tree structures that are traversed breadth-first with 7 nodes is , which represents the following tree:

Another example is the tree structure that represents this tree: