Secondly, the method of initialisation depends on the operator in use. With an operator such as standard subtree crossover, which is biased toward the leaves, it makes little sense to expect the search to take place near the root of the trees. Thus, initialising a new species from the seed might focus the changes near the leaves first, then move toward the root once a sufficient level of diversity is found. More homologous operators may be able to use a looser definition of species, since they lack a bias toward a particular area of the tree. In some cases, it may be enough simply to perform a phase of local search on the diverse individuals that would normally define a new species. In fact, this is a good way to initially validate the model. The local optimisation technique would probably need to be defined for each problem instance. According to the results presented in this thesis, the canonical genetic programming system routinely produces outlier individuals that appear to be ideal candidates for further examination. While previous methods may have leveraged these individuals in more indirect ways, the proposed model intends to leverage them directly.
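The validation step suggested above (local search applied directly to outlier individuals) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the model's implementation. The outlier rule (mean pairwise distance more than k standard deviations above the population average), the hill-climbing procedure, the convention that fitness is minimised, and the names `find_outliers`, `local_search`, `distance`, `fitness` and `neighbours` are all hypothetical choices made for illustration. In practice the distance measure and the neighbourhood would need to be defined for the tree representation and the problem at hand.

```python
from statistics import mean, pstdev
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def find_outliers(
    population: Sequence[T],
    distance: Callable[[T, T], float],
    k: float = 2.0,
) -> List[T]:
    """Return individuals whose mean distance to the rest of the population
    is more than k standard deviations above the population average.

    Assumption: this threshold rule is illustrative only.
    """
    if len(population) < 3:
        return []
    # Mean distance from each individual to every other individual.
    scores = [
        mean(distance(a, b) for j, b in enumerate(population) if j != i)
        for i, a in enumerate(population)
    ]
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return []  # all individuals are equally distant; no outliers
    return [ind for ind, s in zip(population, scores) if s > mu + k * sigma]


def local_search(
    seed: T,
    fitness: Callable[[T], float],
    neighbours: Callable[[T], Iterable[T]],
    max_steps: int = 100,
) -> T:
    """Simple hill-climbing from a seed individual (fitness is minimised).

    Assumption: `neighbours` is a problem-specific move operator.
    """
    best, best_fit = seed, fitness(seed)
    for _ in range(max_steps):
        candidates = list(neighbours(best))
        if not candidates:
            break
        cand = min(candidates, key=fitness)
        cand_fit = fitness(cand)
        if cand_fit >= best_fit:
            break  # local optimum reached
        best, best_fit = cand, cand_fit
    return best


# Toy usage with integers standing in for individuals.
population = [0, 1, 1, 2, 1, 0, 2, 1, 1, 50]
outliers = find_outliers(population, distance=lambda a, b: abs(a - b))
improved = [
    local_search(o, fitness=lambda x: abs(x - 7), neighbours=lambda x: [x - 1, x + 1])
    for o in outliers
]
```

In the toy usage, integers stand in for program trees purely to keep the sketch self-contained; with real individuals, `distance` would be a tree-based measure and `neighbours` a small structural or content change to the tree.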
What is not addressed in this study is the composition of the in-lier space. While previous research has shown that the population converges strongly with respect to structure and content, the in-lier space may in fact be composed of several distinct clusters of genetically similar individuals. Figure 7.11 demonstrates how the in-liers could be composed of four distinct, genetically similar clusters, instead of the single cluster shown in Figure 7.1. However, actually identifying these spaces and distinct clusters is a complex issue that requires specific measures and methods.
The proposed model was largely motivated by the growing emphasis in the literature on the importance of structure, e.g. [Daida et al., 2003b], and of diversity (as seen in earlier chapters). However, while the proposed model is general (i.e. explicitly considering outliers should generally improve results), its exact performance and the tuning of its parameters will be problem specific.