Next, the fit subpopulation is further divided
into the similar and the dissimilar.
The distance between structures is defined using the Levenshtein
distance between strings, also known as string edit distance.
Each tree's structure is represented by the string obtained from a
breadth-first traversal of the tree, with node labels
[n,l].
This is the same distance measure used in previous chapters, called
edit distance one, except that trees consist only of the symbols `n' and `l'.
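
The encoding and distance just described can be sketched as follows.
This is only an illustration under stated assumptions: trees are taken
to be nested Python lists, `n' is assumed to mark an internal node and
`l' a leaf, and the function names `bfs_shape_string` and
`levenshtein` are hypothetical. The distance is the standard
unit-cost Levenshtein distance, which may differ in detail from the
edit distance one of the previous chapters.

```python
from collections import deque


def bfs_shape_string(tree):
    """Encode a tree's shape as a string of 'n' and 'l' symbols.

    Assumption: an internal node is a Python list of children and
    anything else is a leaf. Nodes are visited breadth-first.
    """
    queue = deque([tree])
    symbols = []
    while queue:
        node = queue.popleft()
        if isinstance(node, list):
            symbols.append("n")   # assumed meaning: internal node
            queue.extend(node)    # children are visited after this level
        else:
            symbols.append("l")   # assumed meaning: leaf
    return "".join(symbols)


def levenshtein(a, b):
    """Standard Levenshtein distance with unit costs (two-row version)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Two small hypothetical trees and their shape strings.
t1 = ["x", ["y", "z"]]
t2 = ["x", "y"]
s1, s2 = bfs_shape_string(t1), bfs_shape_string(t2)
print(s1, s2, levenshtein(s1, s2))
```

Because the terminal and function names are discarded, two trees with
the same shape but different content are at distance zero under this
encoding.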
Each individual's pair-wise distance is
the average edit distance to the rest of the population, where each
distance is
normalised by dividing by the size of the larger tree in the
pair.
The mean pair-wise distance of the population is then found by dividing
the sum of all individual pair-wise distances by the population size.
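
In symbols, and assuming that ``the rest of the population'' means the
other $N-1$ individuals and that the size of a tree equals the length
of its string (neither is stated explicitly here), the two quantities
above can be written as
\[
  d_i = \frac{1}{N-1} \sum_{j \neq i}
        \frac{\mathrm{ed}(s_i, s_j)}{\max(|s_i|, |s_j|)},
  \qquad
  \bar{d} = \frac{1}{N} \sum_{i=1}^{N} d_i ,
\]
where $\mathrm{ed}$ is the string edit distance and $s_i$ is the
string of individual $i$. The symbols $N$, $s_i$, $d_i$ and $\bar{d}$
are notation introduced only for this illustration.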
From the fit subpopulation, whose members are better than half the
population, the individuals whose
pair-wise distance to the population is within
two standard deviations of the population's mean pair-wise distance
are then selected.
This subpopulation is called the in-liers.
The remaining subpopulation is called the outliers: individuals that
are genetically different from the rest and better than more
than half the population.
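
A minimal sketch of this split is given below, under several
assumptions that the text does not settle: the per-individual
pair-wise distances and the membership of the fit subpopulation are
taken as already computed, the population (rather than sample)
standard deviation is used, and ``two standard deviations from the
mean'' is read as an absolute deviation in either direction. A
one-sided reading, in which only distances above the mean count, is
also possible. The name `split_fit_subpopulation` and its parameters
are hypothetical.

```python
from statistics import mean, pstdev


def split_fit_subpopulation(distances, is_fit, k=2.0):
    """Split the fit individuals into in-liers and outliers.

    distances: pair-wise distance of every individual in the population.
    is_fit:    booleans marking the fit subpopulation (assumed given).
    k:         number of standard deviations used as the threshold.
    Returns two lists of indices: (inliers, outliers).
    """
    mu = mean(distances)       # population's mean pair-wise distance
    sigma = pstdev(distances)  # assumption: population standard deviation
    inliers, outliers = [], []
    for i, (d, fit) in enumerate(zip(distances, is_fit)):
        if not fit:
            continue           # unfit individuals belong to neither group
        if abs(d - mu) <= k * sigma:
            inliers.append(i)
        else:
            outliers.append(i)
    return inliers, outliers


# Hypothetical population of ten: nine similar individuals and one distant one.
distances = [0.1] * 9 + [0.9]
is_fit = [True, False, True, False, True, False, False, False, False, True]
inliers, outliers = split_fit_subpopulation(distances, is_fit)
print(inliers, outliers)
```

The mean and standard deviation are computed over the whole
population, not only over the fit subpopulation, which follows the
wording ``the population's mean pair-wise distance''.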