Mutation by imitation in boolean evolution strategies

Adaptive heuristics have been developed in the Evolution Strategy (ES) framework regarding the mutation of real-valued variables. But these heuristics extend poorly to discrete variables: when the rate or variance of mutation gets too small, mutation no longer has any effect. To overcome this problem, we propose two mutation operators that use the worst individuals of the current population as beacons indicating the limits of the current promising region. Mutation by differentiation drives individuals away from the beacon-individuals. Mutation by imitation conversely assumes that beacon-individuals still contain relevant information, and aims at moving the individual at hand nearer to the beacons. Mutation by imitation produces offspring that share the features of several "parents"; but in contrast with classical crossover, it allows one to control the distance between the offspring and the main parent, by fixing the number of bits to mutate. Mutation by imitation thus permits a tunable exchange of information among individuals. Both operators have been implemented in a boolean $(\mu + \lambda)$ ES framework, and tested on four problems: the Royal Road problem, a GA-deceptive problem, the combinatorial multiple knapsack optimization problem and the Long Path problem. Comparative validation is presented and discussed.


Introduction
The three historical roots of evolutionary computation are known to be evolutionary programming (EP), evolution strategies (ES) and genetic algorithms (GA). Original ES were developed in the mid sixties [24, 23] to deal with real parameter optimization. EP was initially developed in the context of Finite State Machines [7], but was later modified and tuned to handle different representations, including real parameters. Last, early GAs heavily relied on the bitstring representation [10, 9], and were later extended to other representations [22].
No optimization method can accurately handle all optimization problems, according to the "No Free Lunch Theorem" [28]. The key question thus remains to choose the right evolutionary algorithm for the problem at hand. To address this issue, comparative validations have been conducted on many test-beds, most of them involving real-valued fitness functions (e.g. the DeJong test-suite [3]); see [2, 5, 6] among many others. Except in the GA literature, little attention has been paid to boolean problems, despite the fact that real-world problems often involve discrete features. The evolutionary landscape thus appears divided: GAs are to be preferred for discrete problems only [22, 12]; meanwhile, the efficient strategies developed in EP and ES to evolve real-valued features seem to transpose poorly to the discrete case [1].
This paper is concerned with the mutation of discrete features in the framework of evolution strategies. The idea is to use the current population as a reservoir of information. More precisely, the worst individuals in the current population are considered as beacons signaling the limits of the present promising region, which adaptively move as the population is driven toward regions of increasingly high fitness. These beacons give rise to two mutation operators: mutation by imitation moves the individual at hand toward the beacons, whereas mutation by differentiation moves the individual at hand away from the beacons. These mutation operators allow individuals to exchange information and directly influence each other. This was only possible so far through recombination. The advantage of the proposed mechanism is that it allows one to control the distance between offspring and parents by fixing the number of bits to mutate, whereas classical recombination offers no guarantee as to the distance between offspring and parents.
This paper is organized as follows. Section 2 briefly reviews the main trends regarding the mutation of real-valued and discrete features in EP and ES. Mutation by imitation and mutation by differentiation are described in section 3. These operators are tested on four problems: the Royal Road [18, 19, 8], the Ugly (GA-deceptive) problem [27], the combinatorial multiple knapsack optimization problem [21, 14], and the Long Path problem [11]. Comparative results are discussed in section 4. We conclude with some avenues for further research.

State of the art
The general real-valued optimization problem is stated as: $F$ being a real-valued function defined on $\mathbb{R}^n$, find $x \in E \subset \mathbb{R}^n$ such that $F(x) = \max\{F(y);\ y \in E\}$. Real-valued evolutionary algorithms now exist in all three communities (see for instance [16] for an example of real-valued GA). Real-valued crossover is based on the linear combination of two or several parents [16, 2], or on classical n-point recombination. Real-valued mutation involves the addition of a Gaussian random variable: $x_i := x_i + N(0, \sigma_i)$, where $N(0, \sigma)$ represents the Gaussian random variable of standard deviation $\sigma$. The main difficulty remains to adjust the parameters $\sigma_i$.
Most real-valued GAs use fixed $\sigma_i$, or $\sigma_i$ decreasing along generations according to a fixed schedule. Early ESs use the well-known 1/5 rule [23], where a global $\sigma$ is modified according to the recent effects of mutation: if more than one-fifth of the mutations have been successful in the last generation (i.e. led to an offspring fitter than the parent), increase $\sigma$, otherwise decrease $\sigma$. The schedule for increasing and decreasing $\sigma$ is geometric (the factor suggested by Schwefel [24] is 1.1). Last, in early EP [4], the standard deviation of mutation was determined on the basis of the fitness of the individual at hand, in such a way that the fittest individuals undergo mutation with proportionally smaller standard deviations.
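The 1/5 rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the cited works: the function names, the grouping of mutations into batches of `n_trials`, and the (1+1)-style acceptance loop are all hypothetical choices; only the success threshold (one-fifth) and the geometric factor (1.1) come from the text.

```python
import random


def gaussian_mutation(x, sigma, rng):
    """Add a Gaussian perturbation of standard deviation sigma to every coordinate."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def one_fifth_update(sigma, n_success, n_trials, factor=1.1):
    """Increase sigma if more than 1/5 of the mutations succeeded, else decrease it."""
    if 5 * n_success > n_trials:  # integer test for n_success / n_trials > 1/5
        return sigma * factor
    return sigma / factor


def run_es(fitness, x, sigma, n_generations, n_trials, rng):
    """Assumed (1+1)-style maximization loop: one sigma update per batch of n_trials mutations."""
    for _generation in range(n_generations):
        n_success = 0
        for _trial in range(n_trials):
            child = gaussian_mutation(x, sigma, rng)
            if fitness(child) > fitness(x):  # a mutation is successful if the child is fitter
                x = child
                n_success += 1
        sigma = one_fifth_update(sigma, n_success, n_trials)
    return x, sigma


if __name__ == "__main__":
    rng = random.Random(0)
    sphere = lambda v: -sum(vi * vi for vi in v)  # maximum at the origin
    best, final_sigma = run_es(sphere, [3.0, -2.0], 1.0, 50, 10, rng)
    print(best, final_sigma)
```

Since the global $\sigma$ only ever moves by a constant factor, it can shrink without bound, which is harmless for real-valued variables but is the source of the difficulty with discrete ones.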
Both ES and EP finally turned to adaptive mutation: the standard deviations $\sigma_i$ are encoded in the individuals, and themselves undergo evolution. The standard deviations are therefore adjusted "for free" along evolution [2].
This mechanism encounters several difficulties in the GA framework, which uses mutation very sparingly (the mutation rate per bit is often around $10^{-3}$): this gives little chance for the adaptive mutation of real-valued features to actually adapt. Further, Gaussian-like mutation seems to evolve discrete features poorly [1]: when the standard deviation gets very small, mutation ends up having no effect at all on discrete features. This violates the strong causality principle, stating that "a small change in the parameters should result in a small change in the fitness" [23]; as a matter of fact, no "small change" below the one-bit threshold is possible in the discrete case.

Evolution under Influence
This section presents three population-driven mechanisms to mutate discrete features in the ES framework.

Evolution and Induction
Our approach is loosely inspired by previous work devoted to the inductive control of evolution [26, 25]: evolution is considered a sequence of events (operators giving birth to new individuals), classified good or bad depending on whether the offspring are more or less fit than the parents. These events, gathered through either spying on evolution or experimenting on the current population, are processed by Inductive Learning algorithms [17, 15]. Induction can thus construct online some "law of bad events", which may then be used in a variety of ways to guide the next generations (e.g. to control the disruptiveness of evolution operators, or prevent the loss of genetic diversity). The coupling of Evolution and Induction allows one to continually create, use and update knowledge about evolution.
The present paper is similarly based on the assumption that one can get valuable hints by observing evolution, and use them to guide the further steps of evolution. The difference is that what was previously observed was the evolution operators, whereas the present work directly considers individuals.

Imitation and Differentiation
Induction needs positive and negative examples; these are respectively set to the most fit and least fit offspring.
Indeed, the difference of fitness between a positive and a negative example is due to the difference of their gene values. From this fact, one may infer that mutation should preserve these differences, and preferably modify those bits which take the same value in the good individual at hand and in the negative examples. Practically, for each individual $Ex$, the profile $(p_1, \ldots, p_N)$ of its differences with the negative examples is computed, where $p_i$ is the fraction of negative examples differing from $Ex$ on bit $x_i$. Mutation by differentiation then mutates bit $x_i$ in individual $Ex$ with a probability proportional to $1 - p_i$; the resulting offspring will be still farther from the negative examples than the parent was.
The number $K$ of bits to mutate is set by the user, and it is presently constant along evolution and over the population. Mutation by differentiation expectedly leads to highly diversified populations: if most offspring share a given bit value, this bit will be mutated with high probability. This covers as a particular case strategies devised to restore genetic diversity (see [16] among others).
One could also consider that negative examples contain valuable, if not first-rate, information: after all, these individuals or their parents have survived up to now. This leads to mutation by imitation, where bit $x_i$ in individual $Ex$ is mutated with a probability proportional to $p_i$. Mutation by imitation tends to "repair" explorers whenever they have lost some information shared by the previous population; it favors uniformity.
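The profile computation and the two operators can be sketched as follows. This is a minimal illustration under assumptions: the text only says that bits are mutated with probability proportional to $p_i$ or $1 - p_i$ and that $K$ bits are mutated, so the sequential weighted draw without replacement, the handling of zero-weight bits (never selected), and all function names are hypothetical choices.

```python
import random


def difference_profile(individual, negatives):
    """p_i = fraction of negative examples whose bit i differs from the individual's bit i."""
    n = len(negatives)
    return [sum(neg[i] != bit for neg in negatives) / n
            for i, bit in enumerate(individual)]


def pick_bits(weights, k, rng):
    """Draw up to k distinct positions, each draw proportional to the remaining weights."""
    candidates = [i for i, w in enumerate(weights) if w > 0]
    chosen = []
    for _ in range(min(k, len(candidates))):
        i = rng.choices(candidates, weights=[weights[c] for c in candidates], k=1)[0]
        chosen.append(i)
        candidates.remove(i)
    return chosen


def mutate_under_influence(individual, negatives, k, mode, rng):
    """Flip k bits; 'imitation' weights bit i by p_i, 'differentiation' by 1 - p_i."""
    profile = difference_profile(individual, negatives)
    if mode == "imitation":
        weights = profile
    elif mode == "differentiation":
        weights = [1.0 - p for p in profile]
    else:
        raise ValueError("mode must be 'imitation' or 'differentiation'")
    child = list(individual)
    for i in pick_bits(weights, k, rng):
        child[i] = 1 - child[i]
    return child


if __name__ == "__main__":
    rng = random.Random(1)
    ex = [1, 1, 0, 0]
    negatives = [[1, 0, 0, 1], [1, 0, 1, 1]]
    print(difference_profile(ex, negatives))
    print(mutate_under_influence(ex, negatives, 1, "imitation", rng))
    print(mutate_under_influence(ex, negatives, 1, "differentiation", rng))
```

In this sketch the offspring lies at Hamming distance exactly $K$ from its parent whenever at least $K$ bits have a non-zero weight, which is the distance control that classical crossover lacks.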
Last, and since the two above operators have opposite virtues, a third possibility consists in alternating mutation by imitation and mutation by differentiation: this hopefully allies the conservative properties of mutation by imitation and the high rate of exploration of mutation by differentiation.
We finally propose three kinds of evolution "under influence": evolution by imitation, evolution by differentiation and alternate evolution. The first two schemes respectively use mutation by imitation and mutation by differentiation as the single evolution operator. In the third scheme, even-numbered generations undergo mutation by imitation whereas odd-numbered generations undergo mutation by differentiation.
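The alternate scheme can be sketched as a $(\mu + \lambda)$ loop that switches operator with the parity of the generation. This is an illustration under assumptions: generations are counted from 0, parents are drawn uniformly, the worst parents serve as beacons before any offspring exist, and the operators are passed in as hypothetical callables `operator(parent, negatives, rng)`. The usage example plugs in a plain one-bit flip as a stand-in for both operators, only to exercise the schedule.

```python
import random


def plus_selection(parents, offspring, fitness, mu):
    """(mu + lambda) selection: keep the mu fittest among parents and offspring."""
    return sorted(parents + offspring, key=fitness, reverse=True)[:mu]


def alternate_evolution(population, fitness, n_offspring, n_generations,
                        imitate, differentiate, n_negatives, rng):
    """Alternate the two operators; negatives are the least fit offspring of the last generation."""
    mu = len(population)
    # Assumption: before any offspring exist, the worst parents act as beacons.
    negatives = sorted(population, key=fitness)[:n_negatives]
    schedule = []
    for generation in range(n_generations):
        if generation % 2 == 0:
            name, operator = "imitation", imitate
        else:
            name, operator = "differentiation", differentiate
        schedule.append(name)
        offspring = [operator(rng.choice(population), negatives, rng)
                     for _ in range(n_offspring)]
        negatives = sorted(offspring, key=fitness)[:n_negatives]
        population = plus_selection(population, offspring, fitness, mu)
    return population, schedule


def flip_one(parent, negatives, rng):
    """Stand-in operator (ignores the negatives): flip one random bit."""
    child = list(parent)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child


if __name__ == "__main__":
    rng = random.Random(0)
    start = [[rng.randint(0, 1) for _ in range(8)] for _ in range(3)]
    final, schedule = alternate_evolution(start, sum, 6, 4, flip_one, flip_one, 2, rng)
    print(schedule, [sum(ind) for ind in final])
```

Because selection is elitist, the best fitness never decreases from one generation to the next, whichever operator is applied.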

Experimental Validation
Evolution under influence has been implemented in a standard $(\mu + \lambda)$ ES framework, and compared to canonical GA, canonical ES (based on the 1/5 rule), and standard adaptive ES, on four problems.

The problems
The Royal Road problem was conceived by Holland, Mitchell and Forrest [18] to study in detail the interaction of features most adapted to GA search. An analysis of the unexpected difficulties of this problem is found in [19, 8].
The Ugly problem is a GA-deceptive problem [27], built by concatenation of 10 instances of the elementary problem, defined on $\{0, 1\}^3$ by $F(x) = 3$ if $x = 111$; $F(x) = 2$ for $x$ in $0??$, and $F(x) = 0$ otherwise.
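The elementary function and its concatenation can be written down directly. One assumption is made in this sketch: the text says the problem is "built by concatenation", and the total fitness is taken here to be the sum of the 10 elementary fitnesses. The function names are hypothetical.

```python
def ugly_block(block):
    """Elementary 3-bit function: 3 for 111, 2 for any block starting with 0, else 0."""
    if tuple(block) == (1, 1, 1):
        return 3
    if block[0] == 0:
        return 2
    return 0


def ugly(bits, block_size=3):
    """Assumed total fitness: sum of the elementary function over consecutive 3-bit blocks."""
    if len(bits) % block_size != 0:
        raise ValueError("bitstring length must be a multiple of the block size")
    return sum(ugly_block(bits[i:i + block_size])
               for i in range(0, len(bits), block_size))


if __name__ == "__main__":
    print(ugly([1] * 30))        # global optimum: every block is 111
    print(ugly([0] * 30))        # deceptive attractor: every block starts with 0
    print(ugly([1, 0, 1] * 10))  # blocks that start with 1 but are not 111
```

The deception is visible in the elementary function: any block starting with 0 scores 2, so the first bit is pulled toward 0 even though the optimum 111 requires it to be 1.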
The combinatorial multiple knapsack optimization problem [14] is defined as follows. Given $P$ knapsacks having respective capacities $c_1, \ldots, c_P$; given $N$ objects having respective costs $p_1, \ldots, p_N$; given the overall dimension $w_{i,j}$ of object $i$ regarding knapsack $j$; determine a subset of objects noted $X = (x_1, \ldots, x_N)$, with $x_i$ boolean, that is feasible, i.e. satisfies the constraints relative to the maximal capacities of all knapsacks, and maximizes the overall profit:
$$\max \left\{ \sum_{i=1}^{N} p_i x_i \;;\; \forall j = 1, \ldots, P,\ \sum_{i=1}^{N} w_{i,j} x_i < c_j \right\}$$
A usual heuristic in evolutionary constrained optimization [16] consists in reducing the fitness of non-feasible individuals:

r is the percentage of satisfied constraints.
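The two quantities defined above, the overall profit and the share $r$ of satisfied capacity constraints, can be computed as in the sketch below. The way the fitness of non-feasible individuals is reduced is deliberately left out, since it is not recoverable here. The function names, the nested-list layout `weights[i][j]`, and returning $r$ as a fraction in [0, 1] rather than a percentage are assumptions of this illustration.

```python
def overall_profit(x, profits):
    """Sum of p_i * x_i over the selected objects."""
    return sum(p * xi for p, xi in zip(profits, x))


def satisfied_fraction(x, weights, capacities):
    """Fraction r of knapsacks whose capacity constraint holds for selection x.

    weights[i][j] is the dimension of object i regarding knapsack j.
    The inequality is strict, as printed in the problem statement above.
    """
    satisfied = 0
    for j, capacity in enumerate(capacities):
        load = sum(weights[i][j] * xi for i, xi in enumerate(x))
        if load < capacity:
            satisfied += 1
    return satisfied / len(capacities)


def is_feasible(x, weights, capacities):
    """A selection is feasible when every capacity constraint is satisfied."""
    return satisfied_fraction(x, weights, capacities) == 1.0


if __name__ == "__main__":
    profits = [10, 5, 8]
    weights = [[3, 4], [2, 1], [4, 5]]
    capacities = [6, 10]
    for x in ([1, 1, 0], [1, 0, 1]):
        print(x, overall_profit(x, profits),
              satisfied_fraction(x, weights, capacities),
              is_feasible(x, weights, capacities))
```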
The Long Path problem was conceived by Horn and Goldberg [11] as a unimodal problem hard for local searchers, that is, hill-climbers and the GA mutation operator. The fitness landscape is composed of a large low-fitness plateau, within which a path of slowly increasing fitness leads to the unique optimum. This path is of length $2^{N/2}$ ($N$ is the length of the bitstring). Two successive individuals on the path differ by one bit, while all other individuals on the path are at Hamming distance of at least 2.
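A path with these two properties can be built recursively. The sketch below is one such construction for odd bitstring lengths, assumed here to mirror the path of [11], whose construction is not given in the text: a path on $n + 2$ bits is made of the $n$-bit path prefixed with 00, a bridge point prefixed with 01, and the reversed $n$-bit path prefixed with 11. Its length is $3 \cdot 2^{(N-1)/2} - 1$, which grows like the $2^{N/2}$ quoted above. All names are hypothetical.

```python
def long_path(n):
    """Return the ordered path points (as bit strings) for an odd bitstring length n."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    path = ["0", "1"]
    for _ in range((n - 1) // 2):
        bridge = "01" + path[-1]  # links the end of the first copy to the second copy
        path = (["00" + p for p in path]
                + [bridge]
                + ["11" + p for p in reversed(path)])
    return path


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(u != v for u, v in zip(a, b))


if __name__ == "__main__":
    path = long_path(3)
    print(path)
    print([hamming(path[i], path[i + 1]) for i in range(len(path) - 1)])
```

Because non-successive path points are at least two bits apart, a one-bit hill-climber can only move along the path one step at a time, while a two-bit mutation can occasionally jump ahead, which is consistent with choosing at least two mutated bits when short-cuts are sought.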

Experimental settings
All results are averaged over 15 independent runs. The dynamics of evolution is visualized by plotting the (averaged) best fitness obtained for a number of fitness calculations. We compared six evolutionary schemes.
A canonical GA (CGA) serves as reference [9]: the parents are selected based on their fitnesses with rank-based selection; 2-point crossover is applied at the rate of 0.6; mutation is applied at the rate of $0.2/N$, where $N$ denotes the length of the bitstring.
A "traditional ES" (TES) is a boolean $(\mu + \lambda)$ evolution strategy involving a single mutation rate per bit, which is modified according to the 1/5 rule of Rechenberg [23]. The geometrical factor used to increase this rate ranges from 1.1 to 2. An "adaptive ES" (AES) is a boolean $(\mu + \lambda)$ evolution strategy that transposes the adaptive mutation described

Comparative results
The Royal Road problem. The canonical GA evolves a population of 60 individuals; all other algorithms follow a (30 + 60) ES scheme. The imitation scheme outperforms all other schemes by far. This can be explained by the resemblance between mutation by imitation and crossover, and by the fact that this problem explicitly relies on the building block hypothesis. Meanwhile, the elitist $(\mu + \lambda)$ scheme protects evolution from the fleeting discovery phenomenon [8], which was considered the main cause for the difficulties of standard GAs on this problem.

The Ugly problem. The canonical GA evolves a population of 40 individuals; all other algorithms follow a (20 + 40) ES scheme.
The AES algorithm (ES with adaptive mutation) slightly surpasses the TES algorithm (ES following the 1/5 rule), and both significantly outperform the canonical GA on this GA-deceptive problem (Fig. 2(a)). The differentiation scheme appears to be the best one on this problem, though the imitation scheme soon catches up; a possible interpretation is that differentiation allows rejecting suboptimal solutions previously found (the deceptive schemata). Again, this is possible only because the $(\mu + \lambda)$ scheme counterbalances the disruptive effects of mutation by differentiation, which basically escapes any good schema the current population has settled in.

The Knapsack problem. The canonical GA evolves a population of 40 individuals; all other algorithms follow a (20 + 40) ES scheme.
Figure 4 shows average results for 1000 generations on the Long Path problem. Note that the optimum will always be found by all methods in the (very) long run, by climbing up the path through mutations. What is measured here is the ability to find short-cuts in spite of the low-fitness plateau between different components of the path.
The Long Path problem presents intriguing particularities. First, the simple, traditional ES significantly outperforms CGA, whereas this problem was meant to be hard for mutation alone. Second, it demonstrates that alternate evolution can behave quite differently from (and in fact, much better than) evolution by imitation or evolution by differentiation alone. A tentative explanation is related to the accordion-like behavior of alternate evolution: mutation by imitation reduces the diversity and moves an individual toward the previous local optimum closest to this individual; then, mutation by differentiation re-increases the diversity and spreads individuals away. The combination of these two phases seemingly eases the discovery of short-cuts on the long and sneaky path toward the optimum.

Discussion and Perspectives
The central characteristic of evolution under influence is to allow individuals to exchange information along evolution. Such exchange is usually ensured by recombination, whether it involves two parents or more [2]. And, according to the building blocks principle [10, 9], this exchange is the essential factor of biological and artificial evolution.
Another analysis [13] emphasizes that recombination can, in some contexts, be advantageously replaced by macro-mutation, which is equivalent to recombination with a random parent. It is noteworthy that macro-mutation and recombination offer different kinds of control on the difference between offspring and parents. Macro-mutation ensures that the offspring will be significantly different from the parent, but does not set any bias on the difference. In contrast, recombination is heavily biased: the localization, and hence the amount, of the difference between an offspring and a parent depends on the current population.
To sum up, macro-mutation controls how much offspring are different from parents, but does not care about where, that is, which bits are modified. Conversely, recombination tends to modify bits so as to copy other individuals in the population, which implies that it hardly has any effect when the population is on the verge of converging.
Mutation by imitation combines the effects of recombination and macro-mutation in that it allows for a quantitative and qualitative control of the difference between offspring and parents: the amount of difference (the number of mutated bits) is set by the user; the localization of the difference (which bits are modified) depends on the other individuals. Mutation by differentiation and alternate mutation similarly enable an exchange of information between individuals that is tunable by the user.
The main originality of this work, in our opinion, is to make clear the distinction between the amount and the nature of modifications made by evolution. Further, evolution under influence determines the nature (that is, the localization) of the modifications in a way which depends on both the current population and the individual at hand.
Many improvements can be brought to evolution under influence. Further research is concerned with adaptively adjusting the number of bits to mutate, and the kind of mutation (by imitation, differentiation, or alternate) to apply to a given individual. Mutation could also take into account, besides the difference between the current individual and the negative examples, its difference with other positive examples.
Last, this scheme will be extended to multi-modal fitness landscapes; the idea would be to evolve an individual depending on the negative examples nearest to this individual, in order to separately follow several tracks.
in [1] with minor differences: each individual is attached one mutation rate $p$ (the probability of mutation of any single bit) which itself undergoes mutation according to Obalek's rule [20], with a bound of $1/N$ on $p$ to guarantee effective mutation.
Mutations under influence have been implemented in the framework of a $(\mu + \lambda)$ ES. Positive and negative examples respectively are the most fit and least fit individuals among the offspring. Imitation- and differentiation-based evolutions respectively use mutation by imitation and mutation by differentiation as single evolution operators. Alternate evolution alternately evolves populations through mutation by imitation and mutation by differentiation. Unless otherwise specified, the number $K$ of bits to mutate is set to 1.

Figure 1: The Royal Road, Dynamics of evolution (beware of the Y-scale).

Figure 2: The Ugly problem, Dynamics of evolution.

Figure 3: The Knapsack problem, Dynamics of evolution.

The Knapsack problem includes a great many local optima, and a strong mutation is needed to avoid premature convergence; mutating more than just one bit ($K = 2$) proves clearly beneficial in the Imitation and Differentiation schemes.
The Long Path problem. The canonical GA evolves a population of 21 individuals; all other algorithms follow a (7 + 21) ES scheme. The size of the problem (number of bits of individuals) is 91. Parameter $K$ is set to 2, the minimum jump for short-cuts on the path.

Figure 4: The Long Path problem, Dynamics of evolution.