Revisiting the Memory of Evolution

A new evolution scheme is presented, memorizing the extreme (best and worst) past individuals through distributions over the binary search space. These distributions are used to bias the mutation operator in a (μ + λ) Evolution Strategy, guiding the generation of newborn offspring: different mimetic strategies are defined, combining either attraction, indifference or repulsion with respect to the two distributions. These distributions are then updated from the best and the worst individuals in the current population. Experiments on large-sized binary problems allow one to delineate the niche of each of these mimetic strategies.


Introduction
The powerful process of natural evolution indeed produced biological chefs-d'oeuvre. The field of artificial evolution is concerned with transposing and mastering the strengths of evolution within the machine world [24,45,12,28]. A number of applications, ranging from optimization and design problems to adaptation and evolvable hardware, fully demonstrates the efficiency of the approach.
In its first period, artificial evolution was almost exclusively inspired from biology (adaptation [24], diploidy [19], introns [29], Baldwin effect [23], among others) and most authors praised artificial evolution as a universal tool [18]. Nowadays, it is widely acknowledged that one should take into account the specificities of the problem at hand, through the representation of the search space and the design of the evolution operators. The advantages of using "expert" representations and operators are illustrated for instance in the domain of combinatorial optimization [36] or shape design [44] (see [35] for general recommendations about representations/operators). As noted by Janikow [26], this transition is quite similar to what happened in the field of artificial intelligence (see [41] for a survey): at first, people were fascinated by the generality of the principles at hand and aimed at universal tools, e.g. the General Problem Solver [33]. Afterward, they realized that the difference between being able to solve a problem and actually solving it was a matter of knowledge, and this led to the development of knowledge-based systems (KBS) [7].
The history of artificial intelligence has perhaps one further lesson to offer artificial evolution. The KBS approach met many successes as more accurate ways of representing and using knowledge were designed. However, where does knowledge come from in the general case? As noted by E. Feigenbaum, the main bottleneck of artificial intelligence was eventually the knowledge acquisition phase; this gave birth to a new field of AI: machine learning [31]. To put it in a nutshell, the history of artificial intelligence suggests that any advanced information processing tool needs knowledge, and that the only practical way to have knowledge is to acquire it, that is, to devote some efforts to learning. So what could be the role of a learning module within artificial evolution?
To keep clear distinctions between evolution and machine learning, the knowledge to be automatically acquired by artificial evolution is referred to as memory of evolution, and the way it is learned is referred to as memorization. Memory and memorization are peripheral concerns for artificial evolution, though many works concerned with the control of evolution could be rephrased in these terms (more in section 2). Indeed, memory is not much present in natural evolution, barring the genetic material itself. And natural evolution still is the dominant paradigm for artificial evolution; one is prone to dismiss concepts and heuristics which are too far away from what one understands to be evolution.
On the other hand, as emphasized in [9], the "specifications" for natural evolution might be rather different from those of artificial evolution. Notably, natural evolution explores a changing fitness landscape, while artificial evolution considers a fixed fitness landscape most of the time. When the context changes, recording the past of evolution does not make much sense: even if it were feasible, this would provide irrelevant or, even worse, misleading information. An accurate memorization process should then keep a cautious balance between memorizing and forgetting.
The situation is much more straightforward when the world does not change: in principle, evolution could plainly and soundly use its history to avoid the repetition of previous trials and errors, along the lines of Tabu Search [16]. The question no longer regards the utility of memory, but rather its technical feasibility.
We propose a categorization of the possible memories of evolution, and the way these can be used, in section 2. No wonder this categorization is close to that proposed for the control of evolution [22], as memory and control of evolution are tightly related. We introduce a further distinction, as to whether the memory allows for the distinction between new offspring and previously generated individuals. If the distinction is possible, the memory can be used to favor the exploration of brand new individuals: evolution becomes irreversible. To do so, memory-based control can proceed either by incentives or by inhibitions. The Population-Based Incremental Learning (PBIL) algorithm [5,4] proceeds by incentives: it constructs the memory of the past best individuals encountered during evolution, and uses this memory as an attractor for the offspring. Evolution by Inhibitions (EbI) [47,48] proceeds by inhibitions: it constructs the memory of the past worst individuals of evolution and uses this memory as a beacon to draw the population away from the dead ends.
In this paper, both approaches are combined: we investigate how evolution can take advantage of incentives and inhibitions altogether. Only binary search spaces ({0,1}^N) are considered. The memory of best/worst individuals then belongs to the continuous space [0,1]^N; it corresponds to a "virtual individual", or model. Individuals are provided with the two models memorizing the best and the worst past individuals, respectively termed the Leader and the Repoussoir. Each individual uses the models as reference points, to decide where it should go next, i.e. where to localize its offspring. The offspring are generated so as to be closer to, or farther away from, a model. Metaphorically, the individual imitates or rejects the models: its "social strategy" dictates the distribution of its offspring. For instance, one natural strategy is to reject the Repoussoir and imitate the Leader (termed Sheep strategy); another one, termed Lone Rider strategy, is to reject both the Leader and the Repoussoir.
The presented scheme of evolution, termed Mimetic Evolution, thus abandons the biology paradigm, and rather finds its metaphors in the field of sociology. Mimetic evolution involves a single evolution operator, termed mimetic mutation, which replaces both standard mutation and crossover. The material in each individual is modified according to the models, and depending on the social strategy of the individual. We restrict ourselves to considering a single social strategy for all individuals during all evolution.
The paper is organized as follows. The next section discusses the possible roles of memory depending on whether the fitness landscape changes or not. It briefly reviews some related work concerned with the control of evolution, and focuses on irreversible control.
Section 3 details the Mimetic Evolution scheme combining PBIL and Evolution by Inhibitions. This scheme involves mimetic mutation as its single evolution operator. Like standard mutation, mimetic mutation considers one individual at a time. The choice of the bits to mutate is based on the memories and can achieve the diversification or the recombination of individuals depending on the chosen social strategy.
Experiments on large-sized binary problems are presented in section 4. These show that the Sheep and Lone Rider strategies are relevant to many problems, though not all, as could have been expected. Interestingly, the best strategy for a problem gives some insight into the difficulty of the problem: the Sheep strategy proves better adapted to climbing simple slopes, in a gradient-like manner; the Lone Rider strategy primarily preserves the diversity of the population, and thus escapes more easily from local optima. The paper ends with some perspectives for further research.

Memory in natural and artificial evolution
Obviously, artificial evolution only constitutes a coarse simplification of natural evolution. Our claim is that even greater simplifications are possible by taking advantage of the steadiness of the artificial milieu.
Simplifications are obtained through a better use of the available information in each step. We assume in the remainder of the paper that the optimization problem at hand is sufficiently difficult, so that it is worth spending a reasonable time looking for short cuts.
This section discusses earlier work related to the control of evolution. Concretely, control is a way of shifting the information processed by evolution from a low-level description (e.g. the current genetic pool, or description in extension of the problem) to a more abstract level: operator rates [8,38], beliefs [39], rules [37], gradients [48] or, even more directly, the distribution of the offspring [4], which can be viewed as a description in intension of the problem.

Changing versus Fixed Worlds
Our understanding of evolution is far from complete, as the ends can only be conjectured [53]. However, there is a wide acceptance that natural evolution is "designed" for changing environments, and aims at adaptation [24]. As emphasized by [9], this means starting with a "rather good" initial population (avoiding the bootstrap problem [34]), and measuring success from both the cumulated performance of the individuals, and the chances for a sufficient fraction of the population to survive a further change of the milieu.
Specifications for artificial evolution are different: most of the time, the goal is to find the optimum of a fixed fitness function. Success is only measured from the performance of the best individual.
In both contexts, the leading individuals are ceaselessly replaced by fitter individuals. But within natural evolution, these fitter individuals are not necessarily new individuals, as previous outsiders can come back to outperform the current leaders. This is never true in a fixed environment.
The consequences of this are manifold. Natural evolution must somehow preserve whatever has been relevant in the past, for it can become useful again later. In the meanwhile, the relevance of all information must be periodically re-evaluated as the milieu changes. As the evaluation procedure only concerns individuals, any relevant information must be coded within the genetic material, i.e. in extension. Moreover, evolution cannot draw any negative conclusions as to which material can be soundly discarded.

Challenge for an informed evolution
Let us go back to artificial evolution. Assuming that the world does not change allows, in principle, dramatic simplifications of evolution: re-evaluating an individual does not provide any further information, and can therefore be omitted.
The steadiness of the milieu induces a partition of the individuals into four subsets:
[A] The individuals which are outperformed; they need not be considered again, as their time has passed and will never come back. These are the dead individuals.
[B] The currently fit individuals, which need not be considered either, but are somehow used to produce offspring; these are the living individuals.
[C] The embryos, i.e. those candidate offspring, given the current experience/state of evolution, which have not yet been evaluated.
[D] All other individuals.
The only set which needs to be evaluated, and must therefore be characterized in extension, is the set of embryos [C]. The only set which cannot be characterized in extension is the set [A] of dead individuals, as the time and space complexity would be intractable after the first generations.
Along evolution, the size of [A] grows (there are more and more dead individuals) and that of [D] decreases; the size of [B] remains more or less constant, except for the loss of genetic diversity.
By construction, the offspring generated in each generation are distributed among the embryos [C] and the past individuals, either living [B] or dead [A]. But the only interesting offspring are in [C]. This leads to two approaches to the control of evolution, depending on whether the stress is put on the expected quality of the offspring, or on their novelty. These approaches are respectively referred to as reversible and irreversible control.

Reversible control
Control is most generally concerned with biasing the distribution of the offspring, to favor the discovery of interesting individuals.
Obviously, we can only make conjectures about where the interesting individuals lie; otherwise, a deterministic approach would be recommended. Such conjectures regard the way the distribution of the offspring should depend on the current state of the system: some parameters or structures of control are identified. The goal is to determine relevant contents for these parameters or structures of control.
Several criteria have been proposed to distinguish between the great many controls investigated in the literature [49,22]. A first criterion considers whether the contents of the control are determined on-line or off-line [49]. Off-line control is adjusted via some preliminary analysis or experiments (see for instance [20]), and is used within a static or dynamic schedule. For instance, the mutation rate can be set to a constant value, or decreased via a hyperbolic schedule [3]. As off-line control obviously is not concerned with the specificities of the run, it is no part of the memory of evolution and is not considered thereafter.
On-line control, also termed adaptive control, can operate at different levels of evolution: the environment, the population, the individuals, or the genes [22]. The contents of control can either be deterministically adjusted depending on some predefined indicators (explicit control); or be carried by the individuals themselves, and adjusted "for free" by evolution (implicit control).

Explicit control
Examples among others of explicit control are the 1/5th rule [38], the adjustment of the operator rates à la Davis [8], the early EP mutation [13], the constitution of pheromone trails [10], and the adjustment of penalty factors [11]. In [38,12] the control only responds to the current state of evolution (the fraction of mutation successes, the relative fitness of an individual); this can be viewed as a reflex control. In [8,11,10,39,37], the control rather takes into account the whole past of evolution and proceeds by reinforcing the good options or the relevant choices (reinforced control).

Implicit control
Implicit control lets evolution itself adjust the contents of the control. This is most usually done by coding the parameters of control within the individuals, such as the mutation step size in self-adaptive mutation [45,1] or the type or mask of crossover in genetic algorithms [43,49]. Practically, the control-related part of the individual is first evolved, then used to derive a genotype. The offspring is then composed of the current control part and the derived genotype. Though evolution can only evaluate the genotype, it expectedly optimizes both the genotype and the control-related part of the individuals. This does not rely on any magic of evolution: rather, individuals carrying an irrelevant control part disappear, as their genotype is most likely to be unfit or non-viable.
Hybrid control, interleaving global deterministic and local non-deterministic indicators, has also been proposed [21].

Irreversible control
In all the above approaches, the time of evolution is reversible: one could obtain the parents of given offspring by applying the same operators as used to build the offspring from the parents. Basic genetic operators indifferently produce ascendants, or descendants, of the current individuals.
The offspring might actually be new embryos (from set [C]), or old individuals (living or dead).
Still, the fact that the world does not change implies that the only worthwhile offspring are new individuals. This can be used as a constraint to prune the set of candidate offspring, in the spirit of Tabu search [16]. The advantage is that biasing the distribution of offspring toward the embryos (set [C]) effortlessly (so to speak) increases the efficiency of the search, everything else being equal. As a matter of fact, memory could in principle tell whether an offspring is a new individual, as all the necessary information has been available at some point in the past. On the contrary, no mechanism (except evaluation) could tell whether an offspring will turn out to be fit.
Irreversible control enforces the novelty of the offspring, at the cost of memorizing the past of evolution.

PBIL
Population-Based Incremental Learning (PBIL) [5,4] takes advantage of the similarity between evolutionary algorithms and generate-and-test methods (see [31] for a survey of Machine Learning (ML)). In both cases, the system generates questions (individuals in the artificial evolution context, examples in the machine learning context); these questions are answered by the oracle (individuals are evaluated, examples are labeled); the answers are in turn used by the system to refine its current hypothesis and generate the next questions. The target "hypothesis" is an accurate description of either the distribution of the optima or some explanatory concept.
One key difference between generate-and-test methods and artificial evolution lies in the level of description of the internal state of the system. Generate-and-test methods handle high-level representations; the current hypothesis is described in intension [31]. On the contrary, artificial evolution handles the individuals themselves: its "hypothesis" is described in extension, through the population.
PBIL achieves artificial evolution by learning a single high-level genotype: a distribution M = (M_1, ..., M_N) over the search space {0,1}^N. Each component M_i stands for the probability that bit i is 1. PBIL initializes M to the most general distribution (M_i = .5). M is alternately updated by relaxation from the best current offspring (section 3.2), and used to generate a new population from scratch. M can be viewed as the memory of the past best offspring.
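A minimal sketch of this loop, with placeholder parameter values (the fitness function, learning rate and population size are ours, not taken from [5,4]):

    import random

    def pbil(fitness, N, pop_size=100, alpha=0.01, generations=1000):
        # Memory of the past best offspring, initialized to the most
        # general distribution: P(X_i = 1) = M_i = 0.5.
        M = [0.5] * N
        for _ in range(generations):
            # Generate the whole population from scratch out of M.
            pop = [[1 if random.random() < M[i] else 0 for i in range(N)]
                   for _ in range(pop_size)]
            best = max(pop, key=fitness)
            # Relaxation update of M toward the best offspring.
            M = [(1 - alpha) * M[i] + alpha * best[i] for i in range(N)]
        return M

For instance, pbil(sum, 100) drives M toward the all-ones optimum of the OneMax function.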
PBIL thereby sidesteps all the evolution operators and selection needed to transform a population into another one. The diversity and novelty of the offspring can be directly enforced by means of the selectivity and the fading of the memory M (see section 3.2).

Evolution by inhibitions
Symmetrically, Evolution by Inhibitions (EbI) gradually constructs the memory R of the past worst offspring [48]. The difference with PBIL is that R does not provide enough information to generate the offspring from scratch. Rather, it tells where the offspring should not be: they should not be close to R.
EbI thus uses its memory to ensure that the offspring are sufficiently different from outperformed individuals and, by extension, fall outside previously explored regions of the search space. It thereby directly enforces the novelty of the offspring.

Directions for the current study
Both schemes above involve an explicit memory of evolution, but they use it in different ways.
In PBIL, the memory is interpreted as a distribution over the search space: it replaces the genetic pool of evolution, as it can be used to generate a population from scratch. Much attention is paid to preventing the premature convergence of the mechanism, by controlling the selectivity of the memorization process (section 3.2.1). One main limitation of PBIL is that it deals with a restricted space of distributions, assuming the independence of the genes. In other words, PBIL efficiently processes high-level information, but within a language which might be insufficient to fit complex fitness landscapes. Some evidence for this remark has been presented in [47], considering the Long Path problem [25]: the distribution, even close to one point of the path, hardly meets the narrow optimal region, the path.
EbI actually transmits the genetic material from one individual to its offspring; but it uses its memory to constrain the generation of the offspring. The memory can here be viewed as a gradient, in the sense that it gives preferred directions of move. This approach can fail in two ways: it can suffer from the loss of genetic diversity in the current population, like standard evolution; it might also become irrelevant if the memory gets too rigid and induces deterministic behaviors, like PBIL.
These approaches can be combined in different ways. A loose coupling is obtained by generating two subpopulations, one from each algorithm. Another possibility is to extend PBIL to accommodate both memories, that of the best and that of the worst individuals. Actually, some PBIL variant already memorizes some information from the worst offspring [4] (section 3.2).
The approach investigated in the remainder of this paper rather extends the mechanism of EbI and preserves the transmission of the genetic material from one individual to its offspring. This choice is motivated by the fact that the population can indeed follow any fitness landscape more accurately than any distribution of predefined shape, provided the population remains diversified. The generation of the offspring is extended to account for the two available memories, reflecting the past best and worst individuals.

Outline
Basically, mimetic evolution alternately evolves the population using its memories, and updates the memories from the remarkable individuals of the current population. These memories, denoted L (for Leader) and R (for Repoussoir), respectively summarize the best and the worst individuals encountered by evolution.
These memories are used to guide the reproduction of individuals, as follows. Let M be a given memory, and consider all possible offspring Y obtained by flipping a given number of bits in X. These offspring can be ranked with respect to their probability of being generated from M, noted p(Y|M): if M is considered desirable, one will prefer offspring Y maximizing p(Y|M); if, on the contrary, M is considered undesirable, one will prefer offspring minimizing p(Y|M). In other words, M induces a polarization of the search space which constrains the moves of X, depending on its interpretation of M. Metaphorically, X will decide to imitate or reject the memory, or model: its reproduction depends on its social strategy.
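Since the model assumes the genes independent (as in PBIL), this probability has a closed form, a product of N Bernoulli terms (a worked form, made explicit here for convenience):

    p(Y \mid M) = \prod_{i=1}^{N} M_i^{Y_i} (1 - M_i)^{1 - Y_i}

Offspring agreeing with M on its most specific bits thus get a higher probability.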
Practically, mimetic evolution uses a single evolution operator, termed mimetic mutation, to evolve the current population (section 3.4).Mimetic mutation depends on the individual at hand, the chosen social strategy, and the two memories L and R.
Let us first detail how the memories are constructed.

Memorization and models
In mimetic evolution, as in PBIL and EbI, the memory of past remarkable individuals in {0,1}^N is represented as an element of [0,1]^N. The terms model and memory are used interchangeably in the following.
The models can be used in two ways.

Requirements for a distribution-like model
In PBIL, the model is used to generate new individuals from scratch. Indeed, any element M in [0,1]^N can be interpreted as a distribution on {0,1}^N: Proba(X_i = 1) = M_i. At one extreme lies the uniform (or most general) distribution (M_i = .5 for i = 1 ... N); at the other extreme lie degenerate (or most specific) distributions, where M is in fact a boolean individual.
PBIL has investigated two heuristics to prevent the premature convergence of M [4]. One is to update M from the two best offspring, instead of the best one. The second is to add small Gaussian perturbations to a small percentage of the M_i; this way, the distribution of the offspring is durably perturbed.
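A sketch of these two heuristics (the perturbation probability and amplitude are assumed values, not taken from [4]):

    import random

    def update_pbil_model(M, best1, best2, alpha=0.01,
                          perturb_prob=0.02, sigma=0.05):
        # Heuristic 1: relax M toward the average of the two best offspring.
        target = [(b1 + b2) / 2.0 for b1, b2 in zip(best1, best2)]
        M = [(1 - alpha) * m + alpha * t for m, t in zip(M, target)]
        # Heuristic 2: Gaussian perturbation of a small percentage of the
        # M_i, clipped back into [0, 1].
        return [min(1.0, max(0.0, m + random.gauss(0.0, sigma)))
                if random.random() < perturb_prob else m
                for m in M]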

Requirements for a gradient-like model
In Evolution by Inhibitions and Mimetic Evolution, the interpretation of M is radically different: the goal is to modify a given individual X, that is, to find a relevant climbing direction (the question of the climbing step will be addressed in section 3.4). M is to be seen more as a gradient than as a distribution.
Assume that M represents a region to avoid. One thus wants the offspring of individual X to be as far away from M as possible. This leads to preferably mutating the bits that do not discriminate X from M (|X_i − M_i| is comparatively small). Inversely, if M represents a desirable region, one preferably mutates the bits discriminating X and M, thereby producing offspring closer to M than X was.
The offspring of an individual X are thus constrained by the model M and by the "interpretation" of M by X, or social strategy of X. The social strategy defines a direction of evolution, as it induces a preference on the moves of any individual. However, this direction might turn out to be irrelevant either at the individual or at the population level.

The direction deduced from a model and a social strategy is irrelevant when it draws the individual away from the optimum. For instance, imitating the Repoussoir is a priori (and also experimentally) irrelevant: think of climbing a gradient on the wrong side. A social strategy can also become irrelevant in particular circumstances. For instance, imitating the Leader when evolution stagnates causes premature convergence. The flexibility of the Leader depends on the progress of evolution: the less flexible the model, the more deterministic the moves, and the less evolution can progress. In the same vein, rejecting the Repoussoir can cause oscillations around the optimum. The Repoussoir pushes the individual in the right direction as long as the individual is "between" the Repoussoir and the optimum. If the individual passes the optimum, rejecting the Repoussoir will draw the individual away from the optimum, until the Repoussoir is behind again.
The direction might also be irrelevant at the population level.This happens when the population is symmetrical with respect to the model.Any move based on the model (either to imitate or to reject it) in fact exchanges the individuals without much modifying the population.
For instance, assume the population belongs to the two schemas 01** and 10**. Assume that M reflects this symmetry, with M_1 ≈ M_2 ≈ .5, and assume further that at least two bits must be modified (see section 3.4). If the strategy is to reject M, bits 1 and 2 will not be modified, and the offspring still belong to the same schema as their parent. But if the strategy is to imitate M, bits 1 and 2 are both modified, and the offspring again belong to the initial schemas: the model will remain symmetrical until the symmetry is broken by some external factor.
But all such deadlocks and cycles are less likely to occur if two models are considered: the influence of each model acts as a perturbation with regard to the other, and enhances the diversity of the offspring when the other model gets stuck. In other words, the more models, the more specific they can harmlessly be.

Updating the models
From these considerations, we construct rather specific models: the Leader model L is constructed by relaxation from the best offspring in the current population, and the Repoussoir model R is constructed from the two worst offspring in the current population.
Relaxation is commonly used (e.g. in neural nets) to ensure smooth updates and prevent numerical oscillations. It reads:

    M ← (1 − α) · M + α · M̂

where M is the model, α ∈ [0,1] is the relaxation factor, or memory fading, and M̂ is computed from the current state of the system, defining the selectivity of the memory.
Table 1 illustrates on a simple 5-bit example how L and R are respectively updated from the best and the two worst offspring.
    bit:              1     2     3     4     5     Fitness
    X                 1     1     0     0     0     high
    S                 0     0     0     1     0     low
    T                 1     0     1     1     1     low
    R̂ = avg(S, T)    0.5   0     0.5   1     0.5

The same relaxation factor is used to update both the Leader and the Repoussoir, though the worst offspring are likely more diversified than the best one. However, complementary experiments (not reported in the paper) show that using different relaxation factors for the Leader and the Repoussoir does not make any significant difference.
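A minimal sketch of this update, reproducing the Table 1 example (the relaxation factor value is arbitrary):

    def relax(model, target, alpha=0.01):
        # M <- (1 - alpha) * M + alpha * M_hat
        return [(1 - alpha) * m + alpha * t for m, t in zip(model, target)]

    # Table 1: the two worst offspring S and T define R_hat as their
    # componentwise average.
    S = [0, 0, 0, 1, 0]
    T = [1, 0, 1, 1, 1]
    R_hat = [(s + t) / 2.0 for s, t in zip(S, T)]
    print(R_hat)   # [0.5, 0.0, 0.5, 1.0, 0.5]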

Mimetic mutation
Let X and m thereafter denote the current individual and the number of bits to mutate in X (details on the setting of m in section 3.4).

Social strategies based on one model
Assume first that X is evolved from a single model M.
For each bit i, the probability p_i of mutating bit i depends on whether X_i is close to M_i, and on whether M is considered positively or negatively. As already stated, if M is desirable, p_i increases with |X_i − M_i|: the more X_i differs from M_i, the more X_i should be modified. If, on the contrary, M is not desirable, p_i decreases as |X_i − M_i| increases.
In preliminary experiments with a no-memory setting (α = 1) [47], we used a roulette wheel to select the bits to mutate, with p_i proportional to |X_i − M_i|. But roulette-wheel selection proves inefficient when M actually memorizes several generations of worst offspring (α < 1), and for large-sized problems: after a few hundred generations, most M_i are within 10^−5 of 0 or 1. In such a context, no general and accurate way to compute p_i from |X_i − M_i| was found.
We therefore use selection by tournament among bits: each bit to mutate is selected as the bit i_j optimizing |X_i − M_i| among T bits drawn with uniform probability in {1 ... N}. Incidentally, the same mechanism can be used to find offspring imitating M (by mutating bits maximizing |X_i − M_i|) or rejecting M (by mutating bits minimizing |X_i − M_i|).
    for l = 1 .. m                       // number of bits to mutate
        select T bits i_1 ... i_T uniformly in 1 .. N
        mutate X_{i_j}, with i_j = arg opt { |X_{i_k} − M_{i_k}|, k = 1 .. T }
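A Python rendering of this pseudocode (a sketch; drawing the T candidate bits without replacement is our simplification):

    import random

    def mimetic_mutation(X, M, m, T, imitate=True):
        # Flip m bits of X, each chosen by a tournament of size T against
        # the model M: the bit with the largest gap |X_i - M_i| wins when
        # imitating M, the bit with the smallest gap when rejecting it.
        Y = list(X)
        for _ in range(m):
            candidates = random.sample(range(len(Y)), T)
            gap = lambda i: abs(Y[i] - M[i])
            i_j = max(candidates, key=gap) if imitate else min(candidates, key=gap)
            Y[i_j] = 1 - Y[i_j]
        return Y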

Social strategies based on two models
Consider now the two models constructed by evolution, the Repoussoir and the Leader. A most natural social strategy, referred to as the Sheep strategy, is to imitate the Leader and reject the Repoussoir. This strategy can be accommodated within the above tournament selection, now based on the maximization of |X_i − L_i| − |X_i − R_i|.
But, more generally, one can choose to imitate, reject, or even ignore each model independently.
The selection of the bits to mutate then again proceeds by tournament, now optimizing the criterion:

    w_L · |X_i − L_i| + w_R · |X_i − R_i|

where w_R (respectively w_L) indicates whether X is to imitate, reject or ignore R (respectively L): w_M > 0 means that X imitates M; w_M < 0 means that X rejects M; w_M = 0 means that X ignores M. A social strategy can thus be represented as a pair of coefficients (w_R, w_L). A particular case, termed the Ignorant strategy, ignores both models (w_R = w_L = 0). For all other strategies, the parameters (w_R, w_L) can be normalized by requiring w_R² + w_L² = 1 without modifying the tournament-based selection. Every strategy except the Ignorant one is thereafter represented as an angle θ in [0, 2π], with (w_R, w_L) = (cos θ, sin θ).
Figure 1 represents all social strategies but the Ignorant on the unit circle. By convention, angle 0 is associated with (w_R = 1, w_L = 0), that is, imitating the Repoussoir and ignoring the Leader. Angle π/2 corresponds to (w_R = 0, w_L = 1), that is, imitating the Leader and ignoring the Repoussoir. Some social strategies have been given nicknames for the sake of convenience. We distinguish mainly:
- the Entrepreneur, imitating the Leader and ignoring the Repoussoir (angle π/2);
- the Sheep, imitating the Leader and rejecting the Repoussoir (angle 3π/4);
- the Phobic, rejecting the Repoussoir and ignoring the Leader (angle π); mimetic evolution with a Phobic strategy just reproduces Evolution by Inhibitions;
- the Lone Rider, rejecting both the Leader and the Repoussoir (angle −3π/4);
- the Rebel, rejecting the Leader and ignoring the Repoussoir (angle −π/2).
The Leader and the Repoussoir, together with the current individual, can be thought of as a system of coordinates. A social strategy is a direction in this system of coordinates: the corresponding angle constrains the trajectory of the individual, intended as its possible offspring. The constraint can be made more or less severe, making the trajectory more or less deterministic, by tuning the tournament size T; this parameter controls the variance of mimetic mutation, i.e. the predictability of the trajectory in a given situation. The Ignorant strategy, setting no constraint on the offspring, serves as a reference to determine the relevance of the other strategies.
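For reference, the nicknamed strategies and the tournament criterion can be written down compactly; a minimal sketch (variable names are ours):

    import math

    # Nicknamed strategies as angles, with (w_R, w_L) = (cos theta, sin theta).
    STRATEGIES = {
        "Entrepreneur": math.pi / 2,      # imitate Leader, ignore Repoussoir
        "Sheep":        3 * math.pi / 4,  # imitate Leader, reject Repoussoir
        "Phobic":       math.pi,          # reject Repoussoir, ignore Leader
        "Lone Rider":   5 * math.pi / 4,  # reject both models
        "Rebel":        3 * math.pi / 2,  # reject Leader, ignore Repoussoir
    }

    def criterion(theta, x_i, l_i, r_i):
        # Tournament score w_L * |x - L_i| + w_R * |x - R_i|.
        return math.sin(theta) * abs(x_i - l_i) + math.cos(theta) * abs(x_i - r_i)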
In the remainder of the paper, we restrict ourselves to considering a single social strategy for all individuals, fixed along evolution and with fixed tournament size T. At first sight, making the whole population follow a single fixed direction, with the same degree of predictability, dictates a poor evolution to the system. Still, this direction is defined with respect to the individual and the system of coordinates, which itself evolves together with the population.

Single-Model Dynamics
Consider first a single model M, and let M* denote the boolean individual closest to M. Let S be the set of bits candidate to mutation in the current individual X. The dynamics of mutation depends on how S varies using the feedback provided by the model. When X imitates M, two kinds of bits are preferably mutated: the bits which discriminate X from M* (X_i ≠ M*_i), and the most general bits in M (M_i close to .5). The discriminant bits were good candidates for mutation in the previous steps; this is only possible if M_i was relatively general. Inversely, M_i can be general only if individuals with X_i ≠ M*_i were recently considered to update M. In summary, the more a bit was recently mutated, the more it is mutated.
The set S of bits that are candidates for mutation does not depend on the individual, but rather converges to a fixed set, making mimetic mutation unable to explore the whole search space (violating the ergodicity requirement [35,40]).
If M stands for the Leader (Entrepreneur strategy), the population converges toward M.
Incidentally, mimetic mutation here resembles the BSC operator of Syswerda [50]: but BSC actually uses the average of the current population to evolve the individuals, instead of the memory L. If M stands for the Repoussoir, the loss of genetic diversity does not occur, as the population cannot converge toward M; rather, the strategy oscillates between the optima closest to M.
When X rejects M, the bits preferably mutated are such that X_i = M*_i, and M_i is as specific (close to 0 or 1) as possible. Thereby, mimetic mutation modifies all bits which have not been recently modified, neither in the individual nor in the model. In any case, the set S of bits candidate to mutation is much larger than in the imitation strategy: almost all bits M_i get more specific in each generation (all bits except at most 2m, that is, the maximum number of individuals used to update M times the maximum number of bits modified in each individual), and the more specific a bit, the more likely it is mutated.
There is only one case where S converges: when M recommends a move and the resulting offspring are considered neither to update the model nor the population. As the mechanism is then deprived of any feedback, it can only persist in its wrong decision.
The reject strategy enforces the genetic diversity of the population, for the following reason. A bit which discriminates some individuals of the population is unlikely to be mutated: recent moves on this bit have been registered; hence the corresponding M_i is more general than for the other bits.
The weakness of the reject strategy thus comes from the fact that it prevents the recombination of current individuals and the convergence of the population.

Two-Model Dynamics
When both models are considered, they altogether provide any desired feedback on the previous moves: unsuccessful moves are memorized within the Repoussoir, successful moves within the Leader and the population. Three categories of bits might then be roughly identified, and the success of evolution relies on a sufficient turn-over among these categories. To preserve exploration, one must be able to mutate bits which have not been recently modified. Such bits are characterized by both L_i and R_i being specific. Mutating these bits implies that one or both models must be rejected (avoid the strategies in the quadrant [0, π/2]). Typically, no bit should stay in that category for too long!
The bits which have been recently and successfully modified (i.e. the resulting offspring have been kept in the population) are characterized by both L_i and R_i being general. Mutating such bits is desirable insofar as recombining the individuals is desirable, i.e. when the building-blocks hypothesis holds. In this case, the social strategy must incorporate some imitation of the Leader (prefer the quadrant [π/2, π]).
The bits which have been recently but unsuccessfully modified (the offspring have not survived) are characterized by L_i specific and R_i relatively general. The decision to mutate these bits depends on the gap between the current optimum and the next basin of attraction.
If there are large gaps between local optima, some obstinacy is required: one must be able to consider again bits which have recently been unsuccessfully modified. Practically, the social strategy must to some extent reject the Leader (prefer the quadrant [π, 3π/2]).
Otherwise, a shallow and hopefully faster exploration can be achieved by discarding the bits which have been recently unsuccessfully modified (quadrant [π/2, π]).
Obviously, there exists nothing like a universal strategy; each strategy could prove the most adequate in some context. Furthermore, the social strategy is not the only factor ruling the balance between exploitation and exploration achieved by mimetic mutation: while mimetic mutation primarily concentrates on the choice of the bits to mutate, another key factor is the number of bits mutated in each individual.

Mimetic strength
Binary mutation traditionally uses a very low mutation rate [17], though good results obtained with high-rate mutation have also been reported [27,32]. However, as the only operator of mimetic evolution is mutation, the mutation rate must certainly be high. Besides, the mutation rate governs the balance between exploration and exploitation achieved by mimetic mutation: exploration is encouraged by high mutation rates and exploitation by low mutation rates. Typically, mutating only one bit in any individual will cause evolution to get trapped in the nearest local optimum, though this is theoretically proved to be faster on unimodal problems [15].
In binary mutation, the probability of mutation is most usually constant (especially when it is very low). However, it can also be dynamically adjusted, adapted at the population level, or self-adapted at the individual level. Based on the different adaptation techniques used for mutation in evolutionary parameter optimization (again, see [22] for a survey), we have investigated the following heuristics to adjust m_t, the mutation rate at generation t (a sketch of the continuous schedules is given at the end of this section).

Constant scheme: m_t is set to a constant value. The adjustment of m can be done by considering mimetic mutations with different values of m as different exclusive operators; their probabilities can then be adjusted à la Davis [8] by rewarding the values leading to good offspring.
Hyperbolic scheme: m_t decreases from an initial value m_0 to 1, according to the hyperbolic schedule borrowed from [3]:

    m_t = ( 1/m_0 + (1 − 1/m_0) · t/T )^(−1)

where T is the maximum number of generations and t denotes the current generation.
1/5th rule: following [38], m_t is geometrically increased (resp. decreased) by a factor 1.2 if the fraction of offspring fitter than their parent over the last 10 generations is greater (resp. smaller) than 1/5.
Self-adaptive scheme: m is encoded in the individual and evolved according to Obalek's rule described in [2].

In the three latter cases, the continuous value of m_t was transformed into an integer value (as the tournament-based mechanism of mutation requires knowing in advance the number of bits to mutate), either by taking its integer part, or by selecting an integer value m from a Poisson distribution of parameter m_t [48] (it is well known that the binomial distribution B(N, p) tends to the Poisson distribution P(λ) if Np goes to λ as N goes to infinity). In all cases, m is lower-bounded by 1 to guarantee that mimetic mutation is effective.

The bit-selection step of mimetic mutation, for a strategy of angle θ, finally reads:

    for j = 1 .. T
        select k_j randomly in {1, ..., N}
        p(k_j) = cos(θ) · |X_{k_j} − R_{k_j}| + sin(θ) · |X_{k_j} − L_{k_j}|
    return the k_j maximizing { p(k_1), ..., p(k_T) }
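As announced above, here is a sketch of the hyperbolic schedule and of the Poisson-based integer conversion; the exact formula of [3] is not recoverable from the text, so the hyperbolic version below merely satisfies the stated endpoints (m_0 at t = 0, 1 at t = T):

    import numpy as np

    rng = np.random.default_rng()

    def hyperbolic_strength(t, T_max, m0):
        # Decreases hyperbolically from m0 at t = 0 down to 1 at t = T_max.
        return 1.0 / (1.0 / m0 + (1.0 - 1.0 / m0) * t / T_max)

    def integer_strength(m_t):
        # Continuous strength -> number of bits to flip: Poisson draw of
        # parameter m_t, lower-bounded by 1 so mutation stays effective.
        return max(1, int(rng.poisson(m_t)))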

Goal of experiments
As no optimization scheme can possibly dominate all other schemes [54], any new evolution algorithm should be presented together with its "niche".
Mimetic evolution was designed for large-sized problems, for the following reason. It is based on a structure of control ordering the possible moves of the individuals. As with any control, this entails some computational overhead, which can only be balanced if selecting the moves at random is very often unsuccessful. This occurs iff the size of the space is large enough; otherwise, exploring the neighborhood of the individuals at random might do as well.
In this large-space context, we shall focus on the two main choices of mimetic evolution: how to adjust the mimetic strength and how to choose a social strategy. These two points are briefly discussed in the light of early preliminary experiments, and the experimentation goal is then defined.

Mimetic Strength: Simple options are retained
Preliminary experiments showed that the on-line adjustment of the mimetic strength m_t was not robust. Rather, all the heuristics used (section 3.4), including the Davis-like approach, the 1/5th rule and self-adaptation, were found to fail: all rapidly led to setting m_t = 1, causing the premature convergence of evolution.
In retrospect, this failure can be explained by the fact that on-line adjustment rewards the options bringing the most improvements, rather than the most significant ones [9]. The above heuristics thus prove risk-averse and favor the conservative option m_t = 1, which produces the most improvements on the whole. Favoring conservative options might still bring improvements in a continuous search space. But in binary search spaces, the strong causality principle is violated (there is nothing like a "very small" mutation), and conservative options are simply inactive. No wonder, then, that the convergence results of evolution strategies [45,1] cannot be transposed.
Furthermore, fixed-step mutation (mutating exactly m_t bits in each individual at time t) proves more effective than variable-step mutation (mutating m_t bits on average in the population, distributed over the individuals according to a Poisson law). This is unexpected, as fixed-step mutation severely restricts the distribution of the offspring from the current population. Traditionally, mutation must be able to make arbitrarily large steps [35], to prevent evolution from being trapped in some local optimum. In the meanwhile, making large steps can be beneficial, as short cuts can be discovered.
However, experiments demonstrate that fixed-step mutation does not cause evolution to stop during the observations (limited to 200,000 evaluations); this was unexpected. Everything happens as if interesting offspring can always be found at a given distance from some individual of the population! In the meanwhile, fixed-step mutation appears faster than variable-step mutation: if it prevents discovering short cuts, it also saves a lot of bad moves.
In the experiments, two possibilities have thus been investigated. One is to set m_t to a constant value chosen in {1, 3, 5, 7, 9}; the other is to adjust m_t according to a hyperbolically decreasing schedule (section 3.4), starting from m_0 = N/2 and reaching m = 1 at the end of evolution.

Social Strategies and Significance of Models
Mimetic evolution will thus be experimented on large-sized problems, with fixed or hyperbolic mimetic strength, and compared in this context to reference genetic algorithms, evolution strategies, and PBIL.
Experiments are designed to answer the following questions:

Q1 Relevance of the models. A particular test is to compare mimetic evolution based on actual models with what happens with void models. Whenever the Ignorant strategy (using no models, or void models) outperforms all other strategies, this means that the models hinder, more than guide, evolution.

Q2 Robustness of the scheme. Assuming that there exists a social strategy outperforming the Ignorant strategy, the next question regards the robustness of mimetic evolution: does there exist a wide range of social strategies valid for a given problem; does there exist a social strategy valid for a range of problems; how do these compare to the reference algorithms (canonical genetic algorithms, evolution strategies, PBIL)?

Q3 Optimal control of the scheme. This question regards what the optimal strategy for a given problem is. Hopefully, problems having the same optimal strategy present other similar features. The optimal social strategy for a problem could then be used as a difficulty criterion.

Experiments
We first describe the problems considered and the reference algorithms. The experimental setups are then detailed. A global overview of the results over all functions is presented and the general trends are discussed. The section ends with some conclusions on the "niche" of Mimetic Evolution.

Problems
All experiments consider the optimization of functions of 100 continuous variables, discretized through a binary or a Gray coding. Function F2 is taken from [4]. The Griewank, Rosenbrock and Rastrigin functions have been thoroughly studied in the literature, up to 20 or 50 continuous variables [55].
The search space is {0,1}^1400: the domain of each continuous variable x_i is set to [−5.12, 5.12], and each x_i is coded on 14 bits.
The optimum is 0, reached for x_i = 0:

    Rastrigin(x_1, ..., x_100) = Σ_{i=1}^{100} [ x_i² + 10 (1 − cos(2π x_i)) ]

In the following, the Gray and binary encodings of a same function will be considered as different optimization problems, termed e.g. F2-binary and F2-Gray.
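To make the setting concrete, a sketch of the decoding and of the Rastrigin evaluation; the exact discretization used in the paper (e.g. the 2^14 − 1 denominator) is our assumption:

    import math

    def decode(bits, lo=-5.12, hi=5.12, k=14):
        # Decode consecutive k-bit chunks (standard binary) into [lo, hi];
        # the Gray variant would first map Gray code back to binary.
        xs = []
        for j in range(0, len(bits), k):
            n = int("".join(str(b) for b in bits[j:j + k]), 2)
            xs.append(lo + (hi - lo) * n / (2 ** k - 1))
        return xs

    def rastrigin(xs):
        return sum(x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))
                   for x in xs)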

Reference algorithms
The following algorithms will be considered to give the reference results to which the results of mimetic evolution will be compared.
SGA. The first reference algorithm is a simple GA. Several setups have been considered (see below). However, these experiments were rather control experiments: in the range of setups considered, SGA was consistently outperformed by PBIL, as reported in [4].
PBIL. The second reference algorithm is Baluja's PBIL, with the same setup as in [4]: population size 100, update of the model based on the two best offspring with relaxation factor α, with possible reinforcement from the worst offspring with relaxation factor α/2. Parameter α was varied as detailed below.
ES. The third reference algorithm is a (μ + λ) evolution strategy, where the mutation probability per bit follows a hyperbolic schedule with initial value 1/2 and final value 1/N. This deterministic schedule was chosen after [3], which concludes that the hyperbolic schedule seems more efficient and more robust than both the self-adaptive scheme described in section 3.4 and a fixed mutation rate of 1/N. The population size μ and natality λ are varied as detailed below.
Ignorant. The last reference algorithm is a (μ + λ) evolution strategy where the number m of bits mutated in each individual is constant. This actually corresponds to mimetic evolution following the Ignorant strategy (i.e. without any memory involved). Parameter m is varied as in the other mimetic evolution schemes.
The results of both ES and the Ignorant strategy will be examined together, as these reference algorithms only differ by the setting of their mutation rate.

Experimental settings
Each run is allowed an arbitrary number of 200,000 evaluations of the fitness function (in order to compare with the results of [4] in the first place).
Two steps of experiments are then performed. First, a few runs (11) of a large number of parameter settings are launched; this allows us to evaluate and discuss the general trends (section 5.4). The best settings of all algorithms are then retained and studied in more detail (section 5.5).

The range of setups considered is as follows:
For SGA, the population size is set to 50 or 100, with uniform or 2-point crossover at rate 0.5, 0.75 or 1.0, and a mutation rate of 1/N, 3/N or 5/N. As SGA is outperformed by the other reference algorithms, its results will not be discussed any further.
For the Ignorant strategy and Mimetic Evolution (section 3.3), the mimetic strength m takes values 1, 3, 5 or 7; the relaxation factor α of the models is set to 0.01, and the tournament size T is set to 50, to keep the number of options reasonable. The strategy angle of Mimetic Evolution is varied in {0, 45, 90, 135, 180, 225, 270, 315} degrees (section 3.3); however, only the Sheep, the Phobic and the Lone Rider strategies (respectively corresponding to angles 135, 180 and 225) gave good results, and only these will be mentioned in the detailed study.

Global Overview
This section presents and discusses the off-line performance of all algorithms, from the results of 11 runs of each of the settings described in the preceding section. The six algorithms are SGA, PBIL, ES + Ignorant (the reference algorithms), and the Sheep, the Phobic and the Lone Rider mimetic strategies.
In the following plots (Figures 2 to 5), each dot represents the off-line performance of a single run, i.e. the best fitness reached after 200,000 fitness evaluations. The X-axis is simply the rank of the run among all runs of that algorithm, whatever the parameter settings (only the 100 best runs are shown, for readability). Each curve shows how the scheme behaves at its best (beginning of the curve), and the sensitivity of the performance to the settings (the slope).
Binary and Gray codings of a same function obviously result in quite different landscapes, which may either hinder or favor evolution [35,52]. On three out of four functions (F2, Griewank and Rosenbrock) the Gray coding proves better suited to evolutionary optimization than the binary coding. On the last one (Rastrigin), binary and Gray codings lead to similar results.

On function F2, the best results are obtained for the same strategies for both encodings. These strategies perform equally well, and consistently better than all other strategies. In the meanwhile, functions F2-binary and F2-Gray can be considered difficult, as the best results are still very far from the optimum (7 for F2-binary and 10 for F2-Gray vs 10^7 for the actual optimum).

On the Griewank function, whatever the coding, the best performances are obtained for the Sheep strategy, the Ignorant strategy performing almost as well. Functions Griewank-binary and Griewank-Gray can be considered easy, as the optimum is almost reached (beware of the log scale: 10^−4 for Griewank-binary and 0 for Griewank-Gray, vs 0 for the actual optimum). Typically, the best 40 runs for the Sheep and the best 15 runs for the Ignorant fall below the bottom line of the drawing.

On the Rosenbrock function, the best scheme and the difficulty depend on the coding. The Rosenbrock-binary problem is difficult: no algorithm gets to values lower than 100. The best strategies are the Sheep and the Phobic, which strike this barrier value many times. On the other hand, such a barrier does not appear on Rosenbrock-Gray, for which the best strategy is the Ignorant, which reaches values around 10^−2. Note that, apart from the 10 best runs, the Sheep performs almost equally well, and can even be considered more robust with respect to the parameter settings.

The Rastrigin problem is difficult for both encodings: no algorithm gets to values lower than 100. However, the picture differs in the two cases. On Rastrigin-binary, the best strategies are the Ignorant and the Lone Rider, closely followed by the Phobic. Again, the Ignorant strategy seems less robust with respect to the parameter settings. On Rastrigin-Gray, the best strategy is by far the Ignorant, followed by PBIL, whereas all mimetic strategies perform equally badly.

According to the respective performances of the schemes, one distinguishes three categories among the eight test problems considered here. Problems on which the Sheep strategy performs comparatively well: such problems (Griewank-binary, Griewank-Gray, Rosenbrock-Gray) tend to be rather easy; the problem can be solved by iteratively memorizing the optimum and sampling its neighborhood.
Problems on which the Lone Rider strategy performs well: such problems (F2-binary, F2-Gray and Rastrigin-binary) tend to be difficult. Indeed, following the Lone Rider amounts to somehow fleeing the past optima; if this proves appropriate, the problem is in some sense deceptive.
Problems on which the Ignorant strategy performs well: such problems (Rastrigin-binary or Rastrigin-Gray) are difficult too, but the way memory is used (e.g. by mimetic evolution) seems to mislead rather than guide evolution.

Detailed comparisons
The previous section gave a general idea of how the different schemes behave at their best. We now study in more detail the best settings of mimetic evolution and compare them with the best reference algorithms on each problem. Each evolution scheme is evaluated from the average or median best performance out of 21 runs. As recommended by [14], the median should be preferred to the average whenever the fitnesses involved show very different orders of magnitude (e.g. when the results get close to the optimum at 0). The criterion used here was to present the average and standard deviation in general, and to use the median whenever the standard deviation was high (not significantly smaller than the average value). When the standard deviation was small, the average and the median were close in all cases presented here anyway.
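A small helper reflecting this reporting criterion (the precise threshold for "not significantly smaller" is our own choice):

    import statistics

    def report(values):
        # Mean (std) when the spread is moderate; median otherwise.
        mean = statistics.mean(values)
        std = statistics.pstdev(values)
        if std < 0.5 * mean:
            return "%.3g (%.3g)" % (mean, std)
        return "%.3g (median)" % statistics.median(values)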

Function F 2
Table 2 below presents the off-line reference results for the F2 problems, while Figure 6 shows the median on-line evolution for the ES and Ignorant algorithms. Table 3 and Figure 7 show the same results for the mimetic algorithms. On this problem, the best reference algorithms are PBIL and the Ignorant. For PBIL, the result is very sensitive to α. For the Ignorant, it is sensitive to the number m of bits to mutate. As expected, m = 1 leads to bad results (for m = 1, the Ignorant resembles a standard hill-climber). The best performances are obtained for m = 3 for both encodings. The performances do not depend much on the population size μ and natality λ. In the hyperbolic scheme (m follows a hyperbolic schedule decreasing from N/2 to 1), after a bad start (mutation is too active in the early generations), the Hyperbole curve soon catches up with all the other curves and passes them. But in the second half of evolution, the Hyperbole curve gets stuck (almost horizontal), and is passed by some fixed schemes. This is due to the fact that the actual mutation strength is then one.

However, a clear effect of the mimetic strategies is to increase the slope of all curves, and this is even clearer for the 1-bit and Hyperbole curves, which definitely end horizontally for the Ignorant strategy (Figure 6) while slowly but steadily increasing for the mimetic strategies (Figure 7).

Griewank Function
As in the preceding section, Tables 4 and 5 below present the off-line results for the Griewank problems, while Figures 8 and 9 show some plots of on-line median results.
On this problem, the best reference algorithms are ES and the Ignorant. For both codings, the lower m the better: surprisingly, the best reference results are obtained for m = 1. The hyperbolic ES catches up with the Ignorant with some delay. This, added to the fact that the Ignorant almost finds the optimum, suggests that Griewank-binary is actually rather easy. The performances of the Ignorant are rather sensitive to μ and λ.

As already mentioned in the general overview, these problems are comparatively easy. On both problems, the Sheep gets the best results. On Griewank-binary, the Sheep is slightly behind the Ignorant, and optimal results are obtained for high values of m (optimum at m = 7 for the Sheep on Griewank-binary). On both problems, too, the Lone Rider is one order of magnitude better than the Phobic, though slightly worse than the Sheep in the binary case. What is surprising is that the best results are obtained with a high m for the Sheep and with m = 1 for the Lone Rider. The fact that the Sheep can afford much larger mutation steps than the Ignorant can be explained by the memory: intuitively, more (good) information makes it possible to find short cuts in the fitness landscape. On the other hand, the mandatory small steps of the Lone Rider might be explained by the balance between exploration and exploitation:

Increasing the value of m favors exploration over exploitation. Moreover, using the Lone Rider instead of the Sheep similarly favors exploration (fleeing away from, instead of imitating, the previous optima) over exploitation. Using the Lone Rider together with a large mutation step might result in too strong a bias toward exploration.
For all mimetic strategies, the performances are more sensitive to the population size μ and natality λ than for the Ignorant strategy. As in Figures 6 and 7 (function F2), the hyperbolic curve shows a higher slope than the fixed schemes in the first half of the plots of Figure 9. Afterward, it behaves as a 1-bit mimetic evolution (but the logarithmic scale makes the hovering less visible). In Figure 8, where the 1-bit mimetic scheme is the best one, the hyperbolic scheme simply catches up with that best behavior a little after 100,000 evaluations.
Another striking fact in Figure 9 is the surprisingly high optimal value of the mutation strength (7 for binary, 5 for Gray). However, complementary experiments show that no further improvement is brought by increasing the value of m.

Rosenbrock Function
Again, the results on the Rosenbrock problems are presented in Tables 6 and 7 below for the offline results, and in Figures 10 and 11 for some sample median online plots. On this problem, the best reference algorithm is the Ignorant. However, the different codings seem to shape very different fitness landscapes.
For Rosenbrock-binary, the best performance is obtained for rather high values of m (m = 7), and optimization ends very far from the actual optimum (427 vs. 0).
Conversely, for Rosenbrock-Gray, the best performance is obtained for m = 1 (i.e., with a simple Hill-Climber), and optimization reaches 5.25. Also, the best results of PBIL were obtained for a relaxation factor of 0.5, i.e., with a rapidly changing distribution.
This suggests that the binary coding indeed creates more local optima and difficult barriers (e.g., the so-called Hamming cliffs) than Gray coding [52], requiring larger steps to overcome these difficulties. The characteristic stair-like shapes of the plots of Figure 10 witness such sudden changes (whenever some cliff is crossed). As for the Griewank problems, the Sheep appears to be the most suited mimetic strategy: it behaves well with both encodings. Moreover, the situation for the binary problem is quite similar to that of the Griewank-binary problem: the Sheep outperforms all other strategies (including the Ignorant) when using 5-bit mutation, while the Lone Rider performs a little worse, but with 1-bit mutation only.
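The Hamming-cliff phenomenon is easy to reproduce. The sketch below (standard reflected Gray code, not code from the paper) shows that consecutive integers such as 7 and 8 are four bit-flips apart in plain binary, but only one apart in Gray coding:

```python
def gray_encode(n: int) -> int:
    """Standard reflected binary Gray code."""
    return n ^ (n >> 1)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

# A Hamming cliff: stepping from 7 to 8 needs 4 simultaneous
# bit-flips in plain binary (0111 vs 1000)...
print(hamming(7, 8))                             # -> 4
# ...but only one flip in Gray coding, by construction.
print(hamming(gray_encode(7), gray_encode(8)))   # -> 1
```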
However, the Rosenbrock-Gray problem shows a picture quite different from Griewank-Gray: all strategies get their best results with 1-bit mutation, and are slightly outperformed by the Ignorant. It seems that no useful information can be obtained from the past evolution: no short-cut can be found. Even worse, the memory might give false indications, resulting in worse results than the Ignorant strategy.
In Figure 11-a, the stair-like shape of the 7-bit curve, and the very poor performance of the 1- and 3-bit curves (not visible!), again suggest the existence of barriers in the fitness landscape. In the same line, the Hyperbole switches from a behavior resembling the 7-bit curve to values close to those of the 5-bit curve, before getting stationary as the mutation strength reaches smaller values. In Figure 11-b, it can be seen that the 1-bit mutation needs some time before finding a quick way toward good values. On the other hand, the Hyperbole never finds such a way down, probably trapped by its first steps in some completely different region of the search space.

Rastrigin Function
For the Rastrigin problems, only the offline results are presented in Tables 8 and 9. Indeed, the online behavior of all algorithms does not provide much useful information: almost the same comments as for function F2 can be made. The unique characteristic of these results is that all reference schemes behave almost the same whatever the coding of the problem. Further, there is not much difference between ES, the Ignorant, and PBIL. The best performances are obtained for the Phobic and the Lone Rider, both with rather high values of m (m = 5 or 7 for Rastrigin-binary, and m = 7 for Rastrigin-Gray). Nevertheless, the mimetic strategies give results similar to the Ignorant strategy in the binary case, and are slightly outperformed on the Gray problem, though no algorithm gets anywhere close to the global optimum.

The niche of Mimetic Evolution
As a summary of the results of the previous section, consider Table 10 below, ranking all strategies on each of the test problems. On Griewank-binary and Rosenbrock-Gray, the Ignorant with m = 1, i.e., an algorithm resembling a simple Hill-Climber, is the best option and falls close to the optimum. Clearly, mimetic evolution brings no definite advantage where a memory-less evolution can do the job. These problems should be excluded from the scope of mimetic evolution as too "easy". It is most interesting, incidentally, that mutating a fixed number of bits in any individual appears more efficient than using a probability of mutation p per bit. This can be explained as follows: when the probability p is low (around 1/N), mutation very frequently happens to be inactive.
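The inactivity of per-bit mutation is easily quantified: with p = 1/N, an individual is left unchanged with probability (1 - 1/N)^N, which tends to 1/e ≈ 0.37. The sketch below contrasts the two operators; the code is illustrative (not from the paper), and N = 900 is an arbitrary problem size:

```python
import random
from math import e

def mutate_per_bit(x, p):
    """Standard bitwise mutation: flip each bit independently with probability p."""
    return [b ^ int(random.random() < p) for b in x]

def mutate_m_bits(x, m):
    """Bounded mutation: flip exactly m distinct, uniformly chosen bits."""
    y = x[:]
    for i in random.sample(range(len(y)), m):
        y[i] ^= 1
    return y

N = 900
# Probability that per-bit mutation with p = 1/N changes nothing at all:
print((1 - 1 / N) ** N, "~", 1 / e)   # both ~ 0.37
```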
On Rastrigin-Gray, the Ignorant with m = 5 is the best option, though it falls far from the optimum. Such functions can symmetrically be considered as too difficult for mimetic evolution: either mimetic evolution fails to construct a relevant memory, or it does not use the models in a proper way. Indeed, the use of the models offers room for improvement, and some perspectives for further work on this point are discussed in the next section.
On the other problems, mimetic evolution brings some improvement over the reference algorithms, with two different settings: the Sheep with rather high values of m (e.g., m = 5 for Griewank-Gray and Rosenbrock-binary), and the Lone Rider or the Phobic with moderate values of m (m = 3 for F2-binary and F2-Gray, m = 5 for Rastrigin-binary).
In the latter case, the fact that the Lone Rider and the Phobic strategies are the best ones gives some hints about the structure of the fitness landscape: either recombining the individuals is not relevant (e.g., macro-mutation would be more appropriate than crossover [27]), or maintaining the diversity of the population is more important than recombination.
In the former case, the fact that the Sheep strategy outperforms the other ones symmetrically implies that a fast recombination-diversification of the individuals is relevant. This is supported by the large gaps visible in the performances (Figure 3): the landscape is composed of local optima with large basins of attraction. The population climbs toward a local optimum, then waits until it finds a good direction toward another basin of attraction.
Most surprisingly, mutating a constant number of bits proves sufficient to reach good performances in many cases, which implies that bounded mutation is sufficient to escape many local optima. This contradicts the intuition that mutation must be able to make very large steps, even with low probability, in order to prevent evolution from premature convergence. But the fact that many basins of attraction can be escaped with a small jump (0.5% of the total number of bits) might come from the high dimension of the problems considered. Indeed, the size of the neighborhood grows exponentially with the dimension of the space, which modifies the rarity of local optima [51].
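A quick computation illustrates the point. The number of points within Hamming distance m of a given point in {0,1}^N is the sum of C(N, k) for k = 0..m. Assuming for illustration N ≈ 1400 (which makes m = 7 about 0.5% of the bits, consistent with the figures above), even such a "small" jump reaches an astronomically large neighborhood:

```python
from math import comb

def hamming_ball(N: int, m: int) -> int:
    """Number of points within Hamming distance m of a point in {0,1}^N."""
    return sum(comb(N, k) for k in range(m + 1))

# With N ~ 1400 and m = 7, the ball already contains about 2e18 points.
print(f"{hamming_ball(1400, 7):.3e}")
```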

Conclusion and Perspectives
This paper is concerned with the possible forms and uses of memory in artificial evolution, and focuses on the role of an explicit common memory. In the line of PBIL [5] and Evolution by Inhibition [48], we investigate, through the mimetic evolution scheme, how to use the memory of the best and worst individuals generated in the previous generations.
These memories can be interpreted in various ways. They can first be considered as distributions on the search space. A restricted distribution space was considered here, i.e., the variables are assumed locally independent. But richer distribution spaces can be considered: this mechanism was recently extended to Genetic Programming [42], continuous optimization [46], and combinatorial optimization [6].
The memories can also be viewed as a self-adaptive system of coordinates in the search space, varying along evolution. The goal is then to define the relevant direction (i.e., some gradient information) with respect to this system of coordinates, or mimetic strategy. Only fixed strategies were investigated in this paper (Section 3.3). An open issue is to automatically adjust the relevant strategy along evolution, either at the population level or at the individual level. Still, on the restricted set of functions studied in this paper, only three strategies are worth considering. Further, they appear useful in different situations: the Phobic and Lone Rider strategies when diversity is important; the Sheep strategy, coupled with large mutation steps, when evolution has to jump many times from one optimum to another.
Indeed, dealing with a direction rather than with a distribution raises the additional question of how to set the strength of the mutation (the size m of the mutation steps). Only fixed and hyperbolic (decreasing m from N/2 to 1) settings have been considered so far. However, some hints are given by the experimental results. First, mutating a fixed number of bits m per individual should be preferred to mutating all bits with a fixed probability per bit, provided the optimal value for m is found; in many cases there is a clear optimal value (Figures 6 to 9). Second, a decreasing schedule might prove beneficial, though the hyperbolic scheme proposed in [3], decreasing from N/2 to 1, should be fine-tuned: if 1-bit mutation is not the optimal value, mutating one bit during the last half of the run is useless. Ongoing research investigates other simple schedules for adjusting m, using either a hyperbolic decrease toward some m_end > 1, or an online adaptation mechanism.
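As an illustration of the first direction, a hyperbolic schedule ending at some m_end > 1 could look as follows. This is only a sketch: m_end = 3 is an arbitrary value, and the exact functional form is our own choice, not one proposed in the paper:

```python
def hyperbolic_m_end(t: int, T: int, N: int, m_end: int = 3) -> int:
    """Hyperbolic decrease from N/2 (t = 0) toward m_end > 1 (t = T),
    so that mutation keeps some strength in the second half of the run."""
    m0 = N // 2
    return max(m_end, round(m0 / (1 + (m0 / m_end - 1) * t / T)))
```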
Of course, the issue of determining the optimal value of the mutation strength remains open, and at the moment it still relies on extensive numerical experiments over a large number of settings of mutation schedules and mimetic strategies. But on the other hand, such experiments might help to understand the very nature of the fitness landscape at hand, and an unexpected achievement might be a new background for assessing problem difficulty, independently of how well mimetic evolution performs. A last perspective of research is concerned with extending mimetic evolution to multiobjective optimization. The objective would be to construct models sampling the Pareto front. In that context, the control of evolution shifts toward how to determine the natality of each model, and how to update the models from the current population (e.g., should a non-dominated individual be used to update any model? how to match individuals with models? . . .).
More generally, when considering an explicit memory, evolution shifts from the phenotype/genotype paradigm [30] to a paradigm of distributions of phenotypes/distributions of genotypes. This new search space is always larger, so such an approach should indeed be more powerful, but only if the following open problems can be solved: finding adequate evolution operators in a distribution space, and characterizing the benefit of distribution-based evolution, i.e., the class of functions relevant to a memory-based approach.
Figure 1: Social Strategies. Mimetic mutation is embedded into a (μ + λ) evolution strategy [45]. Besides the method used to set m_t, mimetic evolution includes the following parameters: the population size μ and birth rate λ; the relaxation factor α used to update L and R. Sketch of the algorithm: initialize X_i in {0, 1}^N and compute F(X_i) for i = 1 . . . μ; set L_i = R_i = 0.5 for i = 1 . . . N. Repeat: for each parent X_i, i = 1 . . . μ, and each j = 1 . . . λ/μ, Offspring = MimeticMutation(X_i, L, R) and compute Fitness(Offspring); sort parents + offspring; update the models: dL = best of {parents + offspring}, L = (1 − α)·L + α·dL; dR = average of the two worst of {parents + offspring}, R = (1 − α)·R + α·dR. MimeticMutation selects the bits to mutate through tournaments: k_i = Tournament(X, L, R), Y_{k_i} = 1 − X_{k_i}.
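For readers who prefer running code to pseudocode, a minimal Python sketch of the above loop follows. It is not the authors' implementation: the bit-selection tournament is replaced by a crude Sheep-like surrogate (preferring the bits where the parent disagrees most with L), fitness is assumed to be maximized, and all parameter values are placeholders:

```python
import random

def mimetic_es(fitness, N, mu=20, lam=20, alpha=0.01, m=3, max_evals=10_000):
    """Sketch of a mimetic (mu + lambda)-ES in the spirit of Figure 1.
    L memorizes past best individuals, R past worst ones; only L is used
    by this simplified mutation surrogate."""
    pop = [[random.randint(0, 1) for _ in range(N)] for _ in range(mu)]
    L = [0.5] * N                  # distribution of past best individuals
    R = [0.5] * N                  # distribution of past worst individuals
    evals = 0
    while evals < max_evals:
        offspring = []
        for parent in pop:
            for _ in range(max(1, lam // mu)):   # lam/mu offspring per parent
                child = parent[:]
                # Sheep-like surrogate: flip the m bits disagreeing most with L
                idx = list(range(N))
                random.shuffle(idx)              # randomize ties
                idx.sort(key=lambda i: -abs(L[i] - child[i]))
                for i in idx[:m]:
                    child[i] ^= 1
                offspring.append(child)
                evals += 1
        ranked = sorted(pop + offspring, key=fitness, reverse=True)
        pop = ranked[:mu]                        # (mu + lambda) selection
        # Model updates, as in Figure 1
        best = ranked[0]
        worst_avg = [(a + b) / 2 for a, b in zip(ranked[-1], ranked[-2])]
        L = [(1 - alpha) * l + alpha * b for l, b in zip(L, best)]
        R = [(1 - alpha) * r + alpha * w for r, w in zip(R, worst_avg)]
    return max(pop, key=fitness)

# Toy usage: maximize the number of ones (OneMax) on 100 bits.
best = mimetic_es(fitness=sum, N=100)
print(sum(best))
```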

Figure 2: Maximization of function F2, general overview. On function F2, the best performances are obtained for the Phobic and the Lone Rider mimetic strategies.

Figure 3: Minimization of the Griewank function, general overview.

Figure 4: Minimization of the Rosenbrock function, general overview.

Figure 5: Minimization of the Rastrigin function, general overview.

Figure 6: ES/Ignorant on function F2. Figure 7: Mimetic algorithms on function F2. The best mimetic strategies are the Lone Rider and the Phobic, which behave equally well on both problems, as in the general overview. The results similarly depend on m, with best results obtained for m = 5 (binary coding) and m = 3 (Gray coding). The overall performance is more sensitive to the population size μ and natality λ than for the Ignorant strategy.

Table 1: Individuals and Models.

Table 2: Reference results for F2; average offline results (standard deviation) over 21 runs of 200,000 evaluations.

Table 3: Mimetic results for function F2; average offline results (standard deviation) over 21 runs of 200,000 evaluations.

Table 4: Reference results for the Griewank function; average offline results (standard deviation) over 21 runs of 200,000 evaluations, except for cases (*) where the averages and standard deviations are non-significant; the figure given is then the median of the 21 runs. All figures have been multiplied by 100.

Table 5: Mimetic results for the Griewank function; average offline results (standard deviation) over 21 runs of 200,000 evaluations, except for cases (*) where the averages and standard deviations are non-significant; the figure given is then the median of the 21 runs. All figures have been multiplied by 100.

Table 6: Reference results for the Rosenbrock function; average offline results (standard deviation) over 21 runs of 200,000 evaluations, except (*) where the very different orders of magnitude of the fitnesses again make the averages and standard deviations non-significant: the figure given is then the median of the 21 runs. Also note that the range of the relaxation factor for PBIL differs from that in all other similar tables.

Table 7: Mimetic results for the Rosenbrock function; average offline results (standard deviation) over 21 runs of 200,000 evaluations, except (*) where the very different orders of magnitude of the fitnesses again make the averages and standard deviations non-significant: the figure given is then the median of the 21 runs.

Table 8: Reference results for the Rastrigin function; average offline results (standard deviation) over 21 runs of 200,000 evaluations.

Table 9: Mimetic results for the Rastrigin function; average offline results (standard deviation) over 21 runs of 200,000 evaluations.

Table 10: Rank of the different mimetic strategies over all problems.