Y. Akimoto, A. Auger, and N. Hansen, Convergence of the Continuous Time Trajectories of Isotropic Evolution Strategies on Monotonic $\mathcal C^2$-composite Functions, Lecture Notes in Computer Science, vol.7491, issue.1, pp.42-51, 2012.
DOI : 10.1007/978-3-642-32937-1_5

D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, A Learning Algorithm for Boltzmann Machines, Cognitive Science, vol.9, issue.1, pp.147-169, 1985.
DOI : 10.1207/s15516709cog0901_7

S. Amari, Natural Gradient Works Efficiently in Learning, Neural Computation, vol.10, issue.2, pp.251-276, 1998.
DOI : 10.1103/PhysRevLett.76.2188

S. Amari and H. Nagaoka, Methods of information geometry, volume 191 of Translations of Mathematical Monographs, 2000.

Y. Akimoto, Y. Nagata, I. Ono, and S. Kobayashi, Bidirectional Relation between CMA Evolution Strategies and Natural Evolution Strategies, Proceedings of Parallel Problem Solving from Nature - PPSN XI, pp.154-163, 2010.
DOI : 10.1007/978-3-642-15844-5_16

R. P. Agarwal and D. O'Regan, An Introduction to Ordinary Differential Equations, 2008.

D. Arnold, Weighted multirecombination evolution strategies, Theoretical Computer Science, pp.18-37, 2006.

S. Baluja, Population based incremental learning: A method for integrating genetic search based function optimization and competitive learning, 1994.

S. Baluja and R. Caruana, Removing the Genetics from the Standard Genetic Algorithm, Proceedings of ICML'95, pp.38-46, 1995.
DOI : 10.1016/B978-1-55860-377-6.50014-1

Y. Bengio, A. C. Courville, and P. Vincent, Unsupervised feature learning and deep learning: A review and new perspectives, arXiv:1206.5538, 2012.

A. Berny, An adaptive scheme for real function optimization acting as a selection operator, Combinations of Evolutionary Computation and Neural Networks, pp.140-149, 2000.

A. Berny, Selection and reinforcement learning for combinatorial optimization, Parallel Problem Solving from Nature PPSN VI, pp.601-610, 2000.

A. Berny, Boltzmann machine for population-based incremental learning, ECAI, pp.198-202, 2002.

H.-G. Beyer, The Theory of Evolution Strategies, Natural Computing Series, 2001.

P. Billingsley, Probability and measure, Wiley Series in Probability and Mathematical Statistics, 1995.

Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, Greedy layer-wise training of deep networks, Advances in Neural Information Processing Systems 19, pp.153-160, 2007.

J. Branke, C. Lode, and J. L. Shapiro, Addressing sampling errors and diversity loss in UMDA, Proceedings of the 9th annual conference on Genetic and evolutionary computation, GECCO '07, pp.508-515, 2007.
DOI : 10.1145/1276958.1277068

H.-G. Beyer and H.-P. Schwefel, Evolution strategies - A comprehensive introduction, Natural Computing, vol.1, issue.1, pp.3-52, 2002.
DOI : 10.1023/A:1015059928466

J. Burbea, Informative geometry of probability spaces, Expo. Math., vol.4, issue.4, pp.347-378, 1986.

T. M. Cover and J. A. Thomas, Elements of information theory, 2006.

P.-T. de Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, A Tutorial on the Cross-Entropy Method, Annals of Operations Research, vol.134, issue.1, pp.19-67, 2005.
DOI : 10.1007/s10479-005-5724-z

G. Desjardins, A. Courville, Y. Bengio, P. Vincent, and O. Delalleau, Parallel tempering for training of restricted Boltzmann machines, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.

E. D. Dolan and J. J. Moré, Benchmarking optimization software with performance profiles, Mathematical Programming, vol.91, issue.2, pp.201-213, 2002.

S. Das, S. Maity, B. Qu, and P. Suganthan, Real-parameter evolutionary multimodal optimization - A survey of the state-of-the-art, Swarm and Evolutionary Computation, vol.1, issue.2, pp.71-88, 2011.
DOI : 10.1016/j.swevo.2011.05.005

M. Gallagher and M. Frean, Population-Based Continuous Optimization, Probabilistic Modelling and Mean Shift, Evolutionary Computation, vol.13, issue.1, pp.29-42, 2005.
DOI : 10.1023/A:1013500812258

Z. Ghahramani, Unsupervised Learning, Advanced Lectures on Machine Learning, pp.72-112, 2004.
DOI : 10.1080/01621459.1995.10476550

T. Glasmachers, T. Schaul, Y. Sun, D. Wierstra, and J. Schmidhuber, Exponential natural evolution strategies, Proceedings of the 12th annual conference on Genetic and evolutionary computation, GECCO '10, pp.393-400, 2010.
DOI : 10.1145/1830483.1830557

N. Hansen, An analysis of mutative $\sigma$-self-adaptation on linear fitness functions, Evolutionary Computation, vol.14, issue.3, pp.255-275, 2006.

N. Hansen, The CMA evolution strategy: a comparing review, in J. A. Lozano, P. Larrañaga, I. Inza, and E. Bengoetxea (eds.), Towards a New Evolutionary Computation: Advances on Estimation of Distribution Algorithms, pp.75-102, 2006.

N. Hansen, Benchmarking a BI-population CMA-ES on the BBOB-2009 function testbed, Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers, GECCO '09, pp.2389-2396, 2009.

G. Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation, vol.14, issue.8, pp.1771-1800, 2002.

R. Hooke and T. A. Jeeves, "Direct search" solution of numerical and statistical problems, Journal of the ACM, vol.8, pp.212-229, 1961.

N. Hansen and S. Kern, Evaluating the CMA Evolution Strategy on Multimodal Test Functions, Parallel Problem Solving from Nature PPSN VIII, pp.282-291, 2004.
DOI : 10.1007/978-3-540-30217-9_29

N. Hansen, S. Müller, and P. Koumoutsakos, Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES), Evolutionary Computation, vol.11, issue.1, pp.1-18, 2003.
DOI : 10.1162/106365601750190398

N. Hansen and A. Ostermeier, Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation, ICEC96, pp.312-317, 1996.

N. Hansen and A. Ostermeier, Completely derandomized self-adaptation in evolution strategies, Evolutionary Computation, vol.9, issue.2, pp.159-195, 2001.

G. Hinton, S. Osindero, and Y. Teh, A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006.
DOI : 10.1162/jmlr.2003.4.7-8.1235

G. Jastrebski and D. Arnold, Improving Evolution Strategies through Active Covariance Matrix Adaptation, 2006 IEEE International Conference on Evolutionary Computation, pp.2814-2821, 2006.
DOI : 10.1109/CEC.2006.1688662

M. Jebalia and A. Auger, Log-Linear Convergence of the Scale-Invariant $(\mu/\mu_w,\lambda)$-ES and Optimal $\mu$ for Intermediate Recombination for Large Population Sizes, Parallel Problem Solving from Nature (PPSN XI), pp.52-61, 2010.
DOI : 10.1007/978-3-642-15844-5_6

URL : https://hal.archives-ouvertes.fr/inria-00494478

H. Jeffreys, An Invariant Form for the Prior Probability in Estimation Problems, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol.186, issue.1007, pp.453-461, 1946.
DOI : 10.1098/rspa.1946.0056

P. E. Kloeden and E. Platen, Numerical solution of stochastic differential equations, Applications of Mathematics, vol.23, 1992.

S. Kullback, Information theory and statistics, 1968.

P. Larrañaga and J. A. Lozano, Estimation of distribution algorithms: A new tool for evolutionary computation, 2002.
DOI : 10.1007/978-1-4615-1539-5

N. Le Roux, P. Manzagol, and Y. Bengio, Topmoumoute online natural gradient algorithm, NIPS, 2007.

J. J. Moré, B. S. Garbow, and K. E. Hillstrom, Testing Unconstrained Optimization Software, ACM Transactions on Mathematical Software, vol.7, issue.1, pp.17-41, 1981.
DOI : 10.1145/355934.355936

L. Malagò, M. Matteucci, and G. Pistone, Towards the geometry of estimation of distribution algorithms based on the exponential family, Proceedings of the 11th workshop proceedings on Foundations of genetic algorithms, FOGA '11, pp.230-242, 2011.
DOI : 10.1145/1967654.1967675

L. Malagò, M. Matteucci, and B. Dal Seno, An information geometry perspective on estimation of distribution algorithms, Proceedings of the 2008 GECCO conference companion on Genetic and evolutionary computation, GECCO '08, pp.2081-2088, 2008.
DOI : 10.1145/1388969.1389026

J. A. Nelder and R. Mead, A simplex method for function minimization, The Computer Journal, pp.308-313, 1965.

M. Pelikan, D. Goldberg, and F. G. Lobo, A survey of optimization by building and using probabilistic models, Proceedings of the 2000 American Control Conference. ACC (IEEE Cat. No.00CH36334), pp.5-20, 2002.
DOI : 10.1109/ACC.2000.879173

C. R. Rao, Information and the Accuracy Attainable in the Estimation of Statistical Parameters, Bull. Calcutta Math. Soc., vol.37, pp.81-91, 1945.
DOI : 10.1007/978-1-4612-0919-5_16

R. Ros and N. Hansen, A Simple Modification in CMA-ES Achieving Linear Time and Space Complexity, Proceedings of Parallel Problem Solving from Nature (PPSN X), pp.296-305, 2008.
DOI : 10.1007/978-3-540-87700-4_30

URL : https://hal.archives-ouvertes.fr/inria-00287367

R. Y. Rubinstein and D. P. Kroese, The cross-entropy method: a unified approach to combinatorial optimization, Monte-Carlo simulation, and machine learning, 2004.

R. Y. Rubinstein, The cross-entropy method for combinatorial and continuous optimization, Methodology and Computing in Applied Probability, vol.1, issue.2, pp.127-190, 1999.
DOI : 10.1023/A:1010091220143

F. Silva and L. Almeida, Acceleration techniques for the backpropagation algorithm, pp.110-119, 1990.

R. Salakhutdinov, Learning in Markov random fields using tempered transitions, Advances in Neural Information Processing Systems 22, 2009.

L. Schwartz, Analyse. II, volume 43 of Collection Enseignement des Sciences [Collection: The Teaching of Science], Calcul différentiel et équations différentielles, 1992.
URL : https://hal.archives-ouvertes.fr/tel-00308504

T. Schaul, T. Glasmachers, and J. Schmidhuber, High dimensions and heavy tails for natural evolution strategies, Proceedings of the 13th annual conference on Genetic and evolutionary computation, GECCO '11, pp.845-852, 2011.
DOI : 10.1145/2001576.2001692

T. Suttorp, N. Hansen, and C. Igel, Efficient covariance matrix update for variable metric evolution strategies, Machine Learning, pp.167-197, 2009.
DOI : 10.1007/s10994-009-5102-1

URL : https://hal.archives-ouvertes.fr/inria-00369468

B. Sareni and L. Krähenbühl, Fitness sharing and niching methods revisited, IEEE Transactions on Evolutionary Computation, vol.2, issue.3, pp.97-106, 1998.
DOI : 10.1109/4235.735432

URL : https://hal.archives-ouvertes.fr/hal-00359799

R. Salakhutdinov and I. Murray, On the quantitative analysis of deep belief networks, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.872-879, 2008.
DOI : 10.1145/1390156.1390266

P. Smolensky, Information processing in dynamical systems: foundations of harmony theory, Parallel Distributed Processing, pp.194-281, 1986.

Y. Sun, D. Wierstra, T. Schaul, and J. Schmidhuber, Efficient natural evolution strategies, Proceedings of the 11th Annual conference on Genetic and evolutionary computation, GECCO '09, pp.539-546, 2009.
DOI : 10.1145/1569901.1569976

URL : http://arxiv.org/abs/1209.5853

V. Torczon, On the Convergence of Pattern Search Algorithms, SIAM Journal on Optimization, vol.7, issue.1, pp.1-25, 1997.
DOI : 10.1137/S1052623493250780

M. Toussaint, Notes on information geometry and evolutionary processes, eprint arXiv:nlin/0408040, 2004.

Wagner, A. Auger, and M. Schoenauer, EEDA: A new robust estimation of distribution algorithms, 2004.
URL : https://hal.archives-ouvertes.fr/inria-00070802

D. Whitley, The genitor algorithm and selection pressure: Why rank-based allocation of reproductive trials is best, Proceedings of the third international conference on Genetic algorithms, pp.116-121, 1989.

D. Wierstra, T. Schaul, J. Peters, and J. Schmidhuber, Natural Evolution Strategies, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), pp.3381-3387, 2008.
DOI : 10.1109/CEC.2008.4631255

URL : http://arxiv.org/abs/1106.4487