Y. Bengio and Y. LeCun, Scaling learning algorithms towards AI, Large-Scale Kernel Machines, 2007.

Y. Bengio, O. Delalleau, and N. Le Roux, The curse of highly variable functions for local kernel machines, NIPS, 2005.

Y. Bengio, P. Lamblin, V. Popovici, and H. Larochelle, Greedy layer-wise training of deep networks, NIPS, 2007.

G. E. Hinton, S. Osindero, and Y.-W. Teh, A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006.
DOI : 10.1162/neco.2006.18.7.1527

P. Smolensky, Information processing in dynamical systems: Foundations of harmony theory, Parallel Distributed Processing, pp.194-281, 1986.

D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, A Learning Algorithm for Boltzmann Machines, Cognitive Science, vol.9, issue.1, pp.147-169, 1985.
DOI : 10.1207/s15516709cog0901_7

M. Ranzato, Y. Boureau, S. Chopra, and Y. LeCun, A unified energy-based framework for unsupervised learning, AISTATS, 2007.

V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, ICML, 2010.

A. Krizhevsky, Convolutional deep belief networks on CIFAR-10, 2010.

G. E. Hinton, Training Products of Experts by Minimizing Contrastive Divergence, Neural Computation, vol.14, issue.8, pp.1771-1800, 2002.
DOI : 10.1162/089976602760128018

Y. Bengio and O. Delalleau, Justifying and Generalizing Contrastive Divergence, Neural Computation, vol.21, issue.6, pp.1601-1621, 2009.
DOI : 10.1162/neco.2008.11-07-647

A. Fischer and C. Igel, Training RBMs depending on the signs of the CD approximation of the log-likelihood derivatives, ESANN, 2011.

Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, vol.2, issue.1, pp.1-127, 2009.
DOI : 10.1561/2200000006

H. Bourlard and Y. Kamp, Auto-association by multilayer perceptrons and singular value decomposition, Biological Cybernetics, vol.59, issue.4-5, pp.291-294, 1988.
DOI : 10.1007/BF00332918

G. E. Hinton, Connectionist learning procedures, Artificial Intelligence, vol.40, issue.1-3, pp.185-234, 1989.
DOI : 10.1016/0004-3702(89)90049-0

N. Japkowicz, S. J. Hanson, and M. A. Gluck, Nonlinear Autoassociation Is Not Equivalent to PCA, Neural Computation, vol.12, issue.3, pp.531-545, 2000.

P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol, Extracting and composing robust features with denoising autoencoders, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390294

H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio, An empirical evaluation of deep architectures on problems with many factors of variation, Proceedings of the 24th international conference on Machine learning, ICML '07, 2007.
DOI : 10.1145/1273496.1273556

M. A. Ranzato, C. Poultney, S. Chopra, and Y. LeCun, Efficient learning of sparse representations with an energy-based model, NIPS, 2006.

M. Ranzato, Y. Boureau, and Y. LeCun, Sparse feature learning for deep belief networks, NIPS, 2008.

Y. Cho and L. Saul, Kernel methods for deep learning, NIPS, 2009.

B. Schölkopf, A. J. Smola, and K. Müller, Nonlinear Component Analysis as a Kernel Eigenvalue Problem, Neural Computation, vol.10, issue.5, pp.1299-1319, 1998.
DOI : 10.1162/089976698300017467

F. Yger, M. Berar, G. Gasso, and A. Rakotomamonjy, A supervised strategy for deep kernel machine, ESANN, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00668302

R. Rosipal, L. J. Trejo, and B. Matthews, Kernel PLS-SVC for linear and nonlinear classification, ICML, 2003.

Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard et al., Handwritten digit recognition with a back-propagation network, NIPS, 1990.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.
DOI : 10.1109/5.726791

K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, What is the best multi-stage architecture for object recognition?, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459469

G. Desjardins and Y. Bengio, Empirical evaluation of convolutional RBMs for vision, 2008.

H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553453

K. Kavukcuoglu, M. A. Ranzato, R. Fergus, and Y. LeCun, Learning invariant features through topographic filter maps, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206545

K. Kavukcuoglu, P. Sermanet, Y. Boureau, K. Gregor, M. Mathieu et al., Learning convolutional feature hierarchies for visual recognition, NIPS, 2010.

R. Hadsell, P. Sermanet, J. Ben, A. Erkan, M. Scoffier et al., Learning long-range vision for autonomous off-road driving, Journal of Field Robotics, vol.26, issue.2, pp.120-144, 2009.
DOI : 10.1002/rob.20276

H. Lee, Y. Largman, P. Pham, and A. Y. Ng, Unsupervised feature learning for audio classification using convolutional deep belief networks, NIPS, 2009.

R. Collobert and J. Weston, A unified architecture for natural language processing, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390177

R. Salakhutdinov and G. E. Hinton, Using deep belief nets to learn covariance kernels for gaussian processes, NIPS, 2008.

M. D. Zeiler, G. W. Taylor, N. F. Troje, and G. E. Hinton, Modeling pigeon behaviour using a conditional restricted Boltzmann machine, ESANN, 2009.

G. E. Hinton and R. Salakhutdinov, Reducing the Dimensionality of Data with Neural Networks, Science, vol.313, issue.5786, pp.504-507, 2006.
DOI : 10.1126/science.1127647

R. Salakhutdinov and G. E. Hinton, Semantic hashing, International Journal of Approximate Reasoning, vol.50, issue.7, pp.969-978, 2009.
DOI : 10.1016/j.ijar.2008.11.006

A. Krizhevsky and G. E. Hinton, Using very deep autoencoders for content-based image retrieval, ESANN, 2011.

M. Ranzato, A. Krizhevsky, and G. E. Hinton, Factored 3-way restricted Boltzmann machines for modeling natural images, AISTATS, 2010.

P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res., vol.11, pp.3371-3408, 2010.

D. Erhan, Y. Bengio, A. Courville, P. Manzagol, P. Vincent et al., Why does unsupervised pre-training help deep learning?, J. Mach. Learn. Res., vol.11, pp.625-660, 2010.

A. Saxe, P. W. Koh, Z. Chen, M. Bhand, B. Suresh et al., On random weights and unsupervised feature learning, NIPS WS8, 2010.

L. Arnold, H. Paugam-Moisy, and M. Sebag, Unsupervised layer-wise model selection in deep neural networks, ECAI, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00488338