Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, vol.2, issue.1, pp.1-127, 2009.
DOI : 10.1561/2200000006
URL : http://www.iro.umontreal.ca/~bengioy/papers/ftml.pdf

A. Bordes and J. Weston, Learning end-to-end goal-oriented dialog, 2016.

H. Bourlard and Y. Kamp, Auto-association by multilayer perceptrons and singular value decomposition, Biological Cybernetics, vol.59, issue.4-5, pp.291-294, 1988.
DOI : 10.1007/BF00332918

K. Cho, B. van Merriënboer, D. Bahdanau, and Y. Bengio, On the properties of neural machine translation: Encoder-decoder approaches, arXiv preprint, 2014.

F. Denis, PAC learning from positive statistical queries, Algorithmic Learning Theory, pp.112-126, 1998.
DOI : 10.1007/3-540-49730-7_9
URL : http://www.cmi.univ-mrs.fr/~fdenis/alt98.ps

G. Fei and B. Liu, Social media text classification under negative covariate shift, EMNLP, 2015.

W. H. Greene, Sample Selection Bias as a Specification Error: A Comment, Econometrica, vol.49, issue.3, pp.795-798, 1981.
DOI : 10.2307/1911523

K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.90
URL : http://arxiv.org/pdf/1512.03385

G. E. Hinton and R. R. Salakhutdinov, Reducing the Dimensionality of Data with Neural Networks, Science, vol.313, issue.5786, pp.504-507, 2006.
DOI : 10.1126/science.1127647

G. E. Hinton and R. S. Zemel, Autoencoders, minimum description length and Helmholtz free energy, NIPS, 1994.

N. Japkowicz, C. Myers, and M. Gluck, A novelty detection approach to classification, IJCAI, 1995.

D. Kingma and J. Ba, Adam: A method for stochastic optimization, 2015.

D. Kingma and M. Welling, Auto-encoding variational bayes, 2013.

A. Krizhevsky, Learning multiple layers of features from tiny images, 2009.

Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato, and F. J. Huang, A tutorial on energy-based learning, 2006.

X. Li and B. Liu, Learning from Positive and Unlabeled Examples with Different Data Distributions, 2005.
DOI : 10.1007/11564096_24
URL : http://www.cs.uic.edu/~liub/publications/ECML-05.pdf

C. Liu, R. Lowe, I. V. Serban, M. Noseworthy, L. Charlin et al., How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016.
DOI : 10.18653/v1/D16-1230

J. McAuley and J. Leskovec, Hidden factors and hidden topics: understanding rating dimensions with review text, Proceedings of the 7th ACM conference on Recommender systems, RecSys '13, 2013.
DOI : 10.1145/2507157.2507163

K. Papineni, S. Roukos, T. Ward, and W. Zhu, BLEU: a method for automatic evaluation of machine translation, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, 2002.
DOI : 10.3115/1073083.1073135

J. Pennington, R. Socher, and C. D. Manning, GloVe: Global Vectors for Word Representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
DOI : 10.3115/v1/D14-1162
URL : http://nlp.stanford.edu/projects/glove/glove.pdf

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature, vol.323, issue.6088, pp.533-536, 1986.
DOI : 10.1038/323533a0

H. Shimodaira, Improving predictive inference under covariate shift by weighting the log-likelihood function, Journal of Statistical Planning and Inference, vol.90, issue.2, pp.227-244, 2000.
DOI : 10.1016/S0378-3758(00)00115-4
URL : http://www.is.titech.ac.jp/~shimo/pub/Shimodaira JSPI2000.pdf

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint, 2014.

P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol, Extracting and composing robust features with denoising autoencoders, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390294
URL : http://www.iro.umontreal.ca/~vincentp/Publications/denoising_autoencoders_tr1316.pdf

X. Li and B. Liu, Learning to classify text using positive and unlabeled data, IJCAI, 2003.

H. Yu, J. Han, and K. Chang, PEBL: positive example based learning for Web page classification using SVM, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '02, 2002.
DOI : 10.1145/775047.775083

J. Zhao, M. Mathieu, and Y. LeCun, Energy-based generative adversarial networks, 2017.