D. Tacconi, O. Mayora, P. Lukowicz, B. Arnrich, C. Setz et al., Activity and emotion recognition to support early diagnosis of psychiatric diseases, Proc. of PervasiveHealth, pp.100-102, 2008.

R. A. Calvo and S. D'Mello, Frontiers of Affect-Aware Learning Technologies, IEEE Intelligent Systems, vol.27, issue.6, pp.86-89, 2012.
DOI : 10.1109/MIS.2012.110

B. Schuller, E. Marchi, S. Baron-Cohen, A. Lassalle, and H. O'Reilly, Recent developments and results of ASC-Inclusion: An integrated internet-based environment for social inclusion of children with autism spectrum conditions, Proc. of IDGEI, 2015.

E. Marchi, F. Ringeval, and B. Schuller, Voice-enabled assistive robots for handling autism spectrum conditions: an examination of the role of prosody, Speech and Automata in the Health Care, pp.207-236, 2014.
DOI : 10.1515/9781614515159.207

URL : https://hal.archives-ouvertes.fr/halshs-00481677

V. Petrushin, Emotion in speech: Recognition and application to call centers, Proc. of Artificial Neural Networks in Engineering, pp.7-10, 1999.

B. Schuller, B. Vlasenko, F. Eyben, G. Rigoll, and A. Wendemuth, Acoustic emotion recognition: A benchmark comparison of performances, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, pp.552-557, 2009.
DOI : 10.1109/ASRU.2009.5372886

F. Ringeval, F. Eyben, E. Kroupi, A. Yuce, J. Thiran et al., Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data, Pattern Recognition Letters, vol.66, pp.22-30, 2015.
DOI : 10.1016/j.patrec.2014.11.007

D. A. Sauter, F. Eisner, P. Ekman, and S. K. Scott, Cross-cultural recognition of basic emotions through nonverbal emotional vocalizations, Proc. of the National Academy of Sciences of the United States of America (PNAS), pp.2408-2412, 2009.
DOI : 10.1073/pnas.0908239106

Z. Zhang, E. Coutinho, J. Deng, and B. Schuller, Distributing Recognition in Computational Paralinguistics, IEEE Transactions on Affective Computing, vol.5, issue.4, pp.406-417, 2014.
DOI : 10.1109/TAFFC.2014.2359655

URL : http://mediatum.ub.tum.de/doc/1238172/document.pdf

B. Schuller, D. Arsić, F. Wallhoff, and G. Rigoll, Emotion recognition in the noise applying large acoustic feature sets, Proc. of Speech Prosody, 2006.

A. Tawari and M. Trivedi, Speech emotion analysis in noisy real-world environment, Proc. of ICPR, pp.4605-4608, 2010.
DOI : 10.1109/icpr.2010.1132

C. Huang, G. Chen, H. Yu, Y. Bao, and L. Zhao, Speech emotion recognition under white noise, Archives of Acoustics, vol.38, issue.4, pp.457-463, 2013.
DOI : 10.2478/aoa-2013-0054

URL : https://hal.archives-ouvertes.fr/hal-01418465

P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol, Extracting and composing robust features with denoising autoencoders, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.1096-1103, 2008.
DOI : 10.1145/1390156.1390294

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.141.2238

X. Lu, Y. Tsao, S. Matsuda, and C. Hori, Speech enhancement based on deep denoising autoencoder, Proc. of INTERSPEECH, pp.3444-3448, 2013.

B. Xia and C. Bao, Speech enhancement with weighted denoising auto-encoder, Proc. of INTERSPEECH, pp.436-440, 2013.

Y. Tan, J. Wang, and J. M. Zurada, Nonlinear blind source separation using a radial basis function network, IEEE Transactions on Neural Networks, vol.12, issue.1, pp.124-134, 2001.

Z. Zhang, J. Pinto, C. Plahl, B. Schuller, and D. Willett, Channel mapping using bidirectional long short-term memory for dereverberation in hands-free voice controlled devices, IEEE Transactions on Consumer Electronics, vol.60, issue.3, pp.525-533, 2014.
DOI : 10.1109/TCE.2014.6937339

A. Narayanan and D. Wang, Ideal ratio mask estimation using deep neural networks for robust speech recognition, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.7092-7096, 2013.
DOI : 10.1109/ICASSP.2013.6639038

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.368.881

J. C. Vasquez-Correa, N. Garcia, J. R. Orozco-Arroyave, J. D. Arias-Londono, J. F. Vargas-Bonilla et al., Emotion recognition from speech under environmental noise conditions using wavelet decomposition, 2015 International Carnahan Conference on Security Technology (ICCST), pp.247-252, 2015.
DOI : 10.1109/CCST.2015.7389690

F. Eyben, F. Weninger, and B. Schuller, Affect recognition in real-life acoustic conditions - a new perspective on feature selection, Proc. of INTERSPEECH, 2013.

F. Weninger, B. Schuller, A. Batliner, S. Steidl, and D. Seppi, Recognition of Nonprototypical Emotions in Reverberated and Noisy Speech by Nonnegative Matrix Factorization, EURASIP Journal on Advances in Signal Processing, vol.16, issue.4, pp.1-16, 2011.
DOI : 10.1109/TASL.2006.876726

J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks, vol.61, pp.85-117, 2015.
DOI : 10.1016/j.neunet.2014.09.003

URL : http://arxiv.org/abs/1404.7828

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, Proc. of NIPS, pp.3104-3112, 2014.

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
DOI : 10.1162/neco.1997.9.8.1735

A. Graves, Supervised sequence labelling with recurrent neural networks, 2012.
DOI : 10.1007/978-3-642-24797-2

N. Srivastava, E. Mansimov, and R. Salakhutdinov, Unsupervised learning of video representations using LSTMs, Proc. of ICML, pp.843-852, 2015.

F. Ringeval, A. Sonderegger, J. Sauer, and D. Lalanne, Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), pp.1-8, 2013.
DOI : 10.1109/FG.2013.6553805

F. Ringeval, B. Schuller, M. Valstar, S. Jaiswal, E. Marchi et al., AV+EC 2015 - The first affect recognition challenge bridging across audio, video, and physiological data, Proc. of AVEC Workshop, pp.3-8, 2015.

J. Barker, R. Marxer, E. Vincent, and S. Watanabe, The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.504-511, 2015.
DOI : 10.1109/ASRU.2015.7404837

URL : https://hal.archives-ouvertes.fr/hal-01211376

M. Mauch and S. Ewert, The audio degradation toolbox and its application to robustness evaluation, Proc. of ISMIR, pp.83-88, 2013.

F. Weninger, J. Bergmann, and B. Schuller, Introducing CURRENNT: The Munich open-source CUDA RecurREnt Neural Network Toolkit, Journal of Machine Learning Research, vol.16, pp.547-551, 2015.

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, LIBLINEAR: A library for large linear classification, The Journal of Machine Learning Research, vol.9, pp.1871-1874, 2008.

L. I.-K. Lin, A concordance correlation coefficient to evaluate reproducibility, Biometrics, vol.45, issue.1, pp.255-268, 1989.

L. He, D. Jiang, L. Yang, E. Pei, P. Wu et al., Multimodal Affective Dimension Prediction Using Deep Bidirectional Long Short-Term Memory Recurrent Neural Networks, Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, AVEC '15, pp.73-80, 2015.
DOI : 10.1145/2808196.2811641

N. Garner, P. Barrett, D. Howard, and A. Tyrrell, Robust noise detection for speech detection and enhancement, Electronics Letters, vol.33, issue.4, pp.270-271, 1997.
DOI : 10.1049/el:19970217

G. Trigeorgis, F. Ringeval, R. Bruckner, E. Marchi, M. Nicolaou et al., Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5200-5204, 2016.
DOI : 10.1109/ICASSP.2016.7472669

URL : http://research.gold.ac.uk/17322/1/learning_audio_paralinguistics_from_the_raw_waveform.pdf

L. Deng, A. Acero, L. Jiang, J. Droppo, and X. Huang, High-performance robust speech recognition using stereo training data, Proc. of ICASSP, pp.301-304, 2001.