J. M. Baker, L. Deng, J. Glass, S. Khudanpur, C. Lee et al., Developments and directions in speech recognition and understanding, Part 1 [DSP Education], IEEE Signal Processing Magazine, vol.26, issue.3, pp.75-80, 2009.
DOI : 10.1109/MSP.2009.932166

M. Wölfel and J. McDonough, Distant Speech Recognition, 2009.

I. Cohen, J. Benesty, and S. Gannot, Speech processing in modern communication: Challenges and perspectives, 2010.
DOI : 10.1007/978-3-642-11130-3

E. Vincent and Y. Deville, Audio applications, in Handbook of Blind Source Separation: Independent Component Analysis and Applications, pp.779-819, 2010.

J. Li, L. Deng, R. Haeb-Umbach, and Y. Gong, Robust Automatic Speech Recognition: A Bridge to Practical Applications, 2015.

J. H. Hansen, P. Angkititrakul, J. Plucienkowski, S. Gallant, U. Yapanel et al., CU-Move: Analysis & corpus development for interactive in-vehicle speech systems, Proc. Eurospeech, pp.2023-2026, 2001.

B. Lee, M. Hasegawa-Johnson, C. Goudeseune, S. Kamdar, S. Borys et al., AVICAR: Audio-visual speech corpus in a car environment, Proc. Interspeech, pp.2489-2492, 2004.

J. Barker, R. Marxer, E. Vincent, and S. Watanabe, The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.504-511, 2015.
DOI : 10.1109/ASRU.2015.7404837

URL : https://hal.archives-ouvertes.fr/hal-01211376

L. Lamel, F. Schiel, A. Fourcin, J. Mariani, and H. Tillman, The translingual English database (TED), Proc. 3rd Int. Conf. on Spoken Language Processing (ICSLP), 1994.

A. Janin, D. Baron, J. Edwards, D. Ellis, D. Gelbart et al., The ICSI Meeting Corpus, Proc. 2003 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp.364-367, 2003.
DOI : 10.1109/ICASSP.2003.1198793

D. Mostefa, N. Moreau, K. Choukri, G. Potamianos, S. Chu et al., The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms, Language Resources and Evaluation, vol.41, issue.3-4, 2007.
DOI : 10.1007/s10579-007-9054-4

S. Renals, T. Hain, and H. Bourlard, Interpretation of multiparty meetings: The AMI and AMIDA projects, 2008 Hands-Free Speech Communication and Microphone Arrays (HSCMA), pp.115-118, 2008.
DOI : 10.1109/HSCMA.2008.4538700

A. Stupakov, E. Hanusa, D. Vijaywargi, D. Fox, and J. Bilmes, The design and collection of COSINE, a multi-microphone in situ speech corpus recorded in noisy environments, Computer Speech & Language, vol.26, issue.1, pp.52-66, 2011.
DOI : 10.1016/j.csl.2010.12.003

M. Lincoln, I. McCowan, J. Vepa, and H. K. Maganti, The multichannel Wall Street Journal audio visual corpus (MC-WSJ-AV): Specification and initial experiments, Proc. 2005 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp.357-362, 2005.

C. Fox, Y. Liu, E. Zwyssig, and T. Hain, The Sheffield wargames corpus, Proc. Interspeech, pp.1116-1120, 2013.

G. Gravier, G. Adda, N. Paulsson, M. Carré, A. Giraudel et al., The ETAPE corpus for the evaluation of speech-based TV content processing in the French language, Proc. 8th Int. Conf. on Language Resources and Evaluation (LREC), pp.114-118, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00712591

P. Bell, M. J. Gales, T. Hain, J. Kilgour, P. Lanchantin et al., The MGB challenge: Evaluating multi-genre broadcast media recognition, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.687-693, 2015.
DOI : 10.1109/ASRU.2015.7404863

J. Barker, E. Vincent, N. Ma, H. Christensen, and P. Green, The PASCAL CHiME speech separation and recognition challenge, Computer Speech & Language, vol.27, issue.3, pp.621-633, 2013.
DOI : 10.1016/j.csl.2012.10.004

URL : https://hal.archives-ouvertes.fr/hal-00646370

E. Vincent, J. Barker, S. Watanabe, J. Le Roux, F. Nesta et al., The second 'CHiME' speech separation and recognition challenge: An overview of challenge systems and outcomes, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.162-167, 2013.
DOI : 10.1109/ASRU.2013.6707723

L. Cristoforetti, M. Ravanelli, M. Omologo, A. Sosi, A. Abad et al., The DIRHA simulated corpus, Proc. 9th Int. Conf. on Language Resources and Evaluation (LREC), pp.2629-2634, 2014.

A. Brutti, M. Ravanelli, P. Svaizer, and M. Omologo, A speech event detection and localization task for multiroom environments, 2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), pp.157-161, 2014.
DOI : 10.1109/HSCMA.2014.6843271

A. Brutti, L. Cristoforetti, W. Kellermann, L. Marquardt, and M. Omologo, WOZ acoustic data collection for interactive TV, Proc. 6th Int. Conf. on Language Resources and Evaluation (LREC), pp.2330-2334, 2008.
DOI : 10.1007/s10579-010-9116-x

M. Ravanelli, L. Cristoforetti, R. Gretter, M. Pellin, A. Sosi et al., The DIRHA-ENGLISH corpus and related tasks for distant-speech recognition in domestic environments, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.275-282, 2015.
DOI : 10.1109/ASRU.2015.7404805

M. Vacher, B. Lecouteux, P. Chahuara, F. Portet, B. Meillon et al., The Sweet-Home speech and multimodal corpus for home automation interaction, Proc. 9th Int. Conf. on Language Resources and Evaluation (LREC), 2014.
URL : https://hal.archives-ouvertes.fr/hal-00953006

F. Weninger, H. Erdogan, S. Watanabe, E. Vincent, J. Le Roux et al., Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR, Proc. 12th Int. Conf. on Latent Variable Analysis and Signal Separation (LVA/ICA), pp.91-99, 2015.
DOI : 10.1007/978-3-319-22482-4_11

URL : https://hal.archives-ouvertes.fr/hal-01163493

D. Yu and L. Deng, Automatic Speech Recognition: A Deep Learning Approach, 2014.

J. DiBiase, H. Silverman, and M. Brandstein, Robust localization in reverberant rooms, in Microphone Arrays: Signal Processing Techniques and Applications, 2001.
DOI : 10.1007/978-3-662-04619-7_8

C. Blandin, A. Ozerov, and E. Vincent, Multi-source TDOA estimation in reverberant audio using angular spectra and clustering, Signal Processing, vol.92, issue.8, pp.1950-1960, 2012.
DOI : 10.1016/j.sigpro.2011.09.032

URL : https://hal.archives-ouvertes.fr/inria-00576297

A. Ozerov and E. Vincent, Using the FASST source separation toolbox for noise robust speech recognition, Proc. Int. Workshop on Machine Listening in Multisource Environments, 2011.

Y. Salaün, E. Vincent, N. Bertin, N. Souviraà-Labastie, X. Jaureguiberry et al., The Flexible Audio Source Separation Toolbox Version 2.0, Proc. ICASSP, 2014.

S. Galliano, E. Geoffrois, G. Gravier, J. Bonastre, D. Mostefa et al., Corpus description of the ESTER evaluation campaign for the rich transcription of French broadcast news, Proc. 5th Int. Conf. on Language Resources and Evaluation (LREC), 2006.

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The Kaldi speech recognition toolkit, Proc. 2011 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2011.