References

M. Brandstein and D. Ward, Eds., Microphone Arrays: Signal Processing Techniques and Applications, 2001.

M. Wölfel and J. McDonough, Distant Speech Recognition, 2009.

I. Cohen, J. Benesty, and S. Gannot, Speech Processing in Modern Communication: Challenges and Perspectives, 2010.

T. Virtanen, R. Singh, and B. Raj, Techniques for Noise Robustness in Automatic Speech Recognition, 2012.

J. Li, L. Deng, R. Haeb-Umbach, and Y. Gong, Robust Automatic Speech Recognition, 2015.

S. Gannot, E. Vincent, S. Markovich-Golan, and A. Ozerov, A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.4, pp.692-730, 2017.
DOI : 10.1109/TASLP.2016.2647702

URL : https://hal.archives-ouvertes.fr/hal-01414179

E. Vincent, T. Virtanen, and S. Gannot, Audio Source Separation and Speech Enhancement, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01881431

C. Rascon and I. Meza, Localization of sound sources in robotics: A review, Robotics and Autonomous Systems, vol.96, pp.184-210, 2017.
DOI : 10.1016/j.robot.2017.07.011

Ö. Yılmaz and S. Rickard, Blind Separation of Speech Mixtures via Time-Frequency Masking, IEEE Transactions on Signal Processing, vol.52, issue.7, pp.1830-1847, 2004.
DOI : 10.1109/TSP.2004.828896

C. Faller and J. Merimaa, Source localization in complex listening situations: Selection of binaural cues based on interaural coherence, The Journal of the Acoustical Society of America, vol.116, issue.5, pp.3075-3089, 2004.
DOI : 10.1121/1.1791872

C. Knapp and G. Carter, The generalized correlation method for estimation of time delay, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.24, issue.4, pp.320-327, 1976.
DOI : 10.1109/TASSP.1976.1162830

R. Schmidt, Multiple emitter location and signal parameter estimation, IEEE Transactions on Antennas and Propagation, vol.34, issue.3, pp.276-280, 1986.
DOI : 10.1109/TAP.1986.1143830

F. Nesta, P. Svaizer, and M. Omologo, Cumulative State Coherence Transform for a Robust Two-Channel Multiple Source Localization, Independent Component Analysis and Signal Separation, pp.290-297, 2009.

N. Roman, D. Wang, and G. J. Brown, Speech segregation based on sound localization, The Journal of the Acoustical Society of America, vol.114, issue.4, pp.2236-2252, 2003.
DOI : 10.1121/1.1610463

Z. Chami, A. Guérin, A. Pham, and C. Servière, A phase-based dual microphone method to count and locate audio sources in reverberant rooms, 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp.209-212, 2009.

C. Liu, B. C. Wheeler, W. D. O'Brien Jr., R. C. Bilger, C. R. Lansing et al., Localization of multiple sound sources with two microphones, The Journal of the Acoustical Society of America, vol.108, issue.4, pp.1888-1905, 2000.
DOI : 10.1121/1.1290516

S. Chakrabarty and E. A. Habets, Broadband DOA estimation using convolutional neural networks trained with noise signals, 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp.136-140, 2017.
DOI : 10.1109/WASPAA.2017.8170010

URL : http://arxiv.org/pdf/1705.00919

S. Chakrabarty and E. A. Habets, Multi-speaker localization using convolutional neural network trained with noise, NIPS 2017 Workshop on Machine Learning for Audio Processing, 2017.

R. Takeda and K. Komatani, Discriminative multiple sound source localization based on deep neural networks using independent location model, 2016 IEEE Spoken Language Technology Workshop (SLT), pp.603-609, 2016.
DOI : 10.1109/SLT.2016.7846325

N. Ma, G. J. Brown, and T. May, Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditions, Interspeech, pp.3302-3306, 2015.

F. Vesperini, P. Vecchiotti, E. Principi, S. Squartini, and F. Piazza, A neural network based algorithm for speaker localization in a multi-room environment, 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), pp.1-6, 2016.
DOI : 10.1109/MLSP.2016.7738817

V. Varanasi, R. Serizel, and E. Vincent, DNN based robust DOA estimation in reverberant, noisy and multi-source environment.

S. Rickard and Ö. Yılmaz, On the approximate W-disjoint orthogonality of speech, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.529-532, 2002.

M. I. Mandel, D. P. Ellis, and T. Jebara, An EM algorithm for localizing multiple sound sources in reverberant environments, 19th International Conference on Neural Information Processing Systems, pp.953-960, 2006.

H. Sawada, S. Araki, R. Mukai, and S. Makino, Grouping Separated Frequency Components by Estimating Propagation Model Parameters in Frequency-Domain Blind Source Separation, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.5, pp.1592-1604, 2007.
DOI : 10.1109/TASL.2007.899218

C. Blandin, A. Ozerov, and E. Vincent, Multi-source TDOA estimation in reverberant audio using angular spectra and clustering, Signal Processing, vol.92, issue.8, pp.1950-1960, 2012.
DOI : 10.1016/j.sigpro.2011.09.032

URL : https://hal.archives-ouvertes.fr/inria-00576297

Z. Chen, S. Watanabe, H. Erdogan, and J. R. Hershey, Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks, Interspeech, 2015.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, 32nd International Conference on Machine Learning, pp.448-456, 2015.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, International Conference on Learning Representations, 2015.

E. A. Habets, RIR-Generator: Room impulse response generator. Available: https://github

V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, Librispeech: An ASR corpus based on public domain audio books, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5206-5210, 2015.
DOI : 10.1109/ICASSP.2015.7178964

M. Pariente and D. Pressnitzer, Predictive denoising of speech in noise using deep neural networks, The Journal of the Acoustical Society of America, vol.142, issue.4, pp.2611-2611, 2017.
DOI : 10.1121/1.5014560

F. Li, P. S. Nidadavolu, and H. Hermansky, A long, deep and wide artificial neural net for robust speech recognition in unknown noise, Interspeech, pp.358-362, 2014.

C. Valentini-botinhao, X. Wang, S. Takaki, and J. Yamagishi, Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System Using Deep Recurrent Neural Networks, Interspeech 2016, pp.352-356, 2016.
DOI : 10.21437/Interspeech.2016-159

URL : https://www.pure.ed.ac.uk/ws/files/26377240/Interspeech2016_Cassia_1.pdf

N. Bertin, E. Camberlein, E. Vincent, R. Lebarbenchon, S. Peillon et al., A French Corpus for Distant-Microphone Speech Processing in Real Homes, Interspeech 2016, pp.2781-2785, 2016.
DOI : 10.21437/Interspeech.2016-1384

URL : https://hal.archives-ouvertes.fr/hal-01343060

K. Kinoshita, M. Delcroix, S. Gannot, E. A. Habets, R. Haeb-Umbach et al., A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research, EURASIP Journal on Advances in Signal Processing, vol.2016, issue.7, pp.1-19, 2016.

M. J. Gales, Maximum likelihood linear transformations for HMM-based speech recognition, Computer Speech & Language, vol.12, issue.2, pp.75-98, 1998.
DOI : 10.1006/csla.1998.0043

URL : http://svr-www.eng.cam.ac.uk/~mjfg/lintran_CSL.ps.gz