J. Chen, J. Benesty, and Y. Huang, Time Delay Estimation in Room Acoustic Environments: An Overview, EURASIP Journal on Advances in Signal Processing, vol.2006, Article ID 26503, 2006.
DOI : 10.1155/ASP/2006/26503
URL : https://doi.org/10.1155/asp/2006/26503

C. Knapp and G. C. Carter, The generalized correlation method for estimation of time delay, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.24, issue.4, pp.320-327, 1976.
DOI : 10.1109/TASSP.1976.1162830

J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, Robust Localization in Reverberant Rooms, Microphone Arrays, pp.157-180, 2001.
DOI : 10.1007/978-3-662-04619-7_8

C. T. Ishi, O. Chatot, H. Ishiguro, and N. Hagita, Evaluation of a MUSIC-based real-time sound localization of multiple sound sources in real noisy environments, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.2027-2032, 2009.
DOI : 10.1109/IROS.2009.5354309

O. Yilmaz and S. Rickard, Blind Separation of Speech Mixtures via Time-Frequency Masking, IEEE Transactions on Signal Processing, vol.52, issue.7, pp.1830-1847, 2004.
DOI : 10.1109/TSP.2004.828896
URL : http://www-sigproc.eng.cam.ac.uk/research/reading%20group/material/yilmaz%20rickard%20-%202004%20-%20blind%20separation%20of%20speech%20mixtures%20via%20time-frequencymasking.pdf

M. I. Mandel, R. J. Weiss, and D. P. Ellis, Model-Based Expectation-Maximization Source Separation and Localization, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.2, pp.382-394, 2010.
DOI : 10.1109/TASL.2009.2029711
URL : http://www.ee.columbia.edu/%7Eronw/pubs/taslp09-messl.pdf

Y. Dorfan and S. Gannot, Tree-Based Recursive Expectation-Maximization Algorithm for Localization of Acoustic Sources, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.23, issue.10, pp.1692-1703, 2015.
DOI : 10.1109/TASLP.2015.2444654

Y. Huang and J. Benesty, Adaptive Multichannel Time Delay Estimation Based on Blind System Identification for Acoustic Source Localization, Adaptive Signal Processing, pp.227-247, 2003.
DOI : 10.1007/978-3-662-11028-7_8

S. Doclo and M. Moonen, Robust Adaptive Time Delay Estimation for Speaker Localization in Noisy and Reverberant Acoustic Environments, EURASIP Journal on Advances in Signal Processing, vol.2003, issue.11, pp.1110-1124, 2003.
DOI : 10.1155/S111086570330602X
URL : https://doi.org/10.1155/s111086570330602x

T. G. Dvorkind and S. Gannot, Time difference of arrival estimation of speech source in a noisy and reverberant environment, Signal Processing, vol.85, issue.1, pp.177-204, 2005.
DOI : 10.1016/j.sigpro.2004.09.014

K. Kowalczyk, E. A. Habets, W. Kellermann, and P. A. Naylor, Blind System Identification Using Sparse Learning for TDOA Estimation of Room Reflections, IEEE Signal Processing Letters, vol.20, issue.7, pp.653-656, 2013.
DOI : 10.1109/LSP.2013.2261059

X. Li, L. Girin, R. Horaud, and S. Gannot, Multiple-Speaker Localization Based on Direct-Path Features and Likelihood Maximization With Spatial Sparsity Regularization, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.10, pp.1997-2012, 2017.
DOI : 10.1109/TASLP.2017.2740001
URL : https://hal.archives-ouvertes.fr/hal-01413417

Y. Avargel and I. Cohen, System Identification in the Short-Time Fourier Transform Domain With Crossband Filtering, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.4, pp.1305-1319, 2007.
DOI : 10.1109/TASL.2006.889720

R. Talmon, I. Cohen, and S. Gannot, Relative Transfer Function Identification Using Convolutive Transfer Function Approximation, IEEE Transactions on Audio, Speech, and Language Processing, vol.17, issue.4, pp.546-555, 2009.
DOI : 10.1109/TASL.2008.2009576

D. Pavlidi, A. Griffin, M. Puigt, and A. Mouchtaris, Real-Time Multiple Sound Source Localization and Counting Using a Circular Microphone Array, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.10, pp.2193-2206, 2013.
DOI : 10.1109/TASL.2013.2272524
URL : https://hal.archives-ouvertes.fr/hal-01367320

O. Schwartz and S. Gannot, Speaker Tracking Using Recursive EM Algorithms, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.2, pp.392-402, 2014.
DOI : 10.1109/TASLP.2013.2292361

N. Roman and D. Wang, Binaural Tracking of Multiple Moving Sources, IEEE Transactions on Audio, Speech, and Language Processing, vol.16, issue.4, pp.728-739, 2008.
DOI : 10.1109/TASL.2008.918978

C. Evers, A. H. Moore, P. A. Naylor, J. Sheaffer, and B. Rafaely, Bearing-only acoustic tracking of moving speakers for robot audition, 2015 IEEE International Conference on Digital Signal Processing (DSP), pp.1206-1210, 2015.
DOI : 10.1109/ICDSP.2015.7252071
URL : http://www.commsp.ee.ic.ac.uk/%7Esap/uploads/publications/Evers2015.pdf

Y. Ban, L. Girin, X. Alameda-Pineda, and R. Horaud, Exploiting the Complementarity of Audio and Visual Data in Multi-speaker Tracking, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017.
DOI : 10.1109/ICCVW.2017.60
URL : https://hal.archives-ouvertes.fr/hal-01577965

Z. Liang, X. Ma, and X. Dai, Robust tracking of moving sound source using multiple model Kalman filter, Applied Acoustics, vol.69, issue.12, pp.1350-1355, 2008.
DOI : 10.1016/j.apacoust.2007.11.010

J. Vermaak and A. Blake, Nonlinear filtering for speaker tracking in noisy and reverberant environments, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.3021-3024, 2001.
DOI : 10.1109/ICASSP.2001.940294
URL : http://research.microsoft.com/users/jacov/papers/icassp2001.pdf

S. Ba, X. Alameda-Pineda, A. Xompero, and R. Horaud, An on-line variational Bayesian model for multi-person tracking from cluttered scenes, Computer Vision and Image Understanding, vol.153, pp.64-76, 2016.
DOI : 10.1016/j.cviu.2016.07.006
URL : https://hal.archives-ouvertes.fr/hal-01349763

I. Gebru, S. Ba, X. Li, and R. Horaud, Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.40, issue.5, 2017.
DOI : 10.1109/TPAMI.2017.2648793
URL : https://hal.archives-ouvertes.fr/hal-01413403

M. F. Fallon and S. J. Godsill, Acoustic Source Localization and Tracking of a Time-Varying Number of Speakers, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.4, pp.1409-1415, 2012.
DOI : 10.1109/TASL.2011.2178402

J. Valin, F. Michaud, and J. Rouat, Robust localization and tracking of simultaneous moving sound sources using beamforming and particle filtering, Robotics and Autonomous Systems, vol.55, issue.3, pp.216-228, 2007.
DOI : 10.1016/j.robot.2006.08.004
URL : http://arxiv.org/pdf/1602.08139

V. Cevher, R. Velmurugan, and J. H. McClellan, Acoustic Multitarget Tracking Using Direction-of-Arrival Batches, IEEE Transactions on Signal Processing, vol.55, issue.6, pp.2810-2825, 2007.
DOI : 10.1109/TSP.2007.893962
URL : http://www.ece.rice.edu/~vc3/acoustic_tracking_using_DOA_batches.pdf

B. Vo, S. Singh, and W. K. Ma, Tracking multiple speakers using random sets, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'04), p.357, 2004.

W. Ma, B. Vo, S. S. Singh, and A. Baddeley, Tracking an unknown time-varying number of speakers using TDOA measurements: A random finite set approach, IEEE Transactions on Signal Processing, vol.54, issue.9, pp.3291-3304, 2006.

B. Vo and W. Ma, The Gaussian Mixture Probability Hypothesis Density Filter, IEEE Transactions on Signal Processing, vol.54, issue.11, pp.4091-4104, 2006.
DOI : 10.1109/TSP.2006.881190
URL : http://www.ee.unimelb.edu.au/people/bv/vo/VM_GMPHD_SP06.pdf

X. Li, B. Mourgue, L. Girin, S. Gannot, and R. Horaud, Online localization of multiple moving speakers in reverberant environments, The Tenth IEEE Workshop on Sensor Array and Multichannel Signal Processing, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01795462

J. Kivinen and M. K. Warmuth, Exponentiated Gradient versus Gradient Descent for Linear Predictors, Information and Computation, vol.132, issue.1, pp.1-63, 1997.
DOI : 10.1006/inco.1996.2612
URL : https://doi.org/10.1006/inco.1996.2612

G. Xu, H. Liu, L. Tong, and T. Kailath, A least-squares approach to blind channel identification, IEEE Transactions on Signal Processing, vol.43, issue.12, pp.2982-2993, 1995.

X. Li, L. Girin, R. Horaud, and S. Gannot, Estimation of relative transfer function in the presence of stationary noise based on segmental power spectral density matrix subtraction, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.320-324, 2015.
DOI : 10.1109/ICASSP.2015.7177983
URL : https://hal.archives-ouvertes.fr/hal-01119186

A. L. Yuille and A. Rangarajan, The Concave-Convex Procedure, Neural Computation, vol.15, issue.4, pp.915-936, 2003.
DOI : 10.1162/08997660260028674

I. D. Gebru, X. Alameda-Pineda, F. Forbes, and R. Horaud, EM Algorithms for Weighted-Data Clustering with Application to Audio-Visual Scene Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, issue.12, pp.2402-2415, 2016.
DOI : 10.1109/TPAMI.2016.2522425
URL : https://hal.archives-ouvertes.fr/hal-01261374

H. W. Löllmann, C. Evers, A. Schmidt, H. Mellmann, H. Barfuss et al., The LOCATA challenge data corpus for acoustic source localization and tracking, IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), 2018.

X. Li, R. Horaud, L. Girin, and S. Gannot, Voice activity detection based on statistical likelihood ratio with adaptive thresholding, 2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), pp.1-5, 2016.
DOI : 10.1109/IWAENC.2016.7602911
URL : https://hal.archives-ouvertes.fr/hal-01349776

X. Li, L. Girin, F. Badeig, and R. Horaud, Reverberant sound localization with a robot head based on direct-path relative transfer function, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp.2819-2826, 2016.
DOI : 10.1109/IROS.2016.7759437
URL : https://hal.archives-ouvertes.fr/hal-01349771