Vision-guided robot hearing, The International Journal of Robotics Research, vol.34, issue.4-5, pp.437-456, 2015.
URL : https://hal.archives-ouvertes.fr/hal-00990766
Speaker Diarization: A Review of Recent Research, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.2, pp.356-370, 2012.
DOI : 10.1109/TASL.2011.2125954
Robust Online Multi-object Tracking Based on Tracklet Confidence and Online Discriminative Appearance Learning, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.1218-1225, 2014.
DOI : 10.1109/CVPR.2014.159
Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.23, issue.4, pp.718-731, 2015.
DOI : 10.1109/TASLP.2015.2405475
URL : https://hal.archives-ouvertes.fr/hal-01112834
Audiovisual Probabilistic Tracking of Multiple Speakers in Meetings, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.2, pp.601-616, 2007.
DOI : 10.1109/TASL.2006.881678
EM Algorithms for Weighted-Data Clustering with Application to Audio-Visual Scene Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, issue.12, 2016.
DOI : 10.1109/TPAMI.2016.2522425
URL : https://hal.archives-ouvertes.fr/hal-01261374
Audiovisual speech-turn detection and tracking, The Twelfth International Conference on Latent Variable Analysis and Signal Separation, 2015.
DOI : 10.1007/978-3-319-22482-4_17
URL : https://hal.archives-ouvertes.fr/hal-01163659
Conjugate Mixture Models for Clustering Multimodal Data, Neural Computation, vol.23, issue.2, pp.517-557, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00590267
Cross-Modal Localization via Sparsity, IEEE Transactions on Signal Processing, vol.55, issue.4, pp.1390-1404, 2007.
DOI : 10.1109/TSP.2006.888095
Audio Assisted Robust Visual Tracking With Adaptive Particle Filtering, IEEE Transactions on Multimedia, vol.17, issue.2, pp.186-200, 2015.
DOI : 10.1109/TMM.2014.2377515
Multivariate t Distributions and their Applications, 2004.
Multimodal on-line speaker diarization using sensor fusion through SVM, IEEE Transactions on Multimedia, 2015.
A Multimodal Approach to Blind Source Separation of Moving Sources, IEEE Journal of Selected Topics in Signal Processing, vol.4, issue.5, pp.895-910, 2010.
DOI : 10.1109/JSTSP.2010.2057198
Multimodal Speaker Diarization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.1, pp.79-93, 2012.
DOI : 10.1109/TPAMI.2011.47
Recent advances in the automatic recognition of audiovisual speech, Proceedings of the IEEE, pp.1306-1326, 2003.
Audiovisual Synchronization and Fusion Using Canonical Correlation Analysis, IEEE Transactions on Multimedia, vol.9, issue.7, pp.1396-1403, 2007.
DOI : 10.1109/TMM.2007.906583
A statistical model-based voice activity detection, IEEE Signal Processing Letters, vol.6, issue.1, pp.1-3, 1999.
DOI : 10.1109/97.736233
Robust mixture clustering using Pearson type VII distribution, Pattern Recognition Letters, vol.31, issue.16, pp.2447-2454, 2010.
DOI : 10.1016/j.patrec.2010.07.015
Beamforming: a versatile approach to spatial filtering, IEEE ASSP Magazine, vol.5, issue.2, pp.4-24, 1988.
DOI : 10.1109/53.665