S. Tranter and D. Reynolds, An overview of automatic speaker diarization systems, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.5, pp.1557-1565, 2006.
DOI : 10.1109/TASL.2006.878256

N. Mirghafori and C. Wooters, Nuts and Flakes: a Study of Data Characteristics in Speaker Diarization, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2006.
DOI : 10.1109/ICASSP.2006.1660196

X. Anguera, Robust Speaker Diarization for Meetings, 2006.

M. Kotti, E. Benetos, and C. Kotropoulos, Computationally Efficient and Robust BIC-Based Speaker Segmentation, IEEE Transactions on Audio, Speech, and Language Processing, vol.16, issue.5, 2008.
DOI : 10.1109/TASL.2008.925152
URL : http://spiral.imperial.ac.uk/bitstream/10044/1/11710/2/IEEE_TRANS_ASLP_2008_Margarita_Kotti.pdf

X. Zhu, C. Barras, L. Lamel, and J. Gauvain, Multi-stage Speaker Diarization for Conference and Lecture Meetings, in Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR 2007 and RT Revised Selected Papers, pp.533-542, 2007.

S. Jothilakshmi, V. Ramalingam, and S. Palanivel, Speaker diarization using autoassociative neural networks, Engineering Applications of Artificial Intelligence, vol.22, issue.4-5, 2009.
DOI : 10.1016/j.engappai.2009.01.012

X. Anguera, C. Wooters, and J. Hernando, Robust Speaker Diarization for Meetings: ICSI RT06S Meetings Evaluation System, Proc. ICSLP, 2006.
DOI : 10.1007/11965152_31

C. Wooters and M. Huijbregts, The ICSI RT07s Speaker Diarization System, in Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR Revised Selected Papers, pp.509-519, 2007.

J. Rougui, M. Rziza, D. Aboutajdine, M. Gelgon, and J. Martinez, Fast Incremental Clustering of Gaussian Mixture Speaker Models for Scaling up Retrieval In On-Line Broadcast, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2006.
DOI : 10.1109/ICASSP.2006.1661327
URL : https://hal.archives-ouvertes.fr/hal-00448172

W. Tsai, S. Cheng, and H. Wang, Speaker clustering of speech utterances using a voice characteristic reference space, Proc. ICSLP, 2004.

T. H. Nguyen, E. S. Chng, and H. Li, T-test distance and clustering criterion for speaker diarization, Proc. Interspeech, 2008.

T. Nguyen, The IIR-NTU Speaker Diarization Systems for RT, RT'09, NIST Rich Transcription Workshop, 2009.

S. Meignier, J. Bonastre, and S. Igounet, E-HMM approach for learning and adapting sound models for speaker indexing, Proc. Odyssey Speaker and Language Recognition Workshop, pp.175-180, 2001.
URL : https://hal.archives-ouvertes.fr/hal-01434656

C. Fredouille and N. Evans, The LIA RT'07 speaker diarization system, in Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR Revised Selected Papers, pp.520-532, 2007.

C. Fredouille, S. Bozonnet, and N. W. Evans, The LIA-EURECOM RT'09 Speaker Diarization System, RT'09, NIST Rich Transcription Workshop, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00601383

S. Bozonnet, N. W. Evans, and C. Fredouille, The LIA-EURECOM RT'09 speaker diarization system: Enhancements in speaker modelling and cluster purification, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010.
DOI : 10.1109/ICASSP.2010.5495088
URL : https://hal.archives-ouvertes.fr/hal-00601383

D. Vijayasenan, F. Valente, and H. Bourlard, Agglomerative information bottleneck for speaker diarization of meetings data, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), pp.250-255, 2007.
DOI : 10.1109/ASRU.2007.4430119

S. MacEachern, Estimating normal means with a conjugate style Dirichlet process prior, Communications in Statistics: Simulation and Computation, pp.727-741, 1994.
DOI : 10.1017/CBO9780511526237

G. E. Hinton and D. van Camp, Keeping the neural networks simple by minimizing the description length of the weights, Proceedings of the sixth annual conference on Computational learning theory, COLT '93, pp.5-13, 1993.
DOI : 10.1145/168304.168306

M. J. Wainwright and M. I. Jordan, Variational inference in graphical models: The view from the marginal polytope, Forty-first Annual Allerton Conference on Communication, Control, and Computing, 2003.

F. Valente, Variational Bayesian Methods for Audio Indexing, 2005.
DOI : 10.1007/11677482_27

D. Reynolds, P. Kenny, and F. Castaldo, A study of new approaches to speaker diarization, Proc. Interspeech. ISCA, 2009.

P. Kenny, Bayesian analysis of speaker diarization with eigenvoice priors, CRIM, 2008.

X. Anguera and J. Bonastre, A novel speaker binary key derived from anchor models, Proc. Interspeech, 2010.

Y. Huang, O. Vinyals, G. Friedland, C. Muller, N. Mirghafori et al., A fast-match approach for robust, faster than real-time speaker diarization, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), pp.693-698, 2007.
DOI : 10.1109/ASRU.2007.4430196

G. Friedland, J. Ching, and A. Janin, Parallelizing Speaker-Attributed Speech Recognition for Meeting Browsing, 2010 IEEE International Symposium on Multimedia, 2010.
DOI : 10.1109/ISM.2010.26

X. Anguera, C. Wooters, and J. Hernando, Friends and enemies: A novel initialization for speaker diarization, Proc. ICSLP, 2006.

J. Ajmera, A robust speaker clustering algorithm, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721), pp.411-416, 2003.
DOI : 10.1109/ASRU.2003.1318476
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.169.6147

X. Anguera, C. Wooters, and J. Hernando, Purity Algorithms for Speaker Diarization of Meetings Data, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2006.
DOI : 10.1109/ICASSP.2006.1660198

S. S. Chen and P. S. Gopalakrishnan, Speaker, environment and channel change detection and clustering via the Bayesian information criterion, Proc. of DARPA Broadcast News Transcription and Understanding Workshop, pp.127-132, 1998.

H. Gish and M. Schmidt, Text-independent speaker identification, IEEE Signal Processing Magazine, pp.18-32, 1994.
DOI : 10.1109/79.317924

A. Janin, J. Ang, S. Bhagat, R. Dhillon, J. Edwards et al., The ICSI meeting project: Resources and research, Proc. ICASSP Meeting Recognition Workshop, 2004.

I. McCowan, J. Carletta, W. Kraaij, S. Ashby, S. Bourban et al., The AMI meeting corpus, Proc. Measuring Behavior, 2005.

D. Mostefa, N. Moreau, K. Choukri, G. Potamianos, S. M. Chu et al., The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms, Language Resources and Evaluation, vol.41, issue.3-4, 2007.
DOI : 10.1007/s10579-007-9054-4

C. Fredouille, D. Moraru, S. Meignier, L. Besacier, and J. Bonastre, The NIST 2004 spring Rich Transcription evaluation: Two-axis merging strategy in the context of multiple distant microphone based meeting speaker segmentation, NIST 2004 Spring Rich Transcription Evaluation Workshop, 2004.

Q. Jin, K. Laskowski, T. Schultz, and A. Waibel, Speaker segmentation and clustering in meetings, Proc. ICSLP, 2004.

D. Istrate, C. Fredouille, S. Meignier, L. Besacier, and J. Bonastre, NIST RT05S evaluation: Pre-processing techniques and speaker diarization on multiple microphone meetings, NIST 2005 Spring Rich Transcription Evaluation Workshop, 2005.
DOI : 10.1007/11677482_36
URL : https://hal.archives-ouvertes.fr/hal-01434285

X. Anguera, C. Wooters, B. Peskin, and M. Aguilo, Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System, Proc. NIST MLMI Meeting Recognition Workshop, 2005.
DOI : 10.1007/11677482_34

X. Anguera, C. Wooters, and J. Hernando, Acoustic Beamforming for Speaker Diarization of Meetings, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.7, pp.2011-2023, 2007.
DOI : 10.1109/TASL.2007.902460

X. Anguera, BeamformIt (the fast and robust acoustic beamformer)

N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series, 1949.

A. Adami, L. Burget, S. Dupont, H. Garudadri, F. Grezl et al., Qualcomm-ICSI-OGI features for ASR, Proc. ICSLP, pp.4-7, 2002.

M. L. Seltzer, B. Raj, and R. M. Stern, Likelihood-Maximizing Beamforming for Robust Hands-Free Speech Recognition, IEEE Transactions on Speech and Audio Processing, vol.12, issue.5, pp.489-498, 2004.
DOI : 10.1109/TSA.2004.832988
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.615.5654

L. J. Griffiths and C. W. Jim, An alternative approach to linearly constrained adaptive beamforming, IEEE Transactions on Antennas and Propagation, vol.30, issue.1, pp.27-34, 1982.
DOI : 10.1109/TAP.1982.1142739

M. Woelfel and J. McDonough, Distant Speech Recognition, 2009.

C. Wooters, J. Fung, B. Peskin, and X. Anguera, Towards robust speaker segmentation: The ICSI-SRI fall 2004 diarization system, Rich Transcription Workshop, 2004.

J. Ramirez, J. M. Górriz, and J. C. Segura, Voice Activity Detection. Fundamentals and Speech Recognition System Robustness, Robust Speech Recognition and Understanding, p.460, 2007.
DOI : 10.5772/4740

C. Fredouille and G. Senay, Technical Improvements of the E-HMM Based Speaker Diarization System for Meeting Records, Proc. MLMI Third International Workshop, pp.359-370, 2006.
DOI : 10.1007/11965152_32
URL : https://hal.archives-ouvertes.fr/hal-01317165

D. A. van Leeuwen and M. Konečný, Progress in the AMIDA Speaker Diarization System for Meeting Data, in Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR 2007 and RT, pp.475-483, 2007.

A. Rentzeperis, A. Stergiou, C. Boukis, A. Pnevmatikakis, and L. Polymenakos, The 2006 Athens Information Technology Speech Activity Detection and Speaker Diarization Systems, Machine Learning for Multimodal Interaction: Third International Workshop, pp.385-395, 2006.
DOI : 10.1007/11965152_34

X. Anguera, C. Wooters, M. Aguilo, and C. Nadeu, Hybrid Speech/non-speech detector applied to Speaker Diarization of Meetings, 2006 IEEE Odyssey, The Speaker and Language Recognition Workshop, 2006.
DOI : 10.1109/ODYSSEY.2006.248109

H. Sun, T. L. Nwe, B. Ma, and H. Li, Speaker diarization for meeting room audio, Proc. Interspeech'09, 2009.

T. L. Nwe, H. Sun, H. Li, and S. Rahardja, Speaker diarization in meeting audio, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009.
DOI : 10.1109/ICASSP.2009.4960523

E. El-Khoury, C. Senac, and J. Pinquier, Improved speaker diarization system for meetings, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009.
DOI : 10.1109/ICASSP.2009.4960529
URL : https://hal.archives-ouvertes.fr/hal-01433912

L. Lu, H. Zhang, and H. Jiang, Content analysis for audio classification and segmentation, IEEE Transactions on Speech and Audio Processing, vol.10, issue.7, pp.504-516, 2002.
DOI : 10.1109/TSA.2002.804546

R. Li, Q. Jin, and T. Schultz, Improving speaker segmentation via speaker identification and text segmentation, Proc. Interspeech, pp.3073-3076, 2009.

M. Ben, M. Betser, F. Bimbot, and G. Gravier, Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs, Proc. ICSLP, 2004.

D. van Leeuwen and M. Huijbregts, The AMI Speaker Diarization System for NIST RT06s Meeting Data, Machine Learning for Multimodal Interaction
DOI : 10.1007/11965152_33

A. Vandecatseye, J. Martens, J. Neto, H. Meinedo, C. Garcia-Mateo et al., The COST278 pan-European broadcast news database, Proc. LREC European Language Resources Association (ELRA), pp.873-876, 2004.

K. Mori and S. Nakagawa, Speaker change detection and speaker clustering using VQ distortion for broadcast news speech recognition, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), pp.413-416, 2001.
DOI : 10.1109/ICASSP.2001.940855

J. Ajmera and I. McCowan, Robust Speaker Change Detection, IEEE Signal Processing Letters, vol.11, issue.8, pp.649-651, 2004.
DOI : 10.1109/LSP.2004.831666
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.139.971

L. Lu and H. Zhang, Real-time unsupervised speaker change detection, Object recognition supported by user interaction for service robots, pp.358-361, 2002.
DOI : 10.1109/ICPR.2002.1048313

X. Anguera and J. Hernando, Evolutive speaker segmentation using a repository system, Proc. Interspeech, 2004.

X. Anguera, C. Wooters, and J. Hernando, Speaker diarization for multi-party meetings using acoustic fusion, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005., pp.426-431, 2005.
DOI : 10.1109/ASRU.2005.1566478
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.452.6901

A. Malegaonkar, A. Ariyaeeinia, P. Sivakumaran, and J. Fortuna, Unsupervised speaker change detection using probabilistic pattern matching, IEEE Signal Processing Letters, vol.13, issue.8, pp.509-512, 2006.
DOI : 10.1109/LSP.2006.873656
URL : http://uhra.herts.ac.uk/bitstream/2299/110/1/103570.pdf

M. Siu, G. Yu, and H. Gish, Segregation of speakers for speech recognition and speaker identification, Proc. ICASSP 91, 1991.

P. Delacourt and C. Wellekens, DISTBIC: A speaker-based segmentation for audio data indexing, Speech Communication, vol.32, issue.1-2, pp.111-126, 2000.
DOI : 10.1016/S0167-6393(00)00027-3

K. J. Han and S. S. Narayanan, Agglomerative hierarchical speaker clustering using incremental Gaussian mixture cluster modeling, Proc. Interspeech'08, pp.20-23, 2008.

R. Gangadharaiah, B. Narayanaswamy, and N. Balakrishnan, A novel method for two speaker segmentation, Proc. ICSLP, 2004.

D. Liu and F. Kubala, Fast speaker change detection for broadcast news transcription and indexing, Proc. EuroSpeech-99, pp.1031-1034, 1999.

M. A. Siegler, U. Jain, B. Raj, and R. M. Stern, Automatic segmentation, classification and clustering of broadcast news audio, Proc. DARPA Speech Recognition Workshop, pp.97-99, 1997.

P. Zochová and V. Radová, Modified DISTBIC algorithm for speaker change detection, Proc. 9th Eur. Conf, pp.3073-3076, 2005.

X. Zhu, C. Barras, L. Lamel, and J. Gauvain, Speaker Diarization: From Broadcast News to Lectures, Proc. MLMI, pp.396-406, 2006.
DOI : 10.1007/11965152_35

K. Han and S. Narayanan, Novel inter-cluster distance measure combining GLR and ICR for improved agglomerative hierarchical speaker clustering, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4373-4376, 2008.
DOI : 10.1109/ICASSP.2008.4518624

D. Moraru, M. Ben, and G. Gravier, Experiments on speaker tracking and segmentation in radio broadcast news, Proc. ICSLP, 2005.

C. Barras, X. Zhu, S. Meignier, and J. Gauvain, Improving speaker diarisation, Proc. DARPA RT04, 2004.

H. Aronowitz, Trainable speaker diarization, Proc. Interspeech, pp.1861-1865, 2007.

H. Hung and G. Friedland, Towards Audio-Visual On-line Diarization Of Participants In Group Meetings, in Workshop on Multi-camera and Multi-modal Sensor Fusion Algorithms and Applications - M2SFA2, 2008.

G. Friedland and O. Vinyals, Live speaker identification in conversations, Proceedings of the 16th ACM international conference on Multimedia, MM '08, pp.1017-1018, 2008.
DOI : 10.1145/1459359.1459558

G. Friedland, O. Vinyals, Y. Huang, and C. Muller, Prosodic and other Long-Term Features for Speaker Diarization, IEEE Transactions on Audio, Speech, and Language Processing, vol.17, issue.5, pp.985-993, 2009.
DOI : 10.1109/TASL.2009.2015089

J. Luque, X. Anguera, A. Temko, and J. Hernando, Speaker diarization for conference room: The UPC RT07s evaluation system, in Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR 2007 and RT Revised Selected Papers, pp.543-553, 2007.

J. Pardo, X. Anguera, and C. Wooters, Speaker Diarization for Multiple Distant Microphone Meetings: Mixing Acoustic Features And Inter-Channel Time Differences, Proceedings of Interspeech, 2006.

G. Lathoud and I. McCowan, Location based speaker segmentation, Proc. ICASSP, pp.176-179, 2003.
DOI : 10.1109/icme.2003.1221388
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.8724

D. Ellis and J. C. Liu, Speaker turn detection based on between-channels differences, Proc. ICASSP, 2004.

J. Ajmera, G. Lathoud, and I. McCowan, Clustering and segmenting speakers and their locations in meetings, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.605-613, 2004.
DOI : 10.1109/ICASSP.2004.1326058

J. M. Pardo, X. Anguera, and C. Wooters, Speaker Diarization for Multiple Distant Microphone Meetings: Mixing Acoustic Features And Inter-Channel Time Differences, Proc. Interspeech, 2006.
DOI : 10.1007/11965152_23
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.80.9616

J. Pardo, X. Anguera, and C. Wooters, Speaker Diarization For Multiple-Distant-Microphone Meetings Using Several Sources of Information, IEEE Transactions on Computers, vol.56, issue.9, pp.1212-1224, 2007.
DOI : 10.1109/TC.2007.1077

N. W. Evans, C. Fredouille, and J. Bonastre, Speaker diarization using unsupervised discriminant analysis of inter-channel delay features, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4061-4064, 2009.
DOI : 10.1109/ICASSP.2009.4960520
URL : https://hal.archives-ouvertes.fr/hal-01318388

M. Wölfel, Q. Yang, Q. Jin, and T. Schultz, Speaker Identification using Warped MVDR Cepstral Features, Proc. of Interspeech, 2009.

E. Shriberg, Higher-Level Features in Speaker Recognition, in Speaker Classification I, Lecture Notes in Artificial Intelligence, vol.4343, 2007.

D. Imseng and G. Friedland, Tuning-Robust Initialization Methods for Speaker Diarization, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.8, 2010.
DOI : 10.1109/TASL.2010.2040796
URL : http://infoscience.epfl.ch/record/153578

E. Shriberg, A. Stolcke, and D. Baron, Observations on overlap: Findings and implications for automatic processing of multi-party conversations, Proc. Eurospeech, pp.1359-1362, 2001.

Ö. Çetin and E. Shriberg, Speaker overlaps and ASR errors in meetings: Effects before, during, and after the overlap, Proc. ICASSP, pp.357-360, 2006.

K. Boakye, B. Trueba-Hornero, O. Vinyals, and G. Friedland, Overlapped speech detection for improved speaker diarization in multiparty meetings, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4353-4356, 2008.
DOI : 10.1109/ICASSP.2008.4518619

B. Trueba-Hornero, Handling overlapped speech in speaker diarization, 2008.

K. Boakye, Audio Segmentation for Meetings Speech Processing, 2008.

S. Otterson and M. Ostendorf, Efficient use of overlap information in speaker diarization, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), pp.686-692, 2007.
DOI : 10.1109/ASRU.2007.4430194

B. E. Kingsbury, N. Morgan, and S. Greenberg, Robust speech recognition using the modulation spectrogram, Speech Communication, vol.25, issue.1-3, pp.117-132, 1998.
DOI : 10.1016/S0167-6393(98)00032-6

H. J. Nock, G. Iyengar, and C. Neti, Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study, Lecture Notes in Computer Science, vol.2728, pp.565-570, 2003.
DOI : 10.1007/3-540-45113-7_48
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.68.8423

C. Zhang, P. Yin, Y. Rui, R. Cutler, and P. Viola, Boosting-Based Multimodal Speaker Detection for Distributed Meetings, 2006 IEEE Workshop on Multimedia Signal Processing, 2006.
DOI : 10.1109/MMSP.2006.285274
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.136.8526

A. Noulas and B. J. Krose, On-line multi-modal speaker diarization, Proceedings of the ninth international conference on Multimodal interfaces, ICMI '07, pp.350-357, 2007.
DOI : 10.1145/1322192.1322254

Z. Ghahramani and M. I. Jordan, Factorial hidden Markov models, Machine Learning, vol.29, issue.2/3, pp.245-273, 1997.
DOI : 10.1023/A:1007425814087

A. K. Noulas, G. Englebienne, and B. J. Krose, Multimodal speaker diarization, Computer Vision and Image Understanding, 2009.
DOI : 10.1109/tpami.2011.47

S. Tamura, K. Iwano, and S. Furui, Multi-Modal Speech Recognition Using Optical-Flow Analysis for Lip Images, Real World Speech Processing, 2004.

T. Chen and R. Rao, Cross-modal Prediction in Audio-visual Communication, Proc. ICASSP, pp.2056-2059, 1996.

J. W. Fisher, T. Darrell, W. T. Freeman, and P. A. Viola, Learning joint statistical models for audio-visual fusion and segregation, Proc. NIPS, pp.772-778, 2000.

J. W. Fisher and T. Darrell, Speaker Association With Signal-Level Audiovisual Fusion, IEEE Transactions on Multimedia, vol.6, issue.3, pp.406-413, 2004.
DOI : 10.1109/TMM.2004.827503
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.131.3704

R. Rao and T. Chen, Exploiting audio-visual correlation in coding of talking head sequences, International Picture Coding Symposium, 1996.

M. Siracusa and J. Fisher, Dynamic Dependency Tests for Audio-Visual Speaker Association, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07, 2007.
DOI : 10.1109/ICASSP.2007.366271
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.6887

E. K. Patterson, S. Gurbuz, Z. Tufekci, and J. N. Gowdy, CUAVE: A new audio-visual database for multimodal human-computer interface research, Proc. ICASSP, pp.2017-2020, 2002.
DOI : 10.1109/icassp.2002.5745028
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.91.6375

H. Vajaria, T. Islam, S. Sarkar, R. Sankar, and R. Kasturi, Audio Segmentation and Speaker Localization in Meeting Videos, 18th International Conference on Pattern Recognition (ICPR'06), pp.1150-1153, 2006.
DOI : 10.1109/ICPR.2006.283
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.653.970

H. Hung, Y. Huang, C. Yeo, and D. Gatica-Perez, Associating audio-visual activity cues in a dominance estimation framework, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008.
DOI : 10.1109/CVPRW.2008.4563178

N. Campbell and N. Suzuki, Working with Very Sparse Data to Detect Speaker and Listener Participation in a Meetings Corpus, Workshop Programme, 2006.

G. Friedland, H. Hung, and C. Yeo, Multimodal speaker diarization of real-world meetings using compressed-domain video features, Proc. ICASSP, pp.4069-4072, 2009.

G. Friedland, C. Yeo, and H. Hung, Visual speaker localization aided by acoustic models, Proceedings of the seventeenth ACM international conference on Multimedia, MM '09, pp.195-202, 2009.
DOI : 10.1145/1631272.1631301

S. Meignier, D. Moraru, C. Fredouille, J. Bonastre, and L. Besacier, Step-by-step and integrated approaches in broadcast news speaker diarization, CSL, selected papers from the Speaker and Language Recognition Workshop, pp.303-330, 2006.
DOI : 10.1016/j.csl.2005.08.002
URL : https://hal.archives-ouvertes.fr/hal-01318554

D. Vijayasenan, F. Valente, and H. Bourlard, Combination of agglomerative and sequential clustering for speaker diarization, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4361-4364, 2008.
DOI : 10.1109/ICASSP.2008.4518621

E. El-Khoury, C. Senac, and S. Meignier, Speaker diarization: combination of the LIUM and IRIT systems, 2008.

V. Gupta, P. Kenny, P. Ouellet, G. Boulianne, and P. Dumouchel, Combining Gaussianized/Non-Gaussianized Features to Improve Speaker Diarization of Telephone Conversations, Signal Processing Letters, pp.1040-1043, 2007.
DOI : 10.1109/LSP.2007.905088

T. S. Ferguson, A Bayesian Analysis of Some Nonparametric Problems, The Annals of Statistics, vol.1, issue.2, pp.209-230, 1973.
DOI : 10.1214/aos/1176342360

F. Valente, Infinite models for speaker clustering, International Conference on Spoken Language Processing, pp.6-19, 2006.

Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei, Hierarchical Dirichlet Processes, Journal of the American Statistical Association, vol.101, issue.476, pp.1566-1581, 2006.
DOI : 10.1198/016214506000000302
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.122.8637

E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky, An HDP-HMM for systems with state persistence, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390196

M. Huijbregts and C. Wooters, The blame game: performance analysis of speaker diarization system components, Proc. Interspeech, pp.1857-60, 2007.