T. L. Berg, A. C. Berg, J. Edwards, and D. A. Forsyth, Who's in the picture, NIPS, 2004.

D. Ozkan and P. Duygulu, A Graph Based Approach for Naming Faces in News Photos, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), pp.1477-1482, 2006.
DOI : 10.1109/CVPR.2006.29

M. Guillaumin, T. Mensink, J. J. Verbeek, and C. Schmid, Automatic face naming with caption-based supervision, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587603
URL : https://hal.archives-ouvertes.fr/inria-00321048

P. T. Pham, M. Moens, and T. Tuytelaars, Cross-media alignment of names and faces, IEEE Transactions on Multimedia, vol.12, issue.1, 2010.

J. Luo, B. Caputo, and V. Ferrari, Who's doing what: Joint modeling of names and verbs for simultaneous face and pose annotation, NIPS, pp.1168-1176, 2009.

T. L. Berg, A. C. Berg, J. Edwards, M. Maire, R. White et al., Names and faces in the news, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), pp.848-854, 2004.
DOI : 10.1109/CVPR.2004.1315253

M. E. Sargin, H. Aradhye, P. J. Moreno, and M. Zhao, Audiovisual celebrity recognition in unconstrained web videos, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.1977-1980, 2009.
DOI : 10.1109/ICASSP.2009.4959999
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.308.4933

J. Kahn, O. Galibert, M. Carré, A. Giraudel, P. Joly et al., The REPERE challenge: Finding people in a multimodal context, Odyssey 2012 - The Speaker and Language Recognition Workshop, 2012.

V. Tran, V. B. Le, C. Barras, and L. Lamel, Comparing multistage approaches for cross-show speaker diarization, Proceedings of Interspeech, 2011.

Q. Yang, Q. Jin, and T. Schultz, Investigation of cross-show speaker diarization, Proceedings of Interspeech, 2011.

G. Dupuy, M. Rouvier, S. Meignier, and Y. Estève, I-vectors and ILP clustering adapted to cross-show speaker diarization, Proceedings of Interspeech, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01450711

T. Bazillon, Y. Estève, and D. Luzzati, Manual vs assisted transcription of prepared and spontaneous speech, Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco: European Language Resources Association (ELRA), 2008.

M. Rouvier and S. Meignier, A global optimization framework for speaker diarization, Odyssey Workshop
URL : https://hal.archives-ouvertes.fr/hal-01433467

P. Kenny, G. Boulianne, and P. Dumouchel, Eigenvoice modeling with sparse training data, IEEE Transactions on Speech and Audio Processing, pp.345-354, 2005.
DOI : 10.1109/TSA.2004.840940

D. Matrouf, N. Scheffer, B. Fauve, and J. Bonastre, A straightforward and efficient implementation of the factor analysis model for speaker verification, Proc. Interspeech, 2007.
URL : https://hal.archives-ouvertes.fr/hal-01318480

O. Galibert and J. Kahn, The first official REPERE evaluation, SLAM 2013, 2013.

R. Barzilay, M. Collins, J. Hirschberg, and S. Whittaker, The rules behind roles: Identifying speaker role in radio broadcasts, Proceedings of the National Conference on Artificial Intelligence, pp.679-684, 2000.

T. Bazillon, B. Maza, M. Rouvier, F. Bechet, and A. Nasr, Speaker role recognition using question detection and characterization, Interspeech, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01196022