P. Over, G. Awad, J. Fiscus, B. Antonishek, M. Michel et al., TRECVID 2010 - an overview of the goals, tasks, data, evaluation mechanisms, and metrics, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00953843

A. Smeaton, P. Over, and W. Kraaij, High-Level Feature Detection from Video in TRECVid: A 5-Year Retrospective of Achievements, Multimedia Content Analysis, pp.151-174, 2009.
DOI : 10.1007/978-0-387-76569-3_6

G. Bernard, O. Galibert, and J. Kahn, The First Official REPERE Evaluation, SLAM-INTERSPEECH, 2013.

J. Kahn, O. Galibert, L. Quintard, M. Carré, A. Giraudel et al., A presentation of the REPERE challenge, 2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI), 2012.
DOI : 10.1109/CBMI.2012.6269851

A. Giraudel, M. Carré, V. Mapelli, J. Kahn, O. Galibert et al., The REPERE Corpus : a Multimodal Corpus for Person Recognition, LREC, 2012.

F. Vallet, S. Essid, and J. Carrive, A Multimodal Approach to Speaker Diarization on TV Talk-Shows, IEEE Transactions on Multimedia, vol.15, issue.3, pp.509-520, 2013.
DOI : 10.1109/TMM.2012.2233724

L. Canseco-Rodriguez, L. Lamel, and J. Gauvain, Speaker diarization from speech transcripts, the 5th Annual Conference of the International Speech Communication Association, INTERSPEECH, 2004.

J. Mauclair, S. Meignier, and Y. Estève, Speaker diarization: about whom the speaker is talking?, IEEE Odyssey 2006 - The Speaker and Language Recognition Workshop, 2006.

J. Poignant, L. Besacier, G. Quénot, and F. Thollard, From Text Detection in Videos to Person Identification, 2012 IEEE International Conference on Multimedia and Expo, 2012.
DOI : 10.1109/ICME.2012.119

URL : https://hal.archives-ouvertes.fr/hal-00767383

J. Poignant, L. Besacier, V. Le, S. Rosset, and G. Quénot, Unsupervised naming of speakers in broadcast TV: using written names, pronounced names or both, INTERSPEECH, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953088

J. Poignant, L. Besacier, and G. Quénot, Nommage non-supervisé des personnes dans les émissions de télévision: une revue du potentiel de chaque modalité, CORIA, 2013.

J. Poignant, L. Besacier, and G. Quénot, Nommage non supervisé des personnes dans les émissions de télévision. Utilisation des noms écrits, des noms prononcés ou des deux ?, Documents numériques, 2014.
DOI : 10.3166/dn.17.1.37-60

V. Jousse, S. Petit-Renaud, S. Meignier, Y. Estève, and C. Jacquin, Automatic named identification of speakers using diarization and ASR systems, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4557-4560, 2009.
DOI : 10.1109/ICASSP.2009.4960644

URL : https://hal.archives-ouvertes.fr/hal-00412431

H. Bredin, A. Roy, V. Le, and C. Barras, Person instance graphs for mono-, cross- and multi-modal person recognition in multimedia data: application to speaker identification in TV broadcast, IJMIR, 2014.
DOI : 10.1109/79.888862

URL : https://hal.archives-ouvertes.fr/hal-01690350

L. Canseco, L. Lamel, and J. Gauvain, A comparative study using manual and automatic transcriptions for diarization, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005., 2005.
DOI : 10.1109/ASRU.2005.1566507

URL : https://www.lrde.epita.fr/~reda/cours/speech/speakerDiarization/1566507.pdf

S. E. Tranter, Who Really Spoke When? Finding Speaker Turns and Identities in Broadcast News Audio, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2006.
DOI : 10.1109/ICASSP.2006.1660195

URL : http://mi.eng.cam.ac.uk/reports/svr-ftp/tranter_icassp06.pdf

Y. Estève, S. Meignier, P. Deléglise, and J. Mauclair, Extracting true speaker identities from transcriptions, INTERSPEECH, 2007.

S. Petit-Renaud, V. Jousse, S. Meignier, and Y. Estève, Identification of speakers by name using belief functions, IPMU, 2010.

S. Satoh, Y. Nakamura, and T. Kanade, Name-It: naming and detecting faces in news videos, IEEE Multimedia, vol.6, issue.1, 1999.
DOI : 10.1109/93.752960

URL : http://www.ri.cmu.edu/pub_files/pub2/satoh_s_1999_1/satoh_s_1999_1.pdf

R. Houghton, Named Faces: putting names to faces, IEEE Intelligent Systems, vol.14, issue.5, 1999.
DOI : 10.1109/5254.796089

J. Yang and A. Hauptmann, Naming every individual in news video monologues, Proceedings of the 12th annual ACM international conference on Multimedia, MULTIMEDIA '04, 2004.
DOI : 10.1145/1027527.1027666

URL : http://www.cs.cmu.edu/~juny/Prof/papers/acmmm04a-jyang.pdf

J. Yang, R. Yan, and A. Hauptmann, Multiple instance learning for labeling faces in broadcasting news video, Proceedings of the 13th annual ACM international conference on Multimedia, MULTIMEDIA '05, 2005.
DOI : 10.1145/1101149.1101155

T. Sato, T. Kanade, T. Hughes, M. Smith, and S. Satoh, Video OCR: indexing digital news libraries by recognition of superimposed captions, ACM Multimedia Systems, 1999.
DOI : 10.1007/s005300050140

P. Pham, M. Moens, and T. Tuytelaars, Naming persons in news video with label propagation, 2010 IEEE International Conference on Multimedia and Expo, 2010.
DOI : 10.1109/ICME.2010.5583271

P. Pham, T. Tuytelaars, and M. Moens, Naming People in News Videos with Label Propagation, IEEE Multimedia, vol.18, issue.3, 2011.
DOI : 10.1109/MMUL.2011.22

E. Khoury, C. Sénac, and P. Joly, Audiovisual diarization of people in video content, Multimedia Tools and Applications, vol.13, issue.4, 2012.
DOI : 10.1007/978-3-540-68585-2_49

J. Poignant, H. Bredin, V. Le, L. Besacier, C. Barras et al., Unsupervised speaker identification using overlaid texts in TV broadcast, INTERSPEECH, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00767427

H. Bredin, J. Poignant, M. Tapaswi, G. Fortier, V. Le et al., Fusion of Speech, Faces and Text for Person Identification in TV Broadcast, ECCV-IFCVCR, 2012.
DOI : 10.1007/978-3-642-33885-4_39

URL : https://hal.archives-ouvertes.fr/hal-00722884

J. Poignant, H. Bredin, L. Besacier, G. Quénot, and C. Barras, Towards a better integration of written names for unsupervised speakers identification in videos, SLAM-INTERSPEECH, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953089

J. Poignant, L. Besacier, and G. Quénot, Unsupervised Speaker Identification in TV Broadcast Based on Written Names, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015.
DOI : 10.1109/TASLP.2014.2367822

URL : https://hal.archives-ouvertes.fr/hal-01060827

J. Poignant, G. Fortier, L. Besacier, and G. Quénot, Naming multimodal clusters to identify persons in TV broadcast, 2015.
DOI : 10.1007/s11042-015-2723-1

URL : https://hal.archives-ouvertes.fr/hal-01230628

J. Poignant, H. Bredin, and C. Barras, Limsi at mediaeval 2015: Person discovery in broadcast tv task, MediaEval, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01690333

H. Bredin and J. Poignant, Integer Linear Programming for Speaker Diarization and Cross-Modal Identification in TV Broadcast, INTERSPEECH, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953095

H. Bredin, A. Laurent, A. Sarkar, V. Le, S. Rosset et al., Person Instance Graphs for Named Speaker Identification in TV Broadcast, Odyssey, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01690272

B. Favre, G. Damnati, F. Béchet, M. Bendris, D. Charlet et al., PERCOLI: a person identification system for the 2013 REPERE challenge, SLAM-INTERSPEECH, 2013.

F. Bechet, M. Bendris, D. Charlet, G. Damnati, B. Favre et al., Multimodal Understanding for Person Recognition in Video Broadcasts, INTERSPEECH, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01194244

M. Rouvier, B. Favre, M. Bendris, D. Charlet, and G. Damnati, Scene understanding for identifying persons in TV shows: Beyond face authentication, 2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI), 2014.
DOI : 10.1109/CBMI.2014.6849829

URL : https://hal.archives-ouvertes.fr/hal-01194242

M. Bendris, B. Favre, D. Charlet, G. Damnati, R. Auguste et al., Unsupervised face identification in TV content using audio-visual sources, 2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI), 2013.
DOI : 10.1109/CBMI.2013.6576591

URL : https://hal.archives-ouvertes.fr/hal-00812334

P. Gay, G. Dupuy, C. Lailler, J. Odobez, S. Meignier et al., Comparison of two methods for unsupervised person identification in TV shows, 2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI), 2014.
DOI : 10.1109/CBMI.2014.6849828

URL : https://hal.archives-ouvertes.fr/hal-01433260

S. Chen and P. Gopalakrishnan, Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion, DARPA Broadcast News Transcription and Understanding Workshop, 1998.

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.177

URL : https://hal.archives-ouvertes.fr/inria-00548512

M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, Face Recognition from Caption-Based Supervision, International Journal of Computer Vision, vol.57, issue.2, 2012.
DOI : 10.1145/1027527.1027689

URL : https://hal.archives-ouvertes.fr/inria-00585834

M. Uřičář, V. Franc, and V. Hlaváč, Facial Landmarks Detector Learned by the Structured Output SVM, VISAPP, 2012.
DOI : 10.1007/978-3-642-38241-3_26

L. Lamel, S. Courcinous, J. Despres, J. Gauvain, Y. Josse et al., Speech Recognition for Machine Translation in Quaero, IWSLT, 2011.

M. Dinarelli and S. Rosset, Models Cascade for Tree-Structured Named Entity Detection, IJCNLP, 2011.

N. Le, D. Wu, S. Meignier, and J. Odobez, Eumssi team at the mediaeval person discovery challenge, MediaEval, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01433209

C. E. dos Santos Jr, G. Gravier, and W. Schwartz, SSIG and IRISA at multimodal person discovery, MediaEval, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01196171

M. Bendris, D. Charlet, G. Senay, M. Kim, B. Favre et al., Percolatte: A multimodal person discovery system in TV broadcast for the MediaEval 2015 evaluation campaign, MediaEval, 2015.

LIG at MediaEval 2015 multimodal person discovery in broadcast TV task, MediaEval, 2015.

P. Lopez-Otero, R. Barros, L. Docio-Fernandez, E. González-Agulla, J. Alba-Castro et al., GTM-UVigo systems for person discovery task at MediaEval 2015, MediaEval, 2015.

F. Nishi, N. Inoue, and K. Shinoda, Combining audio features and visual i-vector at mediaeval 2015 multimodal person discovery in broadcast tv, MediaEval, 2015.

M. India, D. Varas, V. Vilaplana, J. Morros, and J. Hernando, Upc system for the 2015 mediaeval multimodal person discovery in broadcast tv task, MediaEval, 2015.

M. Rouvier, G. Dupuy, P. Gay, E. Khoury, T. Merlin et al., An open-source state-of-the-art toolbox for broadcast news diarization, INTERSPEECH, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01433449

F. Bechet, M. Bendris, D. Charlet, G. Damnati, B. Favre et al., Multimodal understanding for person recognition in video broadcasts, INTERSPEECH, pp.607-611, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01194244

G. Damnati and D. Charlet, Robust speaker turn role labeling of TV Broadcast News shows, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5684-5687, 2011.
DOI : 10.1109/ICASSP.2011.5947650

L. Breiman, Random forests, Machine Learning, vol.45, issue.1, pp.5-32, 2001.
DOI : 10.1023/A:1010933404324

J. Poignant, M. Budnik, H. Bredin, C. Barras, M. Stefas et al., The CAMOMILE Collaborative Annotation Platform for Multi-modal, Multi-lingual and Multi-media Documents, LREC, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01350096

J. Poignant, H. Bredin, C. Barras, M. Stefas, P. Bruneau et al., Benchmarking multimedia technologies with the CAMOMILE platform: the case of Multimodal Person Discovery at MediaEval 2015, LREC, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01690277

E. Yilmaz and J. Aslam, Estimating average precision with incomplete and imperfect judgments, Proceedings of the 15th ACM international conference on Information and knowledge management, CIKM '06, 2006.
DOI : 10.1145/1183614.1183633

URL : http://goanna.cs.rmit.edu.au/~aht/tiger/p102-yilmaz.pdf