A. Allauzen and H. Bonneau-Maynard, Training and evaluation of POS taggers on the French MULTITAG corpus, LREC, pp.3373-3377, 2008.

F. Béchet, B. Favre, and G. Damnati, Detecting person presence in TV shows with linguistic and structural features, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5077-5080, 2012.
DOI: 10.1109/ICASSP.2012.6289062
URL: https://hal.archives-ouvertes.fr/hal-01194256

M. Bendris, B. Favre, D. Charlet, G. Damnati, R. Auguste et al., Unsupervised face identification in TV content using audio-visual sources, 2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI), pp.243-249, 2013.
DOI: 10.1109/CBMI.2013.6576591
URL: https://hal.archives-ouvertes.fr/hal-00812334

G. Bernard, S. Rosset, O. Galibert, E. Bilinski, and G. Adda, The LIMSI Participation in the QAst 2009 Track: Experimenting on Answer Scoring, CLEF, pp.289-296, 2009.
DOI: 10.1007/978-3-642-15754-7_33

H. Bredin, J. Poignant, M. Tapaswi, G. Fortier, V. B. Le et al., Fusion of Speech, Faces and Text for Person Identification in TV Broadcast, ECCV-IFCVCR, pp.385-394, 2012.
DOI: 10.1007/978-3-642-33885-4_39
URL: https://hal.archives-ouvertes.fr/hal-00722884

H. Bredin and J. Poignant, Integer linear programming for speaker diarization and crossmodal identification in TV broadcast, Interspeech, pp.1467-1471, 2013.
URL: https://hal.archives-ouvertes.fr/hal-00953095

H. Bredin, J. Poignant, G. Fortier, M. Tapaswi, V. B. Le et al., QCompere at REPERE 2013, SLAM, pp.49-54, 2013.

L. Canseco-Rodriguez, L. Lamel, and J. Gauvain, Speaker diarization from speech transcripts, INTERSPEECH, pp.1272-1275, 2004.

L. Canseco, L. Lamel, and J. Gauvain, A comparative study using manual and automatic transcriptions for diarization, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.415-419, 2005.
DOI: 10.1109/ASRU.2005.1566507

M. Charhad, D. Moraru, S. Ayache, and G. Quénot, Speaker identity indexing in audio-visual documents, CBMI, 2005.
URL: https://hal.archives-ouvertes.fr/hal-00953917

T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez, Solving the multiple instance problem with axis-parallel rectangles, Artificial Intelligence, vol.89, issue.1-2, pp.31-71, 1997.
DOI: 10.1016/S0004-3702(96)00034-3

E. El-Khoury, A. Laurent, S. Meignier, and S. Petitrenaud, Combining transcription-based and acoustic-based speaker identifications for broadcast news, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4377-4380, 2012.
DOI: 10.1109/ICASSP.2012.6288889

Y. Estève, S. Meignier, P. Deléglise, and J. Mauclair, Extracting true speaker identities from transcriptions, INTERSPEECH, pp.2601-2604, 2007.

J. Gauvain, L. Lamel, and G. Adda, Partitioning and transcription of broadcast news data, ICSLP-AISSTC, pp.1335-1338, 1998.

J. Gauvain, L. Lamel, and G. Adda, The LIMSI Broadcast News transcription system, Speech Communication, pp.89-108, 2002.
DOI: 10.1016/S0167-6393(01)00061-9
URL: https://hal.archives-ouvertes.fr/hal-01434493

A. Giraudel, M. Carré, V. Mapelli, J. Kahn, O. Galibert et al., The REPERE corpus: a multimodal corpus for person recognition, LREC, pp.1102-1107, 2012.

R. Houghton, Named Faces: putting names to faces, IEEE Intelligent Systems, vol.14, issue.5, pp.45-50, 1999.
DOI: 10.1109/5254.796089

V. Jousse, S. Petitrenaud, S. Meignier, Y. Estève, and C. Jacquin, Automatic named identification of speakers using diarization and ASR systems, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4557-4560, 2009.
DOI: 10.1109/ICASSP.2009.4960644
URL: https://hal.archives-ouvertes.fr/hal-00412431

L. Lamel, S. Courcinous, J. Despres, J. Gauvain, Y. Josse et al., Speech recognition for machine translation in Quaero, IWSLT, pp.121-128, 2011.

T. Lavergne, O. Cappé, and F. Yvon, Practical very large scale CRFs, ACL, pp.504-513, 2010.

C. Liu, S. Jiang, and Q. Huang, Naming faces in broadcast news video by image google, Proceedings of the 16th ACM International Conference on Multimedia, MM '08, pp.717-720, 2008.
DOI: 10.1145/1459359.1459468

M. Dinarelli and S. Rosset, Models cascade for tree-structured named entity detection, IJCNLP, pp.1269-1278, 2011.

J. Mauclair, S. Meignier, and Y. Estève, Speaker Diarization: About whom the Speaker is Talking?, 2006 IEEE Odyssey: The Speaker and Language Recognition Workshop, 2006.
DOI: 10.1109/ODYSSEY.2006.248114
URL: https://hal.archives-ouvertes.fr/hal-01434121

S. Petitrenaud, V. Jousse, S. Meignier, and Y. Estève, Identification of Speakers by Name Using Belief Functions, IPMU, pp.179-188, 2010.

S. Petitrenaud, V. Jousse, S. Meignier, and Y. Estève, Reconnaissance automatique de locuteurs à l'aide de fonctions de croyance [Automatic speaker recognition using belief functions], RFIA, 2010.
URL: https://hal.archives-ouvertes.fr/hal-01433893

J. Poignant, L. Besacier, G. Quénot, and F. Thollard, From Text Detection in Videos to Person Identification, 2012 IEEE International Conference on Multimedia and Expo, pp.854-859, 2012.
DOI: 10.1109/ICME.2012.119
URL: https://hal.archives-ouvertes.fr/hal-00767383

J. Poignant, H. Bredin, V. B. Le, L. Besacier, C. Barras et al., Unsupervised speaker identification using overlaid texts in TV broadcast, 2012.
URL: https://hal.archives-ouvertes.fr/hal-00767427

J. Poignant, L. Besacier, V. B. Le, S. Rosset, and G. Quénot, Unsupervised naming of speakers in broadcast TV: using written names, pronounced names or both?, 2013.
URL: https://hal.archives-ouvertes.fr/hal-00953088

J. Poignant, L. Besacier, and G. Quénot, Nommage non-supervisé des personnes dans les émissions de télévision: une revue du potentiel de chaque modalité [Unsupervised naming of persons in TV broadcasts: a review of the potential of each modality], CORIA, pp.5-20, 2013.

J. Poignant, H. Bredin, L. Besacier, G. Quénot, and C. Barras, Towards a better integration of written names for unsupervised speakers identification in videos, SLAM, pp.84-89, 2013.
URL: https://hal.archives-ouvertes.fr/hal-00953089

S. Satoh and T. Kanade, Name-It: association of face and name in video, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1997.
DOI: 10.1109/CVPR.1997.609351

S. Satoh, Y. Nakamura, and T. Kanade, Name-It: naming and detecting faces in video by the integration of image and natural language processing, IJCAI, pp.1488-1493, 1997.

S. Satoh, Y. Nakamura, and T. Kanade, Name-It: naming and detecting faces in news videos, IEEE Multimedia, vol.6, issue.1, pp.22-35, 1999.
DOI: 10.1109/93.752960

R. E. Schapire and Y. Singer, Improved boosting algorithms using confidence-rated predictions, Proceedings of the Eleventh Annual Conference on Computational Learning Theory, COLT '98, 1998.
DOI: 10.1145/279943.279960

X. Song, C. Lin, and M. Sun, Cross-modality automatic face model training from large video databases, CVPRW, p.91, 2004.

S. E. Tranter, Who Really Spoke When? Finding Speaker Turns and Identities in Broadcast News Audio, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, pp.1013-1016, 2006.
DOI: 10.1109/ICASSP.2006.1660195

J. Yang and A. G. Hauptmann, Naming every individual in news video monologues, Proceedings of the 12th Annual ACM International Conference on Multimedia, MULTIMEDIA '04, pp.10-16, 2004.
DOI: 10.1145/1027527.1027666

J. Yang, R. Yan, and A. G. Hauptmann, Multiple instance learning for labeling faces in broadcasting news video, Proceedings of the 13th Annual ACM International Conference on Multimedia, MULTIMEDIA '05, pp.31-40, 2005.
DOI: 10.1145/1101149.1101155