H. Bredin, C. Barras, and C. Guinaudeau, Multimodal Person Discovery in Broadcast TV at MediaEval 2016, Working notes of the MediaEval 2016 Workshop, 2016.

H. Bredin, A. Roy, V. Le, and C. Barras, Person instance graphs for mono-, cross- and multi-modal person recognition in multimedia data: application to speaker identification in TV broadcast, International Journal of Multimedia Information Retrieval, 2014.

L. Canseco, L. Lamel, and J. L. Gauvain, A comparative study using manual and automatic transcriptions for diarization, IEEE Workshop on Automatic Speech Recognition and Understanding, pp.415-419, 2005.
DOI : 10.1109/ASRU.2005.1566507
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.469.2523

D. Chen and J. Odobez, Video text recognition using sequential Monte Carlo and error voting methods, Pattern Recognition Letters, vol.26, issue.9, pp.1386-1403, 2005.
DOI : 10.1016/j.patrec.2004.11.019
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.157.4278

Y. Estève, S. Meignier, P. Deléglise, and J. Mauclair, Extracting true speaker identities from transcriptions, Interspeech, pp.2601-2604, 2007.

N. Le, Towards large scale multimedia indexing: A case study on person discovery in broadcast news, Proc. of International Workshop on Content-Based Multimedia Retrieval, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01551690

P. Gay, G. Dupuy, C. Lailler, J. M. Odobez, S. Meignier et al., Comparison of two methods for unsupervised person identification in TV shows, 2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI), pp.1-6, 2014.
DOI : 10.1109/CBMI.2014.6849828
URL : https://hal.archives-ouvertes.fr/hal-01433260

J. Kahn, O. Galibert, L. Quintard, M. Carré, A. Giraudel et al., A presentation of the REPERE challenge, 2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI), pp.1-6, 2012.
DOI : 10.1109/CBMI.2012.6269851

J. R. Landis and G. G. Koch, The measurement of observer agreement for categorical data, Biometrics, pp.159-174, 1977.

B. Perret, J. Cousty, J. C. Rivera Ura, and S. J. F. Guimarães, Evaluation of Morphological Hierarchies for Supervised Segmentation, Proceedings of the 12th International Symposium on Mathematical Morphology and Its Applications to Signal and Image Processing, pp.39-50, 2015.
DOI : 10.1007/978-3-319-18720-4_4
URL : https://hal.archives-ouvertes.fr/hal-01142072

J. Poignant, G. Fortier, L. Besacier, and G. Quénot, Naming multi-modal clusters to identify persons in TV broadcast, Multimedia Tools and Applications, issue.15, p.8999, 2016.
DOI : 10.1007/s11042-015-2723-1
URL : https://hal.archives-ouvertes.fr/hal-01230628

M. Rouvier, G. Dupuy, P. Gay, E. Khoury, T. Merlin et al., An open-source state of the art toolbox for broadcast news diarization, Interspeech, pp.25-29, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01433449

G. Sargent, G. Barbosa de Fonseca, I. Lyon Freire, R. Sicre, Z. K. G. do Patrocínio Jr. et al., PUCMinas and IRISA at Multimodal Person Discovery, Working Notes Proceedings of the MediaEval 2016 Workshop, 2016.

S. E. Tranter, Who Really Spoke When? Finding Speaker Turns and Identities in Broadcast News Audio, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2006.
DOI : 10.1109/ICASSP.2006.1660195
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.7632

S. M. van Dongen, Graph clustering by flow simulation, 2001.

J. Yang, R. Yan, and A. G. Hauptmann, Multiple instance learning for labeling faces in broadcasting news video, Proceedings of the 13th annual ACM international conference on Multimedia, MULTIMEDIA '05, pp.31-40, 2005.
DOI : 10.1145/1101149.1101155

X. Zhu, Z. Ghahramani, and J. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, ICML, vol.3, pp.912-919, 2003.