S. E. Tranter and D. A. Reynolds, An overview of automatic speaker diarization systems, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.5, pp.1557-1565, 2006.
DOI : 10.1109/TASL.2006.878256

X. Anguera, S. Bozonnet, N. Evans, C. Fredouille, G. Friedland et al., Speaker Diarization: A Review of Recent Research, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.2, pp.356-370, 2012.
DOI : 10.1109/TASL.2011.2125954

URL : https://hal.archives-ouvertes.fr/hal-00733397

W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer et al., Achieving Human Parity in Conversational Speech Recognition, Tech. Rep., 2017. Available: https
DOI : 10.1109/taslp.2017.2756440

A. Graves, Neural Networks, in Supervised Sequence Labelling with Recurrent Neural Networks, pp.15-35, 2012.
DOI : 10.1007/978-3-642-24797-2

URL : http://mediatum.ub.tum.de/doc/1289309/document.pdf

M. Sundermeyer, R. Schlüter, and H. Ney, LSTM Neural Networks for Language Modeling, Interspeech 2012, 13th Annual Conference of the International Speech Communication Association, pp.194-197, 2012.
DOI : 10.1109/taslp.2015.2400218

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, in Advances in Neural Information Processing Systems, pp.3104-3112, 2014.

S. H. Yella, A. Stolcke, and M. Slaney, Artificial neural network features for speaker diarization, 2014 IEEE Spoken Language Technology Workshop (SLT), pp.402-406, 2014.
DOI : 10.1109/SLT.2014.7078608

URL : http://www.slaney.org/malcolm/Microsoft/Yella2014%28ANNSpeakerDiarization%29.pdf

M. A. Siegler, U. Jain, B. Raj, and R. M. Stern, Automatic segmentation, classification and clustering of broadcast news audio, Proc. DARPA Speech Recognition Workshop, 1997.

S. Chen and P. Gopalakrishnan, Speaker, environment and channel change detection and clustering via the Bayesian information criterion, Proc. DARPA Broadcast News Transcription and Understanding Workshop, pp.127-132, 1998.

B. Desplanques, K. Demuynck, and J. Martens, Factor analysis for speaker segmentation and improved speaker diarization, Interspeech 2015, 16th Annual Conference of the International Speech Communication Association, pp.3081-3085, 2015.

H. Bredin, TristouNet: Triplet loss for speaker turn embedding, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
DOI : 10.1109/ICASSP.2017.7953194

URL : http://arxiv.org/pdf/1609.04301

G. Gelly and J. Gauvain, Minimum word error training of RNN-based voice activity detection, Interspeech 2015, 16th Annual Conference of the International Speech Communication Association, pp.2650-2654, 2015.

A. Graves, N. Jaitly, and A. Mohamed, Hybrid speech recognition with Deep Bidirectional LSTM, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pp.273-278, 2013.
DOI : 10.1109/ASRU.2013.6707742

J. G. Fiscus, N. Radde, J. S. Garofolo, A. Le, J. Ajot et al., The Rich Transcription 2005 Spring Meeting Recognition Evaluation, International Workshop on Machine Learning for Multimodal Interaction (MLMI), pp.369-389, 2005.
DOI : 10.1007/11677482_32

URL : http://www.itl.nist.gov/iad/mig/publications/storage_paper/RT06SResults-v07.pdf

O. Galibert, J. Leixa, G. Adda, K. Choukri, and G. Gravier, The ETAPE speech processing evaluation, LREC, pp.3995-3999, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01083636

O. Galibert, Methodologies for the evaluation of speaker diarization and automatic speech recognition in the presence of overlapping speech, Interspeech 2013, 14th Annual Conference of the International Speech Communication Association, pp.1131-1134, 2013.

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
DOI : 10.1162/neco.1997.9.8.1735

G. Gravier, G. Adda, N. Paulson, M. Carré, A. Giraudel et al., The ETAPE corpus for the evaluation of speech-based TV content processing in the French language, LREC 2012, Eighth International Conference on Language Resources and Evaluation, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00712591

B. Mathieu, S. Essid, T. Fillon, J. Prado, and G. Richard, YAAFE, an Easy to Use and Efficient Audio Feature Extraction Software, ISMIR 2010, 11th International Society for Music Information Retrieval Conference, pp.441-446, 2010.

F. Chollet, Keras, 2015. Available: https://github

S. Funk, RMSprop loses to SMORMS3, 2015.

H. Bredin, pyannote.metrics: A Toolkit for Reproducible Evaluation, Diagnostic, and Error Analysis of Speaker Diarization Systems, Interspeech 2017, 2017.
DOI : 10.21437/Interspeech.2017-411

M. Cettolo, Segmentation, classification and clustering of an Italian broadcast news corpus, Content-Based Multimedia Information Access, pp.372-381, 2000.

J. Gauvain, L. Lamel, and G. Adda, Partitioning and transcription of broadcast news data, ICSLP 1998, 5th International Conference on Spoken Language Processing, pp.1335-1338, 1998.

C. Barras, X. Zhu, S. Meignier, and J. L. Gauvain, Multistage speaker diarization of broadcast news, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.5, pp.1505-1512, 2006.
DOI : 10.1109/TASL.2006.878261

URL : https://hal.archives-ouvertes.fr/hal-01434241