An overview of automatic speaker diarization systems, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.5, pp.1557-1565, 2006.
DOI : 10.1109/TASL.2006.878256
Speaker Diarization: A Review of Recent Research, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.2, pp.356-370, 2012.
DOI : 10.1109/TASL.2011.2125954
URL : https://hal.archives-ouvertes.fr/hal-00733397
Achieving Human Parity in Conversational Speech Recognition, Available: https, Tech. Rep., 2017.
DOI : 10.1109/taslp.2017.2756440
Neural Networks, in Supervised Sequence Labelling with Recurrent Neural Networks, pp.15-35, 2012.
DOI : 10.1007/978-3-642-24797-2
URL : http://mediatum.ub.tum.de/doc/1289309/document.pdf
LSTM Neural Networks for Language Modeling, Interspeech 2012, 13th Annual Conference of the International Speech Communication Association, pp.194-197, 2012.
DOI : 10.1109/taslp.2015.2400218
Sequence to sequence learning with neural networks, in Advances in Neural Information Processing Systems, pp.3104-3112, 2014.
Artificial neural network features for speaker diarization, 2014 IEEE Spoken Language Technology Workshop (SLT), pp.402-406, 2014.
DOI : 10.1109/SLT.2014.7078608
URL : http://www.slaney.org/malcolm/Microsoft/Yella2014%28ANNSpeakerDiarization%29.pdf
Automatic segmentation, classification and clustering of broadcast news audio, Proc. DARPA Speech Recognition Workshop, 1997.
Speaker, environment and channel change detection and clustering via the Bayesian information criterion, Proc. DARPA Broadcast News Transcription and Understanding Workshop, pp.127-132, 1998.
Factor analysis for speaker segmentation and improved speaker diarization, Interspeech 2015, 16th Annual Conference of the International Speech Communication Association, pp.3081-3085, 2015.
TristouNet: Triplet loss for speaker turn embedding, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
DOI : 10.1109/ICASSP.2017.7953194
URL : http://arxiv.org/pdf/1609.04301
Minimum word error training of RNN-based voice activity detection, Interspeech 2015, 16th Annual Conference of the International Speech Communication Association, pp.2650-2654, 2015.
Hybrid speech recognition with Deep Bidirectional LSTM, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pp.273-278, 2013.
DOI : 10.1109/ASRU.2013.6707742
The Rich Transcription 2005 Spring Meeting Recognition Evaluation, International Workshop on Machine Learning for Multimodal Interaction (MLMI), pp.369-389, 2005.
DOI : 10.1007/11677482_32
URL : http://www.itl.nist.gov/iad/mig/publications/storage_paper/RT06SResults-v07.pdf
The ETAPE speech processing evaluation, LREC, pp.3995-3999, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01083636
Methodologies for the evaluation of speaker diarization and automatic speech recognition in the presence of overlapping speech, Interspeech 2013, 14th Annual Conference of the International Speech Communication Association, pp.1131-1134, 2013.
Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
DOI : 10.1162/neco.1997.9.8.1735
The ETAPE corpus for the evaluation of speech-based TV content processing in the French language, LREC, Eighth International Conference on Language Resources and Evaluation.
URL : https://hal.archives-ouvertes.fr/hal-00712591
YAAFE, an Easy to Use and Efficient Audio Feature Extraction Software, ISMIR 2010, 11th International Society for Music Information Retrieval Conference, pp.441-446, 2010.
Keras, Available: https://github, 2015.
RMSprop loses to SMORMS3, 2015.
pyannote.metrics: A Toolkit for Reproducible Evaluation, Diagnostic, and Error Analysis of Speaker Diarization Systems, Interspeech 2017, 2017.
DOI : 10.21437/Interspeech.2017-411
Segmentation, classification and clustering of an Italian broadcast news corpus, Content-Based Multimedia Information Access, pp.372-381, 2000.
Partitioning and transcription of broadcast news data, ICSLP 1998, 5th International Conference on Spoken Language Processing, pp.1335-1338, 1998.
Multistage speaker diarization of broadcast news, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.5, pp.1505-1512, 2006.
DOI : 10.1109/TASL.2006.878261
URL : https://hal.archives-ouvertes.fr/hal-01434241