L. Rabiner and B. Juang, Fundamentals of Speech Recognition, 1993.

J. Barker, E. Vincent, N. Ma, H. Christensen, and P. Green, The PASCAL CHiME speech separation and recognition challenge, Computer Speech & Language, vol.27, issue.3, 2012.
DOI : 10.1016/j.csl.2012.10.004
URL : https://hal.archives-ouvertes.fr/hal-00646370

E. Benetos and S. Dixon, Multiple-instrument polyphonic music transcription using a temporally constrained shift-invariant model, The Journal of the Acoustical Society of America, vol.133, issue.3, p.1727, 2013.
DOI : 10.1121/1.4790351

A. Wang, An industrial strength audio search algorithm, Proceedings of the 4th International Conference on Music Information Retrieval (ISMIR '03), pp.7-13, 2003.

J. Valin, F. Michaud, B. Hadjou, and J. Rouat, Localization of simultaneous moving sound sources for mobile robot using a frequency-domain steered beamformer approach, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '04), pp.1033-1038, 2004.
DOI : 10.1109/ROBOT.2004.1307286

R. Ranft, Natural sound archives: past, present and future, Anais da Academia Brasileira de Ciências, pp.456-460, 2004.
DOI : 10.1590/S0001-37652004000200041

D. L. Wang and G. J. Brown, Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, 2006.
DOI : 10.1109/9780470043387

D. Giannoulis, E. Benetos, D. Stowell, M. Rossignol, M. Lagrange et al., Detection and classification of acoustic scenes and events: An IEEE AASP challenge, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013.
DOI : 10.1109/WASPAA.2013.6701819
URL : https://hal.archives-ouvertes.fr/hal-01123765

J. Aucouturier, B. Defreville, and F. Pachet, The bag-of-frames approach to audio pattern recognition: A sufficient model for urban soundscapes but not for polyphonic music, The Journal of the Acoustical Society of America, vol.122, issue.2, pp.881-891, 2007.
DOI : 10.1121/1.2750160

I. H. Witten and E. Frank, Data mining, ACM SIGMOD Record, vol.31, issue.1, 2005.
DOI : 10.1145/507338.507355

N. Scaringella, G. Zoia, and D. Mlynek, Automatic genre classification of music content: a survey, IEEE Signal Processing Magazine, vol.23, issue.2, pp.133-141, 2006.
DOI : 10.1109/MSP.2006.1598089

J. J. Wolf, Efficient Acoustic Parameters for Speaker Recognition, The Journal of the Acoustical Society of America, vol.51, issue.6B, pp.2044-2056, 1972.
DOI : 10.1121/1.1913065

J. Foote, Content-based retrieval of music and audio, Proc SPIE, pp.138-147, 1997.

B. Cauchi, Non-negative matrix factorisation applied to auditory scenes classification, ATIAM (ParisTech), 2011.

E. Benetos, Automatic transcription of polyphonic music exploiting temporal evolution, 2012.

S. Chu, S. Narayanan, and C. Kuo, Environmental Sound Recognition With Time–Frequency Audio Features, IEEE Transactions on Audio, Speech, and Language Processing, pp.1142-1158, 2009.
DOI : 10.1109/TASL.2009.2017438

S. E. Tranter and D. A. Reynolds, An overview of automatic speaker diarization systems, IEEE Transactions on Audio, Speech, and Language Processing, pp.1557-1565, 2006.
DOI : 10.1109/TASL.2006.878256

A. Mesaros, T. Heittola, A. Eronen, and T. Virtanen, Acoustic event detection in real life recordings, European Signal Processing Conference, pp.1267-1271, 2010.

C. V. Cotton and D. P. Ellis, Spectral vs. spectro-temporal features for acoustic event detection, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp.69-72, 2011.
DOI : 10.1109/ASPAA.2011.6082331

T. Heittola, A. Mesaros, T. Virtanen, and A. Eronen, Sound event detection in multisource environments using source separation, Workshop on Machine Listening in Multisource Environments, pp.36-40, 2011.

T. Heittola, A. Mesaros, A. Eronen, and T. Virtanen, Context-dependent sound event detection, EURASIP Journal on Audio, Speech, and Music Processing, vol.2013, issue.1, p.1, 2013.

A. Mesaros, T. Heittola, and A. Klapuri, Latent semantic analysis in sound event detection, European Signal Processing Conference (EUSIPCO), pp.1307-1311, 2011.

M. J. Kim and H. Kim, Automatic extraction of pornographic contents using radon transform based audio features, 2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI), pp.205-210, 2011.
DOI : 10.1109/CBMI.2011.5972546

S. Ntalampiras, I. Potamitis, and N. Fakotakis, An Adaptive Framework for Acoustic Monitoring of Potential Hazards, EURASIP Journal on Audio, Speech, and Music Processing, vol.11, issue.6, 2009.

H. G. Okuno, T. Ogata, and K. Komatani, Computational Auditory Scene Analysis and Its Application to Robot Audition: Five Years Experience, Second International Conference on Informatics Research for Development of Knowledge Society Infrastructure (ICKS'07), pp.69-76, 2007.
DOI : 10.1109/ICKS.2007.7

D. Stowell and M. D. Plumbley, Segregating event streams and noise with a Markov renewal process model, Journal of Machine Learning Research, vol.14, pp.1891-1916, 2013.

E. Vincent, S. Araki, F. Theis, G. Nolte, P. Bofill et al., The signal separation evaluation campaign (2007-2010): Achievements and remaining challenges, Signal Processing, vol.92, issue.8, pp.1928-1936, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00579398

E. Vincent, J. Barker, S. Watanabe, J. Le-roux, F. Nesta et al., The second ‘CHiME’ speech separation and recognition challenge: An overview of challenge systems and outcomes, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pp.162-167, 2013.
DOI : 10.1109/ASRU.2013.6707723

J. S. Downie, A. F. Ehmann, M. Bay, and M. C. Jones, The Music Information Retrieval Evaluation eXchange: Some Observations and Insights, Advances in Music Information Retrieval, Studies in Computational Intelligence, pp.93-115, 2010.
DOI : 10.1007/978-3-642-11674-2_5

R. Stiefelhagen, K. Bernardin, R. Bowers, J. Garofolo, D. Mostefa et al., The CLEAR 2006 Evaluation, Multimodal Technologies for Perception of Humans, pp.1-44, 2007.
DOI : 10.1007/978-3-540-69568-4_1

A. F. Smeaton, P. Over, and W. Kraaij, Evaluation campaigns and TRECVid, Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval (MIR '06), pp.321-330, 2006.
DOI : 10.1145/1178677.1178722

B. L. Sturm, A Simple Method to Determine if a Music Information Retrieval System is a “Horse”, IEEE Transactions on Multimedia, vol.16, issue.6, 2014.
DOI : 10.1109/TMM.2014.2330697

D. Stowell and M. D. Plumbley, An open dataset for research on audio field recording archives: freefield1010, Proceedings of the Audio Engineering Society 53rd Conference on Semantic Audio (AES53), 2014.

A. J. Eronen, V. T. Peltonen, J. T. Tuomi, A. P. Klapuri, S. Fagerlund et al., Audio-based context recognition, IEEE Transactions on Audio, Speech, and Language Processing, pp.321-329, 2006.
DOI : 10.1109/TSA.2005.854103

K. Ellis, E. Coviello, and G. Lanckriet, Semantic annotation and retrieval of music using a bag of systems representation, Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR-11), pp.723-728, 2011.

T. Fawcett, An introduction to ROC analysis, Pattern Recognition Letters, vol.27, issue.8, pp.861-874, 2006.
DOI : 10.1016/j.patrec.2005.10.010

J. Aucouturier and F. Pachet, Improving timbre similarity: how high's the sky?, Journal of Negative Results in Speech and Audio Sciences, vol.1, issue.1, pp.1-13, 2004.

D. Giannoulis, D. Stowell, E. Benetos, M. Rossignol, M. Lagrange et al., A database and challenge for acoustic scene classification and event detection, Proceedings of the European Signal Processing Conference (EUSIPCO), 2013.
URL : https://hal.archives-ouvertes.fr/hal-01123764

A. Temko, R. Malkin, C. Zieger, D. Macho, C. Nadeu et al., CLEAR Evaluation of Acoustic Event Detection and Classification Systems, Proc CLEAR, pp.311-322, 2007.
DOI : 10.1007/978-3-540-69568-4_29

R. Kompass, A Generalized Divergence Measure for Nonnegative Matrix Factorization, Neural Computation, vol.19, issue.3, pp.780-791, 2007.

C. Schörkhuber and A. Klapuri, Constant-Q transform toolbox for music processing, Sound and Music Computing Conference, pp.3-64, 2010.

M. Chum, A. Habshush, A. Rahman, and C. Sang, IEEE AASP scene classification challenge using hidden Markov models and frame based classification, 2013.

B. Elizalde, H. Lei, G. Friedland, and N. Peters, An I-vector based approach for audio scene detection, 2013.

J. T. Geiger, B. Schuller, and G. Rigoll, Recognising acoustic scenes with large-scale audio feature extraction and SVM, 2013.

J. D. Krijnders and G. A. Holt, A tone-fit feature representation for scene classification, 2013.

D. Li, J. Tam, and D. Toub, Auditory scene classification using machine learning techniques, 2013.

J. Nam, Z. Hyung, and K. Lee, Acoustic scene classification using sparse feature learning and selective max-pooling by event detection, 2013.

W. Nogueira, G. Roma, and P. Herrera, Sound scene identification based on MFCC, binaural features and a support vector machine classifier, 2013.

E. Olivetti, The wonders of the normalized compression dissimilarity representation, tech. rep, 2013.

K. Patil and M. Elhilali, Multiresolution auditory representations for scene classification, tech. rep, 2013.

A. Rakotomamonjy and G. Gasso, Histogram of gradients of time-frequency representations for audio scene classification, 2013.

G. Roma, W. Nogueira, and P. Herrera, Recurrence quantification analysis features for auditory scene classification, 2013.

S. Chauhan, S. Phadke, and C. Sherland, Event detection and classification, tech. rep, 2013.

A. Diment, T. Heittola, and T. Virtanen, Sound event detection for office live and office synthetic AASP challenge, 2013.

J. F. Gemmeke, L. Vuegen, B. Vanrumste, and H. Van-hamme, An exemplar-based NMF approach to audio event detection, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013.
DOI : 10.1109/WASPAA.2013.6701847

W. Nogueira, G. Roma, and P. Herrera, Automatic event classification using front end single channel noise reduction, MFCC features and a support vector machine classifier, 2013.

M. E. Niessen, T. L. Van-kasteren, and A. Merentitis, Hierarchical sound event detection, 2013.

J. Schröder, B. Cauchi, M. R. Schädler, N. Moritz, K. Adiloglu et al., Acoustic event detection using signal enhancement and spectro-temporal feature extraction, 2013.

L. Vuegen, B. Van-den-broeck, P. Karsmakers, J. F. Gemmeke, B. Vanrumste et al., An MFCC-GMM approach for event detection and classification, tech. rep, 2013.

J. D. Gibbons and S. Chakraborti, Nonparametric Statistical Inference, 2010.
DOI : 10.1007/978-3-642-04898-2_420

D. Barchiesi, D. Giannoulis, D. Stowell, and M. D. Plumbley, Acoustic scene classification, IEEE Signal Processing