D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley, Detection and Classification of Acoustic Scenes and Events, IEEE Transactions on Multimedia, vol.17, issue.10, pp.1733-1746, 2015.
DOI : 10.1109/TMM.2015.2428998
URL : https://hal.archives-ouvertes.fr/hal-01253912

A. J. Eronen, V. T. Peltonen, J. T. Tuomi, A. P. Klapuri, S. Fagerlund et al., Audio-based context recognition, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.1, pp.321-329, 2006.
DOI : 10.1109/TSA.2005.854103

T. Heittola, A. Mesaros, A. Eronen, and T. Virtanen, Context-dependent sound event detection, EURASIP Journal on Audio, Speech, and Music Processing, vol.2013, 2013.

J. Dennis, H. Tran, and E. Chng, Overlapping sound event recognition using local spectrogram features and the generalised Hough transform, Pattern Recognition Letters, vol.34, issue.9, pp.1085-1093, 2013.
DOI : 10.1016/j.patrec.2013.02.015

J. F. Gemmeke, L. Vuegen, P. Karsmakers, B. Vanrumste, and H. Van hamme, An exemplar-based NMF approach to audio event detection, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013.
DOI : 10.1109/WASPAA.2013.6701847
URL : https://lirias.kuleuven.be/bitstream/123456789/414172/1/3640_final.pdf

D. Giannoulis, E. Benetos, D. Stowell, M. Rossignol, M. Lagrange et al., Detection and classification of acoustic scenes and events: An IEEE AASP challenge, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013.
DOI : 10.1109/WASPAA.2013.6701819
URL : https://hal.archives-ouvertes.fr/hal-01123765

L. Vuegen, B. Van Den Broeck, P. Karsmakers, J. F. Gemmeke, B. Vanrumste et al., An MFCC-GMM approach for event detection and classification, IEEE AASP DCASE Challenge, 2013.

A. Mesaros, T. Heittola, O. Dikmen, and T. Virtanen, Sound event detection in real life recordings using coupled matrix factorization of spectral representations and class activity annotations, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.151-155, 2015.
DOI : 10.1109/ICASSP.2015.7177950

T. Komatsu, Y. Senda, and R. Kondo, Acoustic event detection based on non-negative matrix factorization with mixtures of local dictionaries and activation aggregation, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.2259-2263, 2016.
DOI : 10.1109/ICASSP.2016.7472079

E. Cakir, T. Heittola, H. Huttunen, and T. Virtanen, Polyphonic sound event detection using multi-label deep neural networks, 2015 International Joint Conference on Neural Networks (IJCNN), 2015.
DOI : 10.1109/IJCNN.2015.7280624

G. Parascandolo, H. Huttunen, and T. Virtanen, Recurrent neural networks for polyphonic sound event detection in real life recordings, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6440-6444, 2016.
DOI : 10.1109/ICASSP.2016.7472917
URL : http://arxiv.org/abs/1604.00861

E. Benetos, G. Lafay, M. Lagrange, and M. D. Plumbley, Detection of overlapping acoustic events using a temporally-constrained probabilistic model, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6450-6454, 2016.
DOI : 10.1109/ICASSP.2016.7472919
URL : https://hal.archives-ouvertes.fr/hal-01255074

E. Benetos, M. Lagrange, and S. Dixon, Characterisation of acoustic scenes using a temporally-constrained shift-invariant model, 15th International Conference on Digital Audio Effects (DAFx), pp.317-323, 2012.

C. Cotton and D. Ellis, Spectral vs. spectro-temporal features for acoustic event detection, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp.69-72, 2011.
DOI : 10.1109/ASPAA.2011.6082331

K. Murphy, Machine Learning: A Probabilistic Perspective, 2012.

P. Smaragdis, B. Raj, and M. Shashanka, A probabilistic latent variable model for acoustic modeling, Neural Information Processing Systems Workshop, 2006.

D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol.401, pp.788-791, 1999.

M. Shashanka, B. Raj, and P. Smaragdis, Probabilistic Latent Variable Models as Nonnegative Factorizations, Computational Intelligence and Neuroscience, vol.2008, article ID 947438, 2008.
DOI : 10.1155/2008/947438
URL : http://doi.org/10.1155/2008/947438

P. Smaragdis and G. Mysore, Separation by “humming”: User-guided sound extraction from monophonic mixtures, 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp.69-72, 2009.
DOI : 10.1109/ASPAA.2009.5346542

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B, vol.39, issue.1, pp.1-38, 1977.

P. Smaragdis and B. Raj, Shift-invariant probabilistic latent component analysis, Mitsubishi Electric Research Laboratories, Tech. Rep., 2007.

C. Bishop, Pattern Recognition and Machine Learning, 2006.

L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, vol.77, issue.2, pp.257-286, 1989.

R. E. Kalman, A New Approach to Linear Filtering and Prediction Problems, Journal of Basic Engineering, vol.82, issue.1, pp.35-45, 1960.
DOI : 10.1115/1.3662552

C. Févotte, J. Le Roux, and J. R. Hershey, Non-negative dynamical system with application to speech and audio, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.3158-3162, 2013.
DOI : 10.1109/ICASSP.2013.6638240

N. Mohammadiha, P. Smaragdis, G. Panahandeh, and S. Doclo, A State-Space Approach to Dynamic Nonnegative Matrix Factorization, IEEE Transactions on Signal Processing, vol.63, issue.4, pp.949-959, 2015.
DOI : 10.1109/TSP.2014.2385655

B. C. Moore, Frequency analysis and masking, in Hearing, Handbook of Perception and Cognition, pp.161-205, 1995.

E. Vincent, N. Bertin, and R. Badeau, Adaptive Harmonic Spectral Decomposition for Multiple Pitch Estimation, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.3, pp.528-537, 2010.
DOI : 10.1109/TASL.2009.2034186
URL : https://hal.archives-ouvertes.fr/inria-00350163

G. Grindlay and D. Ellis, Transcribing Multi-Instrument Polyphonic Music With Hierarchical Eigeninstruments, IEEE Journal of Selected Topics in Signal Processing, vol.5, issue.6, pp.1159-1169, 2011.
DOI : 10.1109/JSTSP.2011.2162395
URL : http://academiccommons.columbia.edu/download/fedora_content/download/ac:149158/CONTENT/GrindE11-eigeninst.pdf

A. N. Langville, C. D. Meyer, and R. Albright, Initializations for the nonnegative matrix factorization, International Conference on Knowledge Discovery and Data Mining, 2006.

G. Lafay, M. Lagrange, M. Rossignol, E. Benetos et al., A Morphological Model for Simulating Acoustic Scenes and Its Application to Sound Event Detection, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016.
DOI : 10.1109/TASLP.2016.2587218
URL : https://hal.archives-ouvertes.fr/hal-01111381

A. Mesaros, T. Heittola, and T. Virtanen, Metrics for Polyphonic Sound Event Detection, Applied Sciences, vol.6, issue.6, p.162, 2016.
DOI : 10.3390/app6060162
URL : http://doi.org/10.3390/app6060162

G. Mysore, A non-negative framework for joint modeling of spectral structure and temporal dynamics in sound mixtures, Ph.D. thesis, Stanford University, 2010.