P. Comon and C. Jutten, Handbook of Blind Source Separation: Independent component analysis and applications. Academic press, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00460653

E. Vincent, N. Bertin, R. Gribonval, and F. Bimbot, From Blind to Guided Audio Source Separation: How models and side information can improve the separation of sound, IEEE Signal Processing Magazine, vol.31, issue.3, pp.107-115, 2014.
DOI : 10.1109/MSP.2013.2297440

URL : https://hal.archives-ouvertes.fr/hal-00922378

P. D. O-'grady, B. A. Pearlmutter, and S. T. Rickard, Survey of sparse and non-sparse methods in source separation, International Journal of Imaging Systems and Technology, vol.47, issue.33, pp.18-33, 2005.
DOI : 10.1002/ima.20035

O. Y?lmaz and S. Rickard, Blind Separation of Speech Mixtures via Time-Frequency Masking, IEEE Transactions on Signal Processing, vol.52, issue.7, pp.1830-1847, 2004.
DOI : 10.1109/TSP.2004.828896

A. Cichocki, R. Zdunek, A. H. Phan, and S. Amari, Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation, 2009.
DOI : 10.1002/9780470747278

A. Liutkus, D. Fitzgerald, Z. Rafii, B. Pardo, and L. Daudet, Kernel Additive Models for Source Separation, IEEE Transactions on Signal Processing, vol.62, issue.16, pp.4298-4310, 2014.
DOI : 10.1109/TSP.2014.2332434

URL : https://hal.archives-ouvertes.fr/hal-01011044

A. Jourjine, S. Rickard, and O. Y?lmaz, Blind separation of disjoint orthogonal signals: demixing N sources from 2 mixtures, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), pp.2985-2988, 2000.
DOI : 10.1109/ICASSP.2000.861162

D. P. Ellis and R. J. Weiss, Model-Based Monaural Source Separation Using a Vector-Quantized Phase-Vocoder Representation, 2006 IEEE International Conference on Acoustics Speed and Signal Processing Proceedings, pp.957-960, 2006.
DOI : 10.1109/ICASSP.2006.1661436

J. Durrieu, G. Richard, B. David, and C. Févotte, Source/Filter Model for Unsupervised Main Melody Extraction From Polyphonic Audio Signals, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.3, pp.564-575, 2010.
DOI : 10.1109/TASL.2010.2041114

G. Mysore, P. Smaragdis, and B. Raj, Non-negative Hidden Markov Modeling of Audio with Application to Source Separation, Proc. of Int. Conf. on Latent Variable Analysis and Signal Separation, pp.140-148, 2010.
DOI : 10.1007/978-3-642-15995-4_18

D. F. Rosenthal and H. G. Okuno, Computational auditory scene analysis, 1998.

B. Raj, R. Singh, and T. Virtanen, Phoneme-dependent NMF for speech enhancement in monaural mixtures, Proc. of Interspeech, pp.1217-1220, 2011.

N. Moritz, M. R. Schädler, K. Adil?-oglu, B. T. Meyer, T. Jürgens et al., Noise robust distant automatic speech recognition utilizing NMF based source separation and auditory feature extraction, Proc. of CHiME-2013, pp.1-6, 2013.

J. T. Geiger, F. Weninger, A. Hurmalainen, J. F. Gemmeke, M. Wöllmer et al., The TUM + TUT + KUL approach to the 2nd CHiME challenge: Multi-stream ASR exploiting BLSTM networks and sparse NMF, Proc. of CHiME-2013, pp.25-30, 2013.

E. Vincent, J. Barker, S. Watanabe, J. Le-roux, F. Nesta et al., The second ‘CHiME’ speech separation and recognition challenge: An overview of challenge systems and outcomes, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pp.162-167, 2013.
DOI : 10.1109/ASRU.2013.6707723

F. Weninger, J. R. Hershey, J. L. Roux, and B. Schuller, Discriminatively trained recurrent neural networks for single-channel speech separation, 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp.577-581, 2014.
DOI : 10.1109/GlobalSIP.2014.7032183

E. M. Grais, M. U. Sen, and H. Erdogan, Deep neural networks for single channel source separation, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI : 10.1109/ICASSP.2014.6854299

N. Bertin, R. Badeau, and G. Richard, Blind Signal Decompositions for Automatic Transcription of Polyphonic Music: NMF and K-SVD on the Benchmark, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07, pp.65-68, 2007.
DOI : 10.1109/ICASSP.2007.366617

URL : https://hal.archives-ouvertes.fr/hal-00945282

A. Rabinovich, S. Belongie, T. Lange, and J. M. Buhmann, Model Order Selection and Cue Combination for Image Segmentation, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 1 (CVPR'06), pp.1130-1137, 2006.
DOI : 10.1109/CVPR.2006.186

V. Y. Tan and C. Févotte, Automatic Relevance Determination in Nonnegative Matrix Factorization with the /spl beta/-Divergence, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.7, pp.1592-1605, 2013.
DOI : 10.1109/TPAMI.2012.240

T. Virtanen, A. T. Cemgil, and S. Godsill, Bayesian extensions to nonnegative matrix factorisation for audio signal modelling, Proc. of IEEE International Conference on Audio, Speech and Signal Processing (ICASSP), pp.1825-1828, 2008.

X. Jaureguiberry, E. Vincent, and G. Richard, Multiple-order nonnegative matrix factorization for speech enhancement, Proc. of Interspeech, p.4, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01023399

I. Bloch, A. Hunter, A. Appriou, and A. Ayoun, Fusion: General concepts and characteristics, International Journal of Intelligent Systems, vol.9, issue.10, pp.1107-1134, 2001.
DOI : 10.1002/int.1052

J. Kittler, M. Hatef, R. P. Duin, and J. Matas, On combining classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.20, issue.3, pp.226-239, 1998.
DOI : 10.1109/34.667881

Y. Freund and R. E. Schapire, A desicion-theoretic generalization of online learning and an application to boosting, Computational learning theory, pp.23-37, 1995.

X. Jaureguiberry, G. Richard, P. Leveau, R. Hennequin, and E. Vincent, Introducing a simple fusion framework for audio source separation, 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp.2013-2014
DOI : 10.1109/MLSP.2013.6661930

URL : https://hal.archives-ouvertes.fr/hal-00846834

S. Chandna and W. Wenwu, Improving model-based convolutive blind source separation techniques via bootstrap, 2014 IEEE Workshop on Statistical Signal Processing (SSP), pp.424-427, 2014.
DOI : 10.1109/SSP.2014.6884666

J. , L. Roux, S. Watanabe, and J. R. Hershey, Ensemble learning for speech enhancement, Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp.1-4, 2013.

X. Jaureguiberry, E. Vincent, and G. Richard, Variational Bayesian model averaging for audio source separation, 2014 IEEE Workshop on Statistical Signal Processing (SSP), pp.33-36, 2014.
DOI : 10.1109/SSP.2014.6884568

URL : https://hal.archives-ouvertes.fr/hal-00986909

E. Vincent, R. Gribonval, and C. Févotte, Performance measurement in blind audio source separation, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.4, pp.1462-1469, 2006.
DOI : 10.1109/TSA.2005.858005

URL : https://hal.archives-ouvertes.fr/inria-00544230

E. Vincent, R. Gribonval, and M. D. Plumbley, Oracle estimators for the benchmarking of source separation algorithms, Signal Processing, vol.87, issue.8, pp.1933-1950, 2007.
DOI : 10.1016/j.sigpro.2007.01.016

URL : https://hal.archives-ouvertes.fr/inria-00544194

D. P. Bertsekas, Nonlinear programming, Athena Scientific, 1999.

J. A. Hoeting, D. Madigan, A. E. Raftery, and C. T. Volinsky, Bayesian model averaging: a tutorial, Statistical science, pp.382-401, 1999.

C. M. Bishop, Pattern recognition and machine learning, 2006.

K. Katahira, K. Watanabe, and M. Okada, Deterministic annealing variant of variational Bayes method An interior trust region approach for nonlinear minimization subject to bounds, Journal of Physics: Conference Series, pp.418-445, 1996.

G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed et al., Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012.
DOI : 10.1109/MSP.2012.2205597

M. D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang et al., On rectified linear units for speech processing, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.3517-3521, 2013.
DOI : 10.1109/ICASSP.2013.6638312

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature, vol.85, issue.6088, pp.533-536, 1986.
DOI : 10.1038/323533a0

S. Duffner and C. Garcia, An Online Backpropagation Algorithm with Validation Error-Based Adaptive Learning Rate, Artificial Neural Networks, pp.249-258, 2007.
DOI : 10.1007/978-3-540-74690-4_26

D. Yu, L. Deng, F. T. Seide, and G. Li, Discriminative pretraining of deep neural networks, 2011.

J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu et al., Theano: a CPU and GPU math expression compiler, Proc. of the Python for Scientific Computing Conference (SciPy), 2010.

F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. Goodfellow et al., Theano: new features and speed improvements, Proc. of Workshop on Deep Learning and Unsupervised Feature Learning (NIPS), 2012.

E. Vincent, J. Barker, S. Watanabe, J. Le-roux, F. Nesta et al., The second ‘chime’ speech separation and recognition challenge: Datasets, tasks and baselines, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.2013-126
DOI : 10.1109/ICASSP.2013.6637622

C. Févotte, N. Bertin, and J. Durrieu, Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis, Neural Computation, vol.14, issue.3, pp.793-830, 2009.
DOI : 10.1016/j.sigpro.2007.01.024

E. Vincent, Musical source separation using time-frequency source priors, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.1, pp.91-98, 2006.
DOI : 10.1109/TSA.2005.860342

URL : https://hal.archives-ouvertes.fr/inria-00544269

J. J. Burred and T. Sikora, Comparison of frequency-warped representations for source separation of stereo mixtures, Proc. of Audio Engineering Society Convention, 2006.

M. Hoffman, D. M. Blei, and P. R. Cook, Bayesian nonparametric matrix factorization for recorded music, Proc. of International Conference on Machine Learning (ICML), pp.439-446, 2010.

K. Adil?-oglu and E. Vincent, Variational Bayesian inference for source separation and robust feature extraction, Inria, 2012.

M. Ravanelli, B. Elizalde, J. Bernd, and G. Friedland, Insights into Audio-Based Multimedia Event Classification with Neural Networks, Proceedings of the 2015 Workshop on Community-Organized Multimodal Mining: Opportunities for Novel Solutions , MMCommons'15, pp.19-23, 2015.
DOI : 10.1145/2814815.2814816

J. Durrieu, B. David, and G. Richard, A Musically Motivated Mid-Level Representation for Pitch Estimation and Musical Audio Source Separation, IEEE Journal of Selected Topics in Signal Processing, vol.5, issue.6, pp.1180-1191, 2011.
DOI : 10.1109/JSTSP.2011.2158801

P. Huang, S. D. Chen, P. Smaragdis, and M. Hasegawa-johnson, Singing-voice separation from monaural recordings using robust principal component analysis, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.2012-57
DOI : 10.1109/ICASSP.2012.6287816

Z. Rafii and B. Pardo, Music/voice separation using the similarity matrix, Proc. of International Symposium on Music Information Retrieval (ISMIR), pp.583-588, 2012.

D. D. Lee and H. S. Seung, Algorithms for non-negative matrix factorization, Advances in Neural Information Processing Systems, pp.556-562, 2001.

A. Corduneanu and C. M. Bishop, Variational Bayesian model selection for mixture distributions, Proc. of the 8th International Workshop on Artificial Intelligence and Statistics, pp.27-34, 2001.

J. M. Bernardo, The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures, Proc. of Valencia International Meeting on Bayesian Statistics, pp.453-462, 2002.