P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, Journal of Machine Learning Research, vol.11, pp.3371-3408, 2010.

D. P. Kingma and M. Welling, Auto-encoding variational Bayes, Int. Conf. Learning Representations (ICLR), 2014.

M. Blaauw and J. Bonada, Modeling and transforming speech using variational autoencoders, Conf. of the Int. Speech Comm. Association (Interspeech), 2016.

C. C. Hsu, H. T. Hwang, Y. C. Wu, Y. Tsao, and H. M. Wang, Voice conversion from non-parallel corpora using variational auto-encoder, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pp.1-6, 2016.

W. Hsu, Y. Zhang, and J. Glass, Learning latent representations for speech generation and transformation, Conf. of the Int. Speech Comm. Association (Interspeech), 2017.

K. Akuzawa, Y. Iwasawa, and Y. Matsuo, Expressive speech synthesis via modeling expressions with variational autoencoder, Conf. of the Int. Speech Comm. Association (Interspeech), 2018.

F. Roche, T. Hueber, S. Limier, and L. Girin, Autoencoders for music sound modeling: A comparison of linear, shallow, deep, recurrent and variational models, Sound and Music Computing Conference (SMC), 2019.
URL : https://hal.archives-ouvertes.fr/hal-02349406

P. Esling, A. Chemla-Romeu-Santos, and A. Bitton, Bridging audio analysis, perception and synthesis with perceptually-regularized variational timbre spaces, Int. Society for Music Information Retrieval Conf. (ISMIR), 2018.

Y. Bando, M. Mimura, K. Itoyama, K. Yoshii, and T. Kawahara, Statistical speech enhancement based on probabilistic integration of variational autoencoder and non-negative matrix factorization, IEEE Int. Conf. on Acoustics, Speech and Signal Process. (ICASSP), 2018.

S. Leglaive, L. Girin, and R. Horaud, A variance modeling framework based on variational autoencoders for speech enhancement, IEEE Int. Workshop on Machine Learning for Signal Processing, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01832826

L. Pandey, A. Kumar, and V. Namboodiri, Monaural audio source separation using variational autoencoders, Conf. of the Int. Speech Comm. Association (Interspeech), 2018.

S. Leglaive, U. Şimşekli, A. Liutkus, L. Girin, and R. Horaud, Speech enhancement with variational autoencoders and alpha-stable distributions, IEEE Int. Conf. on Acoustics, Speech and Signal Process. (ICASSP), 2019.
URL : https://hal.archives-ouvertes.fr/hal-02005106

K. Sekiguchi, Y. Bando, K. Yoshii, and T. Kawahara, Bayesian multichannel speech enhancement with a deep speech prior, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pp.1233-1239, 2018.

L. Li, H. Kameoka, and S. Makino, Fast MVAE: Joint separation and classification of mixed sources based on multichannel variational autoencoder with auxiliary classifier, IEEE Int. Conf. on Acoustics, Speech and Signal Process. (ICASSP), 2019.

S. Leglaive, L. Girin, and R. Horaud, Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization, IEEE Int. Conf. on Acoustics, Speech and Signal Process. (ICASSP), 2019.
URL : https://hal.archives-ouvertes.fr/hal-02005102

C. Févotte and A. T. Cemgil, Nonnegative matrix factorizations as probabilistic inference in composite models, European Signal Processing Conference (EUSIPCO), 2009.

C. Févotte, N. Bertin, and J. Durrieu, Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis, Neural Computation, vol.21, issue.3, pp.793-830, 2009.

D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol.401, pp.788-791, 1999.

A. Ozerov and C. Févotte, Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation, IEEE Transactions on Audio, Speech and Language Processing, vol.18, issue.3, pp.550-563, 2010.

N. Q. Duong, E. Vincent, and R. Gribonval, Underdetermined reverberant audio source separation using a full-rank spatial covariance model, IEEE Trans. Audio, Speech, Language Process., vol.18, issue.7, pp.1830-1840, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00435807

T. Gerber, M. Dutasta, L. Girin, and C. Févotte, Professionally-produced music separation guided by covers, Int. Society for Music Information Retrieval Conf. (ISMIR), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00807027

P. Smaragdis, C. Févotte, G. Mysore, N. Mohammadiha, and M. Hoffman, Static and dynamic source separation using nonnegative factorizations: A unified view, IEEE Signal Processing Magazine, vol.31, issue.3, pp.66-75, 2014.

D. Kounades-Bastian, L. Girin, X. Alameda-Pineda, S. Gannot, and R. Horaud, A variational EM algorithm for the separation of moving sound sources, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01169764

S. Leglaive, R. Badeau, and G. Richard, Multichannel audio source separation with probabilistic reverberation priors, IEEE Transactions on Audio, Speech, and Language Processing, vol.24, issue.12, pp.2453-2465, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01370051

S. Leglaive, R. Badeau, and G. Richard, Student's t source and mixing models for multichannel audio source separation, IEEE Transactions on Audio, Speech, and Language Processing, vol.26, issue.6, pp.1150-1164, 2018.

Y. Ephraim and D. Malah, Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.32, issue.6, pp.1109-1121, 1984.

A. Liutkus, R. Badeau, and G. Richard, Gaussian processes for underdetermined source separation, IEEE Transactions on Signal Processing, vol.59, issue.7, pp.3155-3167, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00643951

I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot et al., β-VAE: Learning basic visual concepts with a constrained variational framework, Int. Conf. on Learning Representations (ICLR), 2017.

Y. Jung, Y. Kim, Y. Choi, and H. Kim, Joint learning using denoising variational autoencoders for voice activity detection, Conf. of the Int. Speech Comm. Association (Interspeech), 2018.

X. Li, L. Girin, and R. Horaud, An EM algorithm for audio source separation based on the convolutive transfer function, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2017.
URL : https://hal.archives-ouvertes.fr/hal-01568818

P. Smaragdis and J. C. Brown, Non-negative matrix factorization for polyphonic music transcription, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2003.

T. Virtanen, Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, pp.1066-1074, 2007.

J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett et al., TIMIT acoustic phonetic continuous speech corpus, 1993.

J. Engel, C. Resnick, A. Roberts, S. Dieleman, D. Eck et al., Neural audio synthesis of musical notes with WaveNet autoencoders, 2017.

A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, Perceptual evaluation of speech quality (PESQ): A new method for speech quality assessment of telephone networks and codecs, IEEE Int. Conf. on Acoustics, Speech and Signal Process. (ICASSP), pp.749-752, 2001.

R. Huber and B. Kollmeier, PEMO-Q: A new method for objective audio quality assessment using a model of auditory perception, IEEE Transactions on Audio, Speech, and Language Processing, vol.14, pp.1902-1911, 2006.

J. Chung, K. Kastner, L. Dinh, K. Goel, A. C. Courville et al., A recurrent latent variable model for sequential data, Advances in Neural Information Processing Systems, pp.2980-2988, 2015.

S. Sra and I. S. Dhillon, Generalized nonnegative matrix approximations with Bregman divergences, Advances in Neural Information Processing Systems, pp.283-290, 2006.

A. Cichocki, S. Cruces, and S. Amari, Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization, Entropy, vol.13, issue.1, pp.134-170, 2011.
