Y. Ephraim and D. Malah, Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator, IEEE Transactions on Acoustics, Speech and Signal Processing, vol.32, issue.6, pp.1109-1121, 1984.

I. Cohen and B. Berdugo, Speech enhancement for non-stationary noise environments, Signal processing, vol.81, issue.11, pp.2403-2418, 2001.

J. S. Erkelens, R. C. Hendriks, R. Heusdens, and J. Jensen, Minimum mean-square error estimation of discrete Fourier coefficients with generalized Gamma priors, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.6, pp.1741-1752, 2007.

J. Sohn, N. S. Kim, and W. Sung, A statistical model-based voice activity detection, IEEE Signal Processing Letters, vol.6, issue.1, pp.1-3, 1999.

X. Li, R. Horaud, L. Girin, and S. Gannot, Voice activity detection based on statistical likelihood ratio with adaptive thresholding, IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), pp.1-5, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01349776

H. Malik, Acoustic environment identification and its applications to audio forensics, IEEE Transactions on Information Forensics and Security, vol.8, issue.11, pp.1827-1837, 2013.

Y. Xu, J. Du, L. Dai, and C. Lee, Dynamic noise aware training for speech enhancement based on deep neural networks, Fifteenth Annual Conference of the International Speech Communication Association, 2014.

S. Fu, Y. Tsao, and X. Lu, SNR-Aware convolutional neural network modeling for speech enhancement, pp.3768-3772, 2016.

R. Martin, Noise power spectral density estimation based on optimal smoothing and minimum statistics, IEEE Transactions on Speech and Audio Processing, vol.9, issue.5, pp.504-512, 2001.

I. Cohen, Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging, IEEE Transactions on Speech and Audio Processing, vol.11, issue.5, pp.466-475, 2003.

S. Rangachari and P. C. Loizou, A noise-estimation algorithm for highly non-stationary environments, Speech communication, vol.48, issue.2, pp.220-231, 2006.

X. Li, L. Girin, S. Gannot, and R. Horaud, Non-stationary noise power spectral density estimation based on regional statistics, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.181-185, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01250892

R. C. Hendriks, R. Heusdens, and J. Jensen, MMSE based noise psd tracking with low complexity, IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pp.4266-4269, 2010.

T. Gerkmann and R. C. Hendriks, Unbiased MMSE-based noise power estimation with low complexity and low tracking delay, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.4, pp.1383-1393, 2012.

D. Wang and J. Chen, Supervised speech separation based on deep learning: An overview, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.26, issue.10, pp.1702-1726, 2018.

J. Heymann, L. Drude, and R. Haeb-umbach, Neural network based spectral mask estimation for acoustic beamforming, Acoustics, Speech and Signal Processing, pp.196-200, 2016.

T. Higuchi, N. Ito, T. Yoshioka, and T. Nakatani, Robust MVDR beamforming using time-frequency masks for online/offline ASR in noise, Acoustics, Speech and Signal Processing, pp.5210-5214, 2016.

X. Xiao, S. Zhao, D. L. Jones, E. S. Chng, and H. Li, On timefrequency mask estimation for MVDR beamforming with application in robust speech recognition, Acoustics, Speech and Signal Processing, pp.3246-3250, 2017.

X. Zhang, Z. Wang, and D. Wang, A speech enhancement algorithm by iterating single-and multi-microphone processing and its application to robust ASR, Acoustics, Speech and Signal Processing, pp.276-280, 2017.

C. Boeddeker, H. Erdogan, T. Yoshioka, and R. Haeb-umbach, Exploring practical aspects of neural mask-based beamforming for farfield speech recognition, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6697-6701, 2018.

F. Weninger, F. Eyben, and B. Schuller, Single-channel speech separation with memory-enhanced recurrent neural networks, Acoustics, Speech and Signal Processing, pp.3709-3713, 2014.

P. Papadopoulos, R. Travadi, and S. Narayanan, Global SNR estimation of speech signals for unknown noise conditions using noise adapted nonlinear regression, Proc. Interspeech, pp.3842-3846, 2017.

J. Chen and D. Wang, Long short-term memory for speaker generalization in supervised speech separation, The Journal of the Acoustical Society of America, vol.141, issue.6, pp.4705-4714, 2017.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural computation, vol.9, issue.8, pp.1735-1780, 1997.

A. Varga and H. J. Steeneken, Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems, Speech communication, vol.12, issue.3, pp.247-251, 1993.

J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett et al., Getting started with the DARPA TIMIT CD-ROM: An acoustic phonetic continuous speech database, vol.107, 1988.

F. Chollet, Keras, 2015.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

P. J. Werbos, Backpropagation through time: what it does and how to do it, Proceedings of the IEEE, vol.78, issue.10, pp.1550-1560, 1990.

R. C. Hendriks, J. Jensen, and R. Heusdens, Noise tracking using DFT domain subspace decompositions, IEEE Transactions on Audio, Speech, and Language Processing, vol.16, issue.3, pp.541-553, 2008.

A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol.2, pp.749-752, 2001.