J. R. Hershey, Z. Chen, J. Le Roux, and S. Watanabe, Deep clustering: discriminative embeddings for segmentation and separation, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.

D. Yu, M. Kolbæk, Z. Tan, and J. Jensen, Permutation invariant training of deep models for speaker-independent multi-talker speech separation, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.241-245, 2017.

Y. Isik, J. Le Roux, Z. Chen, S. Watanabe, and J. R. Hershey, Single-channel multi-speaker separation using deep clustering, Proc. Interspeech, pp.545-549, 2016.

M. Kolbæk, D. Yu, Z. Tan, and J. Jensen, Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks, IEEE/ACM Trans. Audio, Speech and Lang. Proc., vol.25, issue.10, pp.1901-1913, 2017.

Z. Chen, Y. Luo, and N. Mesgarani, Deep attractor network for single-microphone speaker separation, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.

Z. Wang, J. Le Roux, and J. R. Hershey, Alternative objective functions for deep clustering, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.686-690, 2018.

Y. Luo and N. Mesgarani, TasNet: Time-domain audio separation network for real-time, single-channel speech separation, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.

Z. Shi, H. Lin, L. Liu, R. Liu, J. Han et al., Deep attention gated dilated temporal convolutional networks with intra-parallel convolutional modules for end-to-end monaural speech separation, Proc. Interspeech, pp.3183-3187, 2019.

Y. Luo and N. Mesgarani, Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation, IEEE/ACM Trans. Audio, Speech and Lang. Proc., vol.27, issue.8, pp.1256-1266, 2019.

G.-P. Yang, C.-I. Tuan, H.-Y. Lee, and L.-S. Lee, Improved speech separation with time-and-frequency cross-domain joint embedding and clustering, Proc. Interspeech, pp.1363-1367, 2019.

F. Weninger, H. Erdogan, S. Watanabe, E. Vincent, J. Le Roux et al., Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR, 12th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), 2015.
URL: https://hal.archives-ouvertes.fr/hal-01163493

T. Necciari, N. Holighaus, P. Balazs, Z. Průša, P. Majdak et al., Audlet filter banks: a versatile analysis/synthesis framework using auditory frequency scales, Applied Sciences, vol.8, issue.1, p.96, 2018.
URL: https://hal.archives-ouvertes.fr/hal-01807393

G. Wichern, J. Antognini, M. Flynn, L. R. Zhu, E. McQuinn et al., WHAM!: extending speech separation to noisy environments, Proc. Interspeech, pp.1368-1372, 2019.

M. Ravanelli and Y. Bengio, Speaker recognition from raw waveform with SincNet, 2018 IEEE Spoken Language Technology Workshop (SLT), pp.1021-1028, 2018.

E. Loweimi, P. Bell, and S. Renals, On Learning Interpretable CNNs with Parametric Modulated Kernel-Based Filters, Proc. Interspeech, pp.3480-3484, 2019.

J. L. Flanagan, Parametric coding of speech spectra, The Journal of the Acoustical Society of America, vol.68, issue.2, pp.412-419, 1980.

Scripts to generate the WSJ0 Hipster Ambient Mixtures dataset.

J. Le Roux, S. Wisdom, H. Erdogan, and J. R. Hershey, SDR - half-baked or well done?, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.626-630, 2019.

D. P. Kingma and J. Ba, Adam: a method for stochastic optimization, 2014.

L. Liu, H. Jiang, P. He, W. Chen, X. Liu et al., On the variance of the adaptive learning rate and beyond, 2019.

M. R. Zhang, J. Lucas, G. Hinton, and J. Ba, Lookahead optimizer: k steps forward, 1 step back, 2019.