M. Sahidullah, H. Delgado, M. Todisco, T. Kinnunen, N. Evans et al., Introduction to voice presentation attack detection and recent advances, Book chapter N15 of, Handbook of Biometric Anti-Spoofing: Presentation Attack Detection

S. Springer-marcel, M. S. Nixon, and J. Fierrez,

. Springer, , 2018.

B. L. Pellom and J. H. Hansen, An experimental study of speaker verification sensitivity to computer voice-altered imposters, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No. 99CH36258), vol.2, pp.837-840, 1999.

. Iso/iec-30107, Information technology -biometric presentation attack detection, 2016.

N. Evans, T. Kinnunen, and J. Yamagishi, Spoofing and countermeasures for automatic speaker verification, Proc. Interspeech, Annual Conf. of the Int, pp.925-929, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01880306

Z. Wu, T. Kinnunen, N. Evans, J. Yamagishi, C. Hanilçi et al., ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge, Proc. Interspeech, Annual Conf. of the Int, pp.2037-2041, 2015.

T. Kinnunen, M. Sahidullah, H. Delgado, M. Todisco, N. Evans et al., The ASVspoof 2017 challenge: Assessing the limits of replay spoofing attack detection, Proc. Interspeech, Annual Conf. of the Int. Speech Comm. Assoc, pp.2-6, 2017.

, ASVspoof 2019: the automatic speaker verification spoofing and countermeasures challenge evaluation plan

M. Todisco, X. Wang, V. Vestman, M. Sahidullah, H. Delgado et al., Asvspoof 2019: Future horizons in spoofed and fake audio detection
URL : https://hal.archives-ouvertes.fr/hal-02172099

J. Shen, R. Pang, R. J. Weiss, M. Schuster, N. Jaitly et al., 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4779-4783, 2018.

A. V. Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals et al., Wavenet: A generative model for raw audio

C. Veaux, J. Yamagishi, and K. Macdonald, CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning

T. Kinnunen, K. Lee, H. Delgado, N. Evans, M. Todisco et al., t-DCF: a detection cost function for the tandem assessment of spoofing countermeasures and automatic speaker verification, Proc. Odyssey, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01880306

D. Griffin and J. Lim, Signal estimation from modified short-time Fourier transform, IEEE Trans. ASSP, vol.32, issue.2, pp.236-243, 1984.

H. Zen, A. Senior, and M. Schuster, Statistical parametric speech synthesis using deep neural networks, Proc. ICASSP, pp.7962-7966, 2013.

. Hts-working-group, The English TTS system Flite+HTS engine, 2014.

X. Wang, S. Takaki, and J. Yamagishi, An autoregressive recurrent mixture density network for parametric speech synthesis, Proc. ICASSP, pp.4895-4899, 2017.

C. Doersch, Tutorial on variational autoencoders

A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks, 2008.

X. Wang, J. Lorenzo-trueba, S. Takaki, L. Juvela, and J. Yamagishi, A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis, Proc. ICASSP, pp.4804-4808, 2018.

M. Morise, F. Yokomori, and K. Ozawa, WORLD: A vocoder-based highquality speech synthesis system for real-time applications, IEICE Trans. on Information and Systems, vol.99, issue.7, pp.1877-1884, 2016.

Z. Wu, O. Watts, and S. King, Merlin: An open source neural network speech synthesis system, Speech synthesis workshop SSW 2016, 2016.

. Hts-working-group, An example of context-dependent label format for HMM-based speech synthesis in Japanese, 2015.

M. Schröder, M. Charfuelan, S. Pammi, and I. Steiner, Open source voice creation toolkit for the MARY TTS platform, pp.3253-3256, 2011.

I. Steiner and S. L. Maguer, Creating new language and voice components for the updated MaryTTS text-to-speech synthesis platform, 11th Language Resources and Evaluation Conference (LREC), pp.3171-3175, 2018.

C. Hsu, H. Hwang, Y. Wu, Y. Tsao, and H. Wang, Voice conversion from non-parallel corpora using variational auto-encoder, p.2016

A. , Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp.1-6, 2016.

W. Huang, H. Hwang, Y. Peng, Y. Tsao, and H. Wang, Voice conversion based on cross-domain features using variational auto encoders, 2018 11th International Symposium on Chinese Spoken Language Processing, pp.51-55, 2018.

D. Matrouf and J. ,

C. Bonastre and . Fredouille, Effect of speech transformation on impostor acceptance, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, vol.1, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01318472

K. Tanaka, H. Kameoka, T. Kaneko, and N. Hojo, Wavecyclegan2: Timedomain neural post-filter for speech waveform generation

X. Wang, S. Takaki, and J. Yamagishi, Neural source-filter-based waveform model for statistical parametric speech synthesis, ICASSP 2019 -2019 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5916-5920, 2019.

H. Zen, Y. Agiomyrgiannakis, N. Egberts, F. Henderson, and P. , Szczepaniak, Fast, compact, and high quality lstm-rnn based statistical parametric speech synthesizers for mobile devices, Interspeech, vol.2016, pp.2273-2277, 2016.

Y. Agiomyrgiannakis, Vocaine the vocoder and applications in speech synthesis, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4230-4234, 2015.

Y. Jia, Y. Zhang, R. Weiss, Q. Wang, J. Shen et al., Transfer learning from speaker verification to multispeaker text-to-speech synthesis, Advances in Neural Information Processing Systems, pp.4480-4490, 2018.

L. Wan, Q. Wang, A. Papir, and I. L. Moreno, Generalized end-to-end loss for speaker verification, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4879-4883, 2018.

N. Kalchbrenner, E. Elsen, K. Simonyan, S. Noury, N. Casagrande et al.,

D. W. Griffin and J. S. Lim, Signal estimation from modified short-time Fourier transform, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.32, issue.2, pp.236-243, 1984.

Y. Li, K. Swersky, and R. Zemel, Generative moment matching networks, International Conference on Machine Learning, pp.1718-1727, 2015.

K. Kobayashi, T. Toda, and S. Nakamura, Intra-gender statistical singing voice conversion with direct waveform modification using log-spectral differential, Speech Communication, vol.99, pp.211-220, 2018.

L. Liu, Z. Ling, Y. Jiang, M. Zhou, and L. Dai, WaveNet vocoder with limited training data for voice conversion, Annual Conference of the International Speech Communication Association, pp.1983-1987, 2018.

H. Kawahara, I. Masuda-katsuse, and A. D. Cheveigné, Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency based F0 extraction: Possible role of a repetitive structure in sounds, Speech Communication, vol.27, issue.34, pp.187-207, 1999.
URL : https://hal.archives-ouvertes.fr/hal-01105608

W. Huang, Y. Wu, K. Kobayashi, Y. Peng, H. Hwang et al., Generalization of spectrum differential based direct waveform modification for voice conversion, Proc. SSW10, 2019.

T. Kinnunen, J. Lorenzo-trueba, J. Yamagishi, T. Toda, D. Saito et al., A spoofing benchmark for the 2018 voice conversion challenge: Leveraging from spoofing countermeasures for speech artifact assessment, Proc. Odyssey 2018 The Speaker and Language Recognition Workshop, pp.187-194, 2018.

W. Verhelst and M. Roelands, An overlap-add technique based on waveform similarity (wsola) for high quality time-scale modification of speech, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol.2, pp.554-557, 1993.

K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, Mel-generalized cepstral analysis-a unified approach to speech spectral estimation, Third International Conference on Spoken Language Processing, 1994.

T. Kinnunen, L. Juvela, P. Alku, and J. Yamagishi, Non-parallel voice conversion using i-vector plda: towards unifying speaker verification and transformation, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5535-5539, 2017.

N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, Front-end factor analysis for speaker verification, vol.19, pp.788-798, 2011.

P. Kenny, A small footprint i-vector extractor, Proc. Odyssey 2012: the Speaker and Language Recognition Workshop, 2012.

S. J. Prince and J. H. Elder, Probabilistic linear discriminant analysis for inferences about identity, IEEE 11th International Conference on Computer Vision, pp.1-8, 2007.

P. Kenny, Bayesian speaker verification with heavy-tailed priors, Odyssey 2010: The Speaker and Language Recognition Workshop, p.14, 2010.

D. Snyder, D. Garcia-romero, G. Sell, D. Povey, and S. Khudanpur, Xvectors: Robust DNN embeddings for speaker recognition

A. Hatch, S. Kajarekar, and A. Stolcke, Within-class covariance normalization for svm-based speaker recognition, vol.3, 2006.

L. Van-der-maaten and G. Hinton, Visualizing data using t-{SNE}, Journal of Machine Learning Research, vol.9, pp.2579-2605, 2008.

Z. Wu, J. Yamagishi, T. Kinnunen, C. Hanilçi, M. Sahidullah et al., ASVspoof: the automatic speaker verification spoofing and countermeasures challenge, IEEE Journal of Selected Topics in Signal Processing, vol.11, issue.4, pp.588-604, 2017.

A. Janicki, F. Alegre, and N. Evans, An assessment of automatic speaker verification vulnerabilities to replay spoofing attacks, Security and Communication Networks, vol.9, pp.3030-3044, 2016.

F. Toole, Sound Reproduction: Loudspeakers and Rooms, Audio Engineering Society Presents Series, 2008.

J. B. Allen and D. A. Berkley, Image Method for Efficiently Simulating Small-Room Acoustics, J. Acoust. Soc. Am, vol.65, issue.4, pp.943-950, 1979.

E. Vincent and R. , , 2008.

D. Snyder, D. Garcia-romero, G. Sell, D. Povey, and S. Khudanpur, Xvectors: Robust DNN embeddings for speaker recognition, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp.5329-5333, 2018.

T. Ko, V. Peddinti, D. Povey, M. L. Seltzer, and S. Khudanpur, A study on data augmentation of reverberant speech for robust speech recognition, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp.5220-5224, 2017.

S. J. Prince and J. H. Elder, Probabilistic linear discriminant analysis for inferences about identity, 2007 IEEE 11th International Conference on Computer Vision, pp.1-8, 2007.

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The Kaldi speech recognition toolkit, IEEE Signal Processing Society, 2011.

A. Nagrani, J. S. Chung, and A. Zisserman, Voxceleb: a large-scale speaker identification dataset

P. Bousquet and M. Rouvier, On robustness of unsupervised domain adaptation for speaker recognition, pp.2958-2962, 2019.

S. Ioffe, Probabilistic linear discriminant analysis, European Conference on Computer Vision, pp.531-542, 2006.

M. Todisco, H. Delgado, and N. Evans, A new feature for automatic speaker verification anti-spoofing: Constant Q cepstral coefficients, Proc. Odyssey, 2016.

M. Todisco, H. Delgado, and N. Evans, Constant Q cepstral coefficients: A spoofing countermeasure for automatic speaker verification, Computer Speech & Language, vol.45, pp.516-535, 2017.

M. Sahidullah, T. Kinnunen, and C. Hanilçi, A comparison of features for synthetic speech detection, Proc. Interspeech, Annual Conf. of the Int, pp.2087-2091, 2015.

Z. Wu, P. L. De-leon, C. Demiroglu, A. Khodabakhsh, S. King et al., Anti-spoofing for text-independent speaker verification: An initial database, comparison of countermeasures, and human performance, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.4, pp.768-783, 2016.

A. Rosenberg and B. Ramabhadran, Bias and statistical significance in evaluating speech synthesis with Mean Opinion Scores, pp.3976-3980, 2017.