H. Zen, K. Tokuda, and A. W. Black, Statistical parametric speech synthesis, Speech Communication, pp. 1039-1064, 2009.
DOI : 10.1016/j.specom.2009.04.004

URL : https://hal.archives-ouvertes.fr/hal-00746106

A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals et al., WaveNet: A generative model for raw audio, 2016.

Y. Wang, R. J. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss et al., Tacotron: Towards end-to-end speech synthesis, Proceedings of InterSpeech, 2017.
DOI : 10.21437/Interspeech.2017-1452

S. Arık, M. Chrzanowski, A. Coates, G. Diamos, A. Gibiansky et al., Deep Voice: Real-time neural text-to-speech, 2017.

S. Arık, G. Diamos, A. Gibiansky, J. Miller, K. Peng et al., Deep Voice 2: Multi-speaker neural text-to-speech, 2017.

F. Béchet, LIA_PHON : un système complet de phonétisation de textes, Traitement Automatique des Langues (TAL), pp. 47-67, 2001.

M. Bisani and H. Ney, Joint-sequence models for grapheme-to-phoneme conversion, Speech Communication, pp. 434-451, 2008.
DOI : 10.1016/j.specom.2008.01.002

URL : https://hal.archives-ouvertes.fr/hal-00499203

L. Galescu and J. F. Allen, Pronunciation of proper names with a joint n-gram model for bi-directional grapheme-to-phoneme conversion, Proceedings of InterSpeech, 2002.

A. Laurent, P. Deléglise, and S. Meignier, Grapheme to phoneme conversion using an SMT system, Proceedings of InterSpeech, 2009.
URL : https://hal.archives-ouvertes.fr/hal-01451534

K. Rao, F. Peng, H. Sak, and F. Beaufays, Grapheme-to-phoneme conversion using long short-term memory recurrent neural networks, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4225-4229, 2015.
DOI : 10.1109/icassp.2015.7178767

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.674.6326

K. Yao and G. Zweig, Sequence-to-sequence neural net models for grapheme-to-phoneme conversion, Proceedings of InterSpeech, 2015.

S. Brognaux, B. Picart, T. Drugman, and L. D., Speech synthesis in various communicative situations: Impact of pronunciation variations, Proceedings of InterSpeech, 2014.

R. Dall, S. Brognaux, K. Richmond, C. Valentini-Botinhao, G. E. Henter et al., Testing the consistency assumption: Pronunciation variant forced alignment in read and spontaneous speech synthesis, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5155-5159, 2016.
DOI : 10.1109/ICASSP.2016.7472660

D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate, International Conference on Learning Representations (ICLR), 2015.

O. Caglayan, M. García-Martínez, A. Bardet, W. Aransa, F. Bougares et al., NMTPY: A flexible toolkit for advanced neural machine translation systems, 2017.

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The Kaldi speech recognition toolkit, Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2011.

J. R. Novak, P. R. Dixon, N. Minematsu, K. Hirose, C. Hori et al., Improving WFST-based G2P conversion with alignment constraints and RNNLM N-best rescoring, Proceedings of InterSpeech, 2012.

J. R. Novak, N. Minematsu, and K. Hirose, Failure transitions for joint n-gram models and G2P conversion, Proceedings of InterSpeech, 2013.

A. Stolcke, SRILM: An extensible language modeling toolkit, Proceedings of InterSpeech, 2002.

A. Stolcke, J. Zheng, and W. Wang, SRILM at sixteen: Update and outlook, Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2011.