Personal bibliography
Acoustic-dependent Phonemic Transcription for Text-to-speech Synthesis, 2018.
URL: https://hal.archives-ouvertes.fr/hal-01870866
Transcription phonétique automatique pour la synthèse de la parole, XXXIIe Journées d'Etudes sur la Parole (JEP 2018), 2018.
Error detection of grapheme-to-phoneme conversion in text-to-speech synthesis using speech signal and lexical context, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2017.
Détection des erreurs de phonétisation pour la synthèse de parole, 2017.
LIUM ASR systems for the 2016 Multi-Genre Broadcast Arabic Challenge, IEEE Workshop on Spoken Language Technology (SLT), 2016.
URL: https://hal.archives-ouvertes.fr/hal-01433188
Development of the MIT ASR system for the 2016 Arabic multi-genre broadcast challenge, IEEE Spoken Language Technology Workshop (SLT), pp.299-304, 2016.
Speech recognition challenge in the wild: Arabic MGB-3, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp.316-322, 2017.
A complete Kaldi recipe for building Arabic speech recognition systems, IEEE Spoken Language Technology Workshop (SLT), pp.525-529, 2014.
The MGB-2 challenge: Arabic multi-dialect broadcast media recognition, IEEE Spoken Language Technology Workshop (SLT), pp.279-284, 2016.
Arabic phonetic dictionaries for speech recognition, Journal of Information Technology Research (JITR), vol.2, issue.4, pp.67-80, 2009.
Towards Turkish ASR: Anatomy of a rule-based Turkish g2p, arXiv preprint, 2016.
Deep speech 2: End-to-end speech recognition in English and Mandarin, International Conference on Machine Learning (ICML), pp.173-182, 2016.
IARPA Babel Turkish Language Pack, LDC2016S10, web download, Philadelphia: Linguistic Data Consortium, 2016.
Deep voice: Real-time neural text-to-speech, Proceedings of the 34th International Conference on Machine Learning (ICML), pp.195-204, 2017.
Turkish broadcast news transcription and retrieval, Transactions on Audio, Speech, and Language Processing, vol.17, pp.874-883, 2009.
Neural machine translation by jointly learning to align and translate, 2014.
LIA-PHON : Un système complet de phonétisation de textes, Traitement automatique des langues (TAL), vol.42, issue.1, pp.47-67, 2001.
Perceptual objective listening quality assessment (POLQA), the third generation ITU-T standard for end-to-end speech quality measurement part I-Temporal alignment, Journal of the Audio Engineering Society, vol.61, pp.366-384, 2013.
Visible Speech: The science of Universal alphabetics, London: Simpkin, 1867.
The MGB challenge: Evaluating multi-genre broadcast media recognition, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.687-693, 2015.
Word embeddings for speech recognition, 2014.
A neural probabilistic language model, Journal of machine learning research, vol.3, issue.2, pp.1137-1155, 2003.
Improving the Arabic pronunciation dictionary for phone and word recognition with linguistically-based pronunciation rules, Proceedings of human language technologies: The 2009 annual conference of the North American chapter of the association for computational linguistics, pp.397-405, 2009.
A multilingual text normalization approach, Language and Technology Conference, pp.515-526, 2011.
Joint-sequence models for grapheme-to-phoneme conversion, Speech communication, vol.50, issue.5, pp.434-451, 2008.
The Blizzard Challenge-2005: Evaluating corpus-based speech synthesis on common datasets, Ninth European Conference on Speech Communication and Technology, 2005.
CHATR: a generic speech synthesis system, Proceedings of the 15th conference on Computational linguistics, vol.2, pp.983-986, 1994.
Multilayer perceptrons and automatic speech recognition, Proceedings of the First International Conference on Neural Networks, vol.4, pp.407-416, 1987.
Speech synthesis in various communicative situations: Impact of pronunciation variations, 2014.
Arabic transliteration, 2002.
Nmtpy: A flexible toolkit for advanced neural machine translation systems, The Prague Bulletin of Mathematical Linguistics, vol.109, issue.1, pp.15-28, 2017.
La parole et son traitement automatique, 1989.
A syllable-based Turkish speech recognition system by using time delay neural networks (TDNNs), International Conference on Soft Computing and Pattern Recognition (SoCPaR), pp.219-224, 2013.
Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, International Conference on Acoustics, Speech and Signal Processing, pp.4960-4964, 2016.
An empirical study of smoothing techniques for language modeling, Computer Speech & Language, vol.13, issue.4, pp.359-394, 1999.
How to compare TTS systems: A new subjective evaluation methodology focused on differences, 2015.
Learning phrase representations using RNN encoder-decoder for statistical machine translation, 2014.
A Freely Available Morphological Analyzer for Turkish, LREC, vol.2, pp.19-28, 2010.
A set of open source tools for Turkish natural language processing, LREC, pp.1079-1086, 2014.
Approximations by superpositions of sigmoidal functions, Mathematics of Control, Signals, and Systems, pp.303-314, 1989.
Testing the consistency assumption: Pronunciation variant forced alignment in read and spontaneous speech synthesis, International Conference on Acoustics, Speech and Signal Processing, pp.5155-5159, 2016.
Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE transactions on acoustics, speech, and signal processing, vol.28, issue.4, pp.357-366, 1980.
Traité de la formation mécanique des langues et des principes physiques de l'étymologie, 1765.
Les Dix Intonations de base du français, The French Review, vol.40, pp.1-14, 1966.
Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, vol.39, pp.1-22, 1977.
Interpréter la prosodie, XXIIe Journées d'Etudes sur la Parole (JEP), 2000.
Speaker adaptation using constrained estimation of Gaussian mixtures, IEEE Transactions on speech and Audio Processing, vol.3, pp.357-366, 1995.
Terminal analog synthesis of continuous speech using the diphone method of segment assembly, IEEE transactions on Audio and Electroacoustics, vol.16, pp.40-50, 1968.
Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, vol.12, pp.2121-2159, 2011.
Finding structure in time, Cognitive science, vol.14, issue.2, pp.179-211, 1990.
The CSTR entry to the 2018 Blizzard Challenge, Blizzard Challenge Workshop, 2018.
Acoustic theory of speech production. 2, 1970.
Voices of men and machines, The Journal of the Acoustical Society of America, vol.51, issue.5, pp.1375-1387, 1972.
Des fonctions de l'intonation : Essai de synthèse, Flambeau, vol.29, pp.1-20, 2003.
Maximum likelihood linear transformations for HMM-based speech recognition, Computer speech & language, vol.12, issue.2, pp.75-98, 1998.
Bi-directional conversion between graphemes and phonemes using a joint n-gram model, 4th ISCA Tutorial and Research Workshop (ITRW) on Speech Synthesis, 2001.
Pronunciation of proper names with a joint n-gram model for bi-directional grapheme-to-phoneme conversion, Seventh International Conference on Spoken Language Processing, 2002.
Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, IEEE transactions on speech and audio processing, vol.2, pp.291-298, 1994.
Etude sur les représentations continues de mots appliquées à la détection automatique des erreurs de reconnaissance de la parole, 2017.
Deep voice 2: Multi-speaker neural text-to-speech, Advances in neural information processing systems, pp.2962-2970, 2017.
Deep Learning, 2016.
Minimally supervised number normalization, Transactions of the Association for Computational Linguistics, vol.4, pp.507-519, 2016.
Towards end-to-end speech recognition with recurrent neural networks, International Conference on Machine Learning (ICML), pp.1764-1772, 2014.
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, Proceedings of the 23rd international conference on Machine learning, pp.369-376, 2006.
Euronews: a multilingual speech corpus for ASR, LREC, pp.2635-2638, 2014.
Computer processing of Turkish: Morphological and lexical investigation, 1995.
DECtalk software: Text-to-speech technology and implementation, Digital Technical Journal, pp.5-19, 1995.
Lexical normalisation of short text messages: Makn sens a #twitter, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.368-378, 2011.
A study of the building blocks in speech, The Journal of the Acoustical Society of America, vol.25, pp.962-969, 1953.
Perceptual linear predictive (PLP) analysis of speech, Journal of the Acoustical Society of America, vol.87, pp.1738-1752, 1990.
TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation, International Conference on Speech and Computer, pp.198-208, 2018.
Deep neural networks for acoustic modeling in speech recognition, IEEE Signal processing magazine, vol.29, 2012.
Long short-term memory, Neural computation, vol.9, issue.8, pp.1735-1780, 1997.
Approximation Capabilities of Multilayer Feedforward Networks, Neural Networks, pp.251-257, 1991.
Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of physiology, vol.160, pp.106-154, 1962.
Unit selection in a concatenative speech synthesis system using a large speech database, International Conference on Acoustics, Speech, and Signal Processing Conference (ICASSP), vol.1, IEEE, pp.373-376, 1996.
Continuous speech recognition by statistical methods, Proceedings of the IEEE, vol.64, issue.4, pp.532-556, 1976.
The USTC system for Blizzard Challenge, 2018.
Attractor dynamics and parallelism in a connectionist sequential machine, Proc. of the Eighth Annual Conference of the Cognitive Science Society, 1986.
Efficient neural audio synthesis, 2018.
Regular models of phonological rule systems, Computational linguistics, vol.20, issue.3, pp.331-378, 1994.
Estimation of probabilities from sparse data for the language model component of a speech recognizer, IEEE transactions on acoustics, speech, and signal processing, vol.35, pp.400-401, 1987.
QCRI advanced transcription system (QATS) for the Arabic multi-dialect broadcast media recognition: MGB-2 challenge, IEEE Spoken Language Technology Workshop (SLT), pp.292-298, 2016.
The Blizzard Challenge, Blizzard Challenge Workshop, 2018.
Adam: A method for stochastic optimization, 2014.
The Klattalk text-to-speech conversion system, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol.7, IEEE, pp.1589-1592, 1982.
Improved backing-off for m-gram language modeling, International Conference on Acoustics, Speech, and Signal Processing, 1995.
Moses: Open source toolkit for statistical machine translation, Proceedings of the 45th annual meeting of the association for computational linguistics, pp.177-180, 2007.
Grapheme to phoneme conversion using an SMT system, 2009.
Convolutional networks and applications in vision, Proceedings of IEEE International Symposium on Circuits and Systems, pp.253-256, 2010.
Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, Computer speech & language, vol.9, issue.2, pp.171-185, 1995.
Précis de phonostylistique : parole et expressivité, 1993.
A broad-coverage normalization system for social media language, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, vol.1, pp.1035-1044, 2012.
A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition, 2015.
Speech synthesis using HMMs with dynamic features, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol.1, IEEE, pp.389-392, 1996.
Articulatory model for the study of speech production, The Journal of the Acoustical Society of America, vol.53, pp.1070-1082, 1973.
EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.167-174, 2015.
Recurrent neural network based language model, 2010.
Efficient estimation of word representations in vector space, 2013.
Système de prononciation figurée applicable à toutes les langues, 1785.
Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones, Speech communication, vol.9, pp.453-467, 1990.
Minimally supervised written-to-spoken text normalization, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.665-670, 2017.
Lexical and phonetic modeling for Arabic automatic speech recognition, 2009.
Failure transitions for joint n-gram models and G2P conversion, Interspeech, pp.1821-1825, 2013.
Improving WFST-based G2P conversion with alignment constraints and RNNLM N-best rescoring, 2012.
Two-level description of Turkish morphology, Literary and Linguistic computing, vol.9, issue.2, pp.137-148, 1994.
A finite state pronunciation lexicon for Turkish, Proceedings of the EACL Workshop on Finite State Methods in NLP, vol.82, pp.900-918, 2003.
The architecture and the implementation of a finite state pronunciation lexicon for Turkish, Computer Speech & Language, vol.20, pp.80-106, 2006.
Wavenet: A generative model for raw audio, arXiv preprint, 2016.
The history of automatic speech recognition evaluations at NIST, 2009.
Madamira: A fast, comprehensive tool for morphological analysis and disambiguation of Arabic, LREC, vol.14, pp.1094-1101, 2014.
A time delay neural network architecture for efficient modeling of long temporal contexts, 2015.
A character-level machine translation approach for normalization of SMS abbreviations, Proceedings of 5th International Joint Conference on Natural Language Processing, pp.974-982, 2011.
Glove: Global vectors for word representation, Proceedings of the 2014 conference on empirical methods in natural language processing, pp.1532-1543, 2014.
Segmentation techniques in speech synthesis, The Journal of the Acoustical Society of America, vol.30, pp.739-742, 1958.
Deep voice 3: Scaling text-to-speech with convolutional sequence learning, 2017.
fMPE: Discriminatively trained features for speech recognition, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol.1, IEEE, p.961, 2005.
The Kaldi speech recognition toolkit, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), IEEE Signal Processing Society, 2011.
Purely sequence-trained neural networks for ASR based on lattice-free MMI, Interspeech, pp.2751-2755, 2016.
A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, vol.77, issue.2, pp.257-286, 1989.
Grapheme-to-phoneme conversion using long short-term memory recurrent neural networks, International Conference on Acoustics, Speech and Signal Processing, pp.4225-4229, 2015.
Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs, International Conference on Acoustics, Speech, and Signal Processing, pp.749-752, 2001.
TED-LIUM: an Automatic Speech Recognition dedicated corpus, LREC, pp.125-129, 2012.
URL: https://hal.archives-ouvertes.fr/hal-01434928
Enhancing the TED-LIUM corpus with selected data for language modeling and more TED talks, pp.3935-3939, 2014.
Speech synthesis by rule using an optimal selection of nonuniform synthesis units, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.679-682, 1988.
Turkish language resources: Morphological parser, morphological disambiguator and web corpus, International Conference on Natural Language Processing, pp.417-427, 2008.
Turkish Broadcast News Speech and Transcripts, web download, Philadelphia: Linguistic Data Consortium, 2012.
Text normalization with varied data sources for conversational speech language modeling, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol.1, IEEE, p.789, 2002.
L'histoire des alphabets phonétiques du XVIIIe siècle jusqu'à l'API, XXXIIe Journées d'Etudes sur la Parole (JEP), 2018.
Continuous space language models, Computer Speech & Language, vol.21, pp.492-518, 2007.
Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions, International Conference on Acoustics, Speech and Signal Processing, pp.4779-4783, 2018.
Byte Pair encoding: A text compression scheme that accelerates pattern matching, Technical report, 1999.
Advances in Arabic speech transcription at IBM under the DARPA GALE program, IEEE Transactions on Audio, Speech, and Language processing, vol.17, pp.884-894, 2009.
Normalization of non-standard words, Computer speech & language, vol.15, issue.3, pp.287-333, 2001.
A scale for the measurement of the psychological magnitude pitch, The Journal of the Acoustical Society of America, vol.8, pp.185-190, 1937.
SRILM - an extensible language modeling toolkit, Seventh International Conference on Spoken Language Processing, 2002.
Advances in simulation of sentence-level speech production with kinematic models of the vocal tract and vocal folds, The Journal of the Acoustical Society of America, vol.126, pp.2205-2205, 2009.
TubeTalker: An airway modulation model of human sound production, Proceedings of the First International Workshop on Performative Speech and Singing Synthesis, P3S, pp.1-8, 2011.
Speech parameter generation from HMM using dynamic features, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol.1, IEEE, pp.660-663, 1995.
Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processing, 2014.
On the Use of Gaussian Mixture Model Framework to Improve Speaker Adaptation of Deep Neural Network Acoustic Models, Interspeech, pp.3788-3792, 2016.
LIUM ASR systems for the 2016 Multi-Genre Broadcast Arabic challenge, IEEE Spoken Language Technology Workshop (SLT), pp.285-291, 2016.
Techniques for noise robustness in automatic speech recognition, 2012.
Phoneme recognition using time-delay neural networks, IEEE transactions on acoustics, speech, and signal processing, vol.37, pp.328-339, 1989.
Tacotron: Towards end-to-end speech synthesis, 2017.
The NDSC transcription system for the 2016 multi-genre broadcast challenge, IEEE Spoken Language Technology Workshop (SLT), pp.273-278, 2016.
A log-linear model for unsupervised text normalization, Empirical Methods in Natural Language Processing Conference (EMNLP), pp.61-72, 2013.
Sequence-to-sequence neural net models for grapheme-to-phoneme conversion, 2015.
Tree-based state tying for high accuracy acoustic modelling, Proceedings of the workshop on Human Language Technology, pp.307-312, 1994.
ADADELTA: an adaptive learning rate method, 2012.
The HMM-based speech synthesis system (HTS) version 2.0, pp.294-299, 2007.
Acoustic Data-Driven Lexicon Learning Based on a Greedy Pronunciation Selection Framework, Proc. Interspeech, pp.2541-2545, 2017.