A phonetization approach for the forced-alignment task
Abstract
The phonetization of text corpora requires a sequence of processing steps and resources in order to convert a normalized text into its constituent phones, so that a given application can then exploit it directly. This paper presents a generic approach to text phonetization and concentrates on the phonetization of unknown words, with the aim of developing a phonetizer for forced-alignment applications. It is a dictionary-based approach designed to be as language-independent as possible: it has been applied to French, English, Vietnamese, Khmer, and Chinese (in Pinyin). The tool and its linked resources are distributed under the terms of the GPL license.
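To make the idea of dictionary-based phonetization concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the tool's implementation: the toy lexicon, the SAMPA-like phone strings, the `UNK` marker, the `|` separator between pronunciation variants, and the function names are all hypothetical. In particular, the fallback for unknown words (greedy longest-match segmentation into known entries, scanning left to right) is one plausible strategy chosen for illustration; the abstract does not specify the method actually used.

```python
from typing import Dict, List

# Hypothetical toy lexicon: word -> pronunciation variants (SAMPA-like phones).
TOY_LEXICON: Dict[str, List[str]] = {
    "the": ["D @", "D i:"],
    "black": ["b l { k"],
    "bird": ["b 3: d"],
}

UNK = "UNK"  # hypothetical marker for words that cannot be phonetized


def phonetize_unknown(word: str, lexicon: Dict[str, List[str]]) -> str:
    """Assumed fallback: greedy longest-match segmentation, left to right.

    Each matched segment contributes its first listed pronunciation.
    Returns UNK as soon as no known entry starts at the current position.
    """
    phones: List[str] = []
    start = 0
    while start < len(word):
        # Try the longest candidate first, then progressively shorter ones.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in lexicon:
                phones.append(lexicon[piece][0])
                start = end
                break
        else:
            return UNK
    return " ".join(phones)


def phonetize(tokens: List[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Phonetize normalized tokens; known words keep all variants, joined by '|'."""
    result: List[str] = []
    for token in tokens:
        key = token.lower()
        if key in lexicon:
            result.append("|".join(lexicon[key]))
        else:
            result.append(phonetize_unknown(key, lexicon))
    return result


if __name__ == "__main__":
    print(phonetize(["the", "blackbird", "xyz"], TOY_LEXICON))
    # prints ['D @|D i:', 'b l { k b 3: d', 'UNK']
```

Keeping every pronunciation variant for known words leaves the choice among them to a later stage, which in a forced-alignment setting could be the aligner itself; that division of labor is likewise an assumption of this sketch.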