
Improving recognition of proper nouns (in ASR) through generation and filtering of phonetic transcriptions

Abstract: Accurate phonetic transcription of proper nouns can be an important resource for commercial applications that embed speech technologies, such as audio indexing and vocal phone directory lookup. However, an accurate phonetic transcription is more difficult to obtain for proper nouns than for regular words, because the phonetic transcription of a proper noun depends on both the origin of the speaker pronouncing it and the origin of the proper noun itself. This work proposes a method that extracts phonetic transcriptions of proper nouns from actual utterances of those proper nouns, thus yielding transcriptions based on practical use rather than on pronunciation rules alone. The proposed method first extracts phonetic transcriptions and then iteratively filters them. To initialize the process, an alignment dictionary is used to detect word boundaries. A rule-based grapheme-to-phoneme generator (LIA_PHON), a knowledge-based approach (JSM), and a Statistical Machine Translation based system were evaluated for this alignment. Compared to our reference dictionary (BDLEX supplemented by LIA_PHON for missing words) on the ESTER 1 French broadcast news corpus, the method significantly decreased the Word Error Rate (WER) on segments of speech containing proper nouns, without negatively affecting the WER on the rest of the corpus.
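The abstract reports its results as Word Error Rate. As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the number of reference words), the following may help; it is not the authors' evaluation tooling, and the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: a misrecognized proper noun and a dropped word.
    ref = "sylvain meignier spoke on the radio"
    hyp = "sylvain meunier spoke on radio"
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

Measuring this quantity separately on segments that contain proper nouns and on the remaining segments corresponds to the two comparisons the abstract describes.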
Document type: Journal articles
Contributor: Sylvain Meignier
Submitted on: Wednesday, March 22, 2017 - 5:18:49 PM
Last modification on: Tuesday, March 28, 2017 - 1:05:28 AM
Long-term archiving on: Friday, June 23, 2017 - 12:27:55 PM


Files produced by the author(s)




Antoine Laurent, Sylvain Meignier, Paul Deléglise. Improving recognition of proper nouns (in ASR) through generation and filtering of phonetic transcriptions. Computer Speech and Language, Elsevier, 2014, 28 (4), pp.979-996. ⟨10.1016/j.csl.2014.02.006⟩. ⟨hal-01433238⟩


