Journal articles

Improving recognition of proper nouns (in ASR) through generation and filtering of phonetic transcriptions

Abstract: Accurate phonetic transcription of proper nouns can be an important resource for commercial applications that embed speech technologies, such as audio indexing and vocal phone directory lookup. However, an accurate phonetic transcription is more difficult to obtain for proper nouns than for regular words, because the pronunciation of a proper noun depends on both the origin of the speaker and the origin of the proper noun itself. This work proposes a method that extracts phonetic transcriptions of proper nouns from actual utterances of those proper nouns, thus yielding transcriptions based on practical use rather than on pronunciation rules alone. The method first extracts phonetic transcriptions and then iteratively filters them. To initialize the process, an alignment dictionary is used to detect word boundaries. A rule-based grapheme-to-phoneme generator (LIA_PHON), a knowledge-based approach (JSM), and a statistical machine translation (SMT) based system were evaluated for this alignment. Compared to our reference dictionary (BDLEX supplemented by LIA_PHON for missing words) on the ESTER 1 French broadcast news corpus, the method significantly decreased the Word Error Rate (WER) on segments of speech containing proper nouns, without negatively affecting the WER on the rest of the corpus.
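
For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how a Word Error Rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the article, and the scoring tool actually used by the authors is not specified here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; whitespace tokenization is an
    assumption and real ASR scoring usually normalizes text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one misrecognized proper noun out of four words.
wer = word_error_rate("le president sarkozy parle", "le president sarkosi parle")
```

In this invented example a single misrecognized proper noun accounts for the whole error, which is why the abstract reports WER separately on segments containing proper nouns and on the rest of the corpus.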

https://hal.archives-ouvertes.fr/hal-01433238
Contributor: Sylvain Meignier
Submitted on: Wednesday, March 22, 2017 - 5:18:49 PM
Last modification on: Tuesday, March 28, 2017 - 1:05:28 AM
Document(s) archived on: Friday, June 23, 2017 - 12:27:55 PM

File

CSL_antoine.pdf
Files produced by the author(s)

Citation

Antoine Laurent, Sylvain Meignier, Paul Deléglise. Improving recognition of proper nouns (in ASR) through generation and filtering of phonetic transcriptions. Computer Speech and Language, Elsevier, 2014, 28 (4), pp.979-996. ⟨10.1016/j.csl.2014.02.006⟩. ⟨hal-01433238⟩
