Journal article, Computer Speech and Language, Year: 2014

Improving recognition of proper nouns (in ASR) through generation and filtering of phonetic transcriptions

Abstract

Accurate phonetic transcriptions of proper nouns can be an important resource for commercial applications that embed speech technologies, such as audio indexing and voice-driven phone directory lookup. However, an accurate phonetic transcription is harder to obtain for proper nouns than for regular words, because the pronunciation of a proper noun depends on both the origin of the speaker and the origin of the proper noun itself. This work proposes a method that extracts phonetic transcriptions of proper nouns from actual utterances of those proper nouns, yielding transcriptions based on practical use rather than on pronunciation rules alone. The method first extracts phonetic transcriptions and then iteratively filters them. To initialize the process, an alignment dictionary is used to detect word boundaries. A rule-based grapheme-to-phoneme generator (LIA_PHON), a knowledge-based approach (JSM), and a system based on Statistical Machine Translation were evaluated for this alignment. Compared to our reference dictionary (BDLEX supplemented by LIA_PHON for missing words) on the ESTER 1 French broadcast news corpus, the method significantly decreased the Word Error Rate (WER) on speech segments containing proper nouns, without negatively affecting the WER on the rest of the corpus.
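The evaluation metric named in the abstract, WER, is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used in the paper; the function name `word_error_rate` and the toy sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name), not the paper's scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Toy data (invented): both words of a proper noun are misrecognized,
# giving 2 substitutions over 4 reference words.
print(word_error_rate("call jean dupont now", "call john dupond now"))  # 0.5
```

This sketch scores a single segment. A corpus-level WER is normally pooled, that is, total errors divided by total reference words over all segments, rather than averaged per segment.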
Main file
CSL_antoine.pdf (2.09 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01433238, version 1 (22-03-2017)

Cite

Antoine Laurent, Sylvain Meignier, Paul Deléglise. Improving recognition of proper nouns (in ASR) through generation and filtering of phonetic transcriptions. Computer Speech and Language, 2014, 28 (4), pp.979-996. ⟨10.1016/j.csl.2014.02.006⟩. ⟨hal-01433238⟩