Integrating imperfect transcripts into speech recognition systems for building high-quality corpora

Benjamin Lecouteux; Georges Linares; Stanislas Oger

Journal article, Computer Speech and Language, Year: 2012


Benjamin Lecouteux

Role: Author
PersonId: 7847
IdHAL: benjamin-lecouteux
ORCID: 0000-0003-3000-6190
IdRef: 135355060

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Georges Linares

Role: Author
PersonId: 4977
IdHAL: georges-linares
IdRef: 079368794

Laboratoire Informatique d'Avignon

Stanislas Oger

Role: Author
PersonId: 770872
IdRef: 176527176

Laboratoire Informatique d'Avignon

Abstract

Training state-of-the-art automatic speech recognition (ASR) systems requires very large, relevant training corpora. The cost of such databases is high and remains a major limitation for the development of speech-enabled applications in particular contexts (e.g., low-density languages or specialized domains). On the other hand, a large amount of data can be found in news prompts, movie subtitles or scripts, etc. Using such data as a training corpus could provide a low-cost solution to the acoustic model estimation problem. Unfortunately, prior transcripts are seldom exact with respect to the content of the speech signal, and they lack temporal information. This paper tackles the issue of improving prompt-based speech corpora by addressing the problems mentioned above. We propose a method that locates accurate transcript segments in speech signals and automatically corrects erroneous or missing transcript material surrounding these segments. This method relies on a new decoding strategy in which the search algorithm is driven by the imperfect transcription of the input utterances. The experiments are conducted on the French language, using the ESTER database and a set of recordings (and associated prompts) from RTBF (Radio Télévision Belge Francophone). The results demonstrate the effectiveness of the proposed approach, in terms of both error correction and text-to-speech alignment.
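
The idea of locating reliable transcript segments can be illustrated with a minimal Python sketch. It is not the driven-decoding strategy proposed in the paper: it only aligns the words of an imperfect prompt against a recognizer hypothesis with the standard-library `difflib.SequenceMatcher` and keeps long runs of agreement as candidate anchors. The function name `find_anchor_segments`, the `min_length` threshold, and the sample sentences are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def find_anchor_segments(prompt_words, hypothesis_words, min_length=3):
    """Return (prompt_start, hypothesis_start, length) for each run of
    at least `min_length` consecutive words shared by both sequences.

    Illustrative only: a plain word-level alignment, not the
    transcript-driven decoding described in the abstract.
    """
    # autojunk=False disables difflib's "popular element" heuristic,
    # which could otherwise ignore frequent words in long inputs.
    matcher = SequenceMatcher(a=prompt_words, b=hypothesis_words, autojunk=False)
    anchors = []
    for block in matcher.get_matching_blocks():
        # The final block is a zero-length sentinel; the size check skips it.
        if block.size >= min_length:
            anchors.append((block.a, block.b, block.size))
    return anchors


# Hypothetical prompt (imperfect transcript) and ASR hypothesis.
prompt = "the president arrived in paris this morning for talks".split()
hypothesis = "uh the president arrived in paris on this morning for the talks".split()

for p_start, h_start, length in find_anchor_segments(prompt, hypothesis):
    print(p_start, h_start, " ".join(prompt[p_start:p_start + length]))
```

In this sketch, the regions between anchors are where the prompt and the hypothesis disagree; under the approach summarized above, those are the regions where transcript errors or omissions would need to be corrected.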

Keywords

Speech processing; acoustic model training; text-to-speech alignment

Domains

Artificial Intelligence [cs.AI]

Main file

LowCostCorpus.pdf (659.99 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-00953675

Submitted on: Thursday, 9 November 2017, 09:36:06

Last modified on: Thursday, 4 April 2024, 18:26:21

Long-term archiving on: Saturday, 10 February 2018, 12:32:14

Dates and versions

hal-00953675, version 1 (09-11-2017)

Identifiers

HAL Id: hal-00953675, version 1

Cite

Benjamin Lecouteux, Georges Linares, Stanislas Oger. Integrating imperfect transcripts into speech recognition systems for building high-quality corpora. Computer Speech and Language, 2012, 26 (2), pp. 67-89. ⟨hal-00953675⟩


Collections

UNIV-AVIGNON UGA CNRS LIG LIG_TDCGE LIG_TDCGE_GETALP LIA LIG_SIDCH

225 views

564 downloads
