Caractérisation et détection de parole spontanée dans de larges collections de documents audio

Vincent Jousse; Yannick Estève; Frédéric Béchet; Thierry Bazillon; Georges Linares

Conference paper, Year: 2008



Vincent Jousse

Role: Corresponding author
PersonId : 998326


Laboratoire d'Informatique de l'Université du Maine

Yannick Estève

Role: Author
PersonId : 11645
IdHAL : yannick-esteve
ORCID : 0000-0002-3656-8883
IdRef : 070531668

Laboratoire d'Informatique de l'Université du Maine

Frédéric Béchet

Role: Author
PersonId : 12253
IdHAL : frederic-bechet
IdRef : 070531730

Laboratoire Informatique d'Avignon

Thierry Bazillon

Role: Author
PersonId : 764399
IdRef : 152797750

Laboratoire d'Informatique de l'Université du Maine

Georges Linares

Role: Author
PersonId : 4977
IdHAL : georges-linares
IdRef : 079368794

Laboratoire Informatique d'Avignon

Abstract

Processing spontaneous speech is one of the many challenges that Automatic Speech Recognition (ASR) systems have to deal with. The main markers of spontaneous speech are disfluencies (filled pauses, repetitions, repairs and false starts), and many studies have focused on detecting and correcting them. In this study we define spontaneous speech as unprepared speech, as opposed to prepared speech, in which utterances contain well-formed sentences close to those found in written documents. Disfluencies are of course very good indicators of unprepared speech, but they are not the only ones: ungrammaticality and language register are also important, as are prosodic patterns. This paper proposes a set of acoustic and linguistic features that can be used to characterize and detect spontaneous speech segments in large audio databases. To better define this notion of unprepared speech, a set of speech segments forming an 11-hour corpus (French Broadcast News) has been manually labelled according to level of spontaneity. We present an evaluation of our features on this corpus and describe the correlation between the Word Error Rate (WER) obtained by a state-of-the-art ASR decoder on this Broadcast News (BN) corpus and the level of spontaneity.

Keywords

spontaneous speech characterization; spontaneous speech detection; automatic speech recognition

Domains

Computer Science [cs]

Contributor: bibliothèque Universitaire Déposants HAL-Avignon

https://hal.science/hal-01317613

Submitted on: Wednesday, 18 May 2016, 16:12:09

Last modified on: Friday, 24 March 2023, 14:53:02

Dates and versions

hal-01317613 , version 1 (18-05-2016)

Identifiers

HAL Id : hal-01317613 , version 1

Cite

Vincent Jousse, Yannick Estève, Frédéric Béchet, Thierry Bazillon, Georges Linares. Caractérisation et détection de parole spontanée dans de larges collections de documents audio. JEP, Jun 2008, Avignon, France. ⟨hal-01317613⟩


Collections

UNIV-AVIGNON CNRS UNIV-LEMANS LIUM LIUM-LST LIA

