Optimizing the coverage of a speech database through a selection of representative speaker recordings

Sacha Krstulovic 1 Frédéric Bimbot 1 Olivier Boëffard 2 Delphine Charlet 3 Dominique Fohr 4 Odile Mella 4
1 METISS - Speech and sound data modeling and processing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
2 CORDIAL - Human-machine spoken dialogue
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, INRIA Rennes, ENSSAT - École Nationale Supérieure des Sciences Appliquées et de Technologie
4 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : In the context of the Neologos French speech database creation project, we have defined a general methodology for the selection of representative speaker recordings. The selection aims at ensuring good coverage of speaker variability while limiting the number of recorded speakers. This makes the resulting database both better suited to the development of recently proposed multi-model methods and cheaper to collect. The presented methodology performs the selection by optimizing a quality criterion defined in a variety of speaker similarity modeling frameworks. The selection can be performed and validated with respect to a single similarity criterion, using classical clustering methods such as Hierarchical or K-Medians clustering, or it can be performed and validated across several speaker similarity criteria, thanks to a newly developed clustering method called Focal Speakers Selection. In this framework, four different speaker similarity criteria are tested, and three different speaker clustering algorithms are compared. Results pertaining to the collection of the Neologos database are also discussed.
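The idea of selecting a small set of representative speakers by optimizing a quality criterion can be illustrated with a minimal sketch. The abstract does not state the criterion or the algorithms in detail, so everything below is an assumption: the quality criterion is taken to be the sum, over all speakers, of the dissimilarity to the closest selected speaker, and the search is a generic greedy initialisation followed by swap refinement in the k-medoids style. It is not the Hierarchical, K-Medians or Focal Speakers Selection procedure of the paper, and the names `coverage_cost` and `select_representatives` are hypothetical.

```python
from typing import List, Sequence


def coverage_cost(dist: Sequence[Sequence[float]], reps: Sequence[int]) -> float:
    """Assumed quality criterion: total distance from each speaker
    to its closest representative (lower means better coverage)."""
    return sum(min(row[r] for r in reps) for row in dist)


def select_representatives(
    dist: Sequence[Sequence[float]], k: int, max_iter: int = 100
) -> List[int]:
    """Pick k speaker indices with low coverage cost (heuristic, not exact)."""
    n = len(dist)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= number of speakers")

    # Greedy initialisation: add the speaker that lowers the cost the most.
    reps: List[int] = []
    for _ in range(k):
        candidates = [i for i in range(n) if i not in reps]
        reps.append(min(candidates, key=lambda i: coverage_cost(dist, reps + [i])))

    # Swap refinement: replace one representative at a time while it helps.
    best_cost = coverage_cost(dist, reps)
    for _ in range(max_iter):
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in reps:
                    continue
                trial = reps[:pos] + [cand] + reps[pos + 1:]
                cost = coverage_cost(dist, trial)
                if cost < best_cost:
                    reps, best_cost, improved = trial, cost, True
        if not improved:
            break
    return sorted(reps)


# Toy data (hypothetical): six "speakers" placed on a line, two obvious groups.
positions = [0, 1, 2, 50, 51, 52]
dist = [[abs(a - b) for b in positions] for a in positions]
reps = select_representatives(dist, 2)
```

In this toy case the pairwise dissimilarity is simply the absolute difference between positions; in practice the matrix would come from whichever speaker similarity model is being evaluated, and the same selection routine could be rerun once per criterion.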
Document type :
Journal articles

https://hal.archives-ouvertes.fr/hal-00110509
Contributor : Dominique Fohr
Submitted on : Monday, October 30, 2006 - 2:03:06 PM
Last modification on : Friday, November 16, 2018 - 1:25:25 AM

Identifiers

  • HAL Id : hal-00110509, version 1

Citation

Sacha Krstulovic, Frédéric Bimbot, Olivier Boëffard, Delphine Charlet, Dominique Fohr, et al. Optimizing the coverage of a speech database through a selection of representative speaker recordings. Speech Communication, Elsevier : North-Holland, 2006, 48, pp.1319-1348. ⟨hal-00110509⟩
