The ETAPE corpus for the evaluation of speech-based TV content processing in the French language

Guillaume Gravier 1 Gilles Adda 2 Niklas Paulson 3 Matthieu Carré 3 Aude Giraudel 4 Olivier Galibert 5
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
2 Traitement du Langage parlé
LIMSI - Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
Abstract : The paper presents a comprehensive overview of existing data for the evaluation of spoken content processing in a multimedia framework for the French language. We focus on the ETAPE corpus, which will be made publicly available by ELDA at the end of 2012, after completion of the evaluation, and recall existing resources resulting from previous evaluation campaigns. The ETAPE corpus consists of 30 hours of TV and radio broadcasts, selected to cover a wide variety of topics and speaking styles, emphasizing spontaneous speech and multiple speaker areas.
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-00712591
Contributor : Guillaume Gravier
Submitted on : Sunday, July 1, 2012 - 9:54:29 PM
Last modification on : Thursday, June 20, 2019 - 4:48:03 PM
Long-term archiving on : Tuesday, October 2, 2012 - 9:30:56 AM


Identifiers

  • HAL Id : hal-00712591, version 1

Citation

Guillaume Gravier, Gilles Adda, Niklas Paulson, Matthieu Carré, Aude Giraudel, et al. The ETAPE corpus for the evaluation of speech-based TV content processing in the French language. LREC - Eighth international conference on Language Resources and Evaluation, 2012, Turkey. ⟨hal-00712591⟩
