Audio thumbnails for spoken content without transcription based on a maximum motif coverage criterion

Guillaume Gravier 1, Nathan Souviraà-Labastie 2, Sébastien Campion 1, Frédéric Bimbot 2
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
2 PANAMA - Parcimonie et Nouveaux Algorithmes pour le Signal et la Modélisation Audio
IRISA-D5 - SIGNAUX ET IMAGES NUMÉRIQUES, ROBOTIQUE, Inria Rennes – Bretagne Atlantique
Abstract : The paper presents a system to create audio thumbnails of spoken content, i.e., short audio summaries representative of the entire content, without resorting to a lexical representation. As an alternative to searching for relevant words and phrases in a transcript, unsupervised motif discovery is used to find short, word-like, repeating fragments at the signal level without acoustic models. The output of the word discovery algorithm is exploited via a maximum motif coverage criterion to generate a thumbnail in an extractive manner. A limited number of relevant segments are chosen within the data so as to include the maximum number of motifs while remaining short enough and intelligible. Evaluation is performed on broadcast news reports with a panel of human listeners judging the quality of the thumbnails. Results indicate that motif-based thumbnails stand between random thumbnails and ASR-based keywords, but remain far behind human-authored thumbnails and keywords.
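
The selection step described in the abstract (pick a few segments that together contain as many motifs as possible while staying short) can be illustrated with a greedy maximum-coverage heuristic. The sketch below is an assumption for illustration only, not the authors' implementation: the function name select_segments, the (id, duration, motif set) input format, and the two budget parameters are hypothetical, and the paper may use a different optimization strategy.

```python
def select_segments(segments, max_segments, max_duration):
    """Greedy maximum-coverage selection (illustrative sketch, not the paper's code).

    segments: list of (segment_id, duration_seconds, set_of_motif_ids)
    max_segments: upper bound on the number of segments in the thumbnail
    max_duration: upper bound on the total thumbnail duration, in seconds
    Returns (chosen_ids, covered_motifs, total_duration).
    """
    chosen = []
    covered = set()
    total = 0.0
    remaining = list(segments)

    while remaining and len(chosen) < max_segments:
        best = None
        best_gain = 0
        for seg in remaining:
            _, duration, motifs = seg
            # Skip segments that would exceed the duration budget.
            if total + duration > max_duration:
                continue
            # Gain = number of motifs not yet covered by the thumbnail.
            gain = len(motifs - covered)
            if gain > best_gain:
                best, best_gain = seg, gain
        if best is None:
            # No admissible segment adds a new motif: stop early.
            break
        chosen.append(best[0])
        covered |= best[2]
        total += best[1]
        remaining.remove(best)

    return chosen, covered, total
```

At each step, the heuristic adds the segment that contributes the most uncovered motifs and stops when no admissible segment adds anything new. Greedy selection is a common approximation for coverage problems. Whether it matches the criterion used in the paper is not stated on this page.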
Document type :
Conference papers

Cited literature [18 references]

https://hal.archives-ouvertes.fr/hal-01026402
Contributor : Guillaume Gravier
Submitted on : Monday, July 21, 2014 - 3:30:20 PM
Last modification on : Thursday, November 15, 2018 - 11:58:45 AM
Long-term archiving on : Monday, November 24, 2014 - 9:21:32 PM

File

gravier-is14.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01026402, version 1

Citation

Guillaume Gravier, Nathan Souviraà-Labastie, Sébastien Campion, Frédéric Bimbot. Audio thumbnails for spoken content without transcription based on a maximum motif coverage criterion. Annual Conference of the International Speech Communication Association, Sep 2014, Singapore. ⟨hal-01026402⟩
