Audio thumbnails for spoken content without transcription based on a maximum motif coverage criterion

Guillaume Gravier; Nathan Souviraà-Labastie; Sébastien Campion; Frédéric Bimbot

Conference paper. Year: 2014

Guillaume Gravier

Role: Author
PersonId : 1046
IdHAL : guig
ORCID : 0000-0002-2266-5682
IdRef : 110355415

Multimedia content-based indexing

Nathan Souviraà-Labastie

Role: Author
PersonId : 907835

Parcimonie et Nouveaux Algorithmes pour le Signal et la Modélisation Audio

Sébastien Campion

Role: Author
PersonId : 899001

Multimedia content-based indexing

Frédéric Bimbot

Role: Author
PersonId : 830967

Parcimonie et Nouveaux Algorithmes pour le Signal et la Modélisation Audio

Abstract

The paper presents a system to create audio thumbnails of spoken content, i.e., short audio summaries representative of the entire content, without resorting to a lexical representation. As an alternative to searching for relevant words and phrases in a transcript, unsupervised motif discovery is used to find short, word-like, repeating fragments at the signal level without acoustic models. The output of the word discovery algorithm is exploited via a maximum motif coverage criterion to generate a thumbnail in an extractive manner. A limited number of relevant segments are chosen within the data so as to include the maximum number of motifs while remaining short enough and intelligible. Evaluation is performed on broadcast news reports with a panel of human listeners judging the quality of the thumbnails. Results indicate that motif-based thumbnails rank between random thumbnails and ASR-based keywords, but remain far behind human-authored thumbnails and keywords.
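The abstract states the selection criterion but not the algorithm used to optimize it. As a minimal sketch only, assuming each candidate segment is described by the set of motif identifiers it contains, a standard greedy heuristic for maximum coverage could look like the following. The function name `greedy_motif_coverage`, the segment names, and the motif identifiers are hypothetical, and the duration and intelligibility constraints mentioned above are reduced here to a simple cap on the number of segments.

```python
from typing import Dict, List, Set


def greedy_motif_coverage(segments: Dict[str, Set[int]], max_segments: int) -> List[str]:
    """Greedily pick up to max_segments segments covering as many motifs as possible.

    This is a generic maximum-coverage heuristic, not the paper's method.
    """
    covered: Set[int] = set()
    chosen: List[str] = []
    remaining = dict(segments)
    while remaining and len(chosen) < max_segments:
        # Pick the segment adding the most uncovered motifs;
        # iterating in sorted order breaks ties by segment name.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining segment adds a new motif
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen


# Hypothetical toy data: segment name -> motif identifiers occurring in it.
example = {
    "seg_a": {1, 2, 3},
    "seg_b": {3, 4},
    "seg_c": {4, 5, 6},
    "seg_d": {1, 6},
}
print(greedy_motif_coverage(example, 2))  # ['seg_a', 'seg_c']
```

In this toy example two segments suffice to cover all six motifs, so raising the cap does not add further segments.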

Keywords

spoken content processing; audio mining; motif discovery; summarization; thumbnailing

Domains

Multimedia [cs.MM]

Main file

gravier-is14.pdf (203.24 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-01026402

Submitted on: Monday, 21 July 2014, 15:30:20

Last modified on: Friday, 24 March 2023, 14:52:59

Long-term archiving on: Monday, 24 November 2014, 21:21:32

Dates and versions

hal-01026402 , version 1 (21-07-2014)

Identifiers

HAL Id : hal-01026402 , version 1

Cite

Guillaume Gravier, Nathan Souviraà-Labastie, Sébastien Campion, Frédéric Bimbot. Audio thumbnails for spoken content without transcription based on a maximum motif coverage criterion. Annual Conference of the International Speech Communication Association, Sep 2014, Singapore, Singapore. ⟨hal-01026402⟩

Collections

INSTITUT-TELECOM EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA IRISA-D5 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES INSA-GROUPE UR1-MATH-NUM
