Spoken WordCloud: Clustering Recurrent Patterns in Speech

Abstract: The automatic summarization of speech recordings is typically carried out as a two-step process: the speech is first decoded using an automatic speech recognition system, and the resulting text transcripts are then processed to create the summary. However, this approach might not be suitable under adverse acoustic conditions or for languages with limited training resources. To address these limitations, we propose in this paper an automatic speech summarization method based on the automatic discovery of patterns in the speech: recurrent acoustic patterns are first extracted from the audio and are then clustered and ranked according to the number of repetitions in the recording. This approach allows us to build what we call a "Spoken WordCloud" because of its similarity to text-based word clouds. We present an algorithm that achieves a cluster purity of up to 90% and an inverse purity of 71% in preliminary experiments using a small dataset of connected spoken words.
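
The ranking step and the two evaluation measures named in the abstract can be illustrated with a short sketch. It assumes the commonly used definitions of purity (each cluster is credited with its majority label) and inverse purity (each label is credited with the cluster that holds most of its items); the abstract does not spell out its formulas, so they may differ in detail. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def rank_clusters(cluster_ids):
    """Return (cluster_id, count) pairs, most repeated cluster first."""
    return Counter(cluster_ids).most_common()


def purity(cluster_ids, labels):
    """Fraction of items that carry the majority label of their cluster."""
    pair_counts = Counter(zip(cluster_ids, labels))
    best = {}  # cluster id -> size of its largest single-label group
    for (cluster, _label), count in pair_counts.items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(labels)


def inverse_purity(cluster_ids, labels):
    """Purity with the roles of clusters and reference labels swapped."""
    return purity(labels, cluster_ids)


# Toy data (assumed): six discovered patterns, their cluster assignments,
# and the reference word each pattern actually corresponds to.
cluster_ids = [0, 0, 0, 1, 1, 2]
labels = ["speech", "speech", "speech", "speech", "speech", "cloud"]

ranking = rank_clusters(cluster_ids)
p = purity(cluster_ids, labels)
ip = inverse_purity(cluster_ids, labels)
```

In this toy case every cluster contains a single word, so purity is perfect, but the word "speech" is split across two clusters, which lowers inverse purity. The two measures therefore capture complementary errors: mixing different words in one cluster versus scattering one word over several clusters.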
Document type:
Conference paper
International Workshop on Content-Based Multimedia Indexing, Jun 2011, Madrid, Spain. 2011

Cited literature [13 references]

https://hal.archives-ouvertes.fr/hal-00582799
Contributor: Rémi Flamary
Submitted on: Monday, April 4, 2011 - 10:41:42
Last modified on: Tuesday, October 3, 2017 - 14:52:10
Document(s) archived on: Thursday, November 8, 2012 - 13:15:28

File

flamary_cmbi2011.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00582799, version 1

Citation

Rémi Flamary, Xavier Anguera, Nuria Oliver. Spoken WordCloud: Clustering Recurrent Patterns in Speech. International Workshop on Content-Based Multimedia Indexing, Jun 2011, Madrid, Spain. 2011. 〈hal-00582799〉

Metrics

Record views: 157

Document downloads: 265