Audio word similarity for clustering with zero resources based on iterative HMM classification

Amélie Royer 1 Guillaume Gravier 1 Vincent Claveau 1
1 LinkMedia - Creating and exploiting explicit links between multimedia fragments
IRISA-D6 - MEDIA ET INTERACTIONS, Inria Rennes – Bretagne Atlantique
Abstract: Recent work on zero resource word discovery makes intensive use of audio fragment clustering to find repeating speech patterns. In the absence of acoustic models, the clustering step traditionally relies on dynamic time warping (DTW) to compare two samples and thus suffers from the known limitations of this technique. We propose a new sample comparison method, called 'similarity by iterative classification', that exploits the modeling capacities of hidden Markov models (HMM) with no supervision. The core idea relies on the use of HMMs trained on randomly labeled data and exploits the fact that similar samples are more likely to be classified together by a large number of random classifiers than dissimilar ones. The resulting similarity measure is compared to DTW on two tasks, namely nearest neighbor retrieval and clustering, showing that the generalization capabilities of probabilistic machine learning significantly benefit audio word comparison and overcome many of the limitations of DTW-based comparison.
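The co-classification idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: to stay self-contained it swaps the HMMs for a nearest-centroid classifier over fixed-length feature vectors, whereas the paper works with variable-length audio samples. The function names, the parameters (`n_rounds`, `n_classes`, `seed`) and the toy data are all assumptions made for illustration.

```python
import random


def fit_and_reclassify(samples, labels, n_classes):
    """Fit one centroid per random class, then reclassify every sample.

    Stand-in for "train one HMM per random label, then decode":
    a nearest-centroid classifier is used here instead of HMMs.
    """
    dim = len(samples[0])
    centroids = []
    for c in range(n_classes):
        members = [s for s, lab in zip(samples, labels) if lab == c]
        if not members:
            centroids.append(None)  # this class received no sample in this round
            continue
        centroids.append(
            [sum(m[d] for m in members) / len(members) for d in range(dim)]
        )

    def predict(sample):
        best_class, best_dist = None, float("inf")
        for c, centroid in enumerate(centroids):
            if centroid is None:
                continue
            dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
            if dist < best_dist:
                best_class, best_dist = c, dist
        return best_class

    return [predict(s) for s in samples]


def iterative_classification_similarity(samples, n_rounds=200, n_classes=3, seed=0):
    """Fraction of rounds in which two samples receive the same predicted class."""
    rng = random.Random(seed)
    n = len(samples)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_rounds):
        # Random labels: no supervision is used at any point.
        labels = [rng.randrange(n_classes) for _ in range(n)]
        predicted = fit_and_reclassify(samples, labels, n_classes)
        for i in range(n):
            for j in range(n):
                if predicted[i] == predicted[j]:
                    counts[i][j] += 1
    return [[count / n_rounds for count in row] for row in counts]


if __name__ == "__main__":
    # Toy data (hypothetical): two tight groups of 2-D points.
    toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
           [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    sim = iterative_classification_similarity(toy)
    print("same group:     ", sim[0][1])
    print("different group:", sim[0][3])
```

Under these assumptions, points from the same group should end up with a higher co-classification rate than points from different groups, and the resulting matrix could be passed to any clustering or nearest-neighbor routine that accepts a similarity matrix.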
Document type:
Conference paper
International Conference on Acoustics, Speech and Signal Processing, ICASSP, Mar 2016, Shanghai, China. Proceedings of the IEEE 41st International Conference on Acoustics, Speech and Signal Processing, ICASSP, pp.5340 - 5344, 2016, 〈10.1109/ICASSP.2016.7472697〉

https://hal.archives-ouvertes.fr/hal-01394757
Contributor: Vincent Claveau
Submitted on: Wednesday, November 9, 2016 - 17:01:34
Last modified on: Wednesday, November 29, 2017 - 15:42:45
Document(s) archived on: Wednesday, March 15, 2017 - 04:25:31

File

ICASSP2016.pdf
Files produced by the author(s)

Citation

Amélie Royer, Guillaume Gravier, Vincent Claveau. Audio word similarity for clustering with zero resources based on iterative HMM classification. International Conference on Acoustics, Speech and Signal Processing, ICASSP, Mar 2016, Shanghai, China. Proceedings of the IEEE 41st International Conference on Acoustics, Speech and Signal Processing, ICASSP, pp.5340 - 5344, 2016, 〈10.1109/ICASSP.2016.7472697〉. 〈hal-01394757〉

Metrics

Record views: 458

File downloads: 229