Sampling strategies in Siamese Networks for unsupervised speech representation learning

Recent studies have investigated siamese network architectures for learning invariant speech representations using same-different side information at the word level. Here we systematically investigate an often-ignored component of siamese networks: the sampling procedure, that is, how pairs of same vs. different tokens are selected. We show that sampling strategies that take into account Zipf's law, the distribution of speakers, and the proportions of same and different word pairs significantly impact the performance of the network. In particular, we show that word frequency compression improves learning across a wide range of training-set sizes (numbers of training pairs). This effect does not apply to the same extent in the fully unsupervised setting, where the same-different word pairs are obtained by spoken term discovery. We apply these results to word pairs discovered by an unsupervised algorithm and show an improvement over the state of the art in unsupervised representation learning using siamese networks.
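
As an illustration of the sampling ideas named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that "word frequency compression" means drawing word types with probability proportional to a compressed version of their token count, and that the proportion of same pairs is a single parameter. The function names (`type_probabilities`, `sample_pairs`), the parameter `p_same`, and the particular compression functions are hypothetical choices made for this example; speaker distribution is left out for brevity.

```python
import math
import random
from collections import Counter

# Hypothetical compression functions applied to raw word-type counts.
# Which functions the paper actually uses is not stated in the abstract.
COMPRESSIONS = {
    "identity": lambda n: float(n),       # keep the Zipfian distribution
    "sqrt": lambda n: math.sqrt(n),       # mild compression
    "log": lambda n: math.log(1.0 + n),   # strong compression
    "uniform": lambda n: 1.0,             # every word type equally likely
}


def type_probabilities(tokens, compression="log"):
    """Return {word_type: sampling probability} under a compressed count."""
    counts = Counter(tokens)
    compress = COMPRESSIONS[compression]
    weights = {word: compress(count) for word, count in counts.items()}
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}


def sample_pairs(tokens, n_pairs, p_same=0.5, compression="log", seed=0):
    """Draw (index_a, index_b, is_same) pairs of token indices.

    With probability p_same a "same" pair (two distinct tokens of one
    word type) is drawn; otherwise a "different" pair is drawn.
    """
    rng = random.Random(seed)

    # Group token indices by word type.
    by_type = {}
    for index, word in enumerate(tokens):
        by_type.setdefault(word, []).append(index)
    if len(by_type) < 2:
        raise ValueError("need at least two word types to form different pairs")

    probs = type_probabilities(tokens, compression)
    types = list(probs)
    weights = [probs[word] for word in types]

    # Only types with at least two tokens can yield a "same" pair.
    same_types = [word for word in types if len(by_type[word]) >= 2]
    same_weights = [probs[word] for word in same_types]

    pairs = []
    for _ in range(n_pairs):
        if same_types and rng.random() < p_same:
            word = rng.choices(same_types, weights=same_weights)[0]
            a, b = rng.sample(by_type[word], 2)
            pairs.append((a, b, True))
        else:
            first = rng.choices(types, weights=weights)[0]
            second = first
            while second == first:  # redraw until the types differ
                second = rng.choices(types, weights=weights)[0]
            pairs.append((rng.choice(by_type[first]),
                          rng.choice(by_type[second]), False))
    return pairs
```

Under this sketch, switching `compression` from `"identity"` to `"log"` or `"uniform"` shifts sampling mass away from the few very frequent word types and toward the long tail, which is the kind of effect the abstract attributes to frequency compression.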

Keywords

Siamese network, language acquisition, sampling, Zipf's law, ABX, zero resource speech technology

Index Terms: language acquisition, speech recognition, sampling, Zipf's law, weakly supervised learning, unsupervised learning, speech embeddings, zero resource speech technology

Domains

Computation and Language [cs.CL], Other [stat.ML], Cognitive science, Linguistics

Main file

Riad_DKZSD_2018_Sampling_strategies_for_unsup_siamese.Interspeech.pdf (244.56 KB)

Origin: Files produced by the author(s)

Contributor: Emmanuel Dupoux

https://hal.science/hal-01888725

Submitted on: Friday, December 7, 2018, 14:32:21

Last modified on: Friday, April 19, 2024, 16:18:55

Long-term archiving on: Friday, March 8, 2019, 15:02:07

Dates and versions

hal-01888725, version 1 (07-12-2018)

Identifiers

HAL Id: hal-01888725, version 1
arXiv: 1804.11297

Cite

Rachid Riad, Corentin Dancette, Julien Karadayi, Neil Zeghidour, Thomas Schatz, et al. Sampling strategies in Siamese Networks for unsupervised speech representation learning. Interspeech 2018, Sep 2018, Hyderabad, India. ⟨hal-01888725⟩

Collections

ENS-PARIS CNRS INRIA EHESS LSCP DEC INRIA2 PSL
