Semi-supervised triplet loss based learning of ambient audio embeddings

Nicolas Turpault 1 Romain Serizel 1 Emmanuel Vincent 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Deep neural networks are particularly useful to learn relevant representations from data. Recent studies have demonstrated the potential of unsupervised representation learning for ambient sound analysis using various flavors of the triplet loss, and have compared this approach to supervised learning. However, in real situations, it is common to have a small labeled dataset and a large unlabeled one. In this paper, we combine unsupervised and supervised triplet loss based learning into a semi-supervised representation learning approach. We propose two flavors of this approach, whereby the positive samples for those triplets whose anchors are unlabeled are obtained either by applying a transformation to the anchor, or by selecting the nearest sample in the training set. We compare our approach to supervised and unsupervised representation learning for different ratios between the amounts of labeled and unlabeled data. We evaluate all the above approaches on an audio tagging task using the DCASE 2018 Task 4 dataset, and we show the impact of this ratio on the tagging performance.
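The two ways of choosing a positive sample for an unlabeled anchor, as described in the abstract, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the squared Euclidean distance, the margin value, the additive Gaussian noise used as the "transformation", and the space in which the nearest sample is searched are all assumptions made for illustration only.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss on squared Euclidean distances, averaged over the batch.

    The distance and the margin value are assumptions, not taken from the paper.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def positive_by_transformation(anchor, rng, noise_std=0.1):
    """Hypothetical transformation: add Gaussian noise to the anchor."""
    return anchor + rng.normal(0.0, noise_std, size=anchor.shape)


def positive_by_nearest_neighbor(anchor_idx, samples):
    """Index of the sample closest to the anchor, excluding the anchor itself."""
    dists = np.sum((samples - samples[anchor_idx]) ** 2, axis=1)
    dists[anchor_idx] = np.inf  # never pick the anchor as its own positive
    return int(np.argmin(dists))


# Toy data: three 2-D "embeddings"; sample 2 plays the role of the negative.
rng = np.random.default_rng(0)
samples = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])

# Flavor 1: the positive is a transformed copy of the anchor.
pos_t = positive_by_transformation(samples[[0]], rng)
loss_t = triplet_loss(samples[[0]], pos_t, samples[[2]])

# Flavor 2: the positive is the nearest other sample in the set.
nn_idx = positive_by_nearest_neighbor(0, samples)
loss_nn = triplet_loss(samples[[0]], samples[[nn_idx]], samples[[2]])
```

In this toy setup the nearest neighbor of sample 0 is sample 1, and the loss is zero because the negative is already much farther from the anchor than the positive plus the margin. In practice the embeddings would be produced by the network being trained.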
Document type:
Conference paper
ICASSP, May 2019, Brighton, United Kingdom
Contributor: Nicolas Turpault
Submitted on: Friday, February 22, 2019 - 11:26:01
Last modified on: Thursday, March 21, 2019 - 14:20:42


Files produced by the author(s)


  • HAL Id: hal-02025824, version 1


Nicolas Turpault, Romain Serizel, Emmanuel Vincent. Semi-supervised triplet loss based learning of ambient audio embeddings. ICASSP, May 2019, Brighton, United Kingdom. 〈hal-02025824〉


