Conference paper. Year: 2019

Towards phonetic interpretability in deep learning applied to voice comparison

Abstract

A deep convolutional neural network was trained to classify 45 speakers based on spectrograms of their productions of the French vowel /ɑ̃/. Although the model achieved fairly high accuracy (over 85%), our primary focus here was phonetic interpretability rather than sheer performance. To better understand what kind of representations the model learned, (i) several versions of the model were trained and tested with low-pass filtered spectrograms with varying cut-off frequencies, and (ii) classification was also performed with masked frequency bands. The resulting decline in accuracy was used to identify the frequencies relevant to speaker classification and voice comparison, and to produce phonetically interpretable visualizations.
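The band-masking idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the array layout `(n_examples, n_freq, n_time)`, the helper names (`low_pass`, `mask_band`, `band_importance`), the fill value, and the toy threshold classifier standing in for the trained CNN are all assumptions made for illustration. It also applies the low-pass filter at test time only, whereas the abstract describes retraining a model for each cut-off frequency.

```python
import numpy as np

def low_pass(specs, freqs, cutoff_hz, fill_value):
    """Replace every frequency bin above cutoff_hz with fill_value."""
    out = specs.copy()
    out[:, freqs > cutoff_hz, :] = fill_value
    return out

def mask_band(specs, freqs, lo_hz, hi_hz, fill_value):
    """Replace the bins in [lo_hz, hi_hz) with fill_value."""
    out = specs.copy()
    band = (freqs >= lo_hz) & (freqs < hi_hz)
    out[:, band, :] = fill_value
    return out

def accuracy(predict, specs, labels):
    """Fraction of examples whose predicted label matches the true label."""
    return float(np.mean(predict(specs) == labels))

def band_importance(predict, specs, labels, freqs, band_width_hz):
    """Mask one band at a time and record the drop in accuracy."""
    baseline = accuracy(predict, specs, labels)
    fill_value = specs.min()  # assumption: the quietest value stands for "no energy"
    edges = np.arange(freqs.min(), freqs.max() + band_width_hz, band_width_hz)
    drops = []
    for lo in edges[:-1]:
        masked = mask_band(specs, freqs, lo, lo + band_width_hz, fill_value)
        drops.append((lo, lo + band_width_hz,
                      baseline - accuracy(predict, masked, labels)))
    return baseline, drops

# Toy data (hypothetical): two "speakers" that differ only between 1 and 2 kHz.
rng = np.random.default_rng(0)
freqs = np.linspace(0, 8000, 81)  # 100 Hz spacing
n_examples, n_time = 40, 10
labels = np.arange(n_examples) % 2
informative = (freqs >= 1000) & (freqs < 2000)
specs = rng.uniform(0.0, 0.1, size=(n_examples, freqs.size, n_time))
specs += labels[:, None, None] * informative[None, :, None] * 1.0

def toy_predict(batch):
    """Stand-in for a trained model: thresholds the mean energy in 1-2 kHz."""
    return (batch[:, informative, :].mean(axis=(1, 2)) > 0.5).astype(int)

baseline, drops = band_importance(toy_predict, specs, labels, freqs, 1000.0)
for lo, hi, drop in drops:
    print(f"{lo:6.0f}-{hi:6.0f} Hz: accuracy drop {drop:.2f}")
```

In this toy setup only the 1000-2000 Hz band produces a drop, because that is where the synthetic speaker difference was placed. With a real model, `toy_predict` would be replaced by the network's prediction function, and the per-band drops could be plotted against frequency to obtain the kind of interpretable visualization the abstract mentions.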
Main file
Ferragne_Gendrot_Pellegrini.pdf (312.99 KB)
Origin: Files produced by the author(s)

Dates and versions

halshs-02412948, version 1 (16-12-2019)

Identifiers

  • HAL Id: halshs-02412948, version 1

Cite

Emmanuel Ferragne, Cédric Gendrot, Thomas Pellegrini. Towards phonetic interpretability in deep learning applied to voice comparison. ICPhS, Aug 2019, Melbourne, Australia. ISBN 978-0-646-80069-1. ⟨halshs-02412948⟩
