Inferring phonemic classes from CNN activation maps using clustering techniques

Abstract : Today's state of the art in speech recognition involves deep neural networks (DNNs). In recent years, a certain amount of research effort has been invested in characterizing the feature representations learned by DNNs. In this paper, we focus on convolutional neural networks (CNNs) trained for phoneme recognition in French. We report clustering experiments performed on activation maps extracted from the different layers of a CNN composed of two convolution and sub-sampling layers followed by three dense layers. Our goal was to gain insight into phone separability and the phonemic categories inferred by the network, and into how they vary across the successive layers. Two directions were explored with both linear and non-linear clustering techniques. First, we imposed 33 classes, equal to the number of context-independent phone models for French, in order to assess the phoneme separability power of the different layers. As expected, we observed that this power increases with layer depth in the network: from 34% to 74% in F-measure from the first convolution layer to the last dense layer, when using spectral clustering. Second, optimal numbers of classes were automatically inferred through inter- and intra-cluster measure criteria. We analyze these classes in terms of standard French phonological features.
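The abstract reports separability as an F-measure between clusters and phone classes but does not say how that score is computed. As a rough illustration only, the sketch below uses one common clustering F-measure: for each reference class, take the best F1 over all clusters, then average these scores weighted by class size. This definition, the function name `clustering_f_measure`, and the toy labels are assumptions made for illustration and may differ from what the authors used.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-size-weighted F-measure between reference classes and clusters.

    Assumed definition (not taken from the paper): for each reference
    class, keep the best F1 score over all clusters, then average those
    best scores weighted by the size of each class.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("empty input")

    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    # Number of frames shared by each (class, cluster) pair.
    overlap = Counter(zip(true_labels, cluster_ids))

    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            common = overlap.get((cls, clu), 0)
            if common == 0:
                continue
            precision = common / clu_size
            recall = common / cls_size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (cls_size / n) * best
    return total


# Hypothetical toy data: four frames, two phone classes.
phones = ["a", "a", "i", "i"]
separated = clustering_f_measure(phones, [1, 1, 0, 0])
merged = clustering_f_measure(phones, [0, 0, 0, 0])
```

In this toy case the score does not depend on how cluster ids are numbered, and putting every frame in a single cluster lowers the score because precision drops. In an experiment like the one described, `cluster_ids` would presumably come from clustering the activation maps of one layer with the number of clusters fixed to 33.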
Cited literature [22 references]

https://hal.archives-ouvertes.fr/hal-01474886
Contributor : Open Archive Toulouse Archive Ouverte (oatao)
Submitted on : Thursday, February 23, 2017 - 11:43:14 AM
Last modification on : Friday, June 14, 2019 - 6:31:14 PM
Long-term archiving on : Wednesday, May 24, 2017 - 1:29:59 PM

File

pellegrini_17161.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01474886, version 1
  • OATAO : 17161

Citation

Thomas Pellegrini, Sandrine Mouysset. Inferring phonemic classes from CNN activation maps using clustering techniques. Annual conference Interspeech (INTERSPEECH 2016), Sep 2016, San Francisco, United States. pp. 1290-1294. ⟨hal-01474886⟩
