Histogram of gradients of Time-Frequency Representations for Audio scene detection

Alain Rakotomamonjy; Gilles Gasso

doi:10.1109/TASLP.2014.2375575

Journal article. IEEE/ACM Transactions on Audio, Speech and Language Processing. Year: 2015

Alain Rakotomamonjy

Role: Author
PersonId : 174806
IdHAL : arakotomamonjy
ORCID : 0000-0002-4210-7792
IdRef : 083002138

Equipe Apprentissage (Machine Learning team)

Gilles Gasso

Role: Author
PersonId : 178750
IdHAL : ggasso
IdRef : 151524378

Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes

Abstract

This paper addresses the problem of audio scene classification and contributes to the state of the art by proposing a novel feature. We build this feature by considering the histogram of gradients (HOG) of the time-frequency representation of an audio scene. In contrast to classical audio features such as MFCC, we hypothesize that histograms of gradients can encode relevant information in a time-frequency representation: namely, the local direction of variation (in time and frequency) of the signal's spectral power. In addition, to gain more invariance and robustness, the histograms of gradients are locally pooled. We have evaluated the relevance of the novel feature by comparing its performance with state-of-the-art competitors on several datasets, including a novel one that we provide as part of our contribution. This dataset, which we make publicly available, involves 19 classes and contains about 900 minutes of audio scene recordings. We thus believe that it may be the next standard dataset for evaluating audio scene classification algorithms. Our comparison results clearly show that our HOG-based features outperform their competitors.
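
To make the idea in the abstract concrete, the following is a minimal sketch of a pooled histogram-of-gradients feature computed on a time-frequency representation. It is not the authors' implementation: the abstract does not specify the transform, the cell size, the number of orientation bins, or the pooling rule, so the log-power STFT, the 16x16 cells, the 8 unsigned orientation bins, the averaging over time, and the function names `spectrogram` and `hog_features` are all assumptions made for illustration.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Log-power STFT, shape (frequency, time). Assumed stand-in for the
    paper's time-frequency representation."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log1p(power).T


def hog_features(tfr, cell=(16, 16), n_bins=8):
    """Per-cell gradient-orientation histograms, average-pooled over time.
    Cell size, bin count, and pooling rule are illustrative assumptions."""
    # Local variation of spectral power along frequency (axis 0) and time (axis 1).
    d_freq, d_time = np.gradient(tfr)
    magnitude = np.hypot(d_freq, d_time)
    # Unsigned gradient direction in [0, pi).
    orientation = np.mod(np.arctan2(d_freq, d_time), np.pi)
    bins = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)

    n_f, n_t = tfr.shape[0] // cell[0], tfr.shape[1] // cell[1]
    hists = np.zeros((n_f, n_t, n_bins))
    for i in range(n_f):
        for j in range(n_t):
            rows = slice(i * cell[0], (i + 1) * cell[0])
            cols = slice(j * cell[1], (j + 1) * cell[1])
            # Magnitude-weighted histogram of orientations inside one cell.
            hists[i, j] = np.bincount(bins[rows, cols].ravel(),
                                      weights=magnitude[rows, cols].ravel(),
                                      minlength=n_bins)
    # Local pooling: average the cell histograms along the time axis.
    pooled = hists.mean(axis=1)
    flat = pooled.ravel()
    return flat / (np.linalg.norm(flat) + 1e-12)


# Usage on a synthetic one-second chirp sampled at 8 kHz (not real scene audio).
t = np.arange(8000) / 8000.0
chirp = np.sin(2 * np.pi * (500 * t + 1000 * t ** 2))
features = hog_features(spectrogram(chirp))
print(features.shape)  # (64,)
```

Averaging over time discards where in the recording a pattern occurred while keeping its frequency band and orientation, which is one simple way to obtain the kind of invariance the abstract mentions. The resulting vector could then be passed to a classifier such as the support vector machines listed in the keywords.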

Keywords

MFCC; audio scene; histogram of gradient; support vector machines; Time-Frequency Representation

Domains

Machine Learning [cs.LG]; Computer Science; Signal and Image Processing [eess.SP]

Main file

hogforcasa.pdf (5 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-00951990

Submitted on: Monday, August 3, 2015, 19:50:00

Last modified on: Friday, December 22, 2023, 15:16:05

Long-term archiving on: Wednesday, November 4, 2015, 10:32:18

Dates and versions

hal-00951990 , version 1 (26-02-2014)

hal-00951990 , version 2 (06-03-2014)

hal-00951990 , version 3 (29-08-2014)

hal-00951990 , version 4 (11-06-2015)

hal-00951990 , version 5 (02-07-2015)

hal-00951990 , version 6 (03-08-2015)

Identifiers

HAL Id : hal-00951990 , version 6
ARXIV : 1508.04909
DOI : 10.1109/TASLP.2014.2375575

Cite

Alain Rakotomamonjy, Gilles Gasso. Histogram of gradients of Time-Frequency Representations for Audio scene detection. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2015, 23 (1), pp.142-153. ⟨10.1109/TASLP.2014.2375575⟩. ⟨hal-00951990v6⟩

Collections

INSA-ROUEN LITIS COMUE-NORMANDIE LMI-ROUEN UNIROUEN UNILEHAVRE INSA-GROUPE
