Relevance-based quantization of scattering features for unsupervised mining of environmental audio

Vincent Lostanlen; Grégoire Lafay; Joakim Andén; Mathieu Lagrange

doi:10.1186/s13636-018-0138-4

Journal article, EURASIP Journal on Audio, Speech, and Music Processing, Year: 2018


Vincent Lostanlen

Role: Author
PersonId: 749246
IdHAL: lostanlen
ORCID: 0000-0003-0580-1651
IdRef: 203022769

NYU

Grégoire Lafay

Role: Author
PersonId: 718
IdHAL: gregoirelafay

Laboratoire des Sciences du Numérique de Nantes

Joakim Andén

Role: Author

Flatiron Institute

Mathieu Lagrange

Role: Author
PersonId: 4329
IdHAL: mathieu-lagrange

Laboratoire des Sciences du Numérique de Nantes

Abstract

The emerging field of computational acoustic monitoring aims to retrieve high-level information from acoustic scenes recorded by a network of sensors. These networks gather large amounts of data that require analysis. To decide which parts to inspect further, we need tools that automatically mine the data, identifying recurring patterns and isolated events. This requires a similarity measure for acoustic scenes that does not impose strong assumptions on the data. The state of the art in audio similarity measurement is the "bag-of-frames" approach, which models a recording using summary statistics of short-term audio descriptors, such as mel-frequency cepstral coefficients (MFCCs). Such models successfully characterize static scenes with little variability in auditory content, but cannot accurately capture scenes with a few salient events superimposed over a static background. To overcome this issue, we propose a two-scale representation which describes a recording using clusters of scattering coefficients. The scattering coefficients capture short-scale structure, while the cluster model captures longer time scales, allowing for more accurate characterization of sparse events. Evaluation within the acoustic scene similarity framework demonstrates the value of the proposed approach.

Keywords: unsupervised learning · data mining · acoustic signal processing · wavelet transforms · audio databases · content-based retrieval · nearest neighbor searches · acoustic sensors · environmental sensors
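The contrast drawn in the abstract, between a single summary statistic per recording and a cluster-based description, can be illustrated with a toy sketch. This is not the authors' method: the two-dimensional synthetic "frames" stand in for real scattering coefficients, plain k-means stands in for the paper's relevance-based quantization, and every function name (`mean_vector`, `kmeans`, `cluster_gap`, `toy_frames`) is hypothetical. It only shows, under those assumptions, why averaging over frames can hide a handful of salient frames that a cluster summary keeps visible.

```python
import random
from statistics import fmean


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(frames):
    """Per-dimension mean: a very simple bag-of-frames summary statistic."""
    return [fmean(f[d] for f in frames) for d in range(len(frames[0]))]


def kmeans(frames, k, n_iter=10):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [frames[0]]
    while len(centroids) < k:
        centroids.append(
            max(frames, key=lambda f: min(sq_dist(f, c) for c in centroids))
        )
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for f in frames:
            nearest = min(range(k), key=lambda i: sq_dist(f, centroids[i]))
            groups[nearest].append(f)
        # Keep the previous centroid if a cluster happens to be empty.
        centroids = [mean_vector(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def cluster_gap(cents_a, cents_b):
    """Largest squared distance from any centroid to its nearest counterpart."""
    def directed(src, dst):
        return max(min(sq_dist(s, d) for d in dst) for s in src)
    return max(directed(cents_a, cents_b), directed(cents_b, cents_a))


def toy_frames(rng, n, centre):
    """Synthetic 2-D frame features scattered around (centre, centre)."""
    return [[rng.gauss(centre, 0.1), rng.gauss(centre, 0.1)] for _ in range(n)]


rng = random.Random(0)
# Scene A: static background only. Scene B: same background plus 5 salient frames.
scene_a = toy_frames(rng, 200, 0.0)
scene_b = toy_frames(rng, 195, 0.0) + toy_frames(rng, 5, 10.0)

bof_distance = sq_dist(mean_vector(scene_a), mean_vector(scene_b))
cluster_distance = cluster_gap(kmeans(scene_a, 2), kmeans(scene_b, 2))
print(f"bag-of-frames distance: {bof_distance:.3f}")
print(f"cluster-based distance: {cluster_distance:.3f}")
```

In this toy setting the five event frames barely shift the mean of scene B, so the bag-of-frames distance stays small, whereas one of the two centroids of scene B settles on the event and makes the cluster-based distance large.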

Domains

Machine Learning [stat.ML]; Signal and Image Processing [eess.SP]; Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]; Multimedia [cs.MM]

Main file

lostanlenRelevanceScattering.pdf (1.33 MB)

Origin: Files produced by the author(s)

Contributor: Mathieu Lagrange

https://hal.science/hal-01887403

Submitted on: Thursday, January 10, 2019, 14:39:54

Last modified on: Saturday, November 11, 2023, 20:50:03

Long-term archiving on: Thursday, April 11, 2019, 15:50:27

Dates and versions

hal-01887403, version 1 (10 January 2019)

Identifiers

HAL Id: hal-01887403, version 1
DOI: 10.1186/s13636-018-0138-4

Cite

Vincent Lostanlen, Grégoire Lafay, Joakim Andén, Mathieu Lagrange. Relevance-based quantization of scattering features for unsupervised mining of environmental audio. EURASIP Journal on Audio, Speech, and Music Processing, 2018, 2018 (1), ⟨10.1186/s13636-018-0138-4⟩. ⟨hal-01887403⟩


Collections

UNIV-NANTES INSTITUT-TELECOM CNRS EC-NANTES UNAM LS2N LS2N-SIMS ANR NANTES-UNIVERSITE

100 views

92 downloads
