A Model You Can Hear: Audio Identification with Playable Prototypes

Romain Loiseau; Baptiste Bouvier; Yann Teytaut; Elliot Vincent; Mathieu Aubry; Loic Landrieu

Conference paper, Year: 2022

Romain Loiseau

Role: Author
PersonId: 1120806

Laboratoire d'Informatique Gaspard-Monge

Laboratoire sciences et technologies de l'information géographique

Baptiste Bouvier

Role: Author

Sciences et Technologies de la Musique et du Son

Yann Teytaut

Role: Author

Sciences et Technologies de la Musique et du Son

Elliot Vincent

Role: Author
PersonId: 753409
IdHAL: elliot-vincent
ORCID: 0009-0001-1713-2590

Models of visual object recognition and scene understanding

Laboratoire d'Informatique Gaspard-Monge

Mathieu Aubry

Role: Author
PersonId: 945627
IdHAL: mathieu-aubry

Laboratoire d'Informatique Gaspard-Monge

Loic Landrieu

Role: Author

Laboratoire sciences et technologies de l'information géographique

Abstract

Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract and high-dimensional representations that are difficult to interpret. Inspired by transformation-invariant approaches developed for image and 3D data, we propose an audio identification model based on learnable spectral prototypes. Equipped with dedicated transformation networks, these prototypes can be used to cluster and classify input audio samples from large collections of sounds. Our model can be trained with or without supervision and reaches state-of-the-art results for speaker and instrument identification, while remaining easily interpretable. The code is available at: https://github.com/romainloiseau/a-model-you-can-hear
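
As a rough illustration of the prototype idea summarized above, the sketch below assigns an input spectrum to whichever prototype reconstructs it best once a simple transformation has been applied. It is not the authors' implementation (see the repository linked above for that). The names `fit_gain` and `classify` are hypothetical, the prototypes are fixed toy vectors rather than learned ones, and a closed-form scalar gain stands in for the learned transformation networks.

```python
import numpy as np


def fit_gain(prototype, frame):
    """Least-squares scalar gain g minimising ||g * prototype - frame||^2.

    Assumption: this closed-form gain is a stand-in for a learned
    transformation network; it is not the transformation used in the paper.
    """
    denom = float(np.dot(prototype, prototype))
    return float(np.dot(prototype, frame)) / denom if denom > 0 else 0.0


def classify(frame, prototypes):
    """Return (best_index, errors) using squared reconstruction error."""
    errors = []
    for p in prototypes:
        g = fit_gain(p, frame)
        errors.append(float(np.sum((g * p - frame) ** 2)))
    return int(np.argmin(errors)), errors


# Toy "spectral prototypes" with 4 frequency bins (hypothetical values).
prototypes = np.array([
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.5, 0.0],
])

# A louder, slightly offset copy of the second prototype.
frame = 2.0 * prototypes[1] + 0.01

label, errors = classify(frame, prototypes)
print(label, errors)
```

Because each prototype lives in the same space as the input spectra, it can be inspected directly, which is the sense in which such a model stays interpretable.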

Domains

Sound [cs.SD]; Computer Vision and Pattern Recognition [cs.CV]

https://hal.science/hal-03794815

Submitted on: Monday, October 3, 2022, 15:19:48

Last modified on: Friday, April 19, 2024, 16:18:58

Dates and versions

hal-03794815, version 1 (03-10-2022)

Identifiers

HAL Id: hal-03794815, version 1
arXiv: 2208.03311

Cite

Romain Loiseau, Baptiste Bouvier, Yann Teytaut, Elliot Vincent, Mathieu Aubry, et al. A Model You Can Hear: Audio Identification with Playable Prototypes. ISMIR 2022 - 23rd International Society for Music Information Retrieval Conference, Dec 2022, Bengaluru, India. ⟨hal-03794815⟩

Collections

ENS-PARIS ENPC CNRS INRIA LIGM_A3SI PARISTECH IRCAM LIGM STMS INRIA2 PSL SORBONNE-UNIVERSITE SU-SCIENCES IGN-ENSG UNIV-EIFFEL JSE2024
