Conference paper, Year: 2022

A Model You Can Hear: Audio Identification with Playable Prototypes

Abstract

Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract and high-dimensional representations that are difficult to interpret. Inspired by transformation-invariant approaches developed for image and 3D data, we propose an audio identification model based on learnable spectral prototypes. Equipped with dedicated transformation networks, these prototypes can be used to cluster and classify input audio samples from large collections of sounds. Our model can be trained with or without supervision and reaches state-of-the-art results for speaker and instrument identification, while remaining easily interpretable. The code is available at: https://github.com/romainloiseau/a-model-you-can-hear
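As a rough illustration of the prototype idea in the abstract, the sketch below assigns an input spectrum to whichever prototype reconstructs it best after a simple transformation. It is not the authors' implementation: the assignment rule (lowest reconstruction error) is an assumption the abstract does not spell out, the single additive gain is a hand-written stand-in for the learned transformation networks, and all names (`fit_gain`, `reconstruction_errors`, `assign`) and the toy data are hypothetical. The linked repository holds the actual code.

```python
import numpy as np


def fit_gain(prototype, frame):
    """Least-squares additive offset g minimizing ||prototype + g - frame||^2.

    Assumption: a single scalar gain stands in for the paper's learned
    transformation networks, which are not described in the abstract.
    """
    return float(np.mean(frame - prototype))


def reconstruction_errors(prototypes, frame):
    """Squared error between the frame and each gain-adjusted prototype."""
    errors = []
    for prototype in prototypes:
        gain = fit_gain(prototype, frame)
        errors.append(float(np.sum((prototype + gain - frame) ** 2)))
    return np.array(errors)


def assign(prototypes, frame):
    """Index of the prototype with the lowest reconstruction error."""
    return int(np.argmin(reconstruction_errors(prototypes, frame)))


# Toy data (hypothetical): two prototypes with energy in different bands.
rng = np.random.default_rng(0)
n_bins = 64
bins = np.arange(n_bins)
prototypes = np.stack([
    np.exp(-0.5 * ((bins - 16) / 4.0) ** 2),  # low-band prototype
    np.exp(-0.5 * ((bins - 48) / 4.0) ** 2),  # high-band prototype
])

# A louder, slightly noisy version of the high-band prototype.
frame = prototypes[1] + 3.0 + 0.01 * rng.standard_normal(n_bins)
label = assign(prototypes, frame)
print(label)
```

Because each prototype is itself a spectrum, it can be inspected directly, which is the sense in which a model of this kind stays interpretable. In an unsupervised setting the assigned index would act as a cluster label, and with supervision each prototype would be tied to a class.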

Dates and versions

hal-03794815, version 1 (03-10-2022)

Cite

Romain Loiseau, Baptiste Bouvier, Yann Teytaut, Elliot Vincent, Mathieu Aubry, et al. A Model You Can Hear: Audio Identification with Playable Prototypes. ISMIR 2022 - 23rd International Society for Music Information Retrieval Conference, Dec 2022, Bengaluru, India. ⟨hal-03794815⟩