Conference paper, Year: 2020

End to end raw audio deep learning of transients, application to bioacoustics

Abstract

In this paper, we propose a deep learning approach to clicks that operates directly on raw audio, building specific high-dimensional convolution filters to form a complex time-frequency (TF) representation. The CNN has 12 layers, takes several thousand audio bins as input, and outputs a dozen classes. We test this model on the international DCLDE challenge, which comprises 3 TB of clicks (http://sabiod.org/DCLDE). The challenge opened in 2018, but no team had submitted an answer before. To our knowledge, our model is the first raw audio click classifier with nearly 70% accuracy on a dozen classes. We discuss the class confusions of the model and possible enhancements using data augmentation and regularization.
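
The abstract states only the layer count, the order of magnitude of the input, and the number of classes. As a rough illustration of how a 12-layer stack could reduce several thousand raw samples to a single vector of class scores, the sketch below tracks the sequence length through strided 1D convolutions. The input length of 4096, the kernel size of 3, the stride of 2, and the padding of 1 are assumptions chosen for illustration; they are not taken from the paper.

```python
# Hypothetical shape arithmetic for a 12-layer 1D CNN on raw audio.
# All hyperparameters here are illustrative assumptions, not the paper's values.

def conv1d_out_len(n_in, kernel, stride, padding=0):
    """Output length of a 1D convolution without dilation."""
    return (n_in + 2 * padding - kernel) // stride + 1

def stack_out_lens(n_in, layers):
    """Return the sequence length after each (kernel, stride, padding) layer."""
    lens = []
    for kernel, stride, padding in layers:
        n_in = conv1d_out_len(n_in, kernel, stride, padding)
        lens.append(n_in)
    return lens

N_SAMPLES = 4096  # assumed stand-in for "several thousand" audio bins
N_CLASSES = 12    # assumed stand-in for "a dozen" output classes

# Assumed design: 12 layers, each with kernel 3, stride 2, padding 1,
# so every layer halves the sequence length.
LAYERS = [(3, 2, 1)] * 12

lens = stack_out_lens(N_SAMPLES, LAYERS)
print(lens)
```

Under these assumptions the length halves at every layer and reaches 1 after the twelfth, so a last layer with `N_CLASSES` output channels would yield one score per class.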
Main file
001096.pdf (128.9 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-03230842, version 1 (25-05-2021)

Identifiers

HAL Id: hal-03230842
DOI: 10.48465/fa.2020.1096

Cite

Maxence Ferrari, Hervé Glotin, Ricard Marxer. End to end raw audio deep learning of transients, application to bioacoustics. e-Forum Acusticum 2020, Dec 2020, Lyon, France. pp.3245-3247, ⟨10.48465/fa.2020.1096⟩. ⟨hal-03230842⟩