Conference paper. Year: 2020

DOCC10: Open access dataset of marine mammal transient studies and end-to-end CNN classification

Abstract

Classification of transients is a difficult task. In bioacoustics, almost all studies still rely on human labeling. In passive acoustic monitoring (PAM), the data to label consist of months of continuous recordings from multiple recording stations, and labeling everything by hand, even with multiple experts, takes longer than the next recording session needs to produce new data. To help lay a foundation for automatic labeling of marine mammal transients, we built a dataset using weak labels from the 3 TB DCLDE 2018 dataset of marine mammal transients, which was created for a click classification challenge. The new dataset has strong labels and opens a new challenge, DOCC10, whose baseline is also described in this paper. The baseline accuracy of 71% is already good enough to curate the large dataset, leaving only some regions of interest to be reviewed by experts. It is still far from perfect, however, and leaves room for future improvement or for alternative techniques to challenge it. A smaller version of DOCC10, named DOCC7, is also presented.
Main file
PID6461711.pdf (1.56 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02866091, version 1 (12-06-2020)

Identifiers

  • HAL Id: hal-02866091, version 1

Cite

Maxence Ferrari, Hervé Glotin, Ricard Marxer, Mark Asch. DOCC10: Open access dataset of marine mammal transient studies and end-to-end CNN classification. IJCNN, Jul 2020, Glasgow, United Kingdom. ⟨hal-02866091⟩
