Conference papers

DOCC10: Open access dataset of marine mammal transient studies and end-to-end CNN classification

Abstract: Classification of transients is a difficult task. In bioacoustics, almost all studies still rely on human labeling. In passive acoustic monitoring (PAM), the data to label consist of months of continuous recordings from multiple recording stations, and labeling everything by hand takes longer than the next recording session takes to produce new data, even with multiple experts. To help lay a foundation for automatic labeling of marine mammal transients, we built a dataset using weak labels from the 3 TB DCLDE 2018 dataset of marine mammal transients, which was made for a click classification challenge. The new dataset has strong labels and opens a new challenge, DOCC10, whose baseline is also described in this paper. The baseline's accuracy of 71% is already good enough to curate the large dataset, leaving only some regions of interest to be reviewed by experts. This is still far from perfect, however, and leaves room for future improvement or for alternative techniques that challenge it. A smaller version of DOCC10, named DOCC7, is also presented.

Cited literature: 24 references
Contributor: Maxence Ferrari
Submitted on: Friday, June 12, 2020 - 11:35:43 AM
Last modification on: Friday, June 26, 2020 - 3:15:22 AM


Files produced by the author(s)


  • HAL Id: hal-02866091, version 1


Maxence Ferrari, Hervé Glotin, Ricard Marxer, Mark Asch. DOCC10: Open access dataset of marine mammal transient studies and end-to-end CNN classification. IJCNN, Jul 2020, Glasgow, United Kingdom. ⟨hal-02866091⟩


