Investigating the Use of Semi-Supervised Convolutional Neural Network Models for Speech/Music Classification and Segmentation

David Doukhan; Jean Carrive

Conference paper, Year: 2017

David Doukhan

Role: Author
PersonId: 1006987

Institut National de l'Audiovisuel

Jean Carrive

Role: Author

Institut National de l'Audiovisuel

Abstract

A convolutional neural network architecture, trained with a semi-supervised strategy, is proposed for speech/music classification (SMC) and segmentation (SMS). It is compared to baseline machine learning algorithms on three SMC corpora and demonstrates superior performance, together with perfect media-level speech recall scores. Evaluation corpora include speech-over-music segments with durations varying between 3 and 30 seconds. Early SMS results are presented. Segmentation errors are associated with musical genres not covered in the training database and/or with acoustic properties close to those of speech. These experiments aim to help the design of novel speech/music annotated resources and evaluation protocols suited to TV and radio stream indexing.
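As a reading aid for the segmentation task named in the abstract, the following minimal sketch shows one generic way frame-level class decisions can be merged into labelled time segments. It is not the authors' method: the function name `frames_to_segments`, the `frame_step` parameter, and the label strings are assumptions introduced here for illustration only.

```python
from itertools import groupby


def frames_to_segments(labels, frame_step=0.02):
    """Merge runs of identical frame labels into (label, start, end) segments.

    `labels` is a sequence of per-frame class labels (hypothetical values such
    as "speech" or "music"); `frame_step` is the assumed time between frames,
    in seconds.
    """
    segments = []
    index = 0  # position of the first frame of the current run
    for label, run in groupby(labels):
        length = sum(1 for _ in run)  # number of consecutive frames in the run
        start = index * frame_step
        end = (index + length) * frame_step
        segments.append((label, start, end))
        index += length
    return segments


if __name__ == "__main__":
    # Hypothetical frame decisions, one per second for readability.
    decisions = ["speech", "speech", "music", "music", "music", "speech"]
    for label, start, end in frames_to_segments(decisions, frame_step=1.0):
        print(label, start, end)
```

A real system would typically smooth the frame decisions before merging them, so that isolated misclassified frames do not create spurious short segments; that step is left out of this sketch.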

Keywords

Speech/music discrimination; Audio segmentation; Convolutional Neural Networks; Music Information Retrieval; Multimedia Indexation

Domains

Machine learning [cs.LG]; Neural networks [cs.NE]; Sound [cs.SD]

Main file

2017_mmedia_doukhan.pdf (3.28 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-01514228

Submitted on: Monday, May 1, 2017, 22:13:25

Last modified on: Wednesday, October 9, 2019, 11:44:04

Long-term archiving on: Wednesday, August 2, 2017, 12:17:00

Dates and versions

hal-01514228, version 1 (01-05-2017)

Identifiers

HAL Id: hal-01514228, version 1

Cite

David Doukhan, Jean Carrive. Investigating the Use of Semi-Supervised Convolutional Neural Network Models for Speech/Music Classification and Segmentation. The Ninth International Conferences on Advances in Multimedia (MMEDIA 2017), IARIA, Apr 2017, Venice, Italy. ⟨hal-01514228⟩
