Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision

Sanjeel Parekh; Alexey Ozerov; Slim Essid; Ngoc Q. K. Duong; Patrick Pérez; Gael Richard

Preprint, working paper. Year: 2018


Sanjeel Parekh

Role: Author
PersonId : 743443
IdHAL : sanjeel-parekh
IdRef : 235511277

Technicolor R & I [Cesson Sévigné]

Alexey Ozerov

Role: Author
PersonId : 882775

Technicolor R & I [Cesson Sévigné]

Slim Essid

Role: Author
PersonId : 181234
IdHAL : slimessid
ORCID : 0000-0002-0028-327X
IdRef : 11025130X

Laboratoire Traitement et Communication de l'Information

Ngoc Q. K. Duong

Role: Author
PersonId : 946470

Technicolor R & I [Cesson Sévigné]

Patrick Pérez

Role: Author
PersonId : 1022281

Valeo.ai

Gael Richard

Role: Author
PersonId : 14146
IdHAL : gael-richard
IdRef : 094977208

Laboratoire Traitement et Communication de l'Information

Abstract

We tackle the problem of audiovisual scene analysis for weakly-labeled data. To this end, we build upon our previous audiovisual representation learning framework to perform object classification in noisy acoustic environments and integrate audio source enhancement capability. This is made possible by a novel use of non-negative matrix factorization for the audio modality. Our approach is founded on the multiple instance learning paradigm. Its effectiveness is established through experiments over a challenging dataset of music instrument performance videos. We also show encouraging visual object localization results.
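
For readers unfamiliar with multiple instance learning, the minimal Python sketch below illustrates the general idea only: a video is treated as a bag of instances, only the video-level label is known, and per-instance scores are pooled into a video-level score. The max-pooling rule, the function names `bag_scores` and `best_instance`, and the example scores are assumptions made for illustration; this is not the authors' implementation.

```python
from typing import Dict, List


def bag_scores(instance_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Pool per-instance class scores into one bag-level score per class.

    Assumption: max pooling, i.e. a bag is scored for a class by its
    best-scoring instance. Other pooling rules (mean, sum) are possible.
    """
    if not instance_scores:
        raise ValueError("a bag needs at least one instance")
    classes = instance_scores[0].keys()
    return {c: max(inst[c] for inst in instance_scores) for c in classes}


def best_instance(instance_scores: List[Dict[str, float]], label: str) -> int:
    """Return the index of the instance that scores highest for `label`."""
    return max(range(len(instance_scores)), key=lambda i: instance_scores[i][label])


# Hypothetical scores for three instances of a single video.
video = [
    {"violin": 0.1, "piano": 0.2},
    {"violin": 0.9, "piano": 0.1},
    {"violin": 0.3, "piano": 0.4},
]

video_level = bag_scores(video)          # one score per class for the whole video
located = best_instance(video, "violin")  # instance that best explains "violin"
```

Under this assumed pooling rule, the instance returned by `best_instance` is the one that accounts for the video-level decision, which is why weak, video-level labels can still yield a localization cue.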

Keywords

audio-visual event detection; source separation; non-negative matrix factorization; multiple instance learning

Domains

Signal and image processing [eess.SP]; Acoustics [physics.class-ph]; Computer vision and pattern recognition [cs.CV]; Neural networks [cs.NE]

Main file

weak_nmf_prop.pdf (697.19 KB)

Origin: Files produced by the author(s)

Contributor: Alexey Ozerov

https://inria.hal.science/hal-01914532

Submitted on: Wednesday, November 7, 2018, 00:29:24

Last modified on: Thursday, February 1, 2024, 10:05:34

Long-term archiving on: Friday, February 8, 2019, 12:28:05

Dates and versions

hal-01914532, version 1 (07-11-2018)

Identifiers

HAL Id: hal-01914532, version 1
arXiv: 1811.04000

Cite

Sanjeel Parekh, Alexey Ozerov, Slim Essid, Ngoc Q. K. Duong, Patrick Pérez, et al. Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision. 2018. ⟨hal-01914532⟩


Collections

INSTITUT-TELECOM UNIV-RENNES1 CNRS IRISA PARISTECH UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES LTCI IDS S2A UR1-MATH-NUM

96 views

190 downloads
