HAL open archive
Preprint, Working Paper. Year: 2018

Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision

Sanjeel Parekh
Alexey Ozerov
  • Role: Author
  • PersonId: 882775
Ngoc Q. K. Duong
  • Role: Author
  • PersonId: 946470
Patrick Pérez
  • Role: Author
  • PersonId: 1022281

Abstract

We tackle the problem of audiovisual scene analysis for weakly-labeled data. To this end, we build upon our previous audiovisual representation learning framework to perform object classification in noisy acoustic environments and integrate audio source enhancement capability. This is made possible by a novel use of non-negative matrix factorization for the audio modality. Our approach is founded on the multiple instance learning paradigm. Its effectiveness is established through experiments over a challenging dataset of music instrument performance videos. We also show encouraging visual object localization results.
Main file
weak_nmf_prop.pdf (697.19 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01914532, version 1 (7 November 2018)

Identifiers

Cite

Sanjeel Parekh, Alexey Ozerov, Slim Essid, Ngoc Q. K. Duong, Patrick Pérez, et al. Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision. 2018. ⟨hal-01914532⟩
96 views
190 downloads
