Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision - HAL open archive
Preprint, working paper. Year: 2018


Sanjeel Parekh
Alexey Ozerov
  • Role: Author
  • PersonId: 882775
Ngoc Q K Duong
  • Role: Author
  • PersonId: 864978
Patrick Pérez
  • Role: Author
  • PersonId: 1022281

Abstract

We tackle the problem of audio-visual scene analysis for weakly labeled data. To this end, we build upon our previous audio-visual representation learning framework to perform object classification in noisy acoustic environments and to integrate an audio source enhancement capability. This is made possible by a novel use of non-negative matrix factorization for the audio modality. Our approach is founded on the multiple instance learning paradigm. Its effectiveness is established through experiments on a challenging dataset of musical instrument performance videos. We also show encouraging visual object localization results.
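The abstract names two generic building blocks: non-negative matrix factorization (NMF) of the audio, and multiple instance learning (MIL), in which each example is a bag of instances and only bag-level (weak) labels are known. The following is a minimal NumPy sketch of these two textbook techniques, not the authors' implementation. It assumes the audio is given as a non-negative magnitude spectrogram `V` (frequency x time), uses the Lee and Seung multiplicative updates for the Euclidean cost, treats each NMF component as one instance, and scores a bag by max pooling. The function names and the per-instance class scores are hypothetical; in practice some learned classifier would produce those scores.

```python
import numpy as np

def nmf(V, n_components, n_iter=200, eps=1e-10, seed=0):
    """Factorize a non-negative matrix V (freq x time) as V ~= W @ H
    with Lee-Seung multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components)) + eps  # spectral templates (columns)
    H = rng.random((n_components, T)) + eps  # temporal activations (rows)
    for _ in range(n_iter):
        # Each update keeps entries non-negative and does not increase ||V - WH||^2.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def component_spectrograms(W, H):
    """One non-negative spectrogram per component k: outer(W[:, k], H[k, :]).
    Output shape is (n_components, freq, time); summing over axis 0 gives W @ H."""
    return np.einsum("fk,kt->kft", W, H)

def mil_bag_scores(instance_scores):
    """MIL max pooling: each class score of a bag is the score of its
    best instance. instance_scores has shape (n_instances, n_classes)."""
    return instance_scores.max(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic non-negative "spectrogram" of exact rank 5 (hypothetical data).
    V = rng.random((20, 5)) @ rng.random((5, 30))
    W, H = nmf(V, n_components=5)
    print("relative error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
    # Hypothetical per-component class scores: 3 instances, 2 classes.
    scores = np.array([[0.1, 0.9], [0.7, 0.2], [0.3, 0.4]])
    print("bag scores:", mil_bag_scores(scores))
```

Max pooling is the simplest MIL aggregation: a bag is positive for a class if at least one of its instances is. Smoother pooling functions such as the mean or log-sum-exp are common alternatives when gradients should reach more than one instance.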
Main file
manuscript.pdf (731.05 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01626389, version 1 (30-10-2017)
hal-01626389, version 2 (17-05-2018)
hal-01626389, version 3 (06-11-2018)

Identifiers

  • HAL Id: hal-01626389, version 2

Cite

Sanjeel Parekh, Alexey Ozerov, Slim Essid, Ngoc Q K Duong, Patrick Pérez, et al. Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision. 2018. ⟨hal-01626389v2⟩
832 views
2230 downloads
