Unsupervised mining of multiple audiovisually consistent clusters for video structure analysis

Anh-Phuong Ta 1, Guillaume Gravier 1
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : We address the problem of detecting multiple audiovisual events related to the edit structure of a video by incorporating an unsupervised cluster analysis technique into a cluster selection method designed to measure coherence between audio and visual segments. First, a mutual information measure is used to select audiovisually consistent clusters from two dendrograms, which represent the hierarchical clustering results for the audio and visual modalities respectively. A cluster analysis technique is then applied to define events from the audiovisual (AV) clusters whose segments frequently co-occur. Candidate events are then characterized by groups of AV clusters, from which models are built by automatically selecting positive and negative examples. Experiments on the standard Canal9 data set demonstrate that our method can discover multiple audiovisual events in a fully unsupervised manner.
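The cluster selection step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the mutual information is computed, so this example assumes each cluster is represented as a binary membership indicator over a common sequence of time units, and it scores every audio/visual cluster pair by the mutual information between the two indicators. The function names (`mutual_information`, `rank_cluster_pairs`) and the toy cluster labels are hypothetical.

```python
import math


def mutual_information(a, v):
    """Mutual information (in bits) between two equal-length label sequences.

    Assumption: a[t] and v[t] are 0/1 flags saying whether time unit t
    belongs to a given audio cluster and a given visual cluster.
    """
    if len(a) != len(v) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    joint, count_a, count_v = {}, {}, {}
    for x, y in zip(a, v):
        joint[(x, y)] = joint.get((x, y), 0) + 1
        count_a[x] = count_a.get(x, 0) + 1
        count_v[y] = count_v.get(y, 0) + 1
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[x] * count_v[y]))
    return mi


def rank_cluster_pairs(audio_clusters, visual_clusters, n_units):
    """Score every (audio, visual) cluster pair, highest MI first.

    Assumption: each cluster is given as a set of time-unit indices,
    e.g. one node cut from a dendrogram.
    """
    scored = []
    for a_name, a_units in audio_clusters.items():
        a = [1 if t in a_units else 0 for t in range(n_units)]
        for v_name, v_units in visual_clusters.items():
            v = [1 if t in v_units else 0 for t in range(n_units)]
            scored.append((mutual_information(a, v), a_name, v_name))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Toy data (hypothetical): 8 time units, two clusters per modality.
audio = {"A1": {0, 1, 4, 5}, "A2": {2, 3}}
visual = {"V1": {0, 1, 4, 5}, "V2": {6, 7}}
ranking = rank_cluster_pairs(audio, visual, n_units=8)
```

In this toy example the pair (A1, V1) ranks first because the two clusters cover exactly the same time units. A real system would draw the candidate clusters from every level of the two dendrograms rather than from a fixed list.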
Document type : Conference papers


Contributor : Guillaume Gravier
Submitted on : Wednesday, July 18, 2012 - 4:17:42 PM
Last modification on : Friday, November 16, 2018 - 1:23:39 AM
Long-term archiving on : Friday, October 19, 2012 - 3:00:43 AM




  • HAL Id : hal-00718985, version 1


Anh-Phuong Ta, Guillaume Gravier. Unsupervised mining of multiple audiovisually consistent clusters for video structure analysis. ICME - International Conference on Multimedia and Exhibition, 2012, Australia. ⟨hal-00718985⟩


