Conference papers

A scalable framework for joint clustering and synchronizing multi-camera videos

Abstract: This paper describes a method to cluster and synchronize large-scale audio-video sequences recorded by multiple users during an event. The proposed method is designed to jointly cluster audio content and synchronize the sequences in each cluster to create a multi-view presentation of the event. The method is based on cross-correlation of local audio features. In this paper, three main contributions are presented to obtain a scalable and accurate framework. First, a salient representation of the features is used to reduce the computational complexity while maintaining high performance. Second, an intermediate clustering step is introduced to limit the number of comparisons required. Third, a voting approach is proposed to avoid tuning thresholds for the cross-correlation. The framework was tested on 164 YouTube concert videos, and the results demonstrate the efficiency of the method, with 98.8% of the sequences correctly clustered.
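The core alignment idea described in the abstract, estimating the time offset between two recordings by cross-correlating their audio feature sequences, can be illustrated with a minimal sketch. This is not the paper's implementation (the paper uses salient local features, intermediate clustering, and voting); it only shows the basic cross-correlation offset estimate on synthetic 1-D feature streams, and the function name `estimate_offset` is hypothetical.

```python
import numpy as np

def estimate_offset(feat_a, feat_b):
    """Estimate the lag L (in frames) such that feat_b[n] best matches
    feat_a[n + L], via full cross-correlation. A negative L means
    feat_b starts later than feat_a."""
    corr = np.correlate(feat_a, feat_b, mode="full")
    # np.correlate with mode="full" returns lags from -(len(feat_b)-1)
    # to len(feat_a)-1; shift argmax accordingly.
    return int(np.argmax(corr)) - (len(feat_b) - 1)

# Two synthetic "audio feature" streams: b is a delayed by 5 frames.
rng = np.random.default_rng(0)
base = rng.standard_normal(200)
a = base
b = np.concatenate([np.zeros(5), base])[:200]
print(estimate_offset(a, b))  # prints -5: b lags a by 5 frames
```

In practice the paper operates on features extracted from real audio and avoids a fixed decision threshold on the correlation peak by letting multiple feature comparisons vote on the offset.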
Contributor: Alexey Ozerov
Submitted on: Monday, October 7, 2013 - 11:07:19 AM
Last modification on: Tuesday, October 8, 2013 - 2:33:39 PM
Long-term archiving on: Friday, April 7, 2017 - 7:12:48 AM


Files produced by the author(s)


  • HAL Id: hal-00870381, version 1



Ashish Bagri, Franck Thudor, Alexey Ozerov, Pierre Hellier. A scalable framework for joint clustering and synchronizing multi-camera videos. 21st European Signal Processing Conference (EUSIPCO 2013), Sep 2013, Marrakech, Morocco. ⟨hal-00870381⟩


