Audio-video feature correlation: faces and speech

Abstract: This paper presents a study of the correlation between features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extent they should be combined. A generic audio signal partitioning algorithm was first used to detect Silence/Noise/Music/Speech segments in a full-length movie. A generic object detection method was applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and of the corresponding voice in the audio stream was studied. A third stream, the script of the movie, was warped onto the speech channel in order to automatically label faces appearing in the keyframes with the name of the corresponding character. We found that extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.
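The correlation study described in the abstract can be pictured as a simple association measure between two binary signals: face presence in each keyframe and speech presence at the corresponding time in the audio stream. The sketch below is purely illustrative and is not the paper's actual method; the segment representation, timestamps, and the use of the phi coefficient as the association measure are all assumptions.

```python
# Illustrative sketch (not the paper's pipeline): measure the association
# between face presence in keyframes and speech presence in the audio
# stream, using a 2x2 contingency table and the phi coefficient.
# All data structures and values here are hypothetical.
import math

def in_speech(t, speech_segments):
    """True if timestamp t falls inside any (start, end) speech segment."""
    return any(start <= t < end for start, end in speech_segments)

def phi_coefficient(keyframes, speech_segments):
    """Phi association between face presence and speech presence."""
    a = b = c = d = 0  # a: face+speech, b: face only, c: speech only, d: neither
    for t, has_face in keyframes:
        speech = in_speech(t, speech_segments)
        if has_face and speech:
            a += 1
        elif has_face:
            b += 1
        elif speech:
            c += 1
        else:
            d += 1
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

# Toy example: speech from 0-10 s and 20-30 s; keyframes with face flags.
segments = [(0.0, 10.0), (20.0, 30.0)]
frames = [(2.0, True), (7.0, True), (12.0, False),
          (17.0, False), (22.0, True), (27.0, False)]
print(round(phi_coefficient(frames, segments), 3))  # → 0.707
```

A value near 1 would indicate that faces tend to appear exactly when speech is present, while a value near 0 would indicate no association between the two streams.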
Document type: Conference papers

https://hal.archives-ouvertes.fr/hal-01574091
Contributor: Lip6 Publications
Submitted on: Friday, August 11, 2017 - 3:32:26 PM
Last modification on: Thursday, March 21, 2019 - 1:00:49 PM

Citation

Gwenaël Durand, Claude Montacié, Marie-Josée Caraty, Pascal Faudemay. Audio-video feature correlation: faces and speech. Multimedia Storage and Archiving Systems IV, Sep 1999, Boston, MA, United States. pp.102-112, ⟨10.1117/12.360415⟩. ⟨hal-01574091⟩
