Audio-video feature correlation: faces and speech - Archive ouverte HAL
Conference paper, 1999

Audio-video feature correlation: faces and speech

Abstract

This paper presents a study of the correlation between features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extent they should be combined. A generic audio signal partitioning algorithm was first used to detect Silence/Noise/Music/Speech segments in a full-length movie. A generic object detection method was applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and of the corresponding voice in the audio stream was studied. A third stream, the script of the movie, was warped onto the speech channel in order to automatically label faces appearing in the keyframes with the name of the corresponding character. We naturally found that extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.
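As an illustration of the kind of face/speech correlation the abstract describes (not the authors' implementation), one can cross-tabulate, for each keyframe, whether a face was detected against whether the keyframe's timestamp falls inside a detected Speech segment, and summarize the association with a phi coefficient. The segment lists and keyframe labels below are hypothetical toy data.

```python
# Sketch: correlate face presence in keyframes with speech presence in
# the audio stream. This is an assumed formulation for illustration,
# not the method from the paper.

def in_speech(t, speech_segments):
    """True if time t (seconds) lies inside any (start, end) speech segment."""
    return any(start <= t < end for start, end in speech_segments)

def face_speech_contingency(keyframes, speech_segments):
    """Build a 2x2 table indexed [has_face][in_speech] from
    keyframes given as [(timestamp, has_face), ...]."""
    table = [[0, 0], [0, 0]]
    for t, has_face in keyframes:
        table[int(has_face)][int(in_speech(t, speech_segments))] += 1
    return table

def phi_coefficient(table):
    """Phi (Pearson) correlation for a 2x2 contingency table."""
    (a, b), (c, d) = table  # a: no-face/no-speech ... d: face/speech
    denom = ((a + b) * (c + d) * (a + c) * (b + d)) ** 0.5
    return (a * d - b * c) / denom if denom else 0.0

# Hypothetical data: speech segments from audio partitioning, and
# face-detection results on keyframes.
speech = [(0.0, 4.0), (10.0, 15.0)]
frames = [(1.0, True), (3.0, True), (6.0, False), (12.0, True), (20.0, False)]

table = face_speech_contingency(frames, speech)
print(table, phi_coefficient(table))  # perfect association on this toy data
```

A high phi value would indicate that faces in keyframes tend to co-occur with speech segments, which is the kind of relationship the study measures over a full-length movie.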
No file deposited

Dates and versions

hal-01574091 , version 1 (11-08-2017)

Identifiers

Cite

Gwenaël Durand, Claude Montacié, Marie-Josée Caraty, Pascal Faudemay. Audio-video feature correlation: faces and speech. Multimedia Storage and Archiving Systems IV, Sep 1999, Boston, MA, United States. pp.102-112, ⟨10.1117/12.360415⟩. ⟨hal-01574091⟩