Audio-video feature correlation: faces and speech

Abstract: This paper presents a study of the correlation between features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extent they should be combined. A generic audio signal partitioning algorithm was first used to detect Silence/Noise/Music/Speech segments in a full-length movie. A generic object detection method was applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and of the corresponding voice in the audio stream was studied. A third stream, the script of the movie, was warped onto the speech channel in order to automatically label faces appearing in the keyframes with the name of the corresponding character. We found that extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.
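The correlation study described in the abstract can be pictured as a simple association measure between two binary signals: face presence in each keyframe and speech presence at the corresponding time in the audio stream. The sketch below is purely illustrative and is not the paper's actual method; the segment representation, timestamps, and the use of the phi coefficient as the association measure are all assumptions.

```python
# Illustrative sketch (not the paper's pipeline): measure the association
# between face presence in keyframes and speech presence in the audio
# stream, using a 2x2 contingency table and the phi coefficient.
# All data structures and values here are hypothetical.
import math

def in_speech(t, speech_segments):
    """True if timestamp t falls inside any (start, end) speech segment."""
    return any(start <= t < end for start, end in speech_segments)

def phi_coefficient(keyframes, speech_segments):
    """Phi association between face presence and speech presence."""
    a = b = c = d = 0  # a: face+speech, b: face only, c: speech only, d: neither
    for t, has_face in keyframes:
        speech = in_speech(t, speech_segments)
        if has_face and speech:
            a += 1
        elif has_face:
            b += 1
        elif speech:
            c += 1
        else:
            d += 1
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

# Toy example: speech from 0-10 s and 20-30 s; keyframes with face flags.
segments = [(0.0, 10.0), (20.0, 30.0)]
frames = [(2.0, True), (7.0, True), (12.0, False),
          (17.0, False), (22.0, True), (27.0, False)]
print(round(phi_coefficient(frames, segments), 3))  # → 0.707
```

A value near 1 would indicate that faces tend to appear exactly when speech is present, while a value near 0 would indicate no association between the two streams.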
Document type: Conference papers

https://hal.archives-ouvertes.fr/hal-01574091
Contributor: Lip6 Publications
Submitted on: Friday, August 11, 2017 - 3:32:26 PM
Last modification on: Thursday, March 21, 2019 - 1:00:49 PM

Citation

Gwenaël Durand, Claude Montacié, Marie-Josée Caraty, Pascal Faudemay. Audio-video feature correlation: faces and speech. Multimedia Storage and Archiving Systems IV, Sep 1999, Boston, MA, United States. pp.102-112, ⟨10.1117/12.360415⟩. ⟨hal-01574091⟩
