Video Indexing Based on Image and Sound - HAL open archive
Conference paper, Year: 1997

Video Indexing Based on Image and Sound

Abstract

Video indexing is a major challenge for both scientific and economic reasons. Information extraction can sometimes be easier from the sound channel than from the image channel. We first present a multi-channel, multi-modal query interface for querying sound, image and script through 'pull' and 'push' queries. We then summarize the segmentation phase, which requires information from the image channel, and propose the detection of critical segments, which should speed up both automatic and manual indexing. We then give an overview of the information extraction phase. Information can be extracted from the sound channel through speaker recognition, vocal dictation with unconstrained vocabularies, and alignment of the script with the speech. We present experimental results for each of these techniques. Speaker recognition methods were tested on the TIMIT and NTIMIT databases. Vocal dictation was tested on newspaper sentences spoken by several speakers. Script alignment was tested on part of a cartoon movie, 'Ivanhoe'. For good-quality sound segments, error rates are low enough for use in indexing applications. The major remaining issues are the processing of sound segments containing noise or music, and performance improvement through the use of appropriate low-cost architectures or networks of workstations.
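The abstract does not say how script alignment with speech is performed. As a minimal sketch under stated assumptions, one common approach is to align the script's word sequence with a speech recognizer's time-stamped word hypotheses using an edit-distance (Levenshtein) alignment, then copy the timestamps of exactly matched words onto the script. The function name `align_script` and the `(word, start_time)` input format are hypothetical illustrations, not the authors' method.

```python
from typing import List, Optional, Tuple


def align_script(script: List[str],
                 recognized: List[Tuple[str, float]]) -> List[Optional[float]]:
    """Hypothetical sketch: return a start time for each script word, or None.

    Assumes `recognized` is a list of (word, start_time) pairs produced by a
    speech recognizer; alignment is plain word-level Levenshtein distance.
    """
    hyp = [w for w, _ in recognized]
    n, m = len(script), len(hyp)
    # cost[i][j] = edit distance between script[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if script[i - 1] == hyp[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # script word not recognized
                             cost[i][j - 1] + 1)        # extra recognized word
    # Trace back through the table, keeping timestamps only for exact matches.
    times: List[Optional[float]] = [None] * n
    i, j = n, m
    while i > 0 and j > 0:
        sub = 0 if script[i - 1] == hyp[j - 1] else 1
        if cost[i][j] == cost[i - 1][j - 1] + sub:
            if sub == 0:
                times[i - 1] = recognized[j - 1][1]
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return times


if __name__ == "__main__":
    script = ["the", "black", "knight", "rides"]
    recognized = [("the", 0.0), ("block", 0.4), ("knight", 0.9), ("rides", 1.5)]
    print(align_script(script, recognized))
```

Here the misrecognized word ("block" for "black") is aligned as a substitution, so that script word gets no timestamp while its neighbours do; a real system would likely interpolate such gaps from the surrounding matched words.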
File not deposited

Dates and versions

hal-01649070 , version 1 (27-11-2017)

Identifiers

Cite

Pascal Faudemay, Claude Montacié, Marie-Josée Caraty. Video Indexing Based on Image and Sound. International Conference on Multimedia Storage and Archiving System, Nov 1997, Dallas, TX, United States. pp.57-69, ⟨10.1117/12.290365⟩. ⟨hal-01649070⟩