Can automatic speech transcripts be used for large scale TV stream description and structuring?

Camille Guinaudeau 1 Guillaume Gravier 1 Pascale Sébillot 1
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : The increasing quantity of TV material requires methods to help users navigate such data streams. Automatically associating a short textual description with each program in a stream is a first step toward navigation or structuring tasks. Speech contained in TV broadcasts, accessible by means of automatic speech recognition systems in the absence of closed captions, is a highly valuable semantic clue that might be used to link existing textual descriptions, such as program guides, with the video segments corresponding to programs. However, high word error rates are to be expected on some programs, which is likely to jeopardize the usefulness of transcripts. The goal of this article is to determine to what extent automatic transcripts of TV streams, for various types of programs, can be used for structuring or navigation tasks. To this end, word-based and phonetic-based automatic association between video segments and program descriptions is used as a case study. We show that descriptions from a program guide can be associated with video segments with an accuracy of up to 65% and provide a valuable description to validate existing program labels. Such associations constitute a first stage for structuring tasks, as they enable the textual characterization of video segments.
Document type :
Conference papers
Contributor : Pascale Sébillot
Submitted on : Thursday, December 6, 2012 - 2:56:18 PM
Last modification on : Friday, November 16, 2018 - 1:22:18 AM
Long-term archiving on : Thursday, March 7, 2013 - 5:00:15 AM


  • HAL Id : hal-00762125, version 1


Camille Guinaudeau, Guillaume Gravier, Pascale Sébillot. Can automatic speech transcripts be used for large scale TV stream description and structuring? First International Workshop on Content-Based Audio/Video Analysis for Novel TV Services, CBTV'09, Dec 2009, San Diego, CA, United States. pp.489-494. ⟨hal-00762125⟩


