Journal article, Computer Speech and Language, 2012

Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation

Camille Guinaudeau
Guillaume Gravier
Pascale Sébillot

Abstract

Transcript-based topic segmentation of TV programs faces several difficulties: transcription errors, potentially short segments, and too few word repetitions to enforce lexical cohesion, i.e., the lexical relations within a text that give it a certain unity. To overcome these problems, we extend a probabilistic measure of lexical cohesion based on generalized probabilities with a unigram language model. On the one hand, confidence measures and semantic relations are considered as additional sources of information. On the other hand, language model interpolation techniques are investigated for better language model estimation. Experimental topic segmentation results are presented on two corpora with distinct characteristics, composed respectively of broadcast news and reports on current affairs. Significant improvements are obtained on both corpora, demonstrating the effectiveness of the extended lexical cohesion measure for spoken TV content as well as its generality across different programs.
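To make the idea of a unigram-based lexical cohesion score more concrete, here is a minimal sketch. It is not the authors' formulation: the function name `cohesion`, the Laplace smoothing, the use of confidence scores as fractional counts, and the linear interpolation weight `lam` are all assumptions chosen for illustration. Semantic relations are left out entirely.

```python
import math
from collections import Counter

def cohesion(words, vocab_size, confidences=None, background=None, lam=1.0):
    """Log-probability of a segment under a unigram model estimated on
    that same segment (illustrative sketch, not the paper's exact measure).

    words        -- list of word tokens in the segment
    vocab_size   -- assumed vocabulary size, used for Laplace smoothing
    confidences  -- optional per-token confidence scores in [0, 1] (assumption:
                    each occurrence counts for its confidence instead of 1)
    background   -- optional dict word -> probability from a general model
    lam          -- interpolation weight given to the segment model
    """
    if confidences is None:
        confidences = [1.0] * len(words)

    # Confidence-weighted counts: an uncertain word contributes less.
    counts = Counter()
    for w, c in zip(words, confidences):
        counts[w] += c
    total = sum(counts.values())

    score = 0.0
    for w in words:
        # Laplace-smoothed segment probability.
        p_seg = (counts[w] + 1.0) / (total + vocab_size)
        p = p_seg
        if background is not None:
            # Linear interpolation with a background model; unseen words
            # fall back to a uniform probability (assumption).
            p = lam * p_seg + (1.0 - lam) * background.get(w, 1.0 / vocab_size)
        score += math.log(p)
    return score

repetitive = ["vote", "vote", "election", "vote"]
scattered = ["vote", "storm", "goal", "bank"]
print(cohesion(repetitive, 1000) > cohesion(scattered, 1000))
```

In this sketch a segment with many repeated words gets a higher score than a segment of the same length with no repetitions, which is the intuition behind using lexical cohesion to decide where topic boundaries fall.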
Main file: guinaudeau.pdf (212.88 KB)
Origin : Files produced by the author(s)

Dates and versions

hal-00645705, version 1 (30-11-2011)

Identifiers

  • HAL Id: hal-00645705, version 1

Cite

Camille Guinaudeau, Guillaume Gravier, Pascale Sébillot. Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation. Computer Speech and Language, 2012, 26 (2), pp.90-104. ⟨hal-00645705⟩