Journal articles

Topic segmentation of TV-streams by watershed transform and vectorization

Vincent Claveau 1 Sébastien Lefèvre 2
1 LinkMedia - Creating and exploiting explicit links between multimedia fragments
Inria Rennes – Bretagne Atlantique , IRISA-D6 - MEDIA ET INTERACTIONS
2 OBELIX - Environment observation with complex imagery
IRISA-D5 - SIGNAUX ET IMAGES NUMÉRIQUES, ROBOTIQUE, UBS - Université de Bretagne Sud
Abstract : A fine-grained segmentation of radio or TV broadcasts is an essential step for most multimedia processing tasks. Applying segmentation algorithms to the speech transcripts seems straightforward. Yet most of these algorithms are ill-suited to short segments or noisy data. In this paper, we present a new segmentation technique inspired by the image analysis field and relying on a new way to compute similarities between candidate segments, called Vectorization. Vectorization makes it possible to match text segments that do not share common words; this property is shown to be particularly useful when dealing with transcripts in which transcription errors and short segments make the segmentation difficult. This new topic segmentation technique is evaluated on two corpora of transcripts from French TV broadcasts, on which it largely outperforms existing state-of-the-art approaches.
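The abstract does not spell out how Vectorization is computed, so the following is only a minimal sketch of one way an indirect similarity can match segments that share no words. It assumes that each segment is described by its similarities to a small set of pivot texts and that segments are then compared through those descriptions. The pivot texts, the bag-of-words cosine measure, and all function names are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens with their counts (assumed representation)."""
    return Counter(text.lower().split())


def cosine_sparse(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_dense(u, v):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def vectorize(segment, pivots):
    """Describe a segment by its similarity to each pivot text."""
    seg = bag_of_words(segment)
    return [cosine_sparse(seg, bag_of_words(p)) for p in pivots]


# Hypothetical pivot texts and segments, chosen only for illustration.
pivots = [
    "the striker scored a goal in the football match",
    "the government voted the new budget law",
]
seg_a = "striker goal"
seg_b = "football match"
seg_c = "budget law"

# Direct comparison: seg_a and seg_b have no word in common.
direct_ab = cosine_sparse(bag_of_words(seg_a), bag_of_words(seg_b))

# Indirect comparison through the pivots.
indirect_ab = cosine_dense(vectorize(seg_a, pivots), vectorize(seg_b, pivots))
indirect_ac = cosine_dense(vectorize(seg_a, pivots), vectorize(seg_c, pivots))

print(direct_ab, indirect_ab, indirect_ac)
```

In this toy setting the direct similarity between `seg_a` and `seg_b` is zero, while their pivot-based descriptions coincide because both resemble only the first pivot; `seg_c` resembles only the second pivot and stays dissimilar to `seg_a`.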
Document type :
Journal articles
Cited literature [48 references]

https://hal.archives-ouvertes.fr/hal-00998259
Contributor : Sébastien Lefèvre
Submitted on : Wednesday, November 13, 2019 - 5:42:29 PM
Last modification on : Tuesday, October 19, 2021 - 11:59:00 PM

File

csl2015.pdf
Files produced by the author(s)

Citation

Vincent Claveau, Sébastien Lefèvre. Topic segmentation of TV-streams by watershed transform and vectorization. Computer Speech and Language, Elsevier, 2015, 29 (1), pp.63-80. ⟨10.1016/j.csl.2014.04.006⟩. ⟨hal-00998259⟩

Metrics

Record views : 216
Files downloads : 68