Incorporating Prosodic Boundaries in Unsupervised Term Discovery

Bogdan Ludusan (1), Guillaume Gravier (2), Emmanuel Dupoux (1)
(2) TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : We present a preliminary investigation into the usefulness of prosodic boundaries for unsupervised term discovery (UTD). Studies in language acquisition show that infants use prosodic boundaries to segment continuous speech into word-like units. We evaluate whether such a strategy could also help UTD algorithms. Running a previously published UTD algorithm (MODIS) on a corpus of prosodically annotated English broadcast news revealed that many discovered terms straddle prosodic boundaries. We then implemented two variants of this algorithm: one that discards straddling items and one that truncates them to the nearest boundary (either a prosodic boundary or a pause marker). Both variants achieved a higher term-matching F-score than the baseline, and higher-level prosodic boundaries proved more useful than lower-level boundaries or pause markers. In addition, we observed that the truncation variant, but not the discard variant, increased the word boundary F-score over the baseline.
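The two variants described in the abstract can be illustrated with a minimal Python sketch. It assumes that each discovered term is a (start, end) interval in seconds and that boundaries (prosodic boundaries or pause markers) are a sorted list of time stamps. The function names, the rule of keeping the longest boundary-free piece when truncating, and the 20 ms tolerance used for boundary scoring are hypothetical choices for illustration, not the MODIS implementation or the paper's exact procedure.

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of a discovered term, in seconds


def interior_boundaries(term: Interval, boundaries: List[float]) -> List[float]:
    """Return the boundary times lying strictly inside the term (boundaries must be sorted)."""
    start, end = term
    i = bisect_right(boundaries, start)  # first boundary strictly after the term start
    inside = []
    while i < len(boundaries) and boundaries[i] < end:
        inside.append(boundaries[i])
        i += 1
    return inside


def discard_straddling(terms: List[Interval], boundaries: List[float]) -> List[Interval]:
    """Discard variant: keep only terms that do not straddle any boundary."""
    return [t for t in terms if not interior_boundaries(t, boundaries)]


def truncate_straddling(terms: List[Interval], boundaries: List[float],
                        min_dur: float = 0.0) -> List[Interval]:
    """Truncate variant (assumed rule): cut each term at its interior boundaries
    and keep the longest resulting piece, if it is at least min_dur long."""
    out = []
    for start, end in terms:
        cuts = [start] + interior_boundaries((start, end), boundaries) + [end]
        pieces = list(zip(cuts[:-1], cuts[1:]))
        best = max(pieces, key=lambda p: p[1] - p[0])
        if best[1] - best[0] >= min_dur:
            out.append(best)
    return out


def boundary_f_score(predicted: List[float], gold: List[float], tol: float = 0.02) -> float:
    """Boundary F-score: each gold boundary can be matched once, within tol seconds
    (greedy matching; the tolerance is a hypothetical choice)."""
    unmatched = sorted(gold)
    hits = 0
    for p in sorted(predicted):
        for j, g in enumerate(unmatched):
            if abs(p - g) <= tol:
                hits += 1
                del unmatched[j]
                break
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


terms = [(0.2, 0.9), (1.0, 2.0), (2.5, 3.2)]
boundaries = [1.3, 3.0]
print(discard_straddling(terms, boundaries))   # [(0.2, 0.9)]
print(truncate_straddling(terms, boundaries))  # [(0.2, 0.9), (1.3, 2.0), (2.5, 3.0)]
```

With these toy values, the discard variant loses two of the three terms, while the truncation variant keeps all three and moves their edges onto boundaries. This is consistent with the abstract's observation that only truncation can improve the word boundary F-score.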
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01026421
Contributor : Guillaume Gravier
Submitted on : Monday, July 21, 2014 - 3:46:15 PM
Last modification on : Thursday, December 6, 2018 - 1:53:32 AM

Identifiers

  • HAL Id : hal-01026421, version 1

Citation

Bogdan Ludusan, Guillaume Gravier, Emmanuel Dupoux. Incorporating Prosodic Boundaries in Unsupervised Term Discovery. International conference on Speech Prosody, May 2014, Dublin, Ireland. pp.207-211. ⟨hal-01026421⟩
