Conference papers

Unsupervised Word Segmentation: does tone matter ?

Abstract : In this paper, we investigate the usefulness of tonal features for unsupervised word discovery, taking Mboshi, a low-resource tonal language from the Bantu family, as our main target language. In a preliminary step, we show that tone annotation improves the performance of supervised learning when using a simplified representation of the data. To leverage this information in an unsupervised setting, we then present a probabilistic model based on a hierarchical Pitman-Yor process that incorporates tonal representations in its backoff structure. We compare our model with a tone-agnostic baseline and analyze if and how tone helps unsupervised segmentation on our small dataset.
Contributor : Limsi Publications
Submitted on : Thursday, November 1, 2018 - 9:36:17 PM
Last modification on : Wednesday, September 16, 2020 - 5:30:50 PM


  • HAL Id : hal-01910756, version 1


Pierre Godard, Kevin Löser, Alexandre Allauzen, Laurent Besacier, François Yvon. Unsupervised Word Segmentation: does tone matter ?. International Conference on Intelligent Text Processing and Computational Linguistics, Mar 2018, Hanoï, Vietnam. ⟨hal-01910756⟩


