Unsupervised Word Segmentation: does tone matter?

Abstract: In this paper, we investigate the usefulness of tonal features for unsupervised word discovery, taking Mboshi, a low-resource tonal language from the Bantu family, as our main target language. In a preliminary step, we show that tone annotation improves the performance of supervised learning when using a simplified representation of the data. To leverage this information in an unsupervised setting, we then present a probabilistic model based on a hierarchical Pitman-Yor process that incorporates tonal representations in its backoff structure. We compare our model with a tone-agnostic baseline and analyze whether and how tone helps unsupervised segmentation on our small dataset.

https://hal.archives-ouvertes.fr/hal-01910756
Contributor: Limsi Publications
Submitted on: Thursday, November 1, 2018 - 9:36:17 PM
Last modified on: Saturday, March 16, 2019 - 1:55:47 AM

Identifiers

  • HAL Id: hal-01910756, version 1

Citation

Pierre Godard, Kevin Löser, Alexandre Allauzen, Laurent Besacier, François Yvon. Unsupervised Word Segmentation: does tone matter? International Conference on Intelligent Text Processing and Computational Linguistics, Mar 2018, Hanoï, Vietnam. ⟨hal-01910756⟩
