Unsupervised Word Segmentation: does tone matter?

Abstract: In this paper, we investigate the usefulness of tonal features for unsupervised word discovery, taking Mboshi, a low-resource tonal language from the Bantu family, as our main target language. In a preliminary step, we show that tone annotation improves the performance of supervised learning when using a simplified representation of the data. To leverage this information in an unsupervised setting, we then present a probabilistic model based on a hierarchical Pitman-Yor process that incorporates tonal representations in its backoff structure. We compare our model with a tone-agnostic baseline and analyze whether and how tone helps unsupervised segmentation on our small dataset.
Document type:
Conference paper
International Conference on Intelligent Text Processing and Computational Linguistics, Mar 2018, Hanoï, Vietnam

https://hal.archives-ouvertes.fr/hal-01910756
Contributor: Limsi Publications
Submitted on: Thursday, November 1, 2018 - 21:36:17
Last modified on: Tuesday, February 12, 2019 - 01:30:10

Identifiers

  • HAL Id: hal-01910756, version 1

Citation

Pierre Godard, Kevin Löser, Alexandre Allauzen, Laurent Besacier, François Yvon. Unsupervised Word Segmentation: does tone matter? International Conference on Intelligent Text Processing and Computational Linguistics, Mar 2018, Hanoï, Vietnam. 〈hal-01910756〉
