Unsupervised Word Segmentation: does tone matter?
Conference paper, Year: 2018

Abstract

In this paper, we investigate the usefulness of tonal features for unsupervised word discovery, taking Mboshi, a low-resource tonal language from the Bantu family, as our main target language. In a preliminary step, we show that tone annotation improves the performance of supervised learning when using a simplified representation of the data. To leverage this information in an unsupervised setting, we then present a probabilistic model based on a hierarchical Pitman-Yor process that incorporates tonal representations in its backoff structure. We compare our model with a tone-agnostic baseline and analyze whether and how tone helps unsupervised segmentation on our small dataset.
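For readers unfamiliar with Pitman-Yor processes, the following is a minimal sketch of the generic predictive rule that such models build on: each word keeps a discounted share of its observed count, and the remaining mass is passed to a backoff (base) distribution. It is not the authors' model. The function name, the toy strings (made-up syllables with a tone digit, not real Mboshi data), the counts, and the uniform base distribution are all assumptions chosen for illustration.

```python
def pitman_yor_predictive(word, customers, tables, base_prob,
                          discount=0.5, strength=1.0):
    """Predictive probability of `word` in a single Pitman-Yor restaurant.

    customers[w]: number of tokens of w observed so far
    tables[w]:    number of tables serving w (1 <= tables[w] <= customers[w])
    base_prob:    callable returning the backoff probability of a word
    """
    total_customers = sum(customers.values())
    total_tables = sum(tables.values())
    denom = strength + total_customers
    # Discounted count mass kept by the word itself (0 for unseen words).
    seen = customers.get(word, 0) - discount * tables.get(word, 0)
    # Mass handed over to the backoff distribution.
    backoff = strength + discount * total_tables
    return (seen + backoff * base_prob(word)) / denom


# Hypothetical toy vocabulary: syllable plus a tone digit (illustration only).
vocab = ["ma1", "ma2", "ko1"]
customers = {"ma1": 3, "ko1": 1}
tables = {"ma1": 2, "ko1": 1}


def uniform_base(word):
    # Assumed base distribution: uniform over the toy vocabulary.
    return 1.0 / len(vocab)


probs = {w: pitman_yor_predictive(w, customers, tables, uniform_base)
         for w in vocab}
```

In a hierarchical model, `base_prob` would itself be another restaurant of this kind; the abstract indicates that the backoff structure is where tonal representations enter, but the exact arrangement is described only in the paper.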
Main file
Godard18doestone.pdf (273.78 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01910756, version 1 (08-10-2021)

Identifiers

  • HAL Id: hal-01910756, version 1

Cite

Pierre Godard, Kevin Löser, Alexandre Allauzen, Laurent Besacier, François Yvon. Unsupervised Word Segmentation: does tone matter?. International Conference on Intelligent Text Processing and Computational Linguistics, Mar 2018, Hanoi, Vietnam. ⟨hal-01910756⟩
