Preliminary Experiments on Unsupervised Word Discovery in Mboshi

Abstract: The necessity to document thousands of endangered languages encourages the collaboration between linguists and computer scientists in order to provide the documentary linguistics community with the support of automatic processing tools. The French-German ANR-DFG project Breaking the Unwritten Language Barrier (BULB) aims at developing such tools for three mostly unwritten African languages of the Bantu family. For one of them, Mboshi, a language originating from the "Cuvette" region of the Republic of Congo, we investigate unsupervised word discovery techniques from an unsegmented stream of phonemes. We compare different models and algorithms, both monolingual and bilingual, on a new corpus in Mboshi and French, and discuss various ways to represent the data with suitable granularity. An additional French-English corpus allows us to contrast the results obtained on Mboshi and to experiment with more data.
Document type:
Conference paper
Interspeech 2016, Sep 2016, San Francisco, United States. Interspeech 2016 proceedings

Cited literature: 32 references

https://hal.archives-ouvertes.fr/hal-01350119
Contributor: Laurent Besacier
Submitted on: Friday, July 29, 2016 - 16:33:32
Last modified on: Tuesday, November 20, 2018 - 14:04:02
Document(s) archived on: Sunday, October 30, 2016 - 10:46:43

File

886_Paper_last.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01350119, version 1

Citation

Pierre Godard, Gilles Adda, Martine Adda-Decker, Alexandre Allauzen, Laurent Besacier, et al. Preliminary Experiments on Unsupervised Word Discovery in Mboshi. Interspeech 2016, Sep 2016, San Francisco, United States. Interspeech 2016 proceedings. 〈hal-01350119〉

Metrics

Record views: 532

File downloads: 268