Toward a Lightweight Solution for Less-resourced Languages: Creating a POS Tagger for Alsatian Using Voluntary Crowdsourcing

Abstract: We present the results of an experiment in crowdsourcing part-of-speech annotations for Alsatian, a less-resourced French regional language. For this purpose we used Bisame, a purpose-built, slightly gamified platform. It allowed us to gather annotations on a variety of corpora covering some of the language's dialectal variations. The quality of the annotations, which reach an average F-measure of 93%, enabled us to train a first tagger for Alsatian that is nearly 84% accurate. The platform, the annotations produced, and the tagger are all freely available. The platform can easily be adapted to other languages, thus providing a solution to (some of) the problems faced by less-resourced languages.
Document type:
Conference paper
Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 2018, Miyazaki, Japan

https://hal.archives-ouvertes.fr/hal-01790615
Contributor: Karën Fort
Submitted on: Wednesday, May 23, 2018 - 15:16:54
Last modified on: Friday, November 16, 2018 - 02:07:14
Document(s) archived on: Friday, August 24, 2018 - 16:26:03

Identifiers

  • HAL Id: hal-01790615, version 1

Citation

Alice Millour, Karën Fort. Toward a Lightweight Solution for Less-resourced Languages: Creating a POS Tagger for Alsatian Using Voluntary Crowdsourcing. Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 2018, Miyazaki, Japan. 〈hal-01790615〉
