Toward a Lightweight Solution for Less-resourced Languages: Creating a POS Tagger for Alsatian Using Voluntary Crowdsourcing
Conference paper, Year: 2018

Alice Millour
Karën Fort

Abstract

We present the results of an experiment in crowdsourcing part-of-speech annotations for Alsatian, a less-resourced regional language of France. For this purpose we used Bisame, a purpose-built, lightly gamified platform. It allowed us to gather annotations on a variety of corpora covering some of the language's dialectal variations. The quality of the annotations, which reach an average F-measure of 93%, enabled us to train a first tagger for Alsatian that is nearly 84% accurate. The platform, the annotations produced, and the tagger are all freely available. The platform can easily be adapted to other languages, thus providing a solution to (some of) the problems faced by less-resourced languages.

Main file

lrec2018_alsacien.pdf (185.41 KB)

Dates and versions

hal-01790615, version 1 (23-05-2018)

Identifiers

  • HAL Id: hal-01790615, version 1

Cite

Alice Millour, Karën Fort. Toward a Lightweight Solution for Less-resourced Languages: Creating a POS Tagger for Alsatian Using Voluntary Crowdsourcing. Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 2018, Miyazaki, Japan. ⟨hal-01790615⟩
