Conference papers

Toward a Lightweight Solution for Less-resourced Languages: Creating a POS Tagger for Alsatian Using Voluntary Crowdsourcing

Abstract : We present the results of an experiment in crowdsourcing part-of-speech annotations for Alsatian, a less-resourced regional language of France. For this purpose we used Bisame, a purpose-built, lightly gamified platform. It allowed us to gather annotations on a variety of corpora covering some of the language's dialectal variations. The quality of the annotations, which reach an average F-measure of 93%, enabled us to train a first tagger for Alsatian that is nearly 84% accurate. The platform, the annotations produced, and the tagger are all freely available. The platform can easily be adapted to other languages, thus offering a solution to (some of) the problems faced by less-resourced languages.

https://hal.archives-ouvertes.fr/hal-01790615
Contributor : Karën Fort
Submitted on : Wednesday, May 23, 2018 - 3:16:54 PM
Last modification on : Monday, March 2, 2020 - 6:24:48 PM
Document(s) archived on : Friday, August 24, 2018 - 4:26:03 PM

Identifiers

  • HAL Id : hal-01790615, version 1

Citation

Alice Millour, Karën Fort. Toward a Lightweight Solution for Less-resourced Languages: Creating a POS Tagger for Alsatian Using Voluntary Crowdsourcing. Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 2018, Miyazaki, Japan. ⟨hal-01790615⟩

Metrics

Record views : 126

File downloads : 117