
Toward a Lightweight Solution for Less-resourced Languages: Creating a POS Tagger for Alsatian Using Voluntary Crowdsourcing

Abstract : We present the results of an experiment in crowdsourcing part-of-speech annotations for Alsatian, a less-resourced French regional language. For this purpose we used Bisame, a purpose-built, slightly gamified platform. It allowed us to gather annotations on a variety of corpora covering some of the language's dialectal variations. The quality of the annotations, which reach an average F-measure of 93%, enabled us to train a first tagger for Alsatian that is nearly 84% accurate. The platform, the annotations produced, and the tagger are all freely available. The platform can easily be adapted to other languages, thus providing a solution to (some of) the problems faced by less-resourced languages.
Document type :
Conference papers

Cited literature [31 references]

https://hal.archives-ouvertes.fr/hal-01790615
Contributor : Karën Fort
Submitted on : Wednesday, May 23, 2018 - 3:16:54 PM
Last modification on : Wednesday, December 9, 2020 - 3:09:41 PM
Long-term archiving on : Friday, August 24, 2018 - 4:26:03 PM

Identifiers

  • HAL Id : hal-01790615, version 1

Citation

Alice Millour, Karën Fort. Toward a Lightweight Solution for Less-resourced Languages: Creating a POS Tagger for Alsatian Using Voluntary Crowdsourcing. Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 2018, Miyazaki, Japan. ⟨hal-01790615⟩
