Conference papers

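Vers une solution légère de production de données pour le TAL : création d'un tagger de l'alsacien par crowdsourcing bénévole (Toward a lightweight data production solution for NLP: creating a tagger for Alsatian through voluntary crowdsourcing)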

Abstract : We present the results of an experiment on part-of-speech annotation of a corpus in a low-resourced regional language, Alsatian, using a purpose-built voluntary crowdsourcing platform, Bisame. The platform has been online since May 2016 and has made it possible to gather 15,846 annotations from 42 participants. An evaluation performed on a reference corpus shows that the produced annotations reach an F-measure of 0.93. The tagger trained on these annotations is accurate in 82% of cases. This is the first POS tagger developed for Alsatian. This method of developing language resources proved efficient and is promising for low-resourced languages in which a significant number of speakers have access to the Internet. The platform code, the annotated corpus and the tagger are all freely available.
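For readers unfamiliar with the metric, the F-measure quoted in the abstract is the harmonic mean of precision and recall. The sketch below is illustrative only: the abstract does not describe how the evaluation was implemented, so the function name, the token-id-to-tag dictionary format and the toy tags are all assumptions, not the authors' code or data.

```python
def precision_recall_f1(produced, reference):
    """Compare produced POS tags with reference tags.

    Both arguments map a token id to a POS tag (a hypothetical format).
    A produced annotation counts as correct when the reference holds
    the same tag for the same token id.
    """
    correct = sum(1 for tok, tag in produced.items() if reference.get(tok) == tag)
    precision = correct / len(produced) if produced else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (made up for illustration, not taken from the corpus).
reference = {0: "DET", 1: "NOUN", 2: "VERB", 3: "ADJ"}
produced = {0: "DET", 1: "NOUN", 2: "NOUN"}  # token 3 was never annotated

p, r, f = precision_recall_f1(produced, reference)
print(f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

In this toy case two of the three produced tags match the reference, while only two of the four reference tokens are correctly covered, so precision and recall differ and the F-measure falls between them.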
Document type :
Conference papers

Cited literature [24 references]

https://hal.archives-ouvertes.fr/hal-01516226
Contributor : Karën Fort
Submitted on : Thursday, October 5, 2017 - 11:42:51 AM
Last modification on : Monday, December 14, 2020 - 9:53:49 AM

File

taln2017_alsacien.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01516226, version 2

Citation

Alice Millour, Karën Fort, Delphine Bernhard, Lucie Steiblé. Vers une solution légère de production de données pour le TAL : création d'un tagger de l'alsacien par crowdsourcing bénévole. Traitement Automatique des Langues Naturelles (TALN), Jun 2017, Orléans, France. ⟨hal-01516226v2⟩

Metrics

Record views

215

Files downloads

286