Conference papers

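Towards a lightweight solution for producing NLP data: creating a tagger for Alsatian through voluntary crowdsourcing (original title: Vers une solution légère de production de données pour le TAL : création d'un tagger de l'alsacien par crowdsourcing bénévole)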

Abstract : We present the results of an experiment on part-of-speech annotation of a corpus in Alsatian, a low-resourced regional language, using Bisame, a voluntary crowdsourcing platform developed specifically for this purpose. The platform has been online since May 2016 and has collected 15,846 annotations from 42 participants. An evaluation on a reference corpus gives the produced annotations an F-measure of 0.93. The tagger trained on these annotations is accurate in 82% of cases. It is the first POS tagger developed for Alsatian. This method of developing language resources proved efficient and appears promising for low-resourced languages for which a significant number of speakers have Internet access. The platform code, the annotated corpus and the tagger are all freely available.
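The two figures reported in the abstract, an F-measure for the crowdsourced annotations and an accuracy for the trained tagger, can be illustrated with a minimal sketch. It assumes one common way of scoring annotations token by token against a reference corpus; the abstract does not describe the authors' evaluation procedure, so the function names, the use of `None` for tokens left unannotated, and the tag labels below are all hypothetical.

```python
from typing import Optional, Sequence


def f_measure(reference: Sequence[str], produced: Sequence[Optional[str]]) -> float:
    """Balanced F-measure (F1) of produced tags against reference tags.

    Assumption: a token left unannotated is represented by None, so it
    lowers recall without counting against precision.
    """
    if len(reference) != len(produced):
        raise ValueError("reference and produced must have the same length")
    # Tokens whose produced tag matches the reference tag.
    correct = sum(1 for r, p in zip(reference, produced) if p is not None and p == r)
    # Tokens that actually received a tag.
    proposed = sum(1 for p in produced if p is not None)
    precision = correct / proposed if proposed else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(reference: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of tokens for which the tagger predicts the reference tag."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        return 0.0
    return sum(1 for r, p in zip(reference, predicted) if r == p) / len(reference)


# Hypothetical five-token example; the tag labels are illustrative only.
reference_tags = ["DET", "NOUN", "VERB", "ADP", "NOUN"]
crowd_tags = ["DET", "NOUN", None, "ADP", "VERB"]
tagger_tags = ["DET", "NOUN", "VERB", "ADP", "VERB"]

crowd_f1 = f_measure(reference_tags, crowd_tags)
tagger_accuracy = accuracy(reference_tags, tagger_tags)
```

In this toy example three of the four proposed crowd tags are correct (precision 0.75) and three of the five reference tokens are recovered (recall 0.6), while the tagger output matches the reference on four tokens out of five.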
Document type : Conference papers

https://hal.archives-ouvertes.fr/hal-01516226
Contributor : Karën Fort
Submitted on : Thursday, October 5, 2017 - 11:42:51 AM
Last modification on : Friday, June 17, 2022 - 3:46:12 AM

File

taln2017_alsacien.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01516226, version 2

Citation

Alice Millour, Karën Fort, Delphine Bernhard, Lucie Steiblé. Vers une solution légère de production de données pour le TAL : création d'un tagger de l'alsacien par crowdsourcing bénévole. Traitement Automatique des Langues Naturelles (TALN), Jun 2017, Orléans, France. ⟨hal-01516226v2⟩
