À l'écoute des locuteurs : production participative de ressources langagières pour des langues non standardisées

Abstract : Citizen science, in particular voluntary crowdsourcing, is a still little-tested solution for producing language resources for less-resourced languages that have enough connected speakers. We present here the experiments we conducted on part-of-speech annotation for non-standardized languages, namely Alsatian and Guadeloupean Creole. We detail the methodology we used and show that it is adaptable to other languages, then present the results we obtained. An analysis of the limits of this platform led us to develop a new one that allows the creation of raw corpora and part-of-speech annotations, and the construction of a multivariant lexicon. The platforms, language resources and tagging models we created are all freely available.
Document type :
Journal articles

https://hal.archives-ouvertes.fr/hal-01995758
Contributor : Karën Fort
Submitted on : Wednesday, February 13, 2019 - 5:26:34 PM
Last modification on : Friday, September 6, 2019 - 11:48:06 AM
Long-term archiving on: Tuesday, May 14, 2019 - 12:57:23 PM

File

revue_TAL_LPD_Millour_Fort.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01995758, version 1

Citation

Alice Millour, Karën Fort. À l'écoute des locuteurs : production participative de ressources langagières pour des langues non standardisées. Traitement Automatique des Langues, ATALA, 2018. ⟨hal-01995758⟩
