Conference papers

Construction d'un corpus parallèle français-comorien en utilisant de la TA français-swahili
(Building a French-Comorian parallel corpus using French-Swahili MT)

Abstract : Comorian, or Shikomori, is a macro-language made up of 4 dialects that are very close to one another (Ngazidja, Maore, Mweli, Ndzuani) and quite close to Swahili. It is severely under-resourced in terms of computerized linguistic resources, having neither corpora, nor dictionaries, nor correction or machine translation (MT) tools. It is therefore a priori not possible to build a parallel corpus efficiently in the way we know how to, namely by MT followed by online post-editing (PE): for French-Chinese, this takes 17 min/page with Google Translate (GT) and 12 min/page with the MosesLIG.fr-zh MT system and SECTra/iMAG. We are nevertheless on the way to achieving it by post-editing Swahili "pre-translations" produced by GT. Swahili is used here not as a pivot language, but as an auxiliary target language. We now have a good-quality French-Ngazidja corpus containing 14 articles from the Alwatwan newspaper (899 segments, 16,224 words, 65 standard pages). In parallel, we extract bilingual lexical correspondences. The first application will be the active reading of French for Comorian speakers; it will use the dictionary and the MT system derived, respectively, from the lexical database and the growing bilingual corpus.
Document type :
Conference papers

Cited literature [17 references]

https://hal.archives-ouvertes.fr/hal-01992871
Contributor : Valérie Bellynck
Submitted on : Thursday, January 24, 2019 - 4:13:37 PM
Last modification on : Friday, July 17, 2020 - 11:10:27 AM

File

ABDOURAHAMANE_ET_AL - Construc...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01992871, version 1

Citation

Moneim Abdourahamane, Christian Boitet, Valérie Bellynck, Lingxiao Wang, Hervé Blanchon. Construction d'un corpus parallèle français-comorien en utilisant de la TA français-swahili. TALAf (Traitement Automatique des Langues africaines), Jul 2016, Paris, France. ⟨hal-01992871⟩
