
Building a French-Comorian parallel corpus using French-Swahili MT

Abstract: Comorian, or Shikomori, is a macro-language made up of four closely related dialects (Ngazidja, Maore, Mweli, Ndzuani), and it is fairly close to Swahili. It is severely under-resourced in terms of computerized linguistic resources: it has no corpora, no dictionaries, and no correction or machine translation (MT) tools. It is therefore a priori impossible to build a parallel corpus efficiently in the way we know how, namely with MT followed by online post-editing (PE). For French-Chinese, that approach takes 17 min/page with Google Translate (GT) and 12 min/page with the MT system and SECTra/iMAG. We are nevertheless achieving it by post-editing Swahili "pre-translations" produced by GT. Here Swahili is used not as a pivot language but as an auxiliary target language. We now have a good-quality French-Ngazidja corpus containing 14 articles from the Alwatwan newspaper (899 segments, 16,224 words, 65 standard pages). In parallel, we extract bilingual lexical correspondences. The first application will be active reading of French for Comorian speakers; it will use a dictionary derived from the lexical database and an MT system derived from the growing bilingual corpus.
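The corpus-size figures in the abstract can be cross-checked with a little arithmetic. The sketch below assumes that a "standard page" is 250 words, a common convention in translation work that the abstract does not state explicitly. Under that assumption, 16,224 words come out at about 65 standard pages, which matches the reported figure. It also derives the average segment length, which is not given in the abstract.

```python
# Cross-check of the corpus-size figures reported in the abstract.
# ASSUMPTION (not stated in the abstract): 1 standard page = 250 words.
WORDS_PER_STANDARD_PAGE = 250

segments = 899   # number of aligned segments, from the abstract
words = 16224    # number of words, from the abstract

standard_pages = words / WORDS_PER_STANDARD_PAGE  # 64.896
words_per_segment = words / segments              # about 18.05

print(f"{standard_pages:.1f} standard pages")       # 64.9
print(f"{words_per_segment:.1f} words per segment")  # 18.0
```

Rounded to the nearest whole page, this gives the 65 standard pages reported for the 14 Alwatwan articles.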
Document type:
Conference papers

Cited literature: 17 references
Contributor: Valérie Bellynck
Submitted on: Thursday, January 24, 2019 - 4:13:37 PM
Last modification on: Wednesday, November 3, 2021 - 6:45:47 AM


Files produced by the author(s)


HAL Id: hal-01992871, version 1



Moneim Abdourahamane, Christian Boitet, Valérie Bellynck, Lingxiao Wang, Hervé Blanchon. Construction d'un corpus parallèle français-comorien en utilisant de la TA français-swahili [Building a French-Comorian parallel corpus using French-Swahili MT]. TALAf (Traitement Automatique des Langues africaines [Natural Language Processing of African Languages]), Jul 2016, Paris, France. ⟨hal-01992871⟩


