Preparation and exploitation of bilingual texts - Archive ouverte HAL
Journal Articles Lux Coreana Year : 2006

Preparation and exploitation of bilingual texts

Abstract

A bitext is a merged document composed of two versions of a given text, usually in two different languages. An aligned bitext is produced by an alignment tool, or aligner, which automatically matches the two versions of the same text, generally sentence by sentence. A multilingual aligned corpus, that is, a collection of aligned bitexts, can be extremely useful for translation, language teaching and the investigation of literary texts when consulted with a search tool. This is all the more true for a pair of languages such as Korean and French, for which few people are bilingual and many literary translations involve pairs of translators. For such language pairs, retrieving solutions to previously resolved translation problems is an invaluable aid. In addition, multilingual corpora are at the core of some research in natural language processing (NLP), both in theoretical fields, such as contrastive linguistics and lexicography, and in applied fields, such as translation, term extraction, or the production of translation memories. Current methods for constructing and exploiting multilingual aligned corpora are essentially based on statistical models of text. In this article, we propose enhancing these methods with lexical and grammatical resources. The open-source Unitex system is the main corpus processor that systematically makes use of lexicons and grammars for text exploration. This system can process one language at a time. We outline a project to extend Unitex to the processing of bitexts.
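To make the idea of sentence-by-sentence alignment concrete, the following is a minimal sketch of a purely length-based aligner, written in Python. It is not the method of the article and not part of Unitex: the function name `align_by_length`, the cost function (difference in character counts plus a small merge penalty) and the restriction to 1-1, 1-2 and 2-1 groupings are all assumptions chosen for illustration. It only shows the kind of statistical heuristic that lexical and grammatical resources are meant to complement.

```python
from typing import List, Optional, Tuple

# One aligned unit: a group of source sentences paired with a group of target sentences.
Bead = Tuple[List[str], List[str]]


def align_by_length(src: List[str], tgt: List[str]) -> List[Bead]:
    """Align two sentence lists by dynamic programming on character lengths.

    Hypothetical illustration: allowed groupings are 1-1, 1-2 and 2-1, and the
    cost of a grouping is the absolute difference in character counts plus a
    penalty of 1.0 for each merged sentence.
    """
    inf = float("inf")
    moves = [(1, 1), (1, 2), (2, 1)]
    n, m = len(src), len(tgt)
    # cost[i][j]: cheapest way to align the first i source and j target sentences.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                len_s = sum(len(s) for s in src[i:ni])
                len_t = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + abs(len_s - len_t) + 1.0 * (di + dj - 2)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    if cost[n][m] == inf:
        raise ValueError("no alignment exists with the allowed groupings")
    # Walk the back-pointers from the end to recover the chosen groupings.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    beads.reverse()
    return beads
```

A heuristic of this kind knows nothing about the words themselves, so it can go wrong when a translator splits, merges or omits sentences of similar length; this is where dictionary or grammar information could help confirm or reject a candidate pairing.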
Main file
KLV.pdf (327.15 KB)
Origin : Files produced by the author(s)

Dates and versions

hal-00190958 , version 1 (23-11-2007)
hal-00190958 , version 2 (27-11-2007)

Identifiers

  • HAL Id : hal-00190958 , version 1

Cite

Dusko Vitas, Cvetana Krstev, Eric Laporte. Preparation and exploitation of bilingual texts. Lux Coreana, 2006, 1, pp.110-132. ⟨hal-00190958v1⟩

Collections

ENPC UNIV-MLV LIGM