Preparation and exploitation of bilingual texts - Archive ouverte HAL
Journal article, Lux Coreana, Year: 2006

Preparation and exploitation of bilingual texts

Abstract

A bitext is a merged document composed of two versions of a given text, usually in two different languages. An aligned bitext is produced by an alignment tool, or aligner, which automatically matches the two versions of the same text, generally sentence by sentence. A multilingual aligned corpus, that is, a collection of aligned bitexts, can be extremely useful for translation, language teaching and the investigation of literary texts when consulted with a search tool. This is all the more true for a pair of languages such as Korean and French, for which few people are bilingual and many literary translations involve pairs of translators. For such language pairs, retrieving solutions to previously solved translation problems is an invaluable aid. In addition, multilingual corpora are at the core of some research in natural language processing (NLP), both in theoretical fields, such as contrastive linguistics and lexicography, and in applied fields, such as translation, term extraction and the production of translation memories. Current methods for constructing and exploiting multilingual aligned corpora are essentially based on statistical models of text. In this article, we propose enhancing these methods with lexical and grammatical resources. The open-source Unitex system is the main corpus processor that systematically makes use of lexicons and grammars for text exploration. This system can process only one language at a time. We outline a project to extend Unitex to the processing of bitexts.
Main file
VKL.pdf (516.12 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00190958, version 1 (23-11-2007)
hal-00190958, version 2 (27-11-2007)

Identifiers

  • HAL Id: hal-00190958, version 2

Cite

Dusko Vitas, Cvetana Krstev, Eric Laporte. Preparation and exploitation of bilingual texts. Lux Coreana, 2006, 1, pp.110-132. ⟨hal-00190958v2⟩
