
Détection de zones parallèles à l’intérieur de multi-documents pour l’alignement multilingue [Detection of parallel zones within multi-documents for multilingual alignment]

Charlotte Lecluze 1 Romain Brixtel 1 Loïs Rigouste 2 Emmanuel Giguet 1 Régis Clouard 3 Gaël Lejeune 1 Patrick Constant 4
1 Equipe Hultech - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image et Instrumentation de Caen
3 Equipe Image - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image et Instrumentation de Caen
Abstract : This article addresses a central issue in automatic alignment: diagnosing the parallelism of documents. Previous research has concentrated on documents that are parallel by nature, such as corpora of regulations, technical documents, or simple sentences. Inversions and deletions or additions that may exist between different versions of a document have often been overlooked. In contrast, we propose a method that diagnoses parallel areas in context, allowing the detection of deletions or inversions between the documents to be aligned. The method is original in that it does not depend on word or sentence units and takes text formatting into account. The implementation is based on the detection of repeated character strings and the identification of parallel segments by image processing.
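
As an illustration only, the two steps named in the abstract can be sketched in Python. This is not the authors' implementation: the function names `shared_ngram_points` and `diagonal_runs`, the fixed n-gram length `n`, and the use of diagonal runs in a dot-plot as a stand-in for the image-processing step are all assumptions made for this sketch. It records where the same character string occurs in both texts, then groups those matches into runs along a diagonal, which is one simple way a parallel segment might show up.

```python
from collections import defaultdict


def shared_ngram_points(text_a, text_b, n=4):
    """Return (i, j) pairs where the same n-character string starts
    at offset i in text_a and at offset j in text_b (hypothetical helper)."""
    positions_b = defaultdict(list)
    for j in range(len(text_b) - n + 1):
        positions_b[text_b[j:j + n]].append(j)
    points = []
    for i in range(len(text_a) - n + 1):
        for j in positions_b.get(text_a[i:i + n], []):
            points.append((i, j))
    return points


def diagonal_runs(points, min_len=3):
    """Group points into runs along a diagonal (i and j both increase by 1).
    Returns (start_i, start_j, length) tuples; an assumed, simplified
    stand-in for detecting parallel segments in a dot-plot image."""
    point_set = set(points)
    runs = []
    for (i, j) in sorted(point_set):
        if (i - 1, j - 1) in point_set:
            continue  # this point continues an earlier run
        length = 1
        while (i + length, j + length) in point_set:
            length += 1
        if length >= min_len:
            runs.append((i, j, length))
    return runs


# Two toy "documents" whose blocks appear in a different order.
points = shared_ngram_points("abcdefgh XYZ", "XYZ abcdefgh", n=4)
runs = diagonal_runs(points)
```

In this toy case the single run starts at offset 0 in the first text and offset 4 in the second, reflecting the moved block. A run that sits off the main diagonal in this way is the kind of signal that could indicate an inversion, and a gap between runs could indicate a deletion or addition.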
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01074950
Contributor : Greyc Référent
Submitted on : Thursday, October 16, 2014 - 10:39:53 AM
Last modification on : Tuesday, October 19, 2021 - 11:34:59 PM
Long-term archiving on : Saturday, January 17, 2015 - 10:21:20 AM

File

ACTN-LECLUZE-2013-1.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01074950, version 1

Citation

Charlotte Lecluze, Romain Brixtel, Loïs Rigouste, Emmanuel Giguet, Régis Clouard, et al. Détection de zones parallèles à l’intérieur de multi-documents pour l’alignement multilingue. 20ème conférence du Traitement Automatique du Langage Naturel 2013 (TALN 2013), Jun 2013, Sables d'Olonne, France. ⟨hal-01074950⟩