A 2D CRF Model for Sentence Alignment - HAL open archive
Conference paper. Year: 2016

A 2D CRF Model for Sentence Alignment

Abstract

The identification of parallel segments in parallel or comparable corpora can be performed at various levels. Alignments at the sentence level are useful for many downstream tasks, and they also simplify the identification of finer-grained correspondences. Most state-of-the-art sentence aligners are unsupervised and attempt to infer endogenous alignment clues from the bitext alone. The computation of alignments typically relies on multiple simplifying assumptions, so that efficient dynamic programming techniques can be used. Because of these assumptions, high-precision sentence alignment remains difficult for certain types of corpora, in particular for literary texts. In this paper, we propose to learn a supervised alignment model, which represents the alignment matrix as a two-dimensional Conditional Random Field (2D CRF), converting sentence alignment into a structured prediction problem. This formalism enables us to take advantage of a rich set of overlapping features. It also allows us to relax some of the assumptions made in decoding.
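To make the idea of treating the alignment matrix as a 2D CRF more concrete, here is a minimal Python sketch. It is not the authors' model: the abstract does not list the features, the neighbourhood structure, or the inference procedure, so everything below is an illustrative assumption. The sketch assigns a binary label to each (source sentence, target sentence) cell, scores a labelling with a hypothetical length-ratio feature per cell plus a hypothetical reward for linked diagonal neighbours, and computes exact link marginals by brute-force enumeration, which is only feasible for very small matrices.

```python
import itertools
import math


def unary_score(src_len, tgt_len, label, w_len=2.0, bias=-1.0):
    """Cell potential (hypothetical): favour linking sentences of similar length.

    Lengths are assumed to be positive. An unlinked cell scores 0.
    """
    if label == 0:
        return 0.0
    ratio = min(src_len, tgt_len) / max(src_len, tgt_len)
    return bias + w_len * ratio


def pairwise_score(label_a, label_b, w_diag=1.0):
    """Edge potential (hypothetical): reward two linked diagonal neighbours."""
    return w_diag if (label_a == 1 and label_b == 1) else 0.0


def matrix_score(labels, src_lens, tgt_lens):
    """Unnormalised log-score of one full labelling of the alignment matrix."""
    n, m = len(src_lens), len(tgt_lens)
    total = 0.0
    for i in range(n):
        for j in range(m):
            total += unary_score(src_lens[i], tgt_lens[j], labels[i][j])
            # Couple cell (i, j) with its lower-right diagonal neighbour.
            if i + 1 < n and j + 1 < m:
                total += pairwise_score(labels[i][j], labels[i + 1][j + 1])
    return total


def all_matrices(n, m):
    """Enumerate every binary n-by-m matrix (2 ** (n * m) of them)."""
    for bits in itertools.product((0, 1), repeat=n * m):
        yield [list(bits[r * m:(r + 1) * m]) for r in range(n)]


def link_marginals(src_lens, tgt_lens):
    """Exact P(cell is linked | sentence lengths) by summing over all labellings."""
    n, m = len(src_lens), len(tgt_lens)
    partition = 0.0
    marginals = [[0.0] * m for _ in range(n)]
    for labels in all_matrices(n, m):
        weight = math.exp(matrix_score(labels, src_lens, tgt_lens))
        partition += weight
        for i in range(n):
            for j in range(m):
                if labels[i][j]:
                    marginals[i][j] += weight
    return [[value / partition for value in row] for row in marginals]


if __name__ == "__main__":
    # Two source and two target sentences, described only by their lengths.
    for row in link_marginals([10, 20], [10, 20]):
        print([round(p, 3) for p in row])
```

In this toy setting the diagonal cells receive a higher link probability than the off-diagonal ones, because they combine a matching length with the diagonal coupling. A realistic system would learn the weights from annotated alignments and replace the enumeration with an approximate inference method, since the number of labellings grows exponentially with the matrix size.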
Main file
Xu16a2d.pdf (1.14 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01388656, version 1 (02-10-2017)

Identifiers

  • HAL Id: hal-01388656, version 1

Cite

Yong Xu, François Yvon. A 2D CRF Model for Sentence Alignment. 9th Workshop on Building and Using Comparable Corpora, 2016, Portorož, Slovenia. ⟨hal-01388656⟩
