Sentence Alignment for Literary Texts

Yong Xu; Aurélien Max; François Yvon

Journal article. Linguistic Issues in Language Technology. Year: 2015

Yong Xu

Role: Author
PersonId : 1022781

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Aurélien Max

Role: Author
PersonId : 1022782

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

François Yvon

Role: Author
PersonId : 5347
IdHAL : francois-yvon
ORCID : 0000-0002-7972-7442
IdRef : 057593531

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Abstract

Literary works are becoming increasingly available in electronic formats, which is quickly transforming editorial processes and reading habits. In the context of the global enthusiasm for multilingualism, the rapid spread of e-book readers, such as Amazon Kindle or Kobo Touch, fosters the development of a new generation of reading tools for bilingual books. In particular, literary works, when available in several languages, offer an attractive perspective for self-development or everyday leisure reading, but also for activities such as language learning, translation or literary studies. An important issue in the automatic processing of multilingual e-books is the alignment between textual units. Alignment could help identify corresponding text units in different languages, which would be particularly beneficial to bilingual readers and translation professionals. Computing automatic alignments for literary works, however, is a more challenging task than for better-behaved corpora such as parliamentary proceedings or technical manuals. In this paper, we revisit the problem of computing high-quality alignments for literary works. We first perform a large-scale evaluation of automatic alignment for literary texts, which provides a fair assessment of the actual difficulty of this task. We then introduce a two-pass approach, based on a maximum entropy model. Experimental results for novels available in English and French or in English and Spanish demonstrate the effectiveness of our method.

Keywords

Automatic alignment; bitexts; machine translation

Domains

Computer Science [cs]; Computation and Language [cs.CL]

Main file

2015.lilt-12.6.pdf (347.71 KB)

Origin: Publisher files allowed on an open archive

https://hal.science/hal-01634995

Submitted on: Thursday, March 4, 2021, 16:55:28

Last modified on: Saturday, October 7, 2023, 21:36:20

Long-term archiving on: Saturday, June 5, 2021, 19:15:29

Dates and versions

hal-01634995, version 1 (04-03-2021)

Identifiers

HAL Id: hal-01634995, version 1

Cite

Yong Xu, Aurélien Max, François Yvon. Sentence Alignment for Literary Texts. Linguistic Issues in Language Technology, 2015, 12, pp.1-25. ⟨hal-01634995⟩

Collections

CNRS LIMSI UNIV-PARIS-SACLAY SORBONNE-UNIVERSITE LISN GS-COMPUTER-SCIENCE
