Sentence Alignment for Literary Texts - HAL open archive
Journal article, Linguistic Issues in Language Technology, Year: 2015

Sentence Alignment for Literary Texts

Abstract

Literary works are becoming increasingly available in electronic formats, thus quickly transforming editorial processes and reading habits. In the context of the global enthusiasm for multilingualism, the rapid spread of e-book readers, such as Amazon Kindle or Kobo Touch, fosters the development of a new generation of reading tools for bilingual books. In particular, literary works, when available in several languages, offer an attractive perspective for self-development or everyday leisure reading, but also for activities such as language learning, translation or literary studies. An important issue in the automatic processing of multilingual e-books is the alignment between textual units. Alignment could help identify corresponding text units in different languages, which would be particularly beneficial to bilingual readers and translation professionals. Computing automatic alignments for literary works, however, is a more challenging task than for better-behaved corpora such as parliamentary proceedings or technical manuals. In this paper, we revisit the problem of computing high-quality alignments for literary works. We first perform a large-scale evaluation of automatic alignment for literary texts, which provides a fair assessment of the actual difficulty of this task. We then introduce a two-pass approach, based on a maximum entropy model. Experimental results for novels available in English and French or in English and Spanish demonstrate the effectiveness of our method.
Main file
2015.lilt-12.6.pdf (347.71 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-01634995, version 1 (04-03-2021)

Identifiers

  • HAL Id: hal-01634995, version 1

Cite

Yong Xu, Aurélien Max, François Yvon. Sentence Alignment for Literary Texts. Linguistic Issues in Language Technology, 2015, 12, pp.1-25. ⟨hal-01634995⟩
