Sentence Alignment for Literary Texts
Résumé
Literary works are becoming increasingly available in electronic for- mats, thus quickly transforming editorial processes and reading habits. In the context of the global enthusiasm for multilingualism, the rapid spread of e-book readers, such as Amazon Kindle or Kobo Touch , fosters the development of a new generation of reading tools for biingual books. In particular, literary works, when available in several languages, offer an attractive perspective for self-development or everyday leisure reading, but also for activities such as language learning, translation or literary studies.
An important issue in the automatic processing of multilingual e- books is the alignment between textual units. Alignment could help identify corresponding text units in different languages, which would be particularly beneficial to bilingual readers and translation profes- sionals. Computing automatic alignments for literary works, however, is a task more challenging than in the case of better behaved corpora such as parliamentary proceedings or technical manuals. In this paper, we revisit the problem of computing high-quality alignment for literary works. We first perform a large-scale evaluation of automatic alignment for literary texts, which provides a fair assessment of the actual difficulty of this task. We then introduce a two-pass approach, based on a maximum entropy model. Experimental results for novels available in English and French or in English and Spanish demonstrate the effectiveness of our method.
Origine : Fichiers éditeurs autorisés sur une archive ouverte