Conference papers

TransLiTex: A Parallel Corpus of Translated Literary Texts

Abstract : In this paper, we present our ongoing work to create a massively parallel corpus of translated literary texts that is useful for applications in computational linguistics, translation studies and cross-linguistic corpus studies. Using a crowdsourcing approach, we identified and collected 29 translations of Mark Twain's Adventures of Huckleberry Finn, published in 23 languages, including less-resourced ones. We report on the current status of the corpus, which contains 5 chapter-aligned translations (English-Dutch, two English-Hungarian, English-Polish and English-Russian). We evaluated the correctness of the chapter alignment by computing the percentage of common words between the English version and each translation. The percentages range from 43% to 64%, which indicates that the chapters are correctly aligned.
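The abstract does not define how the "percentage of common words" is computed, so the following is only a minimal sketch under stated assumptions. Each chapter is tokenized into lowercased word types. The score is the share of English word types that also appear in the translated chapter; shared items between languages are typically proper names, numbers and cognates. A chapter pair counts as aligned when the translated chapter with the highest score sits at the same index as the English chapter. The tokenization, the normalization by English types, and the names `tokenize`, `common_word_percentage` and `best_match` are hypothetical illustrations, not the authors' method. The toy sentences are invented, not corpus data.

```python
import re


def tokenize(text: str) -> set[str]:
    # Assumption: lowercased word types; \w+ is Unicode-aware in Python 3,
    # so Cyrillic, Hungarian and Polish letters are matched as word characters.
    return set(re.findall(r"\w+", text.lower()))


def common_word_percentage(source: str, target: str) -> float:
    # Hypothetical measure: percentage of source (English) word types
    # that also occur in the translated chapter.
    src = tokenize(source)
    if not src:
        return 0.0
    return 100.0 * len(src & tokenize(target)) / len(src)


def best_match(source_chapters: list[str], target_chapters: list[str]) -> list[int]:
    # For each source chapter, return the index of the target chapter
    # with the highest overlap; a correct alignment yields [0, 1, 2, ...].
    return [
        max(range(len(target_chapters)),
            key=lambda j: common_word_percentage(s, target_chapters[j]))
        for s in source_chapters
    ]


if __name__ == "__main__":
    english = ["Huck and Jim ride the raft", "Tom Sawyer visits Aunt Polly"]
    dutch = ["Huck en Jim varen op het vlot", "Tom Sawyer bezoekt tante Polly"]
    print(best_match(english, dutch))
    print(common_word_percentage(english[1], dutch[1]))
```

Normalizing by the English side keeps scores comparable across the different translations of the same source chapter. A symmetric measure such as the Jaccard index would be an equally plausible choice.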

Cited literature : 18 references
Contributor : Quoc-Tan Tran
Submitted on : Monday, July 2, 2018 - 6:21:51 PM
Last modification on : Tuesday, November 22, 2022 - 2:26:15 PM
Long-term archiving on : Monday, October 1, 2018 - 8:21:42 AM


Files produced by the author(s)


  • HAL Id : hal-01827884, version 1


Amel Fraisse, Quoc-Tan Tran, Ronald Jenn, Patrick Paroubek, Shelley Fisher Fishkin. TransLiTex: A Parallel Corpus of Translated Literary Texts. Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Beijing Advanced Innovation Center for Language Resources, May 2018, Miyazaki, Japan. ⟨hal-01827884⟩


