TransLiTex: A Parallel Corpus of Translated Literary Texts

Abstract : We present ongoing work on a massively parallel corpus of translated literary texts, intended for applications in computational linguistics, translation studies, and cross-linguistic corpus studies. Using a crowdsourcing approach, we identified and collected 29 translations of Mark Twain's Adventures of Huckleberry Finn, published in 23 languages, including less-resourced ones. We report on the current status of the corpus, which contains 5 chapter-aligned translations (English-Dutch, two English-Hungarian, English-Polish, and English-Russian). We evaluated the correctness of the chapter alignment by computing the percentage of words in common between the English version and each translation. The percentages range from 43% to 64%, which we take as evidence that the chapter alignment is largely correct.
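
The common-word evaluation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how words are tokenized, whether word types or tokens are counted, or how words are compared across languages, so everything below is an assumption: the sketch counts distinct lowercased word types, and it assumes both chapters are already in a comparable form (for example, the translated chapter rendered in English beforehand). The function names are hypothetical and are not taken from the paper.

```python
import re


def word_set(text):
    """Lowercase a text and return its set of distinct word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def common_word_percentage(source_chapter, target_chapter):
    """Percentage of distinct source words that also occur in the target.

    Assumption: overlap is measured over word types relative to the
    source chapter; the paper may normalize differently.
    """
    source_words = word_set(source_chapter)
    if not source_words:
        return 0.0
    shared = source_words & word_set(target_chapter)
    return 100.0 * len(shared) / len(source_words)


def best_matching_chapter(source_chapter, target_chapters):
    """Index of the target chapter sharing the most words with the source.

    Expects a non-empty list of target chapters.
    """
    scores = [common_word_percentage(source_chapter, t) for t in target_chapters]
    return max(range(len(scores)), key=scores.__getitem__)
```

Under these assumptions, a chapter pair is plausibly aligned when the overlap for the paired chapters is clearly higher than the overlap with any other chapter of the same translation, which is what `best_matching_chapter` checks.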

https://hal.archives-ouvertes.fr/hal-01827884
Contributor : Quoc-Tan Tran
Submitted on : Monday, July 2, 2018 - 6:21:51 PM
Last modification on : Saturday, March 16, 2019 - 1:55:44 AM
Document(s) archived on : Monday, October 1, 2018 - 8:21:42 AM

File

11_W34.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01827884, version 1

Citation

Amel Fraisse, Quoc-Tan Tran, Ronald Jenn, Patrick Paroubek, Shelley Fishkin. TransLiTex: A Parallel Corpus of Translated Literary Texts. Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 2018, Miyazaki, Japan. 〈hal-01827884〉
