TransLiTex: A Parallel Corpus of Translated Literary Texts

Abstract: We present ongoing work to create a massively parallel corpus of translated literary texts for applications in computational linguistics, translation studies and cross-linguistic corpus studies. Using a crowdsourcing approach, we identified and collected 29 translations of Mark Twain's Adventures of Huckleberry Finn published in 23 languages, including less-resourced languages. We report on the current status of the corpus, which has 5 chapter-aligned translations (English-Dutch, two English-Hungarian, English-Polish and English-Russian). We evaluated the correctness of the chapter alignment by computing the percentage of common words between the English version and each translation. The percentages range from 43% to 64%, which indicates that the chapter alignment is largely correct.
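The evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not say how words are compared across languages or how the percentage is normalized, so everything below is an assumption: the function names `tokenize` and `common_word_percentage` are hypothetical, both chapters are assumed to be already in a comparable form (for example, the translation mapped back to English), and the percentage is taken over the distinct words of the English chapter.

```python
import re


def tokenize(text):
    """Return the set of distinct lowercase words (runs of letters) in text."""
    # [^\W\d_]+ matches runs of Unicode letters, excluding digits and underscores.
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def common_word_percentage(source_chapter, target_chapter):
    """Percentage of distinct source words that also occur in the target.

    Assumption: both chapters are in a comparable form; this is not
    necessarily the measure used by the authors.
    """
    source_words = tokenize(source_chapter)
    target_words = tokenize(target_chapter)
    if not source_words:
        return 0.0
    shared = source_words & target_words
    return 100.0 * len(shared) / len(source_words)


if __name__ == "__main__":
    english = "Tom and Huck went down the river"
    candidate = "Huck and Jim went along the river"
    print(round(common_word_percentage(english, candidate), 2))
```

Under these assumptions, a correctly aligned chapter pair should score noticeably higher than a pair of unrelated chapters, which is what makes the percentage usable as an alignment check.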
Document type:
Conference paper
Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 2018, Miyazaki, Japan
https://hal.archives-ouvertes.fr/hal-01827884
Contributor: Quoc-Tan Tran
Submitted on: Monday, 2 July 2018 - 18:21:51
Last modified on: Thursday, 29 November 2018 - 01:11:42
Document(s) archived on: Monday, 1 October 2018 - 08:21:42

File

11_W34.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01827884, version 1

Citation

Amel Fraisse, Quoc-Tan Tran, Ronald Jenn, Patrick Paroubek, Shelley Fishkin. TransLiTex: A Parallel Corpus of Translated Literary Texts. Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 2018, Miyazaki, Japan. 〈hal-01827884〉

Metrics

Record views

69

File downloads

29