The naturalness challenge: a corpus-based analysis of EN-FR machine-translated language

Abstract: We have all heard or read the quotation “Machine translation will only displace those who translate like machines”, by Arle Richard Lommel. Indeed, one of the challenges translation trainers face today is teaching students how to demonstrate their added value over machine translation (MT) systems. Because human languages are complex, it is often said that if there is one thing (bio)translators can do that MT systems cannot, it is to provide translations that go beyond lexical and grammatical correctness: translations that take into account language use in addition to morpho-syntactic rules, and therefore the norms of the target language. The ultimate aim is the translator’s invisibility (Venuti 1995), achieved through the naturalness (Salkie 2007) of the target texts. Comparable electronic corpora can help uncover such lexical and grammatical usage, showing for instance that so-called translational equivalents are not used with the same frequencies in the source and target languages. For example, although the English and French lemmas thing and chose, or the coordinators and and et, are considered translational equivalents, they show significant differences in their frequencies of use (Loock 2016). The same holds for morpho-syntactic constructions, e.g. existential structures (Cappelle & Loock 2013) or derived adverbs (Loock et al. 2014). In this presentation, we investigate to what extent machine translation, and in particular neural machine translation, which has become known for the surprising fluency of its target-language output, takes lexical and grammatical usage into account.
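Frequency differences of this kind are usually established by normalizing raw counts to a common base (here, per million tokens) and then testing whether the gap is statistically significant. The sketch below assumes the log-likelihood (G²) statistic, a common choice in corpus linguistics; the abstract does not say which test Loock (2016) used, the helper names are hypothetical, and the counts in the example are invented for illustration only.

```python
import math

def per_million(count: int, corpus_size: int) -> float:
    """Normalized frequency: occurrences per million tokens."""
    return count * 1_000_000 / corpus_size

def log_likelihood(count_a: int, size_a: int, count_b: int, size_b: int) -> float:
    """Log-likelihood (G2) comparing one item's frequency in corpus A vs corpus B.

    Expected counts assume the item is equally frequent (per token) in both
    corpora; G2 grows as the observed counts depart from that assumption.
    """
    total_count = count_a + count_b
    total_size = size_a + size_b
    expected_a = size_a * total_count / total_size
    expected_b = size_b * total_count / total_size
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # by convention 0 * ln(0) contributes 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Invented counts for illustration only (NOT figures from Loock 2016):
# 'thing' in a 1M-token English corpus vs 'chose' in a 1.2M-token French corpus.
en_count, en_size = 600, 1_000_000
fr_count, fr_size = 300, 1_200_000

print(per_million(en_count, en_size), per_million(fr_count, fr_size))  # 600.0 250.0
g2 = log_likelihood(en_count, en_size, fr_count, fr_size)
# With 1 degree of freedom, G2 > 3.84 corresponds to p < 0.05.
print(round(g2, 2), g2 > 3.84)
```

Normalizing first matters because the two corpora need not be the same size; the significance test then guards against reading too much into small raw differences.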
Using a corpus of texts machine-translated from English into French, compared with a comparable corpus of original English and French texts, we provide results for a series of lexical and grammatical features in order to check whether machine translation is up to the grammatical naturalness challenge. Since two MT systems, one neural and one statistical, are put to the test, our comparisons are drawn between four corpora:

  • Corpus 1: original English
  • Corpus 2: original French, comparable to Corpus 1 in terms of content
  • Corpus 3: French machine-translated from English (Corpus 1) with DeepL Translator (https://www.deepl.com/translator) [NEURAL MT]
  • Corpus 4: French machine-translated from English (Corpus 1) with DGT’s eTranslation/MT@EC service [STATISTICAL MT]
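One way to read this four-way design is to compare, for each corpus, the normalized frequency of its own translational equivalent (and in Corpus 1, et in Corpora 2 to 4) and ask whether each MT corpus sits closer to original French or to original English usage. The sketch below rests on simplifying assumptions that are not the study's method: naive regex tokenization instead of lemmatization, a single word form per corpus, and absolute distance between normalized frequencies as a crude proxy for naturalness. The labels C1 to C4 follow the corpus list above; the helper functions are hypothetical and the toy sentences are invented placeholders, not data.

```python
import re
from collections import Counter

def freq_per_million(text: str, word: str) -> float:
    """Occurrences of `word` per million tokens of `text`."""
    # Naive tokenizer (assumption): lowercase runs of word characters.
    # A real study would lemmatize so inflected forms count toward one lemma.
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return Counter(tokens)[word] * 1_000_000 / len(tokens)

def compare_equivalents(corpora: dict[str, str], equivalents: dict[str, str]) -> dict[str, float]:
    """Normalized frequency of each corpus's own translational equivalent."""
    return {name: freq_per_million(text, equivalents[name]) for name, text in corpora.items()}

def closer_to_target(freqs: dict[str, float], mt: str, source: str, target: str) -> bool:
    """Hypothetical criterion: is the MT corpus nearer original target-language
    usage than original source-language usage?"""
    return abs(freqs[mt] - freqs[target]) < abs(freqs[mt] - freqs[source])

# Toy placeholder sentences, NOT data from the study.
corpora = {
    "C1": "the results and the methods and the data",             # original English
    "C2": "les résultats et les méthodes sont décrits ici",       # original French
    "C3": "les résultats et les méthodes et les données",         # MT French (neural)
    "C4": "les résultats ainsi que les méthodes et les données",  # MT French (statistical)
}
equivalents = {"C1": "and", "C2": "et", "C3": "et", "C4": "et"}

freqs = compare_equivalents(corpora, equivalents)
for mt in ("C3", "C4"):
    print(mt, round(freqs[mt]), closer_to_target(freqs, mt, source="C1", target="C2"))
```

With real corpora, differences like these would still need a significance test and checks on dispersion across texts before being read as evidence of (un)naturalness.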
Document type:
Conference paper
Translating Europe workshop "How to harness translation tools to maximise efficiency and accuracy", May 2018, Rome, Italy. 〈https://www.unint.eu/it/calendario-eventi/tew-how-to-harness-translation-tools-to-maximise-efficiency-and-accuracy.html〉

https://hal.archives-ouvertes.fr/hal-01771892
Contributor: Rudy Loock
Submitted on: Friday, April 20, 2018 - 08:12:07
Last modified on: Tuesday, July 3, 2018 - 11:35:47

Identifiers

  • HAL Id : hal-01771892, version 1

Citation

Rudy Loock. The naturalness challenge: a corpus-based analysis of EN-FR machine-translated language. Translating Europe workshop "How to harness translation tools to maximise efficiency and accuracy", May 2018, Rome, Italy. 〈https://www.unint.eu/it/calendario-eventi/tew-how-to-harness-translation-tools-to-maximise-efficiency-and-accuracy.html〉. 〈hal-01771892〉
