Uncovering Machine Translationese Using Corpus Analysis Techniques to Distinguish between Original and Machine-Translated French - HAL open archive
Journal article, Translation Quarterly. Year: 2021

Uncovering Machine Translationese Using Corpus Analysis Techniques to Distinguish between Original and Machine-Translated French

Orphée de Clercq
  • Role: Author
Gert de Sutter
  • Role: Author
Rudy Loock
Bert Cappelle
Koen Plevoets
  • Role: Author

Abstract

This paper investigates the linguistic characteristics of English to French machine-translated texts in comparison with French original, untranslated texts in order to uncover what has been called “machine translationese”. In the same vein as corpus-based translation studies which have focused on human-translated texts, and using a corpus-based statistical approach (Principal Component Analysis), we analyzed a ca. 1.8-million-word corpus of English to French translations of press texts, corresponding to the output of four machine translation systems: one statistical (SMT) and three neural (NMT) systems, namely DeepL, Google Translate, and the European Commission’s eTranslation MT tool, in both its SMT and NMT versions. In particular, to complement a previous study on language-specific features in French (e.g. derived adverbs, existential constructions, coordinator et, preposition avec), a series of language-independent linguistic features were extracted for each text in our corpus, ranging from superficial text characteristics such as average word and sentence length to frequencies of closed-class lexical categories and measures of lexical diversity. Our results, which compare the machine-translated data with a corpus of French untranslated data, allow us to uncover linguistic features in French machine-translated texts that clearly deviate from the observed norms in original French (e.g. average sentence length, n-gram features, lexical diversity), and which might serve as information for the post-editing process in order to optimize translation quality.
Main file
Uncovering Machine Translationese Using Corpus Analysis Techniques.pdf (862.87 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03406287, version 1 (13-02-2023)

Identifiers

  • HAL Id: hal-03406287, version 1

Cite

Orphée de Clercq, Gert de Sutter, Rudy Loock, Bert Cappelle, Koen Plevoets. Uncovering Machine Translationese Using Corpus Analysis Techniques to Distinguish between Original and Machine-Translated French. Translation Quarterly, 2021, 101, pp. 21-45. ⟨hal-03406287⟩
185 views
130 downloads
