Investigating the usability of automatic metrics for characterizing translated vs post-edited texts

Hanna Martikainen

Abstract

Despite the democratization of machine translation (MT) technologies and their pervasiveness in today’s professional translation workflows, particularly since the advent of neural network engines, translation students and teachers alike still mostly liken MT technologies to cheating. As a result, learners’ use of MT is often unguided and ill-advised: they tend to resort to MT without discernment, unaware of the limits of the technologies and the risks inherent in their use. This leads to misguided and somewhat random patterns of over- and under-confidence in MT suggestions, which can be particularly damaging in specialized translation. To foster professional attitudes towards MT technologies among translation students and to help learners acquire solid best practices in MT use, it is important to make them aware of the limits of the technologies and of the necessity of human intervention. One way of demonstrating the added value of human intervention is to explore the differences between human-translated and post-edited texts. Much attention has been dedicated to comparative error analysis, which has shown that MT involvement in the translation process tends to result in better end-product quality, as measured in terms of errors, than human translation without technological aid. However, differences between the texts resulting from these two processes go beyond errors and can be characterized, for instance, in terms of lexical and syntactic variety, syntactic reorganization, creativity and adaptation, or explicitation. This paper investigates the usability of automatic metrics, such as edit distance or type-token ratio (TTR), for characterizing post-edited texts in comparison with human translation.
Using a corpus of post-edited and translated texts produced by Master’s students in translation, the author seeks to determine the specificities of each process and to define what characterizes human translation in comparison with post-editing.

Machine translation (MT) technologies are now firmly established in professional translation environments and widely adopted by the general public, particularly since the advent of neural engines. Nevertheless, translation learners and teachers alike still often liken them to “cheating”, which leads to uninformed use of these solutions, without knowledge of their limits and inherent risks. Learners thus often alternate at random between over- and under-confidence in engine output. This lack of discernment in post-editing can prove particularly dangerous in specialized translation. Raising learners’ awareness of the limits of these technologies and of the need for human intervention is necessary to encourage a professional attitude towards the technological tool and to enable learners to acquire proven best practices for integrating MT into their toolkit. The added value of the human translator can notably be highlighted by demonstrating the differences observable in the texts resulting from these two processes. Comparative error analysis has been the subject of substantial research, which has shown that integrating MT into the translation workflow generally improves the quality of the end product, as measured by the number of errors, compared with translation without technological aid. However, the differences between translated and post-edited texts go well beyond errors and appear, for example, in lexical and syntactic variety, the degree of syntactic reorganization, creativity and adaptation to the audience, and the degree of explicitation.
In this presentation, we investigate the usefulness of automatic metrics such as edit distance or type-token ratio (TTR) for characterizing post-edited texts in comparison with translated texts. Using a corpus of texts produced by Master’s students in translation, we seek to determine the specificities of translation and post-editing.
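
The two metrics named in the abstract, type-token ratio and edit distance, can be illustrated with a minimal Python sketch. The abstract does not say how the metrics were computed in the study, so everything here is an assumption made for illustration: the regex tokenizer, the lowercasing, the choice of a word-level (rather than character-level) Levenshtein distance, and the function names `tokenize`, `type_token_ratio` and `edit_distance` are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and extract runs of word characters.

    Assumed, simplistic tokenizer; the study's own preprocessing is not described.
    """
    return re.findall(r"\w+", text.lower())


def type_token_ratio(text):
    """Return distinct tokens (types) divided by total tokens; 0.0 for empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def edit_distance(source_tokens, target_tokens):
    """Word-level Levenshtein distance between two token lists.

    Counts the minimum number of token insertions, deletions and
    substitutions needed to turn source_tokens into target_tokens.
    """
    # previous[j] holds the distance between the source prefix processed
    # so far and the first j tokens of the target.
    previous = list(range(len(target_tokens) + 1))
    for i, source_token in enumerate(source_tokens, start=1):
        current = [i]
        for j, target_token in enumerate(target_tokens, start=1):
            substitution_cost = 0 if source_token == target_token else 1
            current.append(min(
                previous[j] + 1,                      # delete a source token
                current[j - 1] + 1,                   # insert a target token
                previous[j - 1] + substitution_cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]
```

In a setting like the one described, `edit_distance` could be applied between raw MT output and its post-edited version to estimate how much a learner changed, while `type_token_ratio` could be compared across translated and post-edited versions of the same source text as a rough indicator of lexical variety. Note that TTR is sensitive to text length, so comparisons are only meaningful between texts of similar size.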

