Journal article in Machine Translation, Year: 2011

"This sentence is wrong." Detecting errors in machine-translated sentences.

Abstract

Machine translation systems are not reliable enough to be used "as is": except for the simplest tasks, they can only be used to grasp the general meaning of a text or to assist human translators. The purpose of confidence measures is to detect erroneous words or sentences produced by a machine translation system. In this article, after reviewing the mathematical foundations of confidence estimation, we compare several state-of-the-art confidence measures, predictive parameters and classifiers. We also propose two original confidence measures based on Mutual Information and a method for automatically generating data for training and testing classifiers. We applied these techniques to data from the WMT 2008 campaign and found that the best confidence measures yielded an Equal Error Rate of 36.3% at word level and 34.2% at sentence level, but that combining different measures reduced these rates to 35.0% and 29.0%, respectively. We also present the results of an experiment aimed at determining how helpful confidence measures are in a post-editing task. Preliminary results suggest that our system is not yet ready to help post-editors efficiently, but we now have software and a protocol that we can apply to further experiments, and user feedback has indicated which aspects must be improved to make confidence measures more helpful.
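The Equal Error Rate quoted above can be illustrated with a short sketch. It assumes the usual definition: each word or sentence receives a confidence score, items scoring at or above a threshold are accepted as correct, and the EER is the error rate at the threshold where the false acceptance rate (erroneous items accepted) equals the false rejection rate (correct items rejected). The function name `equal_error_rate` and the toy scores are hypothetical and are not taken from the article or its software.

```python
from typing import Sequence, Tuple


def equal_error_rate(scores: Sequence[float],
                     labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (eer, threshold) for a list of confidence scores.

    labels[i] is True when item i (a word or a sentence) is actually correct.
    An item is accepted as correct when its score is >= threshold.
    Assumption: the EER is read off at the candidate threshold where the
    false acceptance and false rejection rates are closest to each other.
    """
    n_correct = sum(1 for ok in labels if ok)
    n_wrong = len(labels) - n_correct
    if n_correct == 0 or n_wrong == 0:
        raise ValueError("need at least one correct and one erroneous item")

    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    # Candidate thresholds: every observed score, plus one that rejects all.
    for thr in sorted(set(scores)) + [float("inf")]:
        false_accepts = sum(1 for s, ok in zip(scores, labels)
                            if not ok and s >= thr)
        false_rejects = sum(1 for s, ok in zip(scores, labels)
                            if ok and s < thr)
        far = false_accepts / n_wrong      # erroneous items accepted
        frr = false_rejects / n_correct    # correct items rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


# Toy data (invented for illustration): three correct and three erroneous items.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [True, True, False, True, False, False]
eer, threshold = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.3f} at threshold {threshold}")
```

A lower EER means the confidence measure separates correct from erroneous output better. A measure that carries no information would sit near 50%, which gives a point of comparison for the 29.0% to 36.3% range reported in the abstract.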
Main file
cm-springer-utf8.pdf (345.25 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00606350, version 1 (06-07-2012)

Cite

Sylvain Raybaud, David Langlois, Kamel Smaïli. "This sentence is wrong." Detecting errors in machine-translated sentences. Machine Translation, 2011, 25 (1), pp. 1-34. ⟨10.1007/s10590-011-9094-9⟩. ⟨hal-00606350⟩