Find The Errors, Get The Better: Enhancing Machine Translation via Word Confidence Estimation

Abstract: This article presents two novel methods for improving Machine Translation (MT) quality by applying word-level quality prediction in a second decoding pass. The word scores estimated by Word Confidence Estimation (WCE) systems are used to reconsider the MT hypotheses and select a better candidate, rather than accepting the current sub-optimal one. In the first method, the selection scope is limited to the MT N-best list: our proposed re-ranking features are combined with those of the decoder for re-scoring. In the second, the search space is enlarged to the entire search graph, which stores many more hypotheses generated during the first decoding pass. For all paths containing words from the N-best list, we propose an algorithm that strengthens or weakens them according to the estimated word quality. In both methods, the highest-scoring candidate after the search becomes the official translation. The results show that both approaches improve MT quality over the one-pass baseline, and that Search Graph Re-decoding achieves larger gains (in BLEU score) than the N-best List Re-ranking method.
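The abstract describes N-best re-ranking only at a high level. The Python sketch below illustrates the general idea of combining a first-pass decoder score with word-confidence features and then taking the highest-scoring hypothesis. It is not the authors' implementation: the feature definitions (`good_ratio`, `mean_confidence`), the 0.5 threshold, the weights, and the toy data are all hypothetical. In practice the feature weights would be tuned on a development set rather than set by hand.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Hypothesis:
    """One entry of a hypothetical N-best list."""
    words: List[str]
    decoder_score: float            # first-pass log-linear model score (assumed)
    word_confidences: List[float]   # assumed WCE output: P(word is "Good")


def good_ratio(confidences: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of words a WCE classifier would label Good at this threshold."""
    if not confidences:
        return 0.0
    return sum(1 for c in confidences if c >= threshold) / len(confidences)


def mean_confidence(confidences: Sequence[float]) -> float:
    """Average word confidence over the hypothesis."""
    if not confidences:
        return 0.0
    return sum(confidences) / len(confidences)


def rescore(hyp: Hypothesis, weights: Dict[str, float]) -> float:
    """Weighted combination of the decoder score and the WCE-based features."""
    return (weights["decoder"] * hyp.decoder_score
            + weights["good_ratio"] * good_ratio(hyp.word_confidences)
            + weights["mean_conf"] * mean_confidence(hyp.word_confidences))


def rerank(nbest: List[Hypothesis], weights: Dict[str, float]) -> Hypothesis:
    """Return the highest-scoring hypothesis after re-scoring."""
    return max(nbest, key=lambda h: rescore(h, weights))


# Toy 2-best list: the decoder prefers the first hypothesis,
# but the (hypothetical) WCE system judges most of its words as errors.
nbest = [
    Hypothesis(["the", "house", "blue"], -10.0, [0.9, 0.2, 0.3]),
    Hypothesis(["the", "blue", "house"], -10.5, [0.9, 0.8, 0.7]),
]
weights = {"decoder": 1.0, "good_ratio": 1.0, "mean_conf": 1.0}  # hand-set, untuned

baseline = max(nbest, key=lambda h: h.decoder_score)
best = rerank(nbest, weights)
print(" ".join(baseline.words))  # the house blue
print(" ".join(best.words))      # the blue house
```

Here the one-pass baseline keeps the first hypothesis (score -10.0 vs -10.5), while re-scoring gives -9.2 vs -8.7 and promotes the second one, because its words receive higher estimated confidence.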
Document type:
Journal article
Natural Language Engineering, Cambridge University Press (CUP), 2017, 1, pp.1 - 24

Cited literature: 51 references

https://hal.archives-ouvertes.fr/hal-01436779
Submitted on: Thursday, 9 November 2017 - 15:04:32
Last modified on: Friday, 10 November 2017 - 11:31:39

File

papier_final.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01436779, version 1


Citation

Ngoc-Quang Luong, Laurent Besacier, Benjamin Lecouteux. Find The Errors, Get The Better: Enhancing Machine Translation via Word Confidence Estimation. Natural Language Engineering, Cambridge University Press (CUP), 2017, 1, pp.1 - 24. 〈hal-01436779〉
