Towards Accurate Predictors of Word Quality for Machine Translation: Lessons Learned on French - English and English - Spanish Systems

Abstract: This paper proposes several ideas for building effective estimators that predict the quality of each word in a Machine Translation (MT) output. We propose a number of novel features of various types (system-based, lexical, syntactic and semantic) and combine them with the conventional (previously used) feature set to train our baseline classifier. The classifiers are built on two different bilingual corpora: French–English (fr–en) and English–Spanish (en–es). After experimenting with all features, we apply a "Feature Selection" strategy to retain the best-performing ones. Then, a method that combines multiple "weak" classifiers into a strong "composite" classifier, taking advantage of their complementarity, yields a significant improvement in F-score for both the fr–en and en–es systems. Finally, we exploit word confidence scores to improve the quality estimation system at sentence level.
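The abstract reports F-score gains from combining several "weak" word-level classifiers. As a rough, hypothetical illustration of why complementary classifiers can help, the Python sketch below merges binary word labels with a simple weighted majority vote and scores them with the F-score of one label. The label convention (1 = "Bad" word, 0 = "Good" word), the function names, the equal weights and the toy data are all assumptions made for illustration; this is not the combination method described in the paper.

```python
from typing import Sequence

# Hypothetical label convention (assumption): 1 = "Bad" word, 0 = "Good" word.


def f_score(gold: Sequence[int], pred: Sequence[int], label: int = 1) -> float:
    """Harmonic mean of precision and recall, computed for one label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly found
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_vote(predictions: Sequence[Sequence[int]],
                  weights: Sequence[float]) -> list[int]:
    """Combine per-word binary labels from several weak classifiers.

    Each classifier votes +weight for label 1 and -weight for label 0;
    a word is labelled 1 only if the weighted sum is strictly positive.
    """
    combined = []
    for word_preds in zip(*predictions):  # iterate word by word
        score = sum(w if p == 1 else -w for p, w in zip(word_preds, weights))
        combined.append(1 if score > 0 else 0)
    return combined


if __name__ == "__main__":
    gold = [0, 1, 1, 0, 1, 0]            # toy reference labels (invented)
    weak = [[0, 1, 0, 0, 1, 1],          # each weak classifier makes
            [0, 1, 1, 1, 0, 0],          # different mistakes
            [1, 0, 1, 0, 1, 0]]
    combined = weighted_vote(weak, [1.0, 1.0, 1.0])
    print([round(f_score(gold, p), 3) for p in weak], f_score(gold, combined))
    # prints [0.667, 0.667, 0.667] 1.0
```

Because each toy classifier errs on different words, the vote corrects every individual mistake. Real systems would learn the weights (for example with a boosting algorithm) rather than fix them by hand.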
Document type:
Journal article
Data and Knowledge Engineering, Elsevier, 2015, pp.11. 〈10.1016/j.datak.2015.04.003〉


https://hal.archives-ouvertes.fr/hal-01147902
Contributor: Laurent Besacier
Submitted on: Monday 4 May 2015, 14:23:14
Last modified on: Thursday 11 October 2018, 08:48:03
Document(s) archived on: Wednesday 19 April 2017, 12:34:34

File: DKE_final.pdf (produced by the author(s))

Citation

Ngoc-Quang Luong, Laurent Besacier, Benjamin Lecouteux. Towards Accurate Predictors of Word Quality for Machine Translation: Lessons Learned on French - English and English - Spanish Systems. Data and Knowledge Engineering, Elsevier, 2015, pp.11. 〈10.1016/j.datak.2015.04.003〉. 〈hal-01147902〉
