Journal article, Data and Knowledge Engineering, Year: 2015

Towards Accurate Predictors of Word Quality for Machine Translation: Lessons Learned on French–English and English–Spanish Systems

Abstract

This paper proposes ideas for building effective estimators that predict the quality of individual words in Machine Translation (MT) output. We propose a number of novel features of various types (system-based, lexical, syntactic and semantic) and integrate them with the conventional, previously used feature set to train our baseline classifier. The classifiers are built over two different bilingual corpora: French–English (fr–en) and English–Spanish (en–es). After experimenting with all features, we apply a "Feature Selection" strategy to retain the best-performing ones. Then, a method that combines multiple "weak" classifiers into a strong "composite" classifier, taking advantage of their complementarity, yields a significant improvement in F-score for both the fr–en and en–es systems. Finally, we exploit word confidence scores to improve the quality estimation system at the sentence level.
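The idea of combining complementary "weak" classifiers into a "composite" one, and scoring the result with F-score, can be illustrated with a small sketch. The abstract does not name the combination algorithm, so the AdaBoost-style weighted vote below is an assumption chosen for illustration, not the authors' method. The three binary features, the toy data, and all function names are hypothetical.

```python
import math

def f_score(gold, pred, positive=1):
    """F1 for the positive class (here +1 stands for a 'bad' word)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def train_composite(X, y, weak_learners, rounds=3):
    """AdaBoost-style training (an assumed stand-in for the paper's method).

    Labels are in {-1, +1}; each weak learner maps a feature tuple to -1 or +1.
    Returns a list of (vote weight, weak learner) pairs.
    """
    n = len(X)
    w = [1.0 / n] * n  # per-example weights, uniform at the start
    ensemble = []
    for _ in range(rounds):
        # Pick the weak learner with the lowest weighted error.
        best, best_err = None, None
        for h in weak_learners:
            err = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
            if best_err is None or err < best_err:
                best, best_err = h, err
        if best_err >= 0.5:  # no learner is better than chance: stop
            break
        err = max(best_err, 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, best))
        # Raise the weight of misclassified examples, then renormalise.
        w = [wi * math.exp(-alpha * yi * best(xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of the selected weak learners."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: three binary cues per word, label +1 = bad word.
X = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 0, 1), (0, 1, 0), (1, 0, 0)]
y = [1, 1, 1, -1, -1, -1]

# One weak learner per feature; each one is wrong on two of the six words.
weak_learners = [lambda x, i=i: 1 if x[i] else -1 for i in range(3)]

ensemble = train_composite(X, y, weak_learners, rounds=3)
weak_pred = [weak_learners[0](x) for x in X]
composite_pred = [predict(ensemble, x) for x in X]

print(f"weak F-score:      {f_score(y, weak_pred):.3f}")       # 0.667
print(f"composite F-score: {f_score(y, composite_pred):.3f}")  # 1.000
```

On this toy set each single-feature classifier errs on different words, so the weighted vote recovers every label. Real word-level quality estimation data would not separate this cleanly.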
Main file
DKE_final.pdf (456.97 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01147902, version 1 (04-05-2015)

Cite

Ngoc-Quang Luong, Laurent Besacier, Benjamin Lecouteux. Towards Accurate Predictors of Word Quality for Machine Translation: Lessons Learned on French–English and English–Spanish Systems. Data and Knowledge Engineering, 2015, pp.11. ⟨10.1016/j.datak.2015.04.003⟩. ⟨hal-01147902⟩