Towards Accurate Predictors of Word Quality for Machine Translation: Lessons Learned on French - English and English - Spanish Systems

Abstract: This paper proposes several ideas for building effective estimators that predict the quality of each word in a Machine Translation (MT) output. We propose a number of novel features of various types (system-based, lexical, syntactic and semantic) and integrate them into the conventional, previously used feature set to train our baseline classifier. The classifiers are built on two different bilingual corpora: French–English (fr–en) and English–Spanish (en–es). After experimenting with all features, we apply a "Feature Selection" strategy to retain the best-performing ones. Then, a method that combines multiple "weak" classifiers into a strong "composite" classifier, taking advantage of their complementarity, yields a significant improvement in F-score for both the fr–en and en–es systems. Finally, we exploit word confidence scores to improve the quality estimation system at the sentence level.
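The idea of combining complementary "weak" classifiers into a "composite" one can be illustrated with a minimal sketch. The abstract does not name the combination algorithm, so this sketch assumes a discrete AdaBoost-style weighted vote over a fixed pool of weak classifiers whose word-level predictions are already computed. It also assumes that word labels are encoded as +1 ("Good") and -1 ("Bad"), and the functions `boost`, `composite_predict` and `f_score` are hypothetical names chosen for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def f_score(gold: Sequence[int], pred: Sequence[int], positive: int = -1) -> float:
    """F1 for one label; by (hypothetical) convention -1 marks a 'Bad' word."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def boost(weak_preds: List[List[int]], gold: List[int], rounds: int) -> List[Tuple[int, float]]:
    """Discrete AdaBoost over a fixed pool of weak classifiers' +1/-1 word labels.

    Returns a list of (classifier index, vote weight alpha) pairs.
    """
    n = len(gold)
    w = [1.0 / n] * n  # start with uniform weights over words
    ensemble: List[Tuple[int, float]] = []
    for _ in range(rounds):
        # Weighted error of every weak classifier under the current word weights.
        errors = [sum(wi for wi, p, g in zip(w, preds, gold) if p != g) for preds in weak_preds]
        best = min(range(len(weak_preds)), key=lambda k: errors[k])
        err = errors[best]
        if err >= 0.5:  # no better than chance: stop adding classifiers
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect classifier
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((best, alpha))
        # Up-weight words the chosen classifier got wrong, down-weight the rest.
        w = [wi * math.exp(-alpha * g * p) for wi, g, p in zip(w, gold, weak_preds[best])]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def composite_predict(weak_preds: List[List[int]], ensemble: List[Tuple[int, float]]) -> List[int]:
    """Weighted vote of the selected weak classifiers; ties go to +1 ('Good')."""
    scores = [0.0] * len(weak_preds[0])
    for k, alpha in ensemble:
        for i, p in enumerate(weak_preds[k]):
            scores[i] += alpha * p
    return [1 if s >= 0 else -1 for s in scores]
```

For example, with six words labelled `[1, 1, 1, -1, -1, -1]` and three weak classifiers that each mislabel a different pair of words, `boost(..., rounds=3)` selects all three, and `composite_predict` labels every word correctly (F-score 1.0 on the Bad class), whereas each weak classifier alone reaches an F-score of about 0.67. Complementary errors are what make the weighted vote stronger than any single member.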
Document type: Journal article

Cited literature: 25 references

https://hal.archives-ouvertes.fr/hal-01147902
Contributor: Laurent Besacier
Submitted on: Monday, May 4, 2015 - 2:23:14 PM
Last modification on: Thursday, April 4, 2019 - 10:18:05 AM
Document(s) archived on: Wednesday, April 19, 2017 - 12:34:34 PM

File: DKE_final.pdf (produced by the author(s))

Citation

Ngoc-Quang Luong, Laurent Besacier, Benjamin Lecouteux. Towards Accurate Predictors of Word Quality for Machine Translation: Lessons Learned on French - English and English - Spanish Systems. Data and Knowledge Engineering, Elsevier, 2015, pp. 11. ⟨10.1016/j.datak.2015.04.003⟩. ⟨hal-01147902⟩