An Open Source Toolkit for Word-level Confidence Estimation in Machine Translation

Abstract : A growing need for Confidence Estimation (CE) for Statistical Machine Translation (SMT) systems in Computer Aided Translation (CAT) has recently been observed. However, most CE toolkits are optimized for a single target language (mainly English) and, as far as we know, none of them is both dedicated to this specific task and freely available. This paper presents an open-source toolkit for predicting the quality of the words of an SMT output, whose novel contributions are (i) support for various target languages and (ii) handling of a number of features of different types (system-based, lexical, syntactic and semantic). In addition, the toolkit integrates a wide variety of Natural Language Processing or Machine Learning tools to pre-process data, extract features and estimate confidence at word level. Features for Word-level Confidence Estimation (WCE) can easily be added or removed using a configuration file. We validate the toolkit by experimenting in the WCE evaluation framework of the WMT shared task with two language pairs: French-English and English-Spanish. The toolkit is made available to the research community with ready-made scripts to launch full experiments on these language pairs, while achieving state-of-the-art and reproducible performance.
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01244477
Contributor : Christophe Servan
Submitted on : Tuesday, December 15, 2015 - 6:46:50 PM
Last modification on : Monday, February 11, 2019 - 4:36:02 PM
Document(s) archived on : Wednesday, March 16, 2016 - 4:20:58 PM

File

WCE_iwslt15.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-01244477, version 1

Citation

Christophe Servan, Ngoc-Tien Le, Ngoc Quang Luong, Benjamin Lecouteux, Laurent Besacier. An Open Source Toolkit for Word-level Confidence Estimation in Machine Translation. The 12th International Workshop on Spoken Language Translation (IWSLT'15), Dec 2015, Da Nang, Vietnam. ⟨hal-01244477⟩
