Conference papers

An Open Source Toolkit for Word-level Confidence Estimation in Machine Translation

Abstract : Recently, a growing need for Confidence Estimation (CE) for Statistical Machine Translation (SMT) systems in Computer-Aided Translation (CAT) has been observed. However, most CE toolkits are optimized for a single target language (mainly English) and, as far as we know, none of them is both dedicated to this specific task and freely available. This paper presents an open-source toolkit for predicting the quality of the words in an SMT output, whose novel contributions are (i) support for various target languages and (ii) handling of a number of features of different types (system-based, lexical, syntactic and semantic). In addition, the toolkit integrates a wide variety of Natural Language Processing and Machine Learning tools to pre-process data, extract features and estimate confidence at the word level. Features for Word-level Confidence Estimation (WCE) can be easily added or removed using a configuration file. We validate the toolkit by experimenting in the WCE evaluation framework of the WMT shared task with two language pairs: French-English and English-Spanish. The toolkit is made available to the research community with ready-made scripts to launch full experiments on these language pairs, while achieving state-of-the-art and reproducible performance.
Document type : Conference papers

Cited literature : 33 references
Contributor : Christophe Servan
Submitted on : Tuesday, December 15, 2015 - 6:46:50 PM
Last modification on : Wednesday, July 6, 2022 - 4:13:01 AM
Long-term archiving on : Wednesday, March 16, 2016 - 4:20:58 PM


Publisher files allowed on an open archive


  • HAL Id : hal-01244477, version 1


Christophe Servan, Ngoc-Tien Le, Ngoc Quang Luong, Benjamin Lecouteux, Laurent Besacier. An Open Source Toolkit for Word-level Confidence Estimation in Machine Translation. The 12th International Workshop on Spoken Language Translation (IWSLT'15), Dec 2015, Da Nang, Vietnam. ⟨hal-01244477⟩


