Statistical Post-Editing of Machine Translation for Domain Adaptation

Abstract : This paper presents a statistical approach for adapting out-of-domain machine translation systems to the medical domain through an unsupervised post-editing step. A statistical post-editing model is built on statistical machine translation (SMT) outputs aligned with their reference translations. Evaluations on French-to-English translation of medical texts show that an out-of-domain machine translation system can be adapted a posteriori to a specific domain. Two SMT systems are studied: a state-of-the-art phrase-based implementation and a publicly available online system. Our experiments also indicate that selecting which sentences to post-edit leads to significant improvements in translation quality, and that further gains remain possible with respect to an oracle measure.
Document type : Conference papers

Cited literature [31 references]

https://hal.archives-ouvertes.fr/hal-01320242
Contributor : Bibliothèque Universitaire Déposants Hal-Avignon
Submitted on : Wednesday, February 27, 2019 - 10:32:29 AM
Last modification on : Wednesday, May 15, 2019 - 10:12:03 AM
Long-term archiving on : Tuesday, May 28, 2019 - 1:01:17 PM

File

EAMT12.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01320242, version 1

Citation

Raphaël Rubino, Stéphane Huet, Fabrice Lefèvre, Georges Linarès. Statistical Post-Editing of Machine Translation for Domain Adaptation. 16th Annual Conference of the European Association for Machine Translation (EAMT), May 2012, Trento, Italy. ⟨hal-01320242⟩
