Conference papers

Generic and Specialized Word Embeddings for Multi-Domain Machine Translation

Minh Quang Pham 1 Josep-Maria Crego 2 François Yvon 1 Jean Senellart 2
1 TLP - Traitement du Langage Parlé
LIMSI - Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur : 247329
Abstract : Supervised machine translation works well when the training and test data are sampled from the same distribution. When this is not the case, adaptation techniques help ensure that the knowledge learned from out-of-domain texts generalises to in-domain sentences. We study here a related setting, multi-domain adaptation, where the number of domains is potentially large and adapting separately to each domain would waste training resources. Our proposal transposes to neural machine translation the feature expansion technique of (Daumé III, 2007): it isolates domain-agnostic from domain-specific lexical representations, while sharing most of the network across domains. Our experiments use two architectures and two language pairs: they show that our approach, while simple and computationally inexpensive, outperforms several strong baselines and delivers a multi-domain system that successfully translates texts from diverse sources.
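
The feature expansion idea mentioned in the abstract can be illustrated with a minimal sketch. The first function follows the augmentation scheme of Daumé III (2007), in which each input vector is copied into a shared block and into a block reserved for its domain. The second function is only an assumption about how this might carry over to word embeddings: the function names, the layout of the embedding vector, and the masking step are hypothetical and are not taken from the paper.

```python
import numpy as np


def expand_features(x, domain, num_domains):
    """Feature augmentation in the style of Daumé III (2007).

    Returns a vector with (num_domains + 1) blocks: the first block is
    shared by all domains, the block at index `domain + 1` is reserved
    for the given domain, and all other blocks stay at zero.
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    out = np.zeros((num_domains + 1) * d)
    out[:d] = x                      # shared, domain-agnostic copy
    start = (domain + 1) * d
    out[start:start + d] = x         # domain-specific copy
    return out


def domain_masked_embedding(embedding, domain, generic_dim, specific_dim):
    """Hypothetical embedding analogue (an assumption, not the paper's code).

    The embedding is assumed to be laid out as one generic slice followed
    by one slice per domain. Only the generic slice and the slice of the
    current domain are kept; the slices of other domains are zeroed.
    """
    embedding = np.asarray(embedding, dtype=float)
    mask = np.zeros_like(embedding)
    mask[:generic_dim] = 1.0
    start = generic_dim + domain * specific_dim
    mask[start:start + specific_dim] = 1.0
    return embedding * mask
```

With two domains, `expand_features([1, 2], domain=1, num_domains=2)` places the input in the shared block and in the last block, leaving the block of the other domain at zero. Under the assumed layout, the rest of the network sees a vector of the same size whatever the domain, which is one way to share most parameters across domains.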

https://hal.archives-ouvertes.fr/hal-02343215
Contributor : Limsi Publications
Submitted on : Saturday, November 2, 2019 - 3:02:59 PM
Last modification on : Monday, February 10, 2020 - 6:14:12 PM
Document(s) archived on : Monday, February 3, 2020 - 1:58:23 PM

File

IWSLT2019_paper_10.pdf
Files produced by the author(s)

Citation

Minh Quang Pham, Josep-Maria Crego, François Yvon, Jean Senellart. Generic and Specialized Word Embeddings for Multi-Domain Machine Translation. International Workshop on Spoken Language Translation, Nov 2019, Hong-Kong, China. ⟨10.5281/zenodo.3524978⟩. ⟨hal-02343215⟩
