Generic and Specialized Word Embeddings for Multi-Domain Machine Translation

Minh Quang Pham; Josep-Maria Crego; François Yvon; Jean Senellart

doi:10.5281/zenodo.3524978

Conference paper, Year: 2019


Minh Quang Pham

Role: Author
PersonId : 176069
IdHAL : pham-minhquang
ORCID : 0000-0003-3618-481X
IdRef : 259423432

Traitement du Langage Parlé

Josep-Maria Crego

Role: Author
PersonId : 1038144

SYSTRAN

François Yvon

Role: Author
PersonId : 5347
IdHAL : francois-yvon
ORCID : 0000-0002-7972-7442
IdRef : 057593531

Traitement du Langage Parlé

Jean Senellart

Role: Author
PersonId : 1038145

SYSTRAN

Abstract

Supervised machine translation works well when the training and test data are sampled from the same distribution. When this is not the case, adaptation techniques help ensure that the knowledge learned from out-of-domain texts generalises to in-domain sentences. We study here a related setting, multi-domain adaptation, where the number of domains is potentially large and adapting separately to each domain would waste training resources. Our proposal transposes to neural machine translation the feature expansion technique of Daumé III (2007): it isolates domain-agnostic from domain-specific lexical representations, while sharing most of the network across domains. Our experiments use two architectures and two language pairs: they show that our approach, while simple and computationally inexpensive, outperforms several strong baselines and delivers a multi-domain system that successfully translates texts from diverse sources.
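
To make the idea of separating domain-agnostic from domain-specific lexical representations concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes, for illustration only, that each word vector is the concatenation of one generic block and one block per domain, and that the blocks belonging to domains other than the sentence's own are zeroed out. The function names, dimensions, domain labels, and toy vocabulary are all hypothetical.

```python
import random


def make_embedding_table(vocab, generic_dim, domain_dim, domains, seed=0):
    """Build a toy table: one vector per word, laid out as
    [generic block | block for domains[0] | block for domains[1] | ...]."""
    rng = random.Random(seed)
    width = generic_dim + domain_dim * len(domains)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(width)] for w in vocab}


def domain_mask(domain, generic_dim, domain_dim, domains):
    """Return a 0/1 mask that keeps the generic block and the block of
    `domain`, and switches off the blocks of every other domain."""
    mask = [1.0] * generic_dim
    for d in domains:
        mask.extend([1.0 if d == domain else 0.0] * domain_dim)
    return mask


def embed(word, domain, table, generic_dim, domain_dim, domains):
    """Look up `word` and zero the blocks that do not belong to `domain`."""
    mask = domain_mask(domain, generic_dim, domain_dim, domains)
    return [v if m else 0.0 for v, m in zip(table[word], mask)]


# Hypothetical setup: two domains, 4 generic dimensions, 2 per domain.
DOMAINS = ["medical", "legal"]
TABLE = make_embedding_table(["cell", "court"], generic_dim=4,
                             domain_dim=2, domains=DOMAINS)

med = embed("cell", "medical", TABLE, 4, 2, DOMAINS)
leg = embed("cell", "legal", TABLE, 4, 2, DOMAINS)

print(med)  # generic block + medical block, legal block zeroed
print(leg)  # same generic block, medical block zeroed + legal block
```

Under these assumptions, the same word shares its generic block across domains and differs only in the active domain block, while the vector width stays fixed so the rest of the network can be shared.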

Keywords

Machine Translation, Domain Adaptation

Domains

Computer Science [cs], Computation and Language [cs.CL]

Main file

IWSLT2019_paper_10.pdf (328.18 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-02343215

Submitted on: Saturday, November 2, 2019, 15:02:59

Last modified on: Saturday, October 7, 2023, 21:36:21

Long-term archiving on: Monday, February 3, 2020, 13:58:23

Dates and versions

hal-02343215 , version 1 (02-11-2019)

Identifiers

HAL Id : hal-02343215 , version 1
DOI : 10.5281/zenodo.3524978

Cite

Minh Quang Pham, Josep-Maria Crego, François Yvon, Jean Senellart. Generic and Specialized Word Embeddings for Multi-Domain Machine Translation. International Workshop on Spoken Language Translation, Nov 2019, Hong-Kong, China. ⟨10.5281/zenodo.3524978⟩. ⟨hal-02343215⟩


Collections

CNRS LIMSI UNIV-PARIS-SACLAY SORBONNE-UNIVERSITE LISN GS-ENGINEERING GS-COMPUTER-SCIENCE LISN-TLP

654 views

267 downloads
