Using Monolingual Data in Neural Machine Translation: a Systematic Study

Franck Burlot; François Yvon

Conference paper, year: 2018

Franck Burlot

Role: Author
PersonId : 1021079

Lingua Custodia

Traitement du Langage Parlé

François Yvon

Role: Author
PersonId : 5347
IdHAL : francois-yvon
ORCID : 0000-0002-7972-7442
IdRef : 057593531

Traitement du Langage Parlé

Abstract

Neural Machine Translation (MT) has radically changed the way systems are developed. A major difference with the previous generation (Phrase-Based MT) is the way monolingual target data, which often abounds, is used in these two paradigms. While Phrase-Based MT can seamlessly integrate very large language models trained on billions of sentences, the best option for Neural MT developers seems to be the generation of artificial parallel data through back-translation, a technique that fails to fully take advantage of existing datasets. In this paper, we conduct a systematic study of back-translation, comparing alternative uses of monolingual data, as well as multiple data generation procedures. Our findings confirm that back-translation is very effective and give new explanations as to why this is the case. We also introduce new data simulation techniques that are almost as effective, yet much cheaper to implement.
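As an illustration of the general idea of back-translation mentioned in the abstract, here is a minimal sketch, not the authors' implementation. It assumes a target-to-source translation function is available; `toy_target_to_source` and its three-word lexicon are hypothetical stand-ins for a trained MT system, and the example sentences are invented for illustration.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    monolingual_target: List[str],
    target_to_source: Callable[[str], str],
) -> List[Pair]:
    """Pair each authentic target sentence with a machine-generated source."""
    return [(target_to_source(t), t) for t in monolingual_target]


def toy_target_to_source(sentence: str) -> str:
    """Hypothetical stand-in for a trained target-to-source MT system."""
    lexicon = {"le": "the", "chat": "cat", "dort": "sleeps"}
    # Unknown words are copied through unchanged.
    return " ".join(lexicon.get(word, word) for word in sentence.split())


# Invented toy data: one natural pair and one monolingual target sentence.
natural_parallel: List[Pair] = [("the dog eats", "le chien mange")]
monolingual_target: List[str] = ["le chat dort"]

# Synthetic pairs keep the authentic target side; only the source is artificial.
synthetic_parallel = back_translate(monolingual_target, toy_target_to_source)

# The source-to-target system would then be trained on the combined data.
training_data = natural_parallel + synthetic_parallel
```

In this sketch the target side of every synthetic pair is authentic text, and only the source side is machine-generated.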

Keywords

Machine translation; language modeling

Domains

Computer Science [cs]; Computation and Language [cs.CL]

Main file

WMT015.pdf (565.06 KB)

Origin: Publisher files allowed on an open archive

https://hal.science/hal-01910235

Submitted on: Sunday, December 2, 2018, 20:38:50

Last modified on: Saturday, October 7, 2023, 21:36:21

Long-term archiving on: Sunday, March 3, 2019, 12:23:12

Dates and versions

hal-01910235, version 1 (02-12-2018)

Identifiers

HAL Id: hal-01910235, version 1

Cite

Franck Burlot, François Yvon. Using Monolingual Data in Neural Machine Translation: a Systematic Study. Conference on Machine Translation, Oct 2018, Brussels, Belgium. ⟨hal-01910235⟩

Collections

CNRS LIMSI UNIV-PARIS-SACLAY SORBONNE-UNIVERSITE LISN GS-ENGINEERING GS-COMPUTER-SCIENCE LISN-TLP

103 views

242 downloads
