Journal articles

Addressing Data Sparsity for Neural Machine Translation Between Morphologically Rich Languages

Abstract : Translating between morphologically rich languages remains challenging for current machine translation systems. In this paper, we experiment with various Neural Machine Translation (NMT) architectures to address the data sparsity problem caused by limited data availability (quantity), domain shift, and the languages involved (Arabic and French). We show that the Factored NMT (FNMT) model, which uses linguistically motivated factors, outperforms standard NMT systems using subword units by more than 1 BLEU point, even when a large quantity of data is available. Our work shows the benefits of applying linguistic factors in NMT under both low-resource and high-resource conditions.

Cited literature [37 references]

https://hal.archives-ouvertes.fr/hal-02505563
Contributor : Loïc Barrault
Submitted on : Wednesday, March 11, 2020 - 3:47:51 PM
Last modification on : Friday, March 13, 2020 - 1:22:40 AM

File

journal_MT.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02505563, version 1

Citation

Mercedes Garcia-Martinez, Loïc Barrault, Fethi Bougares. Addressing Data Sparsity for Neural Machine Translation Between Morphologically Rich Languages. Machine Translation, Springer Verlag, 2020. ⟨hal-02505563⟩
