Journal article, Language Resources and Evaluation, Year: 2016

Reordering Space Design in Statistical Machine Translation

Abstract

In Statistical Machine Translation (SMT), the constraints on word reorderings have a great impact on the set of potential translations that is explored during search. Notwithstanding computational issues, the reordering space of an SMT system needs to be designed with great care: while a larger search space is likely to yield better translations, it may also lead to more decoding errors, because of the added ambiguity and the interaction with the pruning strategy. In this paper, we study the reordering search space using a state-of-the-art translation system in which all reorderings are represented in a permutation lattice prior to decoding. This allows us to directly explore and compare different reordering schemes and oracle settings. We also study in detail a rule-based preordering system, varying the length and number of rules and the tagset used, and contrasting it with purely combinatorial subsets of permutations. We carry out experiments on three language pairs in both directions: English-French, a close language pair, and English-German and English-Czech, two much more challenging pairs. We show that even though it might be desirable to design better reordering spaces, model and search errors seem to be the most important issues. Therefore, improvements to the reordering space should be accompanied by improvements to the associated models to be really effective.
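
To make the notion of a "purely combinatorial subset of permutations" concrete, the following minimal sketch enumerates the reorderings of a short sentence under a simple displacement limit and reports how quickly the space grows as the limit is relaxed. The constraint used here (no word moves more than `max_shift` positions) and the function names `reordering_space` and `space_sizes` are assumptions chosen for illustration; they are not the reordering schemes or the permutation lattice implementation evaluated in the paper.

```python
from itertools import permutations


def reordering_space(n, max_shift):
    """Return every permutation of range(n) in which no source position
    is moved by more than max_shift places.

    This displacement limit is a hypothetical, illustrative constraint,
    not one of the reordering schemes studied in the paper.
    """
    return [
        perm
        for perm in permutations(range(n))
        if all(abs(src - tgt) <= max_shift for tgt, src in enumerate(perm))
    ]


def space_sizes(n):
    """Map each displacement limit 0..n-1 to the number of allowed reorderings."""
    return {limit: len(reordering_space(n, limit)) for limit in range(n)}


if __name__ == "__main__":
    # For a 4-word sentence the space grows from the single monotone order
    # to all 4! = 24 permutations as the limit is relaxed.
    for limit, size in space_sizes(4).items():
        print(f"max_shift={limit}: {size} reorderings")
```

Brute-force enumeration is only practical for very short sentences, which is one reason a compact representation such as a lattice is attractive when the space is large.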
Main file
final_submission-aspreprint.pdf (444.55 KB)
Origin: Files produced by the author(s)
License: Copyright (All rights reserved)

Dates and versions

hal-01620902, version 1 (02-04-2024)

Identifiers

Cite

Nicolas Pécheux, Alexandre Allauzen, Jan Niehues, François Yvon. Reordering Space Design in Statistical Machine Translation. Language Resources and Evaluation, 2016, 50, pp. 375-410. ⟨10.1007/s10579-016-9353-8⟩. ⟨hal-01620902⟩