Statistical Analysis of Alignment Characteristics for Phrase-based Machine Translation

Abstract : In most statistical machine translation (SMT) systems, bilingual segments are extracted via word alignment. However, there is a lack of systematic study of which alignment characteristics can benefit MT under specific experimental settings, such as the language pair or the corpus size. In this paper we produce a set of alignments by directly tuning the alignment model according to alignment F-score and BLEU score, in order to investigate which alignment characteristics are helpful in translation. We report results for a phrase-based SMT system on Chinese-to-English IWSLT data and Spanish-to-English European Parliament data. Through a statistical analysis of the alignment characteristics that are correlated with BLEU score, we give alignment hints for improving BLEU score with a phrase-based SMT system on different types of corpora.
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-00525181
Contributor : Patrik Lambert
Submitted on : Monday, October 11, 2010 - 1:06:52 PM
Last modification on : Thursday, February 7, 2019 - 5:47:52 PM
Long-term archiving on : Thursday, October 25, 2012 - 4:50:31 PM

File

10_05_eamt_analysisAlCharac.pd...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00525181, version 1

Citation

Patrik Lambert, Simon Petitrenaud, Yanjun Ma, Andy Way. Statistical Analysis of Alignment Characteristics for Phrase-based Machine Translation. Proceedings of the 14th European Association for Machine Translation, May 2010, Saint-Raphaël, France. no page number. ⟨hal-00525181⟩
