Mining a comparable text corpus for a Vietnamese-French statistical machine translation system

Abstract: This paper presents our first attempt at constructing a Vietnamese-French statistical machine translation system. Since Vietnamese is an under-resourced language, we concentrate on building a large Vietnamese-French parallel corpus. We propose a document alignment method based on publication date, special words and sentence alignment results. The paper also presents an application of the obtained parallel corpus to the construction of a Vietnamese-French statistical machine translation system, and discusses the use of different units for Vietnamese (syllables, words, or their combinations).
Keywords: translation
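The abstract names the cues used for document alignment but not how they are combined. Below is a minimal sketch, under our own assumptions, of the first two cues only. Candidate Vietnamese/French article pairs are restricted to a small publication-date window (the hypothetical parameter `max_days`). Each surviving pair is scored by the overlap of "special words", which we assume here to mean tokens containing a digit or starting with a capital letter (numbers, names, acronyms) because these tend to be copied unchanged across languages. The names `Doc`, `special_words`, `align_documents`, the Jaccard score and both thresholds are illustrative, not the paper's. The sentence-alignment cue is not modelled.

```python
import re
import unicodedata
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    doc_id: str
    published: date
    text: str


TOKEN = re.compile(r"\w+")


def special_words(text: str) -> set[str]:
    # Assumed definition: tokens with a digit or an initial capital letter.
    # NFC normalization keeps Vietnamese diacritics inside a single token.
    tokens = TOKEN.findall(unicodedata.normalize("NFC", text))
    return {t for t in tokens if any(c.isdigit() for c in t) or t[0].isupper()}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align_documents(vi_docs, fr_docs, max_days=2, min_score=0.1):
    # For each Vietnamese document, keep the French document published within
    # max_days whose special-word overlap is highest and strictly above min_score.
    fr_words = {fr.doc_id: special_words(fr.text) for fr in fr_docs}
    pairs = []
    for vi in vi_docs:
        vi_words = special_words(vi.text)
        best, best_score = None, min_score
        for fr in fr_docs:
            if abs((vi.published - fr.published).days) > max_days:
                continue  # publication-date filter
            score = jaccard(vi_words, fr_words[fr.doc_id])
            if score > best_score:
                best, best_score = fr, score
        if best is not None:
            pairs.append((vi.doc_id, best.doc_id, best_score))
    return pairs


# Toy example (invented sentences, not data from the paper).
vi_docs = [
    Doc("vi1", date(2008, 5, 1), "Hội nghị APEC 2008 tại Lima"),
    Doc("vi2", date(2008, 6, 10), "Giá dầu tăng 5%"),
]
fr_docs = [
    Doc("fr1", date(2008, 5, 2), "Le sommet APEC 2008 à Lima"),
    Doc("fr2", date(2008, 6, 11), "Le prix du pétrole augmente de 5%"),
]
pairs = align_documents(vi_docs, fr_docs)
print(pairs)
```

In this toy run each Vietnamese article is paired with the French article of the adjacent day. Sentence-initial capitals such as "Le" also count as special words under this simple heuristic, which adds noise; a real system would need a better filter, and this sketch does not prevent two Vietnamese articles from claiming the same French one.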

https://hal.archives-ouvertes.fr/hal-01393602
Contributor: Brigitte Bigi
Submitted on: Friday, December 16, 2016 - 1:07:39 PM
Last modification on: Monday, July 8, 2019 - 3:10:05 PM
Long-term archiving on: Tuesday, March 21, 2017 - 1:38:24 PM

File: W09-0430.pdf (publisher files allowed on an open archive)

Licence: Copyright

Citation

Thi-Ngoc-Diep Do, Viet-Bac Le, Brigitte Bigi, Laurent Besacier, Eric Castelli. Mining a comparable text corpus for a Vietnamese-French statistical machine translation system. Fourth Workshop on Statistical Machine Translation, 2009, Athens, Greece, pp. 165-172. DOI: 10.3115/1626431.1626466. HAL: hal-01393602.

Metrics: 176 record views, 204 file downloads