Conference paper, Year: 2009

Mining a comparable text corpus for a Vietnamese - French statistical machine translation system

Abstract

This paper presents our first attempt at constructing a Vietnamese-French statistical machine translation system. Since Vietnamese is an under-resourced language, we concentrate on building a large Vietnamese-French parallel corpus. We propose a document alignment method based on publication date, special words, and sentence alignment results. The paper also applies the resulting parallel corpus to the construction of a Vietnamese-French statistical machine translation system, and discusses the use of different units for Vietnamese (syllables, words, or their combinations).

Main file
W09-0430.pdf (156.21 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-01393602, version 1 (16-12-2016)

License

Copyright (All rights reserved)

Cite

Thi-Ngoc-Diep Do, Viet-Bac Le, Brigitte Bigi, Laurent Besacier, Eric Castelli. Mining a comparable text corpus for a Vietnamese - French statistical machine translation system. Fourth Workshop on Statistical Machine Translation, 2009, Athens, Greece. pp. 165-172, ⟨10.3115/1626431.1626466⟩. ⟨hal-01393602⟩