Mining a comparable text corpus for a Vietnamese - French statistical machine translation system

Thi-Ngoc-Diep Do; Viet-Bac Le; Brigitte Bigi; Laurent Besacier; Eric Castelli

doi:10.3115/1626431.1626466

Conference paper. Year: 2009

Thi-Ngoc-Diep Do

Role: Author
PersonId: 992929

Communication Langagière et Interaction Personne-Système

Viet-Bac Le

Role: Author

Communication Langagière et Interaction Personne-Système

Brigitte Bigi

Role: Author
PersonId: 7990
IdHAL: brigittebigi
ORCID: 0000-0003-1834-6918
IdRef: 079410790

Communication Langagière et Interaction Personne-Système

Laurent Besacier

Role: Author
PersonId: 1521
IdHAL: laurent-besacier
ORCID: 0000-0001-7411-9125
IdRef: 079377017

Communication Langagière et Interaction Personne-Système

Eric Castelli

Role: Author
PersonId: 750232
IdHAL: eric-castelli
ORCID: 0000-0003-2978-2619
IdRef: 068256256

International Research Institute MICA

Abstract

This paper presents our first attempt at constructing a Vietnamese-French statistical machine translation system. Since Vietnamese is an under-resourced language, we concentrate on building a large Vietnamese-French parallel corpus. We propose a document alignment method based on publication date, special words, and sentence alignment results. The paper also applies the resulting parallel corpus to the construction of a Vietnamese-French statistical machine translation system, and discusses the use of different units for Vietnamese (syllables, words, or their combinations).
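The abstract names three cues for document alignment: publication date, special words, and sentence alignment results. The following Python fragment is only an illustrative sketch of how the first two cues could narrow down candidate document pairs; it is not the authors' implementation. The `Doc` type, the functions `special_tokens` and `candidate_pairs`, the definition of "special words" as numbers and ASCII capitalised tokens, and the thresholds `max_days` and `min_shared` are all assumptions made for this example. The third cue (sentence alignment results) is not modelled here.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    """Hypothetical container for one news article."""
    doc_id: str
    published: date
    text: str


def special_tokens(text):
    """Assumed notion of 'special words': numbers and ASCII capitalised tokens,
    which often survive translation unchanged (names, acronyms, figures)."""
    return set(re.findall(r"\b(?:\d+(?:[.,]\d+)*|[A-Z][A-Za-z]+)\b", text))


def candidate_pairs(vn_docs, fr_docs, max_days=2, min_shared=2):
    """Return (vn_id, fr_id, shared_count) for pairs that are close in
    publication date and share enough special tokens, best matches first."""
    pairs = []
    for vn in vn_docs:
        vn_tokens = special_tokens(vn.text)
        for fr in fr_docs:
            # Cue 1: publication dates must be within the assumed window.
            if abs((vn.published - fr.published).days) > max_days:
                continue
            # Cue 2: count special tokens that appear in both documents.
            shared = vn_tokens & special_tokens(fr.text)
            if len(shared) >= min_shared:
                pairs.append((vn.doc_id, fr.doc_id, len(shared)))
    return sorted(pairs, key=lambda p: -p[2])
```

In a pipeline of this kind, the surviving pairs would typically be passed to a sentence aligner, whose output could then confirm or reject each candidate pair.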

Keywords

translation

Domains

Computation and Language [cs.CL]; Information and communication sciences

Main file

W09-0430.pdf (156.21 KB)

Origin: Publisher files allowed on an open archive

Contributor: Brigitte Bigi

https://hal.science/hal-01393602

Submitted on: Friday, 16 December 2016, 13:07:39

Last modified on: Thursday, 4 April 2024, 20:56:54

Long-term archiving on: Tuesday, 21 March 2017, 13:38:24

Dates and versions

hal-01393602, version 1 (16-12-2016)

Identifiers

HAL Id: hal-01393602, version 1
DOI: 10.3115/1626431.1626466

Cite

Thi-Ngoc-Diep Do, Viet-Bac Le, Brigitte Bigi, Laurent Besacier, Eric Castelli. Mining a comparable text corpus for a Vietnamese - French statistical machine translation system. Fourth Workshop on Statistical Machine Translation, 2009, Athens, Greece. pp. 165-172, ⟨10.3115/1626431.1626466⟩. ⟨hal-01393602⟩

Collections

UGA IMAG CNRS POLYTECH-GRENOBLE

122 views

116 downloads
