A syntactic component for Vietnamese language processing

Phuong Le-Hong 1, Azim Roussanaly 2, Thi Minh Huen Nguyen 1
2 KIWI - Knowledge Information and Web Intelligence
LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
Abstract: This paper presents the development of a grammar and a syntactic parser for the Vietnamese language. We first discuss the construction of a lexicalized tree-adjoining grammar using an automatic extraction approach. We then present the construction and evaluation of a deep syntactic parser based on the extracted grammar. The result is a complete system that produces syntactic structures for Vietnamese sentences. We also propose a dependency annotation scheme for Vietnamese and an algorithm for extracting dependency structures from derivation trees. This is the first Vietnamese parsing system capable of producing both constituency and dependency analyses. It offers encouraging performance, with accuracies of 69.33% and 73.21% for constituency and dependency analysis, respectively.
Document type: Journal article
Journal of Language Modelling, Institute of Computer Science, Polish Academy of Sciences, Poland, 2015, 3 (1), pp. 146-184. 〈10.15398/jlm.v3i1.89〉

https://hal.archives-ouvertes.fr/hal-01255977
Contributor: Azim Roussanaly
Submitted on: Monday, 25 January 2016 - 11:40:00
Last modified on: Tuesday, 24 April 2018 - 13:51:19
Document(s) archived on: Tuesday, 26 April 2016 - 10:14:02

File

89-838-1-PB.pdf
Publisher files allowed on an open archive

Citation

Phuong Le-Hong, Azim Roussanaly, Thi Minh Huen Nguyen. A syntactic component for Vietnamese language processing. Journal of Language Modelling, Institute of Computer Science, Polish Academy of Sciences, Poland, 2015, 3 (1), pp. 146-184. 〈10.15398/jlm.v3i1.89〉. 〈hal-01255977〉

Metrics

Record views: 170

File downloads: 235