Building Specialized Multilingual Lexical Graphs Using Community Resources

Abstract: We describe methods for compiling domain-dedicated multilingual terminological data from various resources. We focus on data collected from online community users as the main source; our approach therefore depends both on acquiring contributions from volunteers (explicit approach) and on analyzing users' behavior to extract interesting patterns and facts (implicit approach). As a generic repository that can hold the collected multilingual terminological data, we describe the concept of dedicated Multilingual Preterminological Graphs (MPGs), along with some automatic approaches for constructing them by analyzing the behavior of online community users. A Multilingual Preterminological Graph is a special lexical resource that contains a massive number of terms related to a specific domain. We call it preterminological because it is raw material that can be used to build a standardized terminological repository. Building such a graph with traditional approaches is difficult, as it requires huge effort from domain specialists and terminologists. In our approach, we build such a graph by analyzing the access log files of the community's website, finding the important terms that have been used to search that website, and determining their associations with each other. We aim to make this graph a seed repository to which multilingual volunteers can contribute. We are experimenting with this approach on the Digital Silk Road (DSR) Project. We used its access log files, collected since the project's beginning in 2003, and obtained an initial graph of around 116,000 terms. As an application, we used this graph to obtain a preterminological multilingual database that serves a cross-language information retrieval (CLIR) system for the DSR project.
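The abstract outlines graph construction as mining the website's access logs for search terms and their associations. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes Apache-style log lines, a hypothetical `q` query parameter for the site's search box, a hypothetical `min_freq` threshold for "important" terms, and uses co-occurrence of terms searched from the same client IP as a stand-in for the association measure, which the abstract does not specify.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations
from urllib.parse import urlsplit, parse_qs

# Assumed Apache common/combined log layout:
# client-ip ident user [timestamp] "METHOD target PROTOCOL" ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|POST) (\S+) [^"]*"')


def extract_searches(lines, param="q"):
    """Yield (client_ip, normalized_term) for each logged search request.

    `param` is a hypothetical name for the search-box query parameter.
    """
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        ip, target = m.groups()
        # parse_qs decodes '+' and %-escapes, so "silk+road" == "silk%20road"
        query = parse_qs(urlsplit(target).query)
        for value in query.get(param, []):
            term = " ".join(value.lower().split())  # case/space normalization
            if term:
                yield ip, term


def build_graph(lines, min_freq=1):
    """Return (node_frequencies, edge_weights) for a toy preterminological graph.

    Nodes are search terms seen at least `min_freq` times; an edge links two
    terms searched from the same client IP (assumed proxy for association).
    """
    freq = Counter()
    terms_by_ip = defaultdict(set)
    for ip, term in extract_searches(lines):
        freq[term] += 1
        terms_by_ip[ip].add(term)
    kept = {t for t, n in freq.items() if n >= min_freq}
    edges = Counter()
    for terms in terms_by_ip.values():
        for a, b in combinations(sorted(terms & kept), 2):
            edges[(a, b)] += 1
    return {t: freq[t] for t in kept}, edges


# Illustrative (made-up) log lines, not data from the DSR project.
log = [
    '10.0.0.1 - - [01/Jan/2003:10:00:00 +0900] "GET /search?q=Silk+Road HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2003:10:01:00 +0900] "GET /search?q=Dunhuang HTTP/1.1" 200 498',
    '10.0.0.2 - - [01/Jan/2003:10:02:00 +0900] "GET /search?q=silk%20road HTTP/1.1" 200 512',
]
nodes, edges = build_graph(log)
# nodes == {'silk road': 2, 'dunhuang': 1}
# edges == Counter({('dunhuang', 'silk road'): 1})
```

Grouping by client IP is only a crude session proxy; a real pipeline would more likely segment sessions by time window and keep terms in every language the site's users search in, since the target graph is multilingual.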
Keywords: MPG
Document type: Book chapter
Lacroix, Zoé. Resource Discovery, Springer, Berlin/Heidelberg, pp.94-109, 2010, Lecture Notes in Computer Science, 〈10.1007/978-3-642-14415-8_7〉

Cited literature: 15 references

https://hal.archives-ouvertes.fr/hal-00969200
Contributor: Mathieu Mangeot
Submitted on: Wednesday, 2 April 2014, 11:18:44
Last modified on: Thursday, 11 October 2018, 08:48:03
Document(s) archived on: Wednesday, 2 July 2014, 11:50:41

File

RED09_MD_CB_AK_MM.pdf
Files produced by the author(s)

Citation

Mohammad Daoud, Christian Boitet, Kyo Kageura, Asanobu Kitamoto, Mathieu Mangeot, et al. Building Specialized Multilingual Lexical Graphs Using Community Resources. Lacroix, Zoé. Resource Discovery, Springer, Berlin/Heidelberg, pp.94-109, 2010, Lecture Notes in Computer Science, 〈10.1007/978-3-642-14415-8_7〉. 〈hal-00969200〉
