Des dictionnaires éditoriaux aux représentations XML standardisées (From published dictionaries to standardized XML representations)

Abstract: Creating an electronic dictionary from scratch is expensive, because the task mobilizes, over a long period, contributors skilled in linguistics if not in lexicology. Specialized computer tools are essential when the resources are to be used by natural language processing programs. When the socio-economic environment cannot provide the means needed to draft an electronic dictionary but printed dictionaries exist, those dictionaries are an important resource for bootstrapping electronic lexical resources. This paper presents theoretical and practical aspects of converting published dictionaries into electronic lexical resources. It takes into account limited economic resources, limited technology, and the limited availability of qualified people. Our field experiments concern under-resourced languages, mainly in Southeast Asia (Khmer, Malay, Vietnamese) and the Sahel (Bambara, Hausa, Kanuri, Tamajaq, Zarma), so most of the examples and socio-linguistic situations described in the paper relate to these areas. After a brief history of the formats used for electronic dictionaries (SGML, XML, XSLT and CSS), we present two standards dedicated to them: the Text Encoding Initiative (TEI) and the Lexical Markup Framework (LMF). We then set out the issue of under-resourced languages, followed by some examples of published dictionaries. The main technical challenges are detailed, such as the lack of standardization of the alphabets used and the handling of special characters outside the traditional Latin range. The conversion methodology is first outlined and then detailed. Conversion to a bridge format in XML can be done with regular expressions or with specialized tools. The bridge format is then converted into the target LMF format. The last part is dedicated to consulting the resources through an online resource-management platform.
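The regular-expression route to a bridge format mentioned in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the chapter: the one-line entry layout (headword, abbreviated part of speech, definition), the element names `entry`, `headword`, `pos` and `definition`, the helper name `entry_to_bridge_xml`, and the invented sample line. The later conversion from the bridge format to LMF is not shown.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical layout of one printed entry line (an assumption):
# headword, abbreviated part of speech ending in ".", then the definition.
ENTRY_RE = re.compile(
    r"^(?P<headword>\S+)\s+(?P<pos>[a-z]+\.)\s+(?P<definition>.+)$"
)


def entry_to_bridge_xml(line):
    """Convert one entry line to a bridge-format <entry> element.

    Returns None when the line does not fit the assumed layout, so that
    such lines can be set aside for manual review.
    """
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    entry = ET.Element("entry")
    # Element names are illustrative, not a published schema.
    for field in ("headword", "pos", "definition"):
        ET.SubElement(entry, field).text = match.group(field)
    return entry


# Invented sample line, not taken from any of the dictionaries discussed.
sample = "lexicon n. the vocabulary of a language"
element = entry_to_bridge_xml(sample)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

Returning `None` for lines that do not match, rather than guessing, keeps irregular entries visible for manual correction; printed dictionaries rarely follow a single layout throughout.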
Keywords: DiLAF, LMF, XML
Document type:
Book chapter
Gala, Nuria and Zock, Michael. Ressources Lexicales : contenu, construction, utilisation, évaluation, John Benjamins, pp.24, 2013

https://hal.archives-ouvertes.fr/hal-00959229
Contributor: Mathieu Mangeot
Submitted on: Tuesday, 1 April 2014 - 11:22:37
Last modified on: Thursday, 11 October 2018 - 08:48:03
Document(s) archived on: Tuesday, 1 July 2014 - 10:46:12

File

Livre-Nuria_MM-CE_V15.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00959229, version 1

Citation

Mathieu Mangeot, Chantal Enguehard. Des dictionnaires éditoriaux aux représentations XML standardisées. Gala, Nuria and Zock, Michael. Ressources Lexicales : contenu, construction, utilisation, évaluation, John Benjamins, pp.24, 2013. 〈hal-00959229〉
