An XML Version of Turkish Dictionary - HAL open archive
Conference paper. Year: 2017

An XML Version of Turkish Dictionary

Emrah Özcan

Abstract

In order to anchor international or multinational lexicographic projects on existing Turkish dictionaries, we need a common understanding of how reference resources are made available, as in the digital version of our Turkish dictionary project. Although some work has been done on digitizing Turkish dictionaries, both older dictionaries and their current editions, these few examples do not follow a standard way of encoding the source files. To overcome this obstacle, during the Lexical Data Masterclass in Berlin (December 4-8, 2017), I worked on an XSLT transformation document that processes an existing dictionary into output conformant to the TEI standard. Since almost all current Turkish dictionaries give the same categories of lexical information in a very similar page layout, this XSLT could also work on other digitized or OCRized Turkish dictionaries. Dictionaries that lack a digital version can first be converted from OCRized PDFs into a structured digital format with GROBID-based tools such as GROBID-Dictionaries (Khemakhem et al. 2017). The entry structure I have been using is taken from the Turkish Dictionary published by the Turkish Language Institute (abbreviated in Turkish as "TDK"), which acts as the official authority on the Turkish language (without any enforcement power) and contributes to linguistic research on Turkish. Almost all mainstream Turkish dictionaries, such as those published by Dil Derneği – Language Foundation (Dil Derneği Türkçe Sözlük, 2005) and Arkadaş Publishing (Püsküllüoğlu, 2004), are similar to it both in the lexical information they give and in their page layout. The only difference is the order of information within the microstructure.
For example, the order of etymological and phonological information differs: the TDK dictionary gives phonological information before etymological information, whereas the other dictionaries do the opposite, giving etymological information first and phonological information second. In all other respects, the entries present the same lexical information in the same order. Therefore, a single XSLT file could handle all of these Turkish dictionaries and transform them into TEI-conformant output.
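The abstract does not reproduce the XSLT itself. As a rough, non-authoritative sketch of the idea, the following Python code uses the standard-library `xml.etree.ElementTree` module (in place of XSLT) to map a flat dictionary entry onto the TEI elements `entry`, `form`, `orth`, `pron`, `gramGrp`, `pos`, `etym`, `sense` and `def`. The source element names (`headword`, `pron`, `etym`, `pos`, `def`), the helper names `tei` and `entry_to_tei`, and the sample entry for *kitap* are all hypothetical assumptions, not the author's encoding. Because each field is selected by name rather than by position, the same function yields identical TEI output for the TDK order (phonology before etymology) and for the order used by the other dictionaries.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace, registered as the default so output reads <entry ...>, not <ns0:entry ...>.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Return a tag name qualified with the TEI namespace (hypothetical helper)."""
    return f"{{{TEI_NS}}}{tag}"


def entry_to_tei(src: ET.Element) -> ET.Element:
    """Map one flat source entry to a TEI <entry>.

    The source element names (headword, pron, etym, pos, def) are hypothetical.
    Each field is looked up by name, not by position, so the result is the same
    whether <pron> comes before <etym> (TDK order) or after it.
    """
    entry = ET.Element(tei("entry"))

    # Headword and phonological information go into the lemma form.
    form = ET.SubElement(entry, tei("form"), type="lemma")
    ET.SubElement(form, tei("orth")).text = src.findtext("headword")
    pron = src.findtext("pron")
    if pron is not None:
        ET.SubElement(form, tei("pron")).text = pron

    pos = src.findtext("pos")
    if pos is not None:
        gram_grp = ET.SubElement(entry, tei("gramGrp"))
        ET.SubElement(gram_grp, tei("pos")).text = pos

    etym = src.findtext("etym")
    if etym is not None:
        ET.SubElement(entry, tei("etym")).text = etym

    # One numbered TEI <sense> per source definition, in source order.
    for n, definition in enumerate(src.findall("def"), start=1):
        sense = ET.SubElement(entry, tei("sense"), n=str(n))
        ET.SubElement(sense, tei("def")).text = definition.text

    return entry


# Illustrative entries only: the same content in the two orders described above.
TDK_ORDER = (
    "<entry><headword>kitap</headword><pron>-bı</pron>"
    "<etym>Ar. kitāb</etym><pos>isim</pos><def>book</def></entry>"
)
OTHER_ORDER = (
    "<entry><headword>kitap</headword><etym>Ar. kitāb</etym>"
    "<pron>-bı</pron><pos>isim</pos><def>book</def></entry>"
)

if __name__ == "__main__":
    a = ET.tostring(entry_to_tei(ET.fromstring(TDK_ORDER)), encoding="unicode")
    b = ET.tostring(entry_to_tei(ET.fromstring(OTHER_ORDER)), encoding="unicode")
    print(a == b)  # True
    print(a)
```

In XSLT, the same order independence would typically come from writing one template per source element, or from selecting the children of an entry by name inside a single template, rather than relying on document order.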
Main file
XML-Turkish dictionary.pdf (61.43 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01727591 , version 1 (09-03-2018)

Identifiers

  • HAL Id : hal-01727591 , version 1

Cite

Emrah Özcan. An XML Version of Turkish Dictionary. Lexical Data Masterclass Participants' Symposium, Dec 2017, Berlin, Germany. ⟨hal-01727591⟩
