An XML Version of Turkish Dictionary

Abstract : To anchor international or multinational lexicographic projects on existing Turkish dictionaries, we need a common understanding of how reference resources are made available, as is the case for the digital version of our Turkish dictionary project. Although some work has been done on digitizing Turkish dictionaries, both old and current ones, these few examples do not follow a standard way of encoding the source file. To overcome this obstacle, during the Lexical Data Masterclass in Berlin, December 4-8, 2017, I worked on an XSLT transformation document that processes an existing dictionary into output conformant to the TEI standard. Since almost all current Turkish dictionaries give the same categories of lexical information in a very similar page layout, this XSLT could also work on other digitized or OCRed Turkish dictionaries. Even for dictionaries that have no digital version, GROBID-based projects such as GROBID-Dictionaries (Khemakhem et al. 2017) can transform OCRed PDFs into a structured digital format. The entry structure I have been using is taken from the Turkish Dictionary published by the Turkish Language Institute (abbreviated as "TDK" in Turkish), which acts as the official authority on the Turkish language (without any enforcement power) and contributes to linguistic research on Turkish. Almost all mainstream Turkish dictionaries, such as those published by Dil Derneği, the Language Foundation (Dil Derneği Türkçe Sözlük, 2005), and by Arkadaş Publishing (Püsküllüoğlu, 2004), are similar in both the lexical information they give and their page layout. The only difference is the order of information given in the microstructure.
For example, the order of the etymological and phonological information differs: the TDK dictionary gives phonological information before etymological information, whereas the other dictionaries give etymological information first. The rest of the page layout gives the same lexical information in the same order. Therefore, the same XSLT file could handle all of these Turkish dictionaries and transform them into a TEI-conformant encoding.
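
The point about field order can be illustrated with a minimal sketch. It is written in Python rather than XSLT and is not the stylesheet described in the abstract. It assumes a dictionary entry has already been split into labelled fields; the field labels, the `build_tei_entry` helper, and the sample entry values are hypothetical and are not copied from any of the dictionaries mentioned. The element names (`entry`, `form`, `orth`, `pron`, `gramGrp`, `pos`, `etym`, `sense`, `def`) come from the TEI dictionaries module.

```python
import xml.etree.ElementTree as ET


def build_tei_entry(fields):
    """Build a TEI-style <entry> from (category, text) pairs.

    The pairs may arrive in any source order; the output order is fixed,
    so pron-before-etym and etym-before-pron layouts give the same result.
    """
    values = dict(fields)
    entry = ET.Element("entry")

    # Headword and pronunciation are grouped under <form>.
    form = ET.SubElement(entry, "form", {"type": "lemma"})
    ET.SubElement(form, "orth").text = values["orth"]
    if "pron" in values:
        ET.SubElement(form, "pron").text = values["pron"]

    # Part of speech goes into a grammatical group.
    if "pos" in values:
        gram = ET.SubElement(entry, "gramGrp")
        ET.SubElement(gram, "pos").text = values["pos"]

    # Etymology is a direct child of <entry>.
    if "etym" in values:
        ET.SubElement(entry, "etym").text = values["etym"]

    sense = ET.SubElement(entry, "sense")
    ET.SubElement(sense, "def").text = values["def"]
    return entry


# Hypothetical sample entry in a pronunciation-first layout.
pron_first = [
    ("orth", "kitap"),
    ("pron", "kitap"),
    ("etym", "Ar. kitāb"),
    ("pos", "isim"),
    ("def", "(illustrative definition text)"),
]
# The same entry in an etymology-first layout.
etym_first = [
    ("orth", "kitap"),
    ("etym", "Ar. kitāb"),
    ("pron", "kitap"),
    ("pos", "isim"),
    ("def", "(illustrative definition text)"),
]

pron_first_xml = ET.tostring(build_tei_entry(pron_first), encoding="unicode")
etym_first_xml = ET.tostring(build_tei_entry(etym_first), encoding="unicode")
print(pron_first_xml)
print(pron_first_xml == etym_first_xml)
```

Because the fields are looked up by category rather than by position, both source layouts serialize to the same entry. An XSLT stylesheet could achieve a similar effect by matching on field type instead of relying on element position.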

https://hal.archives-ouvertes.fr/hal-01727591
Contributor : Emrah Özcan
Submitted on : Friday, March 9, 2018 - 12:29:56 PM
Last modification on : Saturday, June 1, 2019 - 10:54:01 AM
Long-term archiving on : Sunday, June 10, 2018 - 1:50:39 PM

File

XML-Turkish dictionary.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01727591, version 1

Citation

Emrah Özcan. An XML Version of Turkish Dictionary. Lexical Data Masterclass Participants' Symposium, Dec 2017, Berlin, Germany. ⟨hal-01727591⟩
