Unsupervised acquisition of morphological resources for Ukrainian

Abstract : Morphological resources are an important and recurrent need because they enable the development of NLP tools and applications for a given language. Such resources provide the basic information these tools need to perform more sophisticated processing (information retrieval, morpho-syntactic tagging, etc.). We propose to acquire morphological resources for the Ukrainian language. The proposed method exploits corpora to extract morphologically related words. It has two versions: without and with processing of prefixes. The association strength between two words indicates how likely they are to be morphologically and semantically related. We use three corpora (literary, medical and general-language) and evaluate the results obtained. Depending on the corpus, precision varies between 67% and 86%. The results from the different corpora are also compared, which shows that there is little redundancy between the corpora. The currently available resource contains 3,315 fully validated pairs of words.

https://hal.archives-ouvertes.fr/hal-01736400
Contributor : Limsi Publications
Submitted on : Friday, March 16, 2018 - 9:01:49 PM
Last modification on : Monday, March 18, 2019 - 4:21:28 PM

Identifiers

  • HAL Id : hal-01736400, version 1

Citation

Thierry Hamon, Natalia Grabar. Unsupervised acquisition of morphological resources for Ukrainian. Computational Linguistics and Intelligent Systems, Apr 2017, Kharkiv, Ukraine. 2017. 〈hal-01736400〉
