Unsupervised acquisition of morphological resources for Ukrainian

Abstract: Morphological resources are an important and recurrent need because they enable the development of NLP tools and applications for a given language. Such resources provide the basic information these tools need to perform more sophisticated processing (information retrieval, morpho-syntactic tagging, etc.). We propose to acquire morphological resources for the Ukrainian language. The proposed method exploits corpora to extract morphologically related words. The method has two versions: without and with processing of prefixes. The association strength between these words indicates how likely they are to share a morphological and semantic relation. We use three corpora (literary, medical and general-language) and evaluate the results obtained. Depending on the corpus, precision varies between 67% and 86%. The results from the different corpora are also compared, which shows that there is little redundancy between the corpora. The currently available resource contains 3,315 fully validated pairs of words.
Document type:
Conference paper
Computational Linguistics and Intelligent Systems, Apr 2017, Kharkiv, Ukraine. 2017

https://hal.archives-ouvertes.fr/hal-01736400
Contributor: Limsi Publications
Submitted on: Friday, 16 March 2018 - 21:01:49
Last modified on: Tuesday, 12 February 2019 - 01:30:28

Identifiers

  • HAL Id: hal-01736400, version 1

Citation

Thierry Hamon, Natalia Grabar. Unsupervised acquisition of morphological resources for Ukrainian. Computational Linguistics and Intelligent Systems, Apr 2017, Kharkiv, Ukraine. 2017. 〈hal-01736400〉
