Conference paper. Year: 2014

Splitting of Compound Terms in non-Prototypical Compounding Languages

Abstract

Compounding is present in a wide variety of languages, in different proportions. The compound rate in a text depends not only on the language but also on the genre and the domain. Scientific and technical texts are especially conducive to compounding, even in languages that are not traditionally regarded as highly compounding. In this article we address the compound splitting of specialized terms. We propose a multilingual method of compound recognition and splitting that uses corpus frequencies, lexical data and, optionally, linguistic rules. The method is supervised and requires a small number of segmented compounds as input. We evaluate it on two languages that rarely serve as material for automatic splitting systems: English and Russian. The results obtained are competitive with those of a state-of-the-art corpus-driven approach.
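The abstract names corpus frequencies as one of the method's signals. As a rough illustration only, the sketch below implements a common frequency-based heuristic from the compound-splitting literature: each candidate split is scored by the geometric mean of its parts' corpus frequencies (as in Koehn and Knight, 2003). It is not the authors' method and omits their supervised training and lexical data. The frequency tables, the `fillers` parameter (a stand-in for an optional linguistic rule such as the Russian linking vowel "о"), and `min_len` are assumptions for illustration, and the counts are invented toy values.

```python
from math import prod


def segmentations(word, freq, fillers=("",), min_len=3):
    """Yield every way to cut `word` into components attested in `freq`.

    Each component must be at least `min_len` characters long. Between two
    components, one string from `fillers` may be skipped (e.g. a linking
    vowel); the skipped filler is not kept in the output.
    """
    if word in freq:
        yield (word,)  # the unsplit word competes with its splits
    for i in range(min_len, len(word) - min_len + 1):
        head = word[:i]
        if head not in freq:
            continue
        rest = word[i:]
        for filler in fillers:
            if not rest.startswith(filler):
                continue
            tail = rest[len(filler):]
            if len(tail) < min_len:
                continue
            for tail_parts in segmentations(tail, freq, fillers, min_len):
                yield (head,) + tail_parts


def score(parts, freq):
    """Geometric mean of the corpus frequencies of the parts."""
    return prod(freq[p] for p in parts) ** (1.0 / len(parts))


def split_compound(word, freq, fillers=("",), min_len=3):
    """Return the highest-scoring segmentation of `word` as a tuple of parts.

    Falls back to the unsplit word when no candidate is attested in `freq`.
    """
    word = word.lower()
    candidates = list(segmentations(word, freq, fillers, min_len))
    if not candidates:
        return (word,)
    return max(candidates, key=lambda parts: score(parts, freq))


# Toy counts, invented for illustration (not data from the paper).
en_freq = {"database": 5, "data": 400, "base": 300}
ru_freq = {"пароход": 50, "пар": 900, "ход": 1200}

print(split_compound("database", en_freq))                     # ('data', 'base')
print(split_compound("пароход", ru_freq, fillers=("", "о")))  # ('пар', 'ход')
print(split_compound("пароход", ru_freq))                      # ('пароход',)
```

Because the unsplit word is scored alongside its splits, a word that is more frequent than the geometric mean of its would-be parts stays whole. The last call shows why a rule like the linking vowel matters for Russian: without it, "пароход" cannot be cut into "пар" and "ход".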
Main file
COLING_hal.pdf (226 KB)
Origin: files produced by the author(s)

Dates and versions

hal-01116134, version 1 (12 February 2015)

Identifiers

  • HAL Id: hal-01116134, version 1

Cite

Elizaveta Loginova Clouet, Béatrice Daille. Splitting of Compound Terms in non-Prototypical Compounding Languages. Workshop on Computational Approaches to Compound Analysis, COLING 2014, Aug 2014, Dublin, Ireland. pp. 11-19. ⟨hal-01116134⟩
