Multilingual compound splitting combining language dependent and independent features - HAL open archive
Conference paper, Year: 2013

Multilingual compound splitting combining language dependent and independent features

Abstract

Compounding is a common phenomenon in many languages, especially those with rich morphology. Compounds are a challenge for natural language processing (NLP) systems because they are often missing from dictionaries and other lexical resources. In this paper, we present a compound splitting method that combines language-independent features (a similarity measure and corpus data) with language-specific component transformation rules. Because it uses language-independent features, the method can be applied to different languages. We report on experiments in splitting German and Russian compound words, with positive results compared to matching compound parts against a lexicon. Although, to our knowledge, elaborate compound splitting is rarely a component of NLP systems for Russian, our experiments show that it could be beneficial for processing specialized vocabulary.
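The abstract describes the method only at a high level. The sketch below is a hypothetical illustration of how its three ingredients (corpus data, a similarity measure, and language-specific component transformation rules) might be combined. It is not the authors' implementation. Assumptions: only two-part splits are considered; the rules in `RULES` are toy examples (German linking elements -s, -es, -n, -en; Russian linking vowels -о-/-е-, plus restoring a soft sign); the corpus feature is the geometric mean of component frequencies; and difflib's `SequenceMatcher` ratio between a surface segment and its transformed form stands in for the paper's similarity measure, which is not defined here. The names `RULES`, `candidate_lemmas` and `split_compound` are illustrative only.

```python
from difflib import SequenceMatcher

# Toy transformation rules (suffix to strip, string to append), applied to the
# LEFT component only. Hypothetical examples, not the rules used in the paper.
RULES = {
    "de": [("", ""), ("s", ""), ("es", ""), ("n", ""), ("en", "")],  # linking elements
    "ru": [("", ""), ("о", ""), ("е", ""), ("е", "ь")],  # linking vowels, soft sign
}


def candidate_lemmas(part, lang):
    """Yield every form of `part` obtained by applying one matching rule."""
    for strip, append in RULES[lang]:
        if part.endswith(strip) and len(part) > len(strip):
            yield part[: len(part) - len(strip)] + append


def split_compound(word, corpus_freq, lang, min_len=3):
    """Return the best two-part split of `word`, or [word] if no split scores higher.

    corpus_freq maps lowercase word forms to corpus frequencies (assumed input).
    """
    word = word.lower()
    # The unsplit word competes with its own corpus frequency.
    best_parts, best_score = [word], corpus_freq.get(word, 0)
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        if right not in corpus_freq:
            continue
        for lemma in candidate_lemmas(left, lang):
            if lemma not in corpus_freq:
                continue
            # Corpus feature: geometric mean of the two component frequencies.
            freq_score = (corpus_freq[lemma] * corpus_freq[right]) ** 0.5
            # Similarity feature: favor transformations that keep the surface form close.
            sim = SequenceMatcher(None, left, lemma).ratio()
            if freq_score * sim > best_score:
                best_parts, best_score = [lemma, right], freq_score * sim
    return best_parts


print(split_compound("Arbeitsmarkt", {"arbeit": 500, "markt": 300, "arbeitsmarkt": 10}, "de"))
# → ['arbeit', 'markt']
print(split_compound("нефтепровод", {"нефть": 200, "провод": 150}, "ru"))
# → ['нефть', 'провод']
```

Multiplying the two features is only one simple way to combine them. A weighted sum or a trained classifier would be equally plausible choices, and a real system would also need recursive splitting for compounds with more than two parts.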
Main file
LoginovaE.A_DailleB_DIALOG.pdf (127.76 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00920323, version 1 (18-12-2013)

Identifiers

  • HAL Id: hal-00920323, version 1

Cite

Elizaveta Loginova Clouet, Béatrice Daille. Multilingual compound splitting combining language dependent and independent features. Dialogue, May 2013, Moscow, Russia. pp. 455-463. ⟨hal-00920323⟩
