HAL open archive
Conference paper. Year: 2004

Morphology based automatic acquisition of large-coverage lexica

Abstract

In this article, we introduce a new technique for constructing wide-coverage morphological lexica from large corpora and morphological knowledge, with an application to French. It relies on the idea that the existence of a hypothetical lemma can be guessed if several different words found in the corpus are best interpreted as morphological variants of that lemma. We first validated the technique by extracting verbs and adjectives from a general French corpus of 25 million words. Compared with other lexical resources available for French, our results are very satisfactory: we cover many words, often derived words, that are not always present in other lexica. Applying our algorithm to the acquisition of domain-specific adjectives from a botanical corpus also gave very good results, demonstrating that it can be used to extract domain-specific lexica. Moreover, the technique can be generalized to any language with substantial morphology.
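The core idea of the abstract is that a hypothetical lemma is accepted when several distinct corpus words are best explained as its inflected forms. It can be illustrated with a minimal Python sketch. The toy inflection classes, the minimum stem length, the `min_forms` threshold and the greedy "best interpretation" rule (each form supports only its best-supported candidate lemma) are all simplifying assumptions made for illustration. They are not the authors' morphological description or ranking method.

```python
from collections import defaultdict

# Hypothetical, heavily simplified inflection classes (NOT the paper's
# morphological description): each maps to (lemma ending, possible form endings).
INFLECTION_CLASSES = {
    "verb-er": ("er", ("er", "e", "es", "ons", "ez", "ent", "é", "ée", "és", "ées", "ant")),
    "adj-regular": ("", ("", "e", "s", "es")),
}


def candidate_lemmas(word):
    """Yield every (lemma, class_name) pair that could have produced `word`."""
    for cls, (lemma_ending, endings) in INFLECTION_CLASSES.items():
        for ending in endings:
            if word.endswith(ending):
                stem = word[: len(word) - len(ending)]
                if len(stem) >= 3:  # arbitrary minimum stem length (assumption)
                    yield stem + lemma_ending, cls


def acquire_lemmas(corpus_words, min_forms=3):
    """Keep hypothetical lemmas backed by at least `min_forms` distinct attested forms."""
    forms = set(corpus_words)

    # Step 1: collect, for every hypothesis, all attested forms compatible with it.
    support = defaultdict(set)
    for word in forms:
        for hyp in candidate_lemmas(word):
            support[hyp].add(word)

    # Step 2: each form is assigned only to its best-supported hypothesis
    # (a crude stand-in for "best interpreted"; ties broken by the hypothesis itself).
    best = defaultdict(set)
    for word in forms:
        hyps = list(candidate_lemmas(word))
        if hyps:
            winner = max(hyps, key=lambda h: (len(support[h]), h))
            best[winner].add(word)

    # Step 3: a lemma is acquired only if enough distinct forms chose it.
    return {hyp: sorted(ws) for hyp, ws in best.items() if len(ws) >= min_forms}
```

For example, `acquire_lemmas(["chanter", "chante", "chantons", "chantez", "chanté", "petit", "petite", "petits", "table"])` acquires the verb `chanter` (five forms) and the adjective `petit` (three forms), while `table` is rejected because only one attested form supports any of its candidate lemmas. Requiring several distinct forms is what filters out spurious hypotheses such as the adjective reading `chante` or the verb reading `petiter`, which each explain a single word.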

Domains

Other [cs.OH]
Main file
LREC04.pdf (57.93 KB)
Origin: files produced by the author(s)

Dates and versions

hal-00413189, version 1 (2009-09-03)

Identifiers

  • HAL Id: hal-00413189, version 1

Cite

Lionel Clément, Bernard Lang, Benoît Sagot. Morphology based automatic acquisition of large-coverage lexica. LREC 04, 2004, Lisbon, Portugal. pp. 1841–1844. ⟨hal-00413189⟩
