Conference paper, Year: 2011

FreDist: Automatic construction of distributional thesauri for French

Abstract

In this article we present FreDist, a freely available software package for the automatic construction of distributional thesauri from text corpora, together with an evaluation of several distributional similarity metrics for French. Following the work of Lin (1998) and Curran (2004), we use a large corpus of journalistic text and implement different choices for the three components needed to build a distributional thesaurus: the type of lexical context relation, the weight function, and the measure function. Using the EuroWordNet and WOLF wordnet resources for French as gold-standard references for our evaluation, we obtain the novel result that combining bigram and syntactic dependency context relations yields higher-quality distributional thesauri. In addition, we hope that our software package and a joint release of our best thesauri for French will be useful to the NLP community.
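The three components named in the abstract (context relation, weight function, measure function) can be illustrated with a minimal sketch. This is not FreDist's code or interface: every function name below is hypothetical, the toy sentences are invented for illustration, and positive pointwise mutual information and cosine similarity are assumed stand-ins for whichever weight and measure functions the paper actually evaluates. Only bigram contexts are shown; syntactic dependency contexts would need a parser.

```python
import math
from collections import Counter, defaultdict


def bigram_contexts(tokens):
    """Context relation (assumed): each adjacent word, tagged with its side."""
    for i, word in enumerate(tokens):
        if i > 0:
            yield word, ("L", tokens[i - 1])
        if i + 1 < len(tokens):
            yield word, ("R", tokens[i + 1])


def pmi_weights(pairs):
    """Weight function (assumed): keep only positive PMI for (word, context)."""
    pair_counts = Counter(pairs)
    word_counts, ctx_counts = Counter(), Counter()
    for (word, ctx), n in pair_counts.items():
        word_counts[word] += n
        ctx_counts[ctx] += n
    total = sum(pair_counts.values())
    vectors = defaultdict(dict)
    for (word, ctx), n in pair_counts.items():
        pmi = math.log((n * total) / (word_counts[word] * ctx_counts[ctx]))
        if pmi > 0:
            vectors[word][ctx] = pmi
    return vectors


def cosine(u, v):
    """Measure function (assumed): cosine between two sparse weight vectors."""
    dot = sum(x * v[ctx] for ctx, x in u.items() if ctx in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def thesaurus(vectors, k=3):
    """For each word, list up to k other words with non-zero similarity."""
    entries = {}
    for word, u in vectors.items():
        scored = [(cosine(u, v), other) for other, v in vectors.items() if other != word]
        scored = [(s, other) for s, other in scored if s > 0]
        scored.sort(key=lambda t: (-t[0], t[1]))  # best score first, ties by name
        entries[word] = [other for _, other in scored[:k]]
    return entries


# Toy corpus, invented for illustration only.
sentences = [
    ["le", "chat", "dort"],
    ["le", "chien", "dort"],
    ["le", "chat", "mange"],
    ["le", "chien", "mange"],
    ["la", "voiture", "roule"],
]
pairs = [pair for sent in sentences for pair in bigram_contexts(sent)]
vectors = pmi_weights(pairs)
entries = thesaurus(vectors)
print(entries["chat"])
```

In this toy corpus "chat" and "chien" occur in identical contexts, so each appears as the other's only neighbour, while "voiture" shares no context with any other word and gets an empty entry.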
Main file
henestroza2011fredist.pdf (77.96 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00602004, version 1 (21-06-2011)

Identifiers

  • HAL Id: hal-00602004, version 1

Cite

Enrique Henestroza Anguiano, Pascal Denis. FreDist: Automatic construction of distributional thesauri for French. TALN - 18ème conférence sur le traitement automatique des langues naturelles, Jun 2011, Montpellier, France. pp. 119-124. ⟨hal-00602004⟩
225 views
122 downloads
