Conference papers

FreDist: Automatic construction of distributional thesauri for French

Abstract: In this article we present FreDist, a freely available software package for the automatic construction of distributional thesauri from text corpora, as well as an evaluation of various distributional similarity metrics for French. Following the work of Lin (1998) and Curran (2004), we use a large corpus of journalistic text and implement different choices for the three components needed to build a distributional thesaurus: the type of lexical context relation, the weight function, and the measure function. Using the EuroWordNet and WOLF wordnet resources for French as gold-standard references for our evaluation, we obtain the novel result that combining bigram and syntactic dependency context relations yields higher-quality distributional thesauri. In addition, we hope that our software package and a joint release of our best thesauri for French will be useful to the NLP community.
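To make the three components named in the abstract concrete, here is a minimal Python sketch of a distributional thesaurus builder. It is not FreDist's code or interface, which this record does not show. The function name `build_thesaurus`, the `bigram:` and `dep:` context labels, and the toy word list are all hypothetical. Positive pointwise mutual information (as the weight function) and cosine similarity (as the measure function) are assumed stand-ins for the several alternatives the paper evaluates.

```python
import math
from collections import Counter, defaultdict


def build_thesaurus(pairs, top_k=3):
    """Rank neighbours for each word from (word, context) co-occurrence pairs.

    Illustrative only: positive PMI is an assumed weight function and
    cosine an assumed measure function; neither is claimed to be FreDist's.
    """
    pair_counts = Counter(pairs)
    word_counts = Counter()
    ctx_counts = Counter()
    for (word, ctx), n in pair_counts.items():
        word_counts[word] += n
        ctx_counts[ctx] += n
    total = sum(pair_counts.values())

    # Weight function: keep only contexts with positive PMI.
    vectors = defaultdict(dict)
    for (word, ctx), n in pair_counts.items():
        pmi = math.log((n * total) / (word_counts[word] * ctx_counts[ctx]))
        if pmi > 0:
            vectors[word][ctx] = pmi

    # Measure function: cosine similarity between sparse weight vectors.
    def cosine(u, v):
        dot = sum(x * v[c] for c, x in u.items() if c in v)
        norm_u = math.sqrt(sum(x * x for x in u.values()))
        norm_v = math.sqrt(sum(x * x for x in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    thesaurus = {}
    for word in word_counts:
        scored = []
        for other in word_counts:
            if other == word:
                continue
            score = cosine(vectors.get(word, {}), vectors.get(other, {}))
            if score > 0:
                scored.append((other, score))
        # Highest similarity first; ties broken alphabetically.
        scored.sort(key=lambda item: (-item[1], item[0]))
        thesaurus[word] = scored[:top_k]
    return thesaurus


# Hypothetical toy data mixing bigram and syntactic dependency contexts.
demo_pairs = [
    ("chat", "dep:obj_of:nourrir"), ("chat", "bigram:prev:petit"),
    ("chien", "dep:obj_of:nourrir"), ("chien", "bigram:prev:petit"),
    ("voiture", "dep:obj_of:conduire"), ("voiture", "bigram:prev:petit"),
    ("camion", "dep:obj_of:conduire"),
]

if __name__ == "__main__":
    for word, neighbours in sorted(build_thesaurus(demo_pairs).items()):
        print(word, neighbours)
```

In this sketch, combining context relations amounts to placing bigram and dependency contexts in the same feature space, distinguished only by their label prefix. Whether FreDist combines them in this way is an assumption.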

https://hal.archives-ouvertes.fr/hal-00602004
Contributor: Enrique Henestroza Anguiano
Submitted on: Tuesday, June 21, 2011 - 11:56:56 AM
Last modification on: Friday, March 27, 2020 - 2:55:27 AM
Archived on: Thursday, September 22, 2011 - 2:22:54 AM

File

henestroza2011fredist.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00602004, version 1

Citation

Enrique Henestroza Anguiano, Pascal Denis. FreDist: Automatic construction of distributional thesauri for French. TALN - 18ème conférence sur le traitement automatique des langues naturelles, Jun 2011, Montpellier, France. pp. 119-124. ⟨hal-00602004⟩
