Ajuster l'analyse distributionnelle à un corpus spécialisé de petite taille (Adapting distributional analysis to a small specialized corpus)

Abstract: Applying distributional semantic models to medium-size specialized corpora is an important objective for the extraction of lexical and terminological resources. In this context, we seek to optimize the distributional analysis procedure on a 2-million-word corpus of NLP conference proceedings. Our expertise in this field allows us to build a relevant benchmark for the task, which provides an ideal experimental setup for observing the distributional mechanisms at work. We test several hundred configurations, with parameters ranging from syntactic analysis to similarity measures. This study highlights how widely the results vary, particularly according to the part of speech (POS) of the target words, and identifies the best-performing configurations by varying the number, nature and type of the contexts considered.
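The abstract does not spell out any single configuration, so what follows is only a minimal sketch of one hypothetical point in the kind of parameter space the study explores: syntactic (dependency) contexts, PPMI weighting, cosine similarity, and a frequency threshold on contexts. The triple format, the function names `build_ppmi_vectors`, `cosine` and `neighbours`, the `min_context_freq` parameter and the toy triples are all illustrative assumptions, not the authors' implementation; in a real setup the triples would come from a syntactic parser run over the corpus.

```python
import math
from collections import Counter, defaultdict


def build_ppmi_vectors(triples, min_context_freq=1):
    """Build PPMI-weighted context vectors from (head, relation, dependent) triples.

    Hypothetical syntactic contexts: a head gets the context (relation, dependent)
    and a dependent gets the inverse context (relation + "-1", head).
    Contexts whose total frequency is below min_context_freq are discarded.
    """
    counts = defaultdict(Counter)
    for head, rel, dep in triples:
        counts[head][(rel, dep)] += 1
        counts[dep][(rel + "-1", head)] += 1

    # Marginal frequencies, computed on the unfiltered counts.
    target_freq = {t: sum(ctx.values()) for t, ctx in counts.items()}
    context_freq = Counter()
    for ctx in counts.values():
        context_freq.update(ctx)
    total = sum(target_freq.values())

    vectors = {}
    for target, ctx in counts.items():
        vec = {}
        for c, n in ctx.items():
            if context_freq[c] < min_context_freq:
                continue  # context too rare under this (assumed) threshold
            pmi = math.log(n * total / (target_freq[target] * context_freq[c]))
            if pmi > 0:  # positive PMI: negative associations are clipped to 0
                vec[c] = pmi
        if vec:  # targets left without any context are dropped
            vectors[target] = vec
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def neighbours(vectors, target, k=5):
    """Return the k targets most similar to `target`, best first (ties by name)."""
    scores = [(other, cosine(vectors[target], vec))
              for other, vec in vectors.items() if other != target]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Toy (head, relation, dependent) triples, invented for illustration only.
triples = [
    ("analyse", "obj", "corpus"),
    ("process", "obj", "corpus"),
    ("analyse", "obj", "text"),
    ("process", "obj", "text"),
    ("eat", "obj", "apple"),
]
vectors = build_ppmi_vectors(triples)
print(neighbours(vectors, "analyse", k=1))
```

Here "analyse" and "process" share exactly the same weighted contexts, so "process" comes out as the nearest neighbour of "analyse". Changing the relation set, the weighting, the similarity function or `min_context_freq` would correspond to the kinds of parameters whose effect the study measures against its benchmark.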
Document type: Conference paper

https://hal.archives-ouvertes.fr/hal-01022171
Contributor: Franck Sajous
Submitted on: Friday, July 11, 2014 - 9:19:21 AM
Last modification on: Wednesday, July 10, 2019 - 1:33:33 AM
Long-term archiving on: Saturday, October 11, 2014 - 10:46:05 AM

File

FabreEtAl2014b-SemDis-TALN.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01022171, version 1

Citation

Cécile Fabre, Nabil Hathout, Franck Sajous, Ludovic Tanguy. Ajuster l'analyse distributionnelle à un corpus spécialisé de petite taille. 21e Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2014), Jun 2014, Marseille, France. pp.266-279. ⟨hal-01022171⟩
