Sentence Similarity by Combining Explicit Semantic Analysis and Overlapping N-Grams

Abstract: We propose a similarity measure between sentences that combines a knowledge-based measure, a lighter version of ESA (Explicit Semantic Analysis), with a distributional measure, ROUGE. We applied this hybrid measure to two French domain-oriented corpora collected from the Web and compared its similarity scores with those of human judges. In both domains, ESA and ROUGE perform better combined than individually. Moreover, using the whole Wikipedia base in ESA did not prove necessary, since the best results were obtained with a small number of well-selected concepts.
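To make the idea of a hybrid score concrete, here is a minimal Python sketch of one way to combine an ESA-style concept similarity with ROUGE n-gram overlap. It is not the authors' implementation, and several choices are assumptions: the toy concept texts stand in for a small selection of Wikipedia articles, word-to-concept weights use plain TF-IDF, ROUGE is taken as a symmetric ROUGE-N F-measure over clipped n-gram overlap, and the two scores are mixed by linear interpolation with a hypothetical weight `alpha`. All function names are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption: no lemmatization or stop-word removal).
    return text.lower().split()


def ngrams(tokens, n):
    # Multiset of the n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(a, b, n=1):
    """Symmetric ROUGE-N F-measure computed from clipped n-gram overlap."""
    ga, gb = ngrams(tokenize(a), n), ngrams(tokenize(b), n)
    overlap = sum((ga & gb).values())  # Counter '&' keeps the minimum count of each n-gram
    if overlap == 0:
        return 0.0
    p = overlap / sum(ga.values())
    r = overlap / sum(gb.values())
    return 2 * p * r / (p + r)


def build_esa_index(concepts):
    """Map each word to its TF-IDF weight in each concept text: word -> {concept: weight}."""
    docs = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    df = Counter(w for counts in docs.values() for w in counts)
    n_docs = len(docs)
    index = {}
    for c, counts in docs.items():
        total = sum(counts.values())
        for w, tf in counts.items():
            idf = math.log(n_docs / df[w])
            if idf > 0:  # words found in every concept carry no information
                index.setdefault(w, {})[c] = (tf / total) * idf
    return index


def esa_vector(text, index):
    # Sentence vector in concept space: sum of the concept vectors of its words.
    vec = Counter()
    for w in tokenize(text):
        for c, weight in index.get(w, {}).items():
            vec[c] += weight
    return vec


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_similarity(a, b, index, alpha=0.5, n=1):
    """Linear mix of ESA cosine and ROUGE-N; alpha is a hypothetical mixing weight."""
    esa = cosine(esa_vector(a, index), esa_vector(b, index))
    return alpha * esa + (1 - alpha) * rouge_n(a, b, n)


# Toy concept set (hypothetical), standing in for a few selected Wikipedia articles.
concepts = {
    "Cuisine": "recipe cook oven bake bread flour",
    "Travel": "train flight hotel ticket booking",
    "Finance": "bank loan interest account ticket",
}
index = build_esa_index(concepts)
print(hybrid_similarity("train ticket", "flight ticket", index))
print(hybrid_similarity("train ticket", "bake bread", index))
```

In this sketch `train` and `flight` never overlap as strings, yet both map to the `Travel` concept, so the ESA term raises the score beyond what ROUGE's surface overlap alone would give. In practice, `alpha` and the concept selection would need to be tuned against reference judgments such as the human scores described in the abstract.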
Document type: Conference paper

https://hal.archives-ouvertes.fr/hal-01066170
Contributor: Jeanne Villaneau
Submitted on: Friday, September 19, 2014 - 12:08:16 PM
Last modification on: Tuesday, July 9, 2019 - 4:54:02 PM

Identifiers

  • HAL Id: hal-01066170, version 1

Citation

Hai-Heu Vu, Jeanne Villaneau, Farida Saïd, Pierre-François Marteau. Sentence Similarity by Combining Explicit Semantic Analysis and Overlapping N-Grams. Text, Speech and Dialogue, Sep 2014, Brno, Czech Republic. pp. 201-208. ⟨hal-01066170⟩
