Sentence Similarity by Combining Explicit Semantic Analysis and Overlapping N-Grams

Abstract: We propose a sentence similarity measure that combines a knowledge-based measure, a lighter version of ESA (Explicit Semantic Analysis), with a distributional measure, ROUGE. We applied this hybrid measure to two French domain-oriented corpora collected from the Web and compared its similarity scores with those of human judges. In both domains, ESA and ROUGE perform better combined than they do individually. Moreover, using the whole Wikipedia base in ESA did not prove necessary, since the best results were obtained with a small number of well-selected concepts.
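
The abstract does not say how the two scores are mixed, so the following is only a minimal sketch of one plausible reading: a cosine over ESA-style concept vectors, linearly interpolated with a ROUGE-N-style n-gram overlap F1. The interpolation weight `alpha`, the helper names, and the toy `word_to_concepts` index (standing in for a small set of selected Wikipedia concepts) are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Multiset of the n-grams occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap_f1(a, b, n=1):
    """ROUGE-N-style F1 over clipped n-gram matches (illustrative helper)."""
    ca, cb = ngrams(a, n), ngrams(b, n)
    overlap = sum((ca & cb).values())  # clipped count of shared n-grams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(ca.values())
    recall = overlap / sum(cb.values())
    return 2 * precision * recall / (precision + recall)


def concept_vector(tokens, word_to_concepts):
    """Sum the concept weights of each token (ESA-style sentence vector).

    `word_to_concepts` is a hypothetical index: word -> {concept: weight}.
    """
    vec = Counter()
    for token in tokens:
        for concept, weight in word_to_concepts.get(token, {}).items():
            vec[concept] += weight
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_similarity(a, b, word_to_concepts, alpha=0.5, n=1):
    """Assumed mixing rule: alpha * ESA cosine + (1 - alpha) * n-gram F1."""
    esa = cosine(concept_vector(a, word_to_concepts),
                 concept_vector(b, word_to_concepts))
    rouge = ngram_overlap_f1(a, b, n)
    return alpha * esa + (1 - alpha) * rouge


# Toy concept index and sentences (made up for this example).
toy_index = {
    "cat": {"Felidae": 1.0},
    "sleeps": {"Sleep": 1.0},
    "naps": {"Sleep": 1.0},
}
s1 = ["the", "cat", "sleeps"]
s2 = ["a", "cat", "naps"]
score = hybrid_similarity(s1, s2, toy_index, alpha=0.5)
```

In this toy case the two sentences share only one surface word, so the n-gram score is low, while the concept vectors coincide because "sleeps" and "naps" map to the same concept. That is the kind of complementarity a hybrid measure is meant to exploit.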
Document type:
Conference paper
Springer. Text, Speech and Dialogue, Sep 2014, Brno, Czech Republic. 8655, pp.201-208, 2014

https://hal.archives-ouvertes.fr/hal-01066170
Contributor: Jeanne Villaneau
Submitted on: Friday, 19 September 2014 - 12:08:16
Last modified on: Wednesday, 16 May 2018 - 11:24:07

Identifiers

  • HAL Id : hal-01066170, version 1

Citation

Hai-Heu Vu, Jeanne Villaneau, Farida Saïd, Pierre-François Marteau. Sentence Similarity by Combining Explicit Semantic Analysis and Overlapping N-Grams. Springer. Text, Speech and Dialogue, Sep 2014, Brno, Czech Republic. 8655, pp.201-208, 2014. 〈hal-01066170〉
