Sentence Similarity by Combining Explicit Semantic Analysis and Overlapping N-Grams

Hai-Heu Vu; Jeanne Villaneau; Farida Saïd; Pierre-François Marteau

Conference paper, Year: 2014


Hai-Heu Vu

Role: Author

Expressiveness in Human Centered Data/Media

Jeanne Villaneau

Role: Author
PersonId : 171161
IdHAL : jeanne-villaneau
ORCID : 0000-0003-1564-1436
IdRef : 077649338

Expressiveness in Human Centered Data/Media

Farida Saïd

Role: Author
PersonId : 2104
IdHAL : farida-said
ORCID : 0000-0002-8670-9584
IdRef : 194062813

Laboratoire de Mathématiques de Bretagne Atlantique

Pierre-François Marteau

Role: Author
PersonId : 219
IdHAL : pierre-francois-marteau
ORCID : 0000-0002-3963-8795
IdRef : 033981124

Expressiveness in Human Centered Data/Media

Abstract

We propose a sentence similarity measure that combines a knowledge-based measure, a lighter version of ESA (Explicit Semantic Analysis), with a distributional measure, ROUGE. We applied this hybrid measure to two French domain-oriented corpora collected from the Web and compared its similarity scores with those of human judges. In both domains, ESA and ROUGE perform better combined than they do individually. Moreover, using the whole of Wikipedia in ESA proved unnecessary, since the best results were obtained with a small number of well-selected concepts.
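The abstract does not state how the two scores are combined, so the following is only a minimal sketch under stated assumptions: a linear interpolation with a hypothetical weight `alpha`, a ROUGE-N-style F1 overlap on n-grams, and an ESA-style cosine between sentence concept vectors. The `word_concepts` table (word to weighted concepts) and every function name are illustrative and do not come from the paper.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Return the multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(a, b, n=2):
    """ROUGE-N-style F1 score over shared n-grams (illustrative, not the official tool)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    shared = sum((ga & gb).values())  # clipped count of n-grams present in both
    if shared == 0:
        return 0.0
    precision = shared / sum(ga.values())
    recall = shared / sum(gb.values())
    return 2 * precision * recall / (precision + recall)


def concept_vector(tokens, word_concepts):
    """Sum per-word concept weights into one sentence-level vector."""
    vec = Counter()
    for tok in tokens:
        for concept, weight in word_concepts.get(tok, {}).items():
            vec[concept] += weight
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as mappings."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def hybrid_similarity(a, b, word_concepts, alpha=0.5, n=2):
    """Assumed combination: alpha * concept cosine + (1 - alpha) * n-gram overlap."""
    esa = cosine(concept_vector(a, word_concepts), concept_vector(b, word_concepts))
    rouge = ngram_overlap(a, b, n)
    return alpha * esa + (1 - alpha) * rouge


# Hypothetical toy concept table standing in for a small set of selected concepts.
word_concepts = {
    "cat": {"Animal": 1.0},
    "dog": {"Animal": 1.0},
    "sat": {"Posture": 1.0},
}
a = ["the", "cat", "sat"]
b = ["the", "dog", "sat"]
score = hybrid_similarity(a, b, word_concepts, alpha=0.5)
```

In this toy pair the two sentences share no bigram, so the overlap term contributes nothing, while the concept vectors coincide; the example is meant to show how the two signals can complement each other, not to reproduce the reported results.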

Keywords

sentence similarity; summarization

Domains

Computation and Language [cs.CL]

https://hal.science/hal-01066170

Submitted on: Friday, 19 September 2014, 12:08:16

Last modified on: Friday, 3 May 2024, 13:42:54

Dates and versions

hal-01066170 , version 1 (19-09-2014)

Identifiers

HAL Id : hal-01066170 , version 1

Cite

Hai-Heu Vu, Jeanne Villaneau, Farida Saïd, Pierre-François Marteau. Sentence Similarity by Combining Explicit Semantic Analysis and Overlapping N-Grams. Text, Speech and Dialogue, Sep 2014, Brno, Czech Republic. pp.201-208. ⟨hal-01066170⟩

Collections

UNIV-BREST INSTITUT-TELECOM EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA INSMI UBS LMBA_UBS IRISA-D6 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES IBNM UR1-MATH-NUM

285 views

0 downloads
