Semantic Clustering using Bag-of-Bag-of-Features

Ali-Reza Ebadat; Vincent Claveau; Pascale Sébillot

Conference paper, Year: 2012


Ali-Reza Ebadat

Role: Author

Multimedia content-based indexing

Vincent Claveau

Role: Author
PersonId : 5270
IdHAL : vincent-claveau
ORCID : 0000-0002-3459-0550
IdRef : 075988216

Multimedia content-based indexing

Pascale Sébillot

Role: Author
PersonId : 21840
IdHAL : pascale-sebillot
ORCID : 0000-0002-5429-4302
IdRef : 075988453

Multimedia content-based indexing

Abstract

Computing distances between textual representations is at the heart of many Natural Language Processing tasks. These tasks usually borrow the standard approaches initially developed for Information Retrieval: most often a bag-of-words (or bag-of-features) description with TF-IDF weighting (or a variant), a vector representation, and a classical similarity function such as cosine. In this paper, we are interested in one such task, namely the semantic clustering of entities extracted from a text. We argue that better-suited representations and similarity measures can be used for this kind of task. In particular, we explore an alternative representation for entities called Bag-Of-Vectors (or Bag-of-Bags-of-Features). In this model, each entity is defined not as a single vector but as a set of vectors, in which each vector is built from the contextual features of one occurrence of the entity. In order to use Bag-Of-Vectors for clustering, we introduce new versions of classical similarity functions such as Cosine, Jaccard and Scalar Product. Experimentally, we show that the Bag-Of-Vectors representation always improves the clustering results compared to the classical Bag-Of-Features representation.
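To make the contrast between the two representations concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not give the definitions of the generalized similarity functions, so the aggregation used here (the mean cosine over all pairs of occurrence vectors, in the hypothetical function `mean_pairwise_cosine`) is an assumption chosen only for illustration. The toy contexts and all function names are likewise invented, and raw counts are used instead of TF-IDF weights.

```python
from collections import Counter
from itertools import product
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def bag_of_features(occurrences):
    """Classical representation: merge every occurrence context into one vector."""
    merged = Counter()
    for context in occurrences:
        merged.update(context)
    return merged


def bag_of_vectors(occurrences):
    """Alternative representation: keep one feature vector per occurrence."""
    return [Counter(context) for context in occurrences]


def mean_pairwise_cosine(bag_a, bag_b):
    """Assumed aggregation (not taken from the paper): average the cosine
    over all pairs of vectors drawn from the two bags."""
    if not bag_a or not bag_b:
        return 0.0
    total = sum(cosine(u, v) for u, v in product(bag_a, bag_b))
    return total / (len(bag_a) * len(bag_b))


# Invented toy data: one list of context words per occurrence of each entity.
entity_a = [["scored", "goal"], ["signed", "contract"]]
entity_b = [["scored", "goal"], ["won", "match"]]

bof_sim = cosine(bag_of_features(entity_a), bag_of_features(entity_b))
bov_sim = mean_pairwise_cosine(bag_of_vectors(entity_a), bag_of_vectors(entity_b))
print(f"bag-of-features cosine: {bof_sim:.3f}")
print(f"bag-of-vectors mean pairwise cosine: {bov_sim:.3f}")
```

The single merged vector cannot tell which features co-occurred in the same context, whereas the set of per-occurrence vectors keeps that information. Other aggregations (for example, taking the maximum over pairs) would be equally plausible under the same assumptions.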

Keywords

vector representation, bag-of-bag-of-words, bag-of-vectors, similarity, clustering

Domains

Document and Text Processing

Main file

CORIA.pdf (283.02 KB)

Origin: Files produced by the author(s)

Contributor: Pascale Sébillot

https://hal.science/hal-00753912

Submitted on: Monday, November 19, 2012, 19:35:35

Last modified on: Friday, March 24, 2023, 14:52:56

Long-term archiving on: Thursday, February 21, 2013, 11:45:28

Dates and versions

hal-00753912 , version 1 (19-11-2012)

Identifiers

HAL Id : hal-00753912 , version 1

Cite

Ali-Reza Ebadat, Vincent Claveau, Pascale Sébillot. Semantic Clustering using Bag-of-Bag-of-Features. CORIA - COnférence en Recherche d'Information et Applications, Mar 2012, Bordeaux, France. pp.229-244. ⟨hal-00753912⟩

Collections

EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA IRISA-INSA-R IRISA-D6 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES INSA-GROUPE UR1-MATH-NUM
