Proper Noun Semantic Clustering using Bag-Of-Vectors

Ali-Reza Ebadat; Vincent Claveau; Pascale Sébillot

Conference paper, Year: 2012

Ali-Reza Ebadat

Role: Author

Multimedia content-based indexing

Vincent Claveau

Role: Author
PersonId : 5270
IdHAL : vincent-claveau
ORCID : 0000-0002-3459-0550
IdRef : 075988216

Multimedia content-based indexing

Pascale Sébillot

Role: Author
PersonId : 21840
IdHAL : pascale-sebillot
ORCID : 0000-0002-5429-4302
IdRef : 075988453

Multimedia content-based indexing

Abstract

In this paper, we propose a model for semantic clustering of entities extracted from a text, and we apply it to a Proper Noun classification task. This model is based on a new method to compute the similarity between the entities. Indeed, the classical way of calculating similarity is to build a feature vector, or Bag-of-Features, for each entity and then use classical similarity functions such as cosine. In practice, the features are contextual ones, such as the words around the different occurrences of each entity. Here, we propose an alternative representation for entities, called Bag-Of-Vectors, or Bag-of-Bags-of-Features. In this new model, each entity is defined not as a single vector but as a set of vectors, in which each vector is built from the contextual features of one occurrence of the entity. In order to use Bag-Of-Vectors for clustering, we introduce new versions of classical similarity functions such as Cosine, Jaccard and Scalar Product. Experimentally, we show that the Bag-Of-Vectors representation always improves the clustering results compared to classical Bag-Of-Features representations.
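The contrast between the two representations can be illustrated with a minimal Python sketch. The abstract does not say how the classical similarity functions are generalized to Bag-Of-Vectors, so the aggregation used here (the maximum, or optionally the mean, of pairwise cosines between occurrence vectors) is only an assumption for illustration and not necessarily the authors' definition. The names `context_vector`, `bag_of_features`, `bov_similarity` and the `window` parameter are likewise hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, index, window=2):
    """Count the words within `window` positions of one entity occurrence."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return Counter(left + right)


def cosine(u, v):
    """Classical cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bag_of_features(bag_of_vectors):
    """Classical representation: merge all occurrence vectors into one."""
    total = Counter()
    for vec in bag_of_vectors:
        total.update(vec)
    return total


def bov_similarity(bag_a, bag_b, aggregate=max):
    """Assumed generalization: aggregate the cosines of all pairs of
    occurrence vectors (one from each entity). Not taken from the paper."""
    scores = [cosine(u, v) for u in bag_a for v in bag_b]
    return aggregate(scores) if scores else 0.0


# Two entities, each described by one vector per occurrence.
bag_a = [Counter({"plays": 1, "for": 1}), Counter({"scored": 1, "goal": 1})]
bag_b = [Counter({"plays": 1, "for": 1}), Counter({"city": 1, "of": 1})]

bof_score = cosine(bag_of_features(bag_a), bag_of_features(bag_b))
bov_score = bov_similarity(bag_a, bag_b)
```

In this toy case the two entities share one identical context. Merging all contexts into a single Bag-of-Features vector dilutes that match, whereas the per-occurrence comparison keeps it visible. Passing `aggregate=statistics.mean` instead of `max` would give an averaged variant under the same assumptions.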

Domains

Computation and Language [cs.CL], Multimedia [cs.MM], Document and Text Processing

Main file

Ebadat-Ali-Reza-vf.pdf (175.54 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00760105

Submitted on: Monday, December 3, 2012, 14:48:54

Last modified on: Friday, March 24, 2023, 14:52:56

Long-term archiving on: Monday, March 4, 2013, 03:49:51

Dates and versions

hal-00760105 , version 1 (03-12-2012)

Identifiers

HAL Id : hal-00760105 , version 1

Cite

Ali-Reza Ebadat, Vincent Claveau, Pascale Sébillot. Proper Noun Semantic Clustering using Bag-Of-Vectors. ANLP - Applied Natural Language Processing conference, special track at the 25th International FLAIRS Conference, May 2012, Marco Island, FL, United States. ⟨hal-00760105⟩

Collections

EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA IRISA-INSA-R IRISA-D6 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES INSA-GROUPE UR1-MATH-NUM

392 views

175 downloads
