Proper Noun Semantic Clustering using Bag-Of-Vectors
Abstract
In this paper, we propose a model for the semantic clustering of entities extracted from a text, and we apply it to a Proper Noun classification task. The model is based on a new method for computing the similarity between entities. The classical way of calculating similarity is to build a single feature vector, or Bag-of-Features, for each entity and then apply a classical similarity function such as cosine. In practice, the features are contextual ones, such as the words surrounding the different occurrences of each entity. Here, we propose an alternative representation for entities, called Bag-Of-Vectors, or Bag-of-Bags-of-Features. In this new model, each entity is defined not as a single vector but as a set of vectors, in which each vector is built from the contextual features of one occurrence of the entity. In order to use Bag-Of-Vectors for clustering, we introduce new versions of classical similarity functions such as Cosine, Jaccard and Scalar Products. Experimentally, we show that the Bag-Of-Vectors representation always improves the clustering results compared to classical Bag-Of-Features representations.
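To make the contrast between the two representations concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's method: the abstract does not give the exact definitions of the new similarity functions, so the sketch assumes one plausible choice, the average of the pairwise cosine over all occurrence vectors of the two entities. The helper names (`context_vector`, `bag_of_vectors`, `bag_of_features`, `bov_cosine`) and the `window` parameter are hypothetical.

```python
import math
from collections import Counter


def context_vector(tokens, index, window=2):
    """Contextual features of ONE occurrence: counts of the words
    within `window` positions on each side of tokens[index]."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return Counter(left + right)


def bag_of_vectors(tokens, entity, window=2):
    """Bag-Of-Vectors: one context vector per occurrence of `entity`."""
    return [context_vector(tokens, i, window)
            for i, tok in enumerate(tokens) if tok == entity]


def bag_of_features(tokens, entity, window=2):
    """Classical Bag-of-Features: all occurrences merged into one vector."""
    merged = Counter()
    for vec in bag_of_vectors(tokens, entity, window):
        merged.update(vec)
    return merged


def cosine(u, v):
    """Standard cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def bov_cosine(bag_a, bag_b):
    """ASSUMPTION: similarity between two bags of vectors taken as the
    mean pairwise cosine; the paper may aggregate differently."""
    if not bag_a or not bag_b:
        return 0.0
    total = sum(cosine(u, v) for u in bag_a for v in bag_b)
    return total / (len(bag_a) * len(bag_b))


if __name__ == "__main__":
    tokens = ("paris is a city . london is a city . "
              "paris has museums").split()
    paris = bag_of_vectors(tokens, "paris")
    london = bag_of_vectors(tokens, "london")
    print("Bag-Of-Vectors similarity:", bov_cosine(paris, london))
    print("Bag-of-Features similarity:",
          cosine(bag_of_features(tokens, "paris"),
                 bag_of_features(tokens, "london")))
```

The design point the sketch illustrates is that `bag_of_vectors` keeps each occurrence separate, so a similarity function can compare occurrences pairwise, whereas `bag_of_features` collapses them into one vector before any comparison takes place.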
Source: Files produced by the author(s)