GOTA: Using the Google Similarity Distance for OLAP Textual Aggregation

Mustapha M Bouakkaz; Sabine Loudcher; Youcef y Ouiten

Conference paper, Year: 2015


Mustapha M Bouakkaz

Role: Author

Sabine Loudcher

Role: Author
PersonId : 869063
IdHAL : sabine-loudcher
IdRef : 112760937

Equipe de Recherche en Ingénierie des Connaissances

Youcef y Ouiten

Role: Author

Abstract

With the tremendous growth of unstructured data in Business Intelligence, there is a need to incorporate textual data into data warehouses, to provide appropriate multidimensional analysis (OLAP), and to develop new approaches that take the textual content of data into account. This will provide textual measures to users who wish to analyze documents online. In this paper, we propose a new aggregation function for textual data in an OLAP context. To aggregate keywords, our contribution is to use a data mining technique, such as k-means, but with a distance based on the Google similarity distance. Our approach thus considers the semantic similarity of keywords when aggregating them. The performance of our approach is analyzed and compared to another method that uses the k-bisecting clustering algorithm and is based on the Jensen-Shannon divergence between probability distributions. The experimental study shows that our approach achieves better performance in terms of recall, precision, F-measure, complexity, and runtime.
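As a rough illustration of the idea in the abstract, the sketch below groups keywords using the Normalized Google Distance as usually defined from hit counts: NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). It is not the authors' implementation. The hit counts `HITS`, `CO_HITS`, and the index size `N` are invented for the example, and a small k-medoids loop is swapped in for k-means, because a pairwise distance between keywords gives no natural centroid to average.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: number of pages containing each term; fxy: pages containing
    both; n: total number of indexed pages.
    """
    if fxy == 0:
        return float("inf")  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Hypothetical hit counts, invented for illustration only.
N = 1_000_000
HITS = {"olap": 2000, "cube": 3000, "kmeans": 1500, "cluster": 2500}
CO_HITS = {
    frozenset(("olap", "cube")): 1200,
    frozenset(("kmeans", "cluster")): 1000,
    frozenset(("olap", "kmeans")): 20,
    frozenset(("olap", "cluster")): 30,
    frozenset(("cube", "kmeans")): 15,
    frozenset(("cube", "cluster")): 25,
}

def distance(a, b):
    """NGD between two keywords, looked up in the toy tables above."""
    if a == b:
        return 0.0
    return ngd(HITS[a], HITS[b], CO_HITS[frozenset((a, b))], N)

def k_medoids(items, k, dist, max_iter=20):
    """Simple k-medoids: a stand-in for k-means under an arbitrary distance."""
    medoids = list(items[:k])  # deterministic initialisation
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: attach every keyword to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for item in items:
            nearest = min(medoids, key=lambda m: dist(item, m))
            clusters[nearest].append(item)
        # Update step: the new medoid minimises total distance in its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist(c, o) for o in members))
            if members else m
            for m, members in clusters.items()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters

if __name__ == "__main__":
    result = k_medoids(["olap", "cube", "kmeans", "cluster"], 2, distance)
    for medoid, members in result.items():
        print(medoid, "->", members)
```

With these toy counts, the two OLAP-related keywords end up in one group and the two clustering-related keywords in the other. In practice the counts would come from a search index or a corpus, and each medoid could serve as the representative keyword of an aggregated cell.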

Keywords

Google Similarity; Textual Data; Aggregation Function; OLAP

Domains

Methods and statistics


https://shs.hal.science/halshs-01136581

Submitted on: Friday, 27 March 2015, 15:32:19

Last modified on: Friday, 24 February 2023, 12:09:02

Dates and versions

halshs-01136581, version 1 (27-03-2015)

Identifiers

HAL Id: halshs-01136581, version 1

Cite

Mustapha M Bouakkaz, Sabine Loudcher, Youcef y Ouiten. GOTA: Using the Google Similarity Distance for OLAP Textual Aggregation. International Conference on Enterprise Information Systems (ICEIS), Apr 2015, Barcelona, Spain. pp.121-127. ⟨halshs-01136581⟩


Collections

UNIV-LYON2 ERIC LABEXIMU UDL

