OLAP Textual Aggregation Approach using the Google Similarity Distance
Journal article, International Journal of Business Intelligence and Data Mining, Year: 2016

Abstract

Data warehousing and On-Line Analytical Processing (OLAP) are essential elements of decision support. In the case of textual data, decision support requires new tools, mainly textual aggregation functions, for better and faster high-level analysis and decision making. Such tools provide textual measures to users who wish to analyse documents online. In this paper, we propose a new aggregation function for textual data in an OLAP context, based on the K-means method. The approach yields aggregates that are semantically richer than those provided by classical OLAP operators. The distance used in K-means is replaced by the Google similarity distance, which takes the semantic similarity of keywords into account when aggregating them. The performance of our approach is analysed and compared to other methods such as Topkeywords, TOPIC, TuBE and BienCube. The experimental study shows that our approach achieves better performance in terms of recall, precision, F-measure, complexity and runtime.
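
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It computes the Normalized Google Distance (the formula of Cilibrasi and Vitányi) between keywords and groups them around representative keywords. Two assumptions are made for illustration: occurrence counts come from a small local corpus rather than search-engine hit counts, and a k-medoids-style loop stands in for K-means, because a mean of keywords is not defined. All function names (ngd, keyword_distances, k_medoids) are hypothetical.

```python
import math
from itertools import combinations


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from occurrence counts.

    fx, fy: number of documents containing each term,
    fxy: number containing both, n: total number of documents.
    """
    if fx == 0 or fy == 0 or fxy == 0:
        return float("inf")  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n) - min(lx, ly)
    if denom == 0:
        return 0.0  # both terms occur in every document
    return (max(lx, ly) - lxy) / denom


def keyword_distances(docs):
    """docs: list of keyword sets. Returns {(a, b): ngd} with a < b."""
    n = len(docs)
    vocab = sorted(set().union(*docs))
    count = {k: sum(k in d for d in docs) for k in vocab}
    dist = {}
    for a, b in combinations(vocab, 2):
        both = sum(a in d and b in d for d in docs)
        dist[(a, b)] = ngd(count[a], count[b], both, n)
    return dist


def k_medoids(items, d, medoids, max_iter=20):
    """Assumed stand-in for K-means: cluster items around medoids using d(a, b)."""
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for item in items:
            nearest = min(medoids, key=lambda m: d(item, m))
            clusters[nearest].append(item)
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(d(c, o) for o in members)) if members else m
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters


# Toy corpus (invented for illustration only).
docs = [
    {"olap", "cube"},
    {"olap", "cube", "warehouse"},
    {"kmeans", "cluster"},
    {"kmeans", "cluster", "distance"},
]
dist = keyword_distances(docs)


def d(a, b):
    """Symmetric lookup into the pairwise distance table."""
    return 0.0 if a == b else dist[(a, b) if a < b else (b, a)]


clusters = k_medoids(sorted(set().union(*docs)), d, ["olap", "kmeans"])
```

On this toy corpus the keywords that co-occur end up in the same group, and each group's medoid could serve as its aggregate label. How the paper selects the aggregate keywords is not stated in the abstract.
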
No file deposited

Dates and versions

halshs-01231490, version 1 (20-11-2015)

Identifiers

  • HAL Id: halshs-01231490, version 1

Cite

Mustapha M Bouakkaz, Sabine Loudcher, Youcef Ouinten. OLAP Textual Aggregation Approach using the Google Similarity Distance. International Journal of Business Intelligence and Data Mining, 2016, 11 (1), pp.31-48. ⟨halshs-01231490⟩