The Hierarchical Agglomerative Clustering with Gower index: a methodology for automatic design of OLAP cube in ecological data processing context

Abstract: OLAP systems can improve ecological studies. Ecology studies, tracks, and analyzes phenomena across space and time and according to several parameters, and OLAP systems let ecologists browse large datasets. One focus of current research on OLAP systems is the automatic design of OLAP cubes and data warehouse schemas. This kind of work makes OLAP technology accessible to people who are not information technology experts. To be effective, however, automatic OLAP design must handle a variety of cases. Moreover, OLAP technology is based on the concept of hierarchy, so hierarchical clustering methods are often used by OLAP system designers. In this article, we propose using hierarchical agglomerative clustering with a metric that comes from ecological studies (the Gower similarity index) to automatically build hierarchical dimensions in an OLAP cube. With this similarity index, we can perform hierarchical clustering on heterogeneous datasets that contain both qualitative and quantitative variables. We present a prototype automatic system that builds dimensions for an OLAP cube, and we measure its performance as a function of the number of clustered individuals and the number of variables used for clustering. From these measurements, we estimate its performance on a large dataset. The Gower index in a hierarchical agglomerative clustering thus makes it possible to handle heterogeneous datasets with missing values when automatically building an OLAP cube. With this methodology, we can build new dimensions based on hierarchies in the data that are not obvious. Data mining methods can complement expert knowledge during the design of an OLAP cube, because they can reveal the inherent structure of the data.
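As an illustration of the idea summarized in the abstract, the following is a minimal sketch in pure Python, not the authors' prototype. It computes a Gower distance (one minus the Gower similarity) over records that mix qualitative and quantitative variables, skips variables that are missing in either record, and then merges clusters bottom-up. The average linkage criterion, the function names, and the sample records are all assumptions made for this example; the abstract does not specify them.

```python
from itertools import combinations


def numeric_ranges(rows, kinds):
    """Range (max - min) of each numeric column, ignoring missing values."""
    ranges = []
    for j, kind in enumerate(kinds):
        vals = [r[j] for r in rows if r[j] is not None] if kind == "num" else []
        ranges.append(max(vals) - min(vals) if vals else 0.0)
    return ranges


def gower_distance(a, b, kinds, ranges):
    """Gower distance between two records; None marks a missing value."""
    total, used = 0.0, 0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # a variable missing in either record is not compared
        if kind == "num":
            # quantitative: similarity is 1 - |x - y| / range of the variable
            score = 1.0 - abs(x - y) / rng if rng else 1.0
        else:
            # qualitative: similarity is 1 on a match, 0 otherwise
            score = 1.0 if x == y else 0.0
        total += score
        used += 1
    if used == 0:
        raise ValueError("records share no comparable variable")
    return 1.0 - total / used


def agglomerate(rows, kinds):
    """Average-linkage agglomerative clustering (linkage choice is assumed).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of row indices.
    """
    ranges = numeric_ranges(rows, kinds)
    dist = {
        frozenset((i, j)): gower_distance(rows[i], rows[j], kinds, ranges)
        for i, j in combinations(range(len(rows)), 2)
    }

    def linkage(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    clusters = [(i,) for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((c1, c2, linkage(c1, c2)))
        merged = tuple(sorted(c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
    return merges


# Hypothetical records: (habitat, wing length in mm, migratory)
kinds = ["cat", "num", "cat"]
rows = [
    ("forest", 70.0, "yes"),
    ("forest", 72.0, "yes"),
    ("wetland", 110.0, "no"),
    ("wetland", None, "no"),  # missing wing length
]

merges = agglomerate(rows, kinds)
for left, right, d in merges:
    print(left, right, round(d, 4))
```

Each merge in the returned history can be read as a candidate parent member in a dimension hierarchy, with the merge distance indicating how coarse that level is. This naive version recomputes linkages at every step, so it is only suitable for small examples.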
Document type:
Journal article
Ecological Informatics, Elsevier, 2014, pp.1-14. <10.1016/j.ecoinf.2014.07.011>

https://hal.archives-ouvertes.fr/hal-01060817
Contributor: Ludovic Journaux
Submitted on: Friday, 12 September 2014 - 11:34:05
Last modified on: Friday, 6 January 2017 - 14:28:40
Document(s) archived on: Saturday, 13 December 2014 - 10:09:58

File

ecological_inf.pdf
Files produced by the author(s)

Collections

LE2I | BD | PAM

Citation

Lucile Sautot, Bruno Faivre, Ludovic Journaux, Paul Molin. The Hierarchical Agglomerative Clustering with Gower index: a methodology for automatic design of OLAP cube in ecological data processing context. Ecological Informatics, Elsevier, 2014, pp.1-14. <10.1016/j.ecoinf.2014.07.011>. <hal-01060817>

Metrics

Record views: 189

Document downloads: 254