EvoEvo Deliverable 5.1: Impact obtained from EvoEvo mechanisms on data stream cluster analysis

Guillaume Beslon (1, 2), Jonas Abernot (1), Sergio Peignier (2, 1), Christophe Rigotti (3, 2, 1)
1 BEAGLE - Artificial Evolution and Computational Biology
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information, Inria Grenoble - Rhône-Alpes, LBBE - Laboratoire de Biométrie et Biologie Evolutive, CarMeN - Laboratoire de recherche en cardiovasculaire, métabolisme, diabétologie et nutrition
3 DM2L - Data Mining and Machine Learning
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract: Subspace clustering is a data mining task that searches for objects that share similar features and, at the same time, looks for the subspaces where these similarities appear. For this reason, subspace clustering is recognized as more general and more complicated than standard clustering, which only requires detecting groups of similar objects, or clusters. In this report we present ChameleoClust+, an evolutionary algorithm that tackles the subspace clustering problem. ChameleoClust+ is a bio-inspired algorithm implementing an evolvable genome structure that includes several bio-like features, such as a variable genome length, both functional and non-functional elements, and mutation operators including chromosomal rearrangements. The main purpose of the design of ChameleoClust+ is to take advantage of the large degree of freedom provided by its evolvable structure to detect various numbers of clusters in subspaces of various dimensions. The algorithm was assessed and compared to state-of-the-art methods, with satisfactory results, on a reference benchmark using both real-world and synthetic datasets. While other algorithms may need more complex parameter settings, ChameleoClust+ requires setting only one parameter specific to subspace clustering: the maximal number of clusters. This single parameter sets the maximal level of detail of the subspace clustering and is fairly intuitive. The remaining parameters of ChameleoClust+ relate to the evolution strategy (population size, mutation rate, etc.), and a single setting for them turns out to be effective enough for all the benchmark datasets. A sensitivity analysis has also been carried out to study the impact of each parameter on the subspace clustering quality. This report also presents Evowave, an application of ChameleoClust+ to the analysis of a real dynamic stream.
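The genome features named in the abstract (variable length, functional and non-functional elements, rearrangement-style mutations) can be illustrated with a minimal sketch. This is not the ChameleoClust+ implementation: the `Gene` fields, the additive decoding into cluster centres, and the three operators in `mutate` are all assumptions chosen for illustration, and every name below is hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical encoding, assumed for illustration only.
    functional: bool  # non-functional genes are carried along but ignored when decoding
    cluster: int      # index of the cluster the gene contributes to
    dim: int          # index of the dimension the gene contributes to
    value: float      # contribution to the cluster centre along that dimension


def decode(genome):
    """Sum functional gene contributions into {cluster: {dim: coordinate}}."""
    centres = {}
    for g in genome:
        if g.functional:
            coords = centres.setdefault(g.cluster, {})
            coords[g.dim] = coords.get(g.dim, 0.0) + g.value
    return centres


def random_gene(rng, max_clusters, n_dims):
    """Draw a gene uniformly at random (assumed initialisation scheme)."""
    return Gene(rng.random() < 0.5, rng.randrange(max_clusters),
                rng.randrange(n_dims), rng.uniform(-1.0, 1.0))


def mutate(genome, rng, max_clusters, n_dims):
    """Apply one randomly chosen operator and return a new genome."""
    genome = list(genome)
    op = rng.choice(["point", "duplication", "deletion"])
    i = rng.randrange(len(genome))
    j = rng.randrange(i, len(genome)) + 1  # non-empty segment genome[i:j]
    if op == "point":
        genome[i] = random_gene(rng, max_clusters, n_dims)
    elif op == "duplication":
        k = rng.randrange(len(genome) + 1)
        genome[k:k] = genome[i:j]          # copy the segment to position k
    elif j - i < len(genome):              # deletion, never emptying the genome
        del genome[i:j]
    return genome
```

In this sketch, a cluster is described only along the dimensions for which it has at least one functional gene, so both the number of clusters (bounded by `max_clusters`) and the subspace of each cluster can change as the genome grows, shrinks, or is rewritten.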
Document type: Report
[Research Report] INRIA Grenoble - Rhône-Alpes. 2016

https://hal.archives-ouvertes.fr/hal-01577177
Contributor: Guillaume Beslon
Submitted on: Friday, 25 August 2017 - 00:35:41
Last modified on: Thursday, 9 November 2017 - 14:32:11

File

evoevo-deliverable-d5.1.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01577177, version 1

Citation

Guillaume Beslon, Jonas Abernot, Sergio Peignier, Christophe Rigotti. EvoEvo Deliverable 5.1: Impact obtained from EvoEvo mechanisms on data stream cluster analysis. [Research Report] INRIA Grenoble - Rhône-Alpes. 2016. 〈hal-01577177〉

Metrics

Record views: 150

File downloads: 17