EvoEvo Deliverable 5.1: Impact obtained from EvoEvo mechanisms on data stream cluster analysis

Guillaume Beslon 1, 2 Jonas Abernot 1 Sergio Peignier 2, 1 Christophe Rigotti 3, 2, 1
1 BEAGLE - Artificial Evolution and Computational Biology
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information, Inria Grenoble - Rhône-Alpes, LBBE - Laboratoire de Biométrie et Biologie Evolutive - UMR 5558
3 DM2L - Data Mining and Machine Learning
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : Subspace clustering is a data mining task that searches for objects sharing similar features while also identifying the subspaces in which these similarities appear. Subspace clustering is therefore recognized as more general and more complex than standard clustering, which only requires detecting groups of similar objects (clusters). In this report we present ChameleoClust+, an evolutionary algorithm that tackles the subspace clustering problem. ChameleoClust+ is a bio-inspired algorithm implementing an evolvable genome structure with several bio-like features: a variable genome length, both functional and non-functional elements, and mutation operators that include chromosomal rearrangements. The main purpose of this design is to take advantage of the large degree of freedom provided by the evolvable structure to detect varying numbers of clusters in subspaces of varying dimensions. The algorithm was assessed and compared to state-of-the-art methods, with satisfactory results, on a reference benchmark using both real-world and synthetic datasets. While other algorithms may require more complex parameter settings, ChameleoClust+ has only one parameter specific to subspace clustering: the maximal number of clusters. This single, fairly intuitive parameter sets the maximal level of detail of the subspace clustering. The remaining parameters of ChameleoClust+ relate to the evolution strategy (population size, mutation rate, ...), and a single setting for them turns out to be effective enough for all the benchmark datasets. A sensitivity analysis was also carried out to study the impact of each parameter on the subspace clustering quality. This report also presents Evowave, an application of ChameleoClust+ to the analysis of a real dynamic stream.
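Illustrative sketch (not taken from the report): the abstract mentions a variable-length genome, functional and non-functional elements, and mutation operators including chromosomal rearrangements. The Python fragment below shows one way such a structure could look. The gene layout `(functional, cluster_id, dimension, value)`, the rule that contributions to the same cluster and dimension are summed, and all function names are assumptions made for illustration; they should not be read as the encoding or operators actually used by ChameleoClust+.

```python
import random
from collections import defaultdict

# Hypothetical gene layout (an assumption, not the report's encoding):
# (functional, cluster_id, dimension, value)

def decode(genome, max_clusters):
    """Map functional genes to {cluster_id: {dimension: coordinate}}.

    Non-functional genes are ignored. A cluster only gets coordinates
    along the dimensions its genes mention, so each cluster lives in
    its own subspace. Summing repeated contributions is an assumption.
    """
    centroids = defaultdict(dict)
    for functional, cluster, dim, value in genome:
        if functional:
            cid = cluster % max_clusters  # enforce the maximal number of clusters
            centroids[cid][dim] = centroids[cid].get(dim, 0.0) + value
    return dict(centroids)

def duplicate_segment(genome, rng):
    """Rearrangement: copy a random non-empty segment to a random position."""
    if not genome:
        return genome
    i, j = sorted(rng.sample(range(len(genome) + 1), 2))
    pos = rng.randrange(len(genome) + 1)
    return genome[:pos] + genome[i:j] + genome[pos:]

def delete_segment(genome, rng):
    """Rearrangement: remove a random non-empty segment."""
    if not genome:
        return genome
    i, j = sorted(rng.sample(range(len(genome) + 1), 2))
    return genome[:i] + genome[j:]

def point_mutation(genome, rng, n_dims, max_clusters):
    """Replace one randomly chosen gene with a freshly drawn one."""
    if not genome:
        return genome
    k = rng.randrange(len(genome))
    new_gene = (rng.random() < 0.5, rng.randrange(max_clusters),
                rng.randrange(n_dims), rng.uniform(-1.0, 1.0))
    return genome[:k] + [new_gene] + genome[k + 1:]

example_genome = [
    (True, 0, 0, 1.0),    # functional: cluster 0, dimension 0
    (False, 1, 2, 5.0),   # non-functional: ignored when decoding
    (True, 0, 0, 0.5),    # adds to cluster 0, dimension 0
    (True, 1, 3, -2.0),   # functional: cluster 1, dimension 3
]
centroids = decode(example_genome, max_clusters=2)
# centroids == {0: {0: 1.5}, 1: {3: -2.0}}
```

In this toy decoding, cluster 0 is defined only along dimension 0 and cluster 1 only along dimension 3. Duplications and deletions change the genome length, which is how, under these assumptions, the number of clusters and the dimensionality of their subspaces could vary during evolution.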
Document type :
Reports

Cited literature [16 references]

https://hal.archives-ouvertes.fr/hal-01577177
Contributor : Guillaume Beslon
Submitted on : Friday, August 25, 2017 - 12:35:41 AM
Last modification on : Wednesday, November 20, 2019 - 3:06:25 AM

File

evoevo-deliverable-d5.1.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01577177, version 1

Citation

Guillaume Beslon, Jonas Abernot, Sergio Peignier, Christophe Rigotti. EvoEvo Deliverable 5.1: Impact obtained from EvoEvo mechanisms on data stream cluster analysis. [Research Report] INRIA Grenoble - Rhône-Alpes. 2016. ⟨hal-01577177⟩
