Unsupervised Feature Selection with Ensemble Learning

Haytham Elghazel 1, Alex Aussem 1
1 DM2L - Data Mining and Machine Learning, LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : In this paper, we show that the way internal estimates are used to measure variable importance in Random Forests is also applicable to feature selection in unsupervised learning. We propose a new method, called Random Cluster Ensemble (RCE for short), that estimates the out-of-bag feature importance from an ensemble of partitions. Each partition is constructed using a different bootstrap sample and a random subset of the features. We provide empirical results on nineteen benchmark data sets indicating that RCE, boosted with a recursive feature elimination (RFE) scheme, can lead to significant improvement in clustering accuracy over several state-of-the-art supervised and unsupervised algorithms, with a very limited subset of features. The method shows promise for dealing with very large domains. All results, datasets and algorithms are available online.
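The idea described in the abstract (partitions built on bootstrap samples and random feature subsets, with importance estimated on the out-of-bag points) can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm or the exact importance measure, so everything below is an assumption made for illustration: plain k-means as the base clusterer, and a permutation-style score (the fraction of out-of-bag points whose cluster assignment changes when one feature is shuffled), borrowed by analogy from Random Forest permutation importance. The function names `kmeans`, `assign` and `oob_feature_importance` are hypothetical, and the recursive feature elimination step is not included.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means (assumed base clusterer); returns the (k, d) centroids."""
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def oob_feature_importance(X, k=2, n_members=50, n_sub=None, seed=0):
    """Permutation-style out-of-bag importance averaged over a cluster ensemble.

    Assumption: importance of a feature = mean fraction of out-of-bag points
    whose cluster assignment changes when that feature is shuffled.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_sub = n_sub or max(1, int(np.sqrt(d)))
    total = np.zeros(d)   # summed disagreement per feature
    counts = np.zeros(d)  # number of ensemble members that used the feature
    for _ in range(n_members):
        boot = rng.integers(0, n, size=n)        # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), boot)   # points left out of the sample
        if len(oob) < 2:
            continue
        feats = rng.choice(d, size=n_sub, replace=False)  # random feature subset
        centroids = kmeans(X[boot][:, feats], k, rng)
        X_oob = X[oob][:, feats]
        base = assign(X_oob, centroids)          # reference out-of-bag partition
        for pos, f in enumerate(feats):
            X_perm = X_oob.copy()
            X_perm[:, pos] = rng.permutation(X_perm[:, pos])  # shuffle one feature
            total[f] += np.mean(assign(X_perm, centroids) != base)
            counts[f] += 1
    return total / np.maximum(counts, 1)
```

Called as `oob_feature_importance(X, k=2, n_members=200, n_sub=3)` on a numeric array `X` of shape (samples, features), the function returns one score per feature. Under the assumptions above, features that drive the cluster structure should receive higher scores than pure-noise features.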
Document type :
Journal articles

https://hal.archives-ouvertes.fr/hal-01339161
Contributor : Équipe Gestionnaire Des Publications Si Liris
Submitted on : Wednesday, June 29, 2016 - 3:47:18 PM
Last modification on : Thursday, November 21, 2019 - 2:28:22 AM

Citation

Haytham Elghazel, Alex Aussem. Unsupervised Feature Selection with Ensemble Learning. Machine Learning, Springer Verlag, 2015, 98 (1-2), pp.157-180. ⟨10.1007/s10994-013-5337-8⟩. ⟨hal-01339161⟩
