Conference papers

Compressed k-Nearest Neighbors Ensembles for Evolving Data Streams

Maroua Bahri 1 Silviu Maniu 2, 3 Albert Bifet 1, 4 Rodrigo Fernandes de Mello 5 Nikolaos Tziortziotis 6
1 DIG - Data, Intelligence and Graphs
LTCI - Laboratoire Traitement et Communication de l'Information
3 VALDA - Value from Data
DI-ENS - Département d'informatique - ENS Paris, Inria de Paris
Abstract : The unbounded and multidimensional nature of data streams, the evolution of data distributions over time, and the requirement for single-pass algorithms are the main challenges of data stream classification; together they make it impossible to infer learning models in the same manner as in batch scenarios. Data dimensionality reduction is a key factor in transforming those streams and selecting only their most relevant features in order to reduce algorithm space and time demands. In that context, Compressed Sensing (CS) encodes an input signal into a lower-dimensional space while guaranteeing its reconstruction up to some distortion factor. This paper employs CS on data streams as a pre-processing step to support k-Nearest Neighbors (kNN) classification, one of the most widely used algorithms in data stream mining, while ensuring that the key properties of CS hold. Based on topological properties, we show that our classification algorithm also preserves the kNN neighborhood (within a factor) after reducing the stream dimensionality with CS. As a consequence, end-users can set an acceptable error margin when performing such projections for kNN. For further improvement, we incorporate this method into an ensemble classifier, Leveraging Bagging, by combining a set of different CS matrices, which increases the diversity inside the ensemble. An extensive set of experiments is performed on various datasets, and the results are compared against those yielded by current state-of-the-art approaches, confirming the good performance of our approaches.
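The pipeline the abstract describes (random projection of each instance, kNN over a window of compressed instances, and an ensemble whose members use different projection matrices) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `CompressedKNN` and `CompressedKNNEnsemble`, the Gaussian sensing matrix, the sliding-window size, and the plain majority vote are all assumptions made for illustration, and the sketch omits the online resampling and drift detection that Leveraging Bagging uses.

```python
from collections import Counter, deque

import numpy as np


class CompressedKNN:
    """Sliding-window kNN over randomly projected instances (illustrative)."""

    def __init__(self, in_dim, out_dim, k=3, window=1000, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed sensing matrix: i.i.d. Gaussian entries scaled by
        # 1/sqrt(out_dim), so squared norms are preserved in expectation.
        self.matrix = rng.normal(size=(out_dim, in_dim)) / np.sqrt(out_dim)
        self.k = k
        # Only the most recent `window` compressed instances are kept.
        self.window = deque(maxlen=window)

    def learn_one(self, x, y):
        """Compress one instance and store it with its label."""
        z = self.matrix @ np.asarray(x, dtype=float)
        self.window.append((z, y))

    def predict_one(self, x):
        """Majority label among the k nearest compressed neighbors."""
        if not self.window:
            return None
        z = self.matrix @ np.asarray(x, dtype=float)
        points = np.stack([p for p, _ in self.window])
        dists = np.linalg.norm(points - z, axis=1)
        nearest = np.argsort(dists)[: self.k]
        votes = Counter(self.window[int(i)][1] for i in nearest)
        return votes.most_common(1)[0][0]


class CompressedKNNEnsemble:
    """Majority vote over members that each use a different random matrix."""

    def __init__(self, in_dim, out_dim, n_members=5, k=3, window=1000, seed=0):
        # A distinct seed per member gives a distinct projection matrix,
        # which is the source of diversity in this sketch.
        self.members = [
            CompressedKNN(in_dim, out_dim, k, window, seed + i)
            for i in range(n_members)
        ]

    def learn_one(self, x, y):
        for member in self.members:
            member.learn_one(x, y)

    def predict_one(self, x):
        votes = Counter(member.predict_one(x) for member in self.members)
        return votes.most_common(1)[0][0]
```

In this sketch each instance is seen once, compressed, and stored, which matches the single-pass constraint; the output dimension `out_dim` is the knob that trades memory and distance-computation cost against distortion.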

https://hal.archives-ouvertes.fr/hal-03189997
Contributor : Silviu Maniu
Submitted on : Monday, April 5, 2021 - 6:28:44 PM
Last modification on : Friday, January 21, 2022 - 3:17:24 AM
Long-term archiving on : Tuesday, July 6, 2021 - 6:08:32 PM

File

bahri2020knn.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03189997, version 1

Citation

Maroua Bahri, Silviu Maniu, Albert Bifet, Rodrigo Fernandes de Mello, Nikolaos Tziortziotis. Compressed k-Nearest Neighbors Ensembles for Evolving Data Streams. ECAI 2020 - 24th European Conference on Artificial Intelligence, Aug 2020, Santiago de Compostela / Virtual, Spain. ⟨hal-03189997⟩
