Conference papers

Efficient Batch-Incremental Classification Using UMAP for Evolving Data Streams

Maroua Bahri 1, 2 Bernhard Pfahringer 3 Albert Bifet 3, 1, 2 Silviu Maniu 4, 5
1 DIG - Data, Intelligence and Graphs
LTCI - Laboratoire Traitement et Communication de l'Information
5 VALDA - Value from Data
DI-ENS - Département d'informatique - ENS Paris, Inria de Paris
Abstract: Learning from potentially infinite, high-dimensional data streams poses significant challenges for classification. For instance, k-Nearest Neighbors (kNN), one of the most widely used algorithms in data stream mining, has proved to be very resource-intensive in high-dimensional spaces. Uniform Manifold Approximation and Projection (UMAP) is a novel manifold learning technique and one of the most promising dimension reduction and visualization techniques in the non-streaming setting, owing to its high performance compared with competing methods. However, no existing version of UMAP copes with the challenging streaming context. To overcome these restrictions, we propose a batch-incremental approach that pre-processes data streams with UMAP, producing successive embeddings over a stream of disjoint batches in order to support incremental kNN classification. Experiments on publicly available synthetic and real-world datasets demonstrate the substantial gains our proposal achieves over state-of-the-art techniques.
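The batch-incremental pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `BatchIncrementalKNN`, the helper `random_projection_reducer`, and the parameters `batch_size`, `window_size`, and `k` are hypothetical, and a fixed Gaussian random projection stands in for UMAP so the example runs with the standard library alone. Instances are buffered into disjoint batches; each full batch is embedded in a single call, every embedded instance is first classified by majority vote among its k nearest neighbours in a sliding window of earlier embeddings (test), and the batch is then added to that window (train).

```python
import math
import random
from collections import Counter, deque
from typing import Callable, Deque, List, Optional, Tuple

Vector = List[float]
# A reducer embeds a whole batch of d-dimensional rows into lower-dimensional rows.
Reducer = Callable[[List[Vector]], List[Vector]]


def random_projection_reducer(in_dim: int, out_dim: int, seed: int = 0) -> Reducer:
    """Stand-in for UMAP (assumption): a fixed Gaussian random projection."""
    rng = random.Random(seed)
    # Entries ~ N(0, 1/out_dim); one fixed matrix keeps all batch embeddings
    # in the same coordinate space.
    proj = [[rng.gauss(0.0, 1.0) / math.sqrt(out_dim) for _ in range(in_dim)]
            for _ in range(out_dim)]

    def reduce(batch: List[Vector]) -> List[Vector]:
        return [[sum(w * v for w, v in zip(row, x)) for row in proj] for x in batch]

    return reduce


class BatchIncrementalKNN:
    """Hypothetical batch-incremental kNN over per-batch embeddings."""

    def __init__(self, reducer: Reducer, batch_size: int = 100,
                 window_size: int = 1000, k: int = 5) -> None:
        self.reducer = reducer
        self.batch_size = batch_size
        self.k = k
        # Sliding window of (embedding, label) pairs; oldest pairs drop out first.
        self.window: Deque[Tuple[Vector, int]] = deque(maxlen=window_size)
        self.buffer_x: List[Vector] = []
        self.buffer_y: List[int] = []

    def _predict_one(self, z: Vector) -> Optional[int]:
        if not self.window:
            return None  # no labelled embeddings yet (first batch)
        nearest = sorted(self.window, key=lambda item: math.dist(item[0], z))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    def process(self, x: Vector, y: int) -> List[Tuple[Optional[int], int]]:
        """Buffer one labelled instance. When the batch is full, embed it,
        predict each instance (test), then add the batch to the window (train).
        Returns (prediction, true label) pairs for a completed batch, else []."""
        self.buffer_x.append(x)
        self.buffer_y.append(y)
        if len(self.buffer_x) < self.batch_size:
            return []
        embedded = self.reducer(self.buffer_x)
        results = [(self._predict_one(z), label)
                   for z, label in zip(embedded, self.buffer_y)]
        self.window.extend(zip(embedded, self.buffer_y))
        self.buffer_x, self.buffer_y = [], []
        return results


if __name__ == "__main__":
    # Toy stream: two Gaussian classes in 50 dimensions, embedded into 10.
    rng = random.Random(1)
    model = BatchIncrementalKNN(random_projection_reducer(50, 10), batch_size=50,
                                window_size=500, k=5)
    pairs = []
    for _ in range(1000):
        y = rng.randint(0, 1)
        pairs.extend(model.process([rng.gauss(3.0 * y, 1.0) for _ in range(50)], y))
    scored = [(p, t) for p, t in pairs if p is not None]
    print("accuracy:", sum(p == t for p, t in scored) / len(scored))
```

Any callable that maps a list of rows to a list of lower-dimensional rows can be passed as `reducer`. Plugging in UMAP itself (for example, a model from the `umap-learn` package) is left as an assumption here; note that independently refitting a reducer on every batch can yield embeddings whose coordinates are not directly comparable across batches, a concern the fixed projection in this sketch avoids.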
Contributor: Silviu Maniu
Submitted on: Monday, April 5, 2021 - 8:04:25 PM
Last modification on: Friday, January 21, 2022 - 3:17:20 AM
Long-term archiving on: Tuesday, July 6, 2021 - 6:09:52 PM


Files produced by the author(s)



Maroua Bahri, Bernhard Pfahringer, Albert Bifet, Silviu Maniu. Efficient Batch-Incremental Classification Using UMAP for Evolving Data Streams. IDA 2020 - 18th International Symposium on Intelligent Data Analysis, Apr 2020, Konstanz / Virtual, Germany. pp.40-53, ⟨10.1007/978-3-030-44584-3_4⟩. ⟨hal-03190032⟩


