A multi-scale seriation algorithm for clustering sparse imbalanced data: application to spike sorting

Abstract : Seriation is a useful statistical method for visualizing clusters in a dataset. However, when the data are noisy or imbalanced, visualizing the data structure becomes challenging. To alleviate this limitation, we introduce a novel metric based on common neighborhood to evaluate the degree of sparsity in a dataset. A stack of matrices is derived for different levels of sparsity, and the matrices are permuted by a branch-and-bound algorithm. The matrix with the best block-diagonal form is then selected by a compactness criterion. The selected matrix reveals the intrinsic structure of the data by excluding noisy data and outliers. This seriation algorithm is applicable even if the number of clusters is unknown or if the clusters are imbalanced. However, if the metric introduces too much sparsity in the data, sub-sampled groups of data may be discarded. To resolve this problem, a multi-scale approach combining different levels of sparsity is proposed. The capability of the proposed seriation method is examined both on toy problems and in the context of spike sorting.
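The abstract does not define the common-neighborhood metric, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a Jarvis-Patrick-style shared-nearest-neighbour count as a stand-in metric: two points are similar if each is among the other's k nearest neighbours, and the similarity is the number of neighbours they have in common. Thresholding that count at several levels gives a stack of binary matrices of increasing sparsity. The function names, the value of k, the thresholds, and the toy points are all hypothetical, and the branch-and-bound permutation and compactness criterion are not reproduced here.

```python
from math import dist


def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        neighbours.append(set(others[:k]))
    return neighbours


def shared_neighbour_matrix(points, k):
    """Stand-in common-neighbourhood similarity (assumption, not the paper's metric).

    sim[i][j] is the number of shared k-nearest neighbours of i and j,
    counted only when i and j are in each other's neighbour sets.
    The diagonal is set to k.
    """
    nn = knn_sets(points, k)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = k
        for j in range(i + 1, n):
            if j in nn[i] and i in nn[j]:
                sim[i][j] = sim[j][i] = len(nn[i] & nn[j])
    return sim


def sparsity_stack(sim, thresholds):
    """One binary matrix per threshold; higher thresholds give sparser matrices."""
    return {t: [[1 if s >= t else 0 for s in row] for row in sim]
            for t in thresholds}


# Toy data: two tight groups of four points and one isolated outlier (index 8).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 30)]
sim = shared_neighbour_matrix(points, k=3)
stack = sparsity_stack(sim, thresholds=[1, 2, 3])
```

In this toy case the outlier has no mutual neighbours, so its off-diagonal entries are zero at every level, while the highest threshold also removes the links inside each group, which illustrates why an overly sparse level can lose genuine structure.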
Document type :
Journal articles
https://hal.archives-ouvertes.fr/hal-01133002
Contributor : Frédéric Davesne
Submitted on : Wednesday, March 18, 2015 - 12:52:03 PM
Last modification on : Monday, October 28, 2019 - 10:50:21 AM

Citation

Vincent Vigneron, Hsin Chen. A multi-scale seriation algorithm for clustering sparse imbalanced data: application to spike sorting. Pattern Analysis and Applications, Springer Verlag, 2016, 19 (4), pp.885--903. ⟨10.1007/s10044-015-0458-2⟩. ⟨hal-01133002⟩