A multi-scale seriation algorithm for clustering sparse imbalanced data: application to spike sorting - HAL open archive
Journal article, Pattern Analysis and Applications, Year: 2016

A multi-scale seriation algorithm for clustering sparse imbalanced data: application to spike sorting

Abstract

Seriation is a useful statistical method for visualizing clusters in a dataset. However, when the data are noisy or imbalanced, visualizing the data structure becomes challenging. To alleviate this limitation, we introduce a novel metric based on common neighborhood to evaluate the degree of sparsity in a dataset. A stack of matrices is derived for different levels of sparsity, and the matrices are permuted by a branch-and-bound algorithm. The matrix with the best block-diagonal form is then selected by a compactness criterion. The selected matrix reveals the intrinsic structure of the data by excluding noisy data and outliers. This seriation algorithm is applicable even if the number of clusters is unknown or if the clusters are imbalanced. However, if the metric introduces too much sparsity in the data, under-sampled groups of data could be excluded. To resolve this problem, a multi-scale approach combining different levels of sparsity is proposed. The capability of the proposed seriation method is examined both on toy problems and in the context of spike sorting.
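The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not define the metric, the search, or the compactness criterion, so everything below is an assumption made for illustration: shared k-nearest-neighbour counts stand in for the common-neighborhood metric, a threshold `t` on those counts stands in for the level of sparsity, a greedy chain ordering replaces the branch-and-bound permutation search, and the mean distance of nonzero entries from the diagonal replaces the compactness criterion. The function names (`shared_neighbor_counts`, `greedy_order`, `diagonal_spread`) are hypothetical.

```python
import numpy as np

def shared_neighbor_counts(X, k):
    """Entry (i, j) counts the k-nearest neighbours shared by points i and j."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbour
    knn = np.argsort(dist, axis=1)[:, :k]     # k nearest neighbours of each point
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], knn] = 1    # member[i, j] = 1 if j is in N_k(i)
    return member @ member.T

def greedy_order(A):
    """Greedy chain: repeatedly append the unplaced item most similar to the last one.
    Assumption: a cheap stand-in, not the branch-and-bound search of the paper."""
    n = len(A)
    order = [0]
    placed = np.zeros(n, dtype=bool)
    placed[0] = True
    for _ in range(n - 1):
        scores = np.where(placed, -1, A[order[-1]])   # mask items already placed
        nxt = int(np.argmax(scores))
        order.append(nxt)
        placed[nxt] = True
    return np.array(order)

def diagonal_spread(A):
    """Mean distance of nonzero entries from the diagonal (lower = more compact).
    Assumption: a stand-in for the compactness criterion, which the abstract does not define."""
    i, j = np.nonzero(A)
    return float(np.mean(np.abs(i - j))) if len(i) else float("inf")

rng = np.random.default_rng(0)
# Two imbalanced, well-separated toy clusters (8 and 6 points), then shuffled.
X = np.vstack([rng.normal(0.0, 0.3, (8, 2)), rng.normal(10.0, 0.3, (6, 2))])
labels = np.array([0] * 8 + [1] * 6)
shuffle = rng.permutation(len(X))
X, labels = X[shuffle], labels[shuffle]

S = shared_neighbor_counts(X, k=5)
results = {}
for t in (1, 2, 3):                           # one binary matrix per sparsity level
    A = (S >= t).astype(int)
    order = greedy_order(A)
    results[t] = (order, diagonal_spread(A[np.ix_(order, order)]))

best_t = min(results, key=lambda t: results[t][1])
print(best_t, labels[results[best_t][0]])     # cluster labels in seriated order
```

Reordering the rows and columns of each thresholded matrix with `order` is what exposes the block-diagonal pattern. The stand-in score ignores how many points a sparsity level retains, so it should not be read as the selection rule or the multi-scale combination described in the abstract.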
No file deposited

Dates and versions

hal-01133002, version 1 (18-03-2015)

Cite

Vincent Vigneron, Hsin Chen. A multi-scale seriation algorithm for clustering sparse imbalanced data: application to spike sorting. Pattern Analysis and Applications, 2016, 19 (4), pp. 885-903. ⟨10.1007/s10044-015-0458-2⟩. ⟨hal-01133002⟩