Efficient interpretable variants of online SOM for large dissimilarity data

Abstract: Self-organizing maps (SOM) are a useful tool for exploring data. In its original version, the SOM algorithm was designed for numerical vectors. Since then, several extensions have been proposed to handle complex datasets described by (dis)similarities. Most of these extensions represent prototypes by a list of (dis)similarities with the entire dataset and suffer from several drawbacks: their complexity is increased (it becomes quadratic instead of linear), their stability is reduced, and the interpretability of the prototypes is lost. In the present article, we propose and compare two extensions of the stochastic SOM for (dis)similarity data: the first takes advantage of the online setting to maintain a sparse representation of the prototypes at each step of the algorithm, while the second uses a dimension reduction in a feature space defined by the (dis)similarity. Our contributions to the analysis of (dis)similarity data with topographic maps are thus twofold. First, we present a new version of the SOM algorithm which ensures a sparse representation of the prototypes through online updates. Second, this approach is compared on several benchmarks to a standard dimension reduction technique (K-PCA), which is itself adapted to large datasets with the Nyström approximation. Results demonstrate that both approaches reduce the dimensionality of the prototypes while providing accurate results in a reasonable computational time. The choice between these two strategies depends on the dataset size, the need to easily interpret the results, and the computational facilities available. The conclusion provides some recommendations to help the user make this choice.
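
To make the quadratic cost and the role of sparsity mentioned in the abstract concrete, the following is a minimal sketch of one online step of a relational SOM, in which each prototype is stored as a vector of coefficients over the observations. It is not taken from the article: the function names, the Gaussian neighbourhood, the learning-rate and radius values, and in particular the `sparsify` rule (keep the largest coefficients up to a given share of the total mass, then renormalise) are assumptions chosen for illustration only.

```python
import numpy as np

def relational_distances(D, beta):
    """Dissimilarity between every observation and every prototype.

    D    : (n, n) symmetric dissimilarity matrix, treated like squared distances.
    beta : (U, n) non-negative prototype coefficients, each row summing to 1.
    Entry (i, u) of the result is (D beta_u)_i - 0.5 * beta_u^T D beta_u.
    """
    Db = D @ beta.T                                  # (n, U); O(n^2 U) if beta is dense
    quad = 0.5 * np.einsum("un,nu->u", beta, Db)     # one scalar per prototype
    return Db - quad

def online_step(D, beta, grid, i, lr=0.5, radius=1.0):
    """One stochastic update for observation i (hypothetical helper).

    For simplicity all distances are recomputed; only row i is used.
    """
    winner = int(np.argmin(relational_distances(D, beta)[i]))
    # Gaussian neighbourhood on the map grid (an assumed choice).
    sq_grid = np.sum((grid - grid[winner]) ** 2, axis=1)
    step = (lr * np.exp(-sq_grid / (2 * radius ** 2)))[:, None]
    target = np.zeros(beta.shape[1])
    target[i] = 1.0
    # Convex update: rows stay non-negative and keep summing to 1.
    return (1 - step) * beta + step * target, winner

def sparsify(beta, mass=0.95):
    """Assumed sparsification rule: per prototype, keep the largest
    coefficients covering at least `mass` of the total, then renormalise."""
    out = np.zeros_like(beta)
    for u, row in enumerate(beta):
        order = np.argsort(row)[::-1]                # largest coefficients first
        csum = np.cumsum(row[order])
        k = int(np.searchsorted(csum, mass * csum[-1])) + 1
        kept = order[:k]
        out[u, kept] = row[kept] / row[kept].sum()
    return out
```

With dense coefficients, the term `D @ beta.T` touches every entry of `D` for every prototype, which is where the quadratic cost comes from; when most coefficients are zero, only the columns of `D` indexed by the non-zero entries are needed, and each prototype can be read as a mixture of a few observations.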
Document type:
Journal article
Neurocomputing, Elsevier, 2017, 225, pp.31-48. 〈10.1016/j.neucom.2016.11.014〉
Cited literature: 56 references

https://hal.archives-ouvertes.fr/hal-01465340
Contributor: Nathalie Villa-Vialaneix
Submitted on: Saturday, 11 February 2017 - 19:06:13
Last modified on: Thursday, 16 February 2017 - 01:07:23
Document(s) archived on: Friday, 12 May 2017 - 12:50:17

Files

mariette_etal_N2016.pdf
Files produced by the author(s)

Citation

Jérôme Mariette, Madalina Olteanu, Nathalie Villa-Vialaneix. Efficient interpretable variants of online SOM for large dissimilarity data. Neurocomputing, Elsevier, 2017, 225, pp.31-48. 〈10.1016/j.neucom.2016.11.014〉. 〈hal-01465340〉

Metrics

Record views

58

File downloads

53