Efficient interpretable variants of online SOM for large dissimilarity data

Jérôme J. Mariette; Madalina Olteanu; Nathalie Vialaneix

doi:10.1016/j.neucom.2016.11.014

Journal article, Neurocomputing, Year: 2017

Jérôme J. Mariette

Role: Author
PersonId : 14395
IdHAL : jerome-mariette
ORCID : 0000-0002-6161-4044
IdRef : 232567999

Unité de Mathématiques et Informatique Appliquées de Toulouse

Madalina Olteanu

Role: Author

Statistique, Analyse et Modélisation Multidisciplinaire (SAmos-Marin Mersenne)

Nathalie Vialaneix

Role: Author
PersonId : 4221
IdHAL : nathalie-vialaneix
ORCID : 0000-0003-1156-0639
IdRef : 101680503

Unité de Mathématiques et Informatique Appliquées de Toulouse

Abstract

Self-organizing maps (SOM) are a useful tool for exploring data. In its original version, the SOM algorithm was designed for numerical vectors. Since then, several extensions have been proposed to handle complex datasets described by (dis)similarities. Most of these extensions represent prototypes by a list of (dis)similarities with the entire dataset and suffer from several drawbacks: their complexity increases (it becomes quadratic instead of linear), their stability is reduced, and the interpretability of the prototypes is lost. In the present article, we propose and compare two extensions of the stochastic SOM for (dis)similarity data: the first one takes advantage of the online setting in order to maintain a sparse representation of the prototypes at each step of the algorithm, while the second one uses a dimension reduction in a feature space defined by the (dis)similarity. Our contributions to the analysis of (dis)similarity data with topographic maps are thus twofold. First, we present a new version of the SOM algorithm which ensures a sparse representation of the prototypes through online updates. Second, this approach is compared on several benchmarks to a standard dimension reduction technique (K-PCA), which is itself adapted to large datasets with the Nyström approximation. Results demonstrate that both approaches reduce the dimensionality of the prototypes while providing accurate results in a reasonable computational time. Selecting one of these two strategies depends on the dataset size, the need to easily interpret the results, and the computational facilities available. The conclusion offers some recommendations to help the user make this choice.
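To make the first idea in the abstract more concrete, the following is a minimal sketch of an online SOM for dissimilarity data in which each prototype is stored as a convex combination of the observations and is kept sparse after every update. It is an illustration under stated assumptions, not the authors' implementation: the function names (`sparsify`, `online_relational_som`), the parameter `keep`, the learning-rate and neighbourhood schedules, and the rule "keep the largest coefficients, then renormalize" are all hypothetical choices, since the abstract does not specify the exact sparsification rule. The sketch also does not exploit the sparsity to speed up the distance computation.

```python
import numpy as np


def sparsify(beta, keep):
    """Keep the `keep` largest coefficients in each row and renormalize.

    Assumed rule for illustration only; requires 1 <= keep <= beta.shape[1].
    """
    out = beta.copy()
    # Column indices of the (n - keep) smallest coefficients in each row.
    drop = np.argpartition(out, -keep, axis=1)[:, :-keep]
    np.put_along_axis(out, drop, 0.0, axis=1)
    # Renormalize so that each prototype remains a convex combination.
    return out / out.sum(axis=1, keepdims=True)


def online_relational_som(D, grid_shape=(3, 3), n_iter=500, keep=None, seed=0):
    """Online SOM on a symmetric (n, n) matrix D of squared dissimilarities.

    Returns beta of shape (n_units, n): row u holds the coefficients of
    prototype u over the observations (non-negative, summing to 1).
    """
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    rows, cols = grid_shape
    # Grid coordinates of the map units, used by the neighbourhood function.
    units = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    grid_d2 = ((units[:, None, :] - units[None, :, :]) ** 2).sum(axis=2)

    # Random initial prototypes, normalized to convex combinations.
    beta = rng.random((len(units), n))
    beta /= beta.sum(axis=1, keepdims=True)

    for t in range(n_iter):
        frac = t / n_iter
        mu = 0.5 * (1.0 - frac) + 0.01                       # assumed learning rate
        sigma = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5   # assumed radius

        i = rng.integers(n)  # draw one observation at random (online setting)

        # Dissimilarity between observation i and prototype u:
        # (D beta_u)_i - 0.5 * beta_u^T D beta_u
        Db = beta @ D  # row u equals D beta_u because D is symmetric
        dist = Db[:, i] - 0.5 * np.einsum("un,un->u", Db, beta)
        bmu = int(np.argmin(dist))

        # Move every prototype towards observation i, weighted by its
        # grid distance to the best matching unit.
        h = np.exp(-grid_d2[bmu] / (2.0 * sigma ** 2))
        step = (mu * h)[:, None]
        one_hot = np.zeros(n)
        one_hot[i] = 1.0
        beta = (1.0 - step) * beta + step * one_hot

        if keep is not None:
            beta = sparsify(beta, keep)

    return beta
```

A call such as `online_relational_som(D, grid_shape=(2, 2), keep=5)` returns one row of coefficients per map unit, each with at most five non-zero entries. Because the non-zero entries point at actual observations, a prototype can be read as a small weighted set of data points, which is the interpretability argument made in the abstract.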

Keywords

SOM, Sparse methods, Kernel, dissimilarity, K-PCA, Nyström

Domains

Applications [stat.AP]

Main file

mariette_etal_N2016.pdf (1.07 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-01465340

Submitted on: Saturday, February 11, 2017, 19:06:13

Last modified on: Tuesday, March 12, 2024, 10:45:50

Long-term archiving on: Friday, May 12, 2017, 12:50:17

Dates and versions

hal-01465340 , version 1 (11-02-2017)

Identifiers

HAL Id : hal-01465340 , version 1
DOI : 10.1016/j.neucom.2016.11.014
PRODINRA : 379556
WOS : 000392164400004

Cite

Jérôme J. Mariette, Madalina Olteanu, Nathalie Vialaneix. Efficient interpretable variants of online SOM for large dissimilarity data. Neurocomputing, 2017, 225, pp.31-48. ⟨10.1016/j.neucom.2016.11.014⟩. ⟨hal-01465340⟩

Collections

UNIV-PARIS1 INRA SAMOS SAMM INRAE INRAEOCCITANIETOULOUSE MATHNUM MIAT

100 views

172 downloads
