Parameter-less co-clustering for star-structured heterogeneous data

Dino Ienco (1, 2), Céline Robardet (3), Ruggero Pensa (4), Rosa Meo (4)
1 ADVANSE - ADVanced Analytics for data SciencE
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
3 COMBINING - COMputational BIology and data miNING
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information, Inria Grenoble - Rhône-Alpes
Abstract: Data described by multiple feature sets drawn from heterogeneous domains are increasingly common in real-world applications. Such data represent objects of a certain type, connected to other types of data, the features, so that the overall data schema forms a star structure of inter-relationships. Co-clustering these data requires specifying many parameters, such as the number of clusters for the object dimension and for each feature domain. In this paper we present a novel co-clustering algorithm for heterogeneous star-structured data that is parameter-less: it requires neither the number of row clusters nor the number of column clusters for the given feature spaces. Our approach optimizes Goodman–Kruskal's τ, a measure of cross-association in contingency tables that evaluates the strength of the relationship between two categorical variables. We extend τ to evaluate co-clustering solutions and, in particular, apply it in a higher-dimensional setting. We propose the algorithm CoStar, which optimizes τ by a local search approach. We assess the performance of CoStar on publicly available datasets from the textual and image domains using objective external criteria. The results show that our approach outperforms state-of-the-art methods for the co-clustering of heterogeneous data while remaining computationally efficient.
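To make the measure named in the abstract concrete, here is a minimal sketch of the classical two-variable Goodman–Kruskal τ on a single contingency table. It is not the paper's extension of τ to co-clustering, nor the CoStar algorithm. The function name `goodman_kruskal_tau`, the rows-predict-columns orientation, and the convention of returning 0 when the column variable is constant are assumptions made for illustration.

```python
from typing import Sequence


def goodman_kruskal_tau(table: Sequence[Sequence[float]]) -> float:
    """Classical Goodman-Kruskal tau for predicting the column variable Y
    from the row variable X, given a contingency table of counts.

    Illustrative sketch only; the name and conventions are assumptions.
    """
    total = sum(sum(row) for row in table)
    if total <= 0:
        raise ValueError("contingency table must contain at least one observation")

    n_cols = len(table[0])
    # Marginal probabilities of the column variable Y.
    col_marginals = [sum(row[j] for row in table) / total for j in range(n_cols)]

    # Expected error when guessing Y from its marginals alone (X ignored).
    err_without_x = 1.0 - sum(p * p for p in col_marginals)
    if err_without_x == 0.0:
        # Y takes a single value, so there is no error to reduce.
        # Returning 0.0 here is a convention assumed for this sketch.
        return 0.0

    # sum_i sum_j p_ij^2 / p_i. written with counts: c_ij^2 / (N * r_i).
    explained = 0.0
    for row in table:
        row_total = sum(row)
        if row_total > 0:
            explained += sum(c * c for c in row) / (row_total * total)

    # Expected error when guessing Y from its distribution within each row.
    err_with_x = 1.0 - explained

    # Proportional reduction in prediction error.
    return (err_without_x - err_with_x) / err_without_x


if __name__ == "__main__":
    print(goodman_kruskal_tau([[10, 0], [0, 10]]))  # rows determine columns
    print(goodman_kruskal_tau([[5, 5], [5, 5]]))    # rows carry no information
```

The value lies between 0 and 1: it is 1 when each row determines the column exactly and 0 when knowing the row does not reduce the prediction error. The measure is asymmetric, so swapping rows and columns generally gives a different value.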
Document type: Journal article
Data Mining and Knowledge Discovery, Springer Verlag, 2013, 26 (2), pp.217-254. 〈10.1007/s10618-012-0248-z〉

Cited literature: 32 references

https://hal.archives-ouvertes.fr/hal-00794744
Contributor: Import Ws Irstea
Submitted on: Tuesday, February 26, 2013 - 14:40:35
Last modified on: Thursday, May 24, 2018 - 15:59:25
Document(s) archived on: Sunday, April 2, 2017 - 05:20:13

File

mt2012-pub00036393.pdf
Files produced by the author(s)

Citation

Dino Ienco, Céline Robardet, Ruggero Pensa, Rosa Meo. Parameter-less co-clustering for star-structured heterogeneous data. Data Mining and Knowledge Discovery, Springer Verlag, 2013, 26 (2), pp.217-254. 〈10.1007/s10618-012-0248-z〉. 〈hal-00794744〉

Metrics

Record views

717

File downloads

298