Selecting Representative Instances from Datasets

Hamid Mirisaee (1), Ahlame Chouakria Douzal (1), Alexandre Termier (2)
(2) DREAM - Diagnosing, Recommending Actions and Modelling
Inria Rennes – Bretagne Atlantique, IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE (Data and Knowledge Management)
Abstract: We propose a new, alternative approach to the problem of finding a set of representative objects in large datasets. We first formulate the general Instance Selection Problem (ISP) and then study three variants of it that select instances from different regions of the data: the inner frontier, the central area, and the outer frontier. We discuss solutions to these problems and study their complexity. To illustrate the effectiveness of the proposed techniques, we first use a small synthetic dataset for visualization purposes. We then apply them to the Reuters dataset and show that integrating the instances selected by the ISP techniques provides a good representation of the data and can serve as a complementary approach to state-of-the-art methods. Finally, we examine the quality of the selected objects through a topic-based analysis that shows how well the selected documents cover the topics in the Reuters dataset.
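The abstract does not give the ISP formulation, so the paper's method cannot be reproduced here. To make the idea of "regions of the data" concrete, the sketch below uses a simple stand-in heuristic that is not the authors' technique. It assumes Euclidean points, ranks them by distance to the global centroid, and takes the closest points as a proxy for the central area and the farthest as a proxy for the outer frontier. The function names `centroid` and `select_by_region` are hypothetical, and the inner-frontier variant is not modeled.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of the points (hypothetical helper)."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dim))


def select_by_region(points: Sequence[Point], k: int) -> Tuple[List[int], List[int]]:
    """Return (central, outer) lists of k point indices each.

    Hypothetical illustration, NOT the paper's ISP: points are ranked by
    Euclidean distance to the global centroid. The k closest stand in for
    the central area; the k farthest (farthest first) for the outer frontier.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    c = centroid(points)
    # Stable sort of indices by distance to the centroid (math.dist needs Python 3.8+).
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], c))
    return order[:k], order[-k:][::-1]


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (10.0, 0.0)]
    central, outer = select_by_region(data, 2)
    print("central:", central, "outer:", outer)
```

On the toy data the outlier at (10, 0) pulls the centroid to the right, so (1, 0) and (0, 0) rank as most central and (10, 0) as the outermost point. A centroid-distance ranking is only a rough proxy: it ignores cluster structure and cannot tell an inner frontier from the central area.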
Document type: Conference paper
IEEE International Conference on Data Science and Advanced Analytics (DSAA 2015), Oct 2015, Paris, France. 2015

https://hal.archives-ouvertes.fr/hal-01191805
Contributor: Alexandre Termier
Submitted on: Wednesday, September 2, 2015 - 15:38:29
Last modified on: Wednesday, November 29, 2017 - 15:41:48

Identifiers

  • HAL Id: hal-01191805, version 1

Citation

Hamid Mirisaee, Ahlame Chouakria Douzal, Alexandre Termier. Selecting Representative Instances from Datasets. IEEE International Conference on Data Science and Advanced Analytics (DSAA 2015), Oct 2015, Paris, France. 2015. 〈hal-01191805〉
