ADESIT: Visualize the Limits of your Data in a Machine Learning Process

Pierre Faure-Giovagnoli; Marie Le Guilly; Jean-Marc Petit; Vasile-Marian Scuturici

doi:10.14778/3476311.3476318

Conference paper, Year: 2021

French title: ADESIT : Visualisez les Limites de vos Données pour l'Apprentissage Supervisé

Pierre Faure-Giovagnoli

Role: Author
PersonId: 737704
IdHAL: pierre-faure-giovagnoli
ORCID: 0000-0003-1739-9444

Compagnie Nationale du Rhône

Base de Données

Marie Le Guilly

Role: Author
PersonId: 17225
IdHAL: marie-le-guilly

Base de Données

Jean-Marc Petit

Role: Author
PersonId: 4224
IdHAL: jean-marc-petit
ORCID: 0000-0002-0015-745X

Base de Données

Vasile-Marian Scuturici

Role: Author
PersonId: 3040
IdHAL: vasile-marian-scuturici
ORCID: 0000-0001-8139-0212
IdRef: 059677465

Base de Données

Abstract

Thanks to the many machine learning tools now available, it is easier than ever to derive a model from a dataset for a supervised learning problem. However, when the model performs worse than expected, the underlying question of whether a satisfactory model exists at all is often overlooked, and one may simply be tempted to try different parameters or another model architecture. This is why the quality of the learning examples should be considered as early as possible: it acts as a go/no-go signal for the potentially costly learning process that follows. With ADESIT, we provide a way to evaluate, through statistics and visual exploration, how well a dataset can support a given supervised learning problem. Notably, we build on recent studies that propose using functional dependencies, and specifically counterexample analysis, to provide dataset cleanliness statistics as well as a theoretical upper bound on the prediction accuracy that is directly linked to the problem settings (measurement uncertainty, expected generalization...). In brief, ADESIT is intended as a go/no-go step right after data selection and right before the machine learning process itself. With further analysis for a given problem, the user can characterize, clean and export dynamically selected subsets, which helps to understand which regions of the data could be refined and where data precision must be improved, for example by using new or more precise sensors.
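
The counterexample idea mentioned in the abstract can be illustrated with a minimal sketch. It is not ADESIT's implementation: it assumes a crisp functional dependency (exact equality on the feature values), whereas the abstract refers to problem settings such as measurement uncertainty that this sketch ignores. The function name `fd_statistics`, the column names and the toy rows are all hypothetical. Rows that share the same features but carry different targets form counterexamples; keeping only the most frequent target in each group gives the largest subset on which the dependency holds, and the fraction of rows kept bounds the accuracy that any deterministic function of the features can reach on this dataset.

```python
from collections import Counter, defaultdict


def fd_statistics(rows, features, target):
    """Counterexample statistics for the crisp dependency features -> target.

    Illustrative only: exact equality is assumed on every attribute.
    """
    if not rows:
        raise ValueError("rows must not be empty")

    # Count target values among rows that agree on all feature values.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[f] for f in features)
        groups[key][row[target]] += 1

    n = len(rows)
    # Rows kept when each group retains only its most frequent target value.
    kept = sum(max(counts.values()) for counts in groups.values())
    # Rows that take part in at least one counterexample pair.
    involved = sum(
        sum(counts.values()) for counts in groups.values() if len(counts) > 1
    )
    return {
        "g3": (n - kept) / n,  # minimum fraction of rows to remove
        "accuracy_upper_bound": kept / n,
        "rows_in_counterexamples": involved,
    }


# Hypothetical toy data: three rows share the same features, one disagrees.
rows = [
    {"temp": 20, "wind": 1, "label": "low"},
    {"temp": 20, "wind": 1, "label": "low"},
    {"temp": 20, "wind": 1, "label": "high"},
    {"temp": 25, "wind": 2, "label": "high"},
]
stats = fd_statistics(rows, ["temp", "wind"], "label")
print(stats)
```

On this toy data one row out of four has to be removed for the dependency to hold, so no deterministic model based on `temp` and `wind` alone can exceed 75% accuracy on these rows. A low bound of this kind is the sort of no-go signal the abstract describes.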

Domains

Databases [cs.DB], Machine Learning [stat.ML], Human-Computer Interaction [cs.HC]

Main file

ADESIT_camReady_round2.pdf (615.51 KB)

Origin: Publisher files allowed on an open archive

https://hal.science/hal-03242380

Submitted on: Thursday, September 2, 2021, 10:06:06

Last modified on: Wednesday, July 5, 2023, 15:28:04

Dates and versions

hal-03242380, version 1 (31-05-2021)

hal-03242380, version 2 (02-09-2021)

Identifiers

HAL Id: hal-03242380, version 2
DOI: 10.14778/3476311.3476318

Cite

Pierre Faure-Giovagnoli, Marie Le Guilly, Jean-Marc Petit, Vasile-Marian Scuturici. ADESIT: Visualize the Limits of your Data in a Machine Learning Process. International Conference on Very Large Data Bases, Aug 2021, Copenhagen, Denmark. ⟨10.14778/3476311.3476318⟩. ⟨hal-03242380v2⟩

Collections

CNRS UNIV-LYON1 UNIV-LYON2 INSA-LYON EC-LYON LIRIS INSA-GROUPE UDL

193 views

152 downloads
