Preventing dataset shift from breaking machine-learning biomarkers

Jérôme Dockès; Gaël Varoquaux; Jean-Baptiste Poline

doi:10.1093/gigascience/giab055

Journal article, GigaScience, Year: 2021

Jérôme Dockès

Role: Author
PersonId : 18975
IdHAL : jerome-dockes

McGill University = Université McGill [Montréal, Canada]

Gaël Varoquaux

Role: Author
PersonId : 5878
IdHAL : gael-varoquaux
ORCID : 0000-0003-1076-5122
IdRef : 126239894

Modelling brain structure, function and variability based on high-field MRI data

Jean-Baptiste Poline

Role: Author

McGill University = Université McGill [Montréal, Canada]

Abstract

Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g. because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning-extracted biomarkers, as well as detection and correction strategies.

Keywords

Dataset shift; domain adaptation; biomarkers; predictive medicine; precision medicine; machine learning

Domains

Machine Learning [cs.LG]; Statistics [math.ST]; Bioinformatics, Systems Biology [q-bio.QM]

Main file

main.pdf (581.81 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-03293375

Submitted on: Tuesday, 20 July 2021, 23:20:16

Last modified on: Wednesday, 3 April 2024, 10:20:13

Long-term archiving on: Thursday, 21 October 2021, 19:06:44

Dates and versions

hal-03293375 , version 1 (20-07-2021)

Identifiers

HAL Id : hal-03293375 , version 1
ARXIV : 2107.09947
DOI : 10.1093/gigascience/giab055

Cite

Jérôme Dockès, Gaël Varoquaux, Jean-Baptiste Poline. Preventing dataset shift from breaking machine-learning biomarkers. GigaScience, inPress, ⟨10.1093/gigascience/giab055⟩. ⟨hal-03293375⟩

Collections

CEA UNIV-RENNES1 INRIA IRISA INRIA2 CEA-UPSAY UR1-MATH-STIC UNIV-PARIS-SACLAY UR1-UFR-ISTIC JOLIOT CEA-DRF NEUROSPIN UNIV-RENNES ANR UR1-MATH-NUM GS-ENGINEERING GS-COMPUTER-SCIENCE GS-LIFE-SCIENCES-HEALTH

221 views

328 downloads
