Controlling a confound in predictive models with a test set minimizing its effect

Darya Chyzhyk; Gaël Varoquaux; Bertrand Thirion; Michael Milham

Conference paper, Year: 2018



Darya Chyzhyk

Role: Author

Modelling brain structure, function and variability based on high-field MRI data

Gaël Varoquaux

Role: Author
PersonId: 5878
IdHAL: gael-varoquaux
ORCID: 0000-0003-1076-5122
IdRef: 126239894

Modelling brain structure, function and variability based on high-field MRI data

Bertrand Thirion

Role: Author
PersonId: 833469

Modelling brain structure, function and variability based on high-field MRI data

Michael Milham

Role: Author

Nathan S. Kline Institute for Psychiatric Research

Abstract

Predictive models applied to brain images can extract imaging biomarkers of pathologies or psychological traits. Yet a successful prediction may be driven by a confounding effect that is correlated with the effect of interest. For instance, fluid intelligence is strongly impacted by age, and age is well predicted from brain images; hence a successful prediction of fluid intelligence from brain images might have captured nothing more than a biomarker of aging. Here we introduce a non-parametric approach to control for a confounding effect in a predictive model. It is based on crafting a test set on which the effect of interest is independent from the confounding effect. We name this strategy "anti mutual-information subsampling". We demonstrate the approach on a large sample of resting-state fMRI and psychometric data from healthy aging subjects (n = 608). We show that using a linear model to remove the effect of age from the brain signals ("deconfounding") leads to pessimistic scores, as previously reported. Anti mutual-information subsampling does not require removing from the brain signals the variance shared between aging and fluid intelligence, and hence does not display this pessimistic behavior. In addition, it is non-parametric and hence robust to violations of the linear hypothesis.
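To make the idea of a test set on which the target and the confound are independent more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names `pointwise_mi` and `anti_mi_subsample`, the greedy batch removal, the use of SciPy's Gaussian kernel density estimate, and the synthetic "age" and "fluid intelligence" variables are all assumptions made for illustration. The sketch repeatedly drops the samples whose estimated log density ratio log p(y, z) / (p(y) p(z)) is largest, that is, the samples that contribute most to the estimated dependence between the target y and the confound z.

```python
import numpy as np
from scipy.stats import gaussian_kde


def pointwise_mi(y, z):
    """Per-sample log density ratio log[p(y, z) / (p(y) p(z))].

    Densities are estimated with Gaussian kernel density estimates
    (an illustrative choice, not necessarily the paper's estimator).
    """
    yz = np.vstack([y, z])
    joint = gaussian_kde(yz)(yz)
    marg_y = gaussian_kde(y)(y)
    marg_z = gaussian_kde(z)(z)
    return np.log(joint) - np.log(marg_y) - np.log(marg_z)


def anti_mi_subsample(y, z, n_keep, batch=10):
    """Greedily select n_keep indices on which y and z are less dependent.

    Hypothetical helper: at each step the densities are re-estimated on
    the remaining samples, and the `batch` samples with the largest
    pointwise contribution to the dependence are removed.
    """
    keep = np.arange(len(y))
    while len(keep) > n_keep:
        pmi = pointwise_mi(y[keep], z[keep])
        n_drop = min(batch, len(keep) - n_keep)
        drop = np.argsort(pmi)[-n_drop:]  # positions of the largest values
        keep = np.delete(keep, drop)
    return keep


# Synthetic stand-ins for the confound (age) and the target
# (fluid intelligence); these are simulated, not the study data.
rng = np.random.default_rng(0)
age = rng.normal(size=300)
fluid_iq = 0.8 * age + 0.6 * rng.normal(size=300)

test_idx = anti_mi_subsample(fluid_iq, age, n_keep=100)

r_full = np.corrcoef(fluid_iq, age)[0, 1]
r_sub = np.corrcoef(fluid_iq[test_idx], age[test_idx])[0, 1]
print(f"correlation, all samples:      {r_full:.2f}")
print(f"correlation, selected samples: {r_sub:.2f}")
```

In such a scheme, the predictive model would be trained on the samples left out of `test_idx` and scored on `test_idx`; a prediction that remains accurate on this subset is less likely to be explained by the confound alone. The brain signals themselves are left untouched, which is the contrast with linear deconfounding drawn in the abstract.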

Keywords

biomarkers; predictive models; phenotype; confound; statistical testing; subsampling

Domains

Statistics [math.ST]; Machine Learning [stat.ML]; Medical imaging

Main file

Chyzhyk-PRNI-2018-hal.pdf (508.57 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-01831701

Submitted on: Friday, July 6, 2018, 13:12:44

Last modified on: Wednesday, April 3, 2024, 10:20:13

Long-term archiving on: Tuesday, October 2, 2018, 03:20:31

Dates and versions

hal-01831701, version 1 (06-07-2018)

Identifiers

HAL Id: hal-01831701, version 1

Cite

Darya Chyzhyk, Gaël Varoquaux, Bertrand Thirion, Michael Milham. Controlling a confound in predictive models with a test set minimizing its effect. PRNI 2018 - 8th International Workshop on Pattern Recognition in Neuroimaging, Jun 2018, Singapore, Singapore. pp.1-4. ⟨hal-01831701⟩


Collections

CEA INRIA INRIA2 CEA-UPSAY UNIV-PARIS-SACLAY JOLIOT CEA-DRF NEUROSPIN GS-ENGINEERING GS-COMPUTER-SCIENCE

366 views

887 downloads
