Data Augmentation in High Dimensional Low Sample Size Setting Using a Geometry-Based Variational Autoencoder

In this paper, we propose a new method to perform reliable data augmentation in the High Dimensional Low Sample Size (HDLSS) setting using a geometry-based variational autoencoder (VAE). Our approach combines a proper modeling of the VAE latent space, seen as a Riemannian manifold, with a new generation scheme that produces more meaningful samples, especially in the context of small data sets. The proposed method is tested through a wide experimental study in which its robustness to data sets, classifiers and training sample size is stressed. It is also validated on a medical imaging classification task on the challenging ADNI database, where a small number of 3D brain MRIs are considered and augmented using the proposed VAE framework. In each case, the proposed method allows for a significant and reliable gain in the classification metrics. For instance, balanced accuracy jumps from 66.3% to 74.3% for a state-of-the-art CNN classifier trained with 50 MRIs of cognitively normal (CN) and 50 of Alzheimer's disease (AD) patients, and from 77.7% to 86.3% when trained with 243 CN and 210 AD, while greatly improving sensitivity and specificity.

Keywords

Data Augmentation; Variational Autoencoder; Latent space representation

Domains

Machine Learning [stat.ML]

Main file

Data_Augmentation_HDLSS_VAE.pdf (6.12 MB)

Origin: Files produced by the author(s)

Contributor: Clément Chadebec

https://hal.science/hal-03214093

Submitted on: Friday, April 30, 2021, 18:43:15

Last modified on: Thursday, December 14, 2023, 13:50:38

Long-term archiving on: Saturday, July 31, 2021, 19:29:50

Dates and versions

hal-03214093, version 1 (30-04-2021)

Identifiers

HAL Id: hal-03214093, version 1
arXiv: 2105.00026
DOI: 10.1109/TPAMI.2022.3185773

Cite

Clément Chadebec, Elina Thibeau-Sutre, Ninon Burgos, Stéphanie Allassonnière. Data Augmentation in High Dimensional Low Sample Size Setting Using a Geometry-Based Variational Autoencoder. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, ⟨10.1109/TPAMI.2022.3185773⟩. ⟨hal-03214093⟩

Collections

INSERM EPHE CNRS INRIA CORDELIERS ICM INRIA2 GENCI ARAMISLAB PSL SORBONNE-UNIVERSITE SU-MEDECINE SU-SCIENCES UP-SANTE ANR PRAIRIE-IA
