Séminaire Probabilités et Statistiques
Data Augmentation in High Dimensional Low Sample Size Setting Using a Geometry-Based Variational Autoencoder
09
June 2022
logo_team
Intervenant : Stéphanie Allassonnière
Institution : Université de Paris, Polytechnique
Heure : 15h45 - 16h45
Lieu : 3L15

In this work, we propose a new method to perform data augmentation in a reliable way in the High Dimensional Low Sample Size (HDLSS) setting using a geometry-based variational autoencoder (VAE). Our approach combines the proposal of 1) a new VAE model, the latent space of which is modeled as a Riemannian manifold and which combines both Riemannian metric learning and normalizing flows and 2) a new generation scheme which produces more meaningful samples especially in the context of small data sets. The method is tested through a wide experimental study where its robustness to data sets, classifiers and training samples size is stressed. It is also validated on a medical imaging classification task on the challenging ADNI database where a small number of 3D brain magnetic resonance images (MRIs) are considered and augmented using the proposed VAE framework. In each case, the proposed method allows for a significant and reliable gain in the classification metrics. For instance, balanced accuracy jumps from 66.3% to 74.3% for a state-of-the-art convolutional neural network classifier trained with 50 MRIs of cognitively normal (CN) and 50 Alzheimer disease (AD) patients and from 77.7% to 86.3% when trained with 243 CN and 210 AD while improving greatly sensitivity and specificity metrics.

All (past and future) events