A multimodal variational autoencoder for estimating progression scores from imaging and microRNA data in rare neurodegenerative diseases

Frontotemporal dementia (FTD) is a rare neurodegenerative disease, often of genetic origin, with no effective treatment. It shows substantial pathophysiological overlap with amyotrophic lateral sclerosis (ALS), and mutations in the C9orf72 gene are the most common genetic cause of both disorders. No single biomarker can accurately measure progression in these disorders, so it is crucial to combine complementary information from multiple modalities when evaluating new therapeutic interventions. In particular, neuroimaging and transcriptomic (microRNA) data have been shown to be valuable for tracking FTD and ALS progression. Because these conditions are rare, large samples are not available, hence the need for methods that fuse multimodal data from small samples. In this paper, we propose a method based on variational autoencoders (VAE) for computing a disease progression score (DPS) from cross-sectional multimodal data. We show that unsupervised training leads to the estimation of meaningful latent spaces, where subjects with similar disease states are clustered together and from which a DPS may be inferred. Models were evaluated on 14 patients, 40 presymptomatic mutation carriers and 37 healthy controls from the PREV-DEMALS study. Since there is no ground truth for the DPS, we used the inferred scores to perform pairwise classification as a proxy metric. Presymptomatic subjects and patients were distinguished with an average area under the ROC curve of 0.83 without feature selection and 0.94 with feature selection. The proposed approach has the potential to leverage cross-sectional multimodal datasets with small sample sizes in order to objectively measure disease progression.


INTRODUCTION
Frontotemporal dementia (FTD) is a rare heterogeneous neurodegenerative disease characterized by progressive behavioral changes, executive dysfunction and language impairments. 1 A large proportion of FTD cases are due to genetic mutations, the most frequent being an expansion in the C9orf72 gene. 2,3 C9orf72 expansions are also an important genetic cause of amyotrophic lateral sclerosis (ALS), a motor neuron disease leading to muscle atrophy, progressive weakness and eventual paralysis. 4 These fatal disorders, which may occasionally co-occur in C9orf72-mutated individuals, have no effective treatment to date.
Presymptomatic carriers of the C9orf72 mutation, who have no current clinical symptoms, are an ideal population for the evaluation of new disease-modifying treatments, before any irreversible brain damage has occurred. Previous work demonstrated the importance of neuroimaging 5 and transcriptomic (microRNA) 6 biomarkers for better understanding C9orf72-associated disease progression. However, when analysed independently, neuroimaging and microRNA data provide incomplete views of FTD and ALS. Therefore, in order to monitor the effect of experimental therapies, it is critical to leverage the complementary information provided by these modalities.
Since different biomarkers characterize a disease at different stages, several biomarkers could be combined to represent the entire progression with a single disease progression score (DPS). Many approaches have been developed for data-driven disease progression modeling, including event-based models (EBM), 7, 8 a vertex-wise model of brain pathology fitted with expectation-maximisation, 9 non-linear mixed-effects models, 10, 11 alternating least squares to fit sigmoid functions, 12 Gaussian processes, 13 recurrent neural networks 14 and M-estimation. 15 Most of these approaches require a large amount of longitudinal data, which is not available for FTD/ALS. The only published methods that infer a disease progression score from cross-sectional data are event-based models, 7, 8 but these approaches do not scale well to hundreds of biomarkers, as is the case with microRNA data.
In the present work, we propose a method for inferring a disease progression score (a latent trait) based on a multimodal variational autoencoder (VAE). 16 VAEs are powerful generative models that project data into a regularized, low-dimensional latent space and have been shown to be effective in high-dimensional, low-sample-size settings. 17 These models have already been used with multimodal data, 18 although not with the goal of inferring a DPS. We hypothesized that the inferred score, based on cross-sectional neuroimaging and microRNA data, could represent the distance traveled along the underlying FTD/ALS pathophysiological pathway, and thus be used to monitor disease progression and evaluate novel treatments.

Studied population
Participants were recruited through the PREV-DEMALS study (https://clinicaltrials.gov, ID NCT02590276), a cohort focused on C9orf72 expansion carriers, comprising neuroimaging and microRNA sequencing data. MicroRNAs (miRNAs) are a class of noncoding RNAs that negatively regulate gene expression. 19 They can be detected in blood plasma, and their levels correlate with the progression of many neurodegenerative diseases, 20 including FTD and ALS.
Our study comprised 110 individuals, divided into three groups: 22 symptomatic carriers of a pathogenic expansion (patient group), 45 asymptomatic carriers (presymptomatic group) and 43 asymptomatic non-carriers (control group). Written informed consent was obtained from all participants and the study was approved by the ethics committee (Comité de Protection des Personnes CPP Ile-De-France VI, CPP 68-15 and ID RCB 2015-A00856-43).

Data acquisition and preprocessing
All individuals had transcriptomic data available, consisting of the expression levels of 589 miRNAs. However, only 91 of them (14 patients, 40 presymptomatic carriers and 37 controls) also had neuroimaging data available, consisting of grey matter volumes extracted from anatomical MRI (T1) for 68 cortical regions of interest (ROIs) (Desikan atlas) and 18 subcortical ROIs (Aseg atlas), as well as the estimated total intracranial volume, resulting in 87 imaging features. Details regarding features and population can be found in Ref. 6 and Ref. 5. Subjects were divided into two datasets: 19 subjects with only microRNA data, used as a discovery set for feature selection, and 91 subjects with multimodal neuroimaging and microRNA data, used as input to our models. Features were rescaled to the range from 0 to 1 and ordered via principal component analysis on the transposed data matrix: we projected the features onto the first principal component and sorted them by their coordinate values.
We also conducted experiments with two simulated datasets, based on the real one. To build the simulated data matrices, we increased or decreased each feature value by 5% or 15% for all patients and healthy controls, so as to accentuate the difference between the group means. The presymptomatic participants were left unchanged.
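The rescaling and ordering step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `rescale_and_order` is hypothetical, and the text does not say whether the ordering was computed per modality or on the concatenated features, so the sketch simply operates on whatever matrix it is given.

```python
import numpy as np

def rescale_and_order(X):
    """Min-max rescale each feature, then sort features by their PC1 coordinate.

    X has shape (n_subjects, n_features). Returns the rescaled, reordered
    matrix and the column permutation that was applied.
    """
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant features
    X01 = (X - lo) / span                    # every feature now lies in [0, 1]

    # PCA on the transposed matrix: each feature is one observation,
    # described by its values across subjects.
    F = X01.T                                # (n_features, n_subjects)
    Fc = F - F.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
    coords = Fc @ Vt[0]                      # coordinate of each feature on PC1

    order = np.argsort(coords)
    return X01[:, order], order
```

The sign of a principal component is arbitrary, so the resulting order may come out reversed from one run or library to another; what matters for a 1-dimensional convolution is that features with similar profiles across subjects end up next to each other.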
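One way to read this construction is sketched below. The text does not state how the direction of the shift was chosen for each feature; the sketch assumes that each feature is pushed further in the direction of the existing patient-control mean difference. The function name `simulate_shift` and the label strings are hypothetical.

```python
import numpy as np

def simulate_shift(X, labels, pct):
    """Return a copy of X with the patient/control mean gap widened by `pct`.

    X: (n_subjects, n_features) with non-negative values (e.g. rescaled to [0, 1]).
    labels: array of "patient", "presymptomatic" or "control".
    pct: relative change, e.g. 0.05 or 0.15.
    """
    X = np.asarray(X, dtype=float).copy()
    labels = np.asarray(labels)
    pat = labels == "patient"
    ctl = labels == "control"

    # Assumed rule: +1 where patients are higher on average, -1 where lower.
    direction = np.sign(X[pat].mean(axis=0) - X[ctl].mean(axis=0))

    X[pat] *= 1.0 + pct * direction   # move patients away from controls
    X[ctl] *= 1.0 - pct * direction   # move controls away from patients
    return X                          # presymptomatic rows are untouched
```

With non-negative inputs this never shrinks the absolute difference between the two group means, and values may leave the [0, 1] range unless the data are rescaled again afterwards.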

Multimodal variational autoencoder
In order to build disease progression scores, we propose a multimodal variational autoencoder for estimating a latent space representation. Let x ∈ X represent a set of multimodal data, where each point is a vector with concatenated neuroimaging and microRNA data. A variational autoencoder (VAE) 16 is a generative model which aims to learn the training data distribution using a latent representation model:
$$p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz,$$
where z ∈ Z is a lower dimensional latent variable and $p(z)$ is its prior distribution (commonly a multivariate unit Gaussian). VAEs learn two mappings in the form of neural networks: an encoder $q_\phi(z \mid x)$, which maps data x to its latent representation z, and a decoder $p_\theta(x \mid z)$, which maps from the latent representation z back to the input space. Since the marginal log-likelihood of the data is intractable, VAEs are trained to maximize the variational lower bound of the marginal log-likelihood, known as ELBO (Evidence Lower Bound):
$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{KL}\big[q_\phi(z \mid x)\,\|\,p(z)\big],$$
where $D_{KL}[q_\phi(z \mid x)\,\|\,p(z)]$ is the Kullback-Leibler divergence between the approximate posterior $q_\phi(z \mid x)$ and the prior distribution $p(z)$, and acts as a regularization term.
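For reference, the Kullback-Leibler term has a closed form under the usual VAE parametrization. The expression below assumes a diagonal Gaussian approximate posterior and a unit Gaussian prior; the text names the unit Gaussian prior as the common choice but does not specify the posterior family, so this is an assumption. Here d denotes the latent dimension, and μ_j and σ_j² are the encoder outputs for latent coordinate j.

```latex
q_\phi(z \mid x) = \mathcal{N}\big(z;\ \mu_\phi(x),\ \operatorname{diag}(\sigma_\phi^2(x))\big),
\qquad
p(z) = \mathcal{N}(z;\ 0,\ I)

D_{KL}\big[q_\phi(z \mid x)\,\|\,p(z)\big]
  = \frac{1}{2} \sum_{j=1}^{d} \big( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 \big)
```

The term vanishes exactly when μ_j = 0 and σ_j² = 1 for every coordinate, which is how it pulls the encoded subjects toward the prior.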
Our encoder consisted of a 1-dimensional convolution layer, followed by two fully-connected layers, while the decoder was implemented with two fully-connected layers followed by a 1-dimensional transposed convolutional layer. After each layer, batch normalization 21 was applied for its regularization properties and to avoid vanishing or exploding gradients. The nonlinear activation function was the rectified linear unit (ReLU) $f(x) = \max(0, x)$ in all layers except the decoder's last one, which used a sigmoid function $f(x) = 1/(1 + e^{-x})$ in order to have the output normalized between 0 and 1. The loss function was optimized using Adam. 22 Two slightly different networks were used as our final models. For the experiments with no feature selection (589 miRNAs + 87 neuroimaging features), we found that 64 channels and a kernel of dimension 80 with a stride of 10 was a good parametrization, along with a hidden layer of 400 units and a latent space of dimension 5. For the experiments with the discovery set and feature selection (68 miRNAs + 87 neuroimaging features), we chose 32 channels, a kernel of dimension 20 with a stride of 5, along with a hidden layer of 50 units and a 2-dimensional latent space. The VAEs were implemented in Python 3.8.5 using PyTorch 1.8.1, and trained with batches of 32 subjects for 250 epochs using a learning rate of $10^{-3}$.
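To make the two parametrizations concrete, the short sketch below works out the feature-map sizes they imply. It assumes a single input channel, no padding and no dilation, none of which the text specifies; the formulas are the standard output-length expressions for 1-dimensional convolutions and transposed convolutions, and the helper names are hypothetical.

```python
def conv1d_out_len(n_in, kernel, stride, padding=0):
    """Output length of a 1-D convolution (dilation 1)."""
    return (n_in + 2 * padding - kernel) // stride + 1

def conv_transpose1d_out_len(n_in, kernel, stride, padding=0, output_padding=0):
    """Output length of a 1-D transposed convolution (dilation 1)."""
    return (n_in - 1) * stride - 2 * padding + kernel + output_padding

# Without feature selection: 589 miRNAs + 87 imaging features = 676 inputs.
n_full = 589 + 87
len_full = conv1d_out_len(n_full, kernel=80, stride=10)       # 60 positions
flat_full = 64 * len_full                                     # 3840 values after flattening

# With feature selection: 68 miRNAs + 87 imaging features = 155 inputs.
n_sel = 68 + 87
len_sel = conv1d_out_len(n_sel, kernel=20, stride=5)          # 28 positions
flat_sel = 32 * len_sel                                       # 896 values after flattening

# Mirror-image decoders: the smaller model recovers 155 outputs exactly,
# while the larger one yields 670 and would need 6 extra positions
# (for instance through output padding) to reach 676.
back_full = conv_transpose1d_out_len(len_full, kernel=80, stride=10)
back_sel = conv_transpose1d_out_len(len_sel, kernel=20, stride=5)
```

Under these assumptions, the first fully-connected layer would map 3840 values to 400 hidden units in the larger model and 896 values to 50 hidden units in the smaller one.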

Computing disease progression scores in the latent space
We used a stratified 5-fold cross-validation strategy, training the VAE with four folds and testing with the remaining fold in each iteration. Training was unsupervised: no clinical labels were used. Our hypothesis was that the VAE would identify a meaningful latent space, placing subjects with the same clinical status (and similar disease stage among presymptomatic individuals and patients) closer together.
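A stratified split of this kind can be sketched with NumPy alone. This is an illustrative assumption about the procedure rather than the authors' code: the function name `stratified_folds` is hypothetical, and the clinical labels are used here only to balance the folds, not to train the model.

```python
import numpy as np

def stratified_folds(labels, n_folds=5, seed=0):
    """Assign each subject to a fold so every fold keeps the group proportions."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for group in np.unique(labels):
        # Shuffle the subjects of this group, then deal them out round-robin.
        idx = rng.permutation(np.flatnonzero(labels == group))
        fold_of[idx] = np.arange(len(idx)) % n_folds
    return fold_of

# Group sizes of the multimodal dataset described in the text.
labels = np.array(["patient"] * 14 + ["presymptomatic"] * 40 + ["control"] * 37)
fold_of = stratified_folds(labels)

for k in range(5):
    train_idx = np.flatnonzero(fold_of != k)   # four folds for training
    test_idx = np.flatnonzero(fold_of == k)    # held-out fold for testing
```

With 14 patients spread over five folds, each test fold contains only two or three patients, which helps explain why per-fold metrics can vary noticeably.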
Once each model was trained, we projected the training data into the latent space and used the clinical labels (patient, presymptomatic subject or control) to compute the centroid of each group. We then defined the trajectory through the latent space as the line passing through the centroids of the presymptomatic and the patient groups. Finally, we encoded the test fold in the latent space and computed the DPS for each subject as the coordinate of their projection onto this line.
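The projection step can be written in a few lines. In this sketch the latent codes are assumed to be already available as arrays (for example the encoder means), the origin of the axis is placed at the presymptomatic centroid and the axis points toward the patient centroid; the text fixes the line but not its origin or scale, so those two choices, like the function names, are assumptions.

```python
import numpy as np

def fit_axis(z_train, labels_train):
    """Line through the presymptomatic and patient centroids of the training codes."""
    z_train = np.asarray(z_train, dtype=float)
    labels_train = np.asarray(labels_train)
    c_pre = z_train[labels_train == "presymptomatic"].mean(axis=0)
    c_pat = z_train[labels_train == "patient"].mean(axis=0)
    direction = (c_pat - c_pre) / np.linalg.norm(c_pat - c_pre)  # unit vector
    return c_pre, direction

def disease_progression_score(z_test, origin, direction):
    """Signed coordinate of each test subject's orthogonal projection onto the line."""
    return (np.asarray(z_test, dtype=float) - origin) @ direction

# Toy 2-D latent space: presymptomatic centroid at (0, 0), patient centroid at (2, 0).
z_train = np.array([[-1.0, 0.0], [1.0, 0.0], [2.0, 1.0], [2.0, -1.0]])
labels_train = np.array(["presymptomatic", "presymptomatic", "patient", "patient"])
origin, direction = fit_axis(z_train, labels_train)
scores = disease_progression_score([[1.0, 5.0], [3.0, -1.0]], origin, direction)
```

In the toy example the two test subjects receive scores 1 and 3: only their position along the centroid axis matters, and the component orthogonal to the line is ignored.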
Since there is no ground truth for the DPS, we applied a proxy metric to assess model performance: the inferred scores were used to classify subjects according to their clinical status. Labels were therefore used at test time to compute the area under the receiver operating characteristic curve (ROC AUC), averaged over the five folds.
Figure 1 depicts an example of a two-dimensional latent space obtained after training the VAE with four folds and using the trained model to encode the remaining test fold. In this particular fold, we observe a perfect separation between patients and the other two groups, and a clear (although not perfect) distinction between presymptomatic individuals and controls.
Table 1 displays the mean and standard deviation of the area under the ROC curve obtained after a 5-fold cross-validation, for each pairwise comparison between clinical groups. The models were initially trained without any feature selection, with the expression levels of 589 miRNAs and the grey matter volumes of 87 ROIs. Then, we used 19 subjects as a discovery set to identify the most differentially expressed miRNAs between clinical groups, reducing the dimension of the microRNA data to 68. Classification performance improved when feature selection was applied. Table 1 also shows the results with the two simulated datasets. As expected, performance increases when a dataset with more discriminating features is used as input.
Table 1. Area under the ROC curve (mean ± SD) for each pairwise classification, obtained using the inferred disease progression scores without feature selection, with feature selection and with the two simulated datasets.
Fig. 2 presents the visualization of the inferred scores for all 91 subjects after a 5-fold cross-validation. The scores were computed for each subject when included in a test fold, without or with feature selection in the microRNA data. The separation between groups is better when miRNAs are selected using the discovery set.
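The proxy metric can be computed directly from the scores of two groups. The sketch below uses the rank-based definition of the ROC AUC (the probability that a randomly chosen subject from the more advanced group scores higher than one from the other group, with ties counted as one half); the function name and the toy scores are illustrative and are not taken from the study.

```python
import numpy as np

def roc_auc(scores_pos, scores_neg):
    """ROC AUC for one pairwise comparison, via pairwise score comparisons."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    # Each (positive, negative) pair counts 1 if correctly ordered, 0.5 if tied.
    return float(((pos > neg) + 0.5 * (pos == neg)).mean())

# Toy scores for one test fold (hypothetical values).
auc_fold = roc_auc(scores_pos=[1.0, 3.0], scores_neg=[0.0, 2.0])
```

Here three of the four patient-versus-other pairs are correctly ordered, giving an AUC of 0.75. Repeating the computation on each test fold and averaging gives a mean ± SD summary of the kind reported per pairwise comparison.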

DISCUSSION
We proposed a multimodal variational autoencoder for combining imaging and transcriptomic (microRNA) data. It allowed us to infer a single score representing disease progression, using only cross-sectional neuroimaging and microRNA data from fewer than a hundred subjects. We showed that variational autoencoders built with shallow 1-dimensional convolutional neural networks were able to infer meaningful latent spaces, placing subjects from the same clinical group (patients, presymptomatic individuals and controls) closer together without using any labels during training. We were able to encode individuals from the test sets into the latent spaces and compute their corresponding DPS. Then, using only the computed scores, presymptomatic subjects and patients were distinguished with an average ROC AUC of 0.83 without feature selection and 0.94 with feature selection.
Our experiments with the simulated datasets showed that more informative features lead to better results. In addition, the presented approach is generic enough to be used with datasets from other neurodegenerative diseases, even though our experiments focused only on C9orf72-associated FTD and ALS. Our results therefore motivate further experiments with other neurodegenerative diseases that have well-established biomarkers.
The current study has a limitation: the absence of ground truth for the progression scores, which led us to use classification performance as a proxy metric. Long-term longitudinal data would be needed to confirm the accuracy of the inferred DPS. For instance, we hypothesize that, for presymptomatic subjects, a higher DPS implies an earlier disease onset, and we would need long-term follow-up data to confirm this hypothesis. Future work could explore different network architectures (e.g. 2-dimensional inputs, different numbers of layers, feature maps and kernel sizes), investigate the integration of different prior information to order the input data, and analyze different methods to traverse the latent space.
In summary, our results encourage the use of the proposed approach as a tool to measure disease progression in rare neurodegenerative diseases and evaluate potential treatments.