A strategy for multimodal data integration: application to biomarkers identification in spinocerebellar ataxia

Abstract: The growing number of modalities (e.g. multi-omics, imaging and clinical data) characterizing a given disease provides physicians and statisticians with complementary facets of the disease process, but also emphasizes the need for novel statistical methods able to unify these views. Such data sets are intrinsically structured in blocks, where each block represents a set of variables observed on the same group of individuals. Classical statistical tools therefore cannot be applied without altering this organization, at the risk of losing information. Regularized generalized canonical correlation analysis (RGCCA) and its sparse counterpart, sparse generalized canonical correlation analysis (SGCCA), are component-based methods for the exploratory analysis of data sets structured in blocks of variables. Rather than operating sequentially on parts of the measurements, the RGCCA/SGCCA-based integrative analysis aims to summarize the relevant information between and within the blocks. It incorporates a priori information defining which blocks are supposed to be linked to one another, thus reflecting hypotheses about the biology underlying the data blocks. It also requires setting extra parameters that must be carefully tuned. Here, we provide practical guidelines for the use of RGCCA/SGCCA. We also illustrate the flexibility and usefulness of RGCCA/SGCCA on a unique cohort of patients with four genetic subtypes of spinocerebellar ataxia, for whom we obtained multiple data sets from brain volumetry, magnetic resonance spectroscopy, and metabolomic and lipidomic analyses. As a first step toward the extraction of multimodal biomarkers, we identified possible markers of disease progression by reducing the data to a few meaningful components and visualizing the relevant variables.
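To make the roles of the a priori block links and of the extra tuning parameters more concrete, here is a minimal sketch of one-component RGCCA with the Horst scheme. It is not the implementation used in the article. It assumes that every variable is centred and standardised, that each block is down-weighted by sqrt(p_j) (a common convention, assumed here), that a symmetric 0/1 design matrix C with zero diagonal encodes which blocks are hypothesised to be linked (c_jk = 1), and that one shrinkage parameter tau_j in [0, 1] is given per block. The criterion maximised is the sum over linked pairs of cov(X_j a_j, X_k a_k), subject to (1 - tau_j) var(X_j a_j) + tau_j ||a_j||^2 = 1. Each block update maximises this criterion exactly in a_j with the other blocks held fixed, so the criterion never decreases. The function name rgcca_horst and the simulated three-block data are hypothetical illustrations. The sparse variant (SGCCA), which adds an L1-norm constraint on a_j, is not shown.

```python
import numpy as np


def rgcca_horst(blocks, C, tau, max_iter=500, tol=1e-12, seed=0):
    """One-component RGCCA sketch with the Horst scheme, g(x) = x.

    blocks : list of (n, p_j) arrays measured on the same n individuals.
    C      : (J, J) symmetric 0/1 design matrix with zero diagonal;
             C[j][k] = 1 when blocks j and k are assumed to be linked.
             Every block is assumed to be linked to at least one other.
    tau    : one shrinkage parameter per block, each in [0, 1].
    Returns block weights a_j, block components y_j = X_j a_j and the
    criterion value after each sweep over the blocks.
    """
    C = np.asarray(C, dtype=float)
    # Centre and standardise every variable, then down-weight each block
    # by sqrt(p_j) so that large blocks do not dominate (assumed convention).
    X = []
    for B in blocks:
        B = np.asarray(B, dtype=float)
        B = (B - B.mean(axis=0)) / B.std(axis=0)
        X.append(B / np.sqrt(B.shape[1]))
    n, J = X[0].shape[0], len(X)

    # Constraint matrices M_j = tau_j I + (1 - tau_j) X_j^T X_j / n, so that
    # a_j^T M_j a_j = 1 interpolates between ||a_j|| = 1 and var(X_j a_j) = 1.
    M = [t * np.eye(Xj.shape[1]) + (1 - t) * Xj.T @ Xj / n
         for Xj, t in zip(X, tau)]
    M_inv = [np.linalg.inv(Mj) for Mj in M]

    # Random initial weights, rescaled to satisfy the constraint.
    rng = np.random.default_rng(seed)
    a = []
    for Mj in M:
        w = rng.standard_normal(Mj.shape[0])
        a.append(w / np.sqrt(w @ Mj @ w))
    y = [Xj @ aj for Xj, aj in zip(X, a)]

    def criterion():
        # Sum over linked pairs of cov(y_j, y_k); each pair is counted twice.
        return sum(C[j, k] * (y[j] @ y[k]) / n
                   for j in range(J) for k in range(J))

    history = [criterion()]
    for _ in range(max_iter):
        for j in range(J):
            # Inner component: sum of the components of the linked blocks.
            z = sum(C[j, k] * y[k] for k in range(J) if k != j)
            v = X[j].T @ z
            # Outer weight: maximiser of a^T v subject to a^T M_j a = 1.
            a[j] = M_inv[j] @ v / np.sqrt(v @ M_inv[j] @ v)
            y[j] = X[j] @ a[j]
        history.append(criterion())
        if history[-1] - history[-2] < tol:
            break
    return a, y, history


# Usage on simulated data: three blocks sharing one latent factor.
rng = np.random.default_rng(1)
latent = rng.standard_normal(100)
blocks = [latent[:, None] + 0.5 * rng.standard_normal((100, p))
          for p in (5, 8, 3)]
C = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # every block linked to every other
a, y, hist = rgcca_horst(blocks, C, tau=[1.0, 0.5, 1.0])
```

With tau_j = 1 the constraint reduces to ||a_j|| = 1 (a covariance-based, PLS-like solution), while tau_j = 0 gives var(X_j a_j) = 1 (a correlation-based solution that requires X_j^T X_j to be invertible). Intermediate values trade off between the two, which is one reason these parameters need careful tuning.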
Document type: Journal article

Cited literature: 63 references

https://hal.archives-ouvertes.fr/hal-01630727
Contributor: Arthur Tenenhaus
Submitted on: Wednesday, November 8, 2017 - 11:41:14 AM
Last modification on: Wednesday, July 10, 2019 - 7:18:03 PM

Citation

Imene Garali, Isaac M. Adanyeguh, Farid Ichou, Vincent Perlbarg, Alexandre Seyer, et al. A strategy for multimodal data integration: application to biomarkers identification in spinocerebellar ataxia. Briefings in Bioinformatics, Oxford University Press (OUP), 2017, ⟨10.1093/bib/bbx060⟩. ⟨hal-01630727⟩
