Journal article, BMC Bioinformatics, Year: 2009

Sparse canonical methods for biological data integration: application to a cross-platform study

Abstract

In the context of data integration for systems biology, very few sparse approaches have been proposed so far to select variables in a canonical framework. In this study we propose a canonical mode of a new sparse PLS approach to handle two-block data sets, where the relationship between the two types of variables is known to be symmetric. Sparse PLS has been proposed for either a regression or a canonical mode and includes a built-in procedure to perform variable selection while integrating data. To illustrate the canonical mode approach, we analyzed the NCI60 data sets, where two different platforms (cDNA and Affymetrix chips) were used to study the transcriptome of sixty cancer cell lines. We compare the results with those of two other sparse or related canonical approaches: CCA with Elastic Net penalization (CCA-EN) and Co-Inertia Analysis (CIA). The latter does not include a built-in procedure for variable selection and requires a two-step analysis. We stress the lack of statistical criteria to evaluate canonical methods, which makes biological interpretation crucial for comparing the different gene lists. We propose comprehensive graphical representations of both samples and variables to facilitate interpretation by biologists. We show that sPLS and CCA-EN select highly relevant genes, which enable a detailed understanding of the molecular characteristics of several groups of cell lines. These two approaches were found to yield similar results, although they highlighted the same phenomena with different priorities. On the other hand, CIA tended to select redundant information. These canonical methods seem to be efficient tools to deal with variable selection in the context of high-throughput data integration.
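To make the idea of a sparse PLS in canonical mode more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that sparsity is obtained by soft-thresholding the two loading vectors, that both blocks are column-centered, and that each block is deflated on its own latent variable (the symmetric treatment the canonical mode refers to). The function names, the penalty values `lam_x` and `lam_y`, the all-ones starting vector and the toy data are all illustrative assumptions.

```python
import math

def soft_threshold(vec, lam):
    """Shrink entries toward zero by lam; entries within lam of zero become 0."""
    return [math.copysign(max(abs(a) - lam, 0.0), a) for a in vec]

def normalize(vec):
    """Scale to unit Euclidean norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec] if norm > 0 else vec

def transpose(mat):
    return [list(col) for col in zip(*mat)]

def matvec(mat, vec):
    return [sum(m * a for m, a in zip(row, vec)) for row in mat]

def spls_component(X, Y, lam_x, lam_y, n_iter=50):
    """One pair of sparse loading vectors (u for X, v for Y).

    Alternates between the two loading vectors, soft-thresholding each one.
    Penalties are applied to unit-norm vectors, so they lie in [0, 1).
    """
    Xt, Yt = transpose(X), transpose(Y)
    # Cross-product matrix M = X^T Y (variables of X by variables of Y).
    M = [[sum(a * b for a, b in zip(xc, yc)) for yc in Yt] for xc in Xt]
    Mt = transpose(M)
    v = normalize([1.0] * len(Yt))  # simple start; an SVD start is more usual
    for _ in range(n_iter):
        u = normalize(soft_threshold(normalize(matvec(M, v)), lam_x))
        v = normalize(soft_threshold(normalize(matvec(Mt, u)), lam_y))
    return u, v

def deflate(Z, loading):
    """Canonical-mode deflation: remove from Z its own latent variable."""
    t = matvec(Z, loading)                      # latent variable (scores)
    tt = sum(a * a for a in t)
    c = [sum(a * b for a, b in zip(col, t)) / tt for col in transpose(Z)]
    return [[z - ti * cj for z, cj in zip(row, c)] for row, ti in zip(Z, t)]

# Toy, already centered data: 6 samples, 3 variables per block.
# Only the first variable of each block carries the shared signal.
X = transpose([[-2.5, -1.5, -0.5, 0.5, 1.5, 2.5],
               [0.1, -0.1, 0.1, -0.1, 0.1, -0.1],
               [0.2, 0.0, -0.2, -0.2, 0.0, 0.2]])
Y = transpose([[-5.0, -3.0, -1.0, 1.0, 3.0, 5.0],
               [0.1, 0.1, -0.2, 0.1, -0.2, 0.1],
               [-0.1, 0.2, -0.1, -0.1, 0.2, -0.1]])

u, v = spls_component(X, Y, lam_x=0.1, lam_y=0.1)
X_deflated, Y_deflated = deflate(X, u), deflate(Y, v)
print("selected X variables:", [j for j, a in enumerate(u) if a != 0])
print("selected Y variables:", [j for j, a in enumerate(v) if a != 0])
```

Variables whose loading is shrunk to zero are dropped, so selection happens while the two blocks are being related rather than in a separate second step. Further components would be computed by calling `spls_component` again on the deflated blocks.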
Main file
AppliSPLS.pdf (2.74 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-00323818, version 1 (23-09-2008)

Identifiers

HAL Id: hal-00323818
DOI: 10.1186/1471-2105-10-34

Cite

Kim-Anh Lê Cao, Pascal G.P. Martin, Christèle Robert-Granié, Philippe Besse. Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics, 2009, 10 (january), pp.34. ⟨10.1186/1471-2105-10-34⟩. ⟨hal-00323818⟩