Sparse canonical methods for biological data integration: application to a cross-platform study

In the context of data integration for systems biology, very few sparse approaches have been proposed so far to select variables in a canonical framework. In this study we propose a canonical mode of a new sparse PLS approach to handle two-block data sets, where the relationship between the two types of variables is known to be symmetric. Sparse PLS has been proposed for either a regression or a canonical mode and includes a built-in procedure to perform variable selection while integrating data. To illustrate the canonical mode approach, we analyzed the NCI60 data sets, where two different platforms (cDNA and Affymetrix chips) were used to study the transcriptome of sixty cancer cell lines. We compare the results with those of two other sparse or related canonical approaches: CCA with Elastic Net penalization (CCA-EN) and Co-Inertia Analysis (CIA). The latter does not include a built-in procedure for variable selection and requires a two-step analysis. We stress the lack of statistical criteria to evaluate canonical methods, which makes biological interpretation crucial when comparing the different gene lists. We propose comprehensive graphical representations of both samples and variables to facilitate interpretation by biologists. We show that sPLS and CCA-EN select highly relevant genes, which enable a detailed understanding of the molecular characteristics of several groups of cell lines. These two approaches yielded similar results, although they ranked the same phenomena in a different order of priority. CIA, on the other hand, tended to select redundant information. These canonical methods appear to be efficient tools for variable selection in the context of high-throughput data integration.
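To make the idea of built-in variable selection in a symmetric two-block setting concrete, the following is a minimal sketch of one sparse PLS component, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: it assumes both blocks are column-centred, it obtains sparsity by soft-thresholding the two loading vectors, and it deflates each block on its own latent variable, which is one common way to treat the two blocks symmetrically. The function names, the penalties `lam_x` and `lam_y`, and the toy data are all hypothetical.

```python
import math

def soft_threshold(vec, lam):
    """Shrink entries toward zero by lam; entries with |x| <= lam become zero."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in vec]

def normalize(vec):
    """Scale to unit Euclidean norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def matvec(mat, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]

def transpose(mat):
    return [list(col) for col in zip(*mat)]

def sparse_pls_component(X, Y, lam_x, lam_y, n_iter=100, tol=1e-8):
    """One pair of sparse loading vectors (u for X, v for Y).

    Assumption: X (n x p) and Y (n x q) are column-centred lists of rows.
    lam_x and lam_y are hypothetical penalties on the scale of X^T Y.
    """
    Xt, Yt = transpose(X), transpose(Y)
    # Cross-product matrix M = X^T Y (p x q)
    M = [[sum(a * b for a, b in zip(xc, yc)) for yc in Yt] for xc in Xt]
    Mt = transpose(M)
    v = normalize([1.0] * len(Yt))   # naive start; an SVD start is more usual
    u = normalize(matvec(M, v))
    for _ in range(n_iter):
        u_new = normalize(soft_threshold(matvec(M, v), lam_x))
        v_new = normalize(soft_threshold(matvec(Mt, u_new), lam_y))
        delta = max(abs(a - b) for a, b in zip(u_new + v_new, u + v))
        u, v = u_new, v_new
        if delta < tol:
            break
    return u, v

def deflate(block, score):
    """Remove from a block the part explained by its own latent variable."""
    ss = sum(s * s for s in score)
    loadings = [sum(s * row[j] for s, row in zip(score, block)) / ss
                for j in range(len(block[0]))]
    return [[row[j] - s * loadings[j] for j in range(len(row))]
            for s, row in zip(score, block)]

# Toy data (hypothetical): 4 samples, 3 X-variables, 2 Y-variables.
# Only the first variable of each block carries the shared signal.
X = [[-1.5, 0.1, 0.5], [-0.5, -0.1, -0.5], [0.5, -0.1, 0.5], [1.5, 0.1, -0.5]]
Y = [[-3.0, 0.1], [-1.0, -0.1], [1.0, 0.1], [3.0, -0.1]]

u, v = sparse_pls_component(X, Y, lam_x=2.5, lam_y=1.0)
xi, omega = matvec(X, u), matvec(Y, v)          # latent variables (scores)
X_res, Y_res = deflate(X, xi), deflate(Y, omega)  # symmetric deflation
```

Variables whose loading is exactly zero in `u` or `v` are the ones dropped by the selection step. A further component would be obtained by calling `sparse_pls_component` again on `X_res` and `Y_res`. If the penalties are set too high for the data, every loading is thresholded to zero, so in practice they would have to be tuned.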

Keywords

gene expression; gene network; co-inertia analysis; metabolite; cancer

Domains

Genomics, Transcriptomics and Proteomics [q-bio.GN]

Main file

AppliSPLS.pdf (2.74 MB)

Origin: Files produced by the author(s)

Contributor: Kim-Anh Lê Cao

https://hal.science/hal-00323818

Submitted on: Tuesday, 23 September 2008, 11:54:39

Last modified on: Monday, 20 November 2023, 11:44:19

Long-term archiving on: Monday, 8 October 2012, 13:26:17

Dates and versions

hal-00323818, version 1 (23-09-2008)

Identifiers

HAL Id: hal-00323818, version 1
DOI: 10.1186/1471-2105-10-34
PRODINRA: 5203
WOS: 000264006900001

Cite

Kim-Anh Lê Cao, Pascal G.P. Martin, Christèle Robert-Granié, Philippe Besse. Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics, 2009, 10 (january), pp.34. ⟨10.1186/1471-2105-10-34⟩. ⟨hal-00323818⟩


Collections

UNIV-TLSE2 CNRS INSA-TOULOUSE INRA IMT UT1-CAPITOLE INSA-GROUPE INRAE GENETIQUE_ANIMALE GENPHYSE INRAEOCCITANIETOULOUSE UNIV-UT3 UT3-TOULOUSEINP
