Ranking genome-wide correlation measurements improves microarray and RNA-seq based global and targeted co-expression networks
Journal article, Scientific Reports, Year: 2018


Marc Clastre
Olivier Pichon

Abstract

Co-expression networks are essential tools to infer biological associations between gene products and to predict gene annotation. Global networks can be analyzed at the transcriptome-wide scale or after querying them with a set of guide genes to capture the transcriptional landscape of a given pathway, in a process named Pathway-Level Coexpression (PLC). A critical step in network construction remains the definition of gene co-expression. In the present work, we compared how the Pearson Correlation Coefficient (PCC), the Spearman Correlation Coefficient (SCC), their respective ranked values (Highest Reciprocal Rank, HRR), Mutual Information (MI) and Partial Correlations (PC) performed on global networks and PLCs. This evaluation was conducted on the model plant Arabidopsis thaliana using microarray and differently pre-processed RNA-seq datasets. We particularly evaluated how dataset × distance measurement combinations performed in 5 PLCs corresponding to 4 well-described plant metabolic pathways (phenylpropanoid, carbohydrate, fatty acid and terpene metabolisms) and the cytokinin signaling pathway. Our work highlights that PCC ranked with HRR is better suited than the other distance methods for global network construction and PLC with microarray and RNA-seq data, especially for clustering genes into partitions similar to biological subpathways.

Constructing global gene co-expression networks is a popular approach to highlight transcriptional relationships (edges) between genes (vertices). The 'Guilt-by-Association' (GBA) principle supposes that genes sharing similar functions are preferentially connected. It aims at predicting new functions for proteins by determining how their respective encoding genes are co-expressed with others, using a reference dataset containing known gene functions such as the Gene Ontology (GO) [1]. Defining the edges connecting genes remains a critical step in global co-expression network construction.
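To make the ranking idea concrete, here is a minimal Python sketch that computes PCC for every gene pair, converts the values to HRR, and keeps the best-ranked pairs as edges. It is an illustration and not the authors' pipeline. It assumes the usual definition of HRR as the larger of the two mutual ranks of a gene pair, with rank 1 for the most similar partner, which this text does not spell out. The function names, the toy expression values and the threshold of 2 are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles.
    Assumes neither profile is constant (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def similarity_matrix(expr):
    """All-vs-all PCC for a genes x samples matrix given as a list of rows."""
    n = len(expr)
    return [[pearson(expr[i], expr[j]) for j in range(n)] for i in range(n)]

def partner_ranks(sim):
    """ranks[i][j] = rank of gene j among the partners of gene i
    (1 = most similar; the gene itself is excluded)."""
    n = len(sim)
    ranks = [[None] * n for _ in range(n)]
    for i in range(n):
        partners = sorted((j for j in range(n) if j != i),
                          key=lambda j: -sim[i][j])
        for r, j in enumerate(partners, start=1):
            ranks[i][j] = r
    return ranks

def hrr_matrix(sim):
    """Assumed definition: HRR(i, j) = max(rank of j for i, rank of i for j).
    A lower HRR means a tighter, mutually supported association."""
    ranks = partner_ranks(sim)
    n = len(sim)
    return [[None if i == j else max(ranks[i][j], ranks[j][i])
             for j in range(n)] for i in range(n)]

def edges(hrr, max_rank):
    """Keep gene pairs whose HRR is at or below the chosen threshold."""
    n = len(hrr)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if hrr[i][j] <= max_rank]

# Toy expression matrix: 4 hypothetical genes x 5 samples (made-up values).
expr = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 11],
    [5, 4, 3, 2, 1],
    [1, 3, 2, 5, 4],
]
sim = similarity_matrix(expr)
hrr = hrr_matrix(sim)
network = edges(hrr, max_rank=2)
print(network)
```

With this toy input, gene 2, whose profile is anti-correlated with the other three, ends up without any edge. Because a pair is only retained when each gene ranks highly in the other's partner list, a single cut-off on HRR applies to every gene in the same way, whatever the range of its raw correlation values.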
Expression data (microarray or RNA-seq) are used to construct expression matrices (genes × samples) and to calculate a distance or a similarity for each possible gene pair. The resulting pairwise distance matrix is then thresholded to obtain an adjacency matrix that discriminates relevant edges. Only edges with a distance below (or a similarity above) the set threshold are considered significant and retained for network construction. The procedure is expected to remove biologically irrelevant gene associations while retaining the relevant ones, and it can be assessed with any reference dataset. Alternatively, guide gene sets may be used to extract more human-readable information from large networks in a process named Pathway-Level Coexpression (PLC) [2–7]. This approach aims at capturing the best transcriptional associations of a gene set and at highlighting functional gene groups, such as known subpathways, within this set. There are two types of approaches to determine transcriptional associations of genes: supervised and unsupervised. Supervised approaches, such as regression and machine learning based methods, require prior knowledge that serves as a training dataset to recover biologically relevant gene associations. They are used to infer regulatory networks, i.e. to uncover preferential and sequential interactions of a gene over the others. The superiority of
Main file: 2018 Liesecke.pdf (16.18 MB)
Origin: publication funded by an institution

Dates and versions

hal-01865148, version 1 (31-08-2018)


Cite

Franziska Liesecke, Dimitri Daudu, Rodolphe Dugé de Bernonville, Sébastien Besseau, Marc Clastre, et al. Ranking genome-wide correlation measurements improves microarray and RNA-seq based global and targeted co-expression networks. Scientific Reports, 2018, 8 (1), 10885. ⟨10.1038/s41598-018-29077-3⟩. ⟨hal-01865148⟩

Collections

UNIV-TOURS BBV
