Reducing the Complexity of Complex Gene Coexpression Networks by Coupling Multiweighted Labeling with Topological Analysis

2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Alfredo Benso ◽  
Paolo Cornale ◽  
Stefano Di Carlo ◽  
Gianfranco Politano ◽  
Alessandro Savino

Undirected gene coexpression networks obtained from experimental expression data, coupled with efficient computational procedures, are increasingly used to identify potentially relevant biological information (e.g., biomarkers) for a particular disease. However, coexpression networks built from experimental expression data are in general large, highly connected networks with an elevated number of false-positive interactions (nodes and edges). To infer relevant information, the network must be properly filtered and its complexity reduced. Given the complexity and the multivariate nature of the information contained in the network, this requires developing and applying efficient feature selection algorithms that exploit the topological characteristics of the network to identify relevant nodes and edges. This paper proposes an efficient multivariate filtering algorithm designed to analyze the topological properties of a coexpression network in order to identify potentially relevant genes for a given disease. The algorithm has been tested on three datasets for three well-known and well-studied diseases: acute myeloid leukemia, breast cancer, and diffuse large B-cell lymphoma. Results have been validated against bibliographic data automatically mined using the ProteinQuest literature mining tool.

2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Kayla A. Johnson ◽  
Arjun Krishnan

Background: Constructing gene coexpression networks is a powerful approach for analyzing high-throughput gene expression data towards module identification, gene function prediction, and disease-gene prioritization. While optimal workflows for constructing coexpression networks, including good choices for data pre-processing, normalization, and network transformation, have been developed for microarray-based expression data, such well-tested choices do not exist for RNA-seq data. Almost all studies that compare data processing and normalization methods for RNA-seq focus on the end goal of determining differential gene expression. Results: Here, we present a comprehensive benchmarking and analysis of 36 different workflows, each with a unique set of normalization and network transformation methods, for constructing coexpression networks from RNA-seq datasets. We test these workflows on both large, homogenous datasets and small, heterogeneous datasets from various labs. We analyze the workflows in terms of aggregate performance, individual method choices, and the impact of multiple dataset experimental factors. Our results demonstrate that between-sample normalization has the biggest impact, with counts adjusted by size factors producing networks that most accurately recapitulate known tissue-naive and tissue-aware gene functional relationships. Conclusions: Based on this work, we provide concrete recommendations on robust procedures for building an accurate coexpression network from an RNA-seq dataset. In addition, researchers can examine all the results in great detail at https://krishnanlab.github.io/RNAseq_coexpression to make appropriate choices for coexpression analysis based on the experimental factors of their RNA-seq dataset.


2021 ◽  
Author(s):  
Javier Pardo-Diaz ◽  
Philip Poole ◽  
Mariano Beguerisse-Diaz ◽  
Charlotte Deane ◽  
Gesine Reinert

Even within well-studied organisms, many genes lack useful functional annotations. One way to generate such functional information is to infer biological relationships between genes or proteins, using a network of gene coexpression data that includes functional annotations. Signed distance correlation has proved useful for the construction of unweighted gene coexpression networks. However, transforming correlation values into unweighted networks may lead to a loss of important biological information related to the intensity of the correlation. Here we introduce a principled method to construct weighted gene coexpression networks using signed distance correlation. These networks contain weighted edges only between those pairs of genes whose correlation value is higher than a given threshold. We analyse data from different organisms and find that networks generated with our method based on signed distance correlation are more stable and capture more biological information compared to networks obtained from Pearson correlation. Moreover, we show that signed distance correlation networks capture more biological information than unweighted networks based on the same metric. While we use biological data sets to illustrate the method, the approach is general and can be used to construct networks in other domains.
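
As an illustration of the thresholded weighted network described in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes that signed distance correlation is the sample distance correlation carrying the sign of the Pearson correlation, and that the threshold is supplied by the user; the function names and the toy expression matrix are hypothetical.

```python
import numpy as np


def _double_centered_distances(x):
    """Pairwise absolute distances of a 1-D sample, double-centered."""
    x = np.asarray(x, dtype=float)
    d = np.abs(x[:, None] - x[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D samples of equal length."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0  # at least one variable is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def signed_distance_correlation(x, y):
    """Assumption: distance correlation with the sign of the Pearson correlation."""
    dcor = distance_correlation(x, y)
    if dcor == 0.0:
        return 0.0
    r = np.corrcoef(x, y)[0, 1]
    return float(np.sign(r)) * dcor


def weighted_network(expr, threshold):
    """Build a symmetric weight matrix from a genes x samples array.

    An edge is kept, with the correlation value as its weight, only when
    the signed distance correlation exceeds the threshold.
    """
    expr = np.asarray(expr, dtype=float)
    n = expr.shape[0]
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            s = signed_distance_correlation(expr[i], expr[j])
            if s > threshold:
                w[i, j] = w[j, i] = s
    return w


# Hypothetical toy data: three genes measured in five samples.
toy_expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
])
toy_net = weighted_network(toy_expr, threshold=0.5)
```

In this toy example, only the first two genes are connected, because the third gene is negatively related to both and falls below the threshold. The pairwise loop is quadratic in the number of genes, so a real dataset would need a vectorized or library-based distance correlation.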


2020 ◽  
Author(s):  
Kayla A Johnson ◽  
Arjun Krishnan

Background: Constructing gene coexpression networks is a powerful approach for analyzing high-throughput gene expression data towards module identification, gene function prediction, and disease-gene prioritization. While optimal workflows for constructing coexpression networks – including good choices for data pre-processing, normalization, and network transformation – have been developed for microarray-based expression data, such well-tested choices do not exist for RNA-seq data. Almost all studies that compare data processing/normalization methods for RNA-seq focus on the end goal of determining differential gene expression. Results: Here, we present a comprehensive benchmarking and analysis of 30 different workflows, each with a unique set of normalization and network transformation methods, for constructing coexpression networks from RNA-seq datasets. We tested these workflows on both large, homogenous datasets (Genotype-Tissue Expression project) and small, heterogeneous datasets from various labs (submitted to the Sequence Read Archive). We analyzed the workflows in terms of aggregate performance, individual method choices, and the impact of multiple dataset experimental factors. Our results demonstrate that between-sample normalization has the biggest impact, with trimmed mean of M-values or upper quartile normalization producing networks that most accurately recapitulate known tissue-naive and tissue-specific gene functional relationships. Conclusions: Based on this work, we provide concrete recommendations on robust procedures for building an accurate coexpression network from an RNA-seq dataset. In addition, researchers can examine all the results in great detail at https://krishnanlab.github.io/norm_for_RNAseq_coexp to make appropriate choices for coexpression analysis based on the experimental factors of their RNA-seq dataset.
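
To make the between-sample normalization step concrete, here is a minimal sketch of upper quartile normalization, one of the two methods named in this abstract. It is not the benchmarked workflow itself. The sketch assumes a genes x samples matrix of raw counts, computes each sample's 75th percentile over genes that are expressed in at least one sample, and rescales by the mean of those quartiles to keep values on a count-like scale; that rescaling convention and the function name are assumptions for illustration.

```python
import numpy as np


def upper_quartile_normalize(counts):
    """Scale each sample (column) by its upper quartile of counts.

    counts: genes x samples array of raw read counts.
    Genes with zero counts in every sample are ignored when the
    quartiles are computed, but they are kept in the output.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = counts.sum(axis=1) > 0
    upper_quartiles = np.percentile(counts[expressed], 75, axis=0)
    if np.any(upper_quartiles == 0):
        raise ValueError("a sample has a zero upper quartile; filter low-count genes first")
    # Assumption: multiply by the mean quartile so values stay count-like.
    return counts / upper_quartiles * upper_quartiles.mean()


# Hypothetical toy data: the second sample was sequenced twice as deeply.
toy_counts = np.array([
    [10, 20],
    [20, 40],
    [30, 60],
    [40, 80],
    [0, 0],
])
toy_normalized = upper_quartile_normalize(toy_counts)
```

After normalization the two toy samples have identical profiles, which is the intended effect of removing a pure sequencing-depth difference before computing gene-gene correlations.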


Author(s):  
Javier Pardo-Diaz ◽  
Lyuba V. Bozhilova ◽  
Mariano Beguerisse-Díaz ◽  
Philip S. Poole ◽  
Charlotte M. Deane ◽  
...  

Motivation: Even within well studied organisms, many genes lack useful functional annotations. One way to generate such functional information is to infer biological relationships between genes/proteins, using a network of gene coexpression data that includes functional annotations. However, the lack of trustworthy functional annotations can impede the validation of such networks. Hence, there is a need for a principled method to construct gene coexpression networks that capture biological information and are structurally stable even in the absence of functional information. Results: We introduce the concept of signed distance correlation as a measure of dependency between two variables, and apply it to generate gene coexpression networks. Distance correlation offers a more intuitive approach to network construction than commonly used methods such as Pearson correlation. We propose a framework to generate self-consistent networks using signed distance correlation purely from gene expression data, with no additional information. We analyse data from three different organisms to illustrate how networks generated with our method are more stable and capture more biological information compared to networks obtained from Pearson or Spearman correlations. Code availability: https://github.com/javier-pardodiaz/sdcorGCN.


Author(s):  
Javier Pardo-Diaz ◽  
Lyuba V Bozhilova ◽  
Mariano Beguerisse-Díaz ◽  
Philip S Poole ◽  
Charlotte M Deane ◽  
...  

Motivation: Even within well studied organisms, many genes lack useful functional annotations. One way to generate such functional information is to infer biological relationships between genes/proteins, using a network of gene coexpression data that includes functional annotations. However, the lack of trustworthy functional annotations can impede the validation of such networks. Hence, there is a need for a principled method to construct gene coexpression networks that capture biological information and are structurally stable even in the absence of functional information. Results: We introduce the concept of signed distance correlation as a measure of dependency between two variables, and apply it to generate gene coexpression networks. Distance correlation offers a more intuitive approach to network construction than commonly used methods such as Pearson correlation and mutual information. We propose a framework to generate self-consistent networks using signed distance correlation purely from gene expression data, with no additional information. We analyse data from three different organisms to illustrate how networks generated with our method are more stable and capture more biological information compared to networks obtained from Pearson correlation or mutual information. Supplementary information: Supplementary Information and code are available at Bioinformatics and https://github.com/javier-pardodiaz/sdcorGCN online.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Pooja Singh ◽  
Ehsan Pashay Ahi ◽  
Christian Sturmbauer

Background: The oral and pharyngeal jaws of cichlid fishes are a classic example of evolutionary modularity, as their functional decoupling boosted trophic diversification and contributed to the success of cichlid adaptive radiations. Most studies until now have focused on the functional, morphological, or genetic aspects of cichlid jaw modularity. Here we extend this concept to include transcriptional modularity by sequencing whole transcriptomes of the two jaws and comparing their gene coexpression networks. Results: We show that transcriptional decoupling of gene expression underlies the functional decoupling of the cichlid oral and pharyngeal jaw apparatus and that the two units are evolving independently in recently diverged cichlid species from Lake Tanganyika. Oral and pharyngeal jaw coexpression networks reflect the common origin of the jaw regulatory program, as there is high preservation of gene coexpression modules between the two sets of jaws. However, there is substantial rewiring of genetic architecture within those modules. We define a global jaw coexpression network and highlight jaw-specific and species-specific modules within it. Furthermore, we annotate a comprehensive in silico gene regulatory network linking the Wnt and AHR signalling pathways to jaw morphogenesis and response to environmental cues, respectively. Components of these pathways are significantly differentially expressed between the oral and pharyngeal jaw apparatus. Conclusion: This study describes the concerted expression of many genes in the cichlid oral and pharyngeal jaw apparatus at the onset of the independent life of cichlid fishes. Our findings suggest that, on the basis of an ancestral gill arch network, transcriptional rewiring may have driven the modular evolution of the oral and pharyngeal jaws, highlighting the evolutionary significance of gene network reuse. The gene coexpression and in silico regulatory networks presented here are intended as a resource for future studies on the genetics of vertebrate jaw morphogenesis and trophic adaptation.


Author(s):  
Fabricio Almeida-Silva ◽  
Kanhu C Moharana ◽  
Thiago M Venancio

In the past decade, over 3000 samples of soybean transcriptomic data have accumulated in public repositories. Here, we review the state of the art in soybean transcriptomics, highlighting the major microarray and RNA-seq studies that investigated soybean transcriptional programs in different tissues and conditions. Further, we propose approaches for integrating such big data using gene coexpression networks and outline important web resources that may facilitate soybean data acquisition and analysis, contributing to the acceleration of soybean breeding and functional genomics research.

