Robust gene coexpression networks using signed distance correlation

Author(s): Javier Pardo-Diaz, Lyuba V Bozhilova, Mariano Beguerisse-Díaz, Philip S Poole, Charlotte M Deane, ...

Abstract
Motivation: Even within well-studied organisms, many genes lack useful functional annotations. One way to generate such functional information is to infer biological relationships between genes or proteins, using a network of gene coexpression data that includes functional annotations. However, the lack of trustworthy functional annotations can impede the validation of such networks. Hence, there is a need for a principled method to construct gene coexpression networks that capture biological information and are structurally stable even in the absence of functional information.
Results: We introduce the concept of signed distance correlation as a measure of dependency between two variables, and apply it to generate gene coexpression networks. Distance correlation offers a more intuitive approach to network construction than commonly used methods such as Pearson correlation and mutual information. We propose a framework to generate self-consistent networks using signed distance correlation purely from gene expression data, with no additional information. We analyse data from three different organisms to illustrate how networks generated with our method are more stable and capture more biological information than networks obtained from Pearson correlation or mutual information.
Supplementary information: Supplementary Information and code are available at Bioinformatics online and at https://github.com/javier-pardodiaz/sdcorGCN.
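A minimal pure-Python sketch of signed distance correlation for two expression profiles follows. It assumes a definition in which the standard sample distance correlation (built from double-centred pairwise distance matrices) is given the sign of the Pearson correlation; the function names are hypothetical and this is not the sdcorGCN code. Memory grows quadratically with the number of samples, so it is meant only as an illustration.

```python
import math
from typing import List, Sequence


def _double_centred_distances(x: Sequence[float]) -> List[List[float]]:
    """Pairwise |x_j - x_k| distances, double-centred by row/column/grand means."""
    n = len(x)
    d = [[abs(x[j] - x[k]) for k in range(n)] for j in range(n)]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[j][k] - row_means[j] - row_means[k] + grand_mean
             for k in range(n)] for j in range(n)]


def distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample distance correlation in [0, 1]; 0 if either profile is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    a, b = _double_centred_distances(x), _double_centred_distances(y)
    pairs = [(j, k) for j in range(n) for k in range(n)]
    dcov2_xy = sum(a[j][k] * b[j][k] for j, k in pairs) / n ** 2
    dvar2_x = sum(a[j][k] ** 2 for j, k in pairs) / n ** 2
    dvar2_y = sum(b[j][k] ** 2 for j, k in pairs) / n ** 2
    if dvar2_x == 0 or dvar2_y == 0:
        return 0.0
    # Clamp tiny negative round-off before taking the square root.
    return math.sqrt(max(dcov2_xy, 0.0) / math.sqrt(dvar2_x * dvar2_y))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def signed_distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Distance correlation carrying the sign of Pearson's r (assumed definition)."""
    sign = -1.0 if pearson(x, y) < 0 else 1.0  # r == 0 is treated as positive
    return sign * distance_correlation(x, y)
```

A perfectly decreasing linear pair such as `[1, 2, 3, 4, 5]` and `[9, 7, 5, 3, 1]` gives -1 up to floating-point rounding, while a symmetric quadratic pattern gives a value strictly between 0 and 1 even though its Pearson correlation is 0.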

Author(s): Javier Pardo-Diaz, Lyuba V. Bozhilova, Mariano Beguerisse-Díaz, Philip S. Poole, Charlotte M. Deane, ...

Abstract
Motivation: Even within well-studied organisms, many genes lack useful functional annotations. One way to generate such functional information is to infer biological relationships between genes or proteins, using a network of gene coexpression data that includes functional annotations. However, the lack of trustworthy functional annotations can impede the validation of such networks. Hence, there is a need for a principled method to construct gene coexpression networks that capture biological information and are structurally stable even in the absence of functional information.
Results: We introduce the concept of signed distance correlation as a measure of dependency between two variables, and apply it to generate gene coexpression networks. Distance correlation offers a more intuitive approach to network construction than commonly used methods such as Pearson correlation. We propose a framework to generate self-consistent networks using signed distance correlation purely from gene expression data, with no additional information. We analyse data from three different organisms to illustrate how networks generated with our method are more stable and capture more biological information than networks obtained from Pearson or Spearman correlations.
Code availability: https://github.com/javier-pardodiaz/sdcorGCN.


2021
Author(s): Javier Pardo-Diaz, Philip Poole, Mariano Beguerisse-Diaz, Charlotte Deane, Gesine Reinert

Even within well-studied organisms, many genes lack useful functional annotations. One way to generate such functional information is to infer biological relationships between genes or proteins, using a network of gene coexpression data that includes functional annotations. Signed distance correlation has proved useful for the construction of unweighted gene coexpression networks. However, transforming correlation values into unweighted networks may lead to a loss of important biological information related to the strength of the correlation. Here we introduce a principled method to construct weighted gene coexpression networks using signed distance correlation. These networks contain weighted edges only between those pairs of genes whose correlation value is higher than a given threshold. We analyse data from different organisms and find that networks generated with our method based on signed distance correlation are more stable and capture more biological information than networks obtained from Pearson correlation. Moreover, we show that weighted signed distance correlation networks capture more biological information than unweighted networks based on the same metric. While we use biological data sets to illustrate the method, the approach is general and can be used to construct networks in other domains.
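The thresholding step described above can be sketched as follows. This is a hypothetical illustration: the function name `weighted_network`, the strict `>` comparison, and Pearson correlation as the default `corr` are assumptions. Any pairwise measure, such as signed distance correlation, can be passed in instead, and how the threshold itself is chosen is not shown.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def weighted_network(
    expr: Dict[str, Sequence[float]],
    threshold: float,
    corr: Callable[[Sequence[float], Sequence[float]], float] = pearson,
) -> List[Tuple[str, str, float]]:
    """Return (gene1, gene2, weight) edges whose correlation exceeds threshold.

    Unlike an unweighted network, the correlation value is kept as the edge
    weight instead of being collapsed to a 0/1 adjacency entry.
    """
    edges = []
    # Visit each unordered gene pair once, in a deterministic order.
    for g1, g2 in combinations(sorted(expr), 2):
        w = corr(expr[g1], expr[g2])
        if w > threshold:
            edges.append((g1, g2, w))
    return edges
```

For example, with profiles `geneA = [1, 2, 3, 4]` and `geneB = [2, 4, 6, 8]`, a threshold of 0.9 keeps the single edge `("geneA", "geneB", 1.0)`, whereas an unweighted network would record only that the edge exists.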


2013, Vol 2013, pp. 1-9
Author(s): Alfredo Benso, Paolo Cornale, Stefano Di Carlo, Gianfranco Politano, Alessandro Savino

Undirected gene coexpression networks obtained from experimental expression data coupled with efficient computational procedures are increasingly used to identify potentially relevant biological information (e.g., biomarkers) for a particular disease. However, coexpression networks built from experimental expression data are in general large, highly connected networks with a high number of false-positive interactions (nodes and edges). In order to infer relevant information, the network must be properly filtered and its complexity reduced. Given the complexity and the multivariate nature of the information contained in the network, this requires efficient feature selection algorithms that exploit the topological characteristics of the network to identify relevant nodes and edges. This paper proposes an efficient multivariate filtering algorithm designed to analyze the topological properties of a coexpression network in order to identify potentially relevant genes for a given disease. The algorithm has been tested on three datasets for three well-known and well-studied diseases: acute myeloid leukemia, breast cancer, and diffuse large B-cell lymphoma. Results were validated against bibliographic data automatically mined with the ProteinQuest literature mining tool.
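As a much simpler stand-in for topology-based filtering (not the multivariate algorithm proposed in the paper), the sketch below ranks genes by their degree in an unweighted coexpression network and keeps the subnetwork induced by the top `k` genes. The function names, the choice of degree as the only topological feature, and the alphabetical tie-breaking are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def top_degree_genes(edges: Iterable[Edge], k: int) -> List[str]:
    """Return the k genes with the most coexpression partners."""
    degree: Dict[str, int] = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Highest degree first; ties broken alphabetically for reproducibility.
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    return ranked[:k]


def induced_subnetwork(edges: Iterable[Edge], keep: Iterable[str]) -> List[Edge]:
    """Keep only edges whose two endpoints are both retained genes."""
    kept = set(keep)
    return [(u, v) for u, v in edges if u in kept and v in kept]
```

Degree alone is univariate: it scores each gene independently, which is exactly the limitation that a multivariate filter is meant to overcome.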


2015, Vol 2015, pp. 1-10
Author(s): Luman Wang, Qiaochu Mo, Jianxin Wang

Most current gene coexpression databases support analysis of the linear correlation between gene pairs but not of their nonlinear correlation, which hinders precise evaluation of gene-gene coexpression strength. Here, we report a new database, MIrExpress, which uses information theory together with the Pearson linear correlation method to measure the linear correlation, the nonlinear correlation, and their hybrid for cell-specific gene coexpression in immune cells. For a given gene pair or probe set pair entered by a web user, both mutual information (MI) and the Pearson correlation coefficient (r) are calculated, and several corresponding values are reported to reflect the nature of their coexpression correlation, including the MI and r values, their respective rank orderings, their rank comparison, and their hybrid correlation value. Furthermore, for a given gene, the 10 genes most relevant to it are displayed from the MI, r, or hybrid perspective. Currently, the database includes 16 human cell groups, involving 20,283 human genes. The expression data and the calculated correlation results from the database are interactively accessible on the web page and can be used for other related applications and research.
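For a single gene pair, MI and r can be computed side by side as in the sketch below. It assumes an equal-width binning (histogram plug-in) estimator for MI, reported in nats, with a hypothetical default of 4 bins; the estimator used by MIrExpress and its hybrid score are not specified here and are not reproduced. The example shows why reporting both is useful: a symmetric quadratic relationship has r = 0 but positive MI.

```python
import math
from collections import Counter
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def _equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Map each value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant profile: everything in one bin
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x: Sequence[float], y: Sequence[float], n_bins: int = 4) -> float:
    """Plug-in MI estimate (nats) from an equal-width 2-D histogram."""
    bx, by = _equal_width_bins(x, n_bins), _equal_width_bins(y, n_bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # MI = sum p(i,j) * log(p(i,j) / (p(i) p(j))), written with raw counts.
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())
```

With `x = [-2, -1, 0, 1, 2]` and `y = [4, 1, 0, 1, 4]` (y equals x squared), `pearson(x, y)` is 0 while `mutual_information(x, y, n_bins=3)` is positive.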


2021, Vol 21 (1)
Author(s): Pooja Singh, Ehsan Pashay Ahi, Christian Sturmbauer

Abstract
Background: The oral and pharyngeal jaws of cichlid fishes are a classic example of evolutionary modularity, as their functional decoupling boosted trophic diversification and contributed to the success of cichlid adaptive radiations. Most studies until now have focused on the functional, morphological, or genetic aspects of cichlid jaw modularity. Here we extend this concept to include transcriptional modularity by sequencing whole transcriptomes of the two jaws and comparing their gene coexpression networks.
Results: We show that transcriptional decoupling of gene expression underlies the functional decoupling of the cichlid oral and pharyngeal jaw apparatus, and that the two units are evolving independently in recently diverged cichlid species from Lake Tanganyika. Oral and pharyngeal jaw coexpression networks reflect the common origin of the jaw regulatory program, as there is high preservation of gene coexpression modules between the two sets of jaws. However, there is substantial rewiring of genetic architecture within those modules. We define a global jaw coexpression network and highlight jaw-specific and species-specific modules within it. Furthermore, we annotate a comprehensive in silico gene regulatory network linking the Wnt and AHR signalling pathways to jaw morphogenesis and response to environmental cues, respectively. Components of these pathways are significantly differentially expressed between the oral and pharyngeal jaw apparatus.
Conclusion: This study describes the concerted expression of many genes in the cichlid oral and pharyngeal jaw apparatus at the onset of the independent life of cichlid fishes. Our findings suggest that, on the basis of an ancestral gill arch network, transcriptional rewiring may have driven the modular evolution of the oral and pharyngeal jaws, highlighting the evolutionary significance of gene network reuse. The gene coexpression and in silico regulatory networks presented here are intended as a resource for future studies on the genetics of vertebrate jaw morphogenesis and trophic adaptation.


Author(s): Fabricio Almeida-Silva, Kanhu C Moharana, Thiago M Venancio

Abstract: In the past decade, over 3000 samples of soybean transcriptomic data have accumulated in public repositories. Here, we review the state of the art in soybean transcriptomics, highlighting the major microarray and RNA-seq studies that investigated soybean transcriptional programs in different tissues and conditions. Further, we propose approaches for integrating such big data using gene coexpression networks and outline important web resources that may facilitate soybean data acquisition and analysis, contributing to the acceleration of soybean breeding and functional genomics research.


Author(s): Yang Xu, Priyojit Das, Rachel Patton McCord

Abstract
Motivation: Deep learning approaches have empowered single-cell omics data analysis in many ways and generated new insights from complex cellular systems. As there is an increasing need for single-cell omics data to be integrated across sources, types, and features of data, the challenges of integrating single-cell omics data are rising. Here, we present SMILE (Single-cell Mutual Information Learning), an unsupervised deep learning algorithm that learns discriminative representations for single-cell data by maximizing mutual information.
Results: Using a unique cell-pairing design, SMILE successfully integrates multi-source single-cell transcriptome data, removing batch effects and projecting similar cell types, even from different tissues, into a shared space. SMILE can also integrate data from two or more modalities, such as joint profiling technologies using single-cell ATAC-seq, RNA-seq, DNA methylation, Hi-C, and ChIP data. When paired cells are known, SMILE can integrate data with unmatched features, such as genes for RNA-seq and genome-wide peaks for ATAC-seq. Integrated representations learned from joint profiling technologies can then be used as a framework for comparing independent single-source data.
Supplementary information: Supplementary data are available at Bioinformatics online. The source code of SMILE, including analyses of key results in the study, can be found at https://github.com/rpmccordlab/SMILE.


2020, Vol 36 (11), pp. 3431-3438
Author(s): Ziyi Li, Zhenxing Guo, Ying Cheng, Peng Jin, Hao Wu

Abstract
Motivation: In the analysis of high-throughput omics data from tissue samples, estimating and accounting for cell composition have been recognized as important steps. High cost, intensive labor requirements and technical limitations hinder cell composition quantification using cell-sorting or single-cell technologies. Computational methods for cell composition estimation are available, but they are either limited by the availability of a reference panel or suffer from low accuracy.
Results: As part of TOols for the Analysis of heterogeneouS Tissues (TOAST), we introduce TOAST/-P and TOAST/+P, two partial reference-free algorithms for estimating the cell composition of heterogeneous tissues from their gene expression profiles. TOAST/-P and TOAST/+P incorporate additional biological information, including cell-type-specific markers and prior knowledge of compositions, in the estimation procedure. Extensive simulation studies and real data analyses demonstrate that the proposed methods provide more accurate and robust cell composition estimation than existing methods.
Availability and implementation: The proposed methods TOAST/-P and TOAST/+P are implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST.
Supplementary information: Supplementary data are available at Bioinformatics online.

