Generating weighted and thresholded gene coexpression networks using signed distance correlation

2021 ◽  
Author(s):  
Javier Pardo-Diaz ◽  
Philip Poole ◽  
Mariano Beguerisse-Diaz ◽  
Charlotte Deane ◽  
Gesine Reinert

Even within well-studied organisms, many genes lack useful functional annotations. One way to generate such functional information is to infer biological relationships between genes or proteins, using a network of gene coexpression data that includes functional annotations. Signed distance correlation has proved useful for the construction of unweighted gene coexpression networks. However, transforming correlation values into unweighted networks may lead to a loss of important biological information related to the intensity of the correlation. Here we introduce a principled method to construct weighted gene coexpression networks using signed distance correlation. These networks contain weighted edges only between those pairs of genes whose correlation value is higher than a given threshold. We analyse data from different organisms and find that networks generated with our method based on signed distance correlation are more stable and capture more biological information compared to networks obtained from Pearson correlation. Moreover, we show that signed distance correlation networks capture more biological information than unweighted networks based on the same metric. While we use biological data sets to illustrate the method, the approach is general and can be used to construct networks in other domains.
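The construction described in the abstract above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: signed distance correlation is taken to be the sample distance correlation multiplied by the sign of the Pearson correlation, and the threshold is supplied by the caller rather than selected by the authors' procedure. The function names are hypothetical and are not taken from the sdcorGCN repository.

```python
import numpy as np


def _double_centered_distances(x):
    """Pairwise absolute distances of a 1-D sample, double-centered."""
    x = np.asarray(x, dtype=float)
    d = np.abs(x[:, None] - x[None, :])
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples of equal length."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0  # at least one sample is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def signed_distance_correlation(x, y):
    """Assumed definition: sign of Pearson correlation times distance correlation."""
    dcor = distance_correlation(x, y)
    if dcor == 0.0:
        return 0.0
    r = np.corrcoef(x, y)[0, 1]
    return float(np.sign(r)) * dcor


def weighted_thresholded_network(expr, threshold):
    """Weighted adjacency matrix from a genes x samples expression array.

    An edge is kept, with the correlation value as its weight, only when
    the signed distance correlation is higher than `threshold`.
    """
    expr = np.asarray(expr, dtype=float)
    n_genes = expr.shape[0]
    weights = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            s = signed_distance_correlation(expr[i], expr[j])
            if s > threshold:
                weights[i, j] = weights[j, i] = s
    return weights


# Toy data: gene 1 rises with gene 0, gene 2 falls as gene 0 rises.
toy_expr = np.array([[1, 2, 3, 4, 5],
                     [2, 4, 6, 8, 10],
                     [5, 4, 3, 2, 1]])
toy_network = weighted_thresholded_network(toy_expr, threshold=0.5)
```

In this toy example only the pair of genes 0 and 1 is connected, because the negatively correlated pairs fall below the threshold. An unweighted network would replace each retained weight with 1, discarding the intensity of the correlation.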

Author(s):  
Javier Pardo-Diaz ◽  
Lyuba V. Bozhilova ◽  
Mariano Beguerisse-Díaz ◽  
Philip S. Poole ◽  
Charlotte M. Deane ◽  
...  

Abstract Motivation Even within well studied organisms, many genes lack useful functional annotations. One way to generate such functional information is to infer biological relationships between genes/proteins, using a network of gene coexpression data that includes functional annotations. However, the lack of trustworthy functional annotations can impede the validation of such networks. Hence, there is a need for a principled method to construct gene coexpression networks that capture biological information and are structurally stable even in the absence of functional information. Results We introduce the concept of signed distance correlation as a measure of dependency between two variables, and apply it to generate gene coexpression networks. Distance correlation offers a more intuitive approach to network construction than commonly used methods such as Pearson correlation. We propose a framework to generate self-consistent networks using signed distance correlation purely from gene expression data, with no additional information. We analyse data from three different organisms to illustrate how networks generated with our method are more stable and capture more biological information compared to networks obtained from Pearson or Spearman correlations. Code availability https://github.com/javier-pardodiaz/sdcorGCN.


Author(s):  
Javier Pardo-Diaz ◽  
Lyuba V Bozhilova ◽  
Mariano Beguerisse-Díaz ◽  
Philip S Poole ◽  
Charlotte M Deane ◽  
...  

Abstract Motivation Even within well studied organisms, many genes lack useful functional annotations. One way to generate such functional information is to infer biological relationships between genes/proteins, using a network of gene coexpression data that includes functional annotations. However, the lack of trustworthy functional annotations can impede the validation of such networks. Hence, there is a need for a principled method to construct gene coexpression networks that capture biological information and are structurally stable even in the absence of functional information. Results We introduce the concept of signed distance correlation as a measure of dependency between two variables, and apply it to generate gene coexpression networks. Distance correlation offers a more intuitive approach to network construction than commonly used methods such as Pearson correlation and mutual information. We propose a framework to generate self-consistent networks using signed distance correlation purely from gene expression data, with no additional information. We analyse data from three different organisms to illustrate how networks generated with our method are more stable and capture more biological information compared to networks obtained from Pearson correlation or mutual information. Supplementary information Supplementary Information and code are available at Bioinformatics and https://github.com/javier-pardodiaz/sdcorGCN online.


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Alfredo Benso ◽  
Paolo Cornale ◽  
Stefano Di Carlo ◽  
Gianfranco Politano ◽  
Alessandro Savino

Undirected gene coexpression networks obtained from experimental expression data coupled with efficient computational procedures are increasingly used to identify potentially relevant biological information (e.g., biomarkers) for a particular disease. However, coexpression networks built from experimental expression data are in general large, highly connected networks with an elevated number of false-positive interactions (nodes and edges). In order to infer relevant information, the network must be properly filtered and its complexity reduced. Given the complexity and the multivariate nature of the information contained in the network, this requires the development and application of efficient feature selection algorithms that exploit the topological characteristics of the network to identify relevant nodes and edges. This paper proposes an efficient multivariate filtering algorithm designed to analyze the topological properties of a coexpression network in order to identify potentially relevant genes for a given disease. The algorithm has been tested on three datasets for three well known and studied diseases: acute myeloid leukemia, breast cancer, and diffuse large B-cell lymphoma. Results have been validated using bibliographic data automatically mined with the ProteinQuest literature mining tool.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Pooja Singh ◽  
Ehsan Pashay Ahi ◽  
Christian Sturmbauer

Abstract Background The oral and pharyngeal jaw of cichlid fishes are a classic example of evolutionary modularity as their functional decoupling boosted trophic diversification and contributed to the success of cichlid adaptive radiations. Most studies until now have focused on the functional, morphological, or genetic aspects of cichlid jaw modularity. Here we extend this concept to include transcriptional modularity by sequencing whole transcriptomes of the two jaws and comparing their gene coexpression networks. Results We show that transcriptional decoupling of gene expression underlies the functional decoupling of cichlid oral and pharyngeal jaw apparatus and the two units are evolving independently in recently diverged cichlid species from Lake Tanganyika. Oral and pharyngeal jaw coexpression networks reflect the common origin of the jaw regulatory program as there is high preservation of gene coexpression modules between the two sets of jaws. However, there is substantial rewiring of genetic architecture within those modules. We define a global jaw coexpression network and highlight jaw-specific and species-specific modules within it. Furthermore, we annotate a comprehensive in silico gene regulatory network linking the Wnt and AHR signalling pathways to jaw morphogenesis and response to environmental cues, respectively. Components of these pathways are significantly differentially expressed between the oral and pharyngeal jaw apparatus. Conclusion This study describes the concerted expression of many genes in cichlid oral and pharyngeal jaw apparatus at the onset of the independent life of cichlid fishes. Our findings suggest that, on the basis of an ancestral gill arch network, transcriptional rewiring may have driven the modular evolution of the oral and pharyngeal jaws, highlighting the evolutionary significance of gene network reuse. The gene coexpression and in silico regulatory networks presented here are intended as a resource for future studies on the genetics of vertebrate jaw morphogenesis and trophic adaptation.


2020 ◽  
Vol 21 (S18) ◽  
Author(s):  
Sudipta Acharya ◽  
Laizhong Cui ◽  
Yi Pan

Abstract Background In recent years, the use of multiple genomic and proteomic sources to investigate challenging bioinformatics problems has become immensely popular among researchers. One such problem is feature or gene selection: identifying relevant and non-redundant marker genes from high dimensional gene expression data sets. In that context, designing an efficient feature selection algorithm that exploits knowledge from multiple potential biological resources may be an effective way to understand the spectrum of cancer or other diseases, with applications in specific epidemiology for a particular population. Results In the current article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem. To this end, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, protein sequence) along with gene expression values are collectively utilized to design two different views. UMVMO-select aims to reduce the gene space with little or no loss of sample classification efficiency and determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets. Conclusion A thorough comparative analysis has been performed with five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. The results obtained show the superiority of the proposed method. Reported results are also validated through a proper biological significance test and heatmap plotting.


Author(s):  
Fabricio Almeida-Silva ◽  
Kanhu C Moharana ◽  
Thiago M Venancio

Abstract In the past decade, over 3000 samples of soybean transcriptomic data have accumulated in public repositories. Here, we review the state of the art in soybean transcriptomics, highlighting the major microarray and RNA-seq studies that investigated soybean transcriptional programs in different tissues and conditions. Further, we propose approaches for integrating such big data using gene coexpression networks and outline important web resources that may facilitate soybean data acquisition and analysis, contributing to the acceleration of soybean breeding and functional genomics research.


2014 ◽  
Vol 11 (2) ◽  
pp. 68-79
Author(s):  
Matthias Klapperstück ◽  
Falk Schreiber

Summary The visualization of biological data has gained increasing importance in recent years. A large number of methods and software tools are available that visualize biological data, including the combination of measured experimental data and biological networks. As networks grow in size, handling and exploring them becomes a challenging task for the user. In addition, scientists are interested not just in investigating a single kind of network, but in the combination of different types of networks, such as metabolic, gene regulatory and protein interaction networks. Therefore, fast access, abstract and dynamic views, and intuitive exploratory methods should be provided to search and extract information from the networks. This paper introduces a conceptual framework for handling and combining multiple network sources that enables abstract viewing and exploration of large data sets including additional experimental data. It introduces a three-tier structure that links network data to multiple network views, discusses a proof of concept implementation, and shows a specific visualization method for combining metabolic and gene regulatory networks in an example.


Zootaxa ◽  
2021 ◽  
Vol 4963 (1) ◽  
pp. 58-90
Author(s):  
EDUARD JENDEK ◽  
OTO NAKLÁDAL

Two hundred and eighteen taxa of the genus Agrilus (Coleoptera, Buprestidae) mostly from the Palaearctic and Oriental regions are studied and their taxonomic, nomenclatural, distributional or biological data are updated. For some species, the biological information is supplemented by images from the field, or distributional data by maps showing the whole range of the species. The synonymy of the following Agrilus species is updated: A. blairi Bourgoin, 1925 (A. telawensis Fisher, 1935 syn. nov.); A. lancifer Deyrolle, 1864 (A. perakianus Kerremans, 1900 syn. nov.); A. lestagei Théry, 1930 (A. collartianus Descarpentries & Villiers, 1963 syn. nov.); A. lugubris Kerremans, 1914 (A. achilleus Obenberger, 1935 syn. nov.); A. ocularis Deyrolle, 1864 (A. capitatus Deyrolle, 1864 syn. nov., A. bidentellus Obenberger, 1924 syn. nov.) and Agrilus velatus Kerremans, 1912 (A. kuchingensis Tôyama, 1987 syn. nov.). The invalid synonyms A. myrmido Kerremans, 1912; A. myrmidonius Obenberger, 1936 and A. miwai Théry, 1936 are moved from the synonymy of A. achilleus Obenberger, 1935 to synonymy of A. lugubris Kerremans, 1914. A. massanensis Schaefer, 1955 is downgraded to subspecies A. pratensis massanensis Schaefer, 1955 status nov. The specific name samyi Baudon, 1968 is resurrected from the synonymy of A. suturaalba Deyrolle, 1864 and revalidated as Agrilus samyi Baudon, 1968 (nomen revalidatum).

