Using ontologies and semantic similarity measures for prioritization of gene regulatory networks

Author(s):  
Marianna Milano ◽  
Pietro Guzzi ◽  
Mario Cannataro

Omics sciences are widely used to analyze diseases at the molecular level. The results of omics experiments are usually sets of candidate genes potentially involved in different diseases. Interpreting these results and filtering the candidate genes or proteins selected in an experiment is challenging in some scenarios. The problem is particularly evident in clinical settings, where researchers are interested in the behavior of a few molecules related to a specific disease, while the results may contain thousands of data points. Such filtering requires domain-specific knowledge, which is usually encoded in ontologies. Consequently, different approaches for selecting genes, often referred to as gene prioritization methods, have been introduced to filter out false-positive genes. They use computational methods to identify the genes most related to a disease among a larger set of candidate genes. We implemented GoD (Gene ranking based On Diseases), an algorithm that ranks a given set of genes based on ontology annotations. The algorithm orders genes by the semantic similarity between the annotations of each gene and those describing the selected disease. The current version of GoD enables the prioritization of a list of input genes for a selected disease. It uses the HPO (Human Phenotype Ontology), GO (Gene Ontology), and DO (Disease Ontology) ontologies to calculate the ranking. It takes as input a list of genes or gene products annotated with GO, HPO, and DO terms, together with a selected disease described by GO, HPO, or DO annotations (users may also provide novel annotations). It produces as output the ranking of those genes with respect to the input disease. The package consists of three main functions: hpoGoD (for HPO-based prioritization), goGoD (for GO-based prioritization), and doGoD (for DO-based prioritization).
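The ranking step described above can be illustrated with a minimal Python sketch. It assumes a toy ontology fragment (the `T:...` term IDs are hypothetical), scores pairs of terms with a simple ancestor-overlap (Jaccard) similarity, and combines term scores with a best-match average. GoD's own similarity measure and the interfaces of hpoGoD, goGoD, and doGoD are not reproduced here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical ontology fragment: each term maps to its parent terms.
PARENTS: Dict[str, List[str]] = {
    "T:root": [],
    "T:A": ["T:root"],
    "T:B": ["T:root"],
    "T:A1": ["T:A"],
    "T:A2": ["T:A"],
    "T:B1": ["T:B"],
}

def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors in the DAG."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def term_sim(t1: str, t2: str) -> float:
    """Jaccard overlap of ancestor sets (a simple stand-in for IC-based measures)."""
    a1, a2 = ancestors(t1), ancestors(t2)
    return len(a1 & a2) / len(a1 | a2)

def bma(terms1: Set[str], terms2: Set[str]) -> float:
    """Best-match average between two annotation sets."""
    if not terms1 or not terms2:
        return 0.0
    best1 = [max(term_sim(a, b) for b in terms2) for a in terms1]
    best2 = [max(term_sim(a, b) for a in terms1) for b in terms2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))

def rank_genes(gene_annots: Dict[str, Set[str]], disease_annots: Set[str]) -> List[Tuple[str, float]]:
    """Order genes by decreasing semantic similarity to the disease annotations."""
    scores = [(g, bma(terms, disease_annots)) for g, terms in gene_annots.items()]
    return sorted(scores, key=lambda gs: gs[1], reverse=True)

GENES = {"G1": {"T:A1"}, "G2": {"T:B1"}, "G3": {"T:A2", "T:B1"}}  # hypothetical genes
DISEASE = {"T:A1"}                                                # hypothetical disease annotation
ranking = rank_genes(GENES, DISEASE)
print([(g, round(s, 2)) for g, s in ranking])  # [('G1', 1.0), ('G3', 0.4), ('G2', 0.2)]
```

G3 ranks above G2 because one of its terms shares the intermediate ancestor `T:A` with the disease term, whereas G2 only shares the root.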
We tested GoD on Gene Regulatory Networks (GRNs). Biological network inference aims to reconstruct a network of interactions (or associations) among genes starting from experimental observations. We selected three expression datasets: Dataset 1 (GDS3285), related to breast cancer; Dataset 2 (GDS5072), related to prostate cancer; and Dataset 3 (GDS5093), related to Dengue virus (DENV) infection. First, the experimental data are given as input to five GRN inference algorithms, i.e. ARACNE, CLR, MRNET, GENIE3, and GGM, producing five inferred GRNs. For each inferred GRN, GoD receives as input the list of top genes and computes, for each gene, a semantic similarity value with respect to a selected disease using one of the above ontologies (e.g. Disease Ontology). For each GRN, the genes are then ranked by the computed semantic similarity, and the resulting rankings are compared, which makes it possible to rank each GRN inference method with respect to the initially selected disease.
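Comparing inference methods through their prioritized genes can be sketched as follows. The gene IDs and similarity scores are made-up placeholders, the top genes of each GRN are assumed to be given, and averaging the similarity of each method's top genes is an assumed aggregation rule, not necessarily the one used in the study.

```python
from statistics import mean
from typing import Dict, List

# Placeholder disease-similarity scores, e.g. produced by a GoD-style ranking step.
similarity: Dict[str, float] = {"g1": 0.9, "g2": 0.8, "g3": 0.7, "g4": 0.1, "g5": 0.1, "g6": 0.6}

# Placeholder lists of top genes extracted from each inferred GRN.
top_genes: Dict[str, List[str]] = {
    "ARACNE": ["g1", "g4", "g5"],
    "CLR": ["g2", "g3", "g6"],
    "GENIE3": ["g1", "g2", "g5"],
}

def reorder(genes: List[str], sim: Dict[str, float]) -> List[str]:
    """Reorder a GRN's top genes by decreasing similarity to the disease."""
    return sorted(genes, key=lambda g: sim.get(g, 0.0), reverse=True)

def method_score(genes: List[str], sim: Dict[str, float]) -> float:
    """Mean disease similarity of a method's top genes (genes without a score count as 0)."""
    return mean(sim.get(g, 0.0) for g in genes)

reordered = {m: reorder(genes, similarity) for m, genes in top_genes.items()}
method_ranking = sorted(top_genes, key=lambda m: method_score(top_genes[m], similarity), reverse=True)
print(method_ranking)  # ['CLR', 'GENIE3', 'ARACNE']
```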


2020 ◽  
Vol 21 (11) ◽  
pp. 1054-1059
Author(s):  
Bin Yang ◽  
Yuehui Chen

Reconstruction of gene regulatory networks (GRNs) plays an important role in understanding the complexity, functionality, and pathways of biological systems, and could support the design of new drugs. Because differential equation models are flexible and robust, they have been used to identify biochemical reactions and gene regulatory networks. This paper investigates differential equation models for reverse engineering gene regulatory networks. We introduce three kinds of differential equation models: ordinary differential equations (ODEs), time-delayed differential equations (TDDEs), and stochastic differential equations (SDEs). ODE models include linear ODEs, nonlinear ODEs, and the S-system model. We also discuss the evolutionary algorithms used to search for the optimal structures and parameters of differential equation models. This investigation could provide a comprehensive understanding of differential equation models and lead to the discovery of novel ones.
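As a concrete instance of the S-system model mentioned above, the sketch below integrates dx_i/dt = alpha_i * prod_j x_j^g_ij - beta_i * prod_j x_j^h_ij for a hypothetical two-gene circuit in which gene 1 activates gene 2 and gene 2 represses gene 1. It uses explicit Euler steps; the optional `noise` argument adds an Euler-Maruyama diffusion term, giving a simple SDE variant. All parameter values are illustrative assumptions, and the structure and parameter search (e.g. by evolutionary algorithms) is not shown.

```python
import numpy as np

def s_system_rhs(x, alpha, beta, G, H):
    """S-system right-hand side: power-law production minus power-law degradation."""
    # x[None, :] ** G has entry [i, j] = x_j ** G[i, j]; the row product gives prod_j x_j^G[i, j]
    production = alpha * np.prod(x[None, :] ** G, axis=1)
    degradation = beta * np.prod(x[None, :] ** H, axis=1)
    return production - degradation

def simulate(x0, alpha, beta, G, H, dt=0.01, steps=2000, noise=0.0, rng=None):
    """Explicit Euler integration; noise > 0 adds an Euler-Maruyama diffusion term."""
    rng = np.random.default_rng() if rng is None else rng
    traj = np.empty((steps + 1, len(x0)))
    traj[0] = x0
    for k in range(steps):
        drift = s_system_rhs(traj[k], alpha, beta, G, H)
        step = traj[k] + dt * drift + noise * np.sqrt(dt) * rng.standard_normal(len(x0))
        # Crude safeguard: keep concentrations positive so the power laws stay defined
        traj[k + 1] = np.maximum(step, 1e-12)
    return traj

alpha = np.array([2.0, 1.5])   # production rate constants (illustrative)
beta = np.array([1.0, 1.0])    # degradation rate constants (illustrative)
G = np.array([[0.0, -0.5],     # gene 2 represses production of gene 1
              [0.8, 0.0]])     # gene 1 activates production of gene 2
H = np.eye(2)                  # each gene's degradation depends only on its own level
traj = simulate(np.array([1.0, 1.0]), alpha, beta, G, H)
print(traj[-1])  # approaches the steady state, about (1.42, 1.99)
```

Negative kinetic orders in G encode repression and positive ones activation, which is what makes the S-system exponents directly interpretable as network edges.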


Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 61
Author(s):  
Kuan Liu ◽  
Haiyuan Liu ◽  
Dongyan Sun ◽  
Lei Zhang

Reconstructing gene regulatory networks from gene expression data can effectively uncover regulatory relationships between genes and provide a deeper understanding of biological control processes. Non-linear dependence is a common challenge in the regulatory mechanisms of gene regulatory networks. Various methods based on information theory have been developed to infer networks; however, these methods introduce many redundant regulatory relationships during network inference. Distance correlation, a relatively recent measure of dependence, has in many cases proven effective and computationally efficient at capturing non-linear correlations. In this paper, we propose a novel regulatory network inference method, the distance-correlation and network topology centrality network (DCNTC) method. DCNTC builds on and extends the Local Density Measurement of Network Node Centrality (LDCNET) algorithm: it uses the same network centrality ranking as LDCNET, but measures the association between genes with the simpler and more efficient distance correlation. In this work, we integrate distance correlation and network topological centrality into the inference of gene regulatory network structure. Optimal thresholds are selected based on the characteristics of the distance correlation distribution of each gene pair. Experiments were carried out on four network datasets, and the performance of the methods was compared.
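Distance correlation, the association measure used by DCNTC, can be computed from double-centered pairwise distance matrices. The sketch below implements the standard (V-statistic) sample distance correlation for one-dimensional samples and, as a hypothetical simplification, keeps the gene pairs whose score exceeds a single global quantile. DCNTC's per-pair threshold selection and the LDCNET centrality ranking are not reproduced.

```python
import numpy as np

def _double_centered(z):
    """Pairwise |z_a - z_b| distances with row, column and grand means removed."""
    d = np.abs(z[:, None] - z[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation, a value in [0, 1]."""
    A = _double_centered(np.asarray(x, dtype=float))
    B = _double_centered(np.asarray(y, dtype=float))
    dcov2 = (A * B).mean()                      # squared distance covariance
    dvar2 = (A * A).mean() * (B * B).mean()     # product of squared distance variances
    if dvar2 == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))

def dcor_network(expr, quantile=0.9):
    """expr: samples x genes. Score every gene pair; keep pairs above a global quantile (assumed rule)."""
    n_genes = expr.shape[1]
    scores = {(i, j): distance_correlation(expr[:, i], expr[:, j])
              for i in range(n_genes) for j in range(i + 1, n_genes)}
    cutoff = np.quantile(list(scores.values()), quantile)
    edges = {pair: s for pair, s in scores.items() if s >= cutoff}
    return edges, scores

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
expr = np.column_stack([x, x ** 2, rng.uniform(-1, 1, 200)])  # gene 1 depends non-linearly on gene 0
edges, scores = dcor_network(expr)
print(np.corrcoef(x, x ** 2)[0, 1], scores[(0, 1)])  # Pearson near 0, distance correlation clearly positive
```

The quadratic pair illustrates why distance correlation suits non-linear regulation: Pearson correlation is close to zero there, while distance correlation is not.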


2021 ◽  
Vol 11 ◽  
Author(s):  
James T. Lim ◽  
Chen Chen ◽  
Adam D. Grant ◽  
Megha Padi

The use of biological networks such as protein–protein interaction and transcriptional regulatory networks is becoming an integral part of genomics research. However, these networks are not static, and during phenotypic transitions like disease onset, they can acquire new “communities” (or highly interacting groups) of genes that carry out cellular processes. Disease communities can be detected by maximizing a modularity-based score, but since biological systems and network inference algorithms are inherently noisy, it remains a challenge to determine whether these changes represent real cellular responses or whether they appeared by random chance. Here, we introduce Constrained Random Alteration of Network Edges (CRANE), a method for randomizing networks with fixed node strengths. CRANE can be used to generate a null distribution of gene regulatory networks that can in turn be used to rank the most significant changes in candidate disease communities. Compared to other approaches, such as consensus clustering or commonly used generative models, CRANE emulates biologically realistic networks and recovers simulated disease modules with higher accuracy. When applied to breast and ovarian cancer networks, CRANE improves the identification of cancer-relevant GO terms while reducing the signal from non-specific housekeeping processes.
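The idea of randomizing a weighted network while keeping node strengths fixed can be illustrated with a simple "checkerboard" perturbation on a bipartite regulator-by-gene weight matrix. This is a generic sketch under stated assumptions (non-negative weights and a hypothetical `max_delta` step size), not the CRANE procedure itself. Repeating it many times and recomputing a community score on each randomized network would yield an empirical null distribution.

```python
import numpy as np

def strength_preserving_perturbation(W, n_steps=1000, max_delta=0.1, rng=None):
    """Randomly shift weight in a non-negative bipartite matrix W (regulators x genes)
    while keeping every row sum and column sum (node strength) unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    W = np.array(W, dtype=float)  # work on a copy
    n_rows, n_cols = W.shape
    for _ in range(n_steps):
        i, k = rng.choice(n_rows, size=2, replace=False)
        j, l = rng.choice(n_cols, size=2, replace=False)
        # Largest shift that keeps the two decreased entries non-negative
        limit = min(max_delta, W[i, l], W[k, j])
        if limit <= 0:
            continue
        delta = rng.uniform(0.0, limit)
        # +delta on (i, j), (k, l) and -delta on (i, l), (k, j):
        # rows i, k and columns j, l all keep their sums
        W[i, j] += delta
        W[k, l] += delta
        W[i, l] -= delta
        W[k, j] -= delta
    return W

rng = np.random.default_rng(1)
W0 = rng.uniform(0, 1, size=(5, 8))  # hypothetical regulator x gene edge weights
W1 = strength_preserving_perturbation(W0, rng=rng)
print(np.abs(W1.sum(axis=1) - W0.sum(axis=1)).max())  # ~0 up to floating-point error
```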


Author(s):  
Gourab Ghosh Roy ◽  
Nicholas Geard ◽  
Karin Verspoor ◽  
Shan He

Motivation: Inferring gene regulatory networks (GRNs) from expression data is a significant systems biology problem. A useful inference algorithm should unveil not only the global structure of the regulatory mechanisms but also the details of regulatory interactions, such as edge direction (from regulator to target) and sign (activation/inhibition). Many popular GRN inference algorithms cannot infer edge signs, and those that can infer signed GRNs cannot simultaneously infer edge directions or network cycles.
Results: To address these limitations of existing algorithms, we propose Polynomial Lasso Bagging (PoLoBag) for signed GRN inference with both edge directions and network cycles. PoLoBag is an ensemble regression algorithm in a bagging framework where Lasso weights estimated on bootstrap samples are averaged. These bootstrap samples incorporate polynomial features to capture higher-order interactions. Results demonstrate that PoLoBag is consistently more accurate for signed inference than state-of-the-art algorithms on simulated and real-world expression datasets.
Availability and implementation: Algorithm and data are freely available at https://github.com/gourabghoshroy/PoLoBag.
Supplementary information: Supplementary data are available at Bioinformatics online.
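The ingredients named above (polynomial features, Lasso on bootstrap samples, averaging of weights) can be combined in a short NumPy sketch for a single target gene. Assumptions: plain pairwise products as the polynomial features, a basic coordinate-descent Lasso with a hypothetical penalty `alpha`, and signed edges read off the averaged linear coefficients only. The actual PoLoBag feature construction and edge scoring may differ.

```python
import numpy as np
from itertools import combinations

def lasso_cd(X, y, alpha, n_iter=100):
    """Lasso by cyclic coordinate descent on (1/2n)||y - Xb||^2 + alpha * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                      # residual y - X @ b (b starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            r += X[:, j] * b[j]       # add back feature j's current contribution
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft-thresholding
            r -= X[:, j] * b[j]
    return b

def polynomial_features(Z):
    """Original regulator columns followed by all pairwise products (degree-2 terms)."""
    prods = [Z[:, a] * Z[:, c] for a, c in combinations(range(Z.shape[1]), 2)]
    return np.column_stack([Z] + prods)

def signed_weights(X, y, alpha=0.05, n_bags=50, rng=None):
    """Average Lasso coefficients over bootstrap samples; return the linear-term weights."""
    rng = np.random.default_rng() if rng is None else rng
    n, m = X.shape
    F = polynomial_features(X)
    sd = F.std(axis=0)
    F = (F - F.mean(axis=0)) / np.where(sd == 0, 1.0, sd)   # standardize features
    total = np.zeros(F.shape[1])
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)                    # bootstrap sample of observations
        Fb, yb = F[idx], y[idx]
        total += lasso_cd(Fb - Fb.mean(axis=0), yb - yb.mean(), alpha)
    return (total / n_bags)[:m]   # assumed reading: + activation, - inhibition

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))     # hypothetical expression of 3 candidate regulators
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=200)  # hypothetical target gene
w = signed_weights(X, y, rng=rng)
print(np.round(w, 2))
```

Because each regression has a single target, edge direction (regulator to target) is built in, and running it once per gene allows cycles in the assembled network.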


2021 ◽  
Author(s):  
Ayoub Lasri ◽  
Vahid Shahrezaei ◽  
Marc Sturrock

Single-cell RNA sequencing (scRNA-seq) has very rapidly become the new workhorse of modern biology, providing an unprecedented global view of cellular diversity and heterogeneity. In particular, the structure of gene-gene expression correlations contains information on the underlying gene regulatory networks. However, interpreting scRNA-seq data is challenging because of experimental errors and biases that are unique to this kind of data, including drop-out (or technical zeros). To deal with this problem, several methods for imputing zeros in scRNA-seq data have been developed. However, it is not clear how these processing steps affect the inference of genetic networks from single-cell data. Here, we introduce Biomodelling.jl, a tool for generating synthetic scRNA-seq data using multiscale modelling of stochastic gene regulatory networks in growing and dividing cells. Our tool produces realistic transcription data with a known ground-truth network topology that can be used to benchmark different approaches to gene regulatory network inference. Using this tool, we investigate the impact of different imputation methods on the performance of several network inference algorithms. Biomodelling.jl provides a versatile and useful tool for the future development and benchmarking of network inference approaches using scRNA-seq data.
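To illustrate how synthetic single-cell data with a known ground truth can be generated, the sketch below runs a Gillespie stochastic simulation of a hypothetical two-gene network (gene A represses gene B through a Hill function), samples many independent cells, and then applies binomial capture as a simple model of technical drop-out. It is written in Python and is not the Biomodelling.jl API; cell growth and division, which Biomodelling.jl models, are ignored, and all rate constants are assumptions.

```python
import numpy as np

def gillespie_cell(t_end, rng, k_a=10.0, k_b=20.0, K=5.0, hill=2.0, gamma=1.0):
    """Exact stochastic simulation of mRNA counts of A and B; returns the state at t_end."""
    a, b, t = 0, 0, 0.0
    while True:
        rates = np.array([
            k_a,                             # transcription of A (constitutive)
            gamma * a,                       # degradation of A
            k_b / (1.0 + (a / K) ** hill),   # transcription of B, repressed by A
            gamma * b,                       # degradation of B
        ])
        total = rates.sum()
        t += rng.exponential(1.0 / total)    # waiting time to the next reaction
        if t > t_end:
            return a, b
        event = rng.choice(4, p=rates / total)  # which reaction fires
        if event == 0:
            a += 1
        elif event == 1:
            a -= 1
        elif event == 2:
            b += 1
        else:
            b -= 1

def simulate_cells(n_cells=100, t_end=8.0, capture=0.3, seed=0):
    """True counts per cell (columns A, B) and drop-out-affected observed counts."""
    rng = np.random.default_rng(seed)
    true_counts = np.array([gillespie_cell(t_end, rng) for _ in range(n_cells)])
    observed = rng.binomial(true_counts, capture)  # each molecule captured with probability `capture`
    return true_counts, observed

ground_truth_edges = {("A", "B"): "repression"}  # known topology for benchmarking
true_counts, observed = simulate_cells()
print((true_counts == 0).mean(), (observed == 0).mean())  # fraction of zeros before/after drop-out
```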


2019 ◽  
Author(s):  
Viral Panchal ◽  
Daniel Linder

Inferring gene regulatory networks from high-throughput ‘omics’ data has proven to be a computationally demanding task of critical importance. The classical methods frequently break down due to the curse of dimensionality, and popular strategies to overcome this are typically based on regularized versions of the classical methods. However, these approaches rely on loss functions that may not be robust, and they usually do not allow prior information to be incorporated in a straightforward way. Fully Bayesian methods handle both of these shortcomings quite naturally, and they offer potential for improvements in network structure learning. We propose a Bayesian hierarchical model to reconstruct gene regulatory networks from time series gene expression data, such as those common in perturbation experiments on biological systems. The proposed methodology uses global-local shrinkage priors for posterior selection of regulatory edges and relaxes the common normal likelihood assumption to allow for heavy-tailed data, which were shown in several of the cited references to severely impact network inference. We provide a sufficient condition for posterior propriety and derive an efficient MCMC algorithm based on Gibbs sampling in the Appendix. We describe a novel way to detect multiple scales based on the corresponding posterior quantities. Finally, we demonstrate the performance of our approach in a simulation study and compare it with existing methods on real data from a T-cell activation study.
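A heavily simplified sketch of the two ingredients named above, a global-local shrinkage prior and a heavy-tailed likelihood, is a Gibbs sampler for a single regression in which one gene's expression at time t+1 is regressed on all genes at time t. It assumes a horseshoe prior (using the standard inverse-gamma auxiliary-variable representation of its half-Cauchy scales), a Jeffreys prior on the error variance, and Student-t errors written as a scale mixture of normals with fixed degrees of freedom `nu`. The data are simulated placeholders, and the paper's actual hierarchy, priors, and multiple-scale detection may differ.

```python
import numpy as np

def inv_gamma(rng, shape, scale):
    """InvGamma(shape, scale) draw: the reciprocal of a Gamma(shape, rate=scale) draw."""
    return 1.0 / rng.gamma(shape, 1.0 / scale)

def horseshoe_t_gibbs(X, y, nu=3.0, n_iter=3000, burn=1000, rng=None):
    """Posterior mean of regression weights under a horseshoe prior and Student-t errors."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    sigma2, tau2, xi = 1.0, 1.0, 1.0
    lam2, nu_loc = np.ones(p), np.ones(p)   # local scales and their auxiliary variables
    omega = np.ones(n)                      # per-observation precisions (Student-t as a normal mixture)
    draws = []
    for it in range(n_iter):
        # beta | rest ~ N(A^-1 X'Wy, sigma2 A^-1) with A = X'WX + diag(1 / (tau2 * lam2))
        XtW = X.T * omega
        A = XtW @ X + np.diag(1.0 / (tau2 * lam2))
        L = np.linalg.cholesky(A)
        mean = np.linalg.solve(A, XtW @ y)
        beta = mean + np.sqrt(sigma2) * np.linalg.solve(L.T, rng.standard_normal(p))
        resid = y - X @ beta
        # Error variance, weighted by the latent precisions
        sigma2 = inv_gamma(rng, (n + p) / 2,
                           (omega @ resid ** 2 + np.sum(beta ** 2 / (tau2 * lam2))) / 2)
        # Latent precisions: Gamma((nu + 1) / 2, rate = (nu + r^2 / sigma2) / 2)
        omega = rng.gamma((nu + 1) / 2, 2.0 / (nu + resid ** 2 / sigma2))
        # Horseshoe local (lam2) and global (tau2) scales via auxiliary variables
        lam2 = inv_gamma(rng, 1.0, 1.0 / nu_loc + beta ** 2 / (2 * tau2 * sigma2))
        tau2 = inv_gamma(rng, (p + 1) / 2, 1.0 / xi + np.sum(beta ** 2 / lam2) / (2 * sigma2))
        nu_loc = inv_gamma(rng, 1.0, 1.0 + 1.0 / lam2)
        xi = inv_gamma(rng, 1.0, 1.0 + 1.0 / tau2)
        if it >= burn:
            draws.append(beta)
    return np.mean(draws, axis=0)

rng = np.random.default_rng(42)
X = rng.normal(size=(120, 6))                 # placeholder: expression of 6 genes at time t
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.5 * rng.standard_t(3, size=120)  # target gene at t + 1, heavy-tailed noise
beta_hat = horseshoe_t_gibbs(X, y, rng=rng)
print(np.round(beta_hat, 2))
```

The latent precisions `omega` downweight observations with large residuals, which is how the Student-t likelihood limits the influence of heavy-tailed outliers.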


2019 ◽  
Author(s):  
Shuchi Smita ◽  
Jason Kiehne ◽  
Sajag Adhikari ◽  
Erliang Zeng ◽  
Qin Ma ◽  
...  

Legume plants such as soybean produce two major types of root lateral organs: lateral roots (LRs) and root nodules. A robust computational framework was developed to predict potential gene regulatory networks (GRNs) associated with root lateral organ development in soybean. A genome-scale expression dataset was obtained from soybean root nodules and lateral roots and subjected to biclustering using QUBIC. Biclusters (BCs) and transcription factor (TF) genes with enriched expression in lateral root tissues were combined using different network inference algorithms to predict high-confidence regulatory modules that are repeatedly retrieved by different methods. The ranked combination of the results from all network inference algorithms into one ensemble solution identified 21 GRN modules comprising 182 co-regulated genes potentially involved in the stages of root lateral organ development in soybean. The pipeline correctly predicted previously known nodule- and LR-associated TFs, including the expected hierarchical relationships. The results revealed high-scoring GRN modules co-regulated by AP2, GRF5, and C3H during early nodule development, and by GRAS, LBD41, and ARR18 later, during nodule maturation. Knowledge from this work, supported by future experimental validation, is expected to help determine key gene targets for biotechnological strategies to optimize nodule formation and enhance nitrogen fixation.
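The abstract does not state exactly how the per-algorithm results were combined; one common choice for an ensemble ranking is to average each edge's rank across methods. The sketch below assumes that rule, uses hypothetical method names and edge scores, and gives an edge missing from a method that method's worst rank plus one. The QUBIC biclustering step is not shown.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (regulator, target)

def average_rank_ensemble(method_scores: Dict[str, Dict[Edge, float]]) -> List[Tuple[Edge, float]]:
    """Combine edge scores from several methods by averaging per-method ranks (1 = best)."""
    all_edges = set().union(*(scores.keys() for scores in method_scores.values()))
    rank_sums = {edge: 0.0 for edge in all_edges}
    for scores in method_scores.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        ranks = {edge: r for r, edge in enumerate(ordered, start=1)}
        missing = len(ordered) + 1  # penalty rank for edges this method did not report
        for edge in all_edges:
            rank_sums[edge] += ranks.get(edge, missing)
    n_methods = len(method_scores)
    # Sort by average rank; ties are broken by edge name for a deterministic order
    return sorted(((e, s / n_methods) for e, s in rank_sums.items()), key=lambda es: (es[1], es[0]))

method_scores = {  # hypothetical outputs of three inference methods
    "method_A": {("TF1", "g1"): 0.9, ("TF1", "g2"): 0.5, ("TF2", "g3"): 0.3},
    "method_B": {("TF1", "g1"): 0.8, ("TF2", "g3"): 0.6, ("TF1", "g2"): 0.1},
    "method_C": {("TF2", "g3"): 0.7, ("TF1", "g1"): 0.65},
}
ensemble = average_rank_ensemble(method_scores)
for edge, avg_rank in ensemble:
    print(edge, round(avg_rank, 2))
```

Working with ranks rather than raw scores avoids having to calibrate scores that different algorithms report on incompatible scales.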


2017 ◽  
Vol 59 (4) ◽  
pp. 1237-1254 ◽  
Author(s):  
Shweta Bagewadi Kawalia ◽  
Tamara Raschka ◽  
Mufassra Naz ◽  
Ricardo de Matos Simoes ◽  
Philipp Senger ◽  
...  

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Konstantinos Pliakos ◽  
Celine Vens

Background: Network inference is crucial for biomedicine and systems biology. Biological entities and their associations are often modeled as interaction networks, for example drug-protein interaction networks or gene regulatory networks. Studying and elucidating such networks can lead to the comprehension of complex biological processes. However, our knowledge of these networks is usually only partial, and experimentally identifying all existing associations between biological entities is very time-consuming and expensive. Many computational approaches to network inference have been proposed over the years; nonetheless, efficiency and accuracy remain open problems. Here, we propose bi-clustering tree ensembles as a new machine learning method for network inference, extending traditional tree-ensemble models to the global network setting. The proposed approach treats network inference as a multi-label classification task. More specifically, the nodes of a network (e.g., drugs or proteins in a drug-protein interaction network) are modelled as samples described by features (e.g., chemical structure similarities or protein sequence similarities). The labels represent the presence or absence of links connecting the nodes of the interaction network (e.g., drug-protein interactions in a drug-protein interaction network).
Results: We extended traditional tree-ensemble methods, such as extremely randomized trees (ERT) and random forests (RF), to ensembles of bi-clustering trees, integrating background information from both node sets of a heterogeneous network into the same learning framework. We performed an empirical evaluation, comparing the proposed approach with currently used tree-ensemble-based approaches as well as other approaches from the literature, and demonstrated its effectiveness in different interaction prediction (network inference) settings. For evaluation purposes, we used several benchmark datasets that represent drug-protein and gene regulatory networks. We also applied the proposed method to two versions of a chemical-protein association network extracted from the STITCH database, demonstrating the potential of our model to predict non-reported interactions.
Conclusions: Bi-clustering trees outperform existing tree-based strategies as well as machine learning methods based on other algorithms. Because our approach is based on tree ensembles, it inherits the advantages of tree-ensemble learning, such as handling of missing values, scalability, and interpretability.
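The core step of a bi-clustering tree, choosing a split on either the row-node features or the column-node features of an interaction matrix, can be sketched as a single exhaustive split search. This is an illustration under assumptions (variance reduction over the interaction matrix as the split criterion, exhaustive thresholds, hypothetical toy data), not the authors' implementation; an ERT-style ensemble would instead draw thresholds at random and average many such trees.

```python
import numpy as np

def sse(M):
    """Multi-output impurity: squared deviations from each column's mean (columns = labels)."""
    if M.shape[0] == 0:
        return 0.0
    return float(((M - M.mean(axis=0, keepdims=True)) ** 2).sum())

def best_split(Y, row_feats, col_feats):
    """Search splits on row-node features (partitioning rows of Y) and on column-node
    features (partitioning columns, i.e. rows of Y.T). Returns (gain, side, feature, threshold)."""
    best = (0.0, None, None, None)
    for side, M, feats in (("row", Y, row_feats), ("col", Y.T, col_feats)):
        parent = sse(M)
        for f in range(feats.shape[1]):
            for thr in np.unique(feats[:, f])[:-1]:  # every threshold leaving both sides non-empty
                left = feats[:, f] <= thr
                gain = parent - sse(M[left]) - sse(M[~left])
                if gain > best[0]:
                    best = (gain, side, f, float(thr))
    return best

# Hypothetical drug x protein interaction matrix: drugs 0-2 interact with every protein
Y = np.array([[1, 1, 1, 1]] * 3 + [[0, 0, 0, 0]] * 3, dtype=float)
row_feats = np.array([[0.1, 5.0], [0.2, 3.0], [0.3, 9.0], [0.8, 1.0], [0.9, 7.0], [0.7, 4.0]])
col_feats = np.array([[1.0], [1.2], [3.0], [3.5]])
best = best_split(Y, row_feats, col_feats)
print(best)  # (6.0, 'row', 0, 0.3)
```

Growing a tree repeats this search recursively inside each resulting block, so every leaf corresponds to a bicluster of row nodes and column nodes.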

