Exploring the Operational Characteristics of Inference Algorithms for Transcriptional Networks by Means of Synthetic Data

2008 ◽  
Vol 14 (1) ◽  
pp. 49-63 ◽  
Author(s):  
Koenraad Van Leemput ◽  
Tim Van den Bulcke ◽  
Thomas Dhollander ◽  
Bart De Moor ◽  
Kathleen Marchal ◽  
...  

The development of structure-learning algorithms for gene regulatory networks depends heavily on the availability of synthetic data sets that contain both the original network and associated expression data. This article reports the application of SynTReN, an existing network generator that samples topologies from existing biological networks and uses Michaelis-Menten and Hill enzyme kinetics to simulate gene interactions. We illustrate the effects of different aspects of the expression data on the quality of the inferred network. The tested expression data parameters are network size, network topology, type and degree of noise, quantity of expression data, and interaction types between genes. This is done by applying three well-known inference algorithms to SynTReN data sets. The results show the power of synthetic data in revealing operational characteristics of inference algorithms that are unlikely to be discovered by means of biological microarray data only.
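The abstract above names Hill and Michaelis-Menten kinetics as the basis for simulating gene interactions. The following is a minimal sketch of what such a response function can look like; it is not SynTReN's actual implementation. The function names, the parameters `k` and `n`, the Gaussian noise model, and the clamping to [0, 1] are all assumptions made for illustration.

```python
import random

def hill_activation(x, k=1.0, n=2.0):
    """Fraction of maximal transcription induced by an activator at level x.

    k is the half-saturation constant and n the Hill coefficient.
    With n = 1 this reduces to the Michaelis-Menten form x / (k + x).
    """
    return x**n / (k**n + x**n)

def hill_repression(x, k=1.0, n=2.0):
    """Fraction of maximal transcription remaining under a repressor at level x."""
    return k**n / (k**n + x**n)

def simulate_target(regulator_levels, k=1.0, n=2.0, noise_sd=0.0, seed=0):
    """Hypothetical helper: target expression for each regulator level.

    Adds Gaussian noise (an assumed noise model) and clamps to [0, 1].
    """
    rng = random.Random(seed)
    levels = []
    for x in regulator_levels:
        y = hill_activation(x, k, n) + rng.gauss(0.0, noise_sd)
        levels.append(min(1.0, max(0.0, y)))
    return levels
```

Varying `noise_sd` and the number of regulator levels in such a sketch mirrors, in a very reduced form, two of the data parameters the abstract lists: degree of noise and quantity of expression data.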

2014 ◽  
Vol 11 (2) ◽  
pp. 68-79
Author(s):  
Matthias Klapperstück ◽  
Falk Schreiber

Summary: The visualization of biological data has gained increasing importance in recent years. A large number of methods and software tools are available that visualize biological data, including the combination of measured experimental data and biological networks. As networks grow in size, handling and exploring them becomes a challenging task for the user. In addition, scientists are interested not just in investigating a single kind of network, but in combining different types of networks, such as metabolic, gene regulatory, and protein interaction networks. Therefore, fast access, abstract and dynamic views, and intuitive exploratory methods should be provided to search and extract information from the networks. This paper introduces a conceptual framework for handling and combining multiple network sources that enables abstract viewing and exploration of large data sets, including additional experimental data. It introduces a three-tier structure that links network data to multiple network views, discusses a proof-of-concept implementation, and shows a specific visualization method for combining metabolic and gene regulatory networks in an example.


2017 ◽  
Author(s):  
Duygu Dikicioglu ◽  
Daniel J H Nightingale ◽  
Valerie Wood ◽  
Kathryn S Lilley ◽  
Stephen G Oliver

Abstract: The topological analyses of many large-scale molecular interaction networks often provide only limited insights into network function or evolution. In this paper, we argue that the functional heterogeneity of network components, rather than network size, is the main factor limiting the utility of topological analysis of large cellular networks. We have analysed large epistatic, functional, and transcriptional regulatory networks of genes that were attributed to the following biological process groupings: protein transactions, gene expression, cell cycle, and small molecule metabolism. Control analyses were performed on networks of randomly selected genes. We identified novel biological features emerging from the analysis of functionally homogeneous biological networks irrespective of their size. In particular, direct regulation by transcription was identified as an underrepresented feature of protein transactions. The analysis also demonstrated that the regulation of the genes involved in protein transactions at the transcriptional level was orchestrated by only a small number of regulators. Quantitative proteomic analysis of nuclear- and chromatin-enriched sub-cellular fractions of yeast provided supportive evidence for the conclusions generated by network analyses.


Computation ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 48
Author(s):  
Georgios N. Dimitrakopoulos

In Systems Biology, the complex relationships between different entities in the cells are modeled and analyzed using networks. Towards this aim, a rich variety of gene regulatory network (GRN) inference algorithms has been developed in recent years. However, most algorithms rely solely on gene expression data to reconstruct the network. Due to possible expression profile similarity, predictions can contain connections between biologically unrelated genes. Therefore, previously known biological information, such as experimentally validated interactions between transcription factors and target genes, should also be considered by computational methods to obtain more consistent results. In this work, we propose XGBoost for gene regulatory networks (XGRN), a supervised algorithm that combines gene expression data with previously known interactions for GRN inference. The key idea of our method is to train a regression model for each known interaction of the network and then utilize this model to predict new interactions. The regression is performed by XGBoost, a state-of-the-art algorithm using an ensemble of decision trees. In detail, XGRN learns a regression model based on the gene expression of the two interactors and then provides predictions using as input the gene expression of other candidate interactors. Application to benchmark datasets and a real large single-cell RNA-Seq experiment resulted in high performance compared to other unsupervised and supervised methods, demonstrating the ability of XGRN to provide reliable predictions.


2021 ◽  
Author(s):  
Anthony Federico ◽  
Joseph Kern ◽  
Xaralabos Varelas ◽  
Stefano Monti

Network analysis offers a powerful technique to model the relationships between genes within biological regulatory networks. Inference of biological network structures is often performed on high-dimensional data, yet is hindered by the limited sample size of the high-throughput "omics" data typically available. To overcome this challenge, we exploit known organizing principles of biological networks, which are sparse, modular, and likely share a large portion of their underlying architecture. We present SHINE (Structure Learning for Hierarchical Networks), a framework for defining data-driven structural constraints and incorporating a shared learning paradigm for efficiently learning multiple networks from high-dimensional data. We show through simulations that SHINE improves performance when relatively few samples are available and multiple networks are desired, by reducing the complexity of the graphical search space and by taking advantage of shared structural information. We evaluated SHINE on TCGA Pan-Cancer data and found that learned tumor-specific networks exhibit expected graph properties of real biological networks, recapture previously validated interactions, and recapitulate findings in the literature. Application of SHINE to the analysis of subtype-specific breast cancer networks identified key genes and biological processes for tumor maintenance and survival, as well as potential therapeutic targets for modulating known breast cancer disease genes.


2004 ◽  
Vol 3 (1) ◽  
pp. 1-29 ◽  
Author(s):  
Jörg Rahnenführer ◽  
Francisco S Domingues ◽  
Jochen Maydt ◽  
Thomas Lengauer

We present a statistical approach to scoring changes in activity of metabolic pathways from gene expression data. The method identifies the biologically relevant pathways with corresponding statistical significance. Based on gene expression data alone, only local structures of genetic networks can be recovered. Instead of inferring such a network, we propose a hypothesis-based approach. We use given knowledge about biological networks to improve sensitivity and interpretability of findings from microarray experiments. Recently introduced methods test if members of predefined gene sets are enriched in a list of top-ranked genes in a microarray study. We improve this approach by defining scores that depend on all members of the gene set and that also take pairwise co-regulation of these genes into account. We calculate the significance of co-regulation of gene sets with a nonparametric permutation test. On two data sets the method is validated and its biological relevance is discussed. It turns out that useful measures for co-regulation of genes in a pathway can be identified adaptively. We refine our method in two aspects specific to pathways. First, to overcome the ambiguity of enzyme-to-gene mappings for a fixed pathway, we introduce algorithms for selecting the best fitting gene for a specific enzyme in a specific condition. In selected cases, functional assignment of genes to pathways is feasible. Second, the sensitivity of detecting relevant pathways is improved by integrating information about pathway topology. The distance of two enzymes is measured by the number of reactions needed to connect them, and enzyme pairs with a smaller distance receive a higher weight in the score calculation.
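The permutation test on pairwise co-regulation described above can be sketched as follows. This is an illustrative reading, not the authors' scoring scheme: the score used here (mean pairwise Pearson correlation), the null model (random gene sets of the same size), and all function names are assumptions. Gene sets are assumed to contain at least two genes.

```python
import random
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant profile carries no co-regulation signal
    return cov / sqrt(var_a * var_b)

def coregulation_score(expr, genes):
    """Assumed score: mean pairwise correlation over all gene pairs in the set."""
    pairs = list(combinations(genes, 2))
    return sum(pearson(expr[g], expr[h]) for g, h in pairs) / len(pairs)

def permutation_pvalue(expr, gene_set, n_perm=1000, seed=0):
    """Compare the observed score against random gene sets of equal size."""
    rng = random.Random(seed)
    observed = coregulation_score(expr, gene_set)
    all_genes = sorted(expr)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(gene_set))
        if coregulation_score(expr, sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Here `expr` is a dictionary mapping gene names to expression profiles. The topology-based weighting mentioned at the end of the abstract could be added by weighting each pair in `coregulation_score`, but the abstract does not give the weighting function, so it is left out.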


Author(s):  
Gourab Ghosh Roy ◽  
Nicholas Geard ◽  
Karin Verspoor ◽  
Shan He

Motivation: Inferring gene regulatory networks (GRNs) from expression data is a significant systems biology problem. A useful inference algorithm should not only unveil the global structure of the regulatory mechanisms but also the details of regulatory interactions such as edge direction (from regulator to target) and sign (activation/inhibition). Many popular GRN inference algorithms cannot infer edge signs, and those that can infer signed GRNs cannot simultaneously infer edge directions or network cycles. Results: To address these limitations of existing algorithms, we propose Polynomial Lasso Bagging (PoLoBag) for signed GRN inference with both edge directions and network cycles. PoLoBag is an ensemble regression algorithm in a bagging framework where Lasso weights estimated on bootstrap samples are averaged. These bootstrap samples incorporate polynomial features to capture higher-order interactions. Results demonstrate that PoLoBag is consistently more accurate for signed inference than state-of-the-art algorithms on simulated and real-world expression datasets. Availability and implementation: Algorithm and data are freely available at https://github.com/gourabghoshroy/PoLoBag. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Léo P.M. Diaz ◽  
Michael P.H. Stumpf

Abstract: Network inference is a notoriously challenging problem. Inferred networks are associated with high uncertainty and are likely riddled with false positive and false negative interactions. Especially for biological networks, we do not have good ways of judging the performance of inference methods against real networks, and instead we often rely solely on the performance against simulated data. Gaining confidence in networks inferred from real data nevertheless requires establishing reliable validation methods. Here, we argue that the expectation of mixing patterns in biological networks such as gene regulatory networks offers a reasonable starting point: interactions are more likely to occur between nodes with similar biological functions. We can quantify this behaviour using the assortativity coefficient, and here we show that the resulting heuristic, functional assortativity, offers a reliable and informative route for comparing different inference algorithms.
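As a rough illustration of the assortativity coefficient mentioned above, the sketch below computes Newman's coefficient for a categorical node attribute on an undirected network. It assumes that each gene carries exactly one functional label, which is a simplification; the abstract does not say how genes with several annotations are handled, and the function name is hypothetical.

```python
from collections import defaultdict

def functional_assortativity(edges, label):
    """Newman's assortativity coefficient for a categorical node attribute.

    edges: iterable of (u, v) pairs of an undirected network.
    label: dict mapping each node to a single functional category (assumed).
    Returns a value in [-1, 1]; 1 means edges only join same-function nodes.
    """
    counts = defaultdict(float)
    for u, v in edges:
        # Count each undirected edge in both directions so the mixing
        # matrix is symmetric.
        counts[(label[u], label[v])] += 1.0
        counts[(label[v], label[u])] += 1.0
    total = sum(counts.values())
    if total == 0:
        raise ValueError("network has no edges")

    categories = {c for pair in counts for c in pair}
    # Fraction of edge ends that connect nodes of the same category.
    trace = sum(counts.get((c, c), 0.0) for c in categories) / total
    # Fraction of edge ends attached to each category.
    ends = {c: sum(counts.get((c, d), 0.0) for d in categories) / total
            for c in categories}
    expected = sum(a * a for a in ends.values())
    if expected == 1.0:
        raise ValueError("undefined when all edge ends share one category")
    return (trace - expected) / (1.0 - expected)
```

Applied to the networks produced by several inference algorithms on the same data, a higher value would indicate that more of the predicted edges connect genes with the same annotated function.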


2016 ◽  
Author(s):  
Jigar S. Desai ◽  
Ryan C. Sartor ◽  
Lovely Mae Lawas ◽  
SV Krishna Jagadish ◽  
Colleen J. Doherty

Abstract: Organisms respond to changes in their environment through transcriptional regulatory networks (TRNs). The regulatory hierarchy of these networks can be inferred from expression data. Computational approaches to identify TRNs can be applied in any species where quality RNA can be acquired. However, ChIP-Seq and similar validation methods are challenging to employ in non-model species. Improving the accuracy of computational inference methods can significantly reduce the cost and time of subsequent validation experiments. We have developed ExRANGES, an approach that improves the ability to computationally infer TRNs from time series expression data. ExRANGES utilizes both the rate of change in expression and the absolute expression level to identify TRN connections. We evaluated ExRANGES in five data sets from different model systems. ExRANGES improved the identification of experimentally validated transcription factor targets for all species tested, even in unevenly spaced and sparse data sets. This improved ability to predict known regulator-target relationships enhances the utility of network inference approaches in non-model species where experimental validation is challenging. We integrated ExRANGES with two different network construction approaches, and it has been implemented as an R package available here: http://github.com/DohertyLab/ExRANGES. To install the package, type: devtools::install_github("DohertyLab/ExRANGES")


2020 ◽  
Author(s):  
Harsh Shrivastava ◽  
Xiuwei Zhang ◽  
Srinivas Aluru ◽  
Le Song

Motivation: Gene regulatory networks (GRNs) are graphs that specify the interactions between transcription factors (TFs) and their target genes. Understanding these interactions is crucial for studying the mechanisms in cell differentiation, growth and development. Computational methods are needed to infer these networks from measured data. Although the availability of single cell RNA-Sequencing (scRNA-Seq) data provides unprecedented scale and resolution of gene-expression data, the inference of GRNs remains a challenge, mainly due to the complexity of the regulatory relationships and the noise in the data. Results: We propose GRNUlar, a novel deep learning architecture based on the unrolled algorithms idea for GRN inference from scRNA-Seq data. Like some existing methods that use prior information about which genes are TFs, GRNUlar also incorporates this TF information using a sparse multi-task deep learning architecture. We also demonstrate the application of a recently developed unrolled architecture, GLAD, to recover undirected GRNs in the absence of TF information. These unrolled architectures require supervision to train, for which we leverage existing synthetic data simulators that generate scRNA-Seq data guided by a GRN. We show that unrolled algorithms outperform the state-of-the-art methods on synthetic data as well as real datasets, both when TF information is absent and when it is available. Availability: GitHub link to GRNUlar: https://github.com/Harshs27/[email protected]


2021 ◽  
Author(s):  
Basak Kocaoglu ◽  
William Alexander

Degeneracy, the ability of structurally different elements to perform similar functions, is a property of many biological systems. Systems exhibiting a high degree of degeneracy continue to exhibit the same macroscopic behavior following a lesion even though the underlying network dynamics are significantly different. Degeneracy thus suggests how biological systems can thrive despite changes to internal and external demands. Although degeneracy is a feature of network topologies and seems to be implicated in a wide variety of biological processes, research on degeneracy in biological networks is mostly limited to weighted networks (e.g., neural networks). To date, there has been no extensive investigation of information theoretic measures of degeneracy in other types of biological networks. In this paper, we apply existing approaches for quantifying degeneracy to random Boolean networks used for modeling biological gene regulatory networks. Using random Boolean networks with randomly generated rulesets to generate synthetic gene expression data sets, we systematically investigate the effect of network lesions on measures of degeneracy. Our results are comparable to measures of degeneracy using weighted networks, and this suggests that degeneracy measures may be a useful tool for investigating gene regulatory networks.
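The random Boolean networks described above can be simulated in a few lines. The sketch below is a generic construction under stated assumptions, not the authors' setup: each gene reads `k_inputs` randomly chosen genes through a random truth table, updates are synchronous, and a lesion is modeled by clamping a gene to 0. All names are hypothetical, and the degeneracy measures themselves are not implemented here.

```python
import random

def make_rbn(n_genes, k_inputs, seed=0):
    """Build a random Boolean network: random wiring plus random rulesets."""
    rng = random.Random(seed)
    inputs = [rng.sample(range(n_genes), k_inputs) for _ in range(n_genes)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k_inputs)]
              for _ in range(n_genes)]
    return inputs, tables

def step(state, inputs, tables, lesioned=frozenset()):
    """Synchronously update every gene from the current state."""
    nxt = []
    for gene, (ins, table) in enumerate(zip(inputs, tables)):
        if gene in lesioned:
            nxt.append(0)  # assumed lesion model: gene is clamped off
            continue
        index = 0
        for i in ins:
            index = (index << 1) | state[i]  # input bits select a table row
        nxt.append(table[index])
    return tuple(nxt)

def simulate(initial, inputs, tables, n_steps, lesioned=frozenset()):
    """Return the trajectory of states, including the initial state."""
    trajectory = [tuple(initial)]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1], inputs, tables, lesioned))
    return trajectory
```

Running `simulate` with and without a `lesioned` set from the same initial state yields paired synthetic expression trajectories, which is the kind of input a lesion-based degeneracy analysis would compare.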

