The use of machine learning to discover regulatory networks controlling biological systems

2022 ◽  
Author(s):  
Rossin Erbe ◽  
Jessica Gore ◽  
Kelly Gemmill ◽  
Daria A. Gaykalova ◽  
Elana J. Fertig
2015 ◽  
Vol 13 (03) ◽  
pp. 1541006 ◽  
Author(s):  
Asako Komori ◽  
Yukihiro Maki ◽  
Isao Ono ◽  
Masahiro Okamoto

Biological systems are composed of biomolecules such as genes, proteins, metabolites, and signaling components that interact in complex networks. To understand such systems, it is important to be able to infer regulatory networks from experimental time-series data. In previous studies, we developed efficient numerical optimization methods for inferring these networks, but we had not yet tested how they perform in the presence of the error (noise) inherent in experimental data. In this study, we investigated the noise tolerance of our proposed inference engine. We generated noisy data using the Langevin equation and compared the performance of our method with that of alternative optimization methods.
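
To make "noisy data from the Langevin equation" concrete, here is a minimal sketch that integrates a chemical Langevin equation with the Euler-Maruyama scheme. It assumes a hypothetical single-gene production/degradation model, dX = (k - gamma*X) dt + sqrt(k + gamma*X) dW. The model, the parameter values, and the function names are illustrative and are not the authors' benchmark system.

```python
import math
import random


def simulate_cle(k=10.0, gamma=0.1, x0=0.0, dt=0.01, t_end=100.0, seed=0):
    """Euler-Maruyama integration of a chemical Langevin equation for a
    hypothetical birth-death gene: dX = (k - gamma*X) dt + sqrt(k + gamma*X) dW.
    """
    rng = random.Random(seed)
    n_steps = int(round(t_end / dt))
    times, xs = [0.0], [x0]
    x = x0
    for i in range(1, n_steps + 1):
        drift = k - gamma * x
        # Propensities must be non-negative, so clip x inside the noise term.
        diffusion = math.sqrt(k + gamma * max(x, 0.0))
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment ~ N(0, dt)
        x = max(x + drift * dt + diffusion * dw, 0.0)  # keep copy number >= 0
        times.append(i * dt)
        xs.append(x)
    return times, xs


def sample_every(times, xs, stride):
    """Subsample the trajectory to mimic sparse experimental time points."""
    return times[::stride], xs[::stride]


if __name__ == "__main__":
    t, x = simulate_cle()
    for ti, xi in zip(*sample_every(t, x, stride=1000)):
        print(f"t={ti:6.1f}  x={xi:8.2f}")
```

The noise-free trajectory relaxes to k/gamma; the Langevin term adds fluctuations whose size scales with the reaction propensities, which is what distinguishes this from simply adding Gaussian noise to a deterministic curve.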


Author(s):  
Mohammed Eslami ◽  
Amin Espah-Borujeni ◽  
Hamed Eramian ◽  
Mark Weston ◽  
George Zheng ◽  
...  

Abstract
Motivation: Applications in synthetic and systems biology can benefit from measuring whole-cell responses to biochemical perturbations. Executing experiments that cover all possible combinations of perturbations is infeasible. In this paper, we present the host response model (HRM), a machine learning approach that maps the responses to single perturbations to the transcriptional response to a combination of perturbations.
Results: The HRM combines high-throughput sequencing with machine learning to infer links between experimental context, prior knowledge of cell regulatory networks, and RNASeq data in order to predict a gene's dysregulation. We find that the HRM can predict the direction of dysregulation in response to a combination of inducers with an accuracy of >90% using data from single inducers. We further find that using prior, known cell regulatory networks doubles the predictive performance of the HRM (R² increases from 0.3 to 0.65). The model was validated in two organisms, E. coli and B. subtilis, using new experiments conducted after training. Finally, although the HRM is trained on gene expression data, it predicts differential expression directly, so its predictions can also be used for enrichment analyses. We show that the HRM can accurately classify >95% of the pathway regulations. The HRM reduces the number of RNASeq experiments needed, because responses can be tested in silico to focus experiments.
Availability: The HRM software and tutorial are available at https://github.com/sd2e/CDM, and the configurable differential expression analysis tools and tutorials are available at https://github.com/SD2E/omics_tools.
Supplementary information: Supplementary data are available at Bioinformatics online.
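
The HRM abstract reports two kinds of scores: accuracy on the direction of dysregulation and R². A minimal sketch of how such scores are commonly computed, assuming observations and predictions are per-gene log2 fold changes, that "direction" means the sign, and that R² is the coefficient of determination (the paper may use a different definition, such as squared Pearson correlation). The threshold rule and the toy values are hypothetical.

```python
def direction_accuracy(observed, predicted, threshold=0.0):
    """Fraction of genes whose predicted up/down call matches the observed one.

    Genes with |observed log2FC| <= threshold are skipped (hypothetical rule).
    A prediction of exactly 0 is counted as "down".
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if abs(o) > threshold]
    if not pairs:
        raise ValueError("no genes pass the threshold")
    hits = sum(1 for o, p in pairs if (o > 0) == (p > 0))
    return hits / len(pairs)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy log2 fold changes for five genes (not data from the paper).
    observed = [2.0, -1.5, 0.5, -0.2, 3.0]
    predicted = [1.5, -1.0, -0.3, -0.4, 2.5]
    print("direction accuracy:", direction_accuracy(observed, predicted))
    print("R^2: %.3f" % r_squared(observed, predicted))
```

On the toy values, four of five signs agree (accuracy 0.8) even though one small observed change is predicted with the wrong sign, which illustrates why directional accuracy and R² can tell different stories about the same model.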


2019 ◽  
Author(s):  
Xueming Liu ◽  
Enrico Maiorino ◽  
Arda Halu ◽  
Joseph Loscalzo ◽  
Jianxi Gao ◽  
...  

Abstract
Robustness is a prominent feature of most biological systems. In a cell, the structure of the interactions between genes, proteins, and metabolites plays a crucial role in maintaining the cell's functionality and viability in the presence of external perturbations and noise. Despite advances in characterizing the robustness of biological systems, most current efforts have focused on homogeneous molecular networks studied in isolation, such as protein-protein interaction or gene regulatory networks, neglecting the interactions among different molecular substrates. Here we propose a comprehensive framework for understanding how the interactions between genes, proteins, and metabolites contribute to robustness in a heterogeneous biological network. We integrate heterogeneous data sources to construct a multilayer interaction network composed of a gene regulatory layer, a protein-protein interaction layer, and a metabolic layer. We design a simulated perturbation process to characterize the contribution of each gene to the overall system's robustness, defined as its influence over the global network. We find that highly influential genes are enriched in essential and cancer genes, confirming the central role of these genes in critical cellular processes. Further, we find that the metabolic layer is more vulnerable to perturbations involving genes associated with metabolic diseases. By comparing the robustness of the network with that of multiple randomized network models, we find that the real network is as robust as, or more robust than, the random realizations. Finally, we analytically derive the expected robustness of multilayer biological networks from the degree distributions within or between layers.
These results provide new insights into the nontrivial dynamics that occur in the cell after a genetic perturbation, confirming the importance of including the coupling between different layers of interaction in models of complex biological systems.
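
The abstract scores each gene by its influence over the global network under a simulated perturbation, without spelling out the procedure. One simple, percolation-style proxy (an assumption, not the authors' process) is to merge all layers into one undirected graph, knock out a node, and measure how much the largest connected component shrinks. The layer names, node labels, and toy edges below are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(layers, interlayer_edges):
    """Union of intra-layer and inter-layer edges as one undirected graph."""
    adj = defaultdict(set)
    for edges in list(layers.values()) + [interlayer_edges]:
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


def influence(adj, node):
    """Shrinkage of the largest component when `node` is knocked out
    (includes the node itself if it belonged to that component)."""
    return largest_component_size(adj) - largest_component_size(adj, {node})


if __name__ == "__main__":
    layers = {  # hypothetical toy multilayer network
        "gene_regulatory": [("g:A", "g:B"), ("g:A", "g:C")],
        "ppi": [("p:B", "p:C"), ("p:C", "p:D")],
        "metabolic": [("m:1", "m:2")],
    }
    interlayer = [("g:B", "p:B"), ("g:C", "p:C"), ("p:D", "m:1")]
    adj = build_graph(layers, interlayer)
    for n in sorted(adj, key=lambda n: influence(adj, n), reverse=True):
        print(n, influence(adj, n))
```

In the toy graph, removing p:C splits the network into two pieces and scores highest, because it is the only bridge between the regulatory/protein block and the metabolic block. Richer schemes (cascading failures, layer-specific dependency rules) follow the same remove-and-remeasure pattern.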


2019 ◽  
Author(s):  
Sheng-Yong Niu ◽  
Binqiang Liu ◽  
Qin Ma ◽  
Wen-Chi Chou

Abstract
A transcription unit (TU) is composed of one or more adjacent genes on the same strand that are co-transcribed, mostly in prokaryotes. Accurate identification of TUs is a crucial first step toward delineating transcriptional regulatory networks and elucidating the dynamic regulatory mechanisms encoded in prokaryotic genomes. Many genomic features (e.g., intergenic distance) and transcriptomic features (e.g., continuous and stable RNA-seq read-count signals) have been collected from large amounts of experimental data and integrated into classification techniques to predict TUs computationally across whole genomes. Although some tools and web servers can predict TUs from bacterial RNA-seq data and genome sequences, there remains a need for an improved machine-learning prediction approach and a more comprehensive pipeline that handles quality control (QC), TU prediction, and TU visualization. To let users perform TU identification efficiently on local computers or high-performance clusters, and to provide more accurate predictions, we developed an R package named rSeqTU. rSeqTU uses a random forest algorithm to select essential features describing TUs and then uses a support vector machine (SVM) to build TU prediction models. rSeqTU (available at https://s18692001.github.io/rSeqTU/) has six computational functionalities: read quality control, read mapping, training set generation, random-forest-based feature selection, TU prediction, and TU visualization.
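
rSeqTU itself is an R package; the two-stage idea it describes (random-forest feature selection, then an SVM classifier) can be sketched in Python with scikit-learn. This is an illustration under assumptions: the synthetic features stand in for adjacent-gene-pair features such as intergenic distance or read-count continuity, and `k`, `C`, and the kernel are arbitrary choices, not rSeqTU's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_tu_classifier(k=4, seed=0):
    """Random-forest feature selection followed by an RBF-kernel SVM.

    threshold=-np.inf makes SelectFromModel keep exactly the k features with
    the highest impurity-based importance.
    """
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=seed),
        max_features=k,
        threshold=-np.inf,
    )
    return make_pipeline(selector, StandardScaler(), SVC(kernel="rbf", C=1.0))


if __name__ == "__main__":
    # Synthetic stand-in for gene-pair features; labels mimic "same TU or not".
    X, y = make_classification(n_samples=300, n_features=12, n_informative=4,
                               n_redundant=0, random_state=1)
    model = build_tu_classifier(k=4)
    # Selection is refit inside every fold, so the CV score is not inflated
    # by choosing features with the test data.
    print("CV accuracy: %.3f" % cross_val_score(model, X, y, cv=5).mean())
    model.fit(X, y)
    kept = model.named_steps["selectfrommodel"].get_support(indices=True)
    print("selected feature columns:", list(kept))
```

Putting the selector inside the pipeline is the main design choice: selecting features on the full dataset before cross-validation would leak information from the held-out folds and overstate accuracy.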


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Stephen Kotiang ◽  
Ali Eslami

Abstract
Background: Understanding genomic functions and the behavior of complex gene regulatory networks has recently been a major research focus in systems biology. As a result, a plethora of computational and modeling tools have been proposed to identify and infer interactions among biological entities. Here, we consider the general question of how perturbations affect global dynamical network behavior and how errors propagate in biological networks, in order to motivate research on intervention strategies.
Results: This paper introduces a computational framework that combines the Boolean network and factor graph formalisms to explore the global dynamical features of biological systems. We propose a message-passing algorithm for this formalism that evolves network states as messages in the graph. In addition, the mathematical formulation allows us to describe the dynamics of error propagation in gene regulatory networks through a density evolution (DE) analysis. The model is applied to assess network state progression and the impact of gene deletion in the budding yeast cell cycle. Simulation results show that our model predictions match published experimental data. Our findings also reveal that the sample yeast cell-cycle network is not only robust but also consistent with real high-throughput expression data. Finally, our DE analysis serves as a tool for finding optimal network parameter values for resilience against perturbations, especially in the inference of genetic graphs.
Conclusion: Our computational framework provides a useful graphical model and analytical tools for studying biological networks. It can be a powerful tool for predicting the consequences of gene deletions before conducting wet-bench experiments, because it offers a quick route to predicting biologically relevant dynamic properties without tunable kinetic parameters.
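
The framework evolves Boolean network states by message passing on a factor graph. The sketch below swaps in the simpler, standard technique of synchronous Boolean updates, only to show what "network state progression" and "gene deletion" mean operationally: a deleted gene is clamped to 0 and the network is iterated until it reaches an attractor. The three-gene negative-feedback loop is hypothetical and is not the yeast cell-cycle network.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]
Rule = Callable[[Dict[str, int]], int]

# Hypothetical toy network: A -> B -> C -| A (a negative-feedback loop).
NODES = ("A", "B", "C")
RULES: Dict[str, Rule] = {
    "A": lambda s: int(not s["C"]),
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}


def step(state: State, knockouts=frozenset()) -> State:
    """One synchronous update; knocked-out genes are clamped to 0."""
    current = dict(zip(NODES, state))
    return tuple(0 if n in knockouts else RULES[n](current) for n in NODES)


def trajectory_to_attractor(start: State, knockouts=frozenset()
                            ) -> Tuple[List[State], List[State]]:
    """Iterate from `start` until a state repeats; return (transient, attractor)."""
    seen: Dict[State, int] = {}
    path: List[State] = []
    state = tuple(0 if n in knockouts else v for n, v in zip(NODES, start))
    while state not in seen:
        seen[state] = len(path)
        path.append(state)
        state = step(state, knockouts)
    first = seen[state]  # index where the cycle begins
    return path[:first], path[first:]


if __name__ == "__main__":
    _, wt_cycle = trajectory_to_attractor((1, 0, 0))
    print("wild type: cycle of length", len(wt_cycle))
    _, ko_cycle = trajectory_to_attractor((1, 0, 0), knockouts={"B"})
    print("B deleted: attractor", ko_cycle)
```

In this toy loop the wild type oscillates through a six-state cycle, while deleting B collapses the dynamics to the fixed point (1, 0, 0). Comparing attractors with and without a deletion is the kind of prediction the abstract describes making before wet-bench experiments.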


2021 ◽  
Vol 12 ◽  
Author(s):  
Jiyoung Lee ◽  
Shuo Geng ◽  
Song Li ◽  
Liwu Li

Subclinical doses of LPS (SD-LPS) are known to cause low-grade inflammatory activation of monocytes, which could lead to inflammatory diseases including atherosclerosis and metabolic syndrome. Sodium 4-phenylbutyrate is a potential therapeutic compound that can reduce the inflammation caused by SD-LPS. To understand the gene regulatory networks underlying these processes, we generated scRNA-seq data from mouse monocytes treated with these compounds and identified 11 novel cell clusters. We developed a machine learning method that integrates scRNA-seq, ATAC-seq, and binding motifs to characterize the gene regulatory networks underlying these cell clusters. Using guided regularized random forest and feature selection, our method achieved high performance and outperformed a traditional enrichment-based method in selecting candidate regulatory genes. Our method is particularly efficient at selecting a few candidate genes to explain an observed expression pattern: among 531 candidate TFs, it achieves an auROC of 0.961 with only 10 motifs. Finally, we found two novel subpopulations of monocytes responding to SD-LPS and confirmed our analysis with independent flow cytometry experiments. Our results suggest that our new machine learning method can select candidate regulatory genes as potential targets for developing new therapeutics against low-grade inflammation.
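
The auROC of 0.961 summarizes how well classifier scores separate two classes. It equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one (ties count one half), which can be computed from ranks via the Mann-Whitney U statistic. This sketch illustrates the metric only; it does not implement the guided regularized random forest, and the example labels and scores are made up.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 0/1 values (1 = positive class); scores: higher = more positive.
    Tied scores receive their average rank, so positive/negative ties count 1/2.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign 1-based average ranks to blocks of tied scores
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, lab in pairs if lab == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    rank_sum_pos = sum(r for r, (_, lab) in zip(ranks, pairs) if lab == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
    print("auROC = %.3f" % auroc(labels, scores))  # 8 of 9 pairs ordered correctly
```

Sorting makes this O(n log n) instead of comparing every positive/negative pair, and an auROC of 0.5 corresponds to random ranking.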


2020 ◽  
Vol 31 (14) ◽  
pp. 1498-1511 ◽  
Author(s):  
Grace A. McLaughlin ◽  
Erin M. Langdon ◽  
John M. Crutchley ◽  
Liam J. Holt ◽  
M. Gregory Forest ◽  
...  

The structure of the cytosol across different length scales is a debated topic in cell biology. Here we present tools to measure the physical state of the cytosol by analyzing the 3D motion of nanoparticles expressed in cells. We find evidence that the physical structure of the cytosol is a fundamental source of variability in biological systems.
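
A common first analysis of 3D nanoparticle tracks is the time-averaged mean squared displacement (MSD) as a function of lag time. For free Brownian motion in three dimensions MSD(tau) = 6 D tau, so a diffusion coefficient D can be read off the slope, and the log-log slope alpha flags anomalous diffusion (alpha < 1 suggests crowding or confinement). This is a minimal sketch assuming evenly sampled positions (for example, in micrometers, sampled every `dt` seconds); the synthetic Brownian track and its parameters are illustrative and do not represent the authors' pipeline.

```python
import math
import random


def simulate_brownian_3d(n_steps, D, dt, seed=0):
    """Synthetic 3D Brownian track: each coordinate step ~ N(0, 2*D*dt)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * D * dt)
    pos = [0.0, 0.0, 0.0]
    track = [tuple(pos)]
    for _ in range(n_steps):
        pos = [c + rng.gauss(0.0, sigma) for c in pos]
        track.append(tuple(pos))
    return track


def time_averaged_msd(track, max_lag):
    """MSD(lag) averaged over all start frames, for lag = 1..max_lag frames."""
    msd = []
    for lag in range(1, max_lag + 1):
        count = len(track) - lag
        total = sum(
            sum((b - a) ** 2 for a, b in zip(track[i], track[i + lag]))
            for i in range(count)
        )
        msd.append(total / count)
    return msd


def fit_diffusion_coefficient(msd, dt):
    """Least-squares slope through the origin of MSD = 6 D tau (3D)."""
    taus = [dt * (k + 1) for k in range(len(msd))]
    slope = sum(t * m for t, m in zip(taus, msd)) / sum(t * t for t in taus)
    return slope / 6.0


def anomalous_exponent(msd, dt):
    """Slope alpha of log MSD versus log tau (alpha = 1 for free diffusion)."""
    xs = [math.log(dt * (k + 1)) for k in range(len(msd))]
    ys = [math.log(m) for m in msd]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)


if __name__ == "__main__":
    dt = 0.01
    track = simulate_brownian_3d(n_steps=5000, D=0.3, dt=dt)
    msd = time_averaged_msd(track, max_lag=10)
    print("D_hat = %.3f" % fit_diffusion_coefficient(msd, dt))
    print("alpha = %.2f" % anomalous_exponent(msd, dt))
```

Only short lags are fitted because the number of independent displacements, and hence the precision of each MSD point, falls as the lag grows.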

