Bayesian inference of metabolic kinetics from genome-scale multiomics data

2018 ◽  
Author(s):  
Peter C. St. John ◽  
Jonathan Strutz ◽  
Linda J. Broadbelt ◽  
Keith E.J. Tyo ◽  
Yannick J. Bomble

Summary. Modern biological tools generate a wealth of data on metabolite and protein concentrations that can be used to help inform new strain designs. However, integrating these data sources to generate predictions of steady-state metabolism typically requires a kinetic description of the enzymatic reactions that occur within a cell. Parameterizing these kinetic models from biological data can be computationally difficult, especially as the amount of data increases. Robust methods must also be able to quantify the uncertainty in model parameters as a function of the available data, which can be particularly computationally intensive. The field of Bayesian inference offers a wide range of methods for estimating distributions in parameter uncertainty. However, these techniques are poorly suited to kinetic metabolic modeling due to the complex kinetic rate laws typically employed and the resulting dynamic system that must be solved. In this paper, we employ linear-logarithmic kinetics to simplify the calculation of steady-state flux distributions and enable efficient sampling and variational inference methods. We demonstrate that detailed information on the posterior distribution of kinetic model parameters can be obtained efficiently at a variety of different problem scales, including large-scale kinetic models trained on multiomics datasets. These results allow modern Bayesian machine learning tools to be leveraged in understanding biological data and developing new, efficient strain designs.
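
The summary's remark that linear-logarithmic (lin-log) kinetics simplify the steady-state calculation can be illustrated with a toy pathway. This is a minimal sketch, not the authors' implementation: the two-reaction pathway S -> X -> P, the function name `linlog_steady_state`, and the default elasticity values are all assumptions chosen for illustration. Each rate is written as `v = e * v_ref * (1 + sum of elasticity * log-concentration terms)`, so the steady-state condition `v1 = v2` is linear in `chi = ln(x / x_ref)` and can be solved in closed form.

```python
def linlog_steady_state(e1, e2, gamma_s=0.0, v_ref=1.0,
                        eps1_x=-0.5, eps1_s=1.0, eps2_x=1.0):
    """Steady state of a toy pathway S -> X -> P under lin-log kinetics.

    e1, e2   : enzyme levels relative to the reference state
    gamma_s  : ln(s / s_ref) for the external substrate S
    eps*     : elasticities (hypothetical values, for illustration only)

    Rate laws (assumed form):
        v1 = e1 * v_ref * (1 + eps1_x * chi + eps1_s * gamma_s)
        v2 = e2 * v_ref * (1 + eps2_x * chi)
    where chi = ln(x / x_ref) for the internal metabolite X.
    """
    # Setting v1 == v2 gives one linear equation in chi.
    denom = e1 * eps1_x - e2 * eps2_x
    if denom == 0.0:
        raise ValueError("steady state is not uniquely determined")
    chi = (e2 - e1 * (1.0 + eps1_s * gamma_s)) / denom
    flux = e2 * v_ref * (1.0 + eps2_x * chi)
    return chi, flux


# Doubling the first enzyme raises the internal metabolite and the flux.
chi, flux = linlog_steady_state(e1=2.0, e2=1.0)
print(chi, flux)
```

In a genome-scale model the same idea becomes a linear system in the vector of log-concentrations, which is why no dynamic simulation is needed to evaluate a candidate parameter set.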

2000 ◽  
Vol 663 ◽  
Author(s):  
J. Samper ◽  
R. Juncosa ◽  
V. Navarro ◽  
J. Delgado ◽  
L. Montenegro ◽  
...  

ABSTRACT. FEBEX (Full-scale Engineered Barrier EXperiment) is a demonstration and research project dealing with the bentonite engineered barrier designed for sealing and containment of waste in a high level radioactive waste repository (HLWR). It includes two main experiments: an in situ full-scale test performed at Grimsel (GTS) and a mock-up test operating since February 1997 at CIEMAT facilities in Madrid (Spain) [1,2,3]. One of the objectives of FEBEX is the development and testing of conceptual and numerical models for the thermal, hydrodynamic, and geochemical (THG) processes expected to take place in engineered clay barriers. A significant improvement in coupled THG modeling of the clay barrier has been achieved both in terms of a better understanding of THG processes and more sophisticated THG computer codes. The ability of these models to reproduce the observed THG patterns in a wide range of THG conditions enhances the confidence in their prediction capabilities. Numerical THG models of heating and hydration experiments performed on small-scale lab cells provide excellent results for temperatures, water inflow and final water content in the cells [3]. Calculated concentrations at the end of the experiments reproduce most of the patterns of measured data. In general, the fit of concentrations of dissolved species is better than that of exchanged cations. These models were later used to simulate the evolution of the large-scale experiments (in situ and mock-up). Some thermo-hydrodynamic hypotheses and bentonite parameters were slightly revised during TH calibration of the mock-up test. The results of the reference model simultaneously reproduce the observed water inflows and bentonite temperatures and relative humidities. Although the model is highly sensitive to one-at-a-time variations in model parameters, the possibility of parameter combinations leading to similar fits cannot be precluded.
The TH model of the “in situ” test is based on the same bentonite TH parameters and assumptions as for the “mock-up” test. Granite parameters were slightly modified during the calibration process in order to reproduce the observed thermal and hydrodynamic evolution. The reference model properly captures relative humidities and temperatures in the bentonite [3]. It also reproduces the observed spatial distribution of water pressures and temperatures in the granite. Once the TH aspects of the model had been calibrated, predictions of the THG evolution of both tests were performed. Data from the dismantling of the in situ test, which is planned for the summer of 2001, will provide a unique opportunity to test and validate current THG models of the EBS.


2021 ◽  
Author(s):  
Andrew J Kavran ◽  
Aaron Clauset

Abstract. Background: Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation. Results: We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or “filtered” to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may be first decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy up to 43% compared to using unfiltered data. Conclusions: Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogeneous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters to applications in systems biology.
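
The idea of combining measurements over an interaction network can be sketched with a simple neighborhood mean. This is an illustrative assumption, not the authors' exact filter: the function name `mean_network_filter`, the signed-edge representation, and the requirement that values be centered around zero (so that an anti-correlated neighbor can be sign-flipped before averaging) are all choices made for this example.

```python
def mean_network_filter(values, edges):
    """Smooth node measurements by averaging over network neighbors.

    values : dict mapping node -> measurement, assumed centered around zero
    edges  : iterable of (u, v, sign); sign is +1 for a correlated pair
             and -1 for an anti-correlated pair (hypothetical encoding)
    """
    # Build an undirected adjacency list that remembers each edge's sign.
    neighbors = {node: [] for node in values}
    for u, v, sign in edges:
        neighbors[u].append((v, sign))
        neighbors[v].append((u, sign))

    filtered = {}
    for node, x in values.items():
        # Anti-correlated neighbors contribute with their sign flipped.
        contributions = [x] + [sign * values[nbr] for nbr, sign in neighbors[node]]
        filtered[node] = sum(contributions) / len(contributions)
    return filtered


noisy = {"a": 1.2, "b": 0.8, "c": -1.0}
signed_edges = [("a", "b", +1), ("a", "c", -1)]
print(mean_network_filter(noisy, signed_edges))
```

Applying a separate filter per module, as the abstract describes, would amount to partitioning the nodes first and calling such a function on each module's subgraph.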


2020 ◽  
Author(s):  
Yuan Yuan ◽  
Lei Lin

Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data is scarce. To address this problem, we propose a novel self-supervised pre-training scheme to initialize a Transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given an entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pre-training is completed, the pre-trained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model as well as reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed method, leading to a classification accuracy increase of 1.91% to 6.69%. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.


2006 ◽  
Vol 14 (02) ◽  
pp. 275-293 ◽  
Author(s):  
CHRISTOPHER S. OEHMEN ◽  
TJERK P. STRAATSMA ◽  
GORDON A. ANDERSON ◽  
GALYA ORR ◽  
BOBBIE-JO M. WEBB-ROBERTSON ◽  
...  

The future of biology will be increasingly driven by the fundamental paradigm shift from hypothesis-driven research to data-driven discovery research employing the growing volume of biological data coupled to experimental testing of new discoveries. But hardware and software limitations in the current workflow infrastructure make it impossible or intractable to use real data from disparate sources for large-scale biological research. We identify key technological developments needed to enable this paradigm shift involving (1) the ability to store and manage extremely large datasets which are dispersed over a wide geographical area, (2) development of novel analysis and visualization tools which are capable of operating on enormous data resources without overwhelming researchers with unusable information, and (3) formalisms for integrating mathematical models of biosystems from the molecular level to the organism population level. This will require the development of algorithms and tools which efficiently utilize high-performance compute power and large storage infrastructures. The end result will be the ability of a researcher to integrate complex data from many different sources with simulations to analyze a given system at a wide range of temporal and spatial scales in a single conceptual model.


2018 ◽  
Author(s):  
Tuure Hameri ◽  
Georgios Fengos ◽  
Meric Ataman ◽  
Ljubisa Miskovic ◽  
Vassily Hatzimanikatis

Abstract. Large-scale kinetic models are used for designing, predicting, and understanding the metabolic responses of living cells. Kinetic models are particularly attractive for the biosynthesis of target molecules in cells as they are typically better than other types of models at capturing the complex cellular biochemistry. Using simpler stoichiometric models as scaffolds, kinetic models are built around a steady-state flux profile and a metabolite concentration vector that are typically determined via optimization. However, as the underlying optimization problem is underdetermined, even after incorporating available experimental omics data, one cannot uniquely determine the operational configuration in terms of metabolic fluxes and metabolite concentrations. As a result, some reactions can operate in either the forward or reverse direction while still agreeing with the observed physiology. Here, we analyze how the underlying uncertainty in intracellular fluxes and concentrations affects predictions of constructed kinetic models and their design in metabolic engineering and systems biology studies. To this end, we integrated the omics data of optimally grown Escherichia coli into a stoichiometric model and constructed populations of non-linear large-scale kinetic models of alternative steady-state solutions consistent with the physiology of the E. coli aerobic metabolism. We performed metabolic control analysis (MCA) on these models, highlighting that MCA-based metabolic engineering decisions are strongly affected by the selected steady state and appear to be more sensitive to concentration values rather than flux values. To incorporate this into future studies, we propose a workflow for moving towards more reliable and robust predictions that are consistent with all alternative steady-state solutions. This workflow can be applied to all kinetic models to improve the consistency and accuracy of their predictions.
Additionally, we show that, irrespective of the alternative steady-state solution, increased activity of phosphofructokinase and decreased ATP maintenance requirements would improve cellular growth of optimally grown E. coli.


2020 ◽  
Author(s):  
Yuan Yuan ◽  
Lei Lin

Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data is scarce. To address this problem, we propose a novel self-supervised pre-training scheme to initialize a Transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given an entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pre-training is completed, the pre-trained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model as well as reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed method, leading to a classification accuracy increase of 2.38% to 5.27%. The code and the pre-trained model will be available at https://github.com/linlei1214/SITS-BERT upon publication. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.


2017 ◽  
Vol 14 (18) ◽  
pp. 4125-4159 ◽  
Author(s):  
Benoît Pasquier ◽  
Mark Holzer

Abstract. The ocean's nutrient cycles are important for the carbon balance of the climate system and for shaping the ocean's distribution of dissolved elements. Dissolved iron (dFe) is a key limiting micronutrient, but iron scavenging is observationally poorly constrained, leading to large uncertainties in the external sources of iron and hence in the state of the marine iron cycle. Here we build a steady-state model of the ocean's coupled phosphorus, silicon, and iron cycles embedded in a data-assimilated steady-state global ocean circulation. The model includes the redissolution of scavenged iron, parameterization of subgrid topography, and small, large, and diatom phytoplankton functional classes. Phytoplankton concentrations are implicitly represented in the parameterization of biological nutrient utilization through an equilibrium logistic model. Our formulation thus has only three coupled nutrient tracers, the three-dimensional distributions of which are found using a Newton solver. The very efficient numerics allow us to use the model in inverse mode to objectively constrain many biogeochemical parameters by minimizing the mismatch between modeled and observed nutrient and phytoplankton concentrations. Iron source and sink parameters cannot jointly be optimized because of local compensation between regeneration, recycling, and scavenging. We therefore consider a family of possible state estimates corresponding to a wide range of external iron source strengths. All state estimates have a similar mismatch with the observed nutrient concentrations and very similar large-scale dFe distributions. However, the relative contributions of aeolian, sedimentary, and hydrothermal iron to the total dFe concentration differ widely depending on the sources. Both the magnitude and pattern of the phosphorus and opal exports are well constrained, with global values of 8.1 ± 0.3 Tmol P yr−1 (or, in carbon units, 10.3 ± 0.4 Pg C yr−1) and 171 ± 3 Tmol Si yr−1.
We diagnose the phosphorus and opal exports supported by aeolian, sedimentary, and hydrothermal iron. The geographic patterns of the export supported by each iron type are well constrained across the family of state estimates. Sedimentary-iron-supported export is important in shelf and large-scale upwelling regions, while hydrothermal iron contributes to export mostly in the Southern Ocean. The fraction of the global export supported by a given iron type varies systematically with its fractional contribution to the total iron source. Aeolian iron is most efficient in supporting export in the sense that its fractional contribution to export exceeds its fractional contribution to the total source. Per source-injected molecule, aeolian iron supports 3.1 ± 0.8 times more phosphorus export and 2.0 ± 0.5 times more opal export than the other iron types. Conversely, per injected molecule, sedimentary and hydrothermal iron support 2.3 ± 0.6 and 4 ± 2 times less phosphorus export, and 1.9 ± 0.5 and 2 ± 1 times less opal export than the other iron types.


2014 ◽  
Author(s):  
R Daniel Kortschak ◽  
David L Adelson

bíogo is a framework designed to ease development and maintenance of computationally intensive bioinformatics applications. The library is written in the Go programming language, a garbage-collected, strictly typed compiled language with built-in support for concurrent processing, and performance comparable to C and Java. It provides a variety of data types and utility functions to facilitate manipulation and analysis of large-scale genomic and other biological data. bíogo uses a concise and expressive syntax, lowering the barriers to entry for researchers needing to process large data sets with custom analyses while retaining computational safety and ease of code review. We believe bíogo provides an excellent environment for training and research in computational biology because of its combination of strict typing, simple and expressive syntax, and high performance.


2019 ◽  
Author(s):  
Saratram Gopalakrishnan ◽  
Satyakam Dash ◽  
Costas Maranas

Abstract. Kinetic models predict the metabolic flows by directly linking metabolite concentrations and enzyme levels to reaction fluxes. Robust parameterization of organism-level kinetic models that faithfully reproduce the effect of different genetic or environmental perturbations remains an open challenge due to the intractability of existing algorithms. This paper introduces K-FIT, an accelerated kinetic parameterization workflow that leverages a novel decomposition approach to identify steady-state fluxes in response to genetic perturbations, followed by a gradient-based update of kinetic parameters until predictions simultaneously agree with the fluxomic data in all perturbed metabolic networks. The applicability of K-FIT to large-scale models is demonstrated by parameterizing an expanded kinetic model for E. coli (307 reactions and 258 metabolites) using fluxomic data from six mutants. The thousand-fold speed-up afforded by K-FIT over meta-heuristic approaches is transformational, enabling follow-up robustness of inference analyses and optimal design of experiments to inform metabolic engineering strategies.


2017 ◽  
Author(s):  
Tapesh Santra

Abstract. A common experimental approach for studying signal transduction networks (STNs) is to measure the steady-state concentrations of their components following perturbations to individual components. Such data are frequently used to reconstruct topological models of STNs, but are rarely used for calibrating kinetic models of these networks. This is because existing calibration algorithms operate by assigning different sets of values to the parameters of the kinetic models and, for each set of values, simulating all perturbations performed in the biochemical experiments. This process is highly computationally intensive and may be infeasible when molecular-level information about the perturbation experiments is unavailable. Here, I propose an algorithm which can calibrate ordinary differential equation (ODE) based kinetic models of STNs using steady-state perturbation responses (SSPRs) without simulating perturbation experiments. The proposed algorithm uses modular response analysis (MRA) to calculate the scaled Jacobian matrix of the ODE model of an STN using SSPR data. The model parameters are then calibrated to fit the scaled Jacobian matrix calculated in the above step. This procedure does not require simulating the perturbation experiments. Therefore, it is significantly less computationally intensive than existing algorithms and can be implemented without molecular-level knowledge of the mechanism of perturbations. It is also parallelizable, i.e., it can explore multiple sets of parameter values simultaneously, and is therefore scalable. The capabilities and shortcomings of the proposed algorithm are demonstrated using both simulated and real perturbation responses of the Mitogen Activated Protein Kinase (MAPK) STN. Availability: All source code and data needed to replicate the results in this manuscript are available from https://github.com/SBIUCD/MRA_SMC_ABC1

