Gsmodutils: a Python-based framework for test-driven genome scale metabolic model development

2018 ◽  
Author(s):  
James P Gilbert ◽  
Nicole Pearcy ◽  
Rupert Norman ◽  
Thomas Millat ◽  
Klaus Winzer ◽  
...  

Abstract Motivation: Genome scale metabolic models (GSMMs) are increasingly important for systems biology and metabolic engineering research as they are capable of simulating complex steady-state behaviour. Constraint-based models of this form can include thousands of reactions and metabolites, with many crucial pathways that only become activated in specific simulation settings. However, despite their widespread use, power and the availability of tools to aid with the construction and analysis of large scale models, little methodology is suggested for the continued management of curated large scale models. For example, when genome annotations are updated or new understanding regarding behaviour is discovered, models often need to be altered to reflect this. This is quickly becoming an issue for industrial systems and synthetic biotechnology applications, which require good quality reusable models integral to the design, build and test cycle. Results: As part of an ongoing effort to improve genome scale metabolic analysis, we have developed a test-driven development methodology for the continuous integration of validation data from different sources. Contributing to the open source technology based around COBRApy, we have developed the gsmodutils modelling framework, placing an emphasis on test-driven design of models through defined test cases. Crucially, different conditions are configurable, allowing users to examine how different designs or curation impact a wide range of system behaviours, minimising error between model versions. Availability: The software framework described within this paper is open source and freely available from http://github.com/SBRCNottingham/gsmodutils

2019 ◽  
Vol 35 (18) ◽  
pp. 3397-3403 ◽  
Author(s):  
James Gilbert ◽  
Nicole Pearcy ◽  
Rupert Norman ◽  
Thomas Millat ◽  
Klaus Winzer ◽  
...  

Abstract Motivation: Genome scale metabolic models (GSMMs) are increasingly important for systems biology and metabolic engineering research as they are capable of simulating complex steady-state behaviour. Constraint-based models of this form can include thousands of reactions and metabolites, with many crucial pathways that only become activated in specific simulation settings. However, despite their widespread use, power and the availability of tools to aid with the construction and analysis of large scale models, little methodology is suggested for their continued management. For example, when genome annotations are updated or new understanding regarding behaviour is discovered, models often need to be altered to reflect this. This is quickly becoming an issue for industrial systems and synthetic biotechnology applications, which require good quality reusable models integral to the design, build, test and learn cycle. Results: As part of an ongoing effort to improve genome scale metabolic analysis, we have developed a test-driven development methodology for the continuous integration of validation data from different sources. Contributing to the open source technology based around COBRApy, we have developed the gsmodutils modelling framework, placing an emphasis on test-driven design of models through defined test cases. Crucially, different conditions are configurable, allowing users to examine how different designs or curation impact a wide range of system behaviours, minimizing error between model versions. Availability and implementation: The software framework described within this paper is open source and freely available from http://github.com/SBRCNottingham/gsmodutils. Supplementary information: Supplementary data are available at Bioinformatics online.
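
A minimal sketch of what a defined test case in this test-driven spirit can look like. It uses plain COBRApy with pytest rather than the gsmodutils API itself; the model file name (e_coli_core.xml), reaction identifiers and flux thresholds are illustrative placeholders, not values from the paper.

```python
# A generic, test-driven sketch over a COBRApy model (not the gsmodutils API).
# "e_coli_core.xml", the reaction IDs and thresholds are placeholders.
import cobra
import pytest


@pytest.fixture
def model():
    # Load the curated model that is kept under version control.
    return cobra.io.read_sbml_model("e_coli_core.xml")


def test_growth_on_glucose(model):
    # Design condition: aerobic growth on glucose should stay above a
    # previously validated threshold between model versions.
    model.reactions.get_by_id("EX_glc__D_e").lower_bound = -10.0
    model.reactions.get_by_id("EX_o2_e").lower_bound = -20.0
    solution = model.optimize()
    assert solution.status == "optimal"
    assert solution.objective_value > 0.1


def test_no_growth_without_glucose(model):
    # Negative control: closing glucose uptake (the only carbon source open
    # by default in this placeholder model) should abolish growth.
    model.reactions.get_by_id("EX_glc__D_e").lower_bound = 0.0
    assert model.slim_optimize(error_value=0.0) < 1e-6
```

Tests of this kind can be run against every model revision, so that curation changes which silently break a previously validated growth condition are caught immediately.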


2021 ◽  
Vol 20 (1) ◽  
Author(s):  
Jingru Zhou ◽  
Yingping Zhuang ◽  
Jianye Xia

Abstract Background: The genome-scale metabolic model (GSMM) is a powerful tool for the study of cellular metabolic characteristics. With the development of multi-omics measurement techniques in recent years, new methods that integrate multi-omics data into the GSMM show promising improvements in the predicted results. They not only improve the accuracy of phenotype prediction but also enhance the reliability of the model for simulating complex biochemical phenomena, which can promote theoretical breakthroughs in specific gene target identification and a better understanding of cell metabolism at the system level. Results: Based on the existing GSMM iHL1210 of Aspergillus niger, we integrated large-scale enzyme kinetics and proteomics data to establish a GSMM based on enzyme constraints, termed a GEM with Enzymatic Constraints using Kinetic and Omics data (GECKO). The results show that enzyme constraints effectively improve the model's phenotype prediction ability and extend the model's potential to guide target gene identification by predicting metabolic phenotype changes of A. niger under simulated gene knockouts. In addition, enzyme constraints significantly reduced the solution space of the model: flux variability was significantly reduced for over 40.10% of metabolic reactions. The new model also showed versatility in other respects, such as estimating large-scale $k_{cat}$ values and predicting the differential expression of enzymes under different growth conditions. Conclusions: This study shows that incorporating enzyme abundance information into a GSMM is very effective for improving model performance for A. niger. The enzyme-constrained model can be used as a powerful tool for predicting the metabolic phenotype of A. niger by incorporating proteome data. In the foreseeable future, with the rapid development of measurement techniques and more precise and comprehensive quantitative proteomics data becoming available for A. niger, the enzyme-constrained GSMM will find even broader application at the system level.
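
The core enzyme-constraint idea is that each reaction flux is capped by the catalytic capacity of its enzyme, v_i ≤ kcat_i · [E_i]. The sketch below applies this as simple per-reaction flux bounds with COBRApy; the full GECKO formulation instead adds enzymes as pseudo-metabolites drawing on a shared protein pool. The file name, reaction IDs, kcat values and abundances are illustrative placeholders, not data from iHL1210.

```python
# Per-reaction enzyme constraints (v_i <= kcat_i * [E_i]) as flux bounds in
# COBRApy. This is a simplified sketch of the idea, not the full GECKO method.
import cobra

model = cobra.io.read_sbml_model("iHL1210.xml")  # assumed placeholder file name

# kcat in 1/s, abundance in mmol/gDW -> flux cap in mmol/gDW/h after * 3600
enzyme_data = {
    "PFK": {"kcat": 120.0, "abundance": 1.5e-4},   # placeholder reaction IDs
    "PYK": {"kcat": 300.0, "abundance": 8.0e-5},
}

for rxn_id, d in enzyme_data.items():
    vmax = d["kcat"] * 3600.0 * d["abundance"]
    rxn = model.reactions.get_by_id(rxn_id)
    rxn.upper_bound = min(rxn.upper_bound, vmax)
    if rxn.reversibility:
        rxn.lower_bound = max(rxn.lower_bound, -vmax)

solution = model.optimize()
print(solution.objective_value)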


2021 ◽  
Author(s):  
Kor de Jong ◽  
Marc van Kreveld ◽  
Debabrata Panja ◽  
Oliver Schmitz ◽  
Derek Karssenberg

Data availability at global scale is increasing exponentially. Although considerable challenges remain regarding the identification of model structure and parameters of continental scale hydrological models, we will soon reach the situation that global scale models could be defined at very high resolutions close to 100 m or less. One of the key challenges is how to make simulations of these ultra-high resolution models tractable ([1]).

Our research contributes the development of a model building framework that is specifically designed to distribute calculations over multiple cluster nodes. This framework enables domain experts like hydrologists to develop their own large scale models, using a scripting language like Python, without the need to acquire the skills to develop low-level computer code for parallel and distributed computing.

We present the design and implementation of this software framework and illustrate its use with a prototype 100 m, 1 h continental scale hydrological model. Our modelling framework ensures that any model built with it is parallelized. This is made possible by providing the model builder with a set of model building blocks, which are coded in such a manner that parallelization of calculations occurs within and across these building blocks, for any combination of building blocks. There is thus full flexibility on the side of the modeller, without losing performance.

This breakthrough is made possible by applying a novel approach to the implementation of the model building framework, called asynchronous many-tasks, provided by the HPX C++ software library ([3]). The code in the model building framework expresses spatial operations as large collections of interdependent tasks that can be executed efficiently on individual laptops as well as computer clusters ([2]). Our framework currently includes the most essential operations for building large scale hydrological models, including those for simulating transport of material through a flow direction network. By combining these operations, we rebuilt an existing 100 m, 1 h resolution model, thus far used for simulations of small catchments, requiring limited coding as we only had to replace the computational back end of the existing model. Runs at continental scale on a computer cluster show acceptable strong and weak scaling, providing a strong indication that global simulations at this resolution will soon be possible, technically speaking.

Future work will focus on extending the set of modelling operations and adding scalable I/O, after which existing models that are currently limited in their ability to use the computational resources available to them can be ported to this new environment.

More information about our modelling framework is at https://lue.computationalgeography.org.

References

[1] M. Bierkens. Global hydrology 2015: State, trends, and directions. Water Resources Research, 51(7):4923–4947, 2015.
[2] K. de Jong, et al. An environmental modelling framework based on asynchronous many-tasks: scalability and usability. Submitted.
[3] H. Kaiser, et al. HPX - The C++ standard library for parallelism and concurrency. Journal of Open Source Software, 5(53):2352, 2020.
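
As an aside on one of the building blocks mentioned above, the sketch below is a plain, serial NumPy implementation of accumulating material down a flow-direction network. It is not the framework's API; the point of the asynchronous many-tasks approach is precisely that such operations are decomposed into interdependent tasks and distributed over cluster nodes without the modeller writing this kind of code.

```python
# Serial illustration only: accumulate material down a flow-direction network.
# The framework described above executes equivalent operations as task graphs.
import numpy as np

def flow_accumulation(downstream, material):
    """downstream[i] is the index of the cell that cell i drains to
    (-1 for an outlet); material[i] is the local input of cell i."""
    n = len(material)
    indegree = np.zeros(n, dtype=int)
    for d in downstream:
        if d >= 0:
            indegree[d] += 1
    accumulated = material.astype(float).copy()
    # Process cells in topological order, starting from cells nothing drains into.
    queue = [i for i in range(n) if indegree[i] == 0]
    while queue:
        i = queue.pop()
        d = downstream[i]
        if d >= 0:
            accumulated[d] += accumulated[i]
            indegree[d] -= 1
            if indegree[d] == 0:
                queue.append(d)
    return accumulated

# Tiny example: cells 0 and 3 drain into 1, which drains into outlet 2.
downstream = np.array([1, 2, -1, 1])
material = np.array([1.0, 1.0, 1.0, 1.0])
print(flow_accumulation(downstream, material))  # [1. 3. 4. 1.]
```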


2018 ◽  
Author(s):  
Fabien Maussion ◽  
Anton Butenko ◽  
Julia Eis ◽  
Kévin Fourteau ◽  
Alexander H. Jarosch ◽  
...  

Abstract. Despite their importance for sea-level rise, seasonal water availability, and as a source of geohazards, mountain glaciers are one of the few remaining sub-systems of the global climate system for which no globally applicable, open source, community-driven model exists. Here we present the Open Global Glacier Model (OGGM, http://www.oggm.org), developed to provide a modular and open source numerical model framework for simulating past and future change of any glacier in the world. The modelling chain comprises data downloading tools (glacier outlines, topography, climate, validation data), a preprocessing module, a mass-balance model, a distributed ice thickness estimation model, and an ice flow model. The monthly mass-balance is obtained from gridded climate data and a temperature index melt model. To our knowledge, OGGM is the first global model explicitly simulating glacier dynamics: the model relies on the shallow ice approximation to compute the depth-integrated flux of ice along multiple connected flowlines. In this paper, we describe and illustrate each processing step by applying the model to a selection of glaciers before running global simulations under idealized climate forcings. Even without an in-depth calibration, the model shows very realistic behaviour. We are able to reproduce earlier estimates of global glacier volume by varying the ice dynamical parameters within a range of plausible values. At the same time, the increased complexity of OGGM compared to other prevalent global glacier models comes at a reasonable computational cost: several dozen glaciers can be simulated on a personal computer, while global simulations realized in a supercomputing environment take up to a few hours per century. Thanks to the modular framework, modules of various complexity can be added to the codebase, allowing new kinds of model intercomparisons to be run in a controlled environment. Future developments will add new physical processes to the model as well as tools to calibrate the model in a more comprehensive way. OGGM spans a wide range of applications, from ice-climate interaction studies at millennial time scales to estimates of the contribution of glaciers to past and future sea-level change. It has the potential to become a self-sustained, community-driven model for global and regional glacier evolution.
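
For reference, a temperature-index mass-balance model of the kind referred to above is typically written as follows (standard notation; parameter names and default values in OGGM itself may differ):

```latex
% Schematic monthly specific mass balance at elevation z (temperature-index model).
\begin{equation}
  m_i(z) \;=\; p_f \, P_i^{\mathrm{solid}}(z)
  \;-\; \mu^{*} \, \max\!\bigl(T_i(z) - T_{\mathrm{melt}},\, 0\bigr)
\end{equation}
% P_i^{solid}: monthly solid precipitation; p_f: precipitation correction factor;
% T_i: monthly mean temperature; T_melt: temperature above which melt occurs;
% mu*: temperature sensitivity of the glacier (a calibration parameter).
```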


Author(s):  
Sacha J. van Albada ◽  
Jari Pronold ◽  
Alexander van Meegen ◽  
Markus Diesmann

Abstract We are entering an age of ‘big’ computational neuroscience, in which neural network models are increasing in size and in numbers of underlying data sets. Consolidating the zoo of models into large-scale models simultaneously consistent with a wide range of data is only possible through the effort of large teams, which can be spread across multiple research institutions. To ensure that computational neuroscientists can build on each other’s work, it is important to make models publicly available as well-documented code. This chapter describes such an open-source model, which relates the connectivity structure of all vision-related cortical areas of the macaque monkey with their resting-state dynamics. We give a brief overview of how to use the executable model specification, which employs NEST as simulation engine, and show its runtime scaling. The solutions found serve as an example for organizing the workflow of future models from the raw experimental data to the visualization of the results, expose the challenges, and give guidance for the construction of an ICT infrastructure for neuroscience.
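
The executable model specification itself is not reproduced here; as a rough orientation, the following is a minimal NEST 3.x sketch of the create/connect/simulate pattern that such a specification automates for hundreds of populations with data-derived connectivity. Population sizes, rates, weights and delays are illustrative only.

```python
# Minimal NEST 3.x sketch (not the multi-area model's own specification).
import nest

nest.ResetKernel()

exc = nest.Create("iaf_psc_exp", 800)
inh = nest.Create("iaf_psc_exp", 200)
drive = nest.Create("poisson_generator", params={"rate": 8000.0})
spikes = nest.Create("spike_recorder")

nest.Connect(drive, exc + inh, syn_spec={"weight": 20.0})
nest.Connect(exc, exc + inh,
             conn_spec={"rule": "fixed_indegree", "indegree": 80},
             syn_spec={"weight": 20.0, "delay": 1.5})
nest.Connect(inh, exc + inh,
             conn_spec={"rule": "fixed_indegree", "indegree": 20},
             syn_spec={"weight": -100.0, "delay": 1.5})
nest.Connect(exc, spikes)

nest.Simulate(1000.0)
print("excitatory spikes:", spikes.get("n_events"))
```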


2019 ◽  
Vol 12 (3) ◽  
pp. 909-931 ◽  
Author(s):  
Fabien Maussion ◽  
Anton Butenko ◽  
Nicolas Champollion ◽  
Matthias Dusch ◽  
Julia Eis ◽  
...  

Abstract. Despite their importance for sea-level rise, seasonal water availability, and as a source of geohazards, mountain glaciers are one of the few remaining subsystems of the global climate system for which no globally applicable, open source, community-driven model exists. Here we present the Open Global Glacier Model (OGGM), developed to provide a modular and open-source numerical model framework for simulating past and future change of any glacier in the world. The modeling chain comprises data downloading tools (glacier outlines, topography, climate, validation data), a preprocessing module, a mass-balance model, a distributed ice thickness estimation model, and an ice-flow model. The monthly mass balance is obtained from gridded climate data and a temperature index melt model. To our knowledge, OGGM is the first global model to explicitly simulate glacier dynamics: the model relies on the shallow-ice approximation to compute the depth-integrated flux of ice along multiple connected flow lines. In this paper, we describe and illustrate each processing step by applying the model to a selection of glaciers before running global simulations under idealized climate forcings. Even without an in-depth calibration, the model shows very realistic behavior. We are able to reproduce earlier estimates of global glacier volume by varying the ice dynamical parameters within a range of plausible values. At the same time, the increased complexity of OGGM compared to other prevalent global glacier models comes at a reasonable computational cost: several dozen glaciers can be simulated on a personal computer, whereas global simulations realized in a supercomputing environment take up to a few hours per century. Thanks to the modular framework, modules of various complexity can be added to the code base, which allows for new kinds of model intercomparison studies in a controlled environment. Future developments will add new physical processes to the model as well as automated calibration tools. Extensions or alternative parameterizations can be easily added by the community thanks to comprehensive documentation. OGGM spans a wide range of applications, from ice–climate interaction studies at millennial timescales to estimates of the contribution of glaciers to past and future sea-level change. It has the potential to become a self-sustained community-driven model for global and regional glacier evolution.
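
As a pointer to the dynamical core described above, the depth-integrated shallow-ice flux along a flowline can be sketched in standard notation as follows (OGGM's implementation adds further terms, e.g. sliding, and handles several bed geometries; treat this as a schematic rather than the exact implementation):

```latex
% Depth-averaged deformation velocity (shallow-ice approximation, Glen's flow law)
% and the resulting flux along the flowline.
\begin{align}
  \bar{u}_d &= \frac{2A}{n+2}\, h \,\bigl(\rho g h \,|\nabla s|\bigr)^{n}, \\
  q &= \bar{u}_d \, S
\end{align}
% h: local ice thickness; |\nabla s|: surface slope along the flowline;
% A: ice creep parameter; n = 3: Glen's exponent; rho: ice density;
% g: gravitational acceleration; S: cross-sectional area; q: ice flux used in
% the mass-conservation equation.
```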


Energies ◽  
2019 ◽  
Vol 12 (17) ◽  
pp. 3388 ◽  
Author(s):  
Niina Helistö ◽  
Juha Kiviluoma ◽  
Jussi Ikäheimo ◽  
Topi Rasku ◽  
Erkka Rinne ◽  
...  

Backbone represents a highly adaptable energy systems modelling framework, which can be utilised to create models for studying the design and operation of energy systems, both from investment planning and scheduling perspectives. It includes a wide range of features and constraints, such as stochastic parameters, multiple reserve products, energy storage units, controlled and uncontrolled energy transfers, and, most significantly, multiple energy sectors. The formulation is based on mixed-integer programming and takes into account unit commitment decisions for power plants and other energy conversion facilities. Both high-level large-scale systems and fully detailed smaller-scale systems can be appropriately modelled. The framework has been implemented as the open-source Backbone modelling tool using the General Algebraic Modeling System (GAMS). An application of the framework is demonstrated using a power system example, and Backbone is shown to produce results comparable to a commercial tool. However, the adaptability of Backbone further enables energy systems models to be created and solved relatively easily for many different purposes, thus improving on the available methodologies.
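
To make the mixed-integer structure concrete, here is a toy unit-commitment example in Python with PuLP. This is not Backbone's GAMS formulation and omits most of its features (reserves, storage, stochastics, investments); the unit data and demand profile are made up.

```python
# Toy unit-commitment MIP: binary commitment, min/max output, demand balance.
import pulp

T = 4
demand = [120, 180, 250, 90]  # MW in each hour (made-up profile)
units = {"coal": {"pmin": 50, "pmax": 200, "cost": 30.0},
         "gas": {"pmin": 10, "pmax": 100, "cost": 60.0}}
idx = [(u, t) for u in units for t in range(T)]

prob = pulp.LpProblem("toy_unit_commitment", pulp.LpMinimize)
on = pulp.LpVariable.dicts("on", idx, cat="Binary")   # commitment decision
p = pulp.LpVariable.dicts("p", idx, lowBound=0)       # dispatched power [MW]

# Objective: total variable generation cost over the horizon.
prob += pulp.lpSum(units[u]["cost"] * p[u, t] for u, t in idx)

for t in range(T):
    # Energy balance: generation meets demand in every hour.
    prob += pulp.lpSum(p[u, t] for u in units) == demand[t]
    for u in units:
        # Output is only allowed between pmin and pmax while the unit is on.
        prob += p[u, t] <= units[u]["pmax"] * on[u, t]
        prob += p[u, t] >= units[u]["pmin"] * on[u, t]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))
```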


Author(s):  
Hammad Mazhar

This paper describes an open source parallel simulation framework capable of simulating large-scale granular and multi-body dynamics problems. This framework, called Chrono::Parallel, builds upon the modeling capabilities of Chrono::Engine, another open source simulation package, and leverages parallel data structures to enable scalable simulation of large problems. Chrono::Parallel is somewhat unique in that it was designed from the ground up to leverage parallel data structures and algorithms so that it scales across a wide range of computer architectures and yet has a rich modeling capability for simulating many different types of problems. The modeling capabilities of Chrono::Parallel will be demonstrated in the context of additive manufacturing and 3D printing by modeling the Selective Laser Sintering layering process and simulating large, complex interlocking structures that require compression and folding to fit into a 3D printer’s build volume.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Kalyani Dhusia ◽  
Yinghao Wu

Abstract Background: Proteins form various complexes to carry out their versatile functions in cells. The dynamic properties of protein complex formation are mainly characterized by association rates, which measure how fast these complexes can be formed. It has been experimentally observed that association rates span an extremely wide range of over ten orders of magnitude. Identifying where within this spectrum the association rates of specific protein complexes fall is therefore essential for understanding their functional roles. Results: To tackle this problem, we integrate physics-based coarse-grained simulations into a neural-network-based classification model to estimate the range of association rates for protein complexes in a large-scale benchmark set. The cross-validation results show that, when an optimal threshold is selected, we reach the best performance with specificity, precision, sensitivity and overall accuracy all higher than 70%. The quality of our cross-validation data has also been confirmed by further statistical analysis. Additionally, given an independent testing set, we successfully predict the group of association rates for eight protein complexes out of ten. Finally, the analysis of failed cases suggests that incorporating conformational dynamics into the simulations could further improve the model. Conclusions: In summary, this study demonstrates that a new modeling framework combining biophysical simulations with bioinformatics approaches is able to distinguish protein–protein interactions with low association rates from those with higher association rates. This method can thereby serve as a useful addition to the collection of existing experimental approaches that measure biomolecular recognition.
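
A sketch of the general approach, not the authors' exact pipeline: features derived from coarse-grained simulations are fed to a neural-network classifier, evaluated by cross-validation, and a probability threshold is chosen to trade sensitivity against specificity. The feature matrix and labels below are random stand-ins for the benchmark set.

```python
# Generic neural-network classification of association-rate groups from
# simulation-derived features, with cross-validated probability estimates.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))        # e.g. contact statistics from simulations
y = rng.integers(0, 2, size=200)     # slow (0) vs fast (1) association class

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]

# Choose a probability threshold to balance sensitivity and specificity.
threshold = 0.5
pred = (proba >= threshold).astype(int)
print("precision:", precision_score(y, pred, zero_division=0),
      "sensitivity:", recall_score(y, pred))
```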


10.2196/11734 ◽  
2019 ◽  
Vol 7 (8) ◽  
pp. e11734 ◽  
Author(s):  
Yatharth Ranjan ◽  
Zulqarnain Rashid ◽  
Callum Stewart ◽  
Pauline Conde ◽  
Mark Begale ◽  
...  

Background: With a wide range of use cases in both research and clinical domains, collecting continuous mobile health (mHealth) streaming data from multiple sources in a secure, highly scalable, and extensible platform is of high interest to the open source mHealth community. The European Union Innovative Medicines Initiative Remote Assessment of Disease and Relapse-Central Nervous System (RADAR-CNS) program is an exemplary project with the requirement to support the collection of high-resolution data at scale; as such, the Remote Assessment of Disease and Relapse (RADAR)-base platform is designed to meet these needs and additionally to facilitate a new generation of mHealth projects in this nascent field. Objective: Wide-bandwidth networks, smartphone penetration, and wearable sensors offer new possibilities for collecting near-real-time high-resolution datasets from large numbers of participants. The aim of this study was to build a platform that would cater for large-scale data collection for remote monitoring initiatives. Key criteria are scalability, extensibility, security, and privacy. Methods: RADAR-base is developed as a modular application; the backend is built on a backbone of the highly successful Confluent/Apache Kafka framework for streaming data. To facilitate scaling and ease of deployment, we use Docker containers to package the components of the platform. RADAR-base provides 2 main mobile apps for data collection, a Passive App and an Active App. Other third-party apps and sensors are easily integrated into the platform. Management user interfaces to support data collection and enrolment are also provided. Results: General principles of the platform components and the design of RADAR-base are presented here, with examples of the types of data currently being collected from devices used in RADAR-CNS projects: Multiple Sclerosis, Epilepsy, and Depression cohorts. Conclusions: RADAR-base is a fully functional remote data collection platform built around Confluent/Apache Kafka that provides off-the-shelf components for projects interested in collecting mHealth datasets at scale.
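
A minimal sketch of the streaming pattern the backend is built around, using the confluent-kafka Python client. This is not RADAR-base's actual topic naming or serialization (the platform registers Avro schemas through the Confluent schema registry); the broker address, topic name and payload fields are placeholders.

```python
# Illustrative only: stream a single device sample into a Kafka topic, the
# pattern underlying a Confluent/Apache Kafka based collection backend.
import json
import time
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # placeholder broker

sample = {
    "userId": "participant-001",   # placeholder identifiers and fields
    "sourceId": "wearable-123",
    "time": time.time(),
    "heartRate": 72.0,
}

# Key by participant so all samples of one participant land in the same partition.
producer.produce("passive_heart_rate",
                 key=sample["userId"],
                 value=json.dumps(sample).encode("utf-8"))
producer.flush()
```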

