Gsmodutils: a python based framework for test-driven genome scale metabolic model development

AbstractMotivationGenome scale metabolic models (GSMMs) are increasingly important for systems biology and metabolic engineering research as they are capable of simulating complex steady-state behaviour. Constraints based models of this form can include thousands of reactions and metabolites, with many crucial pathways that only become activated in specific simulation settings. However, despite their widespread use, power and the availability of tools to aid with the construction and analysis of large scale models, little methodology is suggested for their continued management. For example, when genome annotations are updated or new understanding regarding behaviour is discovered, models often need to be altered to reflect this. This is quickly becoming an issue for industrial systems and synthetic biotechnology applications, which require good quality reusable models integral to the design, build, test and learn cycle.ResultsAs part of an ongoing effort to improve genome scale metabolic analysis, we have developed a test-driven development methodology for the continuous integration of validation data from different sources. Contributing to the open source technology based around COBRApy, we have developed the gsmodutils modelling framework placing an emphasis on test-driven design of models through defined test cases. Crucially, different conditions are configurable allowing users to examine how different designs or curation impact a wide range of system behaviours, minimizing error between model versions.Availability and implementationThe software framework described within this paper is open source and freely available from http://github.com/SBRCNottingham/gsmodutils.Supplementary informationSupplementary data are available at Bioinformatics online.

Download Full-text

Gsmodutils: A python based framework for test-driven genome scale metabolic model development

10.1101/430116 ◽

2018 ◽

Author(s):

James P Gilbert ◽

Nicole Pearcy ◽

Rupert Norman ◽

Thomas Millat ◽

Klaus Winzer ◽

...

Keyword(s):

Open Source ◽

Large Scale ◽

Model Development ◽

Metabolic Model ◽

Validation Data ◽

Engineering Research ◽

Modelling Framework ◽

Wide Range ◽

Scale Models ◽

Genome Scale

AbstractMotivationGenome scale metabolic models (GSMMs) are increasingly important for systems biology and metabolic engineering research as they are capable of simulating complex steady-state behaviour. Constraints based models of this form can include thousands of reactions and metabolites, with many crucial pathways that only become activated in specific simulation settings. However, despite their widespread use, power and the availability of tools to aid with the construction and analysis of large scale models, little methodology is suggested for the continued management of curated large scale models. For example, when genome annotations are updated or new understanding regarding behaviour of is discovered, models often need to be altered to reflect this. This is quickly becoming an issue for industrial systems and synthetic biotechnology applications, which require good quality reusable models integral to the design, build and test cycle.ResultsAs part of an ongoing effort to improve genome scale metabolic analysis, we have developed a test-driven development methodology for the continuous integration of validation data from different sources. Contributing to the open source technology based around COBRApy, we have developed thegsmodutilsmodelling framework placing an emphasis on test-driven design of models through defined test cases. Crucially, different conditions are configurable allowing users to examine how different designs or curation impact a wide range of system behaviours, minimising error between model versions.AvailabilityThe software framework described within this paper is open source and freely available fromhttp://github.com/SBRCNottingham/gsmodutils

Download Full-text

Integration of enzyme constraints in a genome-scale metabolic model of Aspergillus niger improves phenotype predictions

Microbial Cell Factories ◽

10.1186/s12934-021-01614-2 ◽

2021 ◽

Vol 20 (1) ◽

Author(s):

Jingru Zhou ◽

Yingping Zhuang ◽

Jianye Xia

Keyword(s):

Aspergillus Niger ◽

Large Scale ◽

Measurement Techniques ◽

Metabolic Model ◽

System Level ◽

Metabolic Phenotype ◽

Omics Data ◽

Prediction Ability ◽

Phenotype Prediction ◽

Genome Scale

Abstract Background Genome-scale metabolic model (GSMM) is a powerful tool for the study of cellular metabolic characteristics. With the development of multi-omics measurement techniques in recent years, new methods that integrating multi-omics data into the GSMM show promising effects on the predicted results. It does not only improve the accuracy of phenotype prediction but also enhances the reliability of the model for simulating complex biochemical phenomena, which can promote theoretical breakthroughs for specific gene target identification or better understanding the cell metabolism on the system level. Results Based on the basic GSMM model iHL1210 of Aspergillus niger, we integrated large-scale enzyme kinetics and proteomics data to establish a GSMM based on enzyme constraints, termed a GEM with Enzymatic Constraints using Kinetic and Omics data (GECKO). The results show that enzyme constraints effectively improve the model’s phenotype prediction ability, and extended the model’s potential to guide target gene identification through predicting metabolic phenotype changes of A. niger by simulating gene knockout. In addition, enzyme constraints significantly reduced the solution space of the model, i.e., flux variability over 40.10% metabolic reactions were significantly reduced. The new model showed also versatility in other aspects, like estimating large-scale $$k_{{cat}}$$ k cat values, predicting the differential expression of enzymes under different growth conditions. Conclusions This study shows that incorporating enzymes’ abundance information into GSMM is very effective for improving model performance with A. niger. Enzyme-constrained model can be used as a powerful tool for predicting the metabolic phenotype of A. niger by incorporating proteome data. In the foreseeable future, with the fast development of measurement techniques, and more precise and rich proteomics quantitative data being obtained for A. niger, the enzyme-constrained GSMM model will show greater application space on the system level.

Download Full-text

The Open Global Glacier Model (OGGM) v1.0

10.5194/gmd-2018-9 ◽

2018 ◽

Cited By ~ 5

Author(s):

Fabien Maussion ◽

Anton Butenko ◽

Julia Eis ◽

Kévin Fourteau ◽

Alexander H. Jarosch ◽

...

Keyword(s):

Mass Balance ◽

Open Source ◽

Sea Level ◽

Computational Cost ◽

Balance Model ◽

Climate Data ◽

Estimation Model ◽

Model Framework ◽

Validation Data ◽

Wide Range

Abstract. Despite of their importance for sea-level rise, seasonal water availability, and as source of geohazards, mountain glaciers are one of the few remaining sub-systems of the global climate system for which no globally applicable, open source, community-driven model exists. Here we present the Open Global Glacier Model (OGGM, http://www.oggm.org), developed to provide a modular and open source numerical model framework for simulating past and future change of any glacier in the world. The modelling chain comprises data downloading tools (glacier outlines, topography, climate, validation data), a preprocessing module, a mass-balance model, a distributed ice thickness estimation model, and an ice flow model. The monthly mass-balance is obtained from gridded climate data and a temperature index melt model. To our knowledge, OGGM is the first global model explicitly simulating glacier dynamics: the model relies on the shallow ice approximation to compute the depth-integrated flux of ice along multiple connected flowlines. In this paper, we describe and illustrate each processing step by applying the model to a selection of glaciers before running global simulations under idealized climate forcings. Even without an in-depth calibration, the model shows a very realistic behaviour. We are able to reproduce earlier estimates of global glacier volume by varying the ice dynamical parameters within a range of plausible values. At the same time, the increased complexity of OGGM compared to other prevalent global glacier models comes at a reasonable computational cost: several dozens of glaciers can be simulated on a personal computer, while global simulations realized in a supercomputing environment take up to a few hours per century. Thanks to the modular framework, modules of various complexity can be added to the codebase, allowing to run new kinds of model intercomparisons in a controlled environment. Future developments will add new physical processes to the model as well as tools to calibrate the model in a more comprehensive way. OGGM spans a wide range of applications, from ice-climate interaction studies at millenial time scales to estimates of the contribution of glaciers to past and future sea-level change. It has the potential to become a self-sustained, community driven model for global and regional glacier evolution.

Download Full-text

DeepMAsED: evaluating the quality of metagenomic assemblies

Bioinformatics ◽

10.1093/bioinformatics/btaa124 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3011-3017 ◽

Cited By ~ 5

Author(s):

Olga Mineeva ◽

Mateo Rojas-Carulla ◽

Ruth E Ley ◽

Bernhard Schölkopf ◽

Nicholas D Youngblut

Keyword(s):

Large Scale ◽

State Of The Art ◽

Ground Truth ◽

Supplementary Information ◽

Learning Approach ◽

Wide Range ◽

Metagenome Assembly ◽

Model Training ◽

Reference Genomes

Abstract Motivation Methodological advances in metagenome assembly are rapidly increasing in the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies. Results We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state-of-the-art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications. Conclusions DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straight-forward, as well as is model re-training with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects. Availability and implementation DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Usage and Scaling of an Open-Source Spiking Multi-Area Model of Monkey Cortex

Lecture Notes in Computer Science - Brain-Inspired Computing ◽

10.1007/978-3-030-82427-3_4 ◽

2021 ◽

pp. 47-59

Author(s):

Sacha J. van Albada ◽

Jari Pronold ◽

Alexander van Meegen ◽

Markus Diesmann

Keyword(s):

Open Source ◽

Large Scale ◽

Network Models ◽

Macaque Monkey ◽

Source Model ◽

Model Specification ◽

Data Sets ◽

Neural Network Models ◽

Wide Range ◽

Ict Infrastructure

AbstractWe are entering an age of ‘big’ computational neuroscience, in which neural network models are increasing in size and in numbers of underlying data sets. Consolidating the zoo of models into large-scale models simultaneously consistent with a wide range of data is only possible through the effort of large teams, which can be spread across multiple research institutions. To ensure that computational neuroscientists can build on each other’s work, it is important to make models publicly available as well-documented code. This chapter describes such an open-source model, which relates the connectivity structure of all vision-related cortical areas of the macaque monkey with their resting-state dynamics. We give a brief overview of how to use the executable model specification, which employs NEST as simulation engine, and show its runtime scaling. The solutions found serve as an example for organizing the workflow of future models from the raw experimental data to the visualization of the results, expose the challenges, and give guidance for the construction of an ICT infrastructure for neuroscience.

Download Full-text

The Open Global Glacier Model (OGGM) v1.1

Geoscientific Model Development ◽

10.5194/gmd-12-909-2019 ◽

2019 ◽

Vol 12 (3) ◽

pp. 909-931 ◽

Cited By ~ 23

Author(s):

Fabien Maussion ◽

Anton Butenko ◽

Nicolas Champollion ◽

Matthias Dusch ◽

Julia Eis ◽

...

Keyword(s):

Mass Balance ◽

Open Source ◽

Sea Level ◽

Computational Cost ◽

Balance Model ◽

Climate Data ◽

Estimation Model ◽

Model Framework ◽

Validation Data ◽

Wide Range

Abstract. Despite their importance for sea-level rise, seasonal water availability, and as a source of geohazards, mountain glaciers are one of the few remaining subsystems of the global climate system for which no globally applicable, open source, community-driven model exists. Here we present the Open Global Glacier Model (OGGM), developed to provide a modular and open-source numerical model framework for simulating past and future change of any glacier in the world. The modeling chain comprises data downloading tools (glacier outlines, topography, climate, validation data), a preprocessing module, a mass-balance model, a distributed ice thickness estimation model, and an ice-flow model. The monthly mass balance is obtained from gridded climate data and a temperature index melt model. To our knowledge, OGGM is the first global model to explicitly simulate glacier dynamics: the model relies on the shallow-ice approximation to compute the depth-integrated flux of ice along multiple connected flow lines. In this paper, we describe and illustrate each processing step by applying the model to a selection of glaciers before running global simulations under idealized climate forcings. Even without an in-depth calibration, the model shows very realistic behavior. We are able to reproduce earlier estimates of global glacier volume by varying the ice dynamical parameters within a range of plausible values. At the same time, the increased complexity of OGGM compared to other prevalent global glacier models comes at a reasonable computational cost: several dozen glaciers can be simulated on a personal computer, whereas global simulations realized in a supercomputing environment take up to a few hours per century. Thanks to the modular framework, modules of various complexity can be added to the code base, which allows for new kinds of model intercomparison studies in a controlled environment. Future developments will add new physical processes to the model as well as automated calibration tools. Extensions or alternative parameterizations can be easily added by the community thanks to comprehensive documentation. OGGM spans a wide range of applications, from ice–climate interaction studies at millennial timescales to estimates of the contribution of glaciers to past and future sea-level change. It has the potential to become a self-sustained community-driven model for global and regional glacier evolution.

Download Full-text

Backbone—An Adaptable Energy Systems Modelling Framework

Energies ◽

10.3390/en12173388 ◽

2019 ◽

Vol 12 (17) ◽

pp. 3388 ◽

Cited By ~ 5

Author(s):

Niina Helistö ◽

Juha Kiviluoma ◽

Jussi Ikäheimo ◽

Topi Rasku ◽

Erkka Rinne ◽

...

Keyword(s):

Power Plants ◽

Large Scale ◽

Unit Commitment ◽

Energy Systems ◽

Mixed Integer ◽

Investment Planning ◽

Planning And Scheduling ◽

Modelling Framework ◽

Wide Range ◽

Systems Modelling

Backbone represents a highly adaptable energy systems modelling framework, which can be utilised to create models for studying the design and operation of energy systems, both from investment planning and scheduling perspectives. It includes a wide range of features and constraints, such as stochastic parameters, multiple reserve products, energy storage units, controlled and uncontrolled energy transfers, and, most significantly, multiple energy sectors. The formulation is based on mixed-integer programming and takes into account unit commitment decisions for power plants and other energy conversion facilities. Both high-level large-scale systems and fully detailed smaller-scale systems can be appropriately modelled. The framework has been implemented as the open-source Backbone modelling tool using General Algebraic Modeling System (GAMS). An application of the framework is demonstrated using a power system example, and Backbone is shown to produce results comparable to a commercial tool. However, the adaptability of Backbone further enables the creation and solution of energy systems models relatively easily for many different purposes and thus it improves on the available methodologies.

Download Full-text

3D Printing Complex Structures Using Modeling and Simulation

Volume 8: 27th Conference on Mechanical Vibration and Noise ◽

10.1115/detc2015-47916 ◽

2015 ◽

Author(s):

Hammad Mazhar

Keyword(s):

3D Printing ◽

Open Source ◽

Data Structures ◽

Large Scale ◽

Selective Layer ◽

Parallel Data ◽

Wide Range ◽

Data Structures And Algorithms ◽

Multi Body ◽

Dynamics Problems

This paper describes an open source parallel simulation framework capable of simulating large-scale granular and multi-body dynamics problems. This framework, called Chrono::Parallel, builds upon the modeling capabilities of Chrono::Engine, another open source simulation package, and leverages parallel data structures to enable scalable simulation of large problems. Chrono::Parallel is somewhat unique in that it was designed from the ground up to leverage parallel data structures and algorithms so that it scales across a wide range of computer architectures and yet has a rich modeling capability for simulating many different types of problems. The modeling capabilities of Chrono::Parallel will be demonstrated in the context of additive manufacturing and 3D printing by modeling the Selective Layer Sintering layering process and simulating large complex interlocking structures which require compression and folding to fit into a 3D printer’s build volume.

Download Full-text

DrawGlycan-SNFG and gpAnnotate: rendering glycans and annotating glycopeptide mass spectra

Bioinformatics ◽

10.1093/bioinformatics/btz819 ◽

2019 ◽

Cited By ~ 4

Author(s):

Kai Cheng ◽

Gabrielle Pawlowski ◽

Xinheng Yu ◽

Yusen Zhou ◽

Sriram Neelamegham

Keyword(s):

Mass Spectrometry ◽

Open Source ◽

Mass Spectra ◽

Supplementary Information ◽

Supplementary Data ◽

International Union ◽

Open Source Program ◽

Source Program ◽

Wide Range ◽

Peptide Modifications

Abstract Summary This manuscript describes an open-source program, DrawGlycan-SNFG (version 2), that accepts IUPAC (International Union of Pure and Applied Chemist)-condensed inputs to render Symbol Nomenclature For Glycans (SNFG) drawings. A wide range of local and global options enable display of various glycan/peptide modifications including bond breakages, adducts, repeat structures, ambiguous identifications etc. These facilities make DrawGlycan-SNFG ideal for integration into various glycoinformatics software, including glycomics and glycoproteomics mass spectrometry (MS) applications. As a demonstration of such usage, we incorporated DrawGlycan-SNFG into gpAnnotate, a standalone application to score and annotate individual MS/MS glycopeptide spectrum in different fragmentation modes. Availability and implementation DrawGlycan-SNFG and gpAnnotate are platform independent. While originally coded using MATLAB, compiled packages are also provided to enable DrawGlycan-SNFG implementation in Python and Java. All programs are available from https://virtualglycome.org/drawglycan; https://virtualglycome.org/gpAnnotate. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Classification of protein–protein association rates based on biophysical informatics

BMC Bioinformatics ◽

10.1186/s12859-021-04323-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Kalyani Dhusia ◽

Yinghao Wu

Keyword(s):

Large Scale ◽

Cross Validation ◽

Protein Complexes ◽

Dynamic Properties ◽

Conformational Dynamics ◽

Classification Model ◽

Coarse Grained ◽

Modeling Framework ◽

Validation Data ◽

Wide Range

Abstract Background Proteins form various complexes to carry out their versatile functions in cells. The dynamic properties of protein complex formation are mainly characterized by the association rates which measures how fast these complexes can be formed. It was experimentally observed that the association rates span an extremely wide range with over ten orders of magnitudes. Identification of association rates within this spectrum for specific protein complexes is therefore essential for us to understand their functional roles. Results To tackle this problem, we integrate physics-based coarse-grained simulations into a neural-network-based classification model to estimate the range of association rates for protein complexes in a large-scale benchmark set. The cross-validation results show that, when an optimal threshold was selected, we can reach the best performance with specificity, precision, sensitivity and overall accuracy all higher than 70%. The quality of our cross-validation data has also been testified by further statistical analysis. Additionally, given an independent testing set, we can successfully predict the group of association rates for eight protein complexes out of ten. Finally, the analysis of failed cases suggests the future implementation of conformational dynamics into simulation can further improve model. Conclusions In summary, this study demonstrated that a new modeling framework that combines biophysical simulations with bioinformatics approaches is able to identify protein–protein interactions with low association rates from those with higher association rates. This method thereby can serve as a useful addition to a collection of existing experimental approaches that measure biomolecular recognition.

Download Full-text