ssbio: a Python framework for structural systems biology

Nathan Mih; Elizabeth Brunk; Ke Chen; Edward Catoiu; Anand Sastry; Erol Kavvas; Jonathan M Monk; Zhen Zhang; Bernhard O Palsson

doi:10.1093/bioinformatics/bty077

ssbio: a Python framework for structural systems biology

Bioinformatics ◽

10.1093/bioinformatics/bty077 ◽

2018 ◽

Vol 34 (12) ◽

pp. 2155-2157 ◽

Cited By ~ 15

Author(s):

Nathan Mih ◽

Elizabeth Brunk ◽

Ke Chen ◽

Edward Catoiu ◽

Anand Sastry ◽

...

Keyword(s):

Structural Information ◽

Protein Structures ◽

Structural Data ◽

Third Party ◽

Supplementary Information ◽

Scale Models ◽

Protein Properties ◽

Scale Network ◽

Structural Systems Biology ◽

Genome Scale

Abstract

Summary: Working with protein structures at the genome-scale has been challenging in a variety of ways. Here, we present ssbio, a Python package that provides a framework to easily work with structural information in the context of genome-scale network reconstructions, which can contain thousands of individual proteins. The ssbio package provides an automated pipeline to construct high quality genome-scale models with protein structures (GEM-PROs), wrappers to popular third-party programs to compute associated protein properties, and methods to visualize and annotate structures directly in Jupyter notebooks, thus lowering the barrier of linking 3D structural data with established systems workflows.

Availability and implementation: ssbio is implemented in Python and available to download under the MIT license at http://github.com/SBRG/ssbio. Documentation and Jupyter notebook tutorials are available at http://ssbio.readthedocs.io/en/latest/. Interactive notebooks can be launched using Binder at https://mybinder.org/v2/gh/SBRG/ssbio/master?filepath=Binder.ipynb.

Supplementary information: Supplementary data are available at Bioinformatics online.

ssbio: A Python Framework for Structural Systems Biology

10.1101/165506 ◽

2017 ◽

Cited By ~ 1

Author(s):

Nathan Mih ◽

Elizabeth Brunk ◽

Ke Chen ◽

Edward Catoiu ◽

Anand Sastry ◽

...

Keyword(s):

Structural Information ◽

Protein Structures ◽

Structural Data ◽

Third Party ◽

Supplementary Information ◽

Link Type ◽

Protein Properties ◽

Scale Network ◽

Structural Systems Biology ◽

Genome Scale

Abstract

Summary: Working with protein structures at the genome-scale has been challenging in a variety of ways. Here, we present ssbio, a Python package that provides a framework to easily work with structural information in the context of genome-scale network reconstructions, which can contain thousands of individual proteins. The ssbio package provides an automated pipeline to construct high quality genome-scale models with protein structures (GEM-PROs), wrappers to popular third-party programs to compute associated protein properties, and methods to visualize and annotate structures directly in Jupyter notebooks, thus lowering the barrier of linking 3D structural data with established systems workflows.

Availability and Implementation: ssbio is implemented in Python and available to download under the MIT license at http://github.com/SBRG/ssbio. Documentation and Jupyter notebook tutorials are available at http://ssbio.readthedocs.io/en/latest/. Interactive notebooks can be launched using Binder at https://mybinder.org/v2/gh/SBRG/ssbio/[email protected]

Supplementary Information: Supplementary data are available at Bioinformatics online.

Structural Systems Biology Evaluation of Metabolic Thermotolerance in Escherichia coli

Science ◽

10.1126/science.1234012 ◽

2013 ◽

Vol 340 (6137) ◽

pp. 1220-1223 ◽

Cited By ~ 80

Author(s):

Roger L. Chang ◽

Kathleen Andrews ◽

Donghyuk Kim ◽

Zhanwen Li ◽

Adam Godzik ◽

...

Keyword(s):

Escherichia Coli ◽

Systems Biology ◽

Structural Information ◽

Protein Structures ◽

Limiting Factors ◽

Scale Model ◽

Structural Systems ◽

A Genome ◽

Structural Systems Biology ◽

Genome Scale

Genome-scale network reconstruction has enabled predictive modeling of metabolism for many systems. Traditionally, protein structural information has not been represented in such reconstructions. Expansion of a genome-scale model of Escherichia coli metabolism by including experimental and predicted protein structures enabled the analysis of protein thermostability in a network context. This analysis allowed the prediction of protein activities that limit network function at superoptimal temperatures and mechanistic interpretations of mutations found in strains adapted to heat. Predicted growth-limiting factors for thermotolerance were validated through nutrient supplementation experiments and defined metabolic sensitivities to heat stress, providing evidence that metabolic enzyme thermostability is rate-limiting at superoptimal temperatures. Inclusion of structural information expanded the content and predictive capability of genome-scale metabolic networks that enable structural systems biology of metabolism.

Streamlined use of protein structures in variant analysis

10.1101/2021.09.10.459756 ◽

2021 ◽

Author(s):

Sandeep Kaur ◽

Neblina Sikta ◽

Andrea Schafferhans ◽

Nicola Bordin ◽

Mark J. Cowley ◽

...

Keyword(s):

Protein Function ◽

Molecular Mechanisms ◽

Structural Information ◽

Protein Structures ◽

Structural Data ◽

Supplementary Information ◽

3D Structures ◽

Link Type ◽

Variant Analysis ◽

Many Sources

Abstract

Motivation: Variant analysis is a core task in bioinformatics that requires integrating data from many sources. This process can be helped by using 3D structures of proteins, which can provide a spatial context that can provide insight into how variants affect function. Many available tools can help with mapping variants onto structures; but each has specific restrictions, with the result that many researchers fail to benefit from valuable insights that could be gained from structural data.

Results: To address this, we have created a streamlined system for incorporating 3D structures into variant analysis. Variants can be easily specified via URLs that are easily readable and writable, and use the notation recommended by the Human Genome Variation Society (HGVS). For example, ‘https://aquaria.app/SARS-CoV-2/S/?N501Y’ specifies the N501Y variant of SARS-CoV-2 S protein. In addition to mapping variants onto structures, our system provides summary information from multiple external resources, including COSMIC, CATH-FunVar, and PredictProtein. Furthermore, our system identifies and summarizes structures containing the variant, as well as the variant-position. Our system supports essentially any mutation for any well-studied protein, and uses all available structural data, including models inferred via very remote homology, integrated into a system that is fast and simple to use. By giving researchers easy, streamlined access to a wealth of structural information during variant analysis, our system will help in revealing novel insights into the molecular mechanisms underlying protein function in health and disease.

Availability: Our resource is freely available at the project home page (https://aquaria.app). After peer review, the code will be openly available via a GPL version 2 license at https://github.com/ODonoghueLab/Aquaria. PSSH2, the database of sequence-to-structure alignments, is also freely available for download at https://zenodo.org/record/[email protected]

Supplementary information: None.
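The URL convention described in this abstract can be illustrated with a minimal sketch. It assumes the pattern visible in the single example given (organism and protein as path segments, the variant as a bare query string); the function name `aquaria_variant_url` is hypothetical, and other URL forms the service may accept are not covered.

```python
from urllib.parse import quote

AQUARIA_BASE = "https://aquaria.app"  # project home page given in the abstract


def aquaria_variant_url(organism: str, protein: str, variant: str) -> str:
    """Build a variant URL shaped like the abstract's single example.

    Assumption: the path is /<organism>/<protein>/ and the variant is the
    bare query string. This is inferred from one example, not from an API
    specification.
    """
    # Percent-encode each piece so unusual names cannot break the URL.
    path = "/".join(quote(part, safe="") for part in (organism, protein))
    return f"{AQUARIA_BASE}/{path}/?{quote(variant, safe='')}"


# Reproduces the example from the abstract.
url = aquaria_variant_url("SARS-CoV-2", "S", "N501Y")
print(url)
```

Keeping the variant in plain residue-substitution notation is what makes such links easy to read and to write by hand, which is the property the abstract emphasises.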

Protein Structure Determination in Living Cells

International Journal of Molecular Sciences ◽

10.3390/ijms20102442 ◽

2019 ◽

Vol 20 (10) ◽

pp. 2442 ◽

Cited By ~ 2

Author(s):

Teppei Ikeya ◽

Peter Güntert ◽

Yutaka Ito

Keyword(s):

Protein Structure ◽

Structure Determination ◽

Structure Prediction ◽

Structural Information ◽

Nuclear Overhauser Effect ◽

Protein Structures ◽

Three Dimensional ◽

Structural Data ◽

Sample Tube ◽

In Cells

To date, in-cell NMR has elucidated various aspects of protein behaviour by associating structures in physiological conditions. Meanwhile, current studies of this method mostly have deduced protein states in cells exclusively based on ‘indirect’ structural information from peak patterns and chemical shift changes but not ‘direct’ data explicitly including interatomic distances and angles. To fully understand the functions and physical properties of proteins inside cells, it is indispensable to obtain explicit structural data or determine three-dimensional (3D) structures of proteins in cells. Whilst the short lifetime of cells in a sample tube, low sample concentrations, and massive background signals make it difficult to observe NMR signals from proteins inside cells, several methodological advances help to overcome the problems. Paramagnetic effects have an outstanding potential for in-cell structural analysis. The combination of a limited amount of experimental in-cell data with software for ab initio protein structure prediction opens an avenue to visualise 3D protein structures inside cells. Conventional nuclear Overhauser effect spectroscopy (NOESY)-based structure determination is advantageous to elucidate the conformations of side-chain atoms of proteins as well as global structures. In this article, we review current progress for the structure analysis of proteins in living systems and discuss the feasibility of its future works.

VarMap: a web tool for mapping genomic coordinates to protein sequence and structure and retrieving protein structural annotations

Bioinformatics ◽

10.1093/bioinformatics/btz482 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4854-4856 ◽

Cited By ~ 8

Author(s):

James D Stephenson ◽

Roman A Laskowski ◽

Andrew Nightingale ◽

Matthew E Hurles ◽

Janet M Thornton

Keyword(s):

Protein Sequence ◽

Structural Information ◽

Protein Structures ◽

Supplementary Information ◽

Supplementary Data ◽

Web Tool ◽

Genomic Variants ◽

Structural Context ◽

Pathogenic Variants ◽

Transcript Evidence

Abstract

Motivation: Understanding the protein structural context and patterning on proteins of genomic variants can help to separate benign from pathogenic variants and reveal molecular consequences. However, mapping genomic coordinates to protein structures is non-trivial, complicated by alternative splicing and transcript evidence.

Results: Here we present VarMap, a web tool for mapping a list of chromosome coordinates to canonical UniProt sequences and associated protein 3D structures, including validation checks, and annotating them with structural information.

Availability and implementation: https://www.ebi.ac.uk/thornton-srv/databases/VarMap.

Supplementary information: Supplementary data are available at Bioinformatics online.
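To show why the coordinate mapping depends on transcript structure, here is a simplified sketch of the core arithmetic for one forward-strand transcript. It is not VarMap's method, which the abstract does not describe; the function `genomic_to_residue` and the exon coordinates are hypothetical, and reverse-strand genes, UTRs and multiple transcripts are deliberately ignored.

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # 1-based inclusive (start, end) of a coding segment


def genomic_to_residue(pos: int, cds_exons: List[Exon]) -> Optional[Tuple[int, int]]:
    """Map a forward-strand genomic position to (residue number, codon offset).

    Returns None when the position falls outside every coding segment,
    for example in an intron. Residue numbers are 1-based; the codon
    offset is 0, 1 or 2.
    """
    consumed = 0  # coding bases in the segments before the current one
    for start, end in sorted(cds_exons):
        if start <= pos <= end:
            cds_index = consumed + (pos - start)  # 0-based index into the CDS
            return cds_index // 3 + 1, cds_index % 3
        consumed += end - start + 1
    return None


# Hypothetical transcript with two coding segments of 6 and 9 bases.
exons = [(100, 105), (200, 208)]
first = genomic_to_residue(100, exons)
second_exon = genomic_to_residue(201, exons)
intron = genomic_to_residue(150, exons)
```

Because the residue number depends on how many coding bases precede the position, a different splice form of the same gene can place the same chromosome coordinate at a different residue, or at none.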

SphereCon—a method for precise estimation of residue relative solvent accessible area from limited structural information

Bioinformatics ◽

10.1093/bioinformatics/btaa159 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3372-3378

Author(s):

Alexander Gress ◽

Olga V Kalinina

Keyword(s):

Protein Function ◽

Structural Information ◽

Solvent Accessibility ◽

Three Dimensional ◽

Structural Data ◽

Supplementary Information ◽

Dimensional Structure ◽

Relative Solvent Accessibility ◽

Precise Measure ◽

The Impact

Abstract

Motivation: In proteins, solvent accessibility of individual residues is a factor contributing to their importance for protein function and stability. Hence one might wish to calculate solvent accessibility in order to predict the impact of mutations, their pathogenicity and for other biomedical applications. A direct computation of solvent accessibility is only possible if all atoms of a protein three-dimensional structure are reliably resolved.

Results: We present SphereCon, a new precise measure that can estimate residue relative solvent accessibility (RSA) from limited data. The measure is based on calculating the volume of intersection of a sphere with a cone cut out in the direction opposite of the residue with surrounding atoms. We propose a method for estimating the position and volume of residue atoms in cases when they are not known from the structure, or when the structural data are unreliable or missing. We show that in cases of reliable input structures, SphereCon correlates almost perfectly with the directly computed RSA, and outperforms other previously suggested indirect methods. Moreover, SphereCon is the only measure that yields accurate results when the identities of amino acids are unknown. A significant novel feature of SphereCon is that it can estimate RSA from inter-residue distance and contact matrices, without any information about the actual atom coordinates.

Availability and implementation: https://github.com/kalininalab/spherecon.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
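Two quantities mentioned in this abstract can be sketched numerically. The first function gives the volume shared by a sphere and a cone under the simplifying assumption that the cone's apex sits at the sphere's centre (a spherical sector); the abstract does not state SphereCon's exact geometry, so this is illustrative only. The second is the standard direct definition of RSA as accessible surface area divided by a reference maximum, which the caller must supply. Both function names are hypothetical.

```python
import math


def spherical_sector_volume(radius: float, half_angle: float) -> float:
    """Volume shared by a sphere and a cone whose apex is the sphere's centre.

    half_angle is the cone's half-opening angle in radians (0 to pi).
    Assumption: apex at the centre; SphereCon's own construction may differ.
    """
    if radius < 0 or not 0.0 <= half_angle <= math.pi:
        raise ValueError("radius must be >= 0 and half_angle within [0, pi]")
    return (2.0 / 3.0) * math.pi * radius ** 3 * (1.0 - math.cos(half_angle))


def relative_solvent_accessibility(asa: float, max_asa: float) -> float:
    """Direct RSA: observed accessible area over a reference maximum."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return asa / max_asa


# Sanity checks on a unit sphere: a right-angle cone covers half of it,
# and a half-angle of pi covers all of it.
hemisphere = spherical_sector_volume(1.0, math.pi / 2)
full_sphere = spherical_sector_volume(1.0, math.pi)
```

The sector occupies a fraction (1 - cos θ)/2 of the sphere, so a volume-based measure of this kind varies smoothly between 0 and 1 as the cone opens.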

Expanding the uses of genome‐scale models with protein structures

Molecular Systems Biology ◽

10.15252/msb.20188601 ◽

2019 ◽

Vol 15 (11) ◽

Cited By ~ 1

Author(s):

Nathan Mih ◽

Bernhard O Palsson

Keyword(s):

Protein Structures ◽

Scale Models ◽

Genome Scale

FALCONET: an R package to accelerate automatic visualisation of genome scale metabolic models

10.1101/662056 ◽

2019 ◽

Author(s):

Hongzhong Lu ◽

Zhengming Zhu ◽

Eduard J Kerkhoven ◽

Jens Nielsen

Keyword(s):

Metabolic Networks ◽

R Package ◽

Network Size ◽

Research Community ◽

Supplementary Information ◽

Large Network ◽

Strain Design ◽

Scale Models ◽

Genome Scale ◽

Integrative Omics

Abstract

Summary: FALCONET (FAst visuaLisation of COmputational NETworks) enables the automatic formation and visualisation of metabolic maps from genome-scale models with R and CellDesigner, readily facilitating the visualisation of multi-layers omics datasets in the context of metabolic networks.

Motivation: Until now, numerous GEMs have been reconstructed and used as scaffolds to conduct integrative omics analysis and in silico strain design. Due to the large network size of GEMs, it is challenging to produce and visualize these networks as metabolic maps for further in-depth analyses.

Results: Here, we presented the R package - FALCONET, which facilitates drawing and visualizing metabolic maps in an automatic manner. This package will benefit the research community by allowing a wider use of GEMs in systems biology.

Availability and implementation: FALCONET is available on https://github.com/SysBioChalmers/FALCONET and released under the MIT [email protected]

Supplementary information: Supplementary data are available online.

Analysis of several key factors influencing deep learning-based inter-residue contact prediction

Bioinformatics ◽

10.1093/bioinformatics/btz679 ◽

2019 ◽

Cited By ~ 1

Author(s):

Tianqi Wu ◽

Jie Hou ◽

Badri Adhikari ◽

Jianlin Cheng

Keyword(s):

Deep Learning ◽

Structural Information ◽

Protein Structures ◽

Supplementary Information ◽

Prediction Methods ◽

Key Factors ◽

Sequence Alignments ◽

Multiple Sequence ◽

Contact Prediction ◽

Ab Initio Approach

Abstract

Motivation: Deep learning has become the dominant technology for protein contact prediction. However, the factors that affect the performance of deep learning in contact prediction have not been systematically investigated.

Results: We analyzed the results of our three deep learning-based contact prediction methods (MULTICOM-CLUSTER, MULTICOM-CONSTRUCT and MULTICOM-NOVEL) in the CASP13 experiment and identified several key factors [i.e. deep learning technique, multiple sequence alignment (MSA), distance distribution prediction and domain-based contact integration] that influenced the contact prediction accuracy. We compared our convolutional neural network (CNN)-based contact prediction methods with three coevolution-based methods on 75 CASP13 targets consisting of 108 domains. We demonstrated that the CNN-based multi-distance approach was able to leverage global coevolutionary coupling patterns comprised of multiple correlated contacts for more accurate contact prediction than the local coevolution-based methods, leading to a substantial increase of precision by 19.2 percentage points. We also tested different alignment methods and domain-based contact prediction with the deep learning contact predictors. The comparison of the three methods showed deeper sequence alignments and the integration of domain-based contact prediction with the full-length contact prediction improved the performance of contact prediction. Moreover, we demonstrated that the domain-based contact prediction based on a novel ab initio approach of parsing domains from MSAs alone without using known protein structures was a simple, fast approach to improve contact prediction. Finally, we showed that predicting the distribution of inter-residue distances in multiple distance intervals could capture more structural information and improve binary contact prediction.

Availability and implementation: https://github.com/multicom-toolbox/DNCON2/.

Supplementary information: Supplementary data are available at Bioinformatics online.
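The link between a predicted distance distribution and a binary contact call can be sketched as follows. This is a generic illustration, not the MULTICOM implementation: the bin edges and probabilities are made up, the function name is hypothetical, and the 8 Å cutoff is the commonly used contact definition, assumed here rather than taken from the abstract.

```python
from typing import Sequence

CONTACT_THRESHOLD = 8.0  # angstroms; a widely used cutoff, assumed here


def contact_probability(
    bin_edges: Sequence[float],
    bin_probs: Sequence[float],
    threshold: float = CONTACT_THRESHOLD,
) -> float:
    """Collapse a binned distance distribution into one contact probability.

    bin_edges has one more entry than bin_probs; bin i spans
    bin_edges[i] to bin_edges[i + 1]. Only bins lying entirely at or
    below the threshold count towards a contact.
    """
    if len(bin_edges) != len(bin_probs) + 1:
        raise ValueError("need exactly one more edge than probabilities")
    return sum(p for upper, p in zip(bin_edges[1:], bin_probs) if upper <= threshold)


# Hypothetical prediction for one residue pair over five distance intervals.
edges = [0.0, 4.0, 6.0, 8.0, 10.0, 15.0]
probs = [0.1, 0.2, 0.4, 0.2, 0.1]
p_contact = contact_probability(edges, probs)
is_contact = p_contact >= 0.5  # binary call at an illustrative 0.5 cutoff
```

The binned prediction retains how the probability mass is spread across distances, information that a single contact/no-contact label discards.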

Dynamical important residue network (DIRN): network inference via conformational change

Bioinformatics ◽

10.1093/bioinformatics/btz298 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4664-4670 ◽

Cited By ~ 2

Author(s):

Quan Li ◽

Ray Luo ◽

Hai-Feng Chen

Keyword(s):

Network Inference ◽

Protein Structures ◽

Interaction Network ◽

Structural Data ◽

Supplementary Information ◽

Residue Interaction ◽

Protein Functions ◽

Important Residue ◽

Dynamics Simulations ◽

Dynamical Information

Abstract

Motivation: Protein residue interaction network has emerged as a useful strategy to understand the complex relationship between protein structures and functions and how functions are regulated. In a residue interaction network, every residue is used to define a network node, adding noises in network post-analysis and increasing computational burden. In addition, dynamical information is often necessary in deciphering biological functions.

Results: We developed a robust and efficient protein residue interaction network method, termed dynamical important residue network, by combining both structural and dynamical information. A major departure from previous approaches is our attempt to identify important residues most important for functional regulation before a network is constructed, leading to a much simpler network with the important residues as its nodes. The important residues are identified by monitoring structural data from ensemble molecular dynamics simulations of proteins in different functional states. Our tests show that the new method performs well with overall higher sensitivity than existing approaches in identifying important residues and interactions in tested proteins, so it can be used in studies of protein functions to provide useful hypotheses in identifying key residues and interactions.

Supplementary information: Supplementary data are available at Bioinformatics online.
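The contrast between a conventional residue interaction network and one restricted to pre-selected residues can be sketched with a simple distance-cutoff graph. This is not the DIRN algorithm: the abstract does not specify how important residues are chosen, so the selection is passed in as a hypothetical `keep` argument, and the coordinates and the 8 Å cutoff are illustrative assumptions.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, Optional, Set, Tuple

Coord = Tuple[float, float, float]


def residue_network(
    coords: Dict[int, Coord],
    cutoff: float = 8.0,
    keep: Optional[Iterable[int]] = None,
) -> Dict[int, Set[int]]:
    """Build an undirected contact graph over residues.

    coords maps residue number to one representative atom position.
    When keep is given, only those residues become nodes, mimicking the
    idea of selecting residues before the network is constructed.
    """
    wanted = None if keep is None else set(keep)
    nodes = sorted(r for r in coords if wanted is None or r in wanted)
    adjacency: Dict[int, Set[int]] = {r: set() for r in nodes}
    for a, b in combinations(nodes, 2):
        if math.dist(coords[a], coords[b]) <= cutoff:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Hypothetical positions along one axis, in angstroms.
positions = {1: (0.0, 0.0, 0.0), 2: (5.0, 0.0, 0.0), 3: (10.0, 0.0, 0.0), 4: (30.0, 0.0, 0.0)}
full = residue_network(positions)
reduced = residue_network(positions, keep=[1, 2])
```

Restricting the node set up front shrinks both the graph and the pairwise distance work, which is the computational benefit the abstract attributes to choosing residues first.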
