Dissecting the stability determinants of a challenging de novo protein fold using massively parallel design and experimentation

Designing entirely new protein structures remains challenging because we do not fully understand the biophysical determinants of folding stability. Yet some protein folds are easier to design than others. Previous work identified the 43-residue αββα fold as especially challenging: the best designs had only a 2% success rate, compared with 39-87% success for other simple folds (1). This suggested the αββα fold would be a useful model system for gaining a deeper understanding of folding stability determinants and for testing new protein design methods. Here, we designed over ten thousand new αββα proteins and, using a high-throughput protease-based assay, found that over three thousand of them fold into stable structures. Nuclear magnetic resonance, hydrogen-deuterium exchange, circular dichroism, deep mutational scanning, and scrambled sequence control experiments indicated that our stable designs fold into their designed αββα structures with exceptional stability for their small size. Our large dataset enabled us to quantify the influence of universal stability determinants including nonpolar burial, helix capping, and buried unsatisfied polar atoms, as well as stability determinants unique to the αββα topology. Our work demonstrates how large-scale design and test cycles can solve challenging design problems while illuminating the biophysical determinants of folding.
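The abstract above mentions scrambled sequence controls without describing how they were generated. As a minimal sketch, assuming a control is simply a random permutation of a design's residues (so amino acid composition is preserved while residue order is destroyed), one could write the following. The function name, the seed, and the toy sequence are hypothetical and not taken from the study; the authors' actual procedure may differ, for example by preserving a polar/nonpolar pattern.

```python
import random


def scramble_sequence(sequence, seed=0):
    """Return a composition-preserving random permutation of `sequence`.

    Assumption: a "scrambled control" is a plain shuffle of the residues.
    The seed makes the shuffle reproducible.
    """
    residues = list(sequence)
    random.Random(seed).shuffle(residues)
    return "".join(residues)


# Hypothetical toy sequence, not one of the designs from the study.
toy_design = "MKEELLKKAEELGVKVEVRGDTIYIEA"
toy_control = scramble_sequence(toy_design, seed=42)
```

The control has the same length and residue counts as the design, so any difference in measured stability can be attributed to residue order rather than composition.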


Multi-Scale Structural Analysis of Proteins by Deep Semantic Segmentation

10.1101/474627 ◽

2018 ◽

Author(s):

Raphael R. Eguchi ◽

Po-Ssu Huang

Keyword(s):

Image Classification ◽

Protein Design ◽

Large Scale ◽

De Novo ◽

Protein Structures ◽

Semantic Segmentation ◽

Amino Acid Sequences ◽

Structural Quality ◽

Small Subset ◽

Structural Prediction

Abstract: Recent advancements in computational methods have facilitated large-scale sampling of protein structures, leading to breakthroughs in protein structural prediction and enabling de novo protein design. Establishing methods to identify candidate structures that can lead to native folds or designable structures remains a challenge, since few existing metrics capture high-level structural features such as architectures, folds, and conformity to conserved structural motifs. Convolutional Neural Networks (CNNs) have been successfully used in semantic segmentation, a subfield of image classification in which a class label is predicted for every pixel. Here, we apply semantic segmentation to protein structures as a novel strategy for fold identification and structural quality assessment. We represent protein structures as 2D α-carbon distance matrices (“contact maps”), and train a CNN that assigns each residue in a multi-domain protein to one of 38 architecture classes designated by the CATH database. Our model performs exceptionally well, achieving a per-residue accuracy of 90.8% on the test set (95.0% average accuracy over all classes; 87.8% average within-structure accuracy). The unique aspect of our classifier is that it encodes sequence-agnostic residue environments from the PDB and can assess structural quality as quantitative probabilities. We demonstrate that individual class probabilities can be used as a metric that indicates the degree to which a randomly generated structure assumes a specific fold, as well as a metric that highlights non-conformative regions of a protein belonging to a known class. These capabilities yield a powerful tool for guiding structural sampling for both structural prediction and design.

Significance: Recent computational advances have allowed researchers to predict the structure of many proteins from their amino acid sequences, and to design new sequences that fold into predefined structures. However, these tasks are often challenging because they require selection of a small subset of promising structural models from a large pool of stochastically generated ones. Here, we describe a novel approach to protein model selection that uses 2D image classification techniques to evaluate 3D protein models. Our method can be used to select structures based on the fold that they adopt, and can also be used to identify regions of low structural quality. These capabilities yield a powerful tool for both protein design and structure prediction.
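The input representation described above, a 2D α-carbon distance matrix, can be illustrated with a short sketch. This is not the authors' code: the function name and the toy coordinates are hypothetical, and any preprocessing the paper may apply (padding, cropping, normalization) is not shown. The sketch only assumes that entry (i, j) holds the Euclidean distance between the α-carbons of residues i and j.

```python
import math


def ca_distance_matrix(coords):
    """Build the symmetric pairwise-distance matrix for a list of (x, y, z)
    alpha-carbon coordinates. Entry [i][j] is the Euclidean distance between
    residues i and j; the diagonal is zero."""
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            matrix[i][j] = d
            matrix[j][i] = d  # distances are symmetric
    return matrix


# Hypothetical toy coordinates (in angstroms), not taken from any real structure.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
toy_matrix = ca_distance_matrix(toy_coords)
```

Because the matrix depends only on backbone geometry, it carries no amino acid identities, which is consistent with the abstract's description of the classifier as sequence-agnostic.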


Multi-scale structural analysis of proteins by deep semantic segmentation

Bioinformatics ◽

10.1093/bioinformatics/btz650 ◽

2019 ◽

Vol 36 (6) ◽

pp. 1740-1749 ◽

Cited By ~ 2

Author(s):

Raphael R Eguchi ◽

Po-Ssu Huang

Keyword(s):

Protein Design ◽

Large Scale ◽

De Novo ◽

Protein Structures ◽

Semantic Segmentation ◽

Structural Features ◽

Supplementary Information ◽

Structural Prediction ◽

Structure Accuracy ◽

High Level

Abstract

Motivation: Recent advances in computational methods have facilitated large-scale sampling of protein structures, leading to breakthroughs in protein structural prediction and enabling de novo protein design. Establishing methods to identify candidate structures that can lead to native folds or designable structures remains a challenge, since few existing metrics capture high-level structural features such as architectures, folds and conformity to conserved structural motifs. Convolutional Neural Networks (CNNs) have been successfully used in semantic segmentation, a subfield of image classification in which a class label is predicted for every pixel. Here, we apply semantic segmentation to protein structures as a novel strategy for fold identification and structure quality assessment.

Results: We train a CNN that assigns each residue in a multi-domain protein to one of 38 architecture classes designated by the CATH database. Our model achieves a high per-residue accuracy of 90.8% on the test set (95.0% average per-class accuracy; 87.8% average per-structure accuracy). We demonstrate that individual class probabilities can be used as a metric that indicates the degree to which a randomly generated structure assumes a specific fold, as well as a metric that highlights non-conformative regions of a protein belonging to a known class. These capabilities yield a powerful tool for guiding structural sampling for both structural prediction and design.

Availability and implementation: The trained classifier network, parser network, and entropy calculation scripts are available for download at https://git.io/fp6bd, with detailed usage instructions provided at the download page. A step-by-step tutorial for setup is provided at https://goo.gl/e8GB2S. All Rosetta commands, RosettaRemodel blueprints, and predictions for all datasets used in the study are available in the Supplementary Information.

Supplementary information: Supplementary data are available at Bioinformatics online.
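The abstract reports three different accuracy figures (per-residue, average per-class, average per-structure). The sketch below shows one plausible way such figures differ, under assumptions that are not confirmed by the abstract: per-residue accuracy pools all residues, per-class accuracy averages the fraction of correctly labelled residues within each true class, and per-structure accuracy averages the fraction correct within each structure. The function name and toy labels are hypothetical, and the paper may define these metrics differently.

```python
from collections import defaultdict


def segmentation_accuracies(structures):
    """Compute (per_residue, mean_per_class, mean_per_structure) accuracy.

    `structures` is a list of (true_labels, predicted_labels) pairs, one pair
    per structure, each holding one label per residue. Every structure is
    assumed to be non-empty with equal-length label lists.
    """
    total = correct = 0
    class_total = defaultdict(int)
    class_correct = defaultdict(int)
    structure_scores = []
    for true_labels, predicted_labels in structures:
        hits = 0
        for t, p in zip(true_labels, predicted_labels):
            hit = int(t == p)
            hits += hit
            class_total[t] += 1
            class_correct[t] += hit
        total += len(true_labels)
        correct += hits
        structure_scores.append(hits / len(true_labels))
    per_residue = correct / total
    per_class = sum(class_correct[c] / class_total[c] for c in class_total) / len(class_total)
    per_structure = sum(structure_scores) / len(structure_scores)
    return per_residue, per_class, per_structure


# Hypothetical toy labels for two structures and two classes ("a" and "b").
toy_structures = [
    (["a", "a", "b", "b"], ["a", "a", "b", "a"]),
    (["b", "b"], ["b", "b"]),
]
toy_scores = segmentation_accuracies(toy_structures)
```

The three numbers diverge whenever classes or structures differ in size, because each average weights the same residue-level predictions differently.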


Rational thermostabilisation of four-helix bundle dimeric de novo proteins

Scientific Reports ◽

10.1038/s41598-021-86952-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Shin Irumagawa ◽

Kaito Kobayashi ◽

Yutaka Saito ◽

Takeshi Miyata ◽

Mitsuo Umetsu ◽

...

Keyword(s):

Protein Design ◽

De Novo ◽

Protein Complexes ◽

Salt Bridge ◽

Dynamics Simulation ◽

Saturation Mutagenesis ◽

Helix Bundle ◽

Stable Mutant ◽

Four Helix Bundle ◽

The Stability

Abstract: The stability of proteins is an important factor for industrial and medical applications. Improving protein stability is one of the main subjects in protein engineering. In a previous study, we improved the stability of a four-helix bundle dimeric de novo protein (WA20) by five mutations. The stabilised mutant (H26L/G28S/N34L/V71L/E78L, SUWA) showed an extremely high denaturation midpoint temperature (Tm). Although SUWA is remarkably hyperstable, rationally exploring even more stable mutants remains an attractive challenge in protein design and engineering. In this study, we predicted stabilising mutations of WA20 by in silico saturation mutagenesis and molecular dynamics simulation, and experimentally confirmed three stabilising mutations of WA20 (N22A, N22E, and H86K). The stability of a double mutant (N22A/H86K, rationally optimised WA20, ROWA) was greatly improved compared with WA20 (ΔTm = 10.6 °C). The model structures suggested that N22A enhances the stability of the α-helices and that N22E and H86K contribute to salt-bridge formation for protein stabilisation. These mutations were also added to SUWA and improved its Tm. Remarkably, the most stable mutant of SUWA (N22E/H86K, rationally optimised SUWA, ROSA) showed the highest Tm (129.0 °C). These new thermostable mutants will be useful as a component of protein nanobuilding blocks to construct supramolecular protein complexes.
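The mutant labels in this abstract (for example N22A/H86K) follow the common convention of wild-type residue, 1-based position, mutant residue, with multiple substitutions joined by slashes. As a small illustration of how such labels can be handled programmatically, the sketch below parses a label and applies it to a sequence. The function names and the toy sequence are hypothetical; the WA20 sequence itself is not given here and is not reproduced.

```python
import re

# One substitution: wild-type letter, 1-based position, mutant letter.
_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutations(label):
    """Parse a label such as 'N22A/H86K' into (wild_type, position, mutant) tuples."""
    mutations = []
    for token in label.split("/"):
        match = _SUBSTITUTION.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"unrecognised mutation token: {token!r}")
        wild_type, position, mutant = match.groups()
        mutations.append((wild_type, int(position), mutant))
    return mutations


def apply_mutations(sequence, mutations):
    """Return `sequence` with each substitution applied, checking the wild-type residue."""
    residues = list(sequence)
    for wild_type, position, mutant in mutations:
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = mutant
    return "".join(residues)


# Hypothetical four-residue toy sequence, not WA20.
toy_mutant = apply_mutations("MNAH", parse_mutations("N2A/H4K"))
```

Checking the wild-type residue before substituting catches off-by-one numbering errors, which are a frequent source of mistakes when mutation labels are applied to a sequence with a different start index.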


Protein designer David Baker: I like doing things that seem like magic

National Science Review ◽

10.1093/nsr/nwaa071 ◽

2020 ◽

Vol 7 (8) ◽

pp. 1410-1412

Author(s):

Weijie Zhao ◽

Chu Wang

Keyword(s):

Protein Design ◽

De Novo ◽

Protein Structures ◽

Computational Prediction ◽

Biological Functions ◽

Personal Experiences ◽

De Novo Protein Design ◽

And Function ◽

The University ◽

Opening Up

Abstract: Search ‘de novo protein design’ on Google and you will find the name David Baker in every result on the first page. Professor David Baker at the University of Washington and other scientists are opening up a new world of fantastic proteins. Proteins are the direct executors of most biological functions, and their structure and function are fully determined by their primary sequence. Baker's group developed the Rosetta software suite, which enabled the computational prediction and design of protein structures. Being able to design proteins from scratch means being able to design executors for diverse purposes and to benefit society in multiple ways. Recently, NSR interviewed Prof. Baker on this fast-developing field and his personal experiences.


2105 Optimizing the Decision Making Process for Large-Scale Design Problems According to the Interrelationships among Criteria

The Proceedings of Design & Systems Conference ◽

10.1299/jsmedsd.2001.11.94 ◽

2001 ◽

Vol 2001.11 (0) ◽

pp. 94-97

Author(s):

Yoshihisa FUJIMI ◽

Masataka YOSHIMURA ◽

Kazuhiro IZUI

Keyword(s):

Decision Making ◽

Large Scale ◽

Decision Making Process ◽

Design Problems ◽

Scale Design


De novo protein design: how do we expand into the universe of possible protein structures?

Current Opinion in Structural Biology ◽

10.1016/j.sbi.2015.05.009 ◽

2015 ◽

Vol 33 ◽

pp. 16-26 ◽

Cited By ~ 110

Author(s):

Derek N Woolfson ◽

Gail J Bartlett ◽

Antony J Burton ◽

Jack W Heal ◽

Ai Niitsu ◽

...

Keyword(s):

Protein Design ◽

De Novo ◽

Protein Structures ◽

De Novo Protein Design ◽

The Universe


Computational protein design with backbone plasticity

Biochemical Society Transactions ◽

10.1042/bst20160155 ◽

2016 ◽

Vol 44 (5) ◽

pp. 1523-1529 ◽

Cited By ~ 13

Author(s):

James T. MacDonald ◽

Paul S. Freemont

Keyword(s):

Protein Design ◽

De Novo ◽

Protein Structures ◽

Search Space ◽

Computational Protein Design ◽

Artificial Enzymes ◽

Backbone Flexibility ◽

Artificial Proteins ◽

Naturally Occurring ◽

Backbone Structure

The computational algorithms used in the design of artificial proteins have become increasingly sophisticated in recent years, producing a series of remarkable successes. The most dramatic of these is the de novo design of artificial enzymes. The majority of these designs have reused naturally occurring protein structures as ‘scaffolds’ onto which novel functionality can be grafted without having to redesign the backbone structure. The incorporation of backbone flexibility into protein design is a much more computationally challenging problem due to the greatly increased search space, but promises to remove the limitations of reusing natural protein scaffolds. In this review, we outline the principles of computational protein design methods and discuss recent efforts to consider backbone plasticity in the design process.


BRANEart: identify stability strength and weakness regions in membrane proteins

10.1101/2021.08.22.457277 ◽

2021 ◽

Author(s):

Sankar Basu ◽

Simon S. Assaf ◽

Fabian Teheux ◽

Marianne Rooman ◽

Fabrizio Pucci

Keyword(s):

Membrane Proteins ◽

Membrane Protein ◽

Conformational Changes ◽

Large Scale ◽

Protein Structures ◽

Accurate Method ◽

Globular Proteins ◽

Stability Properties ◽

Overall Stability ◽

The Stability

Abstract: Understanding the role of stability strengths and weaknesses in proteins is a key objective for rationalizing their dynamical and functional properties such as conformational changes, catalytic activity, and protein-protein and protein-ligand interactions. We present BRANEart, a new, fast and accurate method to evaluate the per-residue contributions to the overall stability of membrane proteins. It is based on an extended set of recently introduced statistical potentials derived from membrane protein structures, which better describe the stability properties of this class of proteins than standard potentials derived from globular proteins. We defined a per-residue membrane propensity index from combinations of these potentials, which can be used to identify residues which strongly contribute to the stability of the transmembrane region or which would, on the contrary, be more stable in extramembrane regions, or vice versa. Large-scale application to membrane and globular protein sets and application to test cases show excellent agreement with experimental data. BRANEart thus appears as a useful instrument to analyze in detail the overall stability properties of a target membrane protein, to position it relative to the lipid bilayer, and to rationally modify its biophysical characteristics and function. BRANEart can be freely accessed from http://babylone.3bio.ulb.ac.be/BRANEart.


Evolutionary-Based Hybrid Optimizer Applicable to Large-Scale Design Problems

Journal of Computational Science and Technology ◽

10.1299/jcst.7.28 ◽

2013 ◽

Vol 7 (1) ◽

pp. 28-37 ◽

Cited By ~ 1

Author(s):

Kazuhisa CHIBA

Keyword(s):

Large Scale ◽

Design Problems ◽

Scale Design


Computational design of closely related proteins that adopt two well-defined but structurally divergent folds

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1914808117 ◽

2020 ◽

Vol 117 (13) ◽

pp. 7208-7215 ◽

Cited By ~ 5

Author(s):

Kathy Y. Wei ◽

Danai Moschidi ◽

Matthew J. Bick ◽

Santrupti Nerli ◽

Andrew C. McShan ◽

...

Keyword(s):

Large Scale ◽

De Novo ◽

Conformational Transition ◽

Protein Structures ◽

Computational Design ◽

Spectroscopic Characterization ◽

De Novo Design ◽

Viral Fusion ◽

Naturally Occurring ◽

Related Proteins

The plasticity of naturally occurring protein structures, which can change shape considerably in response to changes in environmental conditions, is critical to biological function. While computational methods have been used for de novo design of proteins that fold to a single state with a deep free-energy minimum [P.-S. Huang, S. E. Boyken, D. Baker, Nature 537, 320–327 (2016)], and to reengineer natural proteins to alter their dynamics [J. A. Davey, A. M. Damry, N. K. Goto, R. A. Chica, Nat. Chem. Biol. 13, 1280–1285 (2017)] or fold [P. A. Alexander, Y. He, Y. Chen, J. Orban, P. N. Bryan, Proc. Natl. Acad. Sci. U.S.A. 106, 21149–21154 (2009)], the de novo design of closely related sequences which adopt well-defined but structurally divergent structures remains an outstanding challenge. We designed closely related sequences (over 94% identity) that can adopt two very different homotrimeric helical bundle conformations—one short (∼66 Å height) and the other long (∼100 Å height)—reminiscent of the conformational transition of viral fusion proteins. Crystallographic and NMR spectroscopic characterization shows that both the short- and long-state sequences fold as designed. We sought to design bistable sequences for which both states are accessible, and obtained a single designed protein sequence that populates either the short state or the long state depending on the measurement conditions. The design of sequences which are poised to adopt two very different conformations sets the stage for creating large-scale conformational switches between structurally divergent forms.
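The abstract quotes a sequence identity of over 94% between the designed sequences without stating how identity was computed. A minimal sketch, assuming two already-aligned sequences of equal length with no gaps and identity defined as the percentage of matching positions, is shown below. The function name and the toy sequences are hypothetical and are not the designed proteins.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length, pre-aligned
    sequences carry the same residue. Gaps and alignment are not handled."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical toy sequences differing at one of ten positions.
toy_identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
```

Under this definition, two sequences of 100 residues would need to differ at fewer than six positions to exceed 94% identity.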
