Extreme stability in de novo-designed repeat arrays is determined by unusually stable short-range interactions

Kathryn Geiger-Schuller; Kevin Sforza; Max Yuhas; Fabio Parmeggiani; David Baker; Doug Barrick

doi:10.1073/pnas.1800283115

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1800283115 ◽

2018 ◽

Vol 115 (29) ◽

pp. 7539-7544 ◽

Cited By ~ 12

Author(s):

Kathryn Geiger-Schuller ◽

Kevin Sforza ◽

Max Yuhas ◽

Fabio Parmeggiani ◽

David Baker ◽

...

Keyword(s):

Protein Design ◽

Nearest Neighbor ◽

De Novo ◽

Protein Structures ◽

Free Energies ◽

Intrinsic Stability ◽

Repeat Proteins ◽

Naturally Occurring ◽

Wide Range ◽

The Individual

Designed helical repeats (DHRs) are modular helix–loop–helix–loop protein structures that are tandemly repeated to form a superhelical array. Structures combining tandem DHRs demonstrate a wide range of molecular geometries, many of which are not observed in nature. Understanding cooperativity of DHR proteins provides insight into the molecular origins of Rosetta-based protein design hyperstability and facilitates comparison of energy distributions in artificial and naturally occurring protein folds. Here, we use a nearest-neighbor Ising model to quantify the intrinsic and interfacial free energies of four different DHRs. We measure the folding free energies of constructs with varying numbers of internal and terminal capping repeats for four different DHR folds, using guanidine-HCl and glycerol as destabilizing and solubilizing cosolvents. One-dimensional Ising analysis of these series reveals that, although interrepeat coupling energies are within the range seen for naturally occurring repeat proteins, the individual repeats of DHR proteins are intrinsically stable. This favorable intrinsic stability, which has not been observed for naturally occurring repeat proteins, adds to stabilizing interfaces, resulting in extraordinarily high stability. Stable repeats also impart a downhill shape to the energy landscape for DHR folding. These intrinsic stability differences suggest that part of the success of Rosetta-based design results from capturing favorable local interactions.
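
The nearest-neighbor Ising treatment mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simplified homopolymer in which every repeat shares one intrinsic folding free energy and one interfacial free energy; the capping repeats and denaturant dependence used in the study are omitted. The function names and the free-energy values are hypothetical, chosen only for illustration. The transfer-matrix recursion is checked against brute-force enumeration of all folded/unfolded configurations.

```python
import math
from itertools import product

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def repeat_weights(dg_intrinsic, dg_interface, temp_k=298.15):
    """Convert free energies (kcal/mol) into statistical weights.

    kappa weights a folded repeat; tau weights an interface between two
    adjacent folded repeats. Negative free energies give weights > 1.
    """
    rt = R_KCAL * temp_k
    return math.exp(-dg_intrinsic / rt), math.exp(-dg_interface / rt)


def partition_transfer(n, kappa, tau):
    """Partition function of an n-repeat array via a 2x2 transfer recursion."""
    # Track total weight of partial arrays ending in a folded or unfolded
    # repeat; the array starts next to a virtual unfolded repeat.
    folded, unfolded = 0.0, 1.0
    for _ in range(n):
        new_folded = folded * kappa * tau + unfolded * kappa
        new_unfolded = folded + unfolded
        folded, unfolded = new_folded, new_unfolded
    return folded + unfolded


def partition_brute_force(n, kappa, tau):
    """Same partition function by enumerating all 2**n configurations."""
    total = 0.0
    for states in product((0, 1), repeat=n):  # 1 = folded, 0 = unfolded
        n_folded = sum(states)
        n_interfaces = sum(a and b for a, b in zip(states, states[1:]))
        total += kappa ** n_folded * tau ** n_interfaces
    return total


def fraction_fully_folded(n, kappa, tau):
    """Population of the state in which all n repeats are folded."""
    return kappa ** n * tau ** (n - 1) / partition_transfer(n, kappa, tau)


if __name__ == "__main__":
    # Hypothetical values: a mildly stable repeat and a strongly stabilizing interface.
    kappa, tau = repeat_weights(dg_intrinsic=-0.5, dg_interface=-3.0)
    for n in (2, 4, 8):
        print(n, fraction_fully_folded(n, kappa, tau))
```

In this picture, a favorable intrinsic term (kappa greater than 1) means that even a single isolated repeat is mostly folded, which is the feature the abstract reports as distinguishing DHRs from naturally occurring repeat proteins.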

Computational protein design with backbone plasticity

Biochemical Society Transactions ◽

10.1042/bst20160155 ◽

2016 ◽

Vol 44 (5) ◽

pp. 1523-1529 ◽

Cited By ~ 13

Author(s):

James T. MacDonald ◽

Paul S. Freemont

Keyword(s):

Protein Design ◽

De Novo ◽

Protein Structures ◽

Search Space ◽

Computational Protein Design ◽

Artificial Enzymes ◽

Backbone Flexibility ◽

Artificial Proteins ◽

Naturally Occurring ◽

Backbone Structure

The computational algorithms used in the design of artificial proteins have become increasingly sophisticated in recent years, producing a series of remarkable successes. The most dramatic of these is the de novo design of artificial enzymes. The majority of these designs have reused naturally occurring protein structures as ‘scaffolds’ onto which novel functionality can be grafted without having to redesign the backbone structure. The incorporation of backbone flexibility into protein design is a much more computationally challenging problem due to the greatly increased search space, but promises to remove the limitations of reusing natural protein scaffolds. In this review, we outline the principles of computational protein design methods and discuss recent efforts to consider backbone plasticity in the design process.

Design of complicated all-α protein structures

10.1101/2021.07.14.449347 ◽

2021 ◽

Author(s):

Koya Sakuma ◽

Naohiro Kobayashi ◽

Toshihiko Sugiki ◽

Toshio Nagashima ◽

Toshimichi Fujiwara ◽

...

Keyword(s):

De Novo ◽

Protein Structures ◽

Building Blocks ◽

Helical Structures ◽

Helix Loop Helix ◽

Naturally Occurring ◽

Wide Range ◽

Helical Protein ◽

Helical Proteins ◽

The Universe

A wide range of de novo protein structure designs have been achieved, but the complexity of naturally occurring protein structures is still far beyond these designs. To expand the diversity and complexity of de novo designed protein structures, we sought to develop a method for designing 'difficult-to-describe' α-helical protein structures composed of irregularly aligned α-helices, such as globins. Backbone structure libraries consisting of a myriad of α-helical structures with five or six helices were generated by combining 18 helix-loop-helix motifs and canonical α-helices, and five distinct topologies were selected for de novo design. The designs were found to be monomeric with high thermal stability in solution and to fold into the target topologies with atomic accuracy. This study demonstrated that complicated α-helical proteins can be created using typical building blocks. The method we developed would enable us to explore the universe of protein structures for designing novel functional proteins.

A bottom-up approach for the de novo design of functional proteins

10.1101/2020.03.11.988071 ◽

2020 ◽

Cited By ~ 2

Author(s):

Che Yang ◽

Fabian Sesterhenn ◽

Jaume Bonet ◽

Eva van Aalen ◽

Leo Scheller ◽

...

Keyword(s):

Protein Design ◽

Mammalian Cells ◽

De Novo ◽

Protein Structures ◽

Functional Protein ◽

Regular Secondary Structure ◽

Bottom Up ◽

Binding Motifs ◽

Wide Range ◽

Novel Protein

De novo protein design has enabled the creation of novel protein structures. To design novel functional proteins, state-of-the-art approaches use natural proteins or first design protein scaffolds that subsequently serve as templates for the transplantation of functional motifs. In these approaches, the templates are function-agnostic and motifs have been limited to those with regular secondary structure. Here, we present a bottom-up approach to build de novo proteins tailored to structurally complex functional motifs. We applied a bottom-up strategy to design scaffolds for four different binding motifs, including one bi-functionalized protein with two motifs. The de novo proteins were functional as biosensors to quantify epitope-specific antibody responses and as orthogonal ligands to activate a signaling pathway in engineered mammalian cells. Altogether, we present a versatile strategy for the bottom-up design of functional proteins, applicable to a wide range of functional protein design challenges.

De novo protein design by deep network hallucination

10.1101/2020.07.22.211482 ◽

2020 ◽

Cited By ~ 2

Author(s):

Ivan Anishchenko ◽

Tamuka M. Chidyausiku ◽

Sergey Ovchinnikov ◽

Samuel J. Pellock ◽

David Baker

Keyword(s):

Amino Acid ◽

Protein Design ◽

Structure Prediction ◽

De Novo ◽

Protein Structures ◽

Monte Carlo Sampling ◽

Amino Acid Sequences ◽

Wide Range ◽

Physically Based ◽

Folded Proteins

There has been considerable recent progress in protein structure prediction using deep neural networks to infer distance constraints from amino acid residue co-evolution [1–3]. We investigated whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generated random amino acid sequences, and input them into the trRosetta structure prediction network to predict starting distance maps, which as expected are quite featureless. We then carried out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (KL-divergence) between the distance distributions predicted by the network and the background distribution. Optimization from different random starting points resulted in a wide range of proteins with diverse sequences and all-alpha, all-beta-sheet, and mixed alpha-beta structures. We obtained synthetic genes encoding 129 of these network-hallucinated sequences, expressed and purified the proteins in E. coli, and found that 27 folded to monomeric stable structures with circular dichroism spectra consistent with the hallucinated structures. Thus, deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute, alongside traditional physically based models, to the de novo design of proteins with new functions.
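
The sampling loop described in this abstract can be sketched roughly as follows. This is not the trRosetta pipeline: `toy_predict` is a hypothetical stand-in for the structure-prediction network, the uniform background distribution is an assumption, and the three-bin distance distributions and hydrophobic-pair rule exist only to make the example runnable. What the sketch does show is the general pattern of Metropolis Monte Carlo over sequence space with a KL-divergence objective.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
HYDROPHOBIC = set("AILMFVW")       # arbitrary grouping used by the toy predictor


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def toy_predict(seq):
    """Hypothetical stand-in for a network: one 3-bin distance distribution
    per residue pair, sharply peaked only when both residues are hydrophobic."""
    dists = []
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            if seq[i] in HYDROPHOBIC and seq[j] in HYDROPHOBIC:
                dists.append([0.8, 0.1, 0.1])
            else:
                dists.append([0.34, 0.33, 0.33])
    return dists


def contrast(seq, background):
    """Mean KL divergence between predicted and background distributions."""
    predicted = toy_predict(seq)
    return sum(kl_divergence(p, background) for p in predicted) / len(predicted)


def hallucinate(length, steps, temperature, rng):
    """Metropolis sampling of single-residue mutations that maximizes contrast.

    Returns (best sequence, starting contrast, best contrast).
    """
    background = [1 / 3, 1 / 3, 1 / 3]  # assumed uniform background
    seq = [rng.choice(ALPHABET) for _ in range(length)]
    current = start = contrast(seq, background)
    best_seq, best = list(seq), current
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(ALPHABET)
        proposed = contrast(seq, background)
        accept = proposed >= current or rng.random() < math.exp(
            (proposed - current) / temperature
        )
        if accept:
            current = proposed
            if current > best:
                best_seq, best = list(seq), current
        else:
            seq[pos] = old  # reject: restore the previous residue
    return "".join(best_seq), start, best


if __name__ == "__main__":
    sequence, start, best = hallucinate(20, 500, 0.01, random.Random(0))
    print(sequence, start, best)
```

Swapping `toy_predict` for a real predictor would leave the loop unchanged, which is the sense in which a prediction network can be "inverted" for design.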

The register shift rules for βαβ-motifs for de novo protein design

PLoS ONE ◽

10.1371/journal.pone.0256895 ◽

2021 ◽

Vol 16 (8) ◽

pp. e0256895

Author(s):

Hiroto Murata ◽

Hayao Imakawa ◽

Nobuyasu Koga ◽

George Chikenji

Keyword(s):

Protein Design ◽

De Novo ◽

Protein Structures ◽

Sampling Bias ◽

Conformational Sampling ◽

Design Rules ◽

Loop Region ◽

Wide Range ◽

Physical Interactions ◽

A wide range of de novo design of αβ-proteins has been achieved based on the design rules, which describe secondary structure lengths and loop torsion patterns favorable for design target topologies. This paper proposes design rules for register shifts in βαβ-motifs, which have not been reported previously, but are necessary for determining a target structure of de novo design of αβ-proteins. By analyzing naturally occurring protein structures in a database, we found preferences for register shifts in βαβ-motifs, and derived the following empirical rules: (1) register shifts must not be negative regardless of torsion types for a constituent loop in βαβ-motifs; (2) preferred register shifts strongly depend on the loop torsion types. To explain these empirical rules by physical interactions, we conducted physics-based simulations for systems mimicking a βαβ-motif that contains the most frequently observed loop type in the database. We performed an exhaustive conformational sampling of the loop region, imposing the exclusion volume and hydrogen bond satisfaction condition. The distributions of register shifts obtained from the simulations agreed well with those of the database analysis, indicating that the empirical rules are a consequence of physical interactions, rather than an evolutionary sampling bias. Our proposed design rules will serve as a guide to making appropriate target structures for the de novo design of αβ-proteins.

Design of proteins presenting discontinuous functional sites using deep learning

10.1101/2020.11.29.402743 ◽

2020 ◽

Author(s):

Doug Tischer ◽

Sidney Lisanza ◽

Jue Wang ◽

Runze Dong ◽

Ivan Anishchenko ◽

...

Keyword(s):

Loss Function ◽

Protein Design ◽

De Novo ◽

Protein Structures ◽

Functional Site ◽

Structural Motif ◽

Functional Sites ◽

Binding Interface ◽

Wide Range ◽

Sampling Problem

An outstanding challenge in protein design is the design of binders against therapeutically relevant target proteins via scaffolding the discontinuous binding interfaces present in their often large and complex binding partners. There is currently no method for sampling through the almost unlimited number of possible protein structures for those capable of scaffolding a specified discontinuous functional site; instead, current approaches make the sampling problem tractable by restricting search to structures composed of pre-defined secondary structural elements. Such restriction of search has the disadvantage that considerable trial and error can be required to identify architectures capable of scaffolding an arbitrary discontinuous functional site, and only a tiny fraction of possible architectures can be explored. Here we build on recent advances in de novo protein design by deep network hallucination to develop a solution to this problem which eliminates the need to pre-specify the structure of the scaffolding in any way. We use the trRosetta residual neural network, which maps input sequences to predicted inter-residue distances and orientations, to compute a loss function which simultaneously rewards recapitulation of a desired structural motif and the ideality of the surrounding scaffold, and generate diverse structures harboring the desired binding interface by optimizing this loss function by gradient descent. We illustrate the power and versatility of the method by scaffolding binding sites from proteins involved in key signaling pathways with a wide range of secondary structure compositions and geometries. The method should be broadly useful for designing small stable proteins containing complex functional sites.

Hierarchical design of multi-scale protein complexes by combinatorial assembly of oligomeric helical bundle and repeat protein building blocks

10.1101/2020.07.27.221333 ◽

2020 ◽

Author(s):

Yang Hsia ◽

Rubul Mout ◽

William Sheffler ◽

Natasha I. Edman ◽

Ivan Vulovic ◽

...

Keyword(s):

Protein Design ◽

De Novo ◽

Protein Complexes ◽

Building Blocks ◽

Hierarchical Design ◽

X Ray ◽

Helical Bundle ◽

X Ray Crystallography ◽

Repeat Proteins ◽

Wide Range

A goal of de novo protein design is to develop a systematic and robust approach to generating complex nanomaterials from stable building blocks. Due to their structural regularity and simplicity, a wide range of monomeric repeat proteins and oligomeric helical bundle structures have been designed and characterized. Here we describe a stepwise hierarchical approach to building up multi-component symmetric protein assemblies using these structures. We first connect designed helical repeat proteins (DHRs) to designed helical bundle proteins (HBs) to generate a large library of heterodimeric and homooligomeric building blocks; the latter have cyclic symmetries ranging from C2 to C6. All of the building blocks have repeat proteins with accessible termini, which we take advantage of in a second round of architecture-guided rigid helical fusion (WORMS) to generate larger symmetric assemblies including C3 and C5 cyclic and D2 dihedral rings, a tetrahedral cage, and a 120-subunit icosahedral cage. Characterization of the structures by small-angle X-ray scattering, X-ray crystallography, and cryo-electron microscopy demonstrates that the hierarchical design approach can accurately and robustly generate a wide range of macromolecular assemblies; with a diameter of 43 nm, the icosahedral nanocage is the largest structurally validated designed cage to date. The computational methods and building block sets described here provide a very general route to new de novo designed symmetric protein nanomaterials.

Protein designer David Baker: I like doing things that seem like magic

National Science Review ◽

10.1093/nsr/nwaa071 ◽

2020 ◽

Vol 7 (8) ◽

pp. 1410-1412

Author(s):

Weijie Zhao ◽

Chu Wang

Keyword(s):

Protein Design ◽

De Novo ◽

Protein Structures ◽

Computational Prediction ◽

Biological Functions ◽

Personal Experiences ◽

De Novo Protein Design ◽

And Function ◽

The University ◽

Opening Up

Search ‘de novo protein design’ on Google and you will find the name David Baker in every result on the first page. Professor David Baker at the University of Washington and other scientists are opening up a new world of fantastic proteins. Proteins are the direct executors of most biological functions, and their structure and function are fully determined by their primary sequence. Baker's group developed the Rosetta software suite, which enabled the computational prediction and design of protein structures. Being able to design proteins from scratch means being able to design executors for diverse purposes and to benefit society in multiple ways. Recently, NSR interviewed Prof. Baker on this fast-developing field and his personal experiences.

De novo protein design: how do we expand into the universe of possible protein structures?

Current Opinion in Structural Biology ◽

10.1016/j.sbi.2015.05.009 ◽

2015 ◽

Vol 33 ◽

pp. 16-26 ◽

Cited By ~ 110

Author(s):

Derek N Woolfson ◽

Gail J Bartlett ◽

Antony J Burton ◽

Jack W Heal ◽

Ai Niitsu ◽

...

Keyword(s):

Protein Design ◽

De Novo ◽

Protein Structures ◽

De Novo Protein Design ◽

The Universe

Multi-Scale Structural Analysis of Proteins by Deep Semantic Segmentation

10.1101/474627 ◽

2018 ◽

Author(s):

Raphael R. Eguchi ◽

Po-Ssu Huang

Keyword(s):

Image Classification ◽

Protein Design ◽

Large Scale ◽

De Novo ◽

Protein Structures ◽

Semantic Segmentation ◽

Amino Acid Sequences ◽

Structural Quality ◽

Small Subset ◽

Structural Prediction

Recent advancements in computational methods have facilitated large-scale sampling of protein structures, leading to breakthroughs in protein structural prediction and enabling de novo protein design. Establishing methods to identify candidate structures that can lead to native folds or designable structures remains a challenge, since few existing metrics capture high-level structural features such as architectures, folds, and conformity to conserved structural motifs. Convolutional Neural Networks (CNNs) have been successfully used in semantic segmentation, a subfield of image classification in which a class label is predicted for every pixel. Here, we apply semantic segmentation to protein structures as a novel strategy for fold identification and structural quality assessment. We represent protein structures as 2D α-carbon distance matrices (“contact maps”), and train a CNN that assigns each residue in a multi-domain protein to one of 38 architecture classes designated by the CATH database. Our model performs exceptionally well, achieving a per-residue accuracy of 90.8% on the test set (95.0% average accuracy over all classes; 87.8% average within-structure accuracy). The unique aspect of our classifier is that it encodes sequence-agnostic residue environments from the PDB and can assess structural quality as quantitative probabilities. We demonstrate that individual class probabilities can be used as a metric that indicates the degree to which a randomly generated structure assumes a specific fold, as well as a metric that highlights non-conformative regions of a protein belonging to a known class. These capabilities yield a powerful tool for guiding structural sampling for both structural prediction and design.

Significance: Recent computational advances have allowed researchers to predict the structure of many proteins from their amino acid sequences, as well as to design new sequences that fold into predefined structures. However, these tasks are often challenging because they require selection of a small subset of promising structural models from a large pool of stochastically generated ones. Here, we describe a novel approach to protein model selection that uses 2D image classification techniques to evaluate 3D protein models. Our method can be used to select structures based on the fold that they adopt, and can also be used to identify regions of low structural quality. These capabilities yield a powerful tool for both protein design and structure prediction.
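
Two quantities named in this abstract, the 2D α-carbon distance matrix used as network input and the per-residue accuracy used to score predictions, can be sketched as below. The function names, the toy coordinates, and the class labels are hypothetical; the CNN itself and the CATH label set are not reproduced here.

```python
import math


def ca_distance_matrix(coords):
    """Pairwise Euclidean distances between alpha-carbon coordinates.

    coords is a sequence of (x, y, z) tuples, one per residue. The result is
    a symmetric n x n list of lists with zeros on the diagonal.
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def per_residue_accuracy(predicted, actual):
    """Fraction of residues whose predicted class label matches the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Toy coordinates (in angstroms) for three residues; not from a real structure.
    toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
    for row in ca_distance_matrix(toy_coords):
        print(row)
    # Hypothetical architecture labels for four residues.
    print(per_residue_accuracy(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Because the matrix depends only on backbone geometry, it carries no sequence information, which is consistent with the abstract's description of the classifier as sequence-agnostic.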
