Function-guided protein design by deep manifold sampling

2021 ◽  
Author(s):  
Vladimir Gligorijevic ◽  
Daniel Berenberg ◽  
Stephen Ra ◽  
Andrew Watkins ◽  
Simon Kelow ◽  
...  

Protein design is challenging because it requires searching through a vast combinatorial space that is only sparsely functional. Self-supervised learning approaches offer the potential to navigate through this space more effectively and thereby accelerate protein engineering. We introduce a sequence denoising autoencoder (DAE) that learns the manifold of protein sequences from a large number of potentially unlabeled proteins. This DAE is combined with a function predictor that guides sampling towards sequences with higher levels of desired functions. We train the sequence DAE on more than 20M unlabeled protein sequences spanning many evolutionarily diverse protein families and train the function predictor on approximately 0.5M sequences with known function labels. At test time, we sample from the model by iteratively denoising a sequence while exploiting the gradients from the function predictor. We present a few preliminary case studies of protein design that demonstrate the effectiveness of this proposed approach, which we refer to as "deep manifold sampling", including metal binding site addition, function-preserving diversification, and global fold change.
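
The sampling loop described in this abstract (corrupt, denoise, then follow the function predictor's gradient) can be illustrated with a toy sketch. This is not the authors' model: `toy_denoise`, `toy_function_score`, `toy_function_grad`, `seed_logits`, and `guided_sample` are hypothetical stand-ins, the "denoiser" simply pulls per-position logits back toward the seed sequence, and the "function" is the expected number of histidine residues.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def seed_logits(seq, peak=2.0):
    """Per-position logits that favour the residues of `seq`."""
    return [[peak if aa == s else 0.0 for aa in ALPHABET] for s in seq]


def toy_denoise(logits, anchor, pull=0.5):
    """Hypothetical stand-in for a trained DAE: pull logits toward the anchor."""
    return [[(1 - pull) * v + pull * a for v, a in zip(row, arow)]
            for row, arow in zip(logits, anchor)]


def toy_function_score(logits, target="H"):
    """Hypothetical function predictor: expected count of `target` residues."""
    t = ALPHABET.index(target)
    return sum(softmax(row)[t] for row in logits)


def toy_function_grad(logits, target="H"):
    """Analytic gradient of toy_function_score with respect to the logits."""
    t = ALPHABET.index(target)
    grad = []
    for row in logits:
        p = softmax(row)
        grad.append([p[t] * ((1.0 if j == t else 0.0) - p[j])
                     for j in range(len(row))])
    return grad


def guided_sample(seed_seq, steps=20, step_size=4.0, noise=0.1, rng=None):
    """Iteratively corrupt, denoise, and nudge logits along the function gradient."""
    rng = rng or random.Random(0)
    anchor = seed_logits(seed_seq)
    logits = [row[:] for row in anchor]
    for _ in range(steps):
        # 1. Corrupt the current state with Gaussian noise.
        logits = [[v + rng.gauss(0.0, noise) for v in row] for row in logits]
        # 2. Denoise: move back toward the (toy) sequence manifold.
        logits = toy_denoise(logits, anchor)
        # 3. Guide: take a gradient-ascent step on the function score.
        grad = toy_function_grad(logits)
        logits = [[v + step_size * g for v, g in zip(row, grow)]
                  for row, grow in zip(logits, grad)]
    return logits


if __name__ == "__main__":
    seed = "ACDEFGIKLM"
    print("seed score:  ", toy_function_score(seed_logits(seed)))
    print("guided score:", toy_function_score(guided_sample(seed)))
```

In this sketch the denoising step keeps the sample close to the seed while the gradient step raises the toy function score; the balance between the two is set by the assumed `pull` and `step_size` parameters.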

2017 ◽  
Author(s):  
Tian Jiang ◽  
P. Douglas Renfrew ◽  
Kevin Drew ◽  
Noah Youngs ◽  
Glenn Butterfoss ◽  
...  

A wide variety of protein and peptidomimetic design tasks require matching functional three-dimensional motifs to potential oligomeric scaffolds. Enzyme design, for example, aims to graft active-site patterns typically consisting of 3 to 15 residues onto new protein surfaces. Identifying suitable proteins capable of scaffolding such active-site engraftment requires costly searches to identify protein folds that can provide the correct positioning of side chains to host the desired active site. Other examples of biodesign tasks that require simpler, fast, exact geometric searches of potential side chain positioning include mimicking binding hotspots, design of metal binding clusters and the design of modular hydrogen binding networks for specificity. In these applications the speed and scaling of geometric search limit downstream design to small patterns. Here we present an adaptive algorithm for searching for side chain take-off angles compatible with an arbitrarily specified functional pattern that enjoys substantive performance improvements over previous methods. We demonstrate this method in both genetically encoded (protein) and synthetic (peptidomimetic) design scenarios. Examples of using this method with the Rosetta framework for protein design are provided but our implementation is compatible with multiple protein design frameworks and is freely available as a set of Python scripts (https://github.com/JiangTian/adaptive-geometric-search-for-protein-design).


2013 ◽  
Author(s):  
Eleisha L. Jackson ◽  
Noah Ollikainen ◽  
Arthur W. Covert III ◽  
Tanja Kortemme ◽  
Claus O. Wilke

Computational protein design attempts to create protein sequences that fold stably into pre-specified structures. Here we compare alignments of designed proteins to alignments of natural proteins and assess how closely designed sequences recapitulate patterns of sequence variation found in natural protein sequences. We design proteins using RosettaDesign, and we evaluate both fixed-backbone designs and variable-backbone designs with different amounts of backbone flexibility. We find that proteins designed with a fixed backbone tend to underestimate the amount of site variability observed in natural proteins while proteins designed with an intermediate amount of backbone flexibility result in more realistic site variability. Further, the correlation between solvent exposure and site variability in designed proteins is lower than that in natural proteins. This finding suggests that site variability is too uniform across different solvent exposure states (i.e., buried residues are too variable or exposed residues too conserved). When comparing the amino acid frequencies in the designed proteins with those in natural proteins we find that in the designed proteins hydrophobic residues are underrepresented in the core. From these results we conclude that intermediate backbone flexibility during design results in more accurate protein design and that either scoring functions or backbone sampling methods require further improvement to accurately replicate structural constraints on site variability.
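
To make "site variability" concrete, the sketch below scores each alignment column by its Shannon entropy. This is an assumption for illustration: the abstract does not state which variability measure the authors used, and the function names `site_entropy` and `alignment_variability` are hypothetical.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (in nats) of the residues observed in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def alignment_variability(alignment):
    """Per-site entropy for a list of equal-length aligned sequences."""
    return [site_entropy(col) for col in zip(*alignment)]


if __name__ == "__main__":
    # Toy alignment: the first column is fully conserved, the last is fully variable.
    toy_alignment = ["ACD", "ACE", "ACF", "AGG"]
    print(alignment_variability(toy_alignment))
```

Under this measure, a fully conserved site scores zero and a site where every sequence carries a different residue scores the logarithm of the number of sequences, so designed and natural alignments can be compared column by column.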


2021 ◽  
Author(s):  
Jose A. Rodriguez-Rodriguez ◽  
Miguel A. Molina-Cabello ◽  
Rafaela Benitez-Rochel ◽  
Ezequiel Lopez-Rubio

Life ◽  
2019 ◽  
Vol 9 (1) ◽  
pp. 8 ◽  
Author(s):  
Michael S. Wang ◽  
Kenric J. Hoegler ◽  
Michael H. Hecht

Life as we know it would not exist without the ability of protein sequences to bind metal ions. Transition metals, in particular, play essential roles in a wide range of structural and catalytic functions. The ubiquitous occurrence of metalloproteins in all organisms leads one to ask whether metal binding is an evolved trait that occurred only rarely in ancestral sequences, or alternatively, whether it is an innate property of amino acid sequences, occurring frequently in unevolved sequence space. To address this question, we studied 52 proteins from a combinatorial library of novel sequences designed to fold into 4-helix bundles. Although these sequences were neither designed nor evolved to bind metals, the majority of them have innate tendencies to bind the transition metals copper, cobalt, and zinc with high nanomolar to low-micromolar affinity.


2020 ◽  
Vol 117 (48) ◽  
pp. 30362-30369
Author(s):  
Shane J. Caldwell ◽  
Ian C. Haydon ◽  
Nikoletta Piperidou ◽  
Po-Ssu Huang ◽  
Matthew J. Bick ◽  
...  

De novo protein design has succeeded in generating a large variety of globular proteins, but the construction of protein scaffolds with cavities that could accommodate large signaling molecules, cofactors, and substrates remains an outstanding challenge. The long, often flexible loops that form such cavities in many natural proteins are difficult to precisely program and thus challenging for computational protein design. Here we describe an alternative approach to this problem. We fused two stable proteins with C2 symmetry—a de novo designed dimeric ferredoxin fold and a de novo designed TIM barrel—such that their symmetry axes are aligned to create scaffolds with large cavities that can serve as binding pockets or enzymatic reaction chambers. The crystal structures of two such designs confirm the presence of a 420 cubic Ångström chamber defined by the top of the designed TIM barrel and the bottom of the ferredoxin dimer. We functionalized the scaffold by installing a metal-binding site consisting of four glutamate residues close to the symmetry axis. The protein binds lanthanide ions with very high affinity as demonstrated by tryptophan-enhanced terbium luminescence. This approach can be extended to other metals and cofactors, making this scaffold a modular platform for the design of binding proteins and biocatalysts.


Biochemistry ◽  
2006 ◽  
Vol 45 (18) ◽  
pp. 5848-5856 ◽  
Author(s):  
Anna Wilkins Maniccia ◽  
Wei Yang ◽  
Shun-yi Li ◽  
Julian A. Johnson ◽  
Jenny J. Yang

2022 ◽  
Vol 2022 (1) ◽  
Author(s):  
Jing Lin ◽  
Laurent L. Njilla ◽  
Kaiqi Xiong

Deep neural networks (DNNs) are widely used to handle many difficult tasks, such as image classification and malware detection, and achieve outstanding performance. However, recent studies on adversarial examples, which are original samples with maliciously added perturbations that are indistinguishable to the human eye but mislead machine learning models, show that these models are vulnerable to security attacks. Though various adversarial retraining techniques have been developed in the past few years, none of them is scalable. In this paper, we propose a new iterative adversarial retraining approach to robustify the model and to reduce the effectiveness of adversarial inputs on DNN models. The proposed method retrains the model with both Gaussian noise augmentation and adversarial generation techniques for better generalization. Furthermore, the ensemble model is utilized during the testing phase in order to increase the robust test accuracy. The results from our extensive experiments demonstrate that the proposed approach increases the robustness of the DNN model against various adversarial attacks, specifically, fast gradient sign attack, Carlini and Wagner (C&W) attack, Projected Gradient Descent (PGD) attack, and DeepFool attack. To be precise, the robust classifier obtained by our proposed approach can maintain a performance accuracy of 99% on average on the standard test set. Moreover, we empirically evaluate the runtime of two of the most effective adversarial attacks, i.e., C&W attack and BIM attack, to find that the C&W attack can utilize GPU for faster adversarial example generation than the BIM attack can. For this reason, we further develop a parallel implementation of the proposed approach. This parallel implementation makes the proposed approach scalable for large datasets and complex models.



Author(s):  
Lewis Moffat ◽  
Joe G. Greener ◽  
David T. Jones

The prediction of protein structure and the design of novel protein sequences and structures have long been intertwined. The recently released AlphaFold has heralded a new generation of accurate protein structure prediction, but the extent to which this affects protein design remains unexplored. Here we develop a rapid and effective approach for fixed backbone computational protein design, leveraging the predictive power of AlphaFold. For several designs we demonstrate that not only are the AlphaFold predicted structures in agreement with the desired backbones, but they are also supported by the structure predictions of other supervised methods as well as ab initio folding. These results suggest that AlphaFold, and methods like it, are able to facilitate the development of a new range of novel and accurate protein design methodologies.

