protein sequence space
Recently Published Documents


TOTAL DOCUMENTS

52
(FIVE YEARS 9)

H-INDEX

15
(FIVE YEARS 3)

2021 ◽  
Author(s):  
Yashas Samaga B L ◽  
Shampa Raghunathan ◽  
U. Deva Priyakumar

Engineering proteins to have desired properties by mutating amino acids at specific sites is commonplace. Such engineered proteins must be stable to function. Experimental methods used to determine stability at throughputs required to scan the protein sequence space thoroughly are laborious. To this end, many machine learning based methods have been developed to predict thermodynamic stability changes upon mutation. These methods have been evaluated for symmetric consistency by testing with hypothetical reverse mutations. In this work, we propose transitive data augmentation, evaluating transitive consistency, and a new machine learning based method, first of its kind, that incorporates both symmetric and transitive properties into the architecture. Our method, called SCONES, is an interpretable neural network that estimates a residue's contributions towards protein stability ΔG in its local structural environment. The difference between independently predicted contributions of the reference and mutant residues in a missense mutation is reported as ΔΔG. We show that this self-consistent machine learning architecture is immune to many common biases in datasets, relies less on data than existing methods, and is robust to overfitting.
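
The self-consistency idea in this abstract can be illustrated with a minimal sketch. When a stability change is computed as the difference between two independently predicted per-residue contributions, reverse mutations cancel (symmetric consistency) and chained mutations add up (transitive consistency) by construction. The function `contribution`, the `SCORES` table, and the `burial` parameter below are hypothetical stand-ins chosen for illustration; they are not the SCONES network or its features.

```python
# Toy illustration only: `contribution` is a hypothetical stand-in for a learned
# per-residue stability score. It is NOT the SCONES model.
SCORES = {"A": 1.8, "G": -0.4, "L": 3.8, "V": 4.2}  # arbitrary illustrative numbers

def contribution(residue: str, burial: float) -> float:
    """Hypothetical contribution of one residue in a fixed local environment."""
    return -burial * SCORES[residue]

def predict_change(ref: str, mut: str, burial: float) -> float:
    """Stability change as a difference of two independent predictions."""
    return contribution(mut, burial) - contribution(ref, burial)

burial = 0.7  # assumed descriptor of the local structural environment
forward = predict_change("A", "L", burial)
reverse = predict_change("L", "A", burial)
via_v = predict_change("A", "V", burial) + predict_change("V", "L", burial)

print("symmetric consistency:", forward + reverse == 0.0)
print("transitive consistency:", abs(via_v - forward) < 1e-9)
```

Both properties hold here for any choice of `contribution`, which is why building the difference into the architecture guarantees them regardless of the training data.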


2021 ◽  
Author(s):  
Jimin Yoon ◽  
Emmanuel E. Nekongo ◽  
Jessica E. Patrick ◽  
Angela M. Phillips ◽  
Anna I. Ponomarenko ◽  
...  

The sequence space accessible to evolving proteins can be enhanced by cellular chaperones that assist biophysically defective clients in navigating complex folding landscapes. It is also possible, however, for proteostasis mechanisms that promote strict quality control to greatly constrain accessible protein sequence space. Unfortunately, most efforts to understand how proteostasis mechanisms influence evolution rely on artificial inhibition or genetic knockdown of specific chaperones. The few experiments that perturb quality control pathways also generally modulate the levels of only individual quality control factors. Here, we use chemical genetic strategies to tune proteostasis networks via natural stress response pathways that regulate levels of entire suites of chaperones and quality control mechanisms. Specifically, we upregulate the unfolded protein response (UPR) to test the hypothesis that the host endoplasmic reticulum (ER) proteostasis network shapes the sequence space accessible to human immunodeficiency virus-1 (HIV) envelope (Env) protein. Elucidating factors that enhance or constrain Env sequence space is critical because Env evolves extremely rapidly, yielding HIV strains with antibody and drug escape mutations. We find that UPR-mediated upregulation of ER proteostasis factors, particularly those controlled by the IRE1-XBP1s UPR arm, globally reduces Env mutational tolerance. Conserved, functionally important Env regions exhibit the largest decreases in mutational tolerance upon XBP1s activation. This phenomenon likely reflects strict quality control endowed by XBP1s-mediated remodeling of the ER proteostasis environment. Intriguingly and in contrast, specific regions of Env, including regions targeted by broadly neutralizing antibodies, display enhanced mutational tolerance when XBP1s is activated, hinting at a role for host proteostasis network hijacking in potentiating antibody escape. These observations reveal a key function for proteostasis networks in decreasing instead of expanding the sequence space accessible to client proteins, while also demonstrating that the host ER proteostasis network profoundly shapes the mutational tolerance of Env in ways that could have important consequences for HIV adaptation.


Life ◽  
2020 ◽  
Vol 10 (2) ◽  
pp. 9 ◽  
Author(s):  
Christina Karas ◽  
Michael Hecht

Protein sequence space is vast; nature uses only an infinitesimal fraction of possible sequences to sustain life. Are there solutions to biological problems other than those provided by nature? Can we create artificial proteins that sustain life? To investigate these questions, we have created combinatorial collections, or libraries, of novel sequences with no homology to those found in living organisms. Previously designed libraries contained numerous functional proteins. However, they often formed dynamic, rather than well-ordered structures, which complicated structural and mechanistic characterization. To address this challenge, we describe the development of new libraries based on the de novo protein S-824, a 4-helix bundle with a very stable 3-dimensional structure. Distinct from previous libraries, we targeted variability to a specific region of the protein, seeking to create potential functional sites. By characterizing variant proteins from this library, we demonstrate that the S-824 scaffold tolerates diverse amino acid substitutions in a putative cavity, including buried polar residues suitable for catalysis. We designed and created a DNA library encoding 1.7 × 10^6 unique protein sequences. This new library of stable de novo α-helical proteins is well suited for screens and selections for a range of functional activities in vitro and in vivo.
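
To give a sense of scale for "vast", the arithmetic below compares the number of possible sequences over the 20 standard amino acids with a library of 1.7 × 10^6 members. The chain length of 100 residues is an assumption chosen purely for illustration and is not taken from the abstract.

```python
import math

N_AMINO_ACIDS = 20
CHAIN_LENGTH = 100       # assumed length, for illustration only
LIBRARY_SIZE = 1.7e6     # library size quoted in the abstract

# Exact count of possible sequences of the assumed length (Python integers are unbounded).
total_sequences = N_AMINO_ACIDS ** CHAIN_LENGTH
order_of_magnitude = len(str(total_sequences)) - 1

# How many fully randomized positions would yield a library of this size.
equivalent_positions = math.log(LIBRARY_SIZE) / math.log(N_AMINO_ACIDS)

print(f"possible sequences: about 10^{order_of_magnitude}")
print(f"library size equals about {equivalent_positions:.1f} fully randomized positions")
```

Under these assumptions the library corresponds to fewer than five fully randomized positions, which is consistent with targeting variability to one specific region of the scaffold rather than sampling the whole chain.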


2019 ◽  
Vol 295 (13) ◽  
pp. 4316-4326 ◽  
Author(s):  
Zachary Armstrong ◽  
Gideon J. Davies

Recent work exploring protein sequence space has revealed a new glycoside hydrolase (GH) family (GH164) of putative mannosidases. GH164 genes are present in several commensal bacteria, implicating these genes in the degradation of dietary glycans. However, little is known about the structure, mechanism of action, and substrate specificity of these enzymes. Herein we report the biochemical characterization and crystal structures of the founding member of this family (Bs164) from the human gut symbiont Bacteroides salyersiae. Previous reports of this enzyme indicated that it has α-mannosidase activity; however, we conclusively show that it cleaves only β-mannose linkages. Using NMR spectroscopy, detailed enzyme kinetics of WT and mutant Bs164, and multiangle light scattering, we found that it is a trimeric retaining β-mannosidase that is susceptible to several known mannosidase inhibitors. X-ray crystallography revealed the structure of Bs164, the first known structure of a GH164, at 1.91 Å resolution. Bs164 is composed of three domains: a (β/α)₈ barrel, a trimerization domain, and a β-sandwich domain, representing a previously unobserved structural fold for β-mannosidases. Structures of Bs164 at 1.80–2.55 Å resolution in complex with the inhibitors noeuromycin, mannoimidazole, or 2,4-dinitrophenol 2-deoxy-2-fluoro-mannoside reveal the residues essential for specificity and catalysis, including the catalytic nucleophile (Glu-297) and acid/base residue (Glu-160). These findings further our knowledge of the mechanisms commensal microbes use for nutrient acquisition.


2019 ◽  
Vol 8 (6) ◽  
pp. 1371-1378 ◽  
Author(s):  
Andrew Currin ◽  
Jane Kwok ◽  
Joanna C. Sadler ◽  
Elizabeth L. Bell ◽  
Neil Swainston ◽  
...  

2019 ◽  
Vol 20 (3) ◽  
pp. 236-243
Author(s):  
Yuhua Yao ◽  
Huimin Xu ◽  
Manzhi Li ◽  
Zhaohui Qi ◽  
Bo Liao

Background: Some studies have shown that Human Papillomavirus (HPV) is strongly associated with cervical cancer. Cervical cancer remains the fourth most common cancer affecting women worldwide. Thus, it is both challenging and essential to detect risk types of human papillomaviruses. Methods: To discriminate whether an HPV type is high-risk or not, many epidemiological and experimental methods have been proposed recently. For HPV risk type prediction, there have also been a few computational studies, all based on Machine Learning (ML) techniques but adopting different feature extraction methods. Therefore, we summarize and discuss several classical approaches that have achieved better results for HPV risk type prediction. Results: This review summarizes the common methods to detect human papillomavirus. The main methods are sequence-derived features, text-based classification, the gap-kernel method, ensemble SVM, the word statistical model, the position-specific statistical model and the mismatch kernel method (SVM). Among these methods, the position-specific statistical model achieves a relatively high accuracy (accuracy=97.18%). The word statistical model is also a novel approach, which extracts information about HPV from the protein "sequence space" to predict high-risk types of HPVs (accuracy=95.59%). These methods could potentially be used to improve prediction of high-risk types of HPVs. Conclusion: The prediction accuracies indicate that classification results are more accurate when mathematical models are established. Thus, adopting mathematical methods to predict the risk type of HPV will be the main goal of future research.
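
Several of the approaches listed in this abstract start from sequence-derived "word" features. As a minimal sketch of what such a feature extraction step might look like, the code below counts overlapping k-mers in a protein sequence and normalizes them to frequencies. It is a generic illustration, not the word statistical model or any other specific method from the review; the function names and the toy sequence are made up.

```python
from collections import Counter

def kmer_counts(sequence: str, k: int = 2) -> Counter:
    """Count overlapping k-mers ("words") in a protein sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def kmer_frequencies(sequence: str, k: int = 2) -> dict:
    """Normalize k-mer counts so that the frequencies sum to 1."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

toy_sequence = "MKVLKVLA"  # made-up sequence, not a real HPV protein
counts = kmer_counts(toy_sequence, k=2)
frequencies = kmer_frequencies(toy_sequence, k=2)

print(counts.most_common(2))
print(round(sum(frequencies.values()), 6))
```

A frequency vector of this kind could then be passed to a classifier such as an SVM; the choice of k and of the classifier are left open here.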


2019 ◽  
Author(s):  
Derek M Mason ◽  
Simon Friedensohn ◽  
Cédric R Weber ◽  
Christian Jordi ◽  
Bastian Wagner ◽  
...  

Therapeutic antibody optimization is time and resource intensive, largely because it requires low-throughput screening (10^3 variants) of full-length IgG in mammalian cells, typically resulting in only a few optimized leads. Here, we use deep learning to interrogate and predict antigen-specificity from a massively diverse sequence space to identify globally optimized antibody variants. Using a mammalian display platform and the therapeutic antibody trastuzumab, rationally designed site-directed mutagenesis libraries are introduced by CRISPR/Cas9-mediated homology-directed repair (HDR). Screening and deep sequencing of relatively small libraries (10^4) produced high quality data capable of training deep neural networks that accurately predict antigen-binding based on antibody sequence. Deep learning is then used to predict millions of antigen binders from an in silico library of ~10^8 variants, where experimental testing of 30 randomly selected variants showed all 30 retained antigen specificity. The full set of in silico predicted binders is then subjected to multiple developability filters, resulting in thousands of highly-optimized lead candidates. With its scalability and capacity to interrogate high-dimensional protein sequence space, deep learning offers great potential for antibody engineering and optimization.


Genes ◽  
2018 ◽  
Vol 9 (9) ◽  
pp. 423 ◽  
Author(s):  
Anna Posfai ◽  
Juannan Zhou ◽  
Joshua Plotkin ◽  
Justin Kinney ◽  
David McCandlish

A now classical argument for the marginal thermodynamic stability of proteins explains the distribution of observed protein stabilities as a consequence of an entropic pull in protein sequence space. In particular, most sequences that are sufficiently stable to fold will have stabilities near the folding threshold. Here, we extend this argument to consider its predictions for epistatic interactions for the effects of mutations on the free energy of folding. Although there is abundant evidence to indicate that the effects of mutations on the free energy of folding are nearly additive and conserved over evolutionary time, we show that these observations are compatible with the hypothesis that a non-additive contribution to the folding free energy is essential for observed proteins to maintain their native structure. In particular, through both simulations and analytical results, we show that even very small departures from additivity are sufficient to drive this effect.
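
The classical argument in the first two sentences of this abstract can be illustrated with a small simulation. The sketch below uses a purely additive model with made-up per-site effects and samples random sequences; among the sequences stable enough to fold, most sit close to the folding threshold. All parameter values, the sign convention (folding free energy below 0 means folded), and the Gaussian effect distribution are assumptions for illustration. The sketch does not reproduce the paper's analysis of epistasis.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

N_SITES, N_AMINO_ACIDS, N_SAMPLES = 30, 20, 20000
THRESHOLD = 0.0  # assumed convention: a sequence folds if its folding free energy is below 0

# effects[i][a]: hypothetical additive contribution of amino acid a at site i.
# The positive mean (0.3) encodes the assumption that a random residue is destabilizing on average.
effects = [[random.gauss(0.3, 1.0) for _ in range(N_AMINO_ACIDS)]
           for _ in range(N_SITES)]

def folding_free_energy(sequence):
    """Sum of per-site contributions (purely additive, no epistasis)."""
    return sum(effects[site][aa] for site, aa in enumerate(sequence))

# Sample random sequences uniformly from sequence space and score each one.
energies = [
    folding_free_energy([random.randrange(N_AMINO_ACIDS) for _ in range(N_SITES)])
    for _ in range(N_SAMPLES)
]

spread = statistics.pstdev(energies)
stable = [e for e in energies if e < THRESHOLD]
near_threshold = [e for e in stable if e > THRESHOLD - spread]
near_fraction = len(near_threshold) / len(stable)

print(f"stable sequences: {len(stable)} of {N_SAMPLES}")
print(f"fraction of stable sequences within one spread of the threshold: {near_fraction:.2f}")
```

Because far more sequences lie just below the threshold than far below it, a random draw from the stable set is expected to be only marginally stable, which is the entropic pull the abstract refers to.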

