Deep generative modeling for protein design

2022 ◽  
Vol 72 ◽  
pp. 226-236
Author(s):  
Alexey Strokach ◽  
Philip M. Kim
2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Jeanne Trinquier ◽  
Guido Uguzzoni ◽  
Andrea Pagnani ◽  
Francesco Zamponi ◽  
Martin Weigt

Abstract: Generative models emerge as promising candidates for novel sequence-data-driven approaches to protein design, and for the extraction of structural and functional information about proteins deeply hidden in rapidly growing sequence databases. Here we propose simple autoregressive models as highly accurate but computationally efficient generative sequence models. We show that they perform similarly to existing approaches based on Boltzmann machines or deep generative models, but at a substantially lower computational cost (by a factor between 10^2 and 10^3). Furthermore, the simple structure of our models has distinctive mathematical advantages, which translate into an improved applicability in sequence generation and evaluation. Within these models, we can easily estimate both the probability of a given sequence and, using the model's entropy, the size of the functional sequence space related to a specific protein family. In the example of response regulators, we find a huge number of ca. 10^68 possible sequences, which nevertheless constitute only the astronomically small fraction 10^−80 of all amino-acid sequences of the same length. These findings illustrate both the potential and the difficulty of exploring sequence space via generative sequence models.
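The two quantities this abstract highlights can be sketched in a few lines. The sketch below is illustrative only: the uniform conditional, the alignment length L = 112, and the 21-letter alphabet (amino acids plus alignment gap) are assumptions chosen to make the arithmetic concrete, not values taken from the paper.

```python
import math

# An autoregressive model factorizes sequence probability by the chain rule,
#   P(a_1..a_L) = prod_i P(a_i | a_1..a_{i-1}),
# so log P(seq) is a sum of per-position conditional log-probabilities.
def sequence_log_prob(seq, cond_prob):
    """cond_prob(prefix, aa) -> P(aa | prefix); returns log P(seq)."""
    return sum(math.log(cond_prob(seq[:i], aa)) for i, aa in enumerate(seq))

# Toy conditional: uniform over the 20 standard amino acids.
uniform = lambda prefix, aa: 1.0 / 20.0
lp = sequence_log_prob("MKV", uniform)  # = 3 * log(1/20)

# Entropy-based size estimate: a model with entropy S (in nats) covers
# roughly exp(S) functional sequences; dividing by q**L gives their share
# of all length-L sequences. Here q = 21 (amino acids plus gap) and
# L = 112 (an assumed response-regulator alignment length).
S = 68 * math.log(10)          # entropy corresponding to ~10^68 sequences
q, L = 21, 112
log10_fraction = (S - L * math.log(q)) / math.log(10)
# log10_fraction comes out near -80: an astronomically small fraction
```

The point of the arithmetic: even an enormous functional space (~10^68 sequences) is a vanishing sliver of the full combinatorial space q^L, which is why unguided exploration of sequence space is hopeless without a generative model.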


Author(s):  
Raphael R. Eguchi ◽  
Namrata Anand ◽  
Christian A. Choe ◽  
Po-Ssu Huang

Abstract: While deep learning models have seen increasing applications in protein science, few have been implemented for protein backbone generation—an important task in structure-based problems such as active site and interface design. We present a new approach to building class-specific backbones, using a variational auto-encoder to directly generate the 3D coordinates of immunoglobulins. Our model is torsion- and distance-aware, learns a high-resolution embedding of the dataset, and generates novel, high-quality structures compatible with existing design tools. We show that the Ig-VAE can be used to create a computational model of a SARS-CoV2-RBD binder via latent space sampling. We further demonstrate that the model's generative prior is a powerful tool for guiding computational protein design, motivating a new paradigm under which backbone design is solved as a constrained optimization problem in the latent space of a generative model.
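The latent-space sampling this abstract describes can be illustrated schematically. This is not the authors' Ig-VAE: the linear decoder, its random weights, and all dimensions below are hypothetical stand-ins that only show the generation loop (draw z from the prior, decode to coordinates).

```python
import random

# Conceptual sketch of VAE-based backbone generation. A real decoder is a
# deep network trained to map a latent vector z to 3D backbone coordinates;
# here a hypothetical linear map stands in for it.

LATENT_DIM = 8
N_ATOMS = 4  # toy backbone: 4 atoms -> 12 coordinate values

# Hypothetical "trained" decoder weights (random, for illustration only).
random.seed(0)
W = [[random.gauss(0, 0.1) for _ in range(LATENT_DIM)]
     for _ in range(3 * N_ATOMS)]

def decode(z):
    """Map a latent vector z to flat (x, y, z) coordinates for N_ATOMS atoms."""
    return [sum(w_i * z_i for w_i, z_i in zip(row, z)) for row in W]

def sample_backbone():
    """Draw z ~ N(0, I) -- the VAE prior -- and decode it to coordinates."""
    z = [random.gauss(0, 1) for _ in range(LATENT_DIM)]
    return decode(z)

coords = sample_backbone()  # 3 * N_ATOMS coordinate values
```

The design idea the abstract argues for follows directly from this loop: because every latent point decodes to a structure, backbone design can be posed as optimization over z rather than over raw coordinates.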



2021 ◽  
Author(s):  
Tim Kucera ◽  
Matteo Togninalli ◽  
Laetitia Meng-Papaxanthos

Motivation: Protein design has become increasingly important for medical and biotechnological applications. Because of the complex mechanisms underlying protein formation, the creation of a novel protein requires tedious and time-consuming computational or experimental protocols. At the same time, machine learning has made it possible to solve complex problems by leveraging the large amounts of available data, most recently with great improvements in the domain of generative modeling. Yet generative models have mainly been applied to specific sub-problems of protein design. Results: Here we approach the problem of general-purpose protein design conditioned on functional labels of the hierarchical Gene Ontology. Since a canonical way to evaluate generative models in this domain is missing, we devise an evaluation scheme of several biologically and statistically inspired metrics. We then develop the conditional generative adversarial network ProteoGAN and show that it outperforms several classic and more recent deep-learning baselines for protein sequence generation. We further give insights into the model by analysing hyperparameters and ablation baselines. Lastly, we hypothesize that a functionally conditioned model could create proteins with novel functions by combining labels, and we take first steps in this direction of research.
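The conditioning mechanism at the heart of this abstract can be sketched schematically. This is not ProteoGAN: the toy generator, the three-term label space, and the deterministic scoring are hypothetical stand-ins that only show how a label vector concatenated with noise steers what gets generated.

```python
import random

# Schematic conditional generation: the generator receives a noise vector
# concatenated with a functional-label encoding (e.g. a multi-hot vector
# over Gene Ontology terms), so the same noise yields different sequences
# under different labels.

AA = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids
NOISE_DIM = 4
N_LABELS = 3   # hypothetical: 3 GO terms
SEQ_LEN = 6

def generator(noise, label_multi_hot):
    """Toy stand-in for a trained generator network: maps the concatenated
    [noise; label] input to a sequence by deterministic indexing."""
    x = noise + label_multi_hot            # conditioning by concatenation
    seq = []
    for i in range(SEQ_LEN):
        score = sum((j + 1) * v for j, v in enumerate(x)) + i
        seq.append(AA[int(abs(score) * 7) % len(AA)])
    return "".join(seq)

random.seed(1)
noise = [random.gauss(0, 1) for _ in range(NOISE_DIM)]
seq_a = generator(noise, [1, 0, 0])  # condition on hypothetical GO term 1
seq_b = generator(noise, [0, 1, 0])  # same noise, different label
```

In a real cGAN the discriminator also receives the label, so the generator is penalized unless its output matches the requested function; combining several labels in one multi-hot vector is what the abstract's closing hypothesis about novel functions refers to.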


Nature ◽  
2009 ◽  
Author(s):  
Erika Check Hayden

2001 ◽  
Vol 4 (8) ◽  
pp. 643-659 ◽  
Author(s):  
Alfonso Jaramillo ◽  
Lorenz Wernisch ◽  
Stephanie Hery ◽  
Shoshana Wodak
