Protein Sequence Design by Energy Landscaping

Abstract: The primary challenge of fixed-backbone protein design is to find a distribution of sequences that fold to the backbone of interest. This task is central to nearly all protein engineering problems, as achieving a particular backbone conformation is often a prerequisite for hosting specific functions. In this study, we investigate the capability of a deep neural network to learn the requisite patterns needed to design sequences. The trained model serves as a potential function defined over the space of amino acid identities and rotamer states, conditioned on the local chemical environment at each residue. While most deep-learning-based methods for sequence design only produce amino acid sequences, our method generates full-atom structural models, which can be evaluated using established sequence quality metrics. Under these metrics we are able to produce realistic and variable designs with quality comparable to the state-of-the-art. Additionally, we experimentally test designs for a de novo TIM-barrel structure and find designs that fold, demonstrating the algorithm’s generalizability to novel structures. Overall, our results demonstrate that a deep learning model can match state-of-the-art energy functions for guiding protein design.

Significance: Protein design tasks typically depend on carefully modeled and parameterized heuristic energy functions. In this study, we propose a novel machine learning method for fixed-backbone protein sequence design, using a learned neural network potential to not only design the sequence of amino acids but also select their side-chain configurations, or rotamers. Factoring through a structural representation of the protein, the network generates designs on par with the state-of-the-art, despite having been entirely learned from data. These results indicate an exciting future for protein design driven by machine learning.
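The abstract describes a potential defined over residue identities and conditioned on each residue's local environment, but it does not say how sequences are drawn from it. As a rough illustration only, the sketch below resamples one position at a time from a Boltzmann distribution over a conditional energy (a Gibbs-style scheme, chosen here as an assumption). The function `toy_conditional_energy` is a hypothetical stand-in for the learned network, rotamers are omitted, and none of the names come from the paper.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")


def toy_conditional_energy(seq, pos, aa):
    """Hypothetical stand-in for a learned potential (lower is better).

    It favors hydrophobic residues at even positions and polar residues
    at odd positions. A real model would score `aa` from the structural
    environment around `pos`; this toy ignores `seq` entirely.
    """
    wants_hydrophobic = pos % 2 == 0
    return 0.0 if (aa in HYDROPHOBIC) == wants_hydrophobic else 2.0


def total_energy(seq, energy=toy_conditional_energy):
    """Sum the per-position conditional energies of a sequence."""
    return sum(energy(seq, i, aa) for i, aa in enumerate(seq))


def gibbs_design(length, n_sweeps=20, temperature=1.0, seed=0,
                 energy=toy_conditional_energy):
    """Resample each position in turn from exp(-E / T) over 20 residue types."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    for _ in range(n_sweeps):
        for pos in range(length):
            weights = [math.exp(-energy(seq, pos, aa) / temperature)
                       for aa in AMINO_ACIDS]
            seq[pos] = rng.choices(AMINO_ACIDS, weights=weights, k=1)[0]
    return "".join(seq)


if __name__ == "__main__":
    design = gibbs_design(12, temperature=0.5, seed=1)
    print(design, total_energy(design))
```

Lowering `temperature` concentrates sampling on low-energy residues, while higher values yield more variable designs; this mirrors the trade-off between design quality and diversity that the abstract alludes to.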

Effective scoring function for protein sequence design

Proteins: Structure, Function, and Bioinformatics, 2003, Vol 54 (2), pp. 271-281

DOI: 10.1002/prot.10560

Cited by: ~34

Author(s): Shide Liang, Nick V. Grishin

Keyword(s): Protein Sequence, Scoring Function, Sequence Design

The Protein Sequence Design Problem in Canonical Model on 2D and 3D Lattices

Combinatorial Pattern Matching - Lecture Notes in Computer Science, 2004, pp. 244-253

DOI: 10.1007/978-3-540-27801-6_18

Cited by: ~3

Author(s): Piotr Berman, Bhaskar DasGupta, Dhruv Mubayi, Robert Sloan, György Turán, ...

Keyword(s): Protein Sequence, Design Problem, Canonical Model, Sequence Design, 2D And 3D

Increasing the efficiency and accuracy of the ABACUS protein sequence design method

Bioinformatics, 2019, Vol 36 (1), pp. 136-144

DOI: 10.1093/bioinformatics/btz515

Cited by: ~3

Author(s): Peng Xiong, Xiuhong Hu, Bin Huang, Jiahai Zhang, Quan Chen, ...

Keyword(s): Protein Sequence, Design Method, Solvent Accessibility, Supplementary Information, Survey Method, Energy Functions, Sequence Design, Feature Spaces, Empirical Function, Representative Points

Abstract

Motivation: The ABACUS (a backbone-based amino acid usage survey) method uses unique statistical energy functions to carry out protein sequence design. Although some of its results have been experimentally verified, its accuracy remains improvable because several important components of the method have not been specifically optimized for sequence design or in contexts of other parts of the method. The computational efficiency also needs to be improved to support interactive online applications or the consideration of a large number of alternative backbone structures.

Results: We derived a model to measure solvent accessibility with larger mutual information with residue types than previous models, optimized a set of rotamers which can approximate the sidechain atomic positions more accurately, and devised an empirical function to treat inter-atomic packing with parameters fitted to native structures and optimized in consistence with the rotamer set. Energy calculations have been accelerated by interpolation between pre-determined representative points in high-dimensional structural feature spaces. Sidechain repacking tests showed that ABACUS2 can accurately reproduce the conformation of native sidechains. In sequence design tests, the native residue type recovery rate reached 37.7%, exceeding the value of 32.7% for ABACUS1. Applying ABACUS2 to designed sequences on three native backbones produced proteins shown to be well-folded by experiments.

Availability and implementation: The ABACUS2 sequence design server can be visited at http://biocomp.ustc.edu.cn/servers/abacus-design.php.

Supplementary information: Supplementary data are available at Bioinformatics online.
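The native residue type recovery rate quoted in this abstract is commonly understood as the fraction of positions at which a designed sequence has the same residue type as the native sequence. The abstract does not state how the figure was aggregated (for example, per protein or pooled over all positions), so the sketch below assumes the simplest case: two equal-length sequences for a single backbone. The function name is illustrative and not taken from ABACUS.

```python
def native_recovery_rate(native, designed):
    """Fraction of positions where the designed residue matches the native one.

    Assumes both sequences describe the same backbone, so position i in
    one corresponds to position i in the other (no alignment is done).
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(n == d for n, d in zip(native, designed))
    return matches / len(native)


if __name__ == "__main__":
    # Three of four positions match in this made-up example.
    print(native_recovery_rate("ACDE", "ACDF"))
```

Under this definition, a reported rate of 37.7% would mean that roughly 38 of every 100 designed positions carry the native residue type.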
