A general-purpose protein design framework based on mining sequence–structure relationships in known protein structures

2019 ◽  
Vol 117 (2) ◽  
pp. 1059-1068 ◽  
Author(s):  
Jianfu Zhou ◽  
Alexandra E. Panaitiu ◽  
Gevorg Grigoryan

Current state-of-the-art approaches to computational protein design (CPD) aim to capture the determinants of structure from physical principles. While this has led to many successful designs, it does have strong limitations associated with inaccuracies in physical modeling, such that a reliable general solution to CPD has yet to be found. Here, we propose a design framework—one based on identifying and applying patterns of sequence–structure compatibility found in known proteins, rather than approximating them from models of interatomic interactions. We carry out extensive computational analyses and an experimental validation for our method. Our results strongly argue that the Protein Data Bank is now sufficiently large to enable proteins to be designed by using only examples of structural motifs from unrelated proteins. Because our method is likely to have orthogonal strengths relative to existing techniques, it could represent an important step toward removing remaining barriers to robust CPD.

2018 ◽  
Author(s):  
Jianfu Zhou ◽  
Alexandra E. Panaitiu ◽  
Gevorg Grigoryan

Abstract The ability to routinely design functional proteins, in a targeted manner, would have enormous implications for biomedical research and therapeutic development. Computational protein design (CPD) offers the potential to fulfill this need, and though recent years have brought considerable progress in the field, major limitations remain. Current state-of-the-art approaches to CPD aim to capture the determinants of structure from physical principles. While this has led to many successful designs, it does have strong limitations associated with inaccuracies in physical modeling, such that a robust general solution to CPD has yet to be found. Here we propose a fundamentally novel design framework, one based on identifying and applying patterns of sequence-structure compatibility found in known proteins, rather than approximating them from models of inter-atomic interactions. Specifically, we systematically decompose the target structure to be designed into structural building blocks we call TERMs (tertiary motifs) and use rapid structure search against the Protein Data Bank (PDB) to identify sequence patterns associated with each TERM from known protein structures that contain it. These results are then combined to produce a sequence-level pseudo-energy model that can score any sequence for compatibility with the target structure. This model can then be used to extract the optimal-scoring sequence via combinatorial optimization or otherwise sample the sequence space predicted to be well compatible with folding to the target.
Here we carry out extensive computational analyses, showing that our method, which we dub dTERMen (design with TERM energies): 1) produces native-like sequences given native crystallographic or NMR backbones, 2) produces sequence-structure compatibility scores that correlate with thermodynamic stability, 3) is able to predict experimental success of designed sequences generated with other methods, and 4) designs sequences that are found to fold to the desired target by structure prediction more frequently than sequences designed with an atomistic method. As an experimental validation of dTERMen, we perform a total surface redesign of Red Fluorescent Protein mCherry, marking a total of 64 residues as variable. The single sequence identified as optimal by dTERMen harbors 48 mutations relative to mCherry, but nevertheless folds, is monomeric in solution, exhibits similar stability to chemical denaturation as mCherry, and even preserves the fluorescence property. Our results strongly argue that the PDB is now sufficiently large to enable proteins to be designed by using only examples of structural motifs from unrelated proteins. This is highly significant, given that the structural database will only continue to grow, and signals the possibility of a whole host of novel data-driven CPD methods. Because such methods are likely to have orthogonal strengths relative to existing techniques, they could represent an important step towards removing remaining barriers to robust CPD.
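
The idea of turning sequence statistics from structural matches into a sequence-level pseudo-energy can be illustrated with a toy sketch. This is not the dTERMen implementation: it assumes only per-position (self) terms of the form E(a) = -ln(p_obs(a) / p_bg(a)), a uniform background, and a simple pseudocount, and it omits the pair terms and the structure search entirely. The names `self_energies`, `score`, and `best_sequence`, and the example match columns, are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def self_energies(match_columns, background, pseudocount=1.0):
    """Convert amino acids observed at each design position (one string of
    observations per position) into pseudo-energies -ln(p_obs / p_bg)."""
    energies = []
    for column in match_columns:
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        energies.append({
            aa: -math.log(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in AMINO_ACIDS
        })
    return energies

def score(sequence, energies):
    """Sum the per-position pseudo-energies of a candidate sequence."""
    return sum(table[aa] for aa, table in zip(sequence, energies))

def best_sequence(energies):
    """With self terms only, the optimum is the per-position minimum.
    Pair terms (not modeled here) would require combinatorial optimization."""
    return "".join(min(table, key=table.get) for table in energies)

# Hypothetical observations: amino acids seen at three positions across
# five structural matches (illustrative data, not taken from the PDB).
uniform = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
columns = ["LLLIV", "EEKQE", "GGGGA"]
energies = self_energies(columns, uniform)
designed = best_sequence(energies)
```

Lower scores mean better agreement with the observed statistics, so `score(designed, energies)` is no higher than the score of any other sequence of the same length under this toy model.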


2021 ◽  
Vol 7 ◽  
Author(s):  
Arun S. Konagurthu ◽  
Ramanan Subramanian ◽  
Lloyd Allison ◽  
David Abramson ◽  
Peter J. Stuckey ◽  
...  

What is the architectural “basis set” of the observed universe of protein structures? Using information-theoretic inference, we answer this question with a dictionary of 1,493 substructures—called concepts—typically at a subdomain level, based on an unbiased subset of known protein structures. Each concept represents a topologically conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the Protein Data Bank and completely inventoried all the concept instances. This yields many insights, including correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino-acid sequence–structure correlations, useful for ab initio structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. An interactive site, Proçodic, at http://lcb.infotech.monash.edu.au/prosodic (click), provides access to and navigation of the entire dictionary of concepts and their usages, and all associated information. This report is part of a continuing programme with the goal of elucidating fundamental principles of protein architecture, in the spirit of the work of Cyrus Chothia.


2001 ◽  
Vol 68 ◽  
pp. 111-123 ◽  
Author(s):  
John Walshaw ◽  
Jennifer M. Shipway ◽  
Derek N. Woolfson

The coiled coil is a ubiquitous motif that guides many different protein-protein interactions. The accepted hallmark of coiled coils is a seven-residue (heptad) sequence repeat. The positions of this repeat are labelled a-b-c-d-e-f-g, with residues at a and d tending to be hydrophobic. Such sequences form amphipathic α-helices, which assemble into helical bundles via knobs-into-holes interdigitation of residues from neighbouring helices. We wrote an algorithm, SOCKET, to identify this packing in protein structures, and used this to gather a database of coiled-coil structures from the Protein Data Bank. Surprisingly, in addition to commonly accepted structures with a single, contiguous heptad repeat, we identified sequences with multiple, offset heptad repeats. These 'new' sequence patterns help to explain oligomer-state specification in coiled coils. Here we focus on the structural consequences for sequences with two heptad repeats offset by two residues, i.e. a/f′-b/g′-c/a′-d/b′-e/c′-f/d′-g/e′. This sets up two hydrophobic seams on opposite sides of the helix formed. We describe how such helices may combine to bury these hydrophobic surfaces in two different ways and form two distinct structures: open 'α-sheets' and closed 'α-cylinders'. We highlight these with descriptions of natural structures and outline possibilities for protein design.
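
The doubled register described above can be written out programmatically. The following is a minimal sketch, not part of SOCKET: it labels each residue with its position in a primary heptad and in a second heptad offset by two residues, then lists the positions that are a or d in either repeat. The function names are hypothetical.

```python
HEPTAD = "abcdefg"

def dual_registers(length, offset=2):
    """Return (primary, secondary) heptad labels for each residue, where the
    secondary repeat starts `offset` residues after the primary one."""
    return [(HEPTAD[i % 7], HEPTAD[(i - offset) % 7]) for i in range(length)]

def format_labels(pairs):
    """Render the labels in the a/f'-b/g'-... notation used in the text."""
    return "-".join(f"{first}/{second}'" for first, second in pairs)

def core_positions(pairs):
    """Indices that are a or d in either repeat, i.e. candidate
    hydrophobic positions for the two seams."""
    return [i for i, (first, second) in enumerate(pairs)
            if first in "ad" or second in "ad"]

pairs = dual_registers(7)
labels = format_labels(pairs)
cores = core_positions(pairs)
```

For one heptad, `labels` reproduces the pattern quoted in the abstract, and `cores` picks out four of the seven positions: a and d of the first repeat, plus c and f, which are a′ and d′ of the second.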


2019 ◽  
Author(s):  
Sari Sabban ◽  
Mikhail Markovsky

Abstract The ability to perform de novo protein design will allow researchers to expand the variety of available proteins. By designing synthetic structures computationally, they can utilise more structures than those available in the Protein Data Bank, design structures that are not found in nature, or direct the design of proteins to acquire a specific desired structure. While some researchers attempt to design proteins from first physical and thermodynamic principles, we decided to test whether it is possible to perform de novo helical protein design of just the backbone statistically, using machine learning, by building a model that uses a long short-term memory (LSTM) architecture. The LSTM model used only the ϕ and ψ angles of each residue from an augmented dataset of only helical protein structures. Though the network’s generated backbone structures were not perfect, they were idealised and evaluated after generation, with the non-ideal structures filtered out and the adequate structures kept. The results were successful in developing a logical, rigid, compact, helical protein backbone topology. This paper is a proof of concept that shows it is possible to generate a novel helical backbone topology using an LSTM neural network architecture with only the ϕ and ψ angles as features. The next step is to attempt to use these backbone topologies and sequence design them to form complete protein structures.
Author summary: This research project stemmed from the desire to expand the pool of protein structures that can be used as scaffolds in computational vaccine development, since the number of structures available from the Protein Data Bank was not sufficient to allow for great diversity and increase the probability of grafting a target motif onto a protein scaffold. Since a protein structure’s backbone can be defined by the ϕ and ψ angles of each amino acid in the polypeptide and can effectively translate a protein’s 3D structure into a table of numbers, and since protein structures are not random, this numerical representation of protein structures can be used to train a neural network to mathematically generalise what a protein structure is, and therefore generate a new protein backbone. Instead of using all proteins in the Protein Data Bank, a curated dataset was used, encompassing protein structures with specific characteristics that will, theoretically, allow them to be evaluated computationally. This paper details how a trained neural network was able to successfully generate helical protein backbones.
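
How a backbone becomes a table of numbers can be sketched with a plain torsion-angle calculation. The code below is an illustrative assumption, not the authors' pipeline: it computes a dihedral angle from four points using the standard atan2 formulation, and applies it to the usual backbone definitions (ϕ from C of the previous residue, N, Cα, C; ψ from N, Cα, C, and N of the next residue). The function names and test coordinates are hypothetical.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (degrees, in [-180, 180]) about the p1-p2 bond."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi_psi(prev_c, n, ca, c, next_n):
    """One row of the angle table for a residue: (phi, psi)."""
    return (dihedral_degrees(prev_c, n, ca, c),
            dihedral_degrees(n, ca, c, next_n))

# Simple geometric checks with made-up coordinates (not real atoms).
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
row = phi_psi((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0), (-1, 1, 1))
```

Applying `phi_psi` to every residue of a chain yields one (ϕ, ψ) row per residue, which is the kind of numerical table the summary describes as network input.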


2021 ◽  
Author(s):  
Michael Jendrusch ◽  
Jan O. Korbel ◽  
S. Kashif Sadiq

De novo protein design is a longstanding fundamental goal of synthetic biology, but has been hindered by the difficulty of reliably predicting accurate high-resolution protein structures from sequence. Recent advances in the accuracy of protein structure prediction methods, such as AlphaFold (AF), have facilitated proteome-scale structural predictions of monomeric proteins. Here we develop AlphaDesign, a computational framework for de novo protein design that embeds AF as an oracle within an optimisable design process. Our framework enables rapid prediction of completely novel protein monomers starting from random sequences. These are shown to adopt a diverse array of folds within the known protein space. The recent and unexpected utility of AF in predicting the structure of protein complexes further allows our framework to design higher-order complexes. Subsequently, a range of predictions is made for monomers, homodimers, and heterodimers, as well as higher-order homo-oligomers (trimers to hexamers). Our analyses also show potential for designing proteins that bind to a pre-specified target protein. Structural integrity of predicted structures is validated and confirmed by standard ab initio folding and structural analysis methods, as well as more extensively by performing rigorous all-atom molecular dynamics simulations and analysing the corresponding structural flexibility and the intramonomer and interfacial amino-acid contacts. These analyses demonstrate widespread maintenance of structural integrity and suggest that our framework allows for fairly accurate protein design. Strikingly, our approach also reveals the capacity of AF to predict proteins that switch conformation upon complex formation, such as switches from α-helices to β-sheets during amyloid filament formation. Correspondingly, when integrated into our design framework, our approach reveals de novo designs of a subset of proteins that switch conformation between monomeric and oligomeric states.


2018 ◽  
Author(s):  
Arthur M. Lesk ◽  
Ramanan Subramanian ◽  
Lloyd Allison ◽  
David Abramson ◽  
Peter J. Stuckey ◽  
...  

Abstract What is the architectural ‘basis set’ of the observed universe of protein structures? Using information-theoretic inference, we answer this question with a comprehensive dictionary of 1,493 substructural concepts. Each concept represents a topologically-conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the world-wide protein data bank and completely inventoried all concept instances. This yields an unprecedented source of biological insights. These include: correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino-acid sequence–structure correlations, useful for ab initio structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. An interactive site, Proçodic, at http://lcb.infotech.monash.edu.au/prosodic (click) provides access to and navigation of the entire dictionary of concepts, and all associated information.


2017 ◽  
Vol 372 (1726) ◽  
pp. 20160213 ◽  
Author(s):  
Ai Niitsu ◽  
Jack W. Heal ◽  
Kerstin Fauland ◽  
Andrew R. Thomson ◽  
Derek N. Woolfson

The rational (de novo) design of membrane-spanning proteins lags behind that for water-soluble globular proteins. This is due to gaps in our knowledge of membrane-protein structure, and experimental difficulties in studying such proteins compared to water-soluble counterparts. One limiting factor is the small number of experimentally determined three-dimensional structures for transmembrane proteins. By contrast, many tens of thousands of globular protein structures provide a rich source of ‘scaffolds’ for protein design, and the means to garner sequence-to-structure relationships to guide the design process. The α-helical coiled coil is a protein-structure element found in both globular and membrane proteins, where it cements a variety of helix–helix interactions and helical bundles. Our deep understanding of coiled coils has enabled a large number of successful de novo designs. For one class, the α-helical barrels (that is, symmetric bundles of five or more helices with central accessible channels), there are both water-soluble and membrane-spanning examples. Recent computational designs of water-soluble α-helical barrels with five to seven helices have advanced the design field considerably. Here we identify and classify analogous and more complicated membrane-spanning α-helical barrels from the Protein Data Bank. These provide tantalizing but tractable targets for protein engineering and de novo protein design. This article is part of the themed issue ‘Membrane pores: from structure and assembly, to medicine and technology’.


1998 ◽  
Vol 54 (6) ◽  
pp. 1085-1094 ◽  
Author(s):  
Helge Weissig ◽  
Ilya N. Shindyalov ◽  
Philip E. Bourne

Databases containing macromolecular structure data provide a crystallographer with important tools for use in solving, refining and understanding the functional significance of their protein structures. Given this importance, this paper briefly summarizes past progress by outlining the features of the significant number of relevant databases developed to date. One recent database, PDB+, containing all current and obsolete structures deposited with the Protein Data Bank (PDB) is discussed in more detail. PDB+ has been used to analyze the self-consistency of the current (1 January 1998) corpus of over 7000 structures. A summary of those findings is presented (a full discussion will appear elsewhere) in the form of global and temporal trends within the data. These trends indicate that challenges exist if crystallographers are to provide the community with complete and consistent structural results in the future. It is argued that better information management practices are required to meet these challenges.


2021 ◽  
Vol 5 (CHI PLAY) ◽  
pp. 1-24
Author(s):  
Andrey Krekhov ◽  
Katharina Emmerich ◽  
Ronja Rotthaler ◽  
Jens Krueger

Escape rooms exist in various forms, including real-life facilities, board games, and digital implementations. The underlying idea is always the same: players have to solve many diverse puzzles to (virtually) escape from a locked room. Within the last decade, we witnessed a rapidly increasing popularity of such games, which also amplified the amount of related research. However, the respective academic landscape is mostly fragmented in its current state, lacking a common model and vocabulary that would withstand these games' variety. This manuscript aims to establish such a foundation for the analysis and construction of escape rooms. In a first step, we derive a high-level design framework from prior literature. Then, as our main contribution, we establish an atomic puzzle taxonomy that closes the gap between the analog and digital domains. The taxonomy is developed in multiple steps: we compose a basic structure based on previous literature and systematically refine it by analyzing 39 analog and digital escape room games, including recent virtual reality representatives. The final taxonomy consists of mental, physical, and emotional challenges, thereby providing a robust and approachable basis for future works across all application domains that deal with escape rooms or puzzles in general.


2019 ◽  
Vol 400 (3) ◽  
pp. 351-366 ◽  
Author(s):  
Mikhail Barkovskiy ◽  
Elena Ilyukhina ◽  
Martin Dauner ◽  
Andreas Eichinger ◽  
Arne Skerra

Abstract Colchicine is a toxic alkaloid prevalent in autumn crocus (Colchicum autumnale) that binds to tubulin and inhibits polymerization of microtubules. Using combinatorial and rational protein design, we have developed an artificial binding protein based on the human lipocalin 2 that binds colchicine with a dissociation constant of 120 pM, i.e. 10,000-fold stronger than tubulin. Crystallographic analysis of the engineered lipocalin, dubbed Colchicalin, revealed major structural changes in the flexible loop region that forms the ligand pocket at the open end of the eight-stranded β-barrel, resulting in a lid-like structure over the deeply buried colchicine. A cis-peptide bond between residues Phe71 and Pro72 in loop #2 constitutes a peculiar feature and allows intimate contact with the tricyclic ligand. Using directed evolution, we achieved an extraordinary dissociation half-life of more than 9 h for the Colchicalin-colchicine complex. Together with the chemical robustness of colchicine and availability of activated derivatives, this also opens applications as a general-purpose affinity reagent, including facile quantification of colchicine in biological samples. Given that engineered lipocalins, also known as Anticalin® proteins, represent a class of clinically validated biopharmaceuticals, Colchicalin may offer a therapeutic antidote to scavenge colchicine and reverse its poisoning effect in situations of acute intoxication.

