Computational design of structured loops for new protein functions

Kale Kundert; Tanja Kortemme

doi:10.1515/hsz-2018-0348

Biological Chemistry ◽

10.1515/hsz-2018-0348 ◽

2019 ◽

Vol 400 (3) ◽

pp. 275-288 ◽

Cited By ~ 10

Author(s):

Kale Kundert ◽

Tanja Kortemme

Keyword(s):

Protein Design ◽

Protein Function ◽

Structure Prediction ◽

Computational Design ◽

Loop Structure ◽

Functional Sites ◽

Loop Design ◽

Routine Design ◽

Protein Functions ◽

New Protein

Abstract The ability to engineer the precise geometries, fine-tuned energetics and subtle dynamics that are characteristic of functional proteins is a major unsolved challenge in the field of computational protein design. In natural proteins, functional sites exhibiting these properties often feature structured loops. However, unlike the elements of secondary structures that comprise idealized protein folds, structured loops have been difficult to design computationally. Addressing this shortcoming in a general way is a necessary first step towards the routine design of protein function. In this perspective, we will describe the progress that has been made on this problem and discuss how recent advances in the field of loop structure prediction can be harnessed and applied to the inverse problem of computational loop design.

RosettaSurf - a surface-centric computational design approach

10.1101/2021.06.16.448645 ◽

2021 ◽

Author(s):

Andreas Scheck ◽

Stephane Rosset ◽

Michael Defferrard ◽

Andreas Loukas ◽

Jaume Bonet ◽

...

Keyword(s):

Protein Design ◽

Protein Function ◽

Fundamental Problem ◽

Protein Structures ◽

Scoring Function ◽

Computational Design ◽

Surface Shape ◽

Molecular Surface ◽

Surface Features ◽

Functional Sites

Proteins are typically represented by discrete atomic coordinates, providing an accessible framework to describe different conformations. However, in some fields proteins are more accurately represented as near-continuous surfaces, as these are imprinted with geometric (shape) and chemical (electrostatics) features of the underlying protein structure. Protein surfaces depend on their chemical composition and ultimately determine protein function, acting as the interface that engages in interactions with other molecules. In the past, such representations were utilized to compare protein structures on global and local scales and have shed light on functional properties of proteins. Here we describe RosettaSurf, a surface-centric computational design protocol that focuses on the molecular surface shape and electrostatic properties as a means for protein engineering, offering a unique approach for the design of proteins and their functions. The RosettaSurf protocol combines the explicit optimization of molecular surface features with a global scoring function during the sequence design process, diverging from the typical design approaches that rely solely on an energy scoring function. With this computational approach, we attempt to address a fundamental problem in protein design related to the design of functional sites in proteins, even when structurally similar templates are absent in the characterized structural repertoire. Surface-centric design exploits the premise that molecular surfaces are, to a certain extent, independent of the underlying sequence and backbone configuration, meaning that different sequences in different proteins may present similar surfaces. We benchmarked RosettaSurf on various sequence recovery datasets and showcased its design capabilities by generating epitope mimics that were biochemically validated. Overall, our results indicate that the explicit optimization of surface features may lead to new routes for the design of functional proteins.

Protein structure prediction and design in a biologically-realistic implicit membrane

10.1101/630715 ◽

2019 ◽

Author(s):

Rebecca F. Alford ◽

Patrick J. Fleming ◽

Karen G. Fleming ◽

Jeffrey J. Gray

Keyword(s):

Protein Structure ◽

Amino Acid ◽

Membrane Proteins ◽

Membrane Protein ◽

Protein Structure Prediction ◽

Protein Design ◽

Structure Prediction ◽

De Novo ◽

Computational Design ◽

Amino Acid Distribution

Abstract Protein design is a powerful tool for elucidating mechanisms of function and engineering new therapeutics and nanotechnologies. While soluble protein design has advanced, membrane protein design remains challenging due to difficulties in modeling the lipid bilayer. In this work, we developed an implicit approach that captures the anisotropic structure, shape of water-filled pores, and nanoscale dimensions of membranes with different lipid compositions. The model improves performance in computational benchmarks against experimental targets including prediction of protein orientations in the bilayer, ΔΔG calculations, native structure discrimination, and native sequence recovery. When applied to de novo protein design, this approach designs sequences with an amino acid distribution near the native amino acid distribution in membrane proteins, overcoming a critical flaw in previous membrane models that were prone to generating leucine-rich designs. Further, the proteins designed in the new membrane model exhibit native-like features including interfacial aromatic side chains, hydrophobic lengths compatible with bilayer thickness, and polar pores. Our method advances high-resolution membrane protein structure prediction and design toward tackling key biological questions and engineering challenges.

Significance Statement: Membrane proteins participate in many life processes including transport, signaling, and catalysis. They constitute over 30% of all proteins and are targets for over 60% of pharmaceuticals. Computational design tools for membrane proteins will transform the interrogation of basic science questions such as membrane protein thermodynamics and the pipeline for engineering new therapeutics and nanotechnologies. Existing tools are either too expensive to compute or rely on manual design strategies. In this work, we developed a fast and accurate method for membrane protein design. The tool is available to the public and will accelerate the experimental design pipeline for membrane proteins.

Smotifs as structural local descriptors of supersecondary elements: classification, completeness and applications

Bio-Algorithms and Med-Systems ◽

10.1515/bams-2014-0016 ◽

2014 ◽

Vol 10 (4) ◽

Author(s):

Jaume Bonet ◽

Andras Fiser ◽

Baldo Oliva ◽

Narcis Fernandez-Fuentes

Keyword(s):

Protein Design ◽

Structure Prediction ◽

Protein Structures ◽

Regular Structure ◽

Loop Structure ◽

Apparent Lack ◽

Knowledge Based ◽

Limits Of Knowledge ◽

Folding Dynamics ◽

And Function

Abstract Protein structures are made up of periodic and aperiodic structural elements (i.e., α-helices, β-strands and loops). Despite the apparent lack of regular structure, loops have specific conformations and play a central role in the folding, dynamics, and function of proteins. In this article, we reviewed our previous works in the study of protein loops as local supersecondary structural motifs or Smotifs. We reexamined our works about the structural classification of loops (ArchDB) and its application to loop structure prediction (ArchPRED), including the assessment of the limits of knowledge-based loop structure prediction methods. We concluded this article by focusing on the modular nature of proteins, on how the concept of Smotifs provides a convenient and practical approach to decompose proteins into strings of concatenated Smotifs, and on how this can be used in computational protein design and protein structure prediction.

Accurately positioning functional residues with robotics-inspired computational protein design

10.1101/2021.07.02.450934 ◽

2021 ◽

Author(s):

Cody Krivacic ◽

Kale Kundert ◽

Xingjie Pan ◽

Roland A Pache ◽

Lin Liu ◽

...

Keyword(s):

Protein Design ◽

Active Sites ◽

Peptide Fragments ◽

Loop Conformations ◽

Protein Functions ◽

Protein Backbones ◽

Design Protocol ◽

New Protein ◽

Active Site Region ◽

Local Protein

Accurate positioning of functional residues is critical for the design of new protein functions, but has remained difficult because of the prevalence of irregular local geometries in active sites. Here we introduce two computational methods that build local protein geometries from sequence with atomic accuracy: fragment kinematic closure (FKIC) and loophash kinematic closure (LHKIC). FKIC and LHKIC integrate two approaches: robotics-inspired kinematics of protein backbones and insertion of peptide fragments, and show up to 140-fold improvements in native-like predictions over either approach alone. We then integrate these methods into a new design protocol, pull-into-place (PIP), to position functionally important sidechains via design of new structured loop conformations. We validate PIP by remodeling a sizeable active site region in an enzyme and confirming the engineered new conformations of two designs with crystal structures. The described methods can be applied broadly to the design of many new protein geometries and functions.

Post-Translation Regulation of Influenza Virus Replication

Annual Review of Virology ◽

10.1146/annurev-virology-010320-070410 ◽

2020 ◽

Vol 7 (1) ◽

pp. 167-187

Author(s):

Anthony R. Dawson ◽

Gary M. Wilson ◽

Joshua J. Coon ◽

Andrew Mehle

Keyword(s):

Influenza Virus ◽

Protein Function ◽

Translation Regulation ◽

Host Use ◽

Host Proteins ◽

Post Translational Modifications ◽

Cellular Factors ◽

Protein Functions ◽

Influenza Virus Replication ◽

New Protein

Influenza virus exploits cellular factors to complete each step of viral replication. Yet, multiple host proteins actively block replication. Consequently, infection success depends on the relative speed and efficacy at which both the virus and host use their respective effectors. Post-translational modifications (PTMs) afford both the virus and the host means to readily adapt protein function without the need for new protein production. Here we use influenza virus to address concepts common to all viruses, reviewing how PTMs facilitate and thwart each step of the replication cycle. We also discuss advancements in proteomic methods that better characterize PTMs. Although some effectors and PTMs have clear pro- or antiviral functions, PTMs generally play regulatory roles to tune protein functions, levels, and localization. Synthesis of our current understanding reveals complex regulatory schemes where the effects of PTMs are time and context dependent as the virus and host battle to control infection.

Including Functional Annotations and Extending the Collection of Structural Classifications of Protein Loops (ArchDB)

Bioinformatics and Biology Insights ◽

10.1177/117793220700100004 ◽

2007 ◽

Vol 1 ◽

pp. 117793220700100 ◽

Cited By ~ 2

Author(s):

Antoni Hermoso ◽

Jordi Espadaler ◽

E Enrique Querol ◽

Francesc X. Aviles ◽

Michael J.E. Sternberg ◽

...

Keyword(s):

Protein Function ◽

Structure Prediction ◽

Sequence Similarity ◽

Protein Structures ◽

Fold Increase ◽

Loop Structure ◽

Biological Databases ◽

Loop Modeling ◽

And Function

Loops represent an important part of protein structures. The study of loops is critical for two main reasons: First, loops are often involved in protein function, stability and folding. Second, despite improvements in experimental and computational structure prediction methods, modeling the conformation of loops remains problematic. Here, we present a structural classification of loops, ArchDB, a mine of information with application in both mentioned fields: loop structure prediction and function prediction. ArchDB ( http://sbi.imim.es/archdb ) is a database of classified protein loop motifs. The current database provides four different classification sets tailored for different purposes. ArchDB-40 is a loop classification derived from SCOP40, well suited for modeling common loop motifs. Since features relevant to loop structure or function can be more easily determined on well-populated clusters, we have developed ArchDB-95, a loop classification derived from SCOP95. This new classification set shows a ~40% increase in the number of subclasses, and a large 7-fold increase in the number of putative structure/function-related subclasses. We also present ArchDB-EC, a classification of loop motifs from enzymes, and ArchDB-KI, a manually annotated classification of loop motifs from kinases. Information about ligand contacts and PDB sites has been included in all classification sets. Improvements in our classification scheme are described, as well as several new database features, such as the ability to query by conserved annotations, sequence similarity, or uploading 3D coordinates of a protein. The lengths of classified loops range from 0 to 36 residues. ArchDB offers an exhaustive sampling of loop structures. Functional information about loops and links with related biological databases are also provided. All this information, together with the possibility to browse and query the database through a web server, makes ArchDB a useful tool for the comparative study of loops, the analysis of loops involved in protein function, and obtaining templates for loop modeling.

Machine learning for discovering missing or wrong protein function annotations

BMC Bioinformatics ◽

10.1186/s12859-019-3060-6 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 5

Author(s):

Felipe Kenji Nakano ◽

Mathias Lietaert ◽

Celine Vens

Keyword(s):

Machine Learning ◽

Protein Function ◽

Protein Function Prediction ◽

Daily Basis ◽

Proteomic Data ◽

Evaluation Task ◽

Protein Functions ◽

Benchmark Datasets ◽

Or Gene ◽

New Protein

Abstract Background A massive amount of proteomic data is generated on a daily basis; nonetheless, annotating all sequences is costly and often unfeasible. As a countermeasure, machine learning methods have been used to automatically annotate new protein functions. More specifically, many studies have investigated hierarchical multi-label classification (HMC) methods to predict annotations, using the Functional Catalogue (FunCat) or Gene Ontology (GO) label hierarchies. Most of these studies employed benchmark datasets created more than a decade ago, and thus train their models on outdated information. In this work, we provide an updated version of these datasets. By querying recent versions of FunCat and GO yeast annotations, we provide 24 new datasets in total. We compare four HMC methods, providing baseline results for the new datasets. Furthermore, we also evaluate whether the predictive models are able to discover new or wrong annotations, by training them on the old data and evaluating their results against the most recent information. Results The results demonstrated that the method based on predictive clustering trees, Clus-Ensemble, proposed in 2008, achieved superior results compared to more recent methods on the standard evaluation task. For the discovery of new knowledge, Clus-Ensemble performed better when discovering new annotations in the FunCat taxonomy, whereas hierarchical multi-label classification with genetic algorithm (HMC-GA), a method based on genetic algorithms, was overall superior when detecting annotations that were removed. In the GO datasets, Clus-Ensemble once again had the upper hand when discovering new annotations, whereas HMC-GA performed better for detecting removed annotations. However, in this evaluation, there were less significant differences among the methods. Conclusions The experiments have shown that protein function prediction is a very challenging task which should be further investigated. We believe that the baseline results associated with the updated datasets provided in this work should be considered as guidelines for future studies; nonetheless, the old versions of the datasets should not be disregarded, since other tasks in machine learning could benefit from them.

Deep learning techniques have significantly impacted protein structure prediction and protein design

Current Opinion in Structural Biology ◽

10.1016/j.sbi.2021.01.007 ◽

2021 ◽

Vol 68 ◽

pp. 194-207

Author(s):

Robin Pearce ◽

Yang Zhang

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Protein Design ◽

Structure Prediction ◽

Learning Techniques

A Survey for Predicting ATP Binding Residues of Proteins Using Machine Learning Methods

Current Medicinal Chemistry ◽

10.2174/0929867328666210910125802 ◽

2021 ◽

Vol 28 ◽

Author(s):

Yu-He Yang ◽

Jia-Shu Wang ◽

Shi-Shi Yuan ◽

Meng-Lu Liu ◽

Wei Su ◽

...

Keyword(s):

Machine Learning ◽

Protein Function ◽

Vital Role ◽

Atp Binding ◽

Learning Methods ◽

Machine Learning Methods ◽

Protein Ligand Interactions ◽

Protein Functions ◽

Ligand Interactions ◽

Binding Residues

Protein-ligand interactions are necessary for the majority of protein functions. Adenosine-5’-triphosphate (ATP) is one such ligand that plays a vital role as a coenzyme in providing energy for cellular activities, catalyzing biological reactions, and signaling. Knowing the ATP binding residues of proteins is helpful for annotation of protein function and drug design. However, due to the huge influx of protein sequences into databases in the post-genome era, experimentally identifying ATP binding residues is costly and time-consuming. To address this problem, computational methods have been developed to predict ATP binding residues. In this review, we briefly summarize the application of machine learning methods in detecting ATP binding residues of proteins. We expect this review will be helpful for further research.

Optobiochemistry: Genetically Encoded Control of Protein Activity by Light

Annual Review of Biochemistry ◽

10.1146/annurev-biochem-072420-112431 ◽

2021 ◽

Vol 90 (1) ◽

Author(s):

Jihye Seong ◽

Michael Z. Lin

Keyword(s):

Protein Function ◽

Living Cells ◽

Optical Methods ◽

Annual Review ◽

Publication Date ◽

Protein Activity ◽

Spatiotemporal Resolution ◽

Protein Functions ◽

Protein Classes ◽

Control Protein

Optobiochemical control of protein activities allows the investigation of protein functions in living cells with high spatiotemporal resolution. Over the last two decades, numerous natural photosensory domains have been characterized and synthetic domains engineered and assembled into photoregulatory systems to control protein function with light. Here, we review the field of optobiochemistry, categorizing photosensory domains by chromophore, describing photoregulatory systems by mechanism of action, and discussing protein classes frequently investigated using optical methods. We also present examples of how spatial or temporal control of proteins in living cells has provided new insights not possible with traditional biochemical or cell biological techniques. Expected final online publication date for the Annual Review of Biochemistry, Volume 90 is June 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
