Large-scale determination of previously unsolved protein structures using evolutionary information

eLife ◽  
2015 ◽  
Vol 4 ◽  
Author(s):  
Sergey Ovchinnikov ◽  
Lisa Kinch ◽  
Hahnbeom Park ◽  
Yuxing Liao ◽  
Jimin Pei ◽  
...  

The prediction of the structures of proteins without detectable sequence similarity to any protein of known structure remains an outstanding scientific challenge. Here we report significant progress in this area. We first describe de novo blind structure predictions of unprecedented accuracy that we made for two proteins in large families in the recent CASP11 blind test of protein structure prediction methods by incorporating residue–residue co-evolution information in the Rosetta structure prediction program. We then describe the use of this method to generate structure models for 58 of the 121 large protein families in prokaryotes for which three-dimensional structures are not available. These models, which are posted online for public access, provide structural information for the over 400,000 proteins belonging to the 58 families and suggest hypotheses about mechanism for the subset for which the function is known, and hypotheses about function for the remainder.
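As a rough illustration of what "residue–residue co-evolution information" means, the sketch below scores how strongly two columns of a multiple sequence alignment vary together using mutual information. This is a simpler, classical stand-in and is not claimed to be the method used in the work above; the function name `column_mutual_information` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    msa is a list of equal-length aligned sequences. High values mean the
    residues at the two positions tend to change together, which is the
    kind of signal co-evolution methods look for.
    """
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)

    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 1 co-vary, column 2 is constant.
toy_msa = ["AKL", "AKL", "GEL", "GEL"]
covarying = column_mutual_information(toy_msa, 0, 1)
conserved = column_mutual_information(toy_msa, 0, 2)
```

Plain mutual information cannot separate direct couplings from indirect ones, which is one reason practical contact-prediction pipelines rely on more elaborate statistical models.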

2019 ◽  
Vol 20 (10) ◽  
pp. 2442 ◽  
Author(s):  
Teppei Ikeya ◽  
Peter Güntert ◽  
Yutaka Ito

To date, in-cell NMR has elucidated various aspects of protein behaviour by associating structures in physiological conditions. However, most current studies using this method have deduced protein states in cells exclusively from ‘indirect’ structural information, such as peak patterns and chemical shift changes, rather than from ‘direct’ data that explicitly include interatomic distances and angles. To fully understand the functions and physical properties of proteins inside cells, it is indispensable to obtain explicit structural data or determine three-dimensional (3D) structures of proteins in cells. Whilst the short lifetime of cells in a sample tube, low sample concentrations, and massive background signals make it difficult to observe NMR signals from proteins inside cells, several methodological advances help to overcome these problems. Paramagnetic effects have an outstanding potential for in-cell structural analysis. The combination of a limited amount of experimental in-cell data with software for ab initio protein structure prediction opens an avenue to visualise 3D protein structures inside cells. Conventional nuclear Overhauser effect spectroscopy (NOESY)-based structure determination is advantageous for elucidating the conformations of side-chain atoms of proteins as well as global structures. In this article, we review current progress in the structure analysis of proteins in living systems and discuss the feasibility of future work.


Author(s):  
Pouya Tavousi ◽  
Morad Behandish ◽  
Horea T. Ilieş ◽  
Kazem Kazerounian

A reliable prediction of three-dimensional (3D) protein structures from sequence data remains a big challenge due to both theoretical and computational difficulties. We have previously shown that our kinetostatic compliance method (KCM) implemented into the Protofold package can overcome some of the key difficulties faced by other de novo structure prediction methods, such as the very small time steps required by the molecular dynamics (MD) approaches or the very large number of samples needed by the Monte Carlo (MC) sampling techniques. In this paper, we improve the free energy formulation used in Protofold by including the typically underrated entropic effects, imparted due to differences in hydrophobicity of the chemical groups, which dominate the folding of most water-soluble proteins. In addition to the model enhancement, we revisit the numerical implementation by redesigning the algorithms and introducing efficient data structures that reduce the expected complexity from quadratic to linear. Moreover, we develop and optimize parallel implementations of the algorithms on both central and graphics processing units (CPU/GPU) achieving speed-ups up to two orders of magnitude on the GPU. Our simulations are consistent with the general behavior observed in the folding process in aqueous solvent, confirming the effectiveness of model improvements. We report on the folding process at multiple levels, namely, the formation of secondary structural elements and tertiary interactions between secondary elements or across larger domains. We also observe significant enhancements in running times that make the folding simulation tractable for large molecules.


2014 ◽  
Vol 70 (a1) ◽  
pp. C491-C491
Author(s):  
Jürgen Haas ◽  
Alessandro Barbato ◽  
Tobias Schmidt ◽  
Steven Roth ◽  
Andrew Waterhouse ◽  
...  

Computational modeling and prediction of three-dimensional macromolecular structures and complexes from their sequence has been a long-standing goal in structural biology. Over the last two decades, a paradigm shift has occurred: starting from a large "knowledge gap" between the huge number of protein sequences and the small number of experimentally known structures, today some form of structural information – either experimental or computational – is available for the majority of amino acids encoded by common model organism genomes. Methods for structure modeling and prediction have made substantial progress over the last decades, and template-based homology modeling techniques have matured to a point where they are now routinely used to complement experimental techniques. However, computational modeling and prediction techniques often fall short in accuracy compared to high-resolution experimental structures, and it is often difficult to convey the expected accuracy and structural variability of a specific model. Retrospectively assessing the quality of blind structure prediction in comparison to experimental reference structures allows benchmarking the state-of-the-art in structure prediction and identifying areas which need further development. The Critical Assessment of Structure Prediction (CASP) experiment has for the last 20 years assessed the progress in the field of protein structure modeling based on predictions for ca. 100 blind prediction targets per experiment which are carefully evaluated by human experts. The "Continuous Model EvaluatiOn" (CAMEO) project aims to provide a fully automated blind assessment for prediction servers based on weekly pre-released sequences of the Protein Data Bank (PDB). CAMEO has been made possible by the development of novel scoring methods such as lDDT, which are robust against domain movements and thus allow automated continuous structure comparison without human intervention.
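To illustrate why a superposition-free score such as lDDT tolerates rigid movements, here is a minimal sketch of a simplified variant that uses one atom per residue (for example Cα). The function name `lddt_ca` is hypothetical, and the 15 Å inclusion radius and the 0.5/1/2/4 Å tolerances are commonly quoted defaults treated here as assumptions; the published score works on all atoms and applies further checks that are omitted.

```python
import math


def lddt_ca(reference, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified lDDT-style score on one atom per residue.

    reference and model are equal-length lists of (x, y, z) coordinates.
    For every residue pair closer than `radius` in the reference, the
    function checks whether the same distance in the model is preserved
    within each tolerance, and returns the preserved fraction.
    """
    if len(reference) != len(model):
        raise ValueError("reference and model must have the same length")

    preserved = 0
    total = 0
    n = len(reference)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= radius:
                continue  # only local distances are considered
            d_model = math.dist(model[i], model[j])
            deviation = abs(d_ref - d_model)
            total += len(thresholds)
            preserved += sum(1 for t in thresholds if deviation < t)
    return preserved / total if total else 0.0


# Hypothetical coordinates: three residues on a line, 3.8 Å apart.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 10.0, y + 5.0, z - 2.0) for x, y, z in ref]
distorted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.6, 0.0, 0.0)]
score_shifted = lddt_ca(ref, shifted)
score_distorted = lddt_ca(ref, distorted)
```

Because only internal distances are compared, translating or rotating the whole model leaves the score unchanged, and no structural superposition is needed.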


2020 ◽  
Author(s):  
Jiangyan Feng ◽  
Diwakar Shukla

Proteins are dynamic molecules which perform diverse molecular functions by adopting different three-dimensional structures. Recent progress in residue-residue contact prediction opens up new avenues for de novo protein structure prediction from sequence information. However, it is still difficult to predict more than one conformation from residue-residue contacts alone. This is due to the inability to deconvolve the complex signals of residue-residue contacts, i.e. spatial contacts relevant for protein folding, conformational diversity, and ligand binding. Here, we introduce a machine learning based method, called FingerprintContacts, for extending the capabilities of residue-residue contacts. This algorithm leverages two features of residue-residue contacts: (1) a single conformation outperforms the others in structure prediction when all the top-ranking residue-residue contacts are used as structural constraints, and (2) conformation-specific contacts rank lower and constitute a small fraction of residue-residue contacts. We demonstrate the capabilities of FingerprintContacts on eight ligand binding proteins with varying conformational motions. Furthermore, FingerprintContacts identifies small clusters of residue-residue contacts which are preferentially located in the dynamically fluctuating regions. With the rapid growth in protein sequence information, we expect FingerprintContacts to be a powerful first step in structural understanding of protein functional mechanisms.


2020 ◽  
Author(s):  
Mingyuan Xu ◽  
Ting Ran ◽  
Hongming Chen

De novo molecule design through molecular generative models has gained increasing attention in recent years. Here, a novel generative model is proposed that integrates the 3D structural information of the protein binding pocket into a conditional RNN (cRNN) model to control the generation of drug-like molecules. In this model, the composition of the protein binding pocket is characterized through a coarse-graining strategy, and the three-dimensional information of the pocket is represented by the sorted eigenvalues of the Coulomb matrix (EGCM) of the coarse-grained atoms composing the binding pocket. In the current work, we used our EGCM method and a previously reported binding pocket descriptor, DeeplyTough, to train cRNN models and compared their performance. The results show that molecules generated under the control of protein environment information tend to be more similar to the original X-ray bound ligand than those from a normal RNN model and also achieve better docking scores. Our results demonstrate the potential of the EGCM-controlled generative model for targeted molecule generation and guided exploration of drug-like chemical space.
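The sorted-eigenvalue representation can be sketched as follows, assuming the commonly used Coulomb matrix definition (diagonal 0.5·Z^2.4, off-diagonal Z_i·Z_j/|R_i − R_j|). How the coarse-grained sites and their charges are chosen is not given in the abstract, so the inputs, the function names and the descending sort order are hypothetical. Eigenvalues are computed with a plain Jacobi rotation scheme to keep the example free of third-party dependencies.

```python
import math


def coulomb_matrix(charges, coords):
    """Build a Coulomb matrix from site charges and (x, y, z) positions."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * charges[i] ** 2.4
            else:
                m[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return m


def symmetric_eigenvalues(matrix, max_sweeps=50, tol=1e-20):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def sorted_coulomb_eigenvalues(charges, coords):
    """Descriptor: Coulomb matrix eigenvalues in descending order."""
    return sorted(symmetric_eigenvalues(coulomb_matrix(charges, coords)), reverse=True)


# Hypothetical two-site example: unit charges one length unit apart.
descriptor = sorted_coulomb_eigenvalues([1.0, 1.0], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
```

Sorting the eigenvalues makes the descriptor independent of the order in which the sites are listed, which is what makes it usable as a fixed conditioning vector.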


Author(s):  
Arun G. Ingale

Predicting the structure of a protein from its primary amino acid sequence is computationally difficult. An investigation of the methods and algorithms used to predict protein structure and a thorough knowledge of the function and structure of proteins are critical for the advancement of biology and the life sciences as well as the development of better drugs, higher-yield crops, and even synthetic bio-fuels. To that end, this chapter sheds light on the methods used for protein structure prediction. This chapter covers the applications of modeled protein structures and unravels the relationship between pure sequence information and three-dimensional structure, which continues to be one of the greatest challenges in molecular biology. With this resource, it presents an all-encompassing examination of the problems, methods, tools, servers, databases, and applications of protein structure prediction, giving unique insight into the future applications of the modeled protein structures. In this chapter, current protein structure prediction methods are reviewed for a milieu on structure prediction, the prediction of structural fundamentals, tertiary structure prediction, and functional imminent. The basic ideas and advances of these directions are discussed in detail.


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Shambhu Malleshappa Gowder ◽  
Jhinuk Chatterjee ◽  
Tanusree Chaudhuri ◽  
Kusum Paul

The analysis of protein structures provides plenty of information about the factors governing the folding and stability of proteins, the preferred amino acids in the protein environment, the location of the residues in the interior/surface of a protein and so forth. In general, hydrophobic residues such as Val, Leu, Ile, Phe, and Met tend to be buried in the interior, while polar side chains are exposed to solvent. The present work depends on sequence as well as structural information of the protein and aims to understand the nature of hydrophobic residues on protein surfaces. It is based on a nonredundant data set of 218 monomeric proteins. Solvent accessibility of each protein was determined using the NACCESS software, and homologous sequences were then obtained to understand how well solvent-exposed and buried hydrophobic residues are evolutionarily conserved. Confidence scores for hydrophobic residues being buried or solvent-exposed were assigned based on the conservation score and knowledge of the flanking regions of hydrophobic residues. In the absence of a three-dimensional structure, the ability to predict surface accessibility of hydrophobic residues directly from the sequence is of great help in choosing the sites of chemical modification or specific mutations and in the studies of protein stability and molecular interactions.


Pteridines ◽  
2007 ◽  
Vol 18 (1) ◽  
pp. 79-94
Author(s):  
Marco Wiltgen ◽  
Gernot P. Tilz

Functional specificity of a protein is linked to its structure. A growing section of bioinformatics deals with the prediction and visualization of protein 3D structures. In homology modelling, a protein sequence with an unknown structure is aligned with sequences of known protein structures. By exploiting structural information from the known configurations, the new structure can be predicted. In this introductory paper, we will present the principles of homology modelling and demonstrate the method used, by determining the structure of the enzyme glutamic decarboxylase (GAD 65). This protein is an autoantigen involved in several human autoimmune diseases. We will illustrate the different steps in structure prediction of GAD 65 by use of two experimentally determined structures of pig kidney DOPA decarboxylase (one structure in complex with the inhibitor carbidopa) as templates. The resulting model of GAD 65 provides detailed information about the active site of the protein and selected epitopes. By analysis of the interactions between DOPA decarboxylase and the inhibitor carbidopa, the residues of the GAD 65 active site can be identified via the sequence alignment between DOPA and GAD 65. The locations of known epitopes in the molecule are visualized in special representations giving insights into mechanisms of antigenicity. Hydrophobicity analysis gives first hints for the adherence ability of GAD 65 to the cell membrane. Homology modelling is at present one of the most efficient techniques to provide accurate structural models of proteins. It is expected that in a few years, for every newly determined protein sequence, at least one member with a known structure of the same protein family will be available, which will steadily increase the importance and applicability of homology modelling.


2004 ◽  
Vol 02 (03) ◽  
pp. 471-495 ◽  
Author(s):  
LUIGI PALOPOLI ◽  
GIORGIO TERRACINA

Predicting the three-dimensional structure of proteins is a difficult task. In the last few years several approaches have been proposed for performing this task taking into account different protein chemical and physical properties. As a result, a growing number of protein structure prediction tools is becoming available, some of them specialized to work on either some aspects of the predictions or on some categories of proteins; however, they are still not sufficiently accurate and reliable for predicting all kinds of proteins. In this context, it is useful to jointly apply different prediction tools and combine their results in order to improve the quality of the predictions. However, several problems have to be solved in order to make this a viable possibility. In this paper a framework and a tool are proposed which allow: (i) definition of a common reference applicative domain for different prediction tools; (ii) characterization of prediction tools through evaluating some quality parameters; (iii) characterization of the performance of a team of predictors jointly applied to a prediction problem; (iv) the singling out of the best team for a prediction problem; and (v) the integration of predictor results in the team in order to obtain a unique prediction. A system implementing the various steps of the proposed framework (CooPPS) has been developed and several experiments for testing the effectiveness of the proposed approach have been carried out.


2021 ◽  
Vol 12 (3) ◽  
pp. 3259-3304

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which was transmitted from animals to humans, became a life-threatening pandemic in 2020. Scientists are currently testing several drugs to eradicate the COVID-19 outbreak. However, no 100% effective drug or vaccine against SARS-CoV-2 has been discovered so far. In this study, we explored the structure prediction and functional analysis of the structural and accessory proteins of 75 Malaysian SARS-CoV-2 strains in the absence of experimental models. Physicochemical analysis, secondary structure analysis, structure prediction, functional characterization, active site identification, and evolutionary analysis were performed based on the amino acid sequences retrieved from the National Center for Biotechnology Information (NCBI). Three-dimensional (3-D) protein structures were built using SWISS-MODEL. The quality of the protein models was verified with the ERRAT, PROCHECK, and Verify 3D tools. Active site prediction revealed high-potential active sites of the proteins where an anti-viral drug or vaccine may bind and inhibit viral activities. Molecular phylogenetic analysis of the ORF10, ORF8, and ORF6 proteins from five different species was performed. The results of this analysis showed that Homo sapiens SARS-CoV-2 had high genetic similarity with the bat coronavirus. These analyses may help in designing structure-based anti-viral drugs or in developing potential vaccines for SARS-CoV-2.

