Fold recognition by scoring protein map similarities using the congruence coefficient

2020
Author(s):
Pietro Di Lena
Pierre Baldi

Abstract. Motivation: Protein fold recognition is a key step for template-based modeling approaches to protein structure prediction. Although closely related folds can be easily identified by sequence homology search in sequence databases, fold recognition is notoriously more difficult when it involves the identification of distantly related homologues. Recent progress in residue-residue contact and distance prediction opens up the possibility of improving fold recognition by using structural information contained in predicted distance and contact maps. Results: Here we propose to use the congruence coefficient as a metric of similarity between maps. We prove that this metric has several interesting mathematical properties which allow one to compute in polynomial time its exact mean and variance over all possible (exponentially many) alignments between two symmetric matrices, and assess the statistical significance of similarity between aligned maps. We perform fold recognition tests by recovering predicted target contact/distance maps from the two most recent CASP editions and over 27,000 non-homologous structural templates from the ECOD database. On this large benchmark, we compare fold recognition performances of different alignment tools with their own similarity scores against those obtained using the congruence coefficient. We show that the congruence coefficient overall improves fold recognition over other methods, proving its effectiveness as a general similarity metric for protein map comparison. Availability: The software CCpro is available as part of the Scratch suite: http://scratch.proteomics.ics.uci.edu/

Author(s):  
Pietro Di Lena
Pierre Baldi

Abstract. Motivation: Protein fold recognition is a key step for template-based modeling approaches to protein structure prediction. Although closely related folds can be easily identified by sequence homology search in sequence databases, fold recognition is notoriously more difficult when it involves the identification of distantly related homologs. Recent progress in residue–residue contact and distance prediction opens up the possibility of improving fold recognition by using structural information contained in predicted distance and contact maps. Results: Here we propose to use the congruence coefficient as a metric of similarity between maps. We prove that this metric has several interesting mathematical properties which allow one to compute in polynomial time its exact mean and variance over all possible (exponentially many) alignments between two symmetric matrices, and assess the statistical significance of similarity between aligned maps. We perform fold recognition tests by recovering predicted target contact/distance maps from the two most recent Critical Assessment of Structure Prediction editions and over 27 000 non-homologous structural templates from the ECOD database. On this large benchmark, we compare fold recognition performances of different alignment tools with their own similarity scores against those obtained using the congruence coefficient. We show that the congruence coefficient overall improves fold recognition over other methods, proving its effectiveness as a general similarity metric for protein map comparison. Availability and implementation: The congruence coefficient software CCpro is available as part of the SCRATCH suite at: http://scratch.proteomics.ics.uci.edu/. Supplementary information: Supplementary data are available at Bioinformatics online.
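As an illustration of the similarity metric named in the abstract, the following is a minimal sketch of the standard (Tucker) congruence coefficient applied to two aligned maps: the sum of element-wise products divided by the product of the two Frobenius norms. It is not the CCpro implementation. The function names `congruence_coefficient` and `aligned_submap`, the toy maps and the alignment are all assumptions made for this example, and the polynomial-time mean and variance over all alignments described in the abstract are not reproduced here.

```python
import math


def congruence_coefficient(a, b):
    """Congruence coefficient between two equally sized maps.

    Computed as sum(a_ij * b_ij) / (||a||_F * ||b||_F); returns 0.0 when
    either map is all zeros (a convention chosen for this sketch).
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("maps must have the same shape")
    dot = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    norm_a = math.sqrt(sum(x * x for row in a for x in row))
    norm_b = math.sqrt(sum(y * y for row in b for y in row))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def aligned_submap(m, idx):
    """Restrict a square map to the residues selected by an alignment."""
    return [[m[i][j] for j in idx] for i in idx]


# Hypothetical toy data: a 3-residue target contact map and a 4-residue
# template contact map, both symmetric.
target = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
template = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]

# Hypothetical alignment: target residues 0, 1, 2 map to template
# residues 0, 1, 3 (template residue 2 is skipped as a gap).
alignment = [0, 1, 3]
score = congruence_coefficient(target, aligned_submap(template, alignment))
print(score)
```

Because the coefficient is a ratio of an inner product to the product of norms, it is unchanged when either map is multiplied by a positive constant, so maps do not need to be rescaled to a common range before comparison.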


2021
Author(s):
Kyungyong Seong
Ksenia V Krasileva

Abstract. Magnaporthe oryzae relies on a diverse collection of secreted effector proteins to reprogram the host metabolic and immune responses for the pathogen’s benefit. Characterization of the effectors is thus critical for understanding the biology and host infection mechanisms of this phytopathogen. In rapid, divergent effector evolution, structural information has the potential to illuminate the unknown aspects of effectors that sequence analyses alone cannot reveal. It has recently become feasible to predict protein structures reliably without depending on homologous templates. In this study, we tested structure modeling on 1854 secreted proteins from M. oryzae and evaluated the successes and obstacles involved in effector structure prediction. With sensitive homology search and structure-based clustering, we defined both distantly related homologous groups and structurally related analogous groups. With this dataset, we propose that sequence-unrelated, structurally similar effectors are a common theme in M. oryzae and possibly in other phytopathogens. We incorporated the predicted models for structure-based annotations, molecular docking and evolutionary analyses to demonstrate how the predicted structures can deepen our understanding of effector biology. We also provide new experimentally testable structure-derived hypotheses of effector functions. Collectively, we propose that computational structural genomic approaches can now be an integral part of studying effector biology and provide valuable resources that were inaccessible before the advent of reliable, machine learning-based structure prediction.


2014
Vol 11 (95)
pp. 20131147
Author(s):
Agnel Praveen Joseph
Alexandre G. de Brevern

Protein folding has been a major area of research for many years. Nonetheless, the mechanisms leading to the formation of an active biological fold are still not fully understood. The huge amount of available sequence and structural information provides hints to identify the putative fold for a given sequence. Indeed, protein structures prefer a limited number of local backbone conformations, some being characterized by preferences for certain amino acids. These preferences largely depend on the local structural environment. The prediction of local backbone conformations has become an important factor in correctly identifying the global protein fold. Here, we review the developments in the field of local structure prediction and especially their implications for protein fold recognition.


2021
Author(s):
Gabriele Pozzati
Wensi Zhu
John Lamb
Claudio Bassot
Petras Kundrotas
...  

In the last decade, de novo protein structure prediction accuracy for individual proteins has improved significantly by utilizing deep learning (DL) methods for harvesting the co-evolution information from large multiple sequence alignments (MSA). In CASP14, the best method could predict the structure of most proteins with impressive accuracy. The same approach can, in principle, also be used to extract information about evolutionary-based contacts across protein-protein interfaces. However, most of the earlier studies have not used the latest DL methods for inter-chain contact distance predictions. In this paper, we showed for the first time that using one of the best DL-based residue-residue contact prediction methods (trRosetta), it is possible to simultaneously predict both the tertiary and quaternary structures of some protein pairs, even when the structures of the monomers are not known. Straightforward application of this method to a standard dataset for protein-protein docking yielded limited success; however, using alternative methods for MSA generation allowed us to accurately dock significantly more proteins. We also introduced a novel scoring function, PconsDock, that accurately separates 98% of correctly and incorrectly folded and docked proteins, and thus this function can be used to evaluate the quality of the resulting docking models. The average performance of the method is comparable to the use of traditional, template-based or ab initio shape-complementarity-only docking methods; however, no a priori structural information for the individual proteins is needed. Moreover, the results of traditional and fold-and-dock approaches are complementary, and thus a combined docking pipeline should increase overall docking success significantly. The dock-and-fold pipeline helped us to generate the best model for one of the CASP14 oligomeric targets, H1065.


2019
Vol 16 (2)
pp. 159-172
Author(s):
Elaheh Kashani-Amin
Ozra Tabatabaei-Malazy
Amirhossein Sakhteman
Bagher Larijani
Azadeh Ebrahim-Habibi

Background: Prediction of proteins’ secondary structure is one of the major steps in the generation of homology models. These models provide structural information which is used to design suitable ligands for potential medicinal targets. However, selecting a proper tool among multiple Secondary Structure Prediction (SSP) options is challenging. The current study offers an insight into currently favored methods and tools within various contexts. Objective: A systematic review was performed to gain comprehensive access to recent (2013-2016) studies which used or recommended protein SSP tools. Methods: Three databases (Web of Science, PubMed and Scopus) were systematically searched, and 99 out of the 209 studies were finally found eligible for data extraction. Results: Four categories of applications for 59 retrieved SSP tools were: (I) prediction of structural features of a given sequence, (II) evaluation of a method, (III) providing input for a new SSP method and (IV) integrating an SSP tool as a component for a program. PSIPRED was found to be the most popular tool in all four categories. JPred and tools utilizing the PHD (Profile network from HeiDelberg) method occupied second and third places of popularity in categories I and II. JPred was only found in the first two categories, while PHD was present in three fields. Conclusion: This study provides a comprehensive insight into the recent usage of SSP tools which could be helpful for selecting a proper tool.


2014
Vol 70 (a1)
pp. C491-C491
Author(s):
Jürgen Haas
Alessandro Barbato
Tobias Schmidt
Steven Roth
Andrew Waterhouse
...  

Computational modeling and prediction of three-dimensional macromolecular structures and complexes from their sequence has been a long-standing goal in structural biology. Over the last two decades, a paradigm shift has occurred: starting from a large "knowledge gap" between the huge number of protein sequences and the small number of experimentally known structures, today some form of structural information – either experimental or computational – is available for the majority of amino acids encoded by common model organism genomes. Methods for structure modeling and prediction have made substantial progress over the last decades, and template-based homology modeling techniques have matured to a point where they are now routinely used to complement experimental techniques. However, computational modeling and prediction techniques often fall short in accuracy compared to high-resolution experimental structures, and it is often difficult to convey the expected accuracy and structural variability of a specific model. Retrospectively assessing the quality of blind structure prediction in comparison to experimental reference structures allows benchmarking the state of the art in structure prediction and identifying areas which need further development. The Critical Assessment of Structure Prediction (CASP) experiment has for the last 20 years assessed the progress in the field of protein structure modeling based on predictions for ca. 100 blind prediction targets per experiment, which are carefully evaluated by human experts. The "Continuous Model EvaluatiOn" (CAMEO) project aims to provide a fully automated blind assessment for prediction servers based on weekly pre-released sequences of the Protein Data Bank (PDB). CAMEO has been made possible by the development of novel scoring methods such as lDDT, which are robust against domain movements and thus allow for automated continuous structure comparison without human intervention.


Crystals
2021
Vol 11 (12)
pp. 1472
Author(s):  
Sergey V. Krivovichev

Modularity is an important construction principle of many inorganic crystal structures that has been used for the analysis of structural relations, classification, structure description and structure prediction. The principle of maximal simplicity for modular inorganic crystal structures can be formulated as follows: in a modular series of inorganic crystal structures, the most common and abundant in nature and experiments are those arrangements that possess maximal simplicity and minimal structural information. The latter can be quantitatively estimated using information-based structural complexity parameters. The principle is applied to modular series based upon 0D (lovozerite family), 1D (biopyriboles) and 2D (spinelloids and kurchatovite family) modules. This principle is empirical and is valid only in those cases where there are no factors that may lead to the destabilization of the simplest structural arrangements. The physical basis of the principle lies in the relations between structural complexity and configurational entropy sensu stricto (which should be distinguished from the entropy of mixing). It can also be seen as an analogue of the principle of least action in physics.
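The abstract does not spell out its complexity parameters. A minimal sketch, assuming they are the Shannon-type measures commonly used for crystal structures: information per atom I_G = -Σ p_i log2 p_i with p_i = m_i / v, and information per cell I_G,total = v · I_G, where m_i is the multiplicity of the i-th crystallographic orbit and v is the number of atoms in the reduced unit cell. The function name `structural_information` and the input values are hypothetical.

```python
import math


def structural_information(orbit_multiplicities):
    """Return (bits per atom, bits per reduced cell) for a structure.

    orbit_multiplicities: number of atoms in each crystallographic orbit
    of the reduced unit cell (assumed input format for this sketch).
    """
    if not orbit_multiplicities or any(m <= 0 for m in orbit_multiplicities):
        raise ValueError("multiplicities must be positive")
    v = sum(orbit_multiplicities)  # atoms in the reduced cell
    per_atom = sum(-(m / v) * math.log2(m / v) for m in orbit_multiplicities)
    return per_atom, v * per_atom


# Hypothetical example: a reduced cell with two orbits of one atom each.
per_atom, per_cell = structural_information([1, 1])
print(per_atom, per_cell)
```

Under this assumption, a structure whose atoms all belong to a single orbit carries zero information per atom, and the value grows as atoms are spread over more, and more evenly populated, orbits.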


2019
Vol 77 (1)
pp. 3-18
Author(s):
Yueru Sun
Thomas J. McCorvie
Luke A. Yates
Xiaodong Zhang

Abstract. Homologous recombination (HR) is a pathway to faithfully repair DNA double-strand breaks (DSBs). At the core of this pathway is a DNA recombinase, which, as a nucleoprotein filament on ssDNA, pairs with homologous DNA as a template to repair the damaged site. In eukaryotes, Rad51 is the recombinase capable of carrying out essential steps including strand invasion, homology search on the sister chromatid and strand exchange. Importantly, a tightly regulated process involving many protein factors has evolved to ensure proper localisation of this DNA repair machinery and its correct timing within the cell cycle. Dysregulation of any of the proteins involved can result in unchecked DNA damage, leading to uncontrolled cell division and cancer. Indeed, many are tumour suppressors and are key targets in the development of new cancer therapies. Over the past 40 years, our structural and mechanistic understanding of homologous recombination has steadily increased, with notable recent advancements due to progress in single-particle cryo-electron microscopy. These have resulted in higher resolution structural models of the signalling proteins ATM (ataxia telangiectasia mutated) and ATR (ataxia telangiectasia and Rad3-related protein), along with various structures of Rad51. However, structural information on the other major players involved, such as BRCA1 (breast cancer type 1 susceptibility protein) and BRCA2 (breast cancer type 2 susceptibility protein), has been limited to crystal structures of isolated domains and low-resolution electron microscopy reconstructions of the full-length proteins. Here we summarise the current structural understanding of homologous recombination, focusing on key proteins in recruitment and signalling events as well as the mediators for the Rad51 recombinase.

