FunFam protein families improve residue level molecular function prediction

2019 ◽  
Author(s):  
Linus Mathias Scheibenreif ◽  
Maria Littmann ◽  
Christine Orengo ◽  
Burkhard Rost

Abstract Background: The CATH database provides a hierarchical classification of protein domain structures including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding site annotations in these FunFams and incorporated FunFams into the prediction of protein binding residues. Results: FunFam members agreed, on average, in 36.9±0.6% of their binding residue annotations. This constituted a 6.7-fold increase over randomly grouped proteins and a 1.2-fold increase (1.1-fold on the same dataset) over proteins with the same enzymatic function (identical Enzyme Commission, EC, number). Mapping de novo binding site prediction methods (BindPredict-CCS, BindPredict-CC) onto FunFams resulted in consensus predictions for those residues that were aligned and predicted alike (binding/non-binding) within a FunFam. This simple consensus increased the F1-score (for binding) 1.5-fold over the original prediction method. Varying the threshold for how many proteins in the consensus prediction had to agree provided a convenient control of accuracy/precision and coverage/recall, e.g. reaching a precision as high as 60.8±0.4% for a stringent threshold. Conclusions: The FunFams outperformed even the carefully curated EC numbers in terms of agreement of binding site residues. Additionally, we assume that our proof-of-principle through the prediction of protein binding residues will be relevant for many other solutions profiting from FunFams to infer functional information at the residue level.
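The consensus idea in this abstract (aligned residues predicted alike within a FunFam, with an adjustable agreement threshold) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the column-wise input layout, and the `min_agreement` parameter are assumptions made for illustration, and the per-protein predictions are taken as given rather than computed by BindPredict.

```python
from typing import List, Optional, Sequence, Tuple


def consensus_predictions(
    columns: Sequence[Sequence[Optional[bool]]],
    min_agreement: float,
) -> List[Optional[bool]]:
    """Return one consensus call per alignment column.

    Each column holds the per-protein predictions for one aligned position:
    True (binding), False (non-binding) or None (gap / no prediction).
    A call is made only if at least `min_agreement` (a fraction above 0.5,
    hypothetical parameter) of the non-gap members agree; otherwise None.
    """
    result: List[Optional[bool]] = []
    for column in columns:
        votes = [v for v in column if v is not None]  # ignore gaps
        if not votes:
            result.append(None)
            continue
        binding_fraction = sum(votes) / len(votes)
        if binding_fraction >= min_agreement:
            result.append(True)
        elif 1.0 - binding_fraction >= min_agreement:
            result.append(False)
        else:
            result.append(None)  # members disagree too much
    return result


def precision_recall_f1(
    predicted: Sequence[Optional[bool]],
    observed: Sequence[bool],
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the binding class.

    Positions without a consensus call (None) count as not predicted binding.
    """
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p is True and o)
    fp = sum(1 for p, o in pairs if p is True and not o)
    fn = sum(1 for p, o in pairs if p is not True and o)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy alignment: four columns, three family members each (made-up data).
toy_columns = [
    [True, True, True],
    [True, True, False],
    [False, False, False],
    [None, None, None],
]
toy_observed = [True, False, False, True]

lenient = consensus_predictions(toy_columns, min_agreement=0.6)
strict = consensus_predictions(toy_columns, min_agreement=1.0)
```

On this toy input, raising `min_agreement` drops the contested second column from the consensus, which trades coverage for precision in the way the abstract describes.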


2017 ◽  
Vol 20 (4) ◽  
pp. 1250-1268 ◽  
Author(s):  
Jian Zhang ◽  
Zhiqiang Ma ◽  
Lukasz Kurgan

Abstract Proteins interact with a variety of molecules including proteins and nucleic acids. We review a comprehensive collection of over 50 studies that analyze and/or predict these interactions. While the majority of these studies address solely protein–DNA or protein–RNA binding, only a few have a wider scope that covers both protein–protein and protein–nucleic acid binding. Our analysis reveals that binding residues are typically characterized by three hallmarks: relative solvent accessibility (RSA), evolutionary conservation and propensity of amino acids (AAs) for binding. Motivated by drawbacks of the prior studies, we perform a large-scale analysis to quantify and contrast the three hallmarks for DNA-, RNA- and protein-binding residues and (for the first time) multi-ligand-binding residues that interact with DNA and proteins, and with RNA and proteins. Results generated on a well-annotated data set of over 23 000 proteins show that conservation of binding residues is higher for nucleic acid- than protein-binding residues. Multi-ligand-binding residues are more conserved and have higher RSA than single-ligand-binding residues. We empirically show that each hallmark, even predicted RSA, discriminates between binding and nonbinding residues, and that combining them improves discriminatory power for each of the five types of interactions. Linear scoring functions that combine these hallmarks offer good predictive performance of residue-level propensity for binding and provide intuitive interpretation of predictions. Better understanding of these residue-level interactions will facilitate development of methods that accurately predict binding in the exponentially growing databases of protein sequences.
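A linear scoring function over the three hallmarks named in this abstract can be sketched in Python as below. The abstract does not give the weights or the exact feature scaling, so the `Residue` fields, the default weights, and the bias term are placeholders chosen for illustration, not values from the study.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Residue:
    """Per-residue hallmark values (hypothetical scaling, e.g. all in [0, 1])."""
    rsa: float           # relative solvent accessibility
    conservation: float  # evolutionary conservation
    propensity: float    # amino-acid propensity for binding


def linear_binding_score(
    residue: Residue,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # placeholder weights
    bias: float = 0.0,                                       # placeholder bias
) -> float:
    """Weighted sum of the three hallmarks; higher means more binding-like."""
    w_rsa, w_cons, w_prop = weights
    return (
        bias
        + w_rsa * residue.rsa
        + w_cons * residue.conservation
        + w_prop * residue.propensity
    )


# Made-up example residues: one exposed and conserved, one buried and variable.
exposed = Residue(rsa=0.5, conservation=0.25, propensity=0.25)
buried = Residue(rsa=0.0, conservation=0.25, propensity=0.0)
```

Because the score is a plain weighted sum, each weight can be read directly as the contribution of one hallmark, which is the kind of intuitive interpretation the abstract refers to.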


2019 ◽  
Vol 7 (1) ◽  
pp. 48-63
Author(s):  
Hee Rhang Yoon ◽  
Aaztli Coria ◽  
Alain Laederach ◽  
Christine Heitsch

Abstract A riboswitch is a type of RNA molecule that regulates important biological functions by changing structure, typically upon ligand binding. We assess the extent to which these ligand-bound structural alternatives are present in the Boltzmann sample, a standard RNA secondary structure prediction method, for three riboswitch test cases. We use the cluster analysis tool RNAStructProfiling to characterize the different modalities present among the suboptimal structures sampled. We compare these modalities to the putative base pairing models obtained from independent experiments using NMR or fluorescence spectroscopy. We find, somewhat unexpectedly, that profiling the Boltzmann sample captures evidence of ligand-bound conformations for two of three riboswitches studied. Moreover, this agreement between predicted modalities and experimental models is consistent with the classification of riboswitches into thermodynamic versus kinetic regulatory mechanisms. Our results support cluster analysis of Boltzmann samples by RNAStructProfiling as a possible basis for de novo identification of thermodynamic riboswitches, while highlighting the challenges for kinetic ones.


2017 ◽  
Author(s):  
Shiran Barber-Zucker ◽  
Boaz Shaanan ◽  
Raz Zarivach

Abstract Divalent d-block metal cations (DDMCs), such as Fe, Zn and Mn, participate in many biological processes. Understanding how specific DDMCs are transported to and within the cell and what controls their binding selectivity to different proteins is crucial for defining the mechanisms of metalloproteins. To better understand such processes, we scanned the RCSB Protein Data Bank, performed a de novo structural-based comprehensive analysis of seven DDMCs and found their amino acid binding and coordination geometry propensities. We then utilized these results to characterize the correlation between metal selectivity, specific binding site composition and phylogenetic classification of the cation diffusion facilitator (CDF) protein family, a family of DDMC transporters found throughout evolution and sharing a conserved structure, yet with different members displaying distinct metal selectivity. Our analysis shows that DDMCs differ, at times significantly, in terms of their binding propensities, and that in each CDF clade, the metal selectivity-related binding site has a unique and conserved sequence signature. However, only limited correlation exists between the composition of the DDMC binding site in each clade and the metal selectivity shown by its proteins.
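The abstract does not say how the amino acid binding propensities were computed. One common way to express such a propensity, shown here purely as an assumption, is the frequency of an amino acid among metal-coordinating residues divided by its frequency in a background set. The function name and the toy one-letter sequences below are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def binding_propensities(
    binding_residues: Iterable[str],
    background_residues: Iterable[str],
) -> Dict[str, float]:
    """Enrichment of each amino acid among binding residues vs. background.

    A value above 1 means the amino acid is over-represented at binding
    sites; amino acids absent from the background are skipped to avoid
    division by zero.
    """
    bind_counts = Counter(binding_residues)
    back_counts = Counter(background_residues)
    n_bind = sum(bind_counts.values())
    n_back = sum(back_counts.values())
    propensities: Dict[str, float] = {}
    for amino_acid, count in bind_counts.items():
        if back_counts[amino_acid] == 0:
            continue
        binding_freq = count / n_bind
        background_freq = back_counts[amino_acid] / n_back
        propensities[amino_acid] = binding_freq / background_freq
    return propensities


# Made-up data: three His and one Cys coordinate the metal; the background
# is a short toy sequence in which both are rare.
toy_propensities = binding_propensities("HHHC", "HCAAAAAA")
```

Computing such values separately for each metal would allow propensities to be compared across cations, under the stated assumption about the definition.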


1987 ◽  
Vol 7 (12) ◽  
pp. 4400-4406 ◽  
Author(s):  
K D Breunig ◽  
P Kuger

As shown previously, the beta-galactosidase gene of Kluyveromyces lactis is transcriptionally regulated via an upstream activation site (UASL) which contains a sequence homologous to the GAL4 protein-binding site in Saccharomyces cerevisiae (M. Ruzzi, K.D. Breunig, A.G. Ficca, and C.P. Hollenberg, Mol. Cell. Biol. 7:991-997, 1987). Here we demonstrate that the region of homology specifically binds a K. lactis regulatory protein. The binding activity was detectable in protein extracts from wild-type cells enriched for DNA-binding proteins by heparin affinity chromatography. These extracts could be used directly for DNase I and exonuclease III protection experiments. A lac9 deletion strain, which fails to induce the beta-galactosidase gene, did not contain the binding factor. The homology of LAC9 protein with GAL4 (J.M. Salmeron and S. A. Johnston, Nucleic Acids Res. 14:7767-7781, 1986) strongly suggests that LAC9 protein binds directly to UASL and plays a role similar to that of GAL4 in regulating transcription.

