SSIPe: accurately estimating protein–protein binding affinity change upon mutations using evolutionary profiles in combination with an optimized physical energy function

2019 ◽  
Vol 36 (8) ◽  
pp. 2429-2437 ◽  
Author(s):  
Xiaoqiang Huang ◽  
Wei Zheng ◽  
Robin Pearce ◽  
Yang Zhang

Abstract Motivation Most proteins perform their biological functions through interactions with other proteins in cells. Amino acid mutations, especially those occurring at protein interfaces, can change the stability of protein–protein interactions (PPIs) and impact their functions, which may cause various human diseases. Quantitative estimation of the binding affinity changes (ΔΔGbind) caused by mutations can provide critical information for protein function annotation and genetic disease diagnoses. Results We present SSIPe, which combines protein interface profiles, collected from structural and sequence homology searches, with a physics-based energy function for accurate ΔΔGbind estimation. To offset the statistical limits of the PPI structure and sequence databases, amino acid-specific pseudocounts were introduced to enhance the profile accuracy. SSIPe was evaluated on large-scale experimental data containing 2204 mutations from 177 proteins, where training and test datasets were stringently separated with the sequence identity between proteins from the two datasets below 30%. The Pearson correlation coefficient between estimated and experimental ΔΔGbind was 0.61 with a root-mean-square-error of 1.93 kcal/mol, which was significantly better than the other methods. Detailed data analyses revealed that the major advantage of SSIPe over other traditional approaches lies in the novel combination of the physical energy function with the new knowledge-based interface profile. SSIPe also considerably outperformed a former profile-based method (BindProfX) due to the newly introduced sequence profiles and optimized pseudocount technique that allows for consideration of amino acid-specific prior mutation probabilities. Availability and implementation Web-server/standalone program, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/SSIPe and https://github.com/tommyhuangthu/SSIPe. 
Supplementary information Supplementary data are available at Bioinformatics online.
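The two evaluation metrics quoted in the abstract above, the Pearson correlation coefficient and the root-mean-square error between estimated and experimental ΔΔGbind, can be sketched as follows. This is a minimal illustration, not SSIPe code: the function names and the ΔΔG values are invented for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def rmse(xs, ys):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Hypothetical ΔΔG values in kcal/mol (made up, not taken from the SSIPe benchmark).
experimental = [0.5, 1.2, -0.3, 2.1, 0.0]
estimated = [0.7, 0.9, 0.1, 1.5, 0.4]

print(f"PCC  = {pearson_r(experimental, estimated):.2f}")
print(f"RMSE = {rmse(experimental, estimated):.2f} kcal/mol")
```

The two metrics are complementary: correlation measures whether the ranking and trend of the mutations are captured, while RMSE measures the absolute error in kcal/mol.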

2018 ◽  
Author(s):  
Yanhui Hu ◽  
Richelle Sopko ◽  
Verena Chung ◽  
Romain A. Studer ◽  
Sean D. Landry ◽  
...  

Abstract Post-translational modification (PTM) serves as a regulatory mechanism for protein function, influencing stability, protein interactions, activity and localization, and is critical in many signaling pathways. The best characterized PTM is phosphorylation, whereby a phosphate is added to an acceptor residue, commonly serine, threonine and tyrosine. As proteins are often phosphorylated at multiple sites, identifying those sites that are important for function is a challenging problem. Considering that many phosphorylation sites may be non-functional, prioritizing evolutionarily conserved phosphosites provides a general strategy to identify the putative functional sites with regard to regulation and function. To facilitate the identification of conserved phosphosites, we generated a large-scale phosphoproteomics dataset from Drosophila embryos collected from six closely-related species. We built iProteinDB (https://www.flyrnai.org/tools/iproteindb/), a resource integrating these data with other high-throughput PTM datasets, including vertebrates, and manually curated information for Drosophila. At iProteinDB, scientists can view the PTM landscape for any Drosophila protein and identify predicted functional phosphosites based on a comparative analysis of data from closely-related Drosophila species. Further, iProteinDB enables comparison of PTM data from Drosophila to that of orthologous proteins from other model organisms, including human, mouse, rat, Xenopus laevis, Danio rerio, and Caenorhabditis elegans.


2005 ◽  
Vol 33 (3) ◽  
pp. 530-534 ◽  
Author(s):  
M. Lappe ◽  
L. Holm

The functional characterization of all genes and their gene products is the main challenge of the postgenomic era. Recent experimental and computational techniques have enabled the study of interactions among all proteins on a large scale. In this paper, approaches will be presented to exploit interaction information for the inference of protein structure, function, signalling pathways and ultimately entire interactomes. Interaction networks can be modelled as graphs, showing the operation of gene function in terms of protein interactions. Since the architecture of biological networks differs distinctly from random networks, these functional maps contain a signal that can be used for predictive purposes. Protein function and structure can be predicted by matching interaction patterns, without the requirement of sequence similarity. Moving on to a higher level definition of protein function, the question arises how to decompose complex networks into meaningful subsets. An algorithm will be demonstrated, which extracts whole signal-transduction pathways from noisy graphs derived from text-mining the biological literature. Finally, an algorithmic strategy is formulated that enables the proteomics community to build a reliable scaffold of the interactome in a fraction of the time compared with uncoordinated efforts.


2018 ◽  
Author(s):  
Curtis J Layton ◽  
Peter L McMahon ◽  
William J Greenleaf

Summary High-throughput DNA sequencing techniques have enabled diverse approaches for linking DNA sequence to biochemical function. In contrast, assays of protein function have substantial limitations in terms of throughput, automation, and widespread availability. We have adapted an Illumina high-throughput sequencing chip to display an immense diversity of ribosomally-translated proteins and peptides, and then carried out fluorescence-based functional assays directly on this flow cell, demonstrating that a single, widely-available high-throughput platform can perform both sequencing-by-synthesis and protein assays. We quantified the binding of the M2 anti-FLAG antibody to a library of 1.3×10⁴ variant FLAG peptides, exploring non-additive effects of combinations of mutations and discovering a “superFLAG” epitope variant. We also measured the enzymatic activity of 1.56×10⁵ molecular variants of full-length human O⁶-alkylguanine-DNA alkyltransferase (SNAP-tag). This comprehensive corpus of catalytic rates linked to amino acid sequence perturbations revealed amino acid interaction networks and cooperativity, linked positive cooperativity to structural proximity, and revealed ubiquitous positively-cooperative interactions with histidine residues.


2020 ◽  
Author(s):  
Kaitlyn Bacon ◽  
Abigail Blain ◽  
John Bowen ◽  
Matthew Burroughs ◽  
Nikki McArthur ◽  
...  

Abstract Quantifying the binding affinity of protein-protein interactions is important for elucidating connections within biochemical signaling pathways, as well as characterization of binding proteins isolated from combinatorial libraries. We describe a quantitative yeast-yeast two hybrid (qYY2H) system that not only enables discovery of specific protein-protein interactions, but also efficient, quantitative estimation of their binding affinities (KD). In qYY2H, the bait and prey proteins are expressed as yeast cell surface fusions using yeast surface display. We developed a semi-empirical framework for estimating the KD of monovalent bait-prey interactions, using measurements of the apparent KD of yeast-yeast binding, which is mediated by multivalent interactions between yeast-displayed bait and prey. Using qYY2H, we identified interaction partners of SMAD3 and the tandem WW domains of YAP from a cDNA library and characterized their binding affinities. Finally, we showed that qYY2H could also quantitatively evaluate binding interactions mediated by post-translational modifications on the bait protein.


2020 ◽  
Vol 36 (16) ◽  
pp. 4383-4388 ◽  
Author(s):  
Xiaoqiong Wei ◽  
Chengxin Zhang ◽  
Peter L Freddolino ◽  
Yang Zhang

Abstract Motivation Many protein function databases are built on automated or semi-automated curations and can contain various annotation errors. The correction of such misannotations is critical to improving the accuracy and reliability of the databases. Results We proposed a new approach to detect potentially incorrect Gene Ontology (GO) annotations by comparing the ratio of annotation rates (RAR) for the same GO term across different taxonomic groups, where those with a relatively low RAR usually correspond to incorrect annotations. As an illustration, we applied the approach to 20 commonly studied species in two recent UniProt-GOA releases and identified 250 potential misannotations in the 2018-11-6 release, where only 25% of them were corrected in the 2019-6-3 release. Importantly, 56% of the misannotations are ‘Inferred from Biological aspect of Ancestor (IBA)’ which is in contradiction with previous observations that attributed misannotations mainly to ‘Inferred from Sequence or structural Similarity (ISS)’, probably reflecting an error source shift due to the new developments of function annotation databases. The results demonstrated a simple but efficient misannotation detection approach that is useful for large-scale comparative protein function studies. Availability and implementation https://zhanglab.ccmb.med.umich.edu/RAR. Supplementary information Supplementary data are available at Bioinformatics online.
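The idea of comparing annotation rates for one GO term across taxonomic groups can be illustrated with a short sketch. The abstract does not give the exact RAR formula, so the definition used here is an assumption: a group's annotation rate is the fraction of its proteins carrying the term, and its RAR is that rate divided by the pooled rate of all other groups. The group names, counts and the 0.05 threshold are likewise hypothetical.

```python
def annotation_rate(annotated, total):
    """Fraction of proteins in a group that carry the GO term."""
    if total <= 0:
        raise ValueError("group must contain at least one protein")
    return annotated / total

def rar_by_group(counts):
    """Map each group to its ratio of annotation rates (assumed definition).

    counts maps group name -> (annotated proteins, total proteins).
    The reference rate for a group is pooled over all other groups;
    the RAR is None when that reference rate is zero.
    """
    result = {}
    for group, (annotated, total) in counts.items():
        other_annotated = sum(a for g, (a, _) in counts.items() if g != group)
        other_total = sum(t for g, (_, t) in counts.items() if g != group)
        reference = annotation_rate(other_annotated, other_total)
        rate = annotation_rate(annotated, total)
        result[group] = rate / reference if reference > 0 else None
    return result

def flag_low_rar(counts, threshold=0.05):
    """Groups whose RAR falls below a (hypothetical) threshold."""
    rars = rar_by_group(counts)
    return sorted(g for g, r in rars.items() if r is not None and r < threshold)

# Made-up counts for a single GO term across three taxonomic groups.
counts = {"mammals": (900, 1000), "fungi": (850, 1000), "plants": (3, 1000)}
flagged = flag_low_rar(counts)
print(flagged)
```

In this toy input the term is common in two groups and nearly absent in the third, so the few annotations in the third group are the ones a curator would want to inspect.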


Author(s):  
Jianzhao Gao ◽  
Shuangjia Zheng ◽  
Mengting Yao ◽  
Peikun Wu

Abstract Motivation The solvent accessible surface is an essential structural property measure related to the protein structure and protein function. Relative solvent accessible area (RSA) is a standard measure to describe the degree of residue exposure on the protein surface or in the protein interior. However, this computation will fail when residue information is missing. Results In this article, we proposed a novel method for estimating RSA using the Cα atom distance matrix with the deep learning method (EAGERER). The new method, EAGERER, achieves Pearson correlation coefficients of 0.921–0.928 on two independent test datasets. We empirically demonstrate that EAGERER can yield better Pearson correlation coefficients than existing RSA estimators, such as coordination number, half sphere exposure and SphereCon. To the best of our knowledge, EAGERER represents the first method to estimate the solvent accessible area using limited information with a deep learning model. It could be useful for protein structure and protein function prediction. Availability and implementation The method is freely available at https://github.com/cliffgao/EAGERER. Supplementary information Supplementary data are available at Bioinformatics online.
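The input representation named in the abstract above, a Cα distance matrix, and one of the baseline estimators it mentions, the coordination number, can be sketched in a few lines. This is not the EAGERER model, which applies deep learning to the matrix. The coordinates are invented, and the 10 Å cutoff is an assumed value chosen only for illustration.

```python
import math

def distance_matrix(ca_coords):
    """Pairwise Euclidean distances between Cα atoms (coordinates in Å)."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]

def coordination_numbers(dmat, radius=10.0):
    """For each residue, count the other Cα atoms within `radius` Å.

    A higher count suggests a more buried residue; the radius is an
    assumed cutoff, not a value taken from the paper.
    """
    return [
        sum(1 for j, d in enumerate(row) if j != i and d <= radius)
        for i, row in enumerate(dmat)
    ]

# Hypothetical Cα coordinates: three residues spaced 3.8 Å apart and one far away.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
dmat = distance_matrix(ca_coords)
neighbours = coordination_numbers(dmat)
print(neighbours)
```

Because only Cα positions are needed, both quantities remain computable when side-chain atoms are missing, which is the situation the abstract describes.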


2020 ◽  
Vol 36 (11) ◽  
pp. 3357-3364 ◽  
Author(s):  
Tyler C Shimko ◽  
Polly M Fordyce ◽  
Yaron Orenstein

Abstract Motivation High-throughput protein screening is a critical technique for dissecting and designing protein function. Libraries for these assays can be created through a number of means, including targeted or random mutagenesis of a template protein sequence or direct DNA synthesis. However, mutagenic library construction methods often yield vastly more nonfunctional than functional variants and, despite advances in large-scale DNA synthesis, individual synthesis of each desired DNA template is often prohibitively expensive. Consequently, many protein-screening libraries rely on the use of degenerate codons (DCs), mixtures of DNA bases incorporated at specific positions during DNA synthesis, to generate highly diverse protein-variant pools from only a few low-cost synthesis reactions. However, selecting DCs for sets of sequences that covary at multiple positions dramatically increases the difficulty of designing a DC library and leads to the creation of many undesired variants that can quickly outstrip screening capacity. Results We introduce a novel algorithm for total DC library optimization, degenerate codon design (DeCoDe), based on integer linear programming. DeCoDe significantly outperforms state-of-the-art DC optimization algorithms and scales well to more than a hundred proteins sharing complex patterns of covariation (e.g. the lab-derived avGFP lineage). Moreover, DeCoDe is, to our knowledge, the first DC design algorithm with the capability to encode mixed-length protein libraries. We anticipate DeCoDe to be broadly useful for a variety of library generation problems, ranging from protein engineering attempts that leverage mutual information to the reconstruction of ancestral protein states. Availability and implementation github.com/OrensteinLab/DeCoDe. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.
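To make the notion of a degenerate codon concrete, the sketch below expands an IUPAC-coded codon into the concrete codons it represents and the amino acids they encode under the standard genetic code. It illustrates only the definition of a DC, not the DeCoDe integer linear programming algorithm, and the function names are invented for this example.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, with codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def expand_codon(degenerate_codon):
    """List every concrete codon represented by a three-letter degenerate codon."""
    choices = [IUPAC[symbol] for symbol in degenerate_codon.upper()]
    return ["".join(bases) for bases in product(*choices)]

def encoded_residues(degenerate_codon):
    """Set of amino acids (and '*' for stop) encoded by a degenerate codon."""
    return {CODON_TABLE[codon] for codon in expand_codon(degenerate_codon)}

codons = expand_codon("NNK")
residues = encoded_residues("NNK")
print(len(codons), "codons encode", len(residues - {"*"}), "amino acids")
```

A single synthesis with NNK at one position covers all 20 amino acids with 32 codons, including one stop codon. Combining such positions is what lets a library grow beyond screening capacity, the problem the abstract says DeCoDe addresses.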


2004 ◽  
Vol 24 (8) ◽  
pp. 3157-3167 ◽  
Author(s):  
Thierry Cheutin ◽  
Stanislaw A. Gorski ◽  
Karen M. May ◽  
Prim B. Singh ◽  
Tom Misteli

ABSTRACT The mechanism for transcriptional silencing of pericentric heterochromatin is conserved from fission yeast to mammals. Silenced genome regions are marked by epigenetic methylation of histone H3, which serves as a binding site for structural heterochromatin proteins. In the fission yeast Schizosaccharomyces pombe, the major structural heterochromatin protein is Swi6. To gain insight into Swi6 function in vivo, we have studied its dynamics in the nucleus of living yeast. We demonstrate that, in contrast to mammalian cells, yeast heterochromatin domains undergo rapid, large-scale motions within the nucleus. Similar to the situation in mammalian cells, Swi6 does not permanently associate with these chromatin domains but binds only transiently to euchromatin and heterochromatin. Swi6 binding dynamics are dependent on growth status and on the silencing factors Clr4 and Rik1, but not Clr1, Clr2, or Clr3. By comparing the kinetics of mutant Swi6 proteins in swi6− and swi6+ strains, we demonstrate that homotypic protein-protein interactions via the chromoshadow domain stabilize Swi6 binding to chromatin in vivo. Kinetic modeling allowed quantitative estimation of residence times and indicated the existence of at least two kinetically distinct populations of Swi6 in heterochromatin. The observed dynamics of Swi6 binding are consistent with a stochastic model of heterochromatin and indicate evolutionary conservation of heterochromatin protein binding properties from mammals to yeast.


2010 ◽  
Vol 107 (5) ◽  
pp. 1995-2000 ◽  
Author(s):  
Antonio Rausell ◽  
David Juan ◽  
Florencio Pazos ◽  
Alfonso Valencia

The divergence accumulated during the evolution of protein families translates into their internal organization as subfamilies, and it is directly reflected in the characteristic patterns of differentially conserved residues. These specifically conserved positions in protein subfamilies are known as “specificity determining positions” (SDPs). Previous studies have limited their analysis to the study of the relationship between these positions and ligand-binding specificity, demonstrating significant yet limited predictive capacity. We have systematically extended this observation to include the role of differential protein interactions in the segregation of protein subfamilies and explored in detail the structural distribution of SDPs at protein interfaces. Our results show the extensive influence of protein interactions in the evolution of protein families and the widespread association of SDPs with protein interfaces. The combined analysis of SDPs in interfaces and ligand-binding sites provides a more complete picture of the organization of protein families, constituting the necessary framework for a large scale analysis of the evolution of protein function.

