Mutation effect estimation on protein–protein interactions using deep contextualized representation learning

2020 ◽  
Vol 2 (2) ◽  
Author(s):  
Guangyu Zhou ◽  
Muhao Chen ◽  
Chelsea J T Ju ◽  
Zheng Wang ◽  
Jyun-Yu Jiang ◽  
...  

Abstract The functional impact of protein mutations is reflected in the alteration of conformation and thermodynamics of protein–protein interactions (PPIs). Quantifying the changes in two interacting proteins upon mutation is commonly carried out by computational approaches. Hence, extensive research effort has been devoted to extracting energetic or structural features of proteins, followed by statistical learning methods that estimate the effects of mutations on PPI properties. Nonetheless, such features require extensive human labor and expert knowledge to obtain, and have limited ability to reflect point mutations. We present an end-to-end deep learning framework, MuPIPR (Mutation Effects in Protein–protein Interaction PRediction Using Contextualized Representations), to estimate the effects of mutations on PPIs. MuPIPR incorporates a contextualized representation mechanism of amino acids to propagate the effects of a point mutation to surrounding amino acid representations, thereby amplifying the subtle change in a long protein sequence. On top of that, MuPIPR leverages a Siamese residual recurrent convolutional neural encoder to encode a wild-type protein pair and its mutation pair. Multi-layer perceptron regressors are applied to the protein pair representations to predict the quantifiable changes of PPI properties upon mutations. Experimental evaluations show that, with only sequence information, MuPIPR outperforms various state-of-the-art systems in estimating the changes of binding affinity for SKEMPI v1, and offers comparable performance on SKEMPI v2. MuPIPR also demonstrates state-of-the-art performance in estimating the changes of buried surface areas. The software implementation is available at https://github.com/guangyu-zhou/MuPIPR.
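The idea that a contextualized representation spreads a point mutation's effect over neighbouring positions can be illustrated with a toy sketch. The abstract does not specify the mechanism at this level of detail, so everything here is an assumption: the sliding-window mean, the one-number-per-residue embedding, and the example sequence are hypothetical stand-ins and are not the MuPIPR model.

```python
# Toy illustration only: a sliding-window mean stands in for a learned
# contextual encoder. This is NOT the MuPIPR architecture.

def static_embed(residue):
    """Hypothetical context-free embedding: one number per amino acid letter."""
    return float(ord(residue) - ord("A"))

def contextual_embed(seq, k=2):
    """Represent each position by the mean static embedding within +/- k residues."""
    static = [static_embed(r) for r in seq]
    out = []
    for i in range(len(seq)):
        window = static[max(0, i - k): i + k + 1]
        out.append(sum(window) / len(window))
    return out

def changed_positions(a, b):
    """Indices at which two equal-length representation lists differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

wild_type = "MKTAYIAKQRQISFVK"                 # made-up sequence
mutant = wild_type[:7] + "W" + wild_type[8:]   # point mutation at index 7 (K -> W)

static_diff = changed_positions(
    [static_embed(r) for r in wild_type],
    [static_embed(r) for r in mutant],
)
context_diff = changed_positions(
    contextual_embed(wild_type),
    contextual_embed(mutant),
)
```

With a context-free embedding only the mutated position differs between the two sequences, whereas with the windowed representation every position whose window covers the mutation differs, which is the amplification effect the abstract describes.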

2019 ◽  
Author(s):  
Guangyu Zhou ◽  
Muhao Chen ◽  
Chelsea J.-T. Ju ◽  
Zheng Wang ◽  
Jyun-Yu Jiang ◽  
...  

Abstract The functional impact of protein mutations is reflected in the alteration of conformation and thermodynamics of protein-protein interactions (PPIs). Quantifying the changes in two interacting proteins upon mutation is commonly carried out by computational approaches. Hence, extensive research effort has been devoted to extracting energetic or structural features of proteins, followed by statistical learning methods that estimate the effects of mutations on PPI properties. Nonetheless, such features require extensive human labor and expert knowledge to obtain, and have limited ability to reflect point mutations. We present an end-to-end deep learning framework, MuPIPR, to estimate the effects of mutations on PPIs. MuPIPR incorporates a contextualized representation mechanism of amino acids to propagate the effects of a point mutation to surrounding amino acid representations, thereby amplifying the subtle change in a long protein sequence. On top of that, MuPIPR leverages a Siamese residual recurrent convolutional neural encoder to encode a wild-type protein pair and its mutation pair. Multi-layer perceptron regressors are applied to the protein pair representations to predict the quantifiable changes of PPI properties upon mutations. Experimental evaluations show that MuPIPR outperforms various state-of-the-art systems in predicting changes in binding affinity and in buried surface area. The software implementation is available at https://github.com/guangyu-zhou/MuPIPR


2019 ◽  
Vol 20 (S16) ◽  
Author(s):  
Da Zhang ◽  
Mansur Kabuka

Abstract Background Protein-protein interactions (PPIs) are constantly involved in dynamic pathological and biological processes. Thus, it is crucial to understand PPIs thoroughly in order to illuminate disease occurrence, achieve the optimal drug-target therapeutic effect and describe protein complex structures. However, compared to the protein sequences obtainable from various species and organisms, the number of revealed protein-protein interactions is relatively limited. To address this gap, considerable research effort has been devoted to facilitating the discovery of novel PPIs. Among these methods, PPI prediction techniques that rely only on protein sequence data are more widespread than methods that require extensive biological domain knowledge. Results In this paper, we propose a multi-modal deep representation learning structure that incorporates protein physicochemical features with the graph topological features from the PPI networks. Specifically, our method not only takes the protein sequence information into account but also learns the topological representation of each protein node in the PPI networks. We construct a stacked auto-encoder architecture together with a continuous bag-of-words (CBOW) model based on generated metapaths to study PPI prediction. Following that, we use supervised deep neural networks to identify the PPIs and classify the protein families. The PPI prediction accuracy for eight species ranged from 96.76% to 99.77%, which indicates that our multi-modal deep representation learning framework achieves superior performance compared to other computational methods. Conclusion To the best of our knowledge, this is the first multi-modal deep representation learning framework for examining PPI networks.


2020 ◽  
Author(s):  
Dhananjay Kimothi ◽  
Pravesh Biyani ◽  
James M. Hogan ◽  
Melissa J. Davis

Abstract Background: Protein-Protein Interactions (PPIs) are a crucial mechanism underpinning the function of the cell. Predicting the likely relationship between a pair of proteins is thus an important problem in bioinformatics, and a wide range of machine-learning based methods have been proposed for this task. Their success is heavily dependent on the construction of the feature vectors, with most using a set of physicochemical properties derived from the sequence. Few work directly with the sequence itself. Recent work on embedding sequences in a low dimensional vector space has shown the utility of this approach for tasks such as protein classification and sequence search. In this paper, we extend these ideas to the PPI prediction task, making inferences from the pair instead of the individual sequences. Methods: We propose a generic PPI prediction framework that comprises a representation learning module for feature construction and a binary classifier. To construct the feature vector for a protein pair, we concatenate the distributed representations (embeddings) learned for the sequences of the constituent proteins. Each protein pair is represented as a 200-dimensional feature vector. To learn the embedding of a sequence, we use two established methods, Seq2Vec and BioVec, and we also introduce a novel feature construction method that we call SuperVecNW. The embeddings generated through SuperVecNW capture network information to some extent, along with the contextual information present in the sequences. Finally, we feed these feature vectors into a random forest classifier to predict protein pair interactions. Results: To show the efficacy of our proposed approach, we evaluate its performance on human and yeast PPI datasets, benchmarking against the established methods. Furthermore, we test our approach on three well known networks: the one-core network (CD9), the multiple-core network (Ras-Raf-Mek-Erk-Elk-Srf pathway), and the cross-connection network (Wnt-related network), and demonstrate the improvement in predicting PPIs compared to the other methods. Conclusions: Naive low dimensional sequence embeddings provide better results on the protein-protein interaction prediction task than most of the alternative representations based on other physicochemical properties. These methods require computationally modest effort due to their lower dimensionality. Advanced representation learning methods that enrich the sequence embeddings with meta information are expected to improve the results further.
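The pair-feature construction described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_embedding` is a hypothetical hashed 3-mer count vector standing in for a learned embedding (it is not Seq2Vec, BioVec or SuperVecNW), the per-protein dimension of 100 is assumed so that concatenation gives the 200 dimensions mentioned in the abstract, and both sequences are made up. The classifier step is omitted.

```python
import hashlib

EMBED_DIM = 100  # assumption: two 100-dim embeddings give a 200-dim pair vector

def toy_embedding(seq, dim=EMBED_DIM):
    """Hypothetical placeholder for a learned sequence embedding.

    Hashes each overlapping 3-mer into one of `dim` buckets and counts hits.
    """
    vec = [0.0] * dim
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        digest = hashlib.sha256(kmer.encode("ascii")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    return vec

def pair_features(seq_a, seq_b):
    """Concatenate the two per-protein embeddings into one pair feature vector."""
    return toy_embedding(seq_a) + toy_embedding(seq_b)

protein_a = "MKTAYIAKQRQISFVKSHFSRQ"   # made-up sequences
protein_b = "MSDNGELEDKPPAPPVRMSS"
features = pair_features(protein_a, protein_b)
```

Note that plain concatenation is order-dependent: the vector for (A, B) differs from the vector for (B, A), so a downstream classifier sees the two orderings as different inputs unless the pairs are ordered consistently or both orderings are included in training.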


2019 ◽  
Author(s):  
Hassan Kané ◽  
Mohamed Coulibali ◽  
Ali Abdalla ◽  
Pelkins Ajanoh

ABSTRACT Computational methods that infer the function of proteins are key to understanding life at the molecular level. In recent years, representation learning has emerged as a powerful paradigm to discover new patterns among entities as varied as images, words, speech, and molecules. In typical representation learning, there is only one source of data or one level of abstraction at which the learned representation occurs. However, proteins can be described by their primary, secondary, tertiary, and quaternary structure or even as nodes in protein-protein interaction networks. Given that protein function is an emergent property of all these levels of interaction, in this work we learn joint representations from both amino acid sequence and multilayer networks representing tissue-specific protein-protein interactions. We show that simple machine learning models trained on these hybrid representations outperform existing network-based methods on the task of tissue-specific protein function prediction on 13 out of 13 tissues. Furthermore, these representations outperform existing ones by 14% on average.


2020 ◽  
Vol 27 (37) ◽  
pp. 6306-6355 ◽  
Author(s):  
Marian Vincenzi ◽  
Flavia Anna Mercurio ◽  
Marilisa Leone

Background: Many pathways in healthy cells and/or linked to disease onset and progression depend on large assemblies including multi-protein complexes. Protein-protein interactions may occur through a vast array of modules known as protein interaction domains (PIDs). Objective: This review concerns PIDs that recognize post-translationally modified peptide sequences and intends to provide the scientific community with state-of-the-art knowledge on their 3D structures, binding topologies and potential applications in the drug discovery field. Method: Several databases, such as Pfam (Protein family), SMART (Simple Modular Architecture Research Tool) and the PDB (Protein Data Bank), were searched for different domain families and for structural information on protein complexes in which particular PIDs are involved. Recent literature on PIDs and related drug discovery campaigns was retrieved through PubMed and analyzed. Results and Conclusion: PIDs are rather versatile in their binding preferences. Many of them specifically recognize only particular amino acid stretches with post-translational modifications, while a few others are able to interact with several post-translationally modified sequences or with unmodified ones. Many PIDs can be linked to different diseases including cancer. The tremendous amount of available structural data has led to the structure-based design of several molecules targeting protein-protein interactions mediated by PIDs, including peptides, peptidomimetics and small compounds. More studies are needed to fully identify, among different families, PIDs that can be considered reliable therapeutic targets; however, attacking PIDs rather than catalytic domains of a particular protein may represent a route to obtain selective inhibitors.


2017 ◽  
Vol 114 (11) ◽  
pp. E2146-E2155 ◽  
Author(s):  
Chi-Yun Lin ◽  
Johan Both ◽  
Keunbong Do ◽  
Steven G. Boxer

Split GFPs have been widely applied for monitoring protein–protein interactions by expressing GFPs as two or more constituent parts linked to separate proteins that only fluoresce on complementing with one another. Although this complementation is typically irreversible, it has been shown previously that light accelerates dissociation of a noncovalently attached β-strand from a circularly permuted split GFP, allowing the interaction to be reversible. Reversible complementation is desirable, but photodissociation has too low an efficiency (quantum yield <1%) to be useful as an optogenetic tool. Understanding the physical origins of this low efficiency can provide strategies to improve it. We elucidated the mechanism of strand photodissociation by measuring the dependence of its rate on light intensity and point mutations. The results show that strand photodissociation is a two-step process involving light-activated cis-trans isomerization of the chromophore followed by light-independent strand dissociation. The dependence of the rate on temperature was then used to establish a potential energy surface (PES) diagram along the photodissociation reaction coordinate. The resulting energetics–function model reveals the rate-limiting process to be the transition from the electronic excited-state to the ground-state PES accompanying cis-trans isomerization. Comparisons between split GFPs and other photosensory proteins, like photoactive yellow protein and rhodopsin, provide potential strategies for improving the photodissociation quantum yield.
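One common way to turn the temperature dependence of a rate into a barrier height is an Arrhenius analysis, sketched below. The abstract does not state which rate law the authors fitted, so the Arrhenius form is an assumption here, and the prefactor, barrier and temperatures are synthetic values chosen for illustration, not data from the paper.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)

def arrhenius_rate(prefactor, ea, temp_k):
    """Rate constant k = A * exp(-Ea / (R * T))."""
    return prefactor * math.exp(-ea / (R * temp_k))

def activation_energy(k1, t1, k2, t2):
    """Barrier from two rates, using ln(k2 / k1) = (Ea / R) * (1/T1 - 1/T2)."""
    return R * math.log(k2 / k1) / (1.0 / t1 - 1.0 / t2)

# Synthetic example values (hypothetical, not taken from the study).
ea_true = 50_000.0  # J/mol
k_278 = arrhenius_rate(1e9, ea_true, 278.15)
k_298 = arrhenius_rate(1e9, ea_true, 298.15)

# Recover the barrier from the two simulated rates.
ea_est = activation_energy(k_278, 278.15, k_298, 298.15)
```

In practice rates would be measured at several temperatures and the barrier taken from the slope of ln k against 1/T; the two-point form above is just the smallest case of that fit.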


Molecules ◽  
2021 ◽  
Vol 26 (18) ◽  
pp. 5544
Author(s):  
Radha Charan Dash ◽  
Kyle Hadden

Translesion synthesis (TLS) is an error-prone DNA damage tolerance mechanism used by actively replicating cells to copy past DNA lesions and extend the primer strand. TLS ensures that cells continue replication in the presence of damaged DNA bases, albeit at the expense of an increased mutation rate. Recent studies have demonstrated a clear role for TLS in rescuing cancer cells treated with first-line genotoxic agents by allowing them to replicate and survive in the presence of chemotherapy-induced DNA lesions. The importance of TLS in both the initial response to chemotherapy and the long-term development of acquired resistance has allowed it to emerge as an interesting target for small molecule drug discovery. Proper TLS function is a complicated process involving a heteroprotein complex that mediates multiple attachment and switching steps through several protein–protein interactions (PPIs). In this review, we briefly describe the importance of TLS in cancer and provide an in-depth analysis of key TLS PPIs, focusing on key structural features at the PPI interface while also exploring the potential druggability of each key PPI.


2021 ◽  
Vol 67 (3) ◽  
pp. 251-258
Author(s):  
A.E. Kniga ◽  
I.V. Polyakov ◽  
A.V. Nemukhin

Effective personalized immunotherapies of the future will need to capture not only the peculiarities of the patient’s tumor but also those of the patient’s immune response to it. In this study, using results of in vitro high-throughput specificity assays, and combining comparative models of pMHCs and TCRs using molecular docking, we have constructed all-atom models for the putative complexes of all their possible pairwise TCR-pMHC combinations. For the models obtained we have calculated a dataset of physics-based scores and have trained binary classifiers that perform better than their solely sequence-based counterparts. These structure-based classifiers pinpoint the most prominent energetic terms and structural features characterizing the type of protein-protein interactions that underlies the immune recognition of tumors by T cells.


Author(s):  
Pablo Minguez ◽  
Joaquin Dopazo

Here the authors review the state of the art in the use of protein-protein interactions (PPIs) within the context of the interpretation of genomic experiments. They report the available resources and methodologies used to create a curated compilation of PPIs, introducing a novel approach to filter interactions. Special attention is paid to the complexity of the topology of the networks formed by proteins (nodes) and pairwise interactions (edges). These networks can be studied using graph theory, and a brief introduction to the characterization of biological networks and definitions of the most commonly used network parameters is also given. A report on the available resources to perform different modes of functional profiling using PPI data is also provided, along with a discussion of the approaches that have typically been applied in this context. They also introduce a novel methodology for the evaluation of networks and some examples of its application.
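Two widely used network parameters, node degree and the local clustering coefficient, can be computed on a small undirected graph as sketched below. The abstract does not list which parameters the chapter defines, so this choice is an assumption, and the four-node network with labels A to D is a hypothetical toy example rather than real interaction data.

```python
from itertools import combinations

# Hypothetical toy PPI network: a triangle (A, B, C) plus a pendant node D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

def build_adjacency(edge_list):
    """Undirected adjacency map: node -> set of neighbouring nodes."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree(adj, node):
    """Number of interaction partners of a node."""
    return len(adj[node])

def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: undefined cases are reported as 0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

adj = build_adjacency(edges)
degrees = {node: degree(adj, node) for node in adj}
clustering = {node: clustering_coefficient(adj, node) for node in adj}
```

Here node C has the highest degree but a lower clustering coefficient than A or B, because its neighbour D is not connected to the rest of the triangle.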


2020 ◽  
Vol 21 (22) ◽  
pp. 8824
Author(s):  
Veronika Obsilova ◽  
Tomas Obsil

Phosphorylation by kinases governs many key cellular and extracellular processes, such as transcription, cell cycle progression, differentiation, secretion and apoptosis. Unsurprisingly, tight and precise kinase regulation is a prerequisite for normal cell functioning, whereas kinase dysregulation often leads to disease. Moreover, the functions of many kinases are regulated through protein–protein interactions, which in turn are mediated by phosphorylated motifs and often involve associations with the scaffolding and chaperone protein 14-3-3. Therefore, the aim of this review article is to provide an overview of the state of the art on 14-3-3-mediated kinase regulation, focusing on the most recent mechanistic insights into these important protein–protein interactions and discussing in detail both their structural aspects and functional consequences.

