Amalgamation of 3D structure and sequence information for protein–protein interaction prediction

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Kanchan Jha ◽  
Sriparna Saha

Abstract: Proteins are the primary building blocks of living organisms. They interact with other proteins and thereby take part in various biological processes. Knowledge of protein–protein interactions (PPIs) helps in understanding protein function and the causes and progression of diseases, and in designing new drugs. However, there is a vast gap between the number of available protein sequences and the number of identified protein–protein interactions. To bridge this gap, researchers have proposed several computational methods to reveal the interactions between proteins. These methods depend solely on sequence-based information. With the advancement of technology, other types of protein information have become available, such as 3D structure. Deep learning techniques have also been adopted successfully in various domains, including bioinformatics. The current work therefore uses multiple modalities, namely 3D structures and sequence-based information of proteins, together with deep learning algorithms to predict PPIs. The proposed approach is divided into several phases. We first obtain several representations of each protein from its 3D coordinates and three attributes of its amino acids (the building blocks of proteins): hydropathy index, isoelectric point, and charge. A pre-trained ResNet50 model, a type of convolutional neural network, extracts features from these representations. Autocovariance and conjoint triad, two widely used sequence-based encoding methods, provide the second modality. A stacked autoencoder produces a compact form of the sequence-based information. Finally, the features obtained from the different modalities are concatenated in pairs and fed into the classifier to predict labels for protein pairs.
We experimented on the human and Saccharomyces cerevisiae PPI datasets and compared our results with state-of-the-art deep-learning-based classifiers. The results achieved by the proposed method are superior to those obtained by the existing methods. Extensive experiments on different datasets indicate that our approach to learning and combining features from two different modalities is useful in PPI prediction.
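The conjoint triad encoding and the pairwise concatenation step mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the seven residue classes are the grouping commonly used for conjoint triads and are assumed here, and the function names `conjoint_triad` and `pair_features` are hypothetical.

```python
from itertools import product

# Seven residue classes commonly used for the conjoint triad encoding
# (grouped by dipole and side-chain volume). Assumed, not stated in the abstract.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}

# All 7 * 7 * 7 = 343 possible class triads, in a fixed order.
TRIADS = list(product(range(7), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad(sequence):
    """Return a 343-dimensional triad count vector scaled as (count - min) / max."""
    counts = [0] * len(TRIADS)
    classes = [CLASS_OF.get(aa) for aa in sequence.upper()]
    for triad in zip(classes, classes[1:], classes[2:]):
        if None in triad:  # skip windows containing non-standard residues
            continue
        counts[TRIAD_INDEX[triad]] += 1
    lo, hi = min(counts), max(counts)
    if hi == 0:  # sequence too short or no valid window
        return [0.0] * len(counts)
    return [(c - lo) / hi for c in counts]


def pair_features(seq_a, seq_b):
    """Concatenate the encodings of two proteins into one pair vector."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)
```

In the described pipeline, such sequence vectors would first pass through the stacked autoencoder and be combined with the ResNet50 structure features before classification; the sketch covers only the encoding and concatenation.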

2019 ◽  
Vol 20 (4) ◽  
pp. 978 ◽  
Author(s):  
Zhao-Hui Zhan ◽  
Li-Na Jia ◽  
Yong Zhou ◽  
Li-Ping Li ◽  
Hai-Cheng Yi

The interactions between ncRNAs and proteins are critical for regulating various cellular processes in organisms, such as gene expression. However, because current experimental methods for identifying ncRNA–protein interactions are costly in money and materials, a practical computational approach with convincing prediction accuracy is needed. In this study, we put forward an effective deep learning method based on protein sequences, named BGFE, to predict ncRNA–protein interactions. Protein sequences are represented by bi-gram probability features extracted from the Position Specific Scoring Matrix (PSSM), and ncRNA sequences are represented by k-mer sparse matrices. Furthermore, to extract hidden high-level features, a stacked auto-encoder network is employed with a stacked ensemble integration strategy. We evaluate the proposed method on three datasets using five-fold cross-validation, classifying the features with a random forest classifier. The experimental results demonstrate the effectiveness and prediction accuracy of our approach. In general, the proposed method is helpful for predicting ncRNA–protein interactions and provides useful guidance for future biological research.
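As a rough illustration of the bi-gram feature extraction from a PSSM, the sketch below uses the usual definition B[m][n] = sum over i of P[i][m] * P[i+1][n], which yields a 20 x 20 matrix (400 features) for a standard PSSM. The function name `bigram_features` is hypothetical, and the input is assumed to be a list of rows whose entries have already been converted to probabilities; the abstract does not specify BGFE's exact preprocessing.

```python
def bigram_features(pssm):
    """Flattened bi-gram matrix: B[m][n] = sum_i P[i][m] * P[i+1][n].

    `pssm` is a list of rows, one per sequence position, each holding one
    value per amino-acid column (20 for a standard PSSM).
    """
    if not pssm:
        return []
    width = len(pssm[0])
    bigram = [[0.0] * width for _ in range(width)]
    # Walk over consecutive position pairs (i, i + 1).
    for row, next_row in zip(pssm, pssm[1:]):
        for m, p_m in enumerate(row):
            for n, p_n in enumerate(next_row):
                bigram[m][n] += p_m * p_n
    # Flatten row by row into a single feature vector.
    return [value for line in bigram for value in line]
```

The output length depends only on the number of columns, so proteins of different lengths map to vectors of the same size.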


2017 ◽  
Author(s):  
Mohammad Nauman ◽  
Hafeez Ur Rehman ◽  
Gianfranco Politano ◽  
Alfredo Benso

Abstract: Accurate annotation of protein functions is important for a profound understanding of molecular biology. A large number of proteins remain uncharacterized because of the sparsity of available supporting information. For a large set of uncharacterized proteins, the only type of information available is their amino acid sequence. In this paper, we propose DeepSeq, a deep learning architecture that uses only protein sequence information to predict the functions associated with a protein. The prediction process does not require handcrafted features; rather, the architecture automatically extracts representations from the input sequence data. Results of our experiments with DeepSeq indicate significant improvements in prediction accuracy when compared with other sequence-based methods. Our deep learning model achieves an overall validation accuracy of 86.72%, with an F1 score of 71.13%. Moreover, using the automatically learned features and without any changes to DeepSeq, we successfully solved a different problem, namely protein function localization, with no human intervention. Finally, we discuss how this same architecture can be used to solve even more complicated problems such as prediction of 2D and 3D structure as well as protein-protein interactions.


2020 ◽  
Author(s):  
Mayank Baranwal ◽  
Abram Magner ◽  
Jacob Saldinger ◽  
Emine S. Turali-Emre ◽  
Shivani Kozarekar ◽  
...  

Abstract: Development of new methods for analysis of protein-protein interactions (PPIs) at molecular and nanometer scales gives insights into intracellular signaling pathways and will improve understanding of protein functions, as well as other nanoscale structures of biological and abiological origins. Recent advances in computational tools, particularly the ones involving modern deep learning algorithms, have been shown to complement experimental approaches for describing and rationalizing PPIs. However, most of the existing works on PPI prediction use protein-sequence information, and thus have difficulties in accounting for the three-dimensional organization of the protein chains. In this study, we address this problem and describe a PPI analysis method based on a graph attention network, named Struct2Graph, for identifying PPIs directly from the structural data of folded protein globules. Our method is capable of predicting PPIs with an accuracy of 98.89% on a balanced set consisting of an equal number of positive and negative pairs. On an unbalanced set with a ratio of 1:10 between positive and negative pairs, Struct2Graph achieves a five-fold cross-validation average accuracy of 99.42%. Moreover, the unsupervised prediction of interaction sites by Struct2Graph for phenol-soluble modulins is found to be in concordance with the previously reported binding sites for this family.
Author summary: PPIs are the central part of signal transduction, metabolic regulation, environmental sensing, and cellular organization. Despite their success, most strategies to decode PPIs use sequence-based approaches, which do not generalize to broader classes of chemical compounds of similar scale to proteins that are equally capable of forming complexes with proteins but are not based on amino acids, and thus lack an equivalent sequence-based representation. Here, we address the problem of PPI prediction using a first-of-its-kind, 3D-structure-based graph attention network (available at https://github.com/baranwa2/Struct2Graph). In addition to its excellent prediction performance, the novel mutual attention mechanism provides insights into likely interaction sites through its knowledge selection process in a completely unsupervised manner.
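A graph-based model needs the 3D structure expressed as a graph. One plausible way to do this, sketched below, is to link residues whose representative coordinates lie within a distance cutoff. This is an assumption for illustration only: the abstract does not describe how Struct2Graph builds its graphs, and both the function name `contact_graph` and the 8.0 cutoff are hypothetical.

```python
import math


def contact_graph(coords, cutoff=8.0):
    """Adjacency list linking residues whose coordinates lie within `cutoff`.

    `coords` holds one (x, y, z) tuple per residue, for example C-alpha
    positions; the cutoff uses the same length unit as the coordinates.
    """
    neighbors = {i: [] for i in range(len(coords))}
    for i, point in enumerate(coords):
        for j in range(i + 1, len(coords)):
            if math.dist(point, coords[j]) <= cutoff:
                # The graph is undirected, so record the edge both ways.
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors
```

An attention network would then operate on node features attached to this adjacency structure.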


Molecules ◽  
2020 ◽  
Vol 25 (8) ◽  
pp. 1841 ◽  
Author(s):  
Da Xu ◽  
Hanxiao Xu ◽  
Yusen Zhang ◽  
Wei Chen ◽  
Rui Gao

Identification of protein-protein interactions (PPIs) plays an essential role in the understanding of protein functions and cellular biological activities. However, the traditional experiment-based methods are time-consuming and laborious. Therefore, developing new reliable computational approaches has great practical significance for the identification of PPIs. In this paper, a novel method named PPI-GE is proposed for predicting PPIs using graph energy. For feature extraction, we designed two new methods: the physicochemical graph energy, based on the ionization equilibrium constant and isoelectric point, and the contact graph energy, based on the contact information of amino acids. The dipeptide composition method was used to capture the order information of amino acids. After multi-information fusion, principal component analysis (PCA) was applied to eliminate noise, and a robust weighted sparse representation-based classification (WSRC) classifier was applied for sample classification. The prediction accuracies based on five-fold cross-validation on the human, Helicobacter pylori (H. pylori), and yeast data sets were 99.49%, 97.15%, and 99.56%, respectively. In addition, on five independent data sets and two significant PPI networks, the comparative experimental results also demonstrate that PPI-GE performed better than the compared methods.
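Of the features named above, dipeptide composition is the simplest to illustrate: it is the relative frequency of each of the 400 ordered amino-acid pairs among adjacent residues. The sketch below is a minimal version under that standard definition; the function name `dipeptide_composition` is hypothetical, and skipping pairs that contain non-standard residues is an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 * 20 = 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return the 400-dimensional vector of adjacent-pair frequencies."""
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for first, second in zip(seq, seq[1:]):
        idx = DIPEPTIDE_INDEX.get(first + second)
        if idx is not None:  # ignore pairs with non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]
```

For the sequence `ACAC` the adjacent pairs are AC, CA, and AC, so the AC entry is 2/3 and the CA entry is 1/3.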


Symmetry ◽  
2019 ◽  
Vol 11 (4) ◽  
pp. 558 ◽  
Author(s):  
Elena Lenci ◽  
Andrea Trabocchi

Natural and nonnatural amino acids represent important building blocks for the development of peptidomimetic scaffolds, especially for targeting proteolytic enzymes and for addressing protein–protein interactions. Among all the different amino acid derivatives, proline is particularly relevant in chemical biology and medicinal chemistry because it induces and stabilizes secondary structure. Also, the pyrrolidine ring is a conformationally constrained template that can direct appendages into specific clefts of the enzyme binding site. Thus, many papers have appeared in the literature focusing on the use of proline and its derivatives as scaffolds for medicinal chemistry applications. In this review paper, an insight into the different biological outcomes of d-proline and l-proline in enzyme inhibitors is presented, especially when associated with matrix metalloprotease and metallo-β-lactamase enzymes.


Life ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 1171
Author(s):  
Stefano Rosa ◽  
Chiara Bertaso ◽  
Paolo Pesaresi ◽  
Simona Masiero ◽  
Andrea Tagliani

Protein-protein interactions (PPIs) contribute to regulating many aspects of cell physiology and metabolism. Protein domains involved in PPIs are important building blocks for engineering genetic circuits through synthetic biology. These domains can be obtained from known proteins and rationally engineered to produce orthogonal scaffolds, or computationally designed de novo thanks to recent advances in structural biology and molecular dynamics prediction. Such circuits based on PPIs (or protein circuits) are of particular interest, as they can directly affect transcriptional outputs, as well as induce behavioral/adaptational changes in cell metabolism, without the need for further protein synthesis. This last property was highlighted in recent works as enabling fast-responding circuits that can be exploited for biosensing and diagnostics. Notably, PPIs can also be engineered to develop new drugs able to bind specific intra- and extra-cellular targets. In this review, we summarize recent findings in the field of protein circuit design, with particular focus on the use of peptides as scaffolds to engineer these circuits.


2019 ◽  
Author(s):  
Yi Guo ◽  
Xiang Chen

Abstract. Motivation: Almost all critical functions and processes in cells are sustained by the cellular networks of protein-protein interactions (PPIs); understanding these is therefore crucial in the investigation of biological systems. Despite all past efforts, we still lack high-quality PPI data for constructing the networks, which makes it challenging to study the functions of association of proteins. High-throughput experimental techniques have produced abundant data for systematically studying the cellular networks of a biological system and for developing computational methods for PPI identification. Results: We have developed a deep learning-based framework, named iPPI, for accurately predicting PPIs on a proteome-wide scale using only sequence information. iPPI integrates the amino acid properties and compositions of protein sequences into a unified prediction framework using a hybrid deep neural network. Extensive tests demonstrated that iPPI can greatly outperform the state-of-the-art prediction methods in identifying PPIs. In addition, the iPPI prediction score can be related to the strength of protein-protein binding affinity, which further shows the biological relevance of our deep learning framework for identifying PPIs. Availability and Implementation: iPPI is available as open-source software and can be downloaded from https://github.com/model-lab/[email protected]


Entropy ◽  
2019 ◽  
Vol 21 (11) ◽  
pp. 1090 ◽  
Author(s):  
Edwin Rodriguez Horta ◽  
Pierre Barrat-Charlaix ◽  
Martin Weigt

Global coevolutionary models of protein families have become increasingly popular due to their capacity to predict residue–residue contacts from sequence information, but also to predict fitness effects of amino acid substitutions or to infer protein–protein interactions. The central idea in these models is to construct a probability distribution, a Potts model, that reproduces single and pairwise frequencies of amino acids found in natural sequences of the protein family. This approach treats sequences from the family as independent samples, completely ignoring phylogenetic relations between them. This simplification is known to lead to potentially biased estimates of the parameters of the model, decreasing their biological relevance. Current workarounds for this problem, such as reweighting sequences, are poorly understood and not principled. Here, we propose an inference scheme that takes the phylogeny of a protein family into account in order to correct biases in estimating the frequencies of amino acids. Using artificial data, we show that a Potts model inferred using these corrected frequencies performs better in predicting contacts and fitness effects of mutations. Initial, only partially successful tests on real protein data are also presented.
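The single and pairwise frequencies that a Potts model is fitted to, together with the sequence-reweighting workaround mentioned above, can be sketched as follows. This is not the phylogeny-aware correction proposed in the paper. The function names are hypothetical, the 0.8 identity threshold is an assumed, commonly used value, and the alignment is assumed to be non-empty with sequences of equal length.

```python
from collections import defaultdict


def sequence_weights(alignment, identity_threshold=0.8):
    """Weight each sequence by 1 / (number of sequences at least this similar)."""
    weights = []
    for seq in alignment:
        # The count includes the sequence itself, so it is always >= 1.
        similar = sum(
            1
            for other in alignment
            if sum(a == b for a, b in zip(seq, other))
            >= identity_threshold * len(seq)
        )
        weights.append(1.0 / similar)
    return weights


def empirical_frequencies(alignment, weights=None):
    """Weighted single-site f_i(a) and pairwise f_ij(a, b) frequencies."""
    if weights is None:
        weights = [1.0] * len(alignment)  # treat sequences as independent
    total = sum(weights)
    length = len(alignment[0])
    single = defaultdict(float)
    pair = defaultdict(float)
    for seq, w in zip(alignment, weights):
        for i in range(length):
            single[(i, seq[i])] += w / total
            for j in range(i + 1, length):
                pair[(i, j, seq[i], seq[j])] += w / total
    return single, pair
```

With the toy alignment `["AAAA", "AAAA", "CCCC"]`, the unweighted frequency of `A` at the first position is 2/3. Reweighting halves the contribution of the two identical sequences and brings it to 1/2, which shows how closely related sequences can bias the raw estimate.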


2020 ◽  
Vol 27 (5) ◽  
pp. 359-369 ◽  
Author(s):  
Cheng Shi ◽  
Jiaxing Chen ◽  
Xinyue Kang ◽  
Guiling Zhao ◽  
Xingzhen Lao ◽  
...  

Protein-related interaction prediction is critical to understanding life processes, biological functions, and mechanisms of drug action. Experimental methods used to determine protein-related interactions have always been costly and inefficient. In recent years, advances in biological and medical technology have provided an explosion of biological and physiological data, and deep learning-based algorithms have shown great promise in extracting features and learning patterns from complex data. Deep learning has consequently emerged in protein research. In this review, we provide an introductory overview of deep neural network theory and its unique properties. We focus mainly on the application of this technology to protein-related interaction prediction over the past five years, including the prediction of protein–protein, protein–RNA/DNA, protein–drug, and other interactions. Finally, we discuss some of the challenges that deep learning currently faces.


2020 ◽  
Vol 27 ◽  
Author(s):  
Marian Vincenzi ◽  
Flavia Anna Mercurio ◽  
Marilisa Leone

Background: NMR spectroscopy is one of the most powerful tools to study the structure and interaction properties of peptides and proteins from a dynamic perspective. Knowing the bioactive conformations of peptides is crucial in the drug discovery field to design more efficient analogue ligands and inhibitors of protein-protein interactions targeting therapeutically relevant systems. Objective: This review provides a toolkit to investigate peptide conformational properties by NMR. Methods: The articles cited herein, related to NMR studies of peptides and proteins, were mainly found through PubMed and the web. Both recent and older books on NMR spectroscopy written by eminent scientists in the field were consulted as well. Results: The review is mainly focused on NMR tools to obtain the 3D structure of small unlabeled peptides. It is application-oriented, and delivering a profound theoretical background is beyond its goal. However, the basic principles of 2D homonuclear and heteronuclear experiments are briefly described. Protocols to obtain isotopically labeled peptides, and the principal triple resonance experiments needed to study them, are discussed as well. Conclusion: NMR is a leading technique in the study of conformational preferences of small flexible peptides whose structure can often be described only by an ensemble of conformations. Although NMR studies of peptides can be performed easily and quickly with canonical protocols established a few decades ago, more recently we have witnessed tremendous improvements in NMR spectroscopy that allow the investigation of large systems and overcome its molecular weight limit.

