Protein features identification for machine learning-based prediction of protein-protein interactions

2017 ◽  
Author(s):  
Khalid Raza

Abstract: The long-awaited challenge of the post-genomic era and systems biology research is the computational prediction of protein-protein interactions (PPIs), which ultimately leads to the prediction of protein functions. The important research questions are how protein complexes with known sequence and structure can be used to identify and classify protein binding sites, and how to infer knowledge from these classifications, such as predicting PPIs of proteins with unknown sequence and structure. Several machine learning techniques have been applied to the prediction of PPIs, but the accuracy of their predictions depends wholly on the number of features used for training. In this paper, we survey the protein features used for the prediction of PPIs. Open research challenges and opportunities in the area are also discussed.
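
As a rough illustration of what a protein feature can look like in practice, the sketch below computes amino acid composition, a widely used sequence-derived descriptor. The abstract does not name this feature, so its choice here is an assumption, as are the function names `amino_acid_composition` and `pair_features` and the simple concatenation used to represent a protein pair.

```python
# Hypothetical sketch: amino acid composition as a sequence-derived feature.
# The feature choice and all names here are assumptions, not the paper's method.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


def pair_features(seq_a, seq_b):
    """Concatenate the two composition vectors into one 40-dimensional vector."""
    return amino_acid_composition(seq_a) + amino_acid_composition(seq_b)
```

Concatenation makes the vector depend on the order of the two proteins; a symmetric combination (for example, an element-wise sum) would avoid that at the cost of losing per-protein detail.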

Author(s):  
Byung-Hoon Park ◽  
Phuongan Dam ◽  
Chongle Pan ◽  
Ying Xu ◽  
Al Geist ◽  
...  

Protein-protein interactions are fundamental to cellular processes. They are responsible for phenomena such as DNA replication, gene transcription, protein translation, regulation of metabolic pathways, immunologic recognition, and signal transduction. The identification of interacting proteins is therefore an important prerequisite for understanding their physiological functions. Because these interactions are central to so many biophysical activities, reliable computational methods to infer protein-protein interactions from either structural data or genome sequences are in heavy demand. Successful predictions, for instance, will facilitate drug design and the reconstruction of metabolic or regulatory networks. In this chapter, we review: (a) high-throughput experimental methods for identification of protein-protein interactions, (b) existing databases of protein-protein interactions, (c) computational approaches to predicting protein-protein interactions at both residue and protein levels, (d) various statistical and machine learning techniques to model protein-protein interactions, and (e) applications of protein-protein interactions in predicting protein functions. We also discuss intrinsic drawbacks of the existing approaches and future research directions.


2019 ◽  
Vol 26 (8) ◽  
pp. 601-619 ◽  
Author(s):  
Amit Sagar ◽  
Bin Xue

The interactions between RNAs and proteins play critical roles in many biological processes. Characterizing these interactions is therefore critical for mechanistic, biomedical, and clinical studies. Many experimental methods can be used to determine different aspects of RNA-protein interactions. However, because RNA-protein interactions are tissue-specific and condition-specific, and because these interactions are weak and frequently compete with each other, those experimental techniques cannot be fully exploited to discover the complete spectrum of RNA-protein interactions. To mitigate these issues, continuous efforts have been devoted to developing high-quality computational techniques to study the interactions between RNAs and proteins. Much important progress has been achieved with the application of novel techniques and strategies, such as machine learning. In particular, with the development and application of CLIP techniques, more and more experimental data on RNA-protein interactions under specific biological conditions are becoming available. Together, these CLIP data provide a rich source for developing advanced machine learning predictors. In this review, recent progress on computational predictors for RNA-protein interactions is summarized in terms of datasets, prediction strategies, and input features. Possible future developments are also discussed at the end of the review.


2018 ◽  
Vol 2 (3) ◽  
pp. 228-267 ◽  
Author(s):  
Zaidi ◽  
Chandola ◽  
Allen ◽  
Sanyal ◽  
Stewart ◽  
...  

Modeling the interactions of water and energy systems is important to the enforcement of infrastructure security and system sustainability. To this end, recent technological advancement has allowed the production of large volumes of data associated with functioning of these sectors. We are beginning to see that statistical and machine learning techniques can help elucidate characteristic patterns across these systems from water availability, transport, and use to energy generation, fuel supply, and customer demand, and in the interdependencies among these systems that can leave these systems vulnerable to cascading impacts from single disruptions. In this paper, we discuss ways in which data and machine learning can be applied to the challenges facing the energy-water nexus along with the potential issues associated with the machine learning techniques themselves. We then survey machine learning techniques that have found application to date in energy-water nexus problems. We conclude by outlining future research directions and opportunities for collaboration among the energy-water nexus and machine learning communities that can lead to mutual synergistic advantage.


2007 ◽  
Vol 4 (3) ◽  
pp. 208-223 ◽  
Author(s):  
José A. Reyes ◽  
David Gilbert

Summary: This research addresses the problem of predicting protein-protein interactions (PPI) when integrating diverse kinds of biological information. This task has commonly been viewed as a binary classification problem (whether any two proteins do or do not interact), and several different machine learning techniques have been employed to solve it. However, the nature of the data creates two major problems which can affect results. First, the classes are imbalanced, because the number of positive examples (pairs of proteins which really interact) is much smaller than the number of negative ones. Second, the selection of negative examples can be based on unreliable assumptions, which could introduce bias in the classification results. Here we propose the use of one-class classification (OCC) methods to deal with the task of predicting PPI. OCC methods utilise examples of just one class to generate a predictive model, which consequently is independent of the kind of negative examples selected; additionally, these approaches are known to cope with imbalanced class problems. We have designed and carried out a performance evaluation study of several OCC methods for this task, and have found that the Parzen density estimation approach outperforms the rest. We also undertook a comparative performance evaluation between the Parzen OCC method and several conventional learning techniques, considering different scenarios, for example varying the number of negative examples used for training purposes. We found that the Parzen OCC method in general performs competitively with traditional approaches and in many situations outperforms them. Finally, we evaluated the ability of the Parzen OCC approach to predict new potential PPI targets, and validated these results by searching for biological evidence in the literature.
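
The one-class idea described above can be sketched with a Gaussian Parzen-window density estimate fitted on positive examples only. This is a minimal illustration, not the authors' implementation: the class name `ParzenOneClass`, the Gaussian kernel, the fixed `bandwidth`, and the rule of placing the threshold at a quantile of the training scores (`reject_fraction`) are all assumptions.

```python
import numpy as np


def parzen_log_density(train, queries, bandwidth):
    """Log of a Gaussian Parzen-window density estimate at each query point."""
    train = np.asarray(train, dtype=float)
    queries = np.asarray(queries, dtype=float)
    dim = train.shape[1]
    # Squared Euclidean distance between every query and every training point.
    diff = queries[:, None, :] - train[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    log_kernel = -0.5 * sq_dist / bandwidth ** 2
    log_norm = -0.5 * dim * np.log(2.0 * np.pi * bandwidth ** 2)
    # Numerically stable log of the mean kernel value over training points.
    peak = log_kernel.max(axis=1, keepdims=True)
    log_mean = peak[:, 0] + np.log(np.mean(np.exp(log_kernel - peak), axis=1))
    return log_norm + log_mean


class ParzenOneClass:
    """Hypothetical one-class classifier trained on positive examples only."""

    def __init__(self, bandwidth=1.0, reject_fraction=0.05):
        self.bandwidth = bandwidth
        self.reject_fraction = reject_fraction

    def fit(self, positives):
        self.train_ = np.asarray(positives, dtype=float)
        # Score the training points themselves (each includes its own kernel,
        # which is acceptable for a sketch) and set the threshold at a quantile.
        scores = parzen_log_density(self.train_, self.train_, self.bandwidth)
        self.threshold_ = np.quantile(scores, self.reject_fraction)
        return self

    def predict(self, samples):
        """True where a sample is dense enough to be accepted as the target class."""
        scores = parzen_log_density(self.train_, samples, self.bandwidth)
        return scores >= self.threshold_
```

Because no negative examples enter `fit`, the model does not depend on how non-interacting pairs are chosen; in practice the bandwidth would need tuning, for example by cross-validated likelihood.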


Author(s):  
Piyali Chatterjee ◽  
Subhadip Basu ◽  
Mahantapas Kundu ◽  
Mita Nasipuri ◽  
Dariusz Plewczynski

Abstract: Protein-protein interactions (PPI) control most of the biological processes in a living cell. To fully understand protein functions, knowledge of protein-protein interactions is necessary. Prediction of PPI is challenging, especially when the three-dimensional structure of the interacting partners is not known. Recently, a novel prediction method was proposed that exploits physical interactions of constituent domains. We propose here a novel knowledge-based prediction method, namely PPI_SVM, which predicts interactions between two protein sequences by exploiting their domain information. We trained a two-class support vector machine on the benchmarking set of pairs of interacting proteins extracted from the Database of Interacting Proteins (DIP). The method considers all possible combinations of constituent domains between two protein sequences, unlike most of the existing approaches. Moreover, it deals with both single-domain and multi-domain proteins; therefore it can be applied to the whole proteome in high-throughput studies. Our machine learning classifier, following a brainstorming approach, achieves accuracy of 86%, with specificity of 95% and sensitivity of 75%, which are better results than most previous methods that sacrifice recall in order to boost overall precision. Our method has on average better sensitivity combined with good selectivity on the benchmarking dataset. The PPI_SVM source code, train/test datasets and supplementary files are freely available in the public domain at: http://code.google.com/p/cmater-bioinfo/.
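
The idea of considering all combinations of constituent domains between two proteins can be sketched as follows. This is only an illustration under stated assumptions: the function names, the unordered treatment of each domain pair, and the binary encoding against a fixed vocabulary are hypothetical and are not taken from the PPI_SVM source code.

```python
from itertools import product


def domain_pair_features(domains_a, domains_b):
    """Return every unordered domain pair formed across two proteins."""
    # Sorting each pair makes (x, y) and (y, x) the same feature, so the
    # result does not depend on which protein is listed first.
    return {tuple(sorted(pair)) for pair in product(set(domains_a), set(domains_b))}


def encode(pairs, vocabulary):
    """Binary vector: 1 where a vocabulary domain pair occurs in the protein pair."""
    return [1 if entry in pairs else 0 for entry in vocabulary]


# Usage with made-up domain identifiers (hypothetical, for illustration only).
pairs = domain_pair_features(["domA", "domB"], ["domC"])
vector = encode(pairs, [("domA", "domC"), ("domA", "domD")])
```

A single-domain protein simply contributes one domain to the product, so single-domain and multi-domain proteins are handled by the same code path. A vector of this kind could then serve as input to a two-class classifier such as a support vector machine.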


2020 ◽  
Author(s):  
Chen Xu ◽  
Bing Wang ◽  
Lin Yang ◽  
Lucas Zhongming Hu ◽  
Lanxing Yi ◽  
...  

Abstract: Synechocystis sp. PCC 6803 (hereafter: Synechocystis) is a model organism for studying photosynthesis, energy metabolism, and environmental stress. Though known as the first fully sequenced phototrophic organism, Synechocystis still has almost half of its proteome without functional annotations. In this study, we obtained 291 protein complexes, including 24,092 protein-protein interactions (PPIs) among 2062 proteins, by using co-fractionation and LC/MS/MS. This additional level of PPI information not only revealed the roles of photosynthesis in metabolism, cell motility, DNA repair, cell division, and other physiological processes, but also showed how protein functions vary from bacteria to higher plants due to changed interaction partners. It also allows us to uncover functions of hypothetical proteins, such as Sll0445, Sll0446, and Sll0447 participating in photosynthesis and cell motility, and Sll1334 regulating the expression of fatty acid. Here we present the most extensive protein interaction data in Synechocystis so far, which might provide critical insights into fundamental molecular mechanisms in cyanobacteria.


Author(s):  
Varsha D Badal ◽  
Petras J Kundrotas ◽  
Ilya A Vakser

Abstract. Motivation: Procedures for structural modeling of protein-protein complexes (protein docking) produce a number of models which need to be further analyzed and scored. Scoring can be based on independently determined constraints on the structure of the complex, such as knowledge of amino acids essential for the protein interaction. Previously, we showed that text mining of residues in freely available PubMed abstracts of papers on studies of protein-protein interactions may generate such constraints. However, the absence of post-processing of the spotted residues reduced the usability of the constraints, as a significant number of the residues were not relevant for the binding of the specific proteins. Results: We explored filtering of the irrelevant residues by two machine learning approaches, Deep Recursive Neural Network (DRNN) and Support Vector Machine (SVM) models, with different training/testing schemes. The results showed that the DRNN model is superior to the SVM model when training is performed on the PMC-OA full-text articles and applied to classification (interface or non-interface) of the residues spotted in the PubMed abstracts. When both training and testing are performed on full-text articles or on abstracts, the performance of these models is similar. Thus, in such cases, there is no need to use the DRNN approach, which is computationally expensive, especially at the training stage. The reason is that SVM success is often determined by the similarity in data/text patterns in the training and the testing sets, whereas the sentence structures in the abstracts are, in general, different from those in the full-text articles. Availability: The code and the datasets generated in this study are available at https://gitlab.ku.edu/vakser-lab-public/text-mining/-/tree/2020-09-04. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Matthew C. Smith ◽  
Jason E. Gestwicki

Protein–protein interactions (PPIs) control the assembly of multi-protein complexes and, thus, these contacts have enormous potential as drug targets. However, the field has produced a mix of both exciting success stories and frustrating challenges. Here, we review known examples and explore how the physical features of a PPI, such as its affinity, hotspots, off-rates, buried surface area and topology, might influence the chances of success in finding inhibitors. This analysis suggests that concise, tight-binding PPIs are most amenable to inhibition. However, it is also clear that emerging technical methods are expanding the repertoire of ‘druggable’ protein contacts and improving the odds of success against difficult targets. In particular, natural product-like compound libraries, high-throughput screens specifically designed for PPIs and approaches that favour discovery of allosteric inhibitors appear to be attractive routes. The first group of PPI inhibitors has entered clinical trials, further motivating the need to understand the challenges and opportunities in pursuing these types of targets.

