Prediction of protein-protein interactions using one-class classification methods and integrating diverse biological data

Summary: This research addresses the problem of predicting protein-protein interactions (PPI) when integrating diverse kinds of biological information. This task has commonly been viewed as a binary classification problem (whether any two proteins do or do not interact), and several different machine learning techniques have been employed to solve it. However, the nature of the data creates two major problems that can affect results. First, the classes are imbalanced, because the number of positive examples (pairs of proteins that really interact) is much smaller than the number of negative ones. Second, the selection of negative examples can be based on unreliable assumptions, which could introduce bias into the classification results. Here we propose the use of one-class classification (OCC) methods for the task of PPI prediction. OCC methods utilise examples of just one class to generate a predictive model, which is consequently independent of the kind of negative examples selected; additionally, these approaches are known to cope with imbalanced class problems. We designed and carried out a performance evaluation study of several OCC methods for this task, and found that the Parzen density estimation approach outperforms the rest. We also undertook a comparative performance evaluation between the Parzen OCC method and several conventional learning techniques under different scenarios, for example varying the number of negative examples used for training. We found that the Parzen OCC method in general performs competitively with traditional approaches and in many situations outperforms them. Finally, we evaluated the ability of the Parzen OCC approach to predict new potential PPI targets, and validated these results by searching for biological evidence in the literature.
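The Parzen one-class idea summarised above (fit a density on positive pairs only, then reject anything whose density is too low) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian kernel, the fixed bandwidth, the rejection-rate threshold and the class name `ParzenOneClass` are all assumptions, and random synthetic vectors stand in for the integrated biological features of protein pairs.

```python
import numpy as np


class ParzenOneClass:
    """Hypothetical Parzen-window one-class classifier with a Gaussian kernel.

    Only positive examples (interacting protein pairs) are used for fitting.
    A query is accepted when its estimated density reaches a threshold set so
    that roughly a fraction `reject` of the training positives fall below it.
    """

    def __init__(self, bandwidth=1.0, reject=0.05):
        self.bandwidth = bandwidth  # kernel width h (assumed; tune in practice)
        self.reject = reject        # target fraction of positives rejected

    def _density(self, X, leave_one_out=False):
        # Squared Euclidean distance from each query to each stored positive.
        d2 = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=-1)
        # Unnormalized Gaussian kernel; the constant factor does not change
        # the ranking or the thresholding, so it is omitted.
        K = np.exp(-d2 / (2.0 * self.bandwidth ** 2))
        if leave_one_out:
            np.fill_diagonal(K, 0.0)  # drop each training point's own kernel
            return K.sum(axis=1) / (len(self.X_) - 1)
        return K.mean(axis=1)

    def fit(self, X_pos):
        self.X_ = np.asarray(X_pos, dtype=float)
        # Leave-one-out scores stop every positive from trivially scoring itself.
        scores = self._density(self.X_, leave_one_out=True)
        self.threshold_ = np.quantile(scores, self.reject)
        return self

    def decision_function(self, X):
        return self._density(np.asarray(X, dtype=float)) - self.threshold_

    def predict(self, X):
        # 1 = predicted interaction (target class), 0 = rejected as outlier.
        return (self.decision_function(X) >= 0).astype(int)


# Usage with synthetic stand-in features (no real PPI data involved).
rng = np.random.default_rng(0)
positives = rng.normal(size=(200, 3))
clf = ParzenOneClass(bandwidth=0.5, reject=0.05).fit(positives)
queries = np.array([[0.0, 0.0, 0.0], [6.0, 6.0, 6.0]])
print(clf.predict(queries))  # expected: [1 0]
```

Because no negative examples enter `fit`, the model cannot absorb bias from how negatives were chosen, which is the property the summary highlights; the price is that the bandwidth and rejection rate must be chosen without a labelled negative class.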

Machine-learning techniques for the prediction of protein–protein interactions

Journal of Biosciences

10.1007/s12038-019-9909-z

2019

Vol 44 (4)

Cited By ~ 5

Author(s): Debasree Sarkar, Sudipto Saha

Keyword(s): Machine Learning, Protein Interactions, Machine Learning Techniques, Protein Protein Interactions, Learning Techniques

Combining One-Class Classification Models Based on Diverse Biological Data for Prediction of Protein-Protein Interactions

Lecture Notes in Computer Science - Data Integration in the Life Sciences

10.1007/978-3-540-69828-9_18

2008

pp. 177-191

Cited By ~ 1

Author(s): José A. Reyes, David Gilbert

Keyword(s): Protein Interactions, Biological Data, Classification Models, Protein Protein Interactions, One Class Classification

Protein features identification for machine learning-based prediction of protein-protein interactions

10.1101/137257

2017

Cited By ~ 1

Author(s): Khalid Raza

Keyword(s): Machine Learning, Protein Interactions, Protein Complexes, Computational Prediction, Machine Learning Techniques, Protein Protein Interactions, Learning Techniques, Challenges And Opportunities, Protein Functions, Unknown Sequence

Abstract: The long-awaited challenge of the post-genomic era and of systems biology research is the computational prediction of protein-protein interactions (PPIs), which ultimately leads to the prediction of protein functions. The important research questions are how protein complexes with known sequence and structure can be used to identify and classify protein binding sites, and how knowledge inferred from these classifications can be used to predict PPIs of proteins with unknown sequence and structure. Several machine learning techniques have been applied to the prediction of PPIs, but the accuracy of their predictions depends wholly on the number of features used for training. In this paper, we survey the protein features used for the prediction of PPIs. The open research challenges and opportunities in the area are also discussed.
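As a concrete instance of a sequence-derived protein feature of the kind surveyed above, the sketch below computes amino acid composition for each protein and combines the two proteins of a pair into one vector. Amino acid composition is only one simple descriptor and is not claimed to be the survey's recommendation; the symmetric sum/absolute-difference pairing and the function names `aa_composition` and `pair_features` are illustrative assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Fraction of each standard amino acid in `seq` (20 values summing to 1)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(seq_a, seq_b):
    """Order-independent 40-value feature vector for a protein pair (A, B)."""
    fa, fb = aa_composition(seq_a), aa_composition(seq_b)
    # Sum and absolute difference are symmetric, so (A, B) and (B, A)
    # map to the same vector, as an undirected interaction requires.
    return [x + y for x, y in zip(fa, fb)] + [abs(x - y) for x, y in zip(fa, fb)]


v = pair_features("MKTAYIAKQR", "GAVLKEEL")
print(len(v))  # 40
```

A symmetric encoding is chosen here because a PPI between A and B is the same event as one between B and A; simply concatenating the two composition vectors would make the classifier sensitive to pair order.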

Prediction of Disease Comorbidity Using HeteSim Scores based on Multiple Heterogeneous Networks

Current Gene Therapy

10.2174/1566523219666190917155959

2019

Vol 19 (4)

pp. 232-241

Cited By ~ 5

Author(s): Xuegong Chen, Wanwan Shi, Lei Deng

Keyword(s): Protein Interactions, Experimental Studies, Treatment Strategies, Computational Method, Biological Information, Support Vector, Protein Protein Interactions, Efficient Treatment, Disease Associations, Previous State

Background: Accumulating experimental studies have indicated that disease comorbidity causes additional pain to patients and leads to the failure of standard treatments, compared with patients who have a single disease. Therefore, accurate prediction of potential comorbidity is essential for designing more efficient treatment strategies. However, only a few disease comorbidities have been discovered in the clinic. Objective: In this work, we propose PCHS, an effective computational method for predicting disease comorbidity. Materials and Methods: We used the HeteSim measure to calculate the relatedness score of different disease pairs in a global heterogeneous network, which integrates six networks based on biological information, including disease-disease associations, drug-drug interactions, protein-protein interactions and the associations among them. We built the prediction model using a Support Vector Machine (SVM) based on the HeteSim scores. Results and Conclusion: The results showed that PCHS performed significantly better than previous state-of-the-art approaches and achieved an AUC score of 0.90 in 10-fold cross-validation. Furthermore, some of our predictions have been verified in the literature, indicating the effectiveness of our method.
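The HeteSim relatedness score mentioned above can be sketched for the simplest case, an even-length meta-path such as disease-protein-disease. In its normalized form, HeteSim compares the reachable-probability row of the source object along the left half of the path with that of the target object along the reversed right half, as a cosine. This is a minimal sketch under stated assumptions: odd-length meta-paths need an additional edge-splitting step that is omitted here, the toy matrices and meta-paths are invented, and the function names are hypothetical rather than taken from PCHS.

```python
from functools import reduce

import numpy as np


def row_normalize(W):
    """Transition matrix: each row of adjacency matrix W divided by its sum."""
    W = np.asarray(W, dtype=float)
    s = W.sum(axis=1, keepdims=True)
    return np.divide(W, s, out=np.zeros_like(W), where=s > 0)


def reach_prob(mats):
    """Reachable-probability matrix along a chain of relations."""
    return reduce(np.matmul, [row_normalize(W) for W in mats])


def hetesim(left_half, right_half_reversed):
    """Normalized HeteSim for an even-length meta-path P = P_L P_R.

    left_half: adjacency matrices walking P_L from the source type to the
    middle type; right_half_reversed: matrices walking P_R reversed, from
    the target type to the same middle type.
    """
    PL = reach_prob(left_half)
    PR = reach_prob(right_half_reversed)
    num = PL @ PR.T
    norms = np.outer(np.linalg.norm(PL, axis=1), np.linalg.norm(PR, axis=1))
    return np.divide(num, norms, out=np.zeros_like(num), where=norms > 0)


# Toy disease-protein (3 x 4) and disease-drug (3 x 2) associations.
W_DP = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]])
W_DR = np.array([[1, 0], [1, 1], [0, 1]])

S_DPD = hetesim([W_DP], [W_DP])  # meta-path disease-protein-disease
S_DRD = hetesim([W_DR], [W_DR])  # meta-path disease-drug-disease

# One score per meta-path for the disease pair (0, 1); a classifier such as
# an SVM would then be trained on vectors like this one (not shown here).
features_01 = [S_DPD[0, 1], S_DRD[0, 1]]
print(features_01)
```

Diseases 0 and 1 share exactly the same proteins, so their disease-protein-disease score is 1, while their disease-drug-disease score is about 0.707 because disease 1 is also linked to a second drug.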

The rise and fall of machine learning methods in biomedical research

F1000Research

10.12688/f1000research.13016.1

2017

Vol 6

pp. 2012

Cited By ~ 6

Author(s): Hashem Koohy

Keyword(s): Machine Learning, Biomedical Research, Life Sciences, Biological Data, Research Note, Machine Learning Techniques, Learning Methods, The Past, Machine Learning Methods, Learning Techniques

In the era of explosion in biological data, machine learning techniques are becoming more popular in life sciences, including biology and medicine. This research note examines the rise and fall of the most commonly used machine learning techniques in life sciences over the past three decades.

Implanted Knee Kinematics Prediction: comparative performance analysis of machine learning techniques

2018 Joint 7th International Conference on Informatics, Electronics & Vision (ICIEV) and 2018 2nd International Conference on Imaging, Vision & Pattern Recognition (icIVPR)

10.1109/iciev.2018.8640999

2018

Author(s): Belayat Hossain, Takatoshi Morooka, Makiko Okuno, Manabu Nii, Shinichi Yoshiya, ...

Keyword(s): Machine Learning, Performance Analysis, Knee Kinematics, Machine Learning Techniques, Comparative Performance, Learning Techniques

Targeting Virus-host Protein Interactions: Feature Extraction and Machine Learning Approaches

Current Drug Metabolism

10.2174/1389200219666180829121038

2019

Vol 20 (3)

pp. 177-184

Cited By ~ 16

Author(s): Nantao Zheng, Kairou Wang, Weihua Zhan, Lei Deng

Keyword(s): Machine Learning, Computational Methods, Protein Interactions, Prediction Models, Learning Algorithms, Biological Data, Machine Learning Algorithms, Host Protein, Protein Protein Interactions, Protein Motifs

Background: Targeting critical virus-host protein-protein interactions (PPIs) has enormous application prospects for therapeutics. Using experimental methods to evaluate all possible virus-host PPIs is labor-intensive and time-consuming. Recent growth in the computational identification of virus-host PPIs provides new opportunities for gaining biological insights, including applications in disease control. We provide an overview of recent computational approaches for studying virus-host PPIs. Methods: In this review, a variety of computational methods for virus-host PPI prediction are surveyed. These methods are categorized according to the features they use and the machine learning algorithms they employ, both classical and novel. Results: We describe the pivotal and representative features extracted from relevant sources of biological data, mainly including sequence signatures, known domain interactions, protein motifs and protein structure information. We focus on state-of-the-art machine learning algorithms used to build binary prediction models for classifying virus-host protein pairs, and discuss their abilities, weaknesses and future directions. Conclusion: The findings of this review confirm the importance of computational methods for finding potential protein-protein interactions between virus and host. Although there has been significant progress in the prediction of virus-host PPIs in recent years, there is still much room for improvement.
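One way to turn the "sequence signatures" listed in the Results into the input of a binary virus-host classifier is the conjoint triad encoding, sketched below. The review does not state which sequence descriptors it favours, so this choice is an assumption; the seven-class amino acid grouping is the commonly used one, frequencies are simply divided by the number of triads instead of following any particular published normalization, and the function names are hypothetical.

```python
from itertools import product

# Commonly used seven-class grouping of the 20 standard amino acids
# (assumed here; check it against the grouping of any predictor you follow).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: k for k, group in enumerate(CLASSES) for aa in group}
TRIAD_INDEX = {t: i for i, t in enumerate(product(range(7), repeat=3))}  # 343


def conjoint_triad(seq):
    """Relative frequency of each class triad in `seq` (343 values)."""
    # Residues outside the 20 standard letters are dropped before counting.
    classes = [AA_TO_CLASS[c] for c in seq.upper() if c in AA_TO_CLASS]
    counts = [0] * len(TRIAD_INDEX)
    for i in range(len(classes) - 2):
        counts[TRIAD_INDEX[tuple(classes[i:i + 3])]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def virus_host_vector(virus_seq, host_seq):
    """Ordered 686-value feature vector for a (virus, host) protein pair."""
    return conjoint_triad(virus_seq) + conjoint_triad(host_seq)


vec = virus_host_vector("MKTAYIAKQRQISFVKSHFSRQ", "GAVLKEELCCDE")
print(len(vec))  # 686
```

Unlike a symmetric within-species encoding, virus-host pairs are directional, so the two 343-value vectors are concatenated in a fixed (virus, host) order; each resulting vector is one labelled example for whatever binary classifier is chosen.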

In Silico Recognition of Protein-Protein Interaction

Advanced Data Mining Technologies in Bioinformatics

10.4018/978-1-59140-863-5.ch013

2011

pp. 248-268

Author(s): Byung-Hoon Park, Phuongan Dam, Chongle Pan, Ying Xu, Al Geist, ...

Keyword(s): Protein Interactions, Regulatory Networks, Machine Learning Techniques, Translation Regulation, Future Research, Protein Protein Interactions, Protein Protein Interaction, Protein Levels, Protein Functions, Model Protein

Protein-protein interactions are fundamental to cellular processes. They are responsible for phenomena such as DNA replication, gene transcription, protein translation, regulation of metabolic pathways, immunologic recognition and signal transduction. The identification of interacting proteins is therefore an important prerequisite for understanding their physiological functions. Because of their importance to so many biophysical activities, reliable computational methods to infer protein-protein interactions from either structural data or genome sequences are in high demand. Successful predictions, for instance, will facilitate drug design and the reconstruction of metabolic or regulatory networks. In this chapter, we review: (a) high-throughput experimental methods for identification of protein-protein interactions, (b) existing databases of protein-protein interactions, (c) computational approaches to predicting protein-protein interactions at both residue and protein levels, (d) various statistical and machine learning techniques to model protein-protein interactions, and (e) applications of protein-protein interactions in predicting protein functions. We also discuss intrinsic drawbacks of the existing approaches and future research directions.

Recent Advances in Machine Learning Based Prediction of RNA-protein Interactions

Protein and Peptide Letters

10.2174/0929866526666190619103853

2019

Vol 26 (8)

pp. 601-619

Cited By ~ 1

Author(s): Amit Sagar, Bin Xue

Keyword(s): Machine Learning, Protein Interaction, Protein Interactions, Experimental Methods, Machine Learning Techniques, Computational Techniques, Biological Processes, Complete Spectrum, Future Developments, Learning Techniques

The interactions between RNAs and proteins play critical roles in many biological processes. Characterizing these interactions is therefore critical for mechanistic, biomedical, and clinical studies. Many experimental methods can be used to determine RNA-protein interactions from multiple perspectives. However, because RNA-protein interactions are tissue-specific and condition-specific, and because these interactions are weak and frequently compete with each other, experimental techniques cannot be fully exploited to discover the complete spectrum of RNA-protein interactions. To mitigate these issues, continuous efforts have been devoted to developing high-quality computational techniques to study the interactions between RNAs and proteins. Much progress has been achieved through the application of novel techniques and strategies, such as machine learning. In particular, with the development and application of CLIP techniques, more and more experimental data on RNA-protein interactions under specific biological conditions have become available. Together, these CLIP data provide a rich source for developing advanced machine learning predictors. In this review, recent progress on computational predictors for RNA-protein interactions is summarized in terms of datasets, prediction strategies, and input features. Possible future developments are also discussed at the end of the review.

Parasites, proteomes and systems: has Descartes’ clock run out of time?

Parasitology

10.1017/s0031182012000716

2012

Vol 139 (9)

pp. 1103-1118

Cited By ~ 18

Author(s): J. M. WASTLING, S. D. ARMSTRONG, R. KRISHNA, D. XIA

Keyword(s): Systems Biology, Protein Interactions, Biological Data, Protein Protein Interactions, Proteomics Data, Data Types, Quality Of Data, Post Translational Modifications, Systems Modelling, New Generation

Summary: Systems biology aims to integrate multiple biological data types, such as genomics, transcriptomics and proteomics, across different levels of structure and scale; it represents an emerging paradigm in the scientific process which challenges the reductionism that has dominated biomedical research for hundreds of years. Systems biology will nevertheless only be successful if the technologies on which it is based are able to deliver the required type and quality of data. In this review we discuss how well positioned proteomics is to deliver the data necessary to support meaningful systems modelling in parasite biology. We summarise the current state of identification proteomics in parasites, but argue that a new generation of quantitative proteomics data is now needed to underpin effective systems modelling. We discuss the challenges faced in acquiring more complete knowledge of protein post-translational modifications, protein turnover and protein-protein interactions in parasites. Finally, we highlight the central role of proteome informatics in ensuring that proteomics data are readily accessible to the user community and can be translated and integrated with other relevant data types.
