Predicting Protein–Protein Interactions from Multimodal Biological Data Sources via Nonnegative Matrix Tri-Factorization

2013 ◽  
Vol 20 (4) ◽  
pp. 344-358 ◽  
Author(s):  
Hua Wang ◽  
Heng Huang ◽  
Chris Ding ◽  
Feiping Nie
2019 ◽  
Vol 20 (3) ◽  
pp. 177-184 ◽  
Author(s):  
Nantao Zheng ◽  
Kairou Wang ◽  
Weihua Zhan ◽  
Lei Deng

Background: Targeting critical virus-host Protein-Protein Interactions (PPIs) has enormous application prospects for therapeutics. Using experimental methods to evaluate all possible virus-host PPIs is labor-intensive and time-consuming. Recent growth in computational identification of virus-host PPIs provides new opportunities for gaining biological insights, including applications in disease control. We provide an overview of recent computational approaches for studying virus-host PPIs. Methods: In this review, a variety of computational methods for virus-host PPI prediction are surveyed. These methods are categorized by the features they utilize and by the machine learning algorithms they employ, including classical and novel methods. Results: We describe the pivotal and representative features extracted from relevant sources of biological data, which mainly include sequence signatures, known domain interactions, protein motifs and protein structure information. We focus on state-of-the-art machine learning algorithms that are used to build binary prediction models for the classification of virus-host protein pairs and discuss their strengths, weaknesses and future directions. Conclusion: The findings of this review confirm the importance of computational methods for finding potential protein-protein interactions between virus and host. Although there has been significant progress in the prediction of virus-host PPIs in recent years, there is still considerable room for improvement.


Parasitology ◽  
2012 ◽  
Vol 139 (9) ◽  
pp. 1103-1118 ◽  
Author(s):  
J. M. WASTLING ◽  
S. D. ARMSTRONG ◽  
R. KRISHNA ◽  
D. XIA

SUMMARY Systems biology aims to integrate multiple biological data types such as genomics, transcriptomics and proteomics across different levels of structure and scale; it represents an emerging paradigm in the scientific process which challenges the reductionism that has dominated biomedical research for hundreds of years. Systems biology will nevertheless only be successful if the technologies on which it is based are able to deliver the required type and quality of data. In this review we discuss how well positioned proteomics is to deliver the data necessary to support meaningful systems modelling in parasite biology. We summarise the current state of identification proteomics in parasites, but argue that a new generation of quantitative proteomics data is now needed to underpin effective systems modelling. We discuss the challenges faced in acquiring more complete knowledge of protein post-translational modifications, protein turnover and protein-protein interactions in parasites. Finally we highlight the central role of proteome-informatics in ensuring that proteomics data is readily accessible to the user-community and can be translated and integrated with other relevant data types.


Author(s):  
Fatma-Elzahraa Eid ◽  
Haitham Elmarakeby ◽  
Yujia Alina Chan ◽  
Nadine Fornelos Martins ◽  
Mahmoud ElHefnawi ◽  
...  

Abstract Representational biases that are common in biological data can inflate prediction performance and confound our understanding of how and what machine learning (ML) models learn from large complicated datasets. However, auditing for these biases is not a common practice in ML in the life sciences. Here, we devise a systematic auditing framework and harness it to audit three different ML applications of significant therapeutic interest: prediction frameworks of protein-protein interactions, drug-target bioactivity, and MHC-peptide binding. Through this, we identify unrecognized biases that hinder the ML process and result in low model generalizability. Ultimately, we show that, when there is insufficient signal in the training data, ML models are likely to learn primarily from representational biases.


2019 ◽  
Vol 18 (32) ◽  
pp. 2800-2815 ◽  
Author(s):  
Nisha Chhokar ◽  
Sourav Kalra ◽  
Monika Chauhan ◽  
Anjana Munshi ◽  
Raj Kumar

The failure of the Integrase Strand Transfer Inhibitors (INSTIs) due to mutations occurring at the catalytic site of HIV integrase (IN) has led to the design of allosteric integrase inhibitors (ALLINIs). Lens epithelium derived growth factor (LEDGF/p75) is the host cellular cofactor which helps tether IN to the chromatin. The protein-protein interactions (PPIs) were observed at the allosteric site (LEDGF/p75 binding domain) between LEDGF/p75 of the host cell and IN of the virus. In recent years, many small molecules such as CX04328, CHIBA-3053 and CHI-104 have been reported as LEDGF/p75-IN interaction inhibitors (LEDGINs). LEDGINs have emerged as promising therapeutics to halt the PPIs by binding at the interface of both proteins. In the present work, we correlated the docking scores for the reported LEDGINs containing a quinoline scaffold with the in vitro biological data. The hierarchical clustering method was used to divide the compounds into test and training sets. The robustness of the generated model was validated by q² and r² for the predicted set of compounds. The generated model between the docking score and biological data was assessed to predict the activity of the hits (quinoline scaffold) obtained from virtual screening of LEDGINs, providing their structure-activity relationships to aim for the generation of potent agents.
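As a rough illustration of the validation statistics named in this abstract, the sketch below fits a one-variable least-squares line of activity against docking score and reports the fitted r² together with a leave-one-out cross-validated q². The abstract does not say how q² and r² were computed, so the definitions used here (r² = 1 − SS_res/SS_tot, q² = 1 − PRESS/SS_tot) are common conventions assumed for illustration, and the data points are made-up toy values, not results from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_sum_of_squares(ys):
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    """Fitted r^2: 1 - SS_res / SS_tot on the data used for fitting."""
    slope, intercept = fit_line(xs, ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / total_sum_of_squares(ys)


def q_squared_loo(xs, ys):
    """Leave-one-out q^2: 1 - PRESS / SS_tot, refitting without each point."""
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1.0 - press / total_sum_of_squares(ys)


# Toy values only (hypothetical docking scores and activities).
docking_scores = [-9.1, -8.4, -7.9, -7.2, -6.5, -6.0]
activities = [7.8, 7.1, 6.9, 6.0, 5.6, 5.1]

r2 = r_squared(docking_scores, activities)
q2 = q_squared_loo(docking_scores, activities)
```

For a least-squares fit, the leave-one-out q² cannot exceed the fitted r², so a large gap between the two is usually read as a sign of overfitting.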


2007 ◽  
Vol 4 (3) ◽  
pp. 208-223 ◽  
Author(s):  
José A. Reyes ◽  
David Gilbert

Summary This research addresses the problem of predicting protein-protein interactions (PPI) when integrating diverse kinds of biological information. This task has commonly been viewed as a binary classification problem (whether any two proteins do or do not interact), and several different machine learning techniques have been employed to solve it. However, the nature of the data creates two major problems which can affect results. Firstly, the classes are imbalanced, because the number of positive examples (pairs of proteins which really interact) is much smaller than the number of negative ones. Secondly, the selection of negative examples can be based on unreliable assumptions, which could introduce bias into the classification results. Here we propose the use of one-class classification (OCC) methods to deal with the task of PPI prediction. OCC methods utilise examples of just one class to generate a predictive model, which consequently is independent of the kind of negative examples selected; additionally, these approaches are known to cope with imbalanced class problems. We have designed and carried out a performance evaluation study of several OCC methods for this task, and have found that the Parzen density estimation approach outperforms the rest. We also undertook a comparative performance evaluation between the Parzen OCC method and several conventional learning techniques, considering different scenarios, for example varying the number of negative examples used for training purposes. We found that the Parzen OCC method in general performs competitively with traditional approaches and in many situations outperforms them. Finally, we evaluated the ability of the Parzen OCC approach to predict new potential PPI targets, and validated these results by searching for biological evidence in the literature.
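The one-class idea described in this abstract can be sketched with a small Parzen-window estimator: a density is built from positive (interacting) examples only, and a new pair is accepted when its estimated density reaches a threshold derived from the training set. This is a minimal sketch, not the authors' implementation; the Gaussian kernel, the bandwidth, the quantile-based threshold and the two-dimensional toy feature vectors are all assumptions made for illustration.

```python
import math


def parzen_density(x, train, bandwidth=1.0):
    """Parzen estimate: mean of isotropic Gaussian kernels centred on train."""
    dim = len(x)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (dim / 2.0)
    total = 0.0
    for t in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, t))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / (len(train) * norm)


def fit_threshold(train, bandwidth=1.0, quantile=0.1):
    """Density value at the given quantile of the training points' own scores."""
    scores = sorted(parzen_density(x, train, bandwidth) for x in train)
    index = min(int(quantile * len(scores)), len(scores) - 1)
    return scores[index]


def predict_interacting(x, train, threshold, bandwidth=1.0):
    """Accept x as a member of the positive class if it is dense enough."""
    return parzen_density(x, train, bandwidth) >= threshold


# Hypothetical feature vectors for known interacting pairs (positives only).
positives = [(0.9, 0.8), (1.0, 1.1), (1.1, 0.9), (0.8, 1.0), (1.0, 1.0)]
threshold = fit_threshold(positives, bandwidth=0.5, quantile=0.0)

near = predict_interacting((1.0, 1.0), positives, threshold, bandwidth=0.5)
far = predict_interacting((5.0, 5.0), positives, threshold, bandwidth=0.5)
```

No negative examples appear anywhere in the fit, which is the property the abstract relies on to sidestep unreliable negative selection.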


2013 ◽  
Vol 765-767 ◽  
pp. 1622-1624
Author(s):  
Juan Juan Li ◽  
Yue Hui Chen

Proteins carry out their biological functions in organisms through interactions with one another. They are major components of organisms and are of great significance. As an increasing number of high-throughput biological experiments are carried out, a large amount of biological data is produced. Bioinformatics has developed to study this data, which is difficult to analyse using biological methods alone. This paper studies how to apply intelligent computation methods to the prediction of protein-protein interactions (PPIs). We propose an approach that combines auto covariance with an artificial neural network classifier to predict PPIs. Experiments show that our method performs better than related works, with a 5% higher accuracy.
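To make the auto covariance step concrete, the sketch below converts a protein sequence into a numeric series and computes lagged auto covariance values, then concatenates the vectors of two proteins into a fixed-length pair feature. The abstract gives no formula, so this follows a commonly used definition (mean-centred products averaged over positions separated by the lag); the single Kyte-Doolittle hydrophobicity scale, the lag range and the example sequences are assumptions for illustration, and the neural network classifier is not shown.

```python
# Kyte-Doolittle hydrophobicity scale, used here as one illustrative property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(values, max_lag):
    """Return [AC(1), ..., AC(max_lag)] for a numeric series.

    AC(lag) = sum_i (v[i] - mean) * (v[i + lag] - mean) / (n - lag)
    """
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(values) / n
    centred = [v - mean for v in values]
    return [
        sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def encode(sequence):
    """Map an amino-acid string to its hydrophobicity series."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence]


def pair_features(seq_a, seq_b, max_lag):
    """Fixed-length feature vector for a protein pair, whatever the lengths."""
    return (auto_covariance(encode(seq_a), max_lag)
            + auto_covariance(encode(seq_b), max_lag))


# Hypothetical short sequences; real proteins are far longer.
features = pair_features("MKTAYIAKQR", "GSHMLEDPVA", max_lag=3)
```

The useful property is that proteins of different lengths map to vectors of the same length, which is what a fixed-input classifier such as a neural network needs.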


2015 ◽  
Vol 2015 ◽  
pp. 1-9
Author(s):  
Peng Liu ◽  
Lei Yang ◽  
Daming Shi ◽  
Xianglong Tang

A method for predicting protein-protein interactions based on detected protein complexes is proposed to repair deficient interactions derived from high-throughput biological experiments. Protein complexes are pruned and decomposed into small parts based on the adaptive k-cores method to predict protein-protein interactions associated with the complexes. The proposed method is adaptive to protein complexes with different structure, number, and size of nodes in a protein-protein interaction network. Based on different complex sets detected by various algorithms, we can obtain different prediction sets of protein-protein interactions. The reliability of the predicted interaction sets is proved by using estimations with statistical tests and direct confirmation of the biological data. In comparison with the approaches which predict the interactions based on the cliques, the overlap of the predictions is small. Similarly, the overlaps among the predicted sets of interactions derived from various complex sets are also small. Thus, every predicted set of interactions may complement and improve the quality of the original network data. Meanwhile, the predictions from the proposed method replenish protein-protein interactions associated with protein complexes using only the network topology.
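The pruning step mentioned in this abstract builds on k-cores. The sketch below shows plain k-core extraction (repeatedly removing nodes with fewer than k neighbours) on a toy complex and then lists the unconnected pairs that remain inside the core as candidate interactions. It is only an illustration of the underlying idea: the paper's adaptive choice of k and its decomposition procedure are not reproduced, and the function names and the toy graph are hypothetical.

```python
from itertools import combinations


def k_core(adjacency, k):
    """Nodes of the k-core of an undirected graph.

    adjacency maps each node to the set of its neighbours (symmetric).
    """
    adj = {node: set(neighbours) for node, neighbours in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in list(adj):
            if len(adj[node]) < k:
                # Detach the node from its neighbours, then drop it.
                for neighbour in adj[node]:
                    adj[neighbour].discard(node)
                del adj[node]
                changed = True
    return set(adj)


def missing_pairs(adjacency, nodes):
    """Pairs inside `nodes` with no edge: candidate missing interactions."""
    return [
        (u, v)
        for u, v in combinations(sorted(nodes), 2)
        if v not in adjacency[u]
    ]


# Toy complex: a 4-clique on A, B, C, D lacking the C-D edge, plus a
# loosely attached protein E.
complex_graph = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A"},
}

core = k_core(complex_graph, 2)
candidates = missing_pairs(complex_graph, core)
```

Only the network topology is used here, which matches the abstract's closing remark, although how candidates are scored and filtered in the paper is not specified in this text.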


2017 ◽  
Vol 13 (12) ◽  
pp. 2592-2602
Author(s):  
Hafeez Ur Rehman ◽  
Inam Bari ◽  
Anwar Ali ◽  
Haroon Mahmood

Accurate elucidation of genome wide protein–protein interactions is crucial for understanding the regulatory processes of the cell.


2014 ◽  
Vol 07 (05) ◽  
pp. 1450053 ◽  
Author(s):  
Md. Sarwar Kamal ◽  
Mohammad Ibrahim Khan

Ongoing improvements in Computational Biology research have generated massive Protein–Protein Interaction (PPI) datasets. The availability of PPI data for several organisms has prompted the development of computational methods for the measurement, analysis, modeling, comparison, clustering and alignment of biological networks. Nevertheless, fixed network comparison is computationally intractable and, as a result, several other methods have been used instead. We illustrate a probabilistic approach among protein nodes that are part of various networks, using the Chapman–Kolmogorov (CK) formula. We have compared the CK formula with the semi-Markov random walk method, SMETANA. We observed that CK outperforms SMETANA in all respects, including efficiency, speed, space and complexity. We have modified the SMETANA source code, available in MATLAB, in light of the CK formula. Discriminant-Expectation Maximization (D-EM) accesses the parameters of a protein network dataset and determines a linear transformation to simplify the assumed probabilistic form of the data distributions and to find good features dynamically. Our implementation finds that D-EM has a satisfactory performance in protein network alignment applications.
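For readers unfamiliar with the Chapman–Kolmogorov relation named in this abstract, the sketch below checks it numerically for a small discrete-time Markov chain: the (m+n)-step transition matrix equals the product of the m-step and n-step matrices. The abstract does not describe how the relation is applied to network alignment, so this only illustrates the identity itself; the 3-state transition matrix is a made-up example.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def n_step(p, n):
    """n-step transition matrix P^n of a homogeneous Markov chain."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, p)
    return result


def max_abs_diff(a, b):
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Hypothetical one-step transition probabilities between three nodes;
# each row sums to 1.
P = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
]

# Chapman-Kolmogorov: P^(m+n) = P^(m) P^(n), here with m = 2 and n = 3.
five_step = n_step(P, 5)
composed = matmul(n_step(P, 2), n_step(P, 3))
gap = max_abs_diff(five_step, composed)
```

The practical consequence is that long-range transition probabilities can be assembled from shorter ones instead of being recomputed from scratch.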

