Protein Domain-Based Prediction of Compound–Target Interactions and Experimental Validation on LIM Kinases

2021
Author(s): Tunca Doğan, Ece Akhan Güzelcan, Marcus Baumann, Altay Koyas, Heval Atas, ...

Predictive approaches such as virtual screening have been used in drug discovery with the objective of reducing development time and costs. Current machine learning and network-based approaches have issues related to generalization, usability, or model interpretability, especially due to the complexity of target proteins’ structure/function and bias in system training datasets. Here, we propose a new computational method, “DRUIDom”, to predict bio-interactions between drug candidate compounds and target proteins by utilizing the domain modularity of proteins, to overcome problems associated with current approaches. DRUIDom is composed of two methodological steps. First, ligands/compounds are statistically mapped to structural domains of their target proteins, with the aim of identifying physical or functional interactions. As such, other proteins containing the mapped domain or domain pair become new candidate targets for the corresponding compounds. Next, a million-scale dataset of small molecule compounds, including the ones mapped to domains in the previous step, is clustered based on molecular similarity, and the domain associations are propagated to other compounds within the same clusters. Experimentally verified bioactivity data points obtained from public databases are meticulously filtered to construct datasets of active/interacting and inactive/non-interacting compound–target pairs (~2.9M data points), which are used as training data for calculating the parameters of compound–domain mappings. This led to 27,032 high-confidence associations between 250 domains and 8,165 compounds, and a finalized output of ~5 million new compound–protein interactions. DRUIDom is experimentally validated by syntheses and bioactivity analyses of compounds predicted to target LIM-kinase proteins, which play critical roles in the regulation of cell motility, cell cycle progression, and differentiation through actin filament dynamics. 
We showed that LIMK-inhibitor-2 and its derivatives significantly block cancer cell migration through inhibition of the phosphorylation of LIMK and its downstream protein cofilin. One of the derivative compounds (LIMKi-2d) was identified as a promising candidate due to its action on resistant Mahlavu liver cancer cells. The results demonstrate that DRUIDom can be exploited to identify drug candidate compounds for intended targets and to predict new target proteins based on the defined compound–domain relationships. The datasets, results, and source code of DRUIDom are fully available at: https://github.com/cansyl/DRUIDom.
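The first DRUIDom step (mapping compounds to the domains of their targets, then treating every other protein that carries a mapped domain as a candidate target) can be illustrated with a minimal Python sketch. The scoring rule used here, an active fraction combined with a minimum support count, and the `min_score`/`min_support` thresholds are hypothetical stand-ins rather than DRUIDom's actual statistics; domain pairs are omitted, and all compound, protein, and domain IDs are placeholders.

```python
from collections import defaultdict

def map_compounds_to_domains(pairs, protein_domains, min_score=0.8, min_support=2):
    """Hypothetical compound-domain mapping (not DRUIDom's exact statistics).

    pairs: iterable of (compound, protein, is_active) bioactivity records.
    protein_domains: dict mapping protein -> set of domain IDs.
    Keeps (compound, domain) when the compound is active on at least `min_support`
    proteins carrying the domain and the active fraction is at least `min_score`.
    """
    active = defaultdict(int)
    total = defaultdict(int)
    for compound, protein, is_active in pairs:
        for domain in protein_domains.get(protein, ()):
            total[(compound, domain)] += 1
            if is_active:
                active[(compound, domain)] += 1
    return {
        key for key, n in total.items()
        if active[key] >= min_support and active[key] / n >= min_score
    }

def predict_new_targets(associations, protein_domains, known_pairs):
    """Every protein containing a mapped domain becomes a candidate target."""
    predictions = set()
    for compound, domain in associations:
        for protein, domains in protein_domains.items():
            if domain in domains and (compound, protein) not in known_pairs:
                predictions.add((compound, protein))
    return predictions

# Toy usage with placeholder compound, protein, and domain IDs.
protein_domains = {"P1": {"D1"}, "P2": {"D1", "D2"}, "P3": {"D1"}, "P4": {"D3"}}
pairs = [("c1", "P1", True), ("c1", "P2", True), ("c1", "P4", False)]
associations = map_compounds_to_domains(pairs, protein_domains)
new_targets = predict_new_targets(
    associations, protein_domains, {(c, p) for c, p, _ in pairs}
)
print(associations, new_targets)  # {('c1', 'D1')} {('c1', 'P3')}
```

Requiring a minimum support count keeps a single active measurement from creating an association on its own, which is one simple way to trade recall for confidence.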

2021, Vol 17 (11), pp. e1009171
Author(s): Tunca Doğan, Ece Akhan Güzelcan, Marcus Baumann, Altay Koyas, Heval Atas, ...

Predictive approaches such as virtual screening have been used in drug discovery with the objective of reducing development time and costs. Current machine learning and network-based approaches have issues related to generalization, usability, or model interpretability, especially due to the complexity of target proteins’ structure/function and bias in system training datasets. Here, we propose a new method, “DRUIDom” (DRUg Interacting Domain prediction), to identify bio-interactions between drug candidate compounds and targets by utilizing the domain modularity of proteins, to overcome problems associated with current approaches. DRUIDom is composed of two methodological steps. First, ligands/compounds are statistically mapped to structural domains of their target proteins, with the aim of identifying their interactions. As such, other proteins containing the same mapped domain or domain pair become new candidate targets for the corresponding compounds. Next, a million-scale dataset of small molecule compounds, including those mapped to domains in the previous step, is clustered based on molecular similarity, and the domain associations are propagated to other compounds within the same clusters. Experimentally verified bioactivity data points obtained from public databases are meticulously filtered to construct datasets of active/interacting and inactive/non-interacting drug/compound–target pairs (~2.9M data points), which are used as training data for calculating the parameters of compound–domain mappings. This led to 27,032 high-confidence associations between 250 domains and 8,165 compounds, and a finalized output of ~5 million new compound–protein interactions. DRUIDom is experimentally validated by syntheses and bioactivity analyses of compounds predicted to target LIM-kinase proteins, which play critical roles in the regulation of cell motility, cell cycle progression, and differentiation through actin filament dynamics. 
We showed that LIMK-inhibitor-2 and its derivatives significantly block cancer cell migration through inhibition of the phosphorylation of LIMK and its downstream protein cofilin. One of the derivative compounds (LIMKi-2d) was identified as a promising candidate due to its action on resistant Mahlavu liver cancer cells. The results demonstrate that DRUIDom can be exploited to identify drug candidate compounds for intended targets and to predict new target proteins based on the defined compound–domain relationships. The datasets, results, and source code of DRUIDom are fully available at: https://github.com/cansyl/DRUIDom.
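The second step, clustering compounds by molecular similarity and propagating domain associations within each cluster, can be sketched as follows. Fingerprints are assumed to be precomputed sets of on-bit indices, similarity is the Tanimoto coefficient, and clustering is a simple single-linkage rule with a hypothetical 0.5 threshold. The abstract does not describe the clustering algorithm DRUIDom actually uses, so this only illustrates the propagation idea.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster_compounds(fingerprints, threshold=0.5):
    """Single-linkage clustering: link two compounds if similarity >= threshold."""
    parent = {c: c for c in fingerprints}

    def find(c):
        # Union-find root lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in combinations(fingerprints, 2):
        if tanimoto(fingerprints[a], fingerprints[b]) >= threshold:
            parent[find(a)] = find(b)
    clusters = {}
    for c in fingerprints:
        clusters.setdefault(find(c), set()).add(c)
    return list(clusters.values())

def propagate_domains(clusters, compound_domains):
    """Give every cluster member the union of its members' domain associations."""
    propagated = {}
    for cluster in clusters:
        shared = set().union(*(compound_domains.get(c, set()) for c in cluster))
        for c in cluster:
            propagated[c] = set(shared)
    return propagated

# Placeholder fingerprints: c1 and c2 share 3 of 5 bits (Tanimoto 0.6).
fingerprints = {"c1": {1, 2, 3, 4}, "c2": {1, 2, 3, 5}, "c3": {7, 8, 9}}
clusters = cluster_compounds(fingerprints, threshold=0.5)
propagated = propagate_domains(clusters, {"c1": {"D1"}})
print(propagated)
```

Single linkage is the loosest choice: one sufficiently similar neighbour is enough to join a cluster, so in practice a stricter threshold or a different clustering method may be preferable to limit spurious propagation.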


Author(s): SUSHMA S MURTHY, BALA NARSAIAH T

Objective: The objective of the study was to understand the biomolecular interactions of Bromelain and its networking with p53 and β-catenin in hepatocellular carcinoma (HCC) through computational analysis. Methodology: The protein interaction partners of p53 and β-catenin involved in the progression of HCC were collected from the National Center for Biotechnology Information. We collected data points from the public database and standardized them for our analysis. We used a Cytoscape 3.8.2 plug-in to construct a protein–protein interaction network, and we constructed a pathway network using Biorender.com. Results: The protein interactions concerning p53 and β-catenin were identified and a network was constructed. A total of 18 and 34 nodes were identified as involved in the down-regulation and up-regulation of β-catenin, respectively, and a total of 30 and 27 nodes in Homo sapiens were identified as involved in the down-regulation and up-regulation of the p53 gene, respectively. We identified different pathways that trigger and impact the p53 and Wnt/β-catenin signaling pathways as potential target sites for Bromelain to arrest the progression of cancer. Conclusion: Our in silico studies suggest anti-cancer activity of Bromelain in HCC, relating it to effects on apoptosis, cell differentiation, mesenchymal transition, p53 signaling, and Wnt/β-catenin signaling pathways.
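Outside Cytoscape, the network-building and node-counting steps described above amount to an adjacency map plus a count of distinct regulators per hub protein. The Python sketch below assumes a hypothetical (regulator, target, effect) record format; the gene names are placeholders, and the records are not data from the study.

```python
from collections import defaultdict

def build_network(interactions):
    """Undirected protein-protein interaction network as an adjacency map."""
    adjacency = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

def count_regulators(regulation_records, hub):
    """Count distinct nodes annotated as up- or down-regulating `hub`.

    regulation_records: (regulator, target, effect) tuples with effect in
    {"up", "down"}; this annotation format is hypothetical.
    """
    regulators = {"up": set(), "down": set()}
    for regulator, target, effect in regulation_records:
        if target == hub and effect in regulators:
            regulators[effect].add(regulator)
    return {effect: len(nodes) for effect, nodes in regulators.items()}

# Placeholder records, not data from the study.
network = build_network([("HUB", "GeneA"), ("HUB", "GeneB"), ("GeneA", "GeneB")])
degrees = {node: len(neighbors) for node, neighbors in network.items()}
records = [("GeneA", "HUB", "up"), ("GeneB", "HUB", "down"),
           ("GeneC", "HUB", "down"), ("GeneC", "HUB", "down")]
print(degrees["HUB"], count_regulators(records, "HUB"))  # 2 {'up': 1, 'down': 2}
```

Counting distinct regulators (sets rather than raw record counts) keeps duplicate database entries, such as the repeated GeneC record, from inflating the node totals.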


2020
Author(s): Xinyu Bai, Yuxin Yin

Predicting drug–protein interactions (DPIs) is of great importance for drug discovery and repositioning, yet it remains challenging, mainly due to the sparse nature of DPI matrices, which results in poor generalization performance. Hence, unlike typical DPI prediction models that focus on representation learning or model selection, we propose a deep neural network-based strategy, PCM-AAE, that re-explores and augments the pharmacological space of kinase inhibitors by introducing an adversarial auto-encoder (AAE) model to improve the generalization of the prediction model. To complete the pharmacological space, we constructed Ensemble of PCM-AAE (EPA), an ensemble model that quickly and accurately yields quantitative predictions of binding affinity between any human kinase and inhibitor. In rigorous internal validation, EPA showed excellent performance, consistently outperforming the model trained with the imbalanced set, especially for targets with relatively few training data points. The improved prediction accuracy of EPA on external datasets further demonstrated its enhanced generalization ability, allowing it to handle previously unseen kinases or inhibitors gracefully. Further analysis showed promising potential when EPA was directly applied to virtual screening and off-target prediction, demonstrating the practicality of the EPA model in hit prediction. Our strategy is expected to facilitate kinase-centric drug development, as well as to solve more challenging prediction problems with insufficient data points.
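Proteochemometric (PCM) models describe each kinase–inhibitor pair by concatenating compound and protein descriptors, and EPA averages the outputs of several models. A minimal NumPy sketch of those two ideas follows. It uses closed-form ridge regression as the base model and Gaussian jitter of the descriptors as a simple stand-in for adversarial auto-encoder sampling (which it is not); the descriptors and affinities are synthetic.

```python
import numpy as np

def pcm_features(compound_desc, protein_desc):
    """Proteochemometric input: concatenate compound and protein descriptors per pair."""
    return np.hstack([compound_desc, protein_desc])

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def predict_ridge(w, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w

def fit_ensemble(X, y, n_members=5, noise=0.05, seed=0):
    """Each member sees the data plus a jittered copy (a stand-in for AAE samples)."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        X_aug = np.vstack([X, X + rng.normal(0.0, noise, X.shape)])
        members.append(fit_ridge(X_aug, np.concatenate([y, y])))
    return members

def predict_ensemble(members, X):
    """Average the members' predicted affinities."""
    return np.mean([predict_ridge(w, X) for w in members], axis=0)

# Synthetic descriptors and affinities, for illustration only.
rng = np.random.default_rng(1)
X = pcm_features(rng.normal(size=(40, 3)), rng.normal(size=(40, 2)))
y = X @ np.array([1.0, -0.5, 0.3, 0.8, 0.0]) + 6.0
predictions = predict_ensemble(fit_ensemble(X, y), X)
```

Because every pair is a single row, the same model can score a kinase or inhibitor it has not seen, as long as descriptors can be computed for it; this is what makes PCM-style inputs attractive for the generalization problem described above.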


2021, Vol 12
Author(s): Xin He, Linai Kuang, Zhiping Chen, Yihong Tan, Lei Wang

In recent years, due to the low accuracy and high cost of traditional biological experiments, a growing number of computational models have been proposed to infer potential essential proteins. In this paper, a novel prediction method called KFPM is proposed. First, a novel protein–domain heterogeneous network is established by combining known protein–protein interactions with known associations between proteins and domains. Next, based on key topological characteristics extracted from the newly constructed protein–domain network and functional characteristics extracted from multiple sources of biological information about proteins, a new computational method is designed that integrates multiple biological features to infer potential essential proteins using an improved PageRank algorithm. Finally, to evaluate the performance of KFPM, we compared it with 13 state-of-the-art prediction methods. Experimental results show that among the top 1, 5, and 10% of candidate proteins predicted by KFPM, the prediction accuracy reaches 96.08, 83.14, and 70.59%, respectively, significantly outperforming all 13 competing methods. This suggests that KFPM may be a useful tool for predicting potential essential proteins in the future.
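KFPM's ranking stage is described as an improved PageRank over a protein–domain heterogeneous network seeded with functional features. The sketch below shows only the generic ingredient, personalized PageRank by power iteration over a block adjacency matrix of proteins and domains, with a hypothetical per-protein prior in the teleport vector. It does not reproduce KFPM's specific improvements or features, and the toy matrices are placeholders.

```python
import numpy as np

def heterogeneous_adjacency(ppi, protein_domain):
    """Block adjacency [[PPI, PD], [PD^T, 0]] over proteins followed by domains."""
    n_dom = protein_domain.shape[1]
    top = np.hstack([ppi, protein_domain])
    bottom = np.hstack([protein_domain.T, np.zeros((n_dom, n_dom))])
    return np.vstack([top, bottom])

def personalized_pagerank(adj, prior, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for r = (1 - d) * p + d * M r with column-stochastic M.

    Nodes with no edges would leak probability mass; the toy network has none.
    """
    out_degree = adj.sum(axis=0)
    M = np.divide(adj, out_degree, out=np.zeros_like(adj, dtype=float),
                  where=out_degree > 0)
    p = prior / prior.sum()
    r = p.copy()
    for _ in range(max_iter):
        r_next = (1 - damping) * p + damping * (M @ r)
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

ppi = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)   # 3 proteins
protein_domain = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)  # 2 domains
adj = heterogeneous_adjacency(ppi, protein_domain)
prior = np.array([0.5, 0.2, 0.2, 0.0, 0.0])  # hypothetical feature scores; domains get 0
scores = personalized_pagerank(adj, prior)
ranking = np.argsort(-scores[:3])  # rank proteins only, highest score first
```

Putting domains into the same graph lets two proteins that share a domain reinforce each other even when no direct interaction between them is recorded.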


2021, Vol 13 (1)
Author(s): Xinyu Bai, Yuxin Yin

Predicting compound–protein interactions (CPIs) is of great importance for drug discovery and repositioning, yet it remains challenging, mainly due to the sparse nature of CPI matrices, which results in poor generalization performance. Hence, unlike typical CPI prediction models focused on representation learning or model selection, we propose a deep neural network-based strategy, PCM-AAE, that re-explores and augments the pharmacological space of kinase inhibitors by introducing an adversarial auto-encoder (AAE) model to improve the generalization of the prediction model. To complete the data space, we constructed Ensemble of PCM-AAE (EPA), an ensemble model that quickly and accurately yields quantitative predictions of binding affinity between any human kinase and inhibitor. In rigorous internal validation, EPA showed excellent performance, consistently outperforming the model trained with the imbalanced set, especially for targets with relatively few training data points. The improved prediction accuracy of EPA on external datasets demonstrates its generalization ability, making it possible to gracefully handle previously unseen kinases and inhibitors. EPA showed promising potential when directly applied to virtual screening and off-target prediction, demonstrating its practicality in hit prediction. Our strategy is expected to facilitate kinase-centric drug development, as well as to solve more challenging prediction problems with insufficient data points.
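Claims about handling previously unseen kinases are usually tested with a split in which entire targets are held out, so that no test kinase appears in training. A minimal Python sketch of such a leave-targets-out split is shown below, assuming (inhibitor, kinase, affinity) records and a hypothetical 20% test fraction; the records are placeholders.

```python
import random

def leave_targets_out_split(records, test_fraction=0.2, seed=0):
    """Hold out whole kinases so that every test kinase is unseen during training.

    records: (inhibitor, kinase, affinity) tuples.
    """
    kinases = sorted({kinase for _, kinase, _ in records})
    random.Random(seed).shuffle(kinases)
    n_test = max(1, round(test_fraction * len(kinases)))
    held_out = set(kinases[:n_test])
    train = [r for r in records if r[1] not in held_out]
    test = [r for r in records if r[1] in held_out]
    return train, test, held_out

# Placeholder records: two inhibitors measured against five kinases.
records = [(f"inh{i}", f"K{k}", 6.0 + 0.1 * k) for k in range(1, 6) for i in (1, 2)]
train, test, held_out = leave_targets_out_split(records)
```

The same function applied to the inhibitor column (index 0) would give the analogous "unseen inhibitor" split; a random split over pairs, by contrast, lets every kinase leak into training and tends to overstate generalization.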


2020
Author(s): Lewis Mervin, Avid M. Afzal, Ola Engkvist, Andreas Bender

In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a reliable probability of binding to a protein target has not yet been satisfactorily addressed. In this study, we compared the performance of three calibration methods, namely Platt scaling, isotonic regression, and Venn-ABERS, in calibrating the prediction scores of ligand–target prediction models based on the Naïve Bayes, Support Vector Machine, and Random Forest algorithms, using bioactivity data available at AstraZeneca (40 million data points (compound–target pairs) across 2,112 targets). Performance was assessed using Stratified Shuffle Split (SSS) and Leave 20% of Scaffolds Out (L20SO) validation.
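Of the three calibration methods compared, isotonic regression is the simplest to write from scratch: the pool-adjacent-violators (PAV) algorithm fits a non-decreasing step function from raw classifier scores to observed binding frequencies. The sketch below uses toy scores and labels. In practice, scikit-learn's `IsotonicRegression` and `CalibratedClassifierCV` (with `method="sigmoid"` for Platt scaling or `method="isotonic"`) cover the first two methods, while Venn-ABERS is not part of scikit-learn.

```python
from bisect import bisect_left

def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: fit a non-decreasing map from score to P(active).

    Returns the upper score bound of each pooled block and the block's mean label.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block: [sum_of_labels, count, highest_score]
    for i in order:
        blocks.append([float(labels[i]), 1, scores[i]])
        # Pool with the previous block while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, highest = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = highest
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]

def calibrate(score, upper_bounds, probabilities):
    """Look up the calibrated probability of the block that covers `score`."""
    idx = min(bisect_left(upper_bounds, score), len(upper_bounds) - 1)
    return probabilities[idx]

# Toy classifier scores with binding labels (1 = active), not AstraZeneca data.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 1, 1, 0, 1]
bounds, probs = fit_isotonic(scores, labels)
print(probs)  # [0.0, 0.0, 0.5, 0.6666666666666666, 1.0]
print(calibrate(0.35, bounds, probs))  # 0.5
```

Ties in the scores are not pooled explicitly here; a production implementation would merge equal scores into one block before running PAV.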


2019
Author(s): Liwei Cao, Danilo Russo, Vassilios S. Vassiliadis, Alexei Lapkin

A mixed-integer nonlinear programming (MINLP) formulation for symbolic regression was proposed to identify physical models from noisy experimental data. The formulation was tested using numerical models and was found to be more efficient than a previous example from the literature with respect to the number of predictor variables and training data points. The globally optimal search was extended to identify physical models and to cope with noise in the predictor variable of the experimental data. The methodology was coupled with automated collection of experimental data and successfully identified the correct physical models describing the relationship between shear stress and shear rate for both Newtonian and non-Newtonian fluids, as well as simple kinetic laws of reactions. Future work will focus on addressing the limitations of the formulation presented in this work by extending it to larger and more complex physical models.
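The MINLP formulation searches over model structures and parameters jointly, with global optimality guarantees. A far simpler stand-in illustrates the model-identification goal without any of that machinery: enumerate three textbook shear stress–shear rate models (Newtonian, power-law, Bingham plastic), fit each by least squares, and pick the lowest error plus a small per-parameter penalty. The data below are synthetic and noise-free, the penalty value is hypothetical, and the power-law fit assumes strictly positive values because it works in log-log space.

```python
import math

N_PARAMS = {"newtonian": 1, "power_law": 2, "bingham": 2}

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def fit_candidates(shear_rate, stress):
    """Fit each candidate rheological model; return {name: sum of squared errors}."""
    mu = sum(g * t for g, t in zip(shear_rate, stress)) / sum(g * g for g in shear_rate)
    log_k, n = fit_line([math.log(g) for g in shear_rate], [math.log(t) for t in stress])
    tau0, mu_p = fit_line(shear_rate, stress)
    models = {
        "newtonian": lambda g: mu * g,                    # tau = mu * gamma
        "power_law": lambda g: math.exp(log_k) * g ** n,  # tau = K * gamma**n (log-log fit)
        "bingham": lambda g: tau0 + mu_p * g,             # tau = tau0 + mu_p * gamma
    }
    return {name: sum((f(g) - t) ** 2 for g, t in zip(shear_rate, stress))
            for name, f in models.items()}

def select_model(shear_rate, stress, penalty=1e-6):
    """Lowest error plus a small per-parameter penalty, so simpler models win ties."""
    sse = fit_candidates(shear_rate, stress)
    return min(sse, key=lambda name: sse[name] + penalty * N_PARAMS[name])

# Synthetic, noise-free data for three fluid types (illustration only).
rates = [1.0, 2.0, 4.0, 8.0, 16.0]
print(select_model(rates, [0.5 * g for g in rates]))         # newtonian
print(select_model(rates, [3.0 * g ** 0.5 for g in rates]))  # power_law
print(select_model(rates, [2.0 + 0.5 * g for g in rates]))   # bingham
```

The parameter penalty matters: the Bingham and power-law models contain the Newtonian model as a special case, so without it they would never lose on Newtonian data.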


2019, Vol 19 (4), pp. 232-241
Author(s): Xuegong Chen, Wanwan Shi, Lei Deng

Background: Accumulating experimental studies have indicated that, compared with patients who have a single disease, patients with comorbid diseases suffer additional pain and more often experience the failure of standard treatments. Therefore, accurate prediction of potential comorbidity is essential for designing more efficient treatment strategies. However, only a few disease comorbidities have been discovered in the clinic. Objective: In this work, we propose PCHS, an effective computational method for predicting disease comorbidity. Materials and Methods: We used the HeteSim measure to calculate a relatedness score for different disease pairs in a global heterogeneous network that integrates six networks based on biological information, including disease–disease associations, drug–drug interactions, protein–protein interactions, and the associations among them. We built the prediction model using a Support Vector Machine (SVM) based on the HeteSim scores. Results and Conclusion: PCHS performed significantly better than previous state-of-the-art approaches and achieved an AUC score of 0.90 in 10-fold cross-validation. Furthermore, some of our predictions have been verified in the literature, indicating the effectiveness of our method.
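HeteSim scores the relatedness of two objects along a meta-path by walking from both ends toward the middle of the path and taking the cosine of the two reachable-probability vectors. A minimal NumPy sketch for even-length meta-paths follows; odd-length paths need an extra edge-splitting step that is omitted, the disease–protein matrix is a placeholder, and the SVM stage of PCHS is not shown. A longer path such as disease–protein–drug–protein–disease would pass the disease–protein and protein–drug matrices as both halves.

```python
from functools import reduce
import numpy as np

def row_normalize(m):
    """Convert an adjacency matrix to transition probabilities (rows sum to 1)."""
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m, dtype=float), where=sums > 0)

def hetesim(left_relations, right_relations):
    """Normalized HeteSim for an even-length meta-path.

    left_relations: adjacency matrices walked from the source type to the middle type.
    right_relations: adjacency matrices walked from the target type to the middle type
    (the second half of the path, reversed, with each matrix transposed).
    Returns an (n_source, n_target) matrix of scores in [0, 1].
    """
    left = reduce(np.matmul, [row_normalize(r) for r in left_relations])
    right = reduce(np.matmul, [row_normalize(r) for r in right_relations])
    raw = left @ right.T
    norms = np.outer(np.linalg.norm(left, axis=1), np.linalg.norm(right, axis=1))
    return np.divide(raw, norms, out=np.zeros_like(raw), where=norms > 0)

# Placeholder disease-protein associations: 3 diseases x 4 proteins.
disease_protein = np.array([[1, 1, 0, 0],
                            [1, 1, 0, 0],
                            [0, 0, 1, 1]], dtype=float)
# Meta-path disease-protein-disease: both halves walk disease -> protein.
scores = hetesim([disease_protein], [disease_protein])
```

Diseases 0 and 1 share all their proteins and score close to 1, while disease 2 shares none with them and scores 0; scores like these would then serve as features for a classifier.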


Author(s): STEFANO MERLER, BRUNO CAPRILE, CESARE FURLANELLO

In this paper, we propose a regularization technique for AdaBoost. The method implements a bias-variance control strategy to avoid overfitting in classification tasks on noisy data. It is based on a notion of easy and hard training patterns that emerges from an analysis of the dynamic evolution of AdaBoost weights. The procedure consists of sorting the training data points by a hardness measure and progressively eliminating the hardest ones, stopping at an automatically selected threshold. The effectiveness of the method is tested and discussed on synthetic as well as real data.
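The core idea, ranking training points by how hard AdaBoost finds them and discarding the hardest, can be sketched with decision stumps on 1-D data. Here hardness is taken as the mean sample weight over the boosting rounds, an assumed proxy, and a fixed `fraction` replaces the paper's automatically selected stopping threshold; both are hypothetical simplifications.

```python
import numpy as np

def fit_stump(x, y, w):
    """Best threshold stump on 1-D data: predicts sign * (1 if x > t else -1)."""
    best = (np.inf, 0.0, 1)
    for t in np.unique(x):
        for sign in (1, -1):
            pred = sign * np.where(x > t, 1, -1)
            err = w[pred != y].sum()
            if err < best[0]:
                best = (err, t, sign)
    return best

def adaboost_weight_history(x, y, n_rounds=20):
    """Run AdaBoost with stumps and record the sample-weight vector at each round."""
    w = np.full(len(x), 1.0 / len(x))
    history = []
    for _ in range(n_rounds):
        history.append(w.copy())
        err, t, sign = fit_stump(x, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard the log below
        alpha = 0.5 * np.log((1 - err) / err)
        pred = sign * np.where(x > t, 1, -1)
        w = w * np.exp(-alpha * y * pred)  # up-weight misclassified points
        w /= w.sum()
    return np.array(history)

def prune_hardest(x, y, fraction=0.1, n_rounds=20):
    """Drop the `fraction` of points with the highest mean AdaBoost weight."""
    hardness = adaboost_weight_history(x, y, n_rounds).mean(axis=0)
    n_drop = int(round(fraction * len(x)))
    keep = np.argsort(hardness)[: len(x) - n_drop]
    return np.sort(keep), hardness

# 1-D toy data with one deliberately flipped label at index 2.
x = np.arange(10, dtype=float)
y = np.array([-1, -1, 1, -1, -1, 1, 1, 1, 1, 1])
keep, hardness = prune_hardest(x, y, fraction=0.1)
print(int(np.argmax(hardness)), keep.tolist())  # 2 [0, 1, 3, 4, 5, 6, 7, 8, 9]
```

The mislabeled point keeps being misclassified by the best available stump, so AdaBoost repeatedly pushes its weight back up; that persistent weight is what marks it as a likely noisy pattern.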


2021, Vol 13 (1)
Author(s): Bingyin Hu, Anqi Lin, L. Catherine Brinson

The inconsistency of polymer indexing, caused by the lack of uniformity in how polymer names are expressed, is a major challenge for the widespread use of polymer-related data resources and limits the application of materials informatics for innovation across broad classes of polymer science and polymer-based materials. The current solution of using a variety of different chemical identifiers has proven insufficient to address the challenge and is not intuitive for researchers. This work proposes a multi-algorithm mapping methodology named ChemProps that is optimized to solve the polymer indexing issue, with an easy-to-update design in both depth and width. A RESTful API enables lightweight data exchange and easy integration across data systems. A weight factor is assigned to each algorithm to generate scores for candidate chemical names, and the weights are optimized to maximize the minimum score difference between the ground-truth chemical name and the other candidate chemical names. Ten-fold cross-validation is used on the 160 training data points to prevent overfitting. The obtained set of weight factors achieves 100% test accuracy on the 54 test data points. The weight factors will evolve as ChemProps grows. With ChemProps, other polymer databases can remove duplicate entries and enable a more accurate “search by SMILES” function by using ChemProps as a common name-to-SMILES translator through API calls. ChemProps is also an excellent tool for auto-populating polymer properties thanks to its easy-to-update design.
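The weight-fitting criterion described above (maximize the smallest gap between the combined score of the ground-truth name and that of the best competing candidate) can be sketched with a coarse grid search over weights that sum to 1. The per-algorithm scores, the grid resolution, and the use of grid search instead of the authors' optimizer are all assumptions made for illustration.

```python
from itertools import product

def margin(weights, example):
    """Combined-score gap between the ground-truth name and the best other candidate."""
    truth, candidates = example["truth"], example["scores"]
    combined = {name: sum(w * s for w, s in zip(weights, algo_scores))
                for name, algo_scores in candidates.items()}
    best_other = max(v for name, v in combined.items() if name != truth)
    return combined[truth] - best_other

def optimize_weights(examples, n_algorithms, steps=10):
    """Grid search over weights summing to 1, maximizing the minimum margin."""
    best_weights, best_min = None, float("-inf")
    for combo in product(range(steps + 1), repeat=n_algorithms):
        if sum(combo) != steps:
            continue  # keep only points on the weight simplex
        weights = [c / steps for c in combo]
        worst = min(margin(weights, ex) for ex in examples)
        if worst > best_min:
            best_weights, best_min = weights, worst
    return best_weights, best_min

# Hypothetical per-algorithm scores (algorithm 1, algorithm 2) for each candidate name.
examples = [
    {"truth": "polystyrene",
     "scores": {"polystyrene": (0.9, 0.2), "polypropylene": (0.5, 0.4)}},
    {"truth": "poly(methyl methacrylate)",
     "scores": {"poly(methyl methacrylate)": (0.5, 0.9), "poly(methyl acrylate)": (0.6, 0.3)}},
]
weights, worst_margin = optimize_weights(examples, n_algorithms=2)
print(weights, round(worst_margin, 2))  # [0.6, 0.4] 0.16
```

A positive minimum margin means every training name is ranked first; maximizing it, rather than accuracy alone, leaves headroom for new names that are scored slightly differently.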

