Review and comparative assessment of similarity-based methods for prediction of drug–protein interactions in the druggable human proteome

Abstract: Drug–protein interactions (DPIs) underlie the desired therapeutic actions and the adverse side effects of a significant majority of drugs. Computational prediction of DPIs facilitates research in drug discovery, characterization and repurposing. Similarity-based methods that do not require knowledge of protein structures are particularly suitable for druggable genome-wide predictions of DPIs. We review 35 high-impact similarity-based predictors that were published in the past decade. We group them based on the three types of similarities, and the combinations of these, that they use. We discuss and compare key aspects of these methods including source databases, internal databases and their predictive models. Using our novel benchmark database, we perform a comparative empirical analysis of the predictive performance of seven types of representative predictors that utilize each type of similarity individually and all possible combinations of similarities. We assess predictive quality at the database-wide DPI level and we are the first to also include evaluation over individual drugs. Our comprehensive analysis shows that predictors that use more similarity types outperform methods that employ fewer similarities, and that the model combining all three types of similarities achieves an area under the receiver operating characteristic curve of 0.93. We offer a comprehensive analysis of the sensitivity of predictive performance to intrinsic and extrinsic characteristics of the considered predictors. We find that predictive performance is sensitive to low levels of similarity between sequences of the drug targets and to several extrinsic properties of the input drug structures, drug profiles and drug targets. The benchmark database and a webserver for the seven predictors are freely available at http://biomine.cs.vcu.edu/servers/CONNECTOR/.

Multi-schema computational prediction of the comprehensive SARS-CoV-2 vs. human interactome

PeerJ ◽

10.7717/peerj.11117 ◽

2021 ◽

Vol 9 ◽

pp. e11117

Author(s):

Kevin Dick ◽

Anand Chopra ◽

Kyle K. Biggar ◽

James R. Green

Keyword(s):

Protein Interactions ◽

Scientific Community ◽

Predictive Performance ◽

Computational Prediction ◽

Viral Disease ◽

High Confidence ◽

Human Interactome ◽

Holistic Understanding ◽

Human Proteins ◽

Novel Coronavirus

Background: Understanding the disease pathogenesis of the novel coronavirus, denoted SARS-CoV-2, is critical to the development of anti-SARS-CoV-2 therapeutics. The global propagation of the viral disease, denoted COVID-19 (“coronavirus disease 2019”), has unified the scientific community in searching for possible inhibitory small molecules or polypeptides. A holistic understanding of the SARS-CoV-2 vs. human inter-species interactome promises to identify putative protein-protein interactions (PPI) that may be considered targets for the development of inhibitory therapeutics. Methods: We leverage two state-of-the-art, sequence-based PPI predictors (PIPE4 & SPRINT) capable of generating the comprehensive SARS-CoV-2 vs. human interactome, comprising approximately 285,000 pairwise predictions. Three prediction schemas (all, proximal, RP-PPI) are leveraged to obtain our highest-confidence subset of PPIs and human proteins predicted to interact with each of the 14 SARS-CoV-2 proteins considered in this study. Notably, the use of the Reciprocal Perspective (RP) framework demonstrates improved predictive performance in multiple cross-validation experiments. Results: The all schema identified 279 high-confidence putative interactions involving 225 human proteins, the proximal schema identified 129 high-confidence putative interactions involving 126 human proteins, and the RP-PPI schema identified 539 high-confidence putative interactions involving 494 human proteins. The intersection of the three sets of predictions comprises the seven highest-confidence PPIs. Notably, the Spike-ACE2 interaction was the highest ranked for both the PIPE4 and SPRINT predictors with the all and proximal schemas, corroborating existing evidence for this PPI. Several other predicted PPIs are biologically relevant within the context of the original SARS-CoV virus.
Furthermore, the PIPE-Sites algorithm was used to identify the putative subsequence that might mediate each interaction and thereby inform the design of inhibitory polypeptides intended to disrupt the corresponding host-pathogen interactions. Conclusion: We publicly released the comprehensive sets of PPI predictions and their corresponding PIPE-Sites landscapes in the following DataVerse repository: https://www.doi.org/10.5683/SP2/JZ77XA. The information provided represents theoretical modeling only and caution should be exercised in its use. It is intended as a resource for the scientific community at large in furthering our understanding of SARS-CoV-2.

A Multicenter Validation Study of the Deep Learning-based Early Warning Score for Predicting in-hospital Cardiac Arrest in Patients Admitted to General Wards

10.21203/rs.3.rs-61577/v1 ◽

2020 ◽

Author(s):

Yeon Joo Lee ◽

Kyung-Jae Cho ◽

Oyeon Kwon ◽

Hyunho Park ◽

Yeha Lee ◽

...

Keyword(s):

Cardiac Arrest ◽

Deep Learning ◽

Early Warning ◽

Characteristic Curve ◽

Predictive Performance ◽

Early Warning Score ◽

Patients At Risk ◽

Key Aspects ◽

General Wards ◽

Hospital Cardiac Arrest

Background: The recently developed deep learning (DL)-based early warning score (DEWS) has shown potential in predicting deteriorating patients. We aimed to validate DEWS in multiple centers and compare its prediction, alarming and timeliness performance with those of the modified early warning score (MEWS) in identifying patients at risk for in-hospital cardiac arrest (IHCA). Methods: This retrospective cohort study included adult patients admitted to the general wards of five hospitals during a 12-month period. We validated DEWS internally at two hospitals and externally at the other three hospitals. The occurrence of IHCA within 24 hours of vital sign observation was the outcome of interest. We used the area under the receiver operating characteristic curve (AUROC) as the main performance metric. Results: The study population consisted of 173,368 patients (224 IHCAs). The predictive performance of DEWS was superior to that of MEWS in both the internal (AUROC: 0.860 vs. 0.754, respectively) and external (AUROC: 0.905 vs. 0.785, respectively) validation cohorts. At the same specificity, DEWS had a higher sensitivity than MEWS, and at the same sensitivity, DEWS had a lower mean alarm count than MEWS, at nearly half the alarm rate of MEWS. Additionally, DEWS was able to predict more IHCA patients in the 24 to 0.5 hours before the outcome. Conclusion: Our study showed that DEWS was superior to MEWS in the three key aspects (IHCA predictive, alarming, and timeliness performance). This study demonstrates the potential of DEWS as an effective, efficient screening tool in rapid response systems (RRSs) to identify high-risk patients.
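
Several abstracts in this listing report the area under the receiver operating characteristic curve (AUROC) as their main metric. As a minimal sketch of what that number measures, the AUROC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the toy scores are illustrative assumptions and are not taken from any of the studies listed here.

```python
def auroc(scores, labels):
    """Return the AUROC for binary labels (1 = positive, 0 = negative).

    Uses the pairwise definition: the fraction of (positive, negative)
    pairs in which the positive has the higher score; ties count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical risk scores, not study data).
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
print(auroc(toy_scores, toy_labels))
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; for cohorts of the size reported above, a rank-based formulation would be the more practical choice.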

Prediction of Protein–Protein Interactions in Arabidopsis, Maize, and Rice by Combining Deep Neural Network With Discrete Hilbert Transform

Frontiers in Genetics ◽

10.3389/fgene.2021.745228 ◽

2021 ◽

Vol 12 ◽

Author(s):

Jie Pan ◽

Li-Ping Li ◽

Zhu-Hong You ◽

Chang-Qing Yu ◽

Zhong-Hao Ren ◽

...

Keyword(s):

Hilbert Transform ◽

Protein Interactions ◽

Characteristic Curve ◽

Predictive Ability ◽

Predictive Performance ◽

Plant Protein ◽

Technical Equipment ◽

Protein Protein Interactions ◽

Discrete Hilbert Transform ◽

Feature Descriptors

Protein–protein interactions (PPIs) in plants play an essential role in the regulation of biological processes. However, traditional experimental methods are expensive, time-consuming, and need sophisticated technical equipment. These drawbacks motivated the development of novel computational approaches to predict PPIs in plants. In this article, a new deep learning framework, which combines the discrete Hilbert transform (DHT) with deep neural networks (DNN), is presented to predict PPIs in plants. To be more specific, plant protein sequences were first transformed into a position-specific scoring matrix (PSSM). Then, DHT was employed to capture features from the PSSM. To improve the prediction accuracy, we used the singular value decomposition algorithm to decrease noise and reduce the dimensions of the feature descriptors. Finally, these feature vectors were fed into the DNN for training and prediction. When applied to three plant PPI datasets (Arabidopsis thaliana, maize, and rice), our method achieved good predictive performance, with average area under the receiver operating characteristic curve values of 0.8369, 0.9466, and 0.9440, respectively. To fully verify the predictive ability of our method, we compared it with different feature descriptors and machine learning classifiers. Moreover, to further demonstrate the generality of our approach, we also tested it on yeast and human PPI datasets. Experimental results indicated that our method is an efficient and promising computational model for predicting potential interacting plant protein pairs.
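
The feature-extraction step described above can be illustrated with a small sketch of the discrete Hilbert transform itself. The abstract does not say how the DHT is applied to the PSSM, so the per-column application below, the function names (`dft`, `discrete_hilbert`, `dht_columns`) and the toy matrix are all assumptions made for illustration. The transform is computed through the standard analytic-signal construction using a naive DFT from the Python standard library; the SVD and DNN steps are not shown.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def discrete_hilbert(x):
    """Discrete Hilbert transform of a real sequence.

    Builds the analytic signal by keeping the DC (and Nyquist) terms,
    doubling positive frequencies and zeroing negative ones, then
    returns the imaginary part.
    """
    n = len(x)
    if n == 0:
        return []
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        for k in range(1, n // 2):
            weights[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    spectrum = dft(x)
    analytic = dft([s * w for s, w in zip(spectrum, weights)], inverse=True)
    return [z.imag for z in analytic]


def dht_columns(pssm):
    """Apply the transform to each column of a PSSM given as a list of rows.

    Column-wise application is an assumption, not the authors' stated method.
    """
    return [discrete_hilbert(list(col)) for col in zip(*pssm)]


# Toy 8-residue "PSSM" with two columns (hypothetical values).
toy_pssm = [[math.cos(2 * math.pi * t / 8), 1.0] for t in range(8)]
features = dht_columns(toy_pssm)
```

In this toy case the first column is a cosine, whose Hilbert transform is the matching sine, and the second column is constant, whose transform is zero, which gives a quick sanity check of the implementation.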

Multiscale Virtual Screening Optimization for Shotgun Drug Repurposing Using the CANDO Platform

Molecules ◽

10.3390/molecules26092581 ◽

2021 ◽

Vol 26 (9) ◽

pp. 2581

Author(s):

Matthew L. Hudson ◽

Ram Samudrala

Keyword(s):

Virtual Screening ◽

Decision Tree ◽

Protein Interactions ◽

Large Scale ◽

Therapeutic Potential ◽

Protein Structures ◽

Drug Repurposing ◽

Predictive Performance ◽

Clinical Indication ◽

Docking Method

Drug repurposing, the practice of utilizing existing drugs for novel clinical indications, has tremendous potential for improving human health outcomes and increasing therapeutic development efficiency. The goal of multi-disease multitarget drug repurposing, also known as shotgun drug repurposing, is to develop platforms that assess the therapeutic potential of each existing drug for every clinical indication. Our Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multitarget repurposing implements several pipelines for the large-scale modeling and simulation of interactions between comprehensive libraries of drugs/compounds and protein structures. In these pipelines, each drug is described by an interaction signature that is compared to all other signatures that are subsequently sorted and ranked based on similarity. Pipelines within the platform are benchmarked based on their ability to recover known drugs for all indications in our library, and predictions are generated based on the hypothesis that (novel) drugs with similar signatures may be repurposed for the same indication(s). The drug-protein interactions used to create the drug-proteome signatures may be determined by any screening or docking method, but the primary approach used thus far has been BANDOCK, our in-house bioanalytical or similarity docking protocol. In this study, we calculated drug-proteome interaction signatures using the publicly available molecular docking method Autodock Vina and created hybrid decision tree pipelines that combined our original bio- and chem-informatic approach with the goal of assessing and benchmarking their drug repurposing capabilities and performance. 
The hybrid decision tree pipeline outperformed the two docking-based pipelines from which it was synthesized, yielding an average indication accuracy of 13.3% at the top10 cutoff (the most stringent), relative to 10.9% and 7.1% for its constituent pipelines, and a random control accuracy of 2.2%. We demonstrate that docking-based virtual screening pipelines have unique performance characteristics and that the CANDO shotgun repurposing paradigm is not dependent on a specific docking method. Our results also provide further evidence that multiple CANDO pipelines can be synthesized to enhance drug repurposing predictive capability relative to their constituent pipelines. Overall, this study indicates that pipelines consisting of varied docking-based signature generation methods can capture unique and useful signals for accurate comparison of drug-proteome interaction signatures, leading to improvements in the benchmarking and predictive performance of the CANDO shotgun drug repurposing platform.
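
The signature-comparison and benchmarking idea described above can be sketched in a few lines. This is an illustration only, not the CANDO implementation: cosine similarity is an assumed metric (the abstract does not name one), the accuracy definition is a simplified reading of "recovering known drugs for an indication at a top-k cutoff", and the drug names, signatures and indications are hypothetical toy data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length interaction signatures."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_similar(signatures, query):
    """Rank all other drugs by similarity of their signature to the query's."""
    scored = [(name, cosine(signatures[query], sig))
              for name, sig in signatures.items() if name != query]
    return [name for name, _ in sorted(scored, key=lambda t: t[1], reverse=True)]


def topk_accuracy(signatures, indications, k):
    """Fraction of (drug, indication) pairs for which another drug with the
    same indication appears among the k most similar signatures."""
    hits = total = 0
    for drug, inds in indications.items():
        top = rank_similar(signatures, drug)[:k]
        for ind in inds:
            peers = [d for d, other in indications.items()
                     if d != drug and ind in other]
            if not peers:
                continue  # nothing to recover for this indication
            total += 1
            if any(p in top for p in peers):
                hits += 1
    return hits / total if total else 0.0


# Hypothetical drugs, each scored against three proteins.
signatures = {
    "drugA": [1.0, 0.0, 0.0],
    "drugB": [0.9, 0.1, 0.0],
    "drugC": [0.0, 1.0, 0.0],
    "drugD": [0.0, 0.9, 0.1],
}
indications = {
    "drugA": {"indication1"},
    "drugB": {"indication1"},
    "drugC": {"indication2"},
    "drugD": {"indication2"},
}
print(rank_similar(signatures, "drugA"))
print(topk_accuracy(signatures, indications, k=1))
```

Because the ranking depends only on the signatures, swapping in signatures produced by a different docking method leaves the benchmarking code unchanged, which is consistent with the abstract's point that the paradigm does not depend on a specific docking method.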

Recent Advances on the Semi-Supervised Learning for Long Non-Coding RNA-Protein Interactions Prediction: A Review

Protein and Peptide Letters ◽

10.2174/0929866526666191025104043 ◽

2020 ◽

Vol 27 (5) ◽

pp. 385-391

Author(s):

Lin Zhong ◽

Zhong Ming ◽

Guobo Xie ◽

Chunlong Fan ◽

Xue Piao

Keyword(s):

Supervised Learning ◽

Protein Interactions ◽

Computational Models ◽

Prediction Models ◽

Chromatin Modification ◽

Computational Prediction ◽

Human Diseases ◽

Future Research ◽

Non Coding RNA ◽

Long Non Coding RNA

In recent years, more and more evidence indicates that long non-coding RNA (lncRNA) plays a significant role in the development of complex biological processes, especially in RNA processing, chromatin modification, and cell differentiation, as well as many other processes. Notably, lncRNA is closely associated with human diseases such as cancer. Therefore, only by knowing more about the function of lncRNA can we better solve the problems of human diseases. However, lncRNAs need to bind to proteins to perform their biomedical functions, so lncRNA function can be revealed by studying the relationship between lncRNAs and proteins. Because of the limitations of traditional experiments, researchers often use computational prediction models to predict lncRNA–protein interactions. In this review, we summarize several computational models for lncRNA–protein interaction prediction based on semi-supervised learning from the past two years, and briefly introduce their advantages and shortcomings. Finally, future research directions for lncRNA–protein interaction prediction are pointed out.

Machine Learning-Based Scoring Functions. Development and Applications with SAnDReS.

Current Medicinal Chemistry ◽

10.2174/0929867327666200515101820 ◽

2020 ◽

Vol 27 ◽

Author(s):

Gabriela Bitencourt-Ferreira ◽

Camila Rizzotto ◽

Walter Filgueira de Azevedo Junior

Keyword(s):

Machine Learning ◽

Binding Affinity ◽

Drug Targets ◽

Computational Models ◽

Factor Xa ◽

Coagulation Factor ◽

Predictive Performance ◽

Machine Learning Techniques ◽

Scoring Functions ◽

Molegro Virtual Docker

Background: Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models to evaluate binding affinity and thermodynamic state functions. Application of machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance when compared with classical scoring functions available in docking programs. Objective: Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. Method: SAnDReS implements machine learning methods available in the scikit-learn library. This program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding, and thermodynamic data to create targeted scoring functions. Results: Recent applications of the program SAnDReS to drug targets such as coagulation factor Xa, cyclin-dependent kinases, and HIV-1 protease were able to create targeted scoring functions to predict inhibition of these proteins. These targeted models outperform classical scoring functions. Conclusion: Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of the SAnDReS-developed models when compared with classical scoring functions available in programs such as AutoDock4, Molegro Virtual Docker, and AutoDock Vina.

Dimensionality Reduction Techniques in the Computational Prediction of Protein-Protein Interactions: Classical versus Sophisticated New Techniques

10.21770/0907-3004.001 ◽

2016 ◽

Vol 1 (1) ◽

pp. 01-27

Author(s):

Konstantinos A. Theofilatos

Keyword(s):

Dimensionality Reduction ◽

Protein Interactions ◽

Computational Prediction ◽

Protein Protein Interactions ◽

New Techniques ◽

Reduction Techniques ◽

Dimensionality Reduction Techniques

Development of Machine Learning Models to Predict Probabilities and Types of Stroke at Prehospital Stage: the Japan Urgent Stroke Triage Score Using Machine Learning (JUST-ML)

Translational Stroke Research ◽

10.1007/s12975-021-00937-x ◽

2021 ◽

Author(s):

Kazutaka Uchida ◽

Junichi Kouno ◽

Shinichi Yoshimura ◽

Norito Kinjo ◽

Fumihiro Sakakibara ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forests ◽

Prediction Models ◽

Characteristic Curve ◽

Predictive Performance ◽

Vessel Occlusion ◽

Predictive Values ◽

Training Cohort ◽

Sensitivity Specificity

Abstract: With recent advancements in machine learning (ML), such technologies have been applied in various fields owing to their high predictive performance. We aimed to develop a prehospital stroke scale with ML. We conducted a multicenter retrospective and prospective cohort study. The training cohort comprised eight centers in Japan from June 2015 to March 2018, and the test cohort comprised 13 centers from April 2019 to March 2020. We used three different ML algorithms (logistic regression, random forests, XGBoost) to develop models. Main outcomes were large vessel occlusion (LVO), intracranial hemorrhage (ICH), subarachnoid hemorrhage (SAH), and cerebral infarction (CI) other than LVO. The predictive abilities were validated in the test cohort with accuracy, positive predictive value, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and F score. The training cohort included 3178 patients with 337 LVO, 487 ICH, 131 SAH, and 676 CI cases, and the test cohort included 3127 patients with 183 LVO, 372 ICH, 90 SAH, and 577 CI cases. The overall accuracies were 0.65, and the positive predictive values, sensitivities, specificities, AUCs, and F scores were stable in the test cohort. The classification abilities were also fair for all ML models. The AUCs for LVO of logistic regression, random forests, and XGBoost were 0.89, 0.89, and 0.88, respectively, in the test cohort, and these values were higher than those of previously reported prediction models for LVO. The ML models developed to predict the probability and types of stroke at the prehospital stage had superior predictive abilities.
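
The validation metrics named above (accuracy, positive predictive value, sensitivity, specificity and F score) all derive from the same four confusion-matrix counts. The sketch below shows those definitions for a single outcome treated as positive versus everything else. The one-vs-rest framing, the reading of "F score" as F1, the function name `binary_metrics` and the toy labels are assumptions for illustration, not details confirmed by the abstract.

```python
def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix metrics for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "ppv": ratio(tp, tp + fp),            # positive predictive value
        "sensitivity": ratio(tp, tp + fn),    # true positive rate
        "specificity": ratio(tn, tn + fp),    # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy example: one outcome (for instance LVO) versus all other cases.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(binary_metrics(toy_true, toy_pred))
```

For a multi-class outcome such as the four stroke types, the same function can be called once per class with that class relabeled as 1 and all others as 0.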

De novo design of a reversible phosphorylation-dependent switch for membrane targeting

Nature Communications ◽

10.1038/s41467-021-21622-5 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Leon Harrington ◽

Jordan M. Fletcher ◽

Tamara Heermann ◽

Derek N. Woolfson ◽

Petra Schwille

Keyword(s):

Protein Interactions ◽

Lipid Membrane ◽

De Novo ◽

Protein Localization ◽

Protein Structures ◽

Spatiotemporal Pattern ◽

Membrane Targeting ◽

Protein Protein Interactions ◽

Reversible Phosphorylation ◽

Potential Applications

Abstract: Modules that switch protein-protein interactions on and off are essential to develop synthetic biology; for example, to construct orthogonal signaling pathways, to control artificial protein structures dynamically, and for protein localization in cells or protocells. In nature, the E. coli MinCDE system couples nucleotide-dependent switching of MinD dimerization to membrane targeting to trigger spatiotemporal pattern formation. Here we present a de novo peptide-based molecular switch that toggles reversibly between monomer and dimer in response to phosphorylation and dephosphorylation. In combination with other modules, we construct fusion proteins that couple switching to lipid-membrane targeting by: (i) tethering a ‘cargo’ molecule reversibly to a permanent membrane ‘anchor’; and (ii) creating a ‘membrane-avidity switch’ that mimics the MinD system but operates by reversible phosphorylation. These minimal, de novo molecular switches have potential applications for introducing dynamic processes into designed and engineered proteins to augment functions in living cells and add functionality to protocells.

Cross‐Linking/Mass Spectrometry for Studying Protein Structures and Protein–Protein Interactions: Where Are We Now and Where Should We Go from Here?

Angewandte Chemie International Edition ◽

10.1002/anie.201709559 ◽

2018 ◽

Vol 57 (22) ◽

pp. 6390-6396 ◽

Cited By ~ 82

Author(s):

Andrea Sinz

Keyword(s):

Mass Spectrometry ◽

Protein Interactions ◽

Protein Structures ◽

Cross Linking ◽

Protein Protein Interactions
