Multiscale Virtual Screening Optimization for Shotgun Drug Repurposing Using the CANDO Platform

Drug repurposing, the practice of utilizing existing drugs for novel clinical indications, has tremendous potential for improving human health outcomes and increasing therapeutic development efficiency. The goal of multi-disease multitarget drug repurposing, also known as shotgun drug repurposing, is to develop platforms that assess the therapeutic potential of each existing drug for every clinical indication. Our Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multitarget repurposing implements several pipelines for the large-scale modeling and simulation of interactions between comprehensive libraries of drugs/compounds and protein structures. In these pipelines, each drug is described by an interaction signature that is compared to all other signatures, which are then sorted and ranked by similarity. Pipelines within the platform are benchmarked based on their ability to recover known drugs for all indications in our library, and predictions are generated based on the hypothesis that (novel) drugs with similar signatures may be repurposed for the same indication(s). The drug-protein interactions used to create the drug-proteome signatures may be determined by any screening or docking method, but the primary approach used thus far has been BANDOCK, our in-house bioanalytical or similarity docking protocol. In this study, we calculated drug-proteome interaction signatures using the publicly available molecular docking method AutoDock Vina and created hybrid decision tree pipelines that combined this docking approach with our original bio- and chem-informatic approach, with the goal of assessing and benchmarking their drug repurposing capabilities and performance. 
The hybrid decision tree pipeline outperformed the two docking-based pipelines from which it was synthesized, yielding an average indication accuracy of 13.3% at the top10 cutoff (the most stringent), relative to 10.9% and 7.1% for its constituent pipelines, and a random control accuracy of 2.2%. We demonstrate that docking-based virtual screening pipelines have unique performance characteristics and that the CANDO shotgun repurposing paradigm is not dependent on a specific docking method. Our results also provide further evidence that multiple CANDO pipelines can be synthesized to enhance drug repurposing predictive capability relative to their constituent pipelines. Overall, this study indicates that pipelines consisting of varied docking-based signature generation methods can capture unique and useful signals for accurate comparison of drug-proteome interaction signatures, leading to improvements in the benchmarking and predictive performance of the CANDO shotgun drug repurposing platform.
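
The signature comparison and benchmarking described in this abstract can be illustrated with a minimal Python sketch. It is not the CANDO implementation: the cosine similarity metric, the toy signatures, the indication labels, and the function names (`cosine_similarity`, `rank_similar`, `average_indication_accuracy`) are all assumptions made for illustration. The benchmark below counts a drug as recovered when another drug associated with the same indication appears among its k most similar signatures, which is one plausible reading of the top10 cutoff.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length interaction signatures."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(signatures, query):
    """Return all other drugs sorted from most to least similar to `query`."""
    others = [d for d in signatures if d != query]
    return sorted(
        others,
        key=lambda d: cosine_similarity(signatures[query], signatures[d]),
        reverse=True,
    )


def average_indication_accuracy(signatures, indications, k):
    """Percentage of drugs recovered within the top k, averaged over indications.

    Indications with fewer than two drugs are skipped because there is
    nothing to recover for them.
    """
    per_indication = []
    for drugs in indications.values():
        if len(drugs) < 2:
            continue
        hits = 0
        for drug in drugs:
            top_k = rank_similar(signatures, drug)[:k]
            if any(other in drugs for other in top_k):
                hits += 1
        per_indication.append(hits / len(drugs))
    if not per_indication:
        return 0.0
    return 100.0 * sum(per_indication) / len(per_indication)


# Hypothetical toy data: four drugs scored against three proteins.
signatures = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.9, 0.1, 0.0],
    "C": [0.0, 1.0, 0.0],
    "D": [0.0, 0.9, 0.1],
}
indications = {"ind1": {"A", "B"}, "ind2": {"C", "D"}}

accuracy = average_indication_accuracy(signatures, indications, k=1)
```

On this toy data each drug's nearest neighbor shares its indication, so `accuracy` evaluates to 100.0. A real benchmark would use proteome-scale signatures and a larger cutoff such as k=10.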

Multiscale virtual screening optimization for shotgun drug repurposing using the CANDO platform

10.1101/2020.08.24.265488 ◽

2020 ◽

Author(s):

Matthew L. Hudson ◽

Ram Samudrala

Keyword(s):

Virtual Screening ◽

Decision Tree ◽

Protein Interactions ◽

Large Scale ◽

Therapeutic Potential ◽

Protein Structures ◽

Drug Repurposing ◽

Clinical Indication ◽

Docking Method ◽

Useful Signal

Drug repurposing, the practice of utilizing existing drugs for novel clinical indications, has tremendous potential for improving human health outcomes and increasing therapeutic development efficiency. The goal of multidisease multitarget drug repurposing, also known as shotgun drug repurposing, is to develop platforms that assess the therapeutic potential of each existing drug for every clinical indication. Our Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multitarget repurposing implements several pipelines via large-scale modelling and simulation of interactions between comprehensive libraries of drugs/compounds and protein structures. In these pipelines, each drug is described by an interaction signature that is compared to all other signatures, which are then sorted and ranked by similarity. Pipelines within the platform are benchmarked based on their ability to recover known drugs for all indications in our library, and predictions are generated based on the hypothesis that (novel) drugs with similar signatures may be repurposed for the same indication(s). The drug-protein interactions used to create the drug-proteome signatures may be determined by any screening or docking method, but the primary approach used thus far has been an in-house similarity docking protocol. In this study, we calculated drug-proteome interaction signatures using the publicly available molecular docking method AutoDock Vina and created hybrid decision tree pipelines that combined this docking approach with our original bio- and cheminformatic approach, with the goal of assessing and benchmarking their drug repurposing capabilities and performance. 
The hybrid decision tree pipeline outperformed the two docking-based pipelines from which it was synthesized, yielding an average indication accuracy of 13.3% at the top10 cutoff (the most stringent), relative to 10.9% and 7.1% for its constituent pipelines, and a random control accuracy of 2.2%. We demonstrate that docking-based virtual screening pipelines have unique performance characteristics and that the CANDO shotgun repurposing paradigm is not dependent on a specific docking method. Our results also provide further evidence that multiple CANDO pipelines can be synthesized to enhance drug repurposing predictive capability relative to their constituent pipelines. Overall, this study indicates that pipelines consisting of varied docking-based signature generation methods can capture unique and useful signal for accurate comparison of drug-proteome interaction signatures, leading to improvements in the benchmarking and predictive performance of the CANDO shotgun drug repurposing platform.

Multiscale Virtual Screening Optimization for Shotgun Drug Repurposing Using the CANDO Platform

10.20944/preprints202104.0475.v1 ◽

2021 ◽

Author(s):

Matthew L. Hudson ◽

Ram Samudrala

Keyword(s):

Virtual Screening ◽

Decision Tree ◽

Protein Interactions ◽

Large Scale ◽

Therapeutic Potential ◽

Protein Structures ◽

Drug Repurposing ◽

Clinical Indication ◽

Docking Method ◽

Useful Signal

Drug repurposing, the practice of utilizing existing drugs for novel clinical indications, has tremendous potential for improving human health outcomes and increasing therapeutic development efficiency. The goal of multidisease multitarget drug repurposing, also known as shotgun drug repurposing, is to develop platforms that assess the therapeutic potential of each existing drug for every clinical indication. Our Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multitarget repurposing implements several pipelines via large-scale modelling and simulation of interactions between comprehensive libraries of drugs/compounds and protein structures. In these pipelines, each drug is described by an interaction signature that is compared to all other signatures, which are then sorted and ranked by similarity. Pipelines within the platform are benchmarked based on their ability to recover known drugs for all indications in our library, and predictions are generated based on the hypothesis that (novel) drugs with similar signatures may be repurposed for the same indication(s). The drug-protein interactions used to create the drug-proteome signatures may be determined by any screening or docking method, but the primary approach used thus far has been an in-house similarity docking protocol. In this study, we calculated drug-proteome interaction signatures using the publicly available molecular docking method AutoDock Vina and created hybrid decision tree pipelines that combined this docking approach with our original bio- and cheminformatic approach, with the goal of assessing and benchmarking their drug repurposing capabilities and performance. The hybrid decision tree pipeline outperformed the two docking-based pipelines from which it was synthesized, yielding an average indication accuracy of 13.3% at the top10 cutoff (the most stringent), relative to 10.9% and 7.1% for its constituent pipelines, and a random control accuracy of 2.2%. 
We demonstrate that docking-based virtual screening pipelines have unique performance characteristics and that the CANDO shotgun repurposing paradigm is not dependent on a specific docking method. Our results also provide further evidence that multiple CANDO pipelines can be synthesized to enhance drug repurposing predictive capability relative to their constituent pipelines. Overall, this study indicates that pipelines consisting of varied docking-based signature generation methods can capture unique and useful signal for accurate comparison of drug-proteome interaction signatures, leading to improvements in the benchmarking and predictive performance of the CANDO shotgun drug repurposing platform.

Review and comparative assessment of similarity-based methods for prediction of drug–protein interactions in the druggable human proteome

Briefings in Bioinformatics ◽

10.1093/bib/bby069 ◽

2018 ◽

Vol 20 (6) ◽

pp. 2066-2087 ◽

Cited By ~ 8

Author(s):

Chen Wang ◽

Lukasz Kurgan

Keyword(s):

Protein Interactions ◽

Drug Targets ◽

Characteristic Curve ◽

Protein Structures ◽

Predictive Performance ◽

Computational Prediction ◽

Comprehensive Analysis ◽

Model Combining ◽

Benchmark Database ◽

Key Aspects

Drug–protein interactions (DPIs) underlie the desired therapeutic actions and the adverse side effects of a significant majority of drugs. Computational prediction of DPIs facilitates research in drug discovery, characterization and repurposing. Similarity-based methods that do not require knowledge of protein structures are particularly suitable for druggable genome-wide predictions of DPIs. We review 35 high-impact similarity-based predictors that were published in the past decade. We group them by the three types of similarity, and the combinations of these, that they use. We discuss and compare key aspects of these methods, including source databases, internal databases and their predictive models. Using our novel benchmark database, we perform a comparative empirical analysis of the predictive performance of seven types of representative predictors that utilize each type of similarity individually and all possible combinations of similarities. We assess predictive quality at the database-wide DPI level and we are the first to also include evaluation over individual drugs. Our comprehensive analysis shows that predictors that use more similarity types outperform methods that employ fewer similarities, and that the model combining all three types of similarities achieves an area under the receiver operating characteristic curve of 0.93. We offer a comprehensive analysis of the sensitivity of predictive performance to intrinsic and extrinsic characteristics of the considered predictors. We find that predictive performance is sensitive to low levels of similarity between sequences of the drug targets and to several extrinsic properties of the input drug structures, drug profiles and drug targets. The benchmark database and a webserver for the seven predictors are freely available at http://biomine.cs.vcu.edu/servers/CONNECTOR/.
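
The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen true interaction is scored higher than a randomly chosen non-interaction. The following is a minimal Python sketch of that rank-based calculation; the function name `roc_auc` and the example labels and scores are hypothetical and are not taken from the benchmark database.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` holds 1 for a known interaction and 0 otherwise; `scores`
    holds the predictor's confidence for each drug-protein pair. Ties
    between a positive and a negative count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions for four drug-protein pairs.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Here three of the four positive-negative pairs are ordered correctly, so `example_auc` is 0.75. The quadratic loop is fine for small examples; a sort-based version would be preferable at database scale.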

FrustratometeR: an R-package to compute Local frustration in protein structures, point mutants and MD simulations

10.1101/2020.11.26.400432 ◽

2020 ◽

Author(s):

Atilio O. Rausch ◽

Maria I. Freiberger ◽

Cesar O. Leonetti ◽

Diego M. Luna ◽

Leandro G. Radusky ◽

...

Keyword(s):

Protein Interactions ◽

Large Scale ◽

Protein Structures ◽

MD Simulations ◽

R Package ◽

Protein Protein Interactions ◽

Large Scale Analysis ◽

Functional Aspects ◽

Catalytic Sites ◽

Polypeptide Chains

Once folded, natural protein molecules have few energetic conflicts within their polypeptide chains. Many protein structures do, however, contain regions where energetic conflicts remain after folding, i.e. they have highly frustrated regions. These regions, kept in place over evolutionary and physiological timescales, are related to several functional aspects of natural proteins such as protein-protein interactions, small ligand recognition, catalytic sites and allostery. Here we present FrustratometeR, an R package that easily computes local energetic frustration on a personal computer or a cluster. This package facilitates large-scale analysis of local frustration, point mutants and MD trajectories, allowing straightforward integration of local frustration analysis into pipelines for protein structural analysis. Availability and implementation: https://github.com/proteinphysiologylab/frustratometeR

Human mitochondrial protein complexes revealed by large-scale coevolution analysis and deep learning-based structure modeling

10.1101/2021.09.14.460228 ◽

2021 ◽

Author(s):

Jimin Pei ◽

Jing Zhang ◽

Qian Cong

Keyword(s):

Deep Learning ◽

Protein Interactions ◽

Large Scale ◽

Protein Complexes ◽

Mitochondrial Protein ◽

Protein Structures ◽

Complex Structures ◽

Protein Protein Interactions ◽

Learning Methods ◽

Contact Probability

Recent development of deep-learning methods has led to a breakthrough in the prediction accuracy of 3-dimensional protein structures. Extending these methods to protein pairs is expected to allow large-scale detection of protein-protein interactions and modeling of protein complexes at the proteome level. We applied RoseTTAFold and AlphaFold2, two of the latest deep-learning methods for structure prediction, to analyze coevolution of human proteins residing in mitochondria, an organelle of vital importance in many cellular processes including energy production, metabolism, cell death, and antiviral response. Variations in mitochondrial proteins have been linked to a plethora of human diseases and genetic conditions. RoseTTAFold, with its high computational speed, was used to predict the coevolution of about 95% of mitochondrial protein pairs. Top-ranked pairs were further subjected to modeling of their complex structures by AlphaFold2, which also produced contact probabilities with high precision that were in many cases consistent with RoseTTAFold. Most of the top-ranked pairs with high contact probability were supported by known protein-protein interactions and/or similarities to experimental structural complexes. For high-scoring pairs without experimental complex structures, our coevolution analyses and structural models shed light on the details of their interfaces, including CHCHD4-AIFM1, MTERF3-TRUB2, FMC1-ATPAF2, ECSIT-NDUFAF1 and COQ7-COQ9, among others. We also identified novel PPIs (PYURF-NDUFAF5, LYRM1-MTRF1L and COA8-COX10) for several proteins without experimentally characterized interaction partners, leading to predictions of their molecular functions and the biological processes they are involved in.

HYBRID DECISION TREE ARCHITECTURE UTILIZING LOCAL SVMs FOR EFFICIENT MULTI-LABEL LEARNING

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800141351004x ◽

2013 ◽

Vol 27 (07) ◽

pp. 1351004 ◽

Cited By ~ 3

Author(s):

DEJAN GJORGJEVIKJ ◽

GJORGJI MADJAROV ◽

SAŠO DŽEROSKI

Keyword(s):

Decision Tree ◽

Text Categorization ◽

Large Scale ◽

Semantic Annotation ◽

Predictive Performance ◽

Tree Architecture ◽

Support Vector ◽

SVM Classifier ◽

Strong Impact ◽

Classification Problems

Multi-label learning (MLL) problems abound in many areas, including text categorization, protein function classification, and semantic annotation of multimedia. The applicability of many current machine learning approaches to MLL is severely limited by large-scale problems, which have a strong impact on the computational complexity of learning. These problems are especially pronounced for approaches that transform MLL problems into a set of binary classification problems for which Support Vector Machines (SVMs) are used. On the other hand, the most efficient approaches to MLL, based on decision trees, have clearly lower predictive performance. We propose a hybrid decision tree architecture, where the leaves do not give multi-label predictions directly, but rather utilize local SVM-based classifiers giving multi-label predictions. A binary relevance architecture is employed in the leaves, where a binary SVM classifier is built for each of the labels relevant to that particular leaf. We use a broad range of multi-label datasets with a variety of evaluation measures to evaluate the proposed method against related and state-of-the-art methods, both in terms of predictive performance and time complexity. On almost every large classification problem, our hybrid architecture outperforms the competing approaches in terms of predictive performance, while its computational efficiency is significantly improved as a result of the integrated decision tree.

Large-scale discovery of protein interactions at residue resolution using co-evolution calculated from genomic sequences

Nature Communications ◽

10.1038/s41467-021-21636-z ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Anna G. Green ◽

Hadeer Elhabashy ◽

Kelly P. Brock ◽

Rohan Maddamsetti ◽

Oliver Kohlbacher ◽

...

Keyword(s):

Protein Interaction ◽

Protein Interactions ◽

Large Scale ◽

De Novo ◽

Protein Structures ◽

Membrane Proteome ◽

E Coli ◽

Interaction Prediction ◽

Pairwise Interactions ◽

High Throughput Experiments

Increasing numbers of protein interactions have been identified in high-throughput experiments, but only a small proportion have solved structures. Recently, sequence coevolution-based approaches have led to a breakthrough in predicting monomer protein structures and protein interaction interfaces. Here, we address the challenges of large-scale interaction prediction at residue resolution with a fast alignment concatenation method and a probabilistic score for the interaction of residues. Importantly, this method (EVcomplex2) is able to assess the likelihood of a protein interaction, as we show here applied to large-scale experimental datasets where the pairwise interactions are unknown. We predict 504 interactions de novo in the E. coli membrane proteome, including 243 that are newly discovered. While EVcomplex2 does not require available structures, coevolving residue pairs can be used to produce structural models of protein interactions, as done here for membrane complexes including the Flagellar Hook-Filament Junction and the Tol/Pal complex.

Fingerprinting CANDO: Increased Accuracy with Structure and Ligand Based Shotgun Drug Repurposing

10.1101/591123 ◽

2019 ◽

Cited By ~ 2

Author(s):

James Schuler ◽

Ram Samudrala

Keyword(s):

Drug Discovery ◽

Decision Tree ◽

Data Fusion ◽

Large Scale ◽

Computational Analysis ◽

Drug Repurposing ◽

Molecular Fingerprints ◽

Novel Drug ◽

Benchmarking Performance ◽

Random Control

We have upgraded our Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun drug repurposing to include ligand-based, data fusion, and decision tree pipelines. The first version of CANDO implemented a structure-based pipeline that modeled interactions between compounds and proteins on a large scale, generating compound-proteome interaction signatures used to infer similarity of drug behavior; the new pipelines accomplish this by incorporating molecular fingerprints and the Tanimoto coefficient. We obtain improved benchmarking performance with the new pipelines across all three evaluation metrics used: average indication accuracy, pairwise accuracy, and coverage. The best performing pipeline achieves an average indication accuracy of 19.0% at the top10 cutoff, compared to 11.7% for v1, and 2.2% for a random control. Our results demonstrate that the CANDO drug recovery accuracy is substantially improved by integrating multiple pipelines, thereby enhancing our ability to generate putative therapeutic repurposing candidates, and increasing drug discovery efficiency.
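
The Tanimoto coefficient mentioned above measures the overlap between two molecular fingerprints as the number of shared "on" bits divided by the number of bits set in either fingerprint. The following is a minimal Python sketch that represents each fingerprint as a set of bit positions; the function name `tanimoto` and the example bit positions are hypothetical, and real fingerprints would come from a cheminformatics toolkit rather than being written by hand.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints share nothing measurable; report 0.0 by convention.
        return 0.0
    return len(a & b) / union


# Hypothetical fingerprints: positions of the bits that are set.
compound_1 = {1, 2, 3}
compound_2 = {2, 3, 4}

similarity = tanimoto(compound_1, compound_2)
```

The two example fingerprints share two bits out of four distinct bits, so `similarity` is 0.5. Identical fingerprints give 1.0 and disjoint ones give 0.0.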

LARGE-SCALE STRUCTURAL MODELING OF PROTEIN COMPLEXES AT LOW RESOLUTION

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720008003679 ◽

2008 ◽

Vol 06 (04) ◽

pp. 789-810 ◽

Cited By ~ 5

Author(s):

ZHENGWEI ZHU ◽

ANDREY TOVCHIGRECHKO ◽

TATIANA BARONOVA ◽

YING GAO ◽

DOMINIQUE DOUGUET ◽

...

Keyword(s):

Protein Interactions ◽

Large Scale ◽

Protein Complexes ◽

Protein Structures ◽

Protein Docking ◽

Protein Interaction Data ◽

Essential Information ◽

Genome Wide ◽

Docking Program ◽

Docking Approach

Structural aspects of protein–protein interactions provided by large-scale, genome-wide studies are essential for the description of life processes at the molecular level. A methodology is developed that applies the protein docking approach (GRAMM), based on the knowledge of experimentally determined protein–protein structures (DOCKGROUND resource) and properties of intermolecular energy landscapes, to genome-wide systems of protein interactions. The full sequence-to-structure-of-complex modeling pipeline is implemented in the Genome Wide Docking Database (GWIDD) resource. Protein interaction data are imported to GWIDD from external datasets of experimentally determined interaction networks. Essential information is extracted and unified to form the GWIDD database. Structures of individual interacting proteins in the database are retrieved (if available) or modeled, and protein complex structures are predicted by the docking program. All protein sequence, structure, and docking information is conveniently accessible through a Web interface.

Development of a compact alkynyl-enrichable crosslinker for in-depth in-vivo crosslinking analysis

10.1101/2021.07.30.454285 ◽

2021 ◽

Author(s):

Hang Gao ◽

Li Li Zhao ◽

Qun Zhao ◽

Hua Li Zhang ◽

Feng Bao Zhao ◽

...

Keyword(s):

Protein Interactions ◽

Large Scale ◽

High Efficiency ◽

Protein Complexes ◽

Protein Structures ◽

High Sensitivity ◽

Protein Protein Interactions ◽

Cross Links ◽

First Time

Chemical crosslinking coupled with mass spectrometry (CXMS) has emerged as a powerful technique to capture the dynamic information of protein complexes with high sensitivity, throughput and sample universality. To advance the study of in-vivo protein structures and protein-protein interactions on a large scale, a new alkynyl-enrichable crosslinker was developed with high efficiency of membrane penetration, reactivity and enrichment. The crosslinker was successfully used for in-vivo crosslinking of intact human cells, resulting in 6820 non-redundant crosslinks identified at a false discovery rate (FDR) of 1% using pLink 2.0, of which 4898 (71.8%) were assigned as intraprotein and 1922 (28.2%) as interprotein links. To our knowledge, this is also the first time that in-vivo crosslinking has been achieved with a non-cleavable cross-linker for homo species cells.
