Multiscale virtual screening optimization for shotgun drug repurposing using the CANDO platform

Mapping Intimacies ◽

10.1101/2020.08.24.265488 ◽

2020 ◽

Author(s):

Matthew L. Hudson ◽

Ram Samudrala

Keyword(s):

Virtual Screening ◽

Decision Tree ◽

Protein Interactions ◽

Large Scale ◽

Therapeutic Potential ◽

Protein Structures ◽

Drug Repurposing ◽

Clinical Indication ◽

Docking Method ◽

Useful Signal

Drug repurposing, the practice of utilizing existing drugs for novel clinical indications, has tremendous potential for improving human health outcomes and increasing therapeutic development efficiency. The goal of multidisease multitarget drug repurposing, also known as shotgun drug repurposing, is to develop platforms that assess the therapeutic potential of each existing drug for every clinical indication. Our Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multitarget repurposing implements several pipelines via large-scale modelling and simulation of interactions between comprehensive libraries of drugs/compounds and protein structures. In these pipelines, each drug is described by an interaction signature that is compared to all other signatures, which are then sorted and ranked by similarity. Pipelines within the platform are benchmarked on their ability to recover known drugs for all indications in our library, and predictions are generated based on the hypothesis that (novel) drugs with similar signatures may be repurposed for the same indication(s). The drug-protein interactions used to create the drug-proteome signatures may be determined by any screening or docking method, but the primary approach used thus far has been an in-house similarity docking protocol. In this study, we calculated drug-proteome interaction signatures using the publicly available molecular docking method AutoDock Vina and created hybrid decision tree pipelines that combined these signatures with our original bio- and cheminformatic approach, with the goal of assessing and benchmarking their drug repurposing capabilities and performance.
The hybrid decision tree pipeline outperformed the two docking-based pipelines from which it was synthesized, yielding an average indication accuracy of 13.3% at the top10 cutoff (the most stringent), relative to 10.9% and 7.1% for its constituent pipelines and 2.2% for a random control. We demonstrate that docking-based virtual screening pipelines have unique performance characteristics and that the CANDO shotgun repurposing paradigm is not dependent on a specific docking method. Our results also provide further evidence that multiple CANDO pipelines can be synthesized to enhance drug repurposing predictive capability relative to their constituent pipelines. Overall, this study indicates that pipelines consisting of varied docking-based signature generation methods can capture unique and useful signals for accurate comparison of drug-proteome interaction signatures, leading to improvements in the benchmarking and predictive performance of the CANDO shotgun drug repurposing platform.
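The benchmark described above (rank every other drug by signature similarity, then check whether drugs sharing an indication are recovered near the top) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CANDO implementation: each signature is assumed to be an equal-length vector of drug-protein interaction scores, similarity is assumed to be Euclidean distance (which orders neighbors the same way as root-mean-square deviation for equal-length vectors), and a drug is counted as recovered for an indication if at least one other drug with that indication appears among its top-k neighbors. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Distance between two equal-length interaction signatures (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ranked_neighbors(signatures, drug):
    """All other drugs, sorted from most to least similar to `drug`."""
    others = [d for d in signatures if d != drug]
    return sorted(others, key=lambda d: euclidean(signatures[drug], signatures[d]))


def indication_accuracy(signatures, indications, k=10):
    """Average over indications of the fraction of associated drugs for which at
    least one other drug with the same indication is among the top-k neighbors."""
    per_indication = []
    for drugs in indications.values():
        drugs = [d for d in drugs if d in signatures]
        if len(drugs) < 2:
            continue  # nothing can be recovered with fewer than two associated drugs
        hits = 0
        for d in drugs:
            top_k = ranked_neighbors(signatures, d)[:k]
            if any(other in top_k for other in drugs if other != d):
                hits += 1
        per_indication.append(hits / len(drugs))
    return sum(per_indication) / len(per_indication) if per_indication else 0.0
```

For instance, with four toy signatures where drugs A/B and C/D form two tight clusters and each cluster shares an indication, `indication_accuracy(..., k=1)` returns 1.0; any pipeline (docking-based or hybrid) that produces signatures could be scored the same way and compared against a random-ranking control.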


Multiscale Virtual Screening Optimization for Shotgun Drug Repurposing Using the CANDO Platform

10.20944/preprints202104.0475.v1 ◽

2021 ◽

Author(s):

Matthew L. Hudson ◽

Ram Samudrala

Keyword(s):

Virtual Screening ◽

Decision Tree ◽

Protein Interactions ◽

Large Scale ◽

Therapeutic Potential ◽

Protein Structures ◽

Drug Repurposing ◽

Clinical Indication ◽

Docking Method ◽

Useful Signal

Drug repurposing, the practice of utilizing existing drugs for novel clinical indications, has tremendous potential for improving human health outcomes and increasing therapeutic development efficiency. The goal of multidisease multitarget drug repurposing, also known as shotgun drug repurposing, is to develop platforms that assess the therapeutic potential of each existing drug for every clinical indication. Our Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multitarget repurposing implements several pipelines via large-scale modelling and simulation of interactions between comprehensive libraries of drugs/compounds and protein structures. In these pipelines, each drug is described by an interaction signature that is compared to all other signatures, which are then sorted and ranked by similarity. Pipelines within the platform are benchmarked on their ability to recover known drugs for all indications in our library, and predictions are generated based on the hypothesis that (novel) drugs with similar signatures may be repurposed for the same indication(s). The drug-protein interactions used to create the drug-proteome signatures may be determined by any screening or docking method, but the primary approach used thus far has been an in-house similarity docking protocol. In this study, we calculated drug-proteome interaction signatures using the publicly available molecular docking method AutoDock Vina and created hybrid decision tree pipelines that combined these signatures with our original bio- and cheminformatic approach, with the goal of assessing and benchmarking their drug repurposing capabilities and performance. The hybrid decision tree pipeline outperformed the two docking-based pipelines from which it was synthesized, yielding an average indication accuracy of 13.3% at the top10 cutoff (the most stringent), relative to 10.9% and 7.1% for its constituent pipelines and 2.2% for a random control.
We demonstrate that docking-based virtual screening pipelines have unique performance characteristics and that the CANDO shotgun repurposing paradigm is not dependent on a specific docking method. Our results also provide further evidence that multiple CANDO pipelines can be synthesized to enhance drug repurposing predictive capability relative to their constituent pipelines. Overall, this study indicates that pipelines consisting of varied docking-based signature generation methods can capture unique and useful signals for accurate comparison of drug-proteome interaction signatures, leading to improvements in the benchmarking and predictive performance of the CANDO shotgun drug repurposing platform.


Multiscale Virtual Screening Optimization for Shotgun Drug Repurposing Using the CANDO Platform

Molecules ◽

10.3390/molecules26092581 ◽

2021 ◽

Vol 26 (9) ◽

pp. 2581

Author(s):

Matthew L. Hudson ◽

Ram Samudrala

Keyword(s):

Virtual Screening ◽

Decision Tree ◽

Protein Interactions ◽

Large Scale ◽

Therapeutic Potential ◽

Protein Structures ◽

Drug Repurposing ◽

Predictive Performance ◽

Clinical Indication ◽

Docking Method

Drug repurposing, the practice of utilizing existing drugs for novel clinical indications, has tremendous potential for improving human health outcomes and increasing therapeutic development efficiency. The goal of multi-disease multitarget drug repurposing, also known as shotgun drug repurposing, is to develop platforms that assess the therapeutic potential of each existing drug for every clinical indication. Our Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multitarget repurposing implements several pipelines for the large-scale modeling and simulation of interactions between comprehensive libraries of drugs/compounds and protein structures. In these pipelines, each drug is described by an interaction signature that is compared to all other signatures, which are subsequently sorted and ranked based on similarity. Pipelines within the platform are benchmarked based on their ability to recover known drugs for all indications in our library, and predictions are generated based on the hypothesis that (novel) drugs with similar signatures may be repurposed for the same indication(s). The drug-protein interactions used to create the drug-proteome signatures may be determined by any screening or docking method, but the primary approach used thus far has been BANDOCK, our in-house bioanalytical or similarity docking protocol. In this study, we calculated drug-proteome interaction signatures using the publicly available molecular docking method AutoDock Vina and created hybrid decision tree pipelines that combined these signatures with our original bio- and chem-informatic approach, with the goal of assessing and benchmarking their drug repurposing capabilities and performance.
The hybrid decision tree pipeline outperformed the two docking-based pipelines from which it was synthesized, yielding an average indication accuracy of 13.3% at the top10 cutoff (the most stringent), relative to 10.9% and 7.1% for its constituent pipelines, and a random control accuracy of 2.2%. We demonstrate that docking-based virtual screening pipelines have unique performance characteristics and that the CANDO shotgun repurposing paradigm is not dependent on a specific docking method. Our results also provide further evidence that multiple CANDO pipelines can be synthesized to enhance drug repurposing predictive capability relative to their constituent pipelines. Overall, this study indicates that pipelines consisting of varied docking-based signature generation methods can capture unique and useful signals for accurate comparison of drug-proteome interaction signatures, leading to improvements in the benchmarking and predictive performance of the CANDO shotgun drug repurposing platform.


FrustratometeR: an R-package to compute Local frustration in protein structures, point mutants and MD simulations

10.1101/2020.11.26.400432 ◽

2020 ◽

Author(s):

Atilio O. Rausch ◽

Maria I. Freiberger ◽

Cesar O. Leonetti ◽

Diego M. Luna ◽

Leandro G. Radusky ◽

...

Keyword(s):

Protein Interactions ◽

Large Scale ◽

Protein Structures ◽

MD Simulations ◽

R Package ◽

Protein Protein Interactions ◽

Large Scale Analysis ◽

Functional Aspects ◽

Catalytic Sites ◽

Polypeptide Chains

Once folded, natural protein molecules have few energetic conflicts within their polypeptide chains. Many protein structures do, however, contain regions where energetic conflicts remain after folding, i.e., highly frustrated regions. These regions, kept in place over evolutionary and physiological timescales, are related to several functional aspects of natural proteins, such as protein-protein interactions, small-ligand recognition, catalytic sites and allostery. Here we present FrustratometeR, an R package that easily computes local energetic frustration on a personal computer or a cluster. This package facilitates large-scale analysis of local frustration, point mutants and MD trajectories, allowing straightforward integration of local frustration analysis into pipelines for protein structural analysis.
Availability and implementation: https://github.com/proteinphysiologylab/frustratometeR
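To make "local energetic frustration" concrete: in the frustratometer literature this package builds on, a contact's frustration index is commonly defined as a Z-score that compares the native contact energy with the energies of decoys in which the contacting residues are perturbed. The sketch below assumes that formulation, F = (mean(E_decoy) − E_native) / sd(E_decoy) with lower energies being more favorable, and the commonly cited cutoffs of F ≥ 0.78 (minimally frustrated) and F ≤ −1 (highly frustrated). It is a Python illustration of the idea, not FrustratometeR's R interface; the function names and energy values are hypothetical, and in the real tool the decoy energies come from structure-based calculations rather than being supplied by hand.

```python
from statistics import mean, pstdev

# Cutoffs commonly used in the frustratometer literature (assumed here).
MINIMAL_THRESHOLD = 0.78
HIGH_THRESHOLD = -1.0


def frustration_index(native_energy, decoy_energies):
    """Z-score of a native contact energy against a decoy ensemble.
    Positive values mean the native contact is more favorable than typical decoys."""
    mu = mean(decoy_energies)
    sd = pstdev(decoy_energies)  # population standard deviation (1/N form)
    if sd == 0:
        raise ValueError("decoy energies must not all be identical")
    return (mu - native_energy) / sd


def classify(index):
    """Map a frustration index onto the three usual categories."""
    if index >= MINIMAL_THRESHOLD:
        return "minimally frustrated"
    if index <= HIGH_THRESHOLD:
        return "highly frustrated"
    return "neutral"
```

With decoy energies of −1, 0 and 1, a native energy of −2 is classified as minimally frustrated, while a native energy of 1.5 (worse than most decoys) is highly frustrated.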


Human mitochondrial protein complexes revealed by large-scale coevolution analysis and deep learning-based structure modeling

10.1101/2021.09.14.460228 ◽

2021 ◽

Author(s):

Jimin Pei ◽

Jing Zhang ◽

Qian Cong

Keyword(s):

Deep Learning ◽

Protein Interactions ◽

Large Scale ◽

Protein Complexes ◽

Mitochondrial Protein ◽

Protein Structures ◽

Complex Structures ◽

Protein Protein Interactions ◽

Learning Methods ◽

Contact Probability

Recent development of deep-learning methods has led to a breakthrough in the prediction accuracy of 3-dimensional protein structures. Extending these methods to protein pairs is expected to allow large-scale detection of protein-protein interactions and modeling of protein complexes at the proteome level. We applied RoseTTAFold and AlphaFold2, two of the latest deep-learning methods for structure prediction, to analyze coevolution of human proteins residing in mitochondria, an organelle of vital importance in many cellular processes including energy production, metabolism, cell death, and antiviral response. Variations in mitochondrial proteins have been linked to a plethora of human diseases and genetic conditions. Owing to its high computational speed, RoseTTAFold was used to predict coevolution for about 95% of mitochondrial protein pairs. Top-ranked pairs were further subjected to modeling of their complex structures by AlphaFold2, which also produced contact probabilities with high precision, in many cases consistent with RoseTTAFold. Most of the top-ranked pairs with high contact probability were supported by known protein-protein interactions and/or similarities to experimentally determined structural complexes. For high-scoring pairs without experimental complex structures, our coevolution analyses and structural models shed light on the details of their interfaces, including CHCHD4-AIFM1, MTERF3-TRUB2, FMC1-ATPAF2, ECSIT-NDUFAF1 and COQ7-COQ9, among others. We also identified novel PPIs (PYURF-NDUFAF5, LYRM1-MTRF1L and COA8-COX10) for several proteins without experimentally characterized interaction partners, leading to predictions of their molecular functions and the biological processes in which they are involved.
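The two-stage strategy described above (a fast coevolution screen over nearly all pairs, followed by slower complex modeling of only the top-ranked pairs) can be sketched as a generic triage function. This is an illustration of the screening pattern only: the two scoring callables, the shortlist size `top_n` and the `min_contact_prob` cutoff are hypothetical placeholders, not RoseTTAFold or AlphaFold2 interfaces or the thresholds used in the study.

```python
from itertools import combinations
from typing import Callable, Iterable


def triage_pairs(
    proteins: Iterable[str],
    fast_score: Callable[[str, str], float],
    slow_contact_prob: Callable[[str, str], float],
    top_n: int = 100,
    min_contact_prob: float = 0.9,
):
    """Score all pairs cheaply, then run the costly model only on the best top_n."""
    pairs = list(combinations(sorted(set(proteins)), 2))
    # Stage 1: cheap score for every pair, highest first.
    ranked = sorted(pairs, key=lambda p: fast_score(*p), reverse=True)
    shortlist = ranked[:top_n]
    # Stage 2: expensive model only on the shortlist; keep confident predictions.
    results = []
    for a, b in shortlist:
        prob = slow_contact_prob(a, b)
        if prob >= min_contact_prob:
            results.append((a, b, prob))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

The design choice is simply cost control: the expensive model is called `top_n` times rather than once per pair, at the price of missing any true interaction that the fast screen ranks too low.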


Large-scale discovery of protein interactions at residue resolution using co-evolution calculated from genomic sequences

Nature Communications ◽

10.1038/s41467-021-21636-z ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Anna G. Green ◽

Hadeer Elhabashy ◽

Kelly P. Brock ◽

Rohan Maddamsetti ◽

Oliver Kohlbacher ◽

...

Keyword(s):

Protein Interaction ◽

Protein Interactions ◽

Large Scale ◽

De Novo ◽

Protein Structures ◽

Membrane Proteome ◽

E. coli ◽

Interaction Prediction ◽

Pairwise Interactions ◽

High Throughput Experiments

Increasing numbers of protein interactions have been identified in high-throughput experiments, but only a small proportion have solved structures. Recently, sequence coevolution-based approaches have led to a breakthrough in predicting monomer protein structures and protein interaction interfaces. Here, we address the challenges of large-scale interaction prediction at residue resolution with a fast alignment concatenation method and a probabilistic score for the interaction of residues. Importantly, this method (EVcomplex2) is able to assess the likelihood of a protein interaction, as we show here by applying it to large-scale experimental datasets in which the pairwise interactions are unknown. We predict 504 interactions de novo in the E. coli membrane proteome, including 243 that are newly discovered. While EVcomplex2 does not require available structures, coevolving residue pairs can be used to produce structural models of protein interactions, as done here for membrane complexes including the Flagellar Hook-Filament Junction and the Tol/Pal complex.


Fingerprinting CANDO: Increased Accuracy with Structure and Ligand Based Shotgun Drug Repurposing

10.1101/591123 ◽

2019 ◽

Cited By ~ 2

Author(s):

James Schuler ◽

Ram Samudrala

Keyword(s):

Drug Discovery ◽

Decision Tree ◽

Data Fusion ◽

Large Scale ◽

Computational Analysis ◽

Drug Repurposing ◽

Molecular Fingerprints ◽

Novel Drug ◽

Benchmarking Performance ◽

Random Control

We have upgraded our Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun drug repurposing to include ligand-based, data fusion, and decision tree pipelines. The first version of CANDO implemented a structure-based pipeline that modeled interactions between compounds and proteins on a large scale, generating compound-proteome interaction signatures used to infer similarity of drug behavior; the new pipelines accomplish this by incorporating molecular fingerprints and the Tanimoto coefficient. We obtain improved benchmarking performance with the new pipelines across all three evaluation metrics used: average indication accuracy, pairwise accuracy, and coverage. The best performing pipeline achieves an average indication accuracy of 19.0% at the top10 cutoff, compared to 11.7% for v1, and 2.2% for a random control. Our results demonstrate that the CANDO drug recovery accuracy is substantially improved by integrating multiple pipelines, thereby enhancing our ability to generate putative therapeutic repurposing candidates, and increasing drug discovery efficiency.
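The Tanimoto coefficient mentioned above compares two binary molecular fingerprints as the number of bits set in both divided by the number of bits set in either. A minimal sketch, assuming each fingerprint is represented as a Python set of "on" bit positions (an assumption for illustration; real pipelines typically generate fingerprints with a cheminformatics toolkit); the function names are hypothetical:

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """|A ∩ B| / |A ∪ B| for two fingerprints given as sets of on-bit indices."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(union)


def rank_by_similarity(query: set[int], library: dict[str, set[int]]) -> list[str]:
    """Library compound names sorted from most to least similar to the query."""
    return sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
```

For example, fingerprints with on-bits {1, 2, 3} and {2, 3, 4} share two of four distinct bits, giving a coefficient of 0.5. Ranking a library this way plays the same role as ranking interaction signatures by similarity in the structure-based pipeline.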


Large-Scale Structural Modeling of Protein Complexes at Low Resolution

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720008003679 ◽

2008 ◽

Vol 06 (04) ◽

pp. 789-810 ◽

Cited By ~ 5

Author(s):

Zhengwei Zhu ◽

Andrey Tovchigrechko ◽

Tatiana Baronova ◽

Ying Gao ◽

Dominique Douguet ◽

...

Keyword(s):

Protein Interactions ◽

Large Scale ◽

Protein Complexes ◽

Protein Structures ◽

Protein Docking ◽

Protein Interaction Data ◽

Essential Information ◽

Genome Wide ◽

Docking Program ◽

Docking Approach

Structural aspects of protein–protein interactions provided by large-scale, genome-wide studies are essential for the description of life processes at the molecular level. A methodology is developed that applies the protein docking approach (GRAMM), based on the knowledge of experimentally determined protein–protein structures (DOCKGROUND resource) and properties of intermolecular energy landscapes, to genome-wide systems of protein interactions. The full sequence-to-structure-of-complex modeling pipeline is implemented in the Genome Wide Docking Database (GWIDD) resource. Protein interaction data are imported to GWIDD from external datasets of experimentally determined interaction networks. Essential information is extracted and unified to form the GWIDD database. Structures of individual interacting proteins in the database are retrieved (if available) or modeled, and protein complex structures are predicted by the docking program. All protein sequence, structure, and docking information is conveniently accessible through a Web interface.


Development of a compact alkynyl-enrichable crosslinker for in-depth in-vivo crosslinking analysis

10.1101/2021.07.30.454285 ◽

2021 ◽

Author(s):

Hang Gao ◽

Li Li Zhao ◽

Qun Zhao ◽

Hua Li Zhang ◽

Feng Bao Zhao ◽

...

Keyword(s):

Protein Interactions ◽

Large Scale ◽

High Efficiency ◽

Protein Complexes ◽

Protein Structures ◽

High Sensitivity ◽

Protein Protein Interactions ◽

Cross Links ◽

First Time

Chemical crosslinking coupled with mass spectrometry (CXMS) has emerged as a powerful technique for capturing dynamic information on protein complexes with high sensitivity, throughput and sample universality. To advance the large-scale study of in-vivo protein structures and protein-protein interactions, a new alkynyl-enrichable crosslinker was developed with high efficiency of membrane penetration, reactivity and enrichment. The crosslinker was successfully used for in-vivo crosslinking of intact human cells, resulting in 6820 non-redundant crosslinks identified at a false discovery rate (FDR) of 1% using pLink 2.0, of which 4898 (71.8%) were assigned as intraprotein and 1922 (28.2%) as interprotein links. To our knowledge, this is also the first time that in-vivo crosslinking with a non-cleavable crosslinker has been achieved in human cells.
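A 1% FDR cutoff like the one above is typically obtained with a target-decoy strategy: spectra are searched against real (target) and deliberately false (decoy) sequences, and the score threshold is relaxed only as far as the estimated FDR stays within the limit. The sketch below uses the simplest estimate, decoys divided by targets above the threshold, and then tallies the intraprotein/interprotein split. It illustrates the general idea only: pLink 2.0 applies its own crosslink-specific FDR estimate, score ties are not treated specially here, and the function names and tuple layout are hypothetical.

```python
def filter_at_fdr(hits, max_fdr=0.01):
    """hits: list of (score, is_decoy, (protein_a, protein_b)); higher score = better.
    Accepts the longest score-ranked prefix whose estimated FDR
    (decoys / targets within the prefix) is <= max_fdr and returns its target hits."""
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    targets = decoys = 0
    cutoff = 0  # length of the longest acceptable prefix seen so far
    for i, (_, is_decoy, _) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    return [h for h in ranked[:cutoff] if not h[1]]


def split_intra_inter(accepted):
    """Count crosslinks within one protein vs. between two different proteins."""
    intra = sum(1 for _, _, (a, b) in accepted if a == b)
    return intra, len(accepted) - intra
```

Dividing the two counts by the total gives percentages of the kind reported above (4898 of 6820 is 71.8%).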


The complexity of protein interactions unravelled from structural disorder

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008546 ◽

2021 ◽

Vol 17 (1) ◽

pp. e1008546

Author(s):

Beatriz Seoane ◽

Alessandra Carbone

Keyword(s):

Complex Formation ◽

Protein Interactions ◽

Large Scale ◽

Protein Structures ◽

Data Bank ◽

Structural Disorder ◽

DNA Binding Domains ◽

Alternative Structures ◽

Binding Domains ◽

Disordered Regions

The importance of unstructured biology has grown quickly over the last decades, accompanying the explosion in the number of experimentally resolved protein structures. The idea that structural disorder might be a novel mechanism of protein interaction is widespread in the literature, although the number of statistically significant structural studies supporting this idea is surprisingly low. Unlike previous works, our conclusions rely exclusively on a large-scale analysis of all 134337 X-ray crystallographic structures of the Protein Data Bank, averaged over clusters of almost identical protein sequences. In this work, we explore the complexity of the organisation of all the interaction interfaces observed when a protein lies in alternative complexes, showing that interfaces progressively add up in a hierarchical way, which is reflected in a logarithmic law relating the size of the union of the interface regions to the number of distinct interfaces. We further investigate the connection of this complexity with different measures of structural disorder: the standard missing residues and a new definition, called “soft disorder”, that covers all the flexible and structurally amorphous residues of a protein. We show evidence that both the interaction interfaces and the soft disordered regions tend to involve roughly the same amino acids of the protein, and present preliminary results suggesting that soft disorder marks those surface regions where new interfaces are progressively accommodated by complex formation. In fact, our results suggest that structurally disordered regions carry crucial information not only about the location of alternative interfaces within complexes, but also about the order of assembly. We verify these hypotheses in several examples, such as the DNA binding domains of p53 and p73, the C3 exoenzyme, and two known biological orders of assembly.
We finally compare our measures of structural disorder with several bioinformatics disorder predictors, showing that the latter are optimised to predict the residues that are missing in all the alternative structures of a protein and are not able to capture the progressive evolution of the disordered regions upon complex formation. Yet the predicted residues, when not missing, tend to be characterised as soft disordered regions.
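The logarithmic law mentioned above relates the number n of distinct interfaces observed for a protein to the size S of the union of their residues, roughly S(n) ≈ a + b ln n. A minimal sketch of how one might compute the cumulative union size as interfaces are added and fit a and b by ordinary least squares on ln n; the fitting procedure is an assumption for illustration rather than the authors' method, and the function names are hypothetical.

```python
import math


def cumulative_union_sizes(interfaces):
    """interfaces: list of sets of residue indices, in the order they are added.
    Returns [|I1|, |I1 ∪ I2|, |I1 ∪ I2 ∪ I3|, ...]."""
    seen = set()
    sizes = []
    for residues in interfaces:
        seen |= residues
        sizes.append(len(seen))
    return sizes


def fit_log_law(sizes):
    """Least-squares fit of sizes[n - 1] ≈ a + b * ln(n); returns (a, b)."""
    xs = [math.log(n) for n in range(1, len(sizes) + 1)]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(sizes) / len(sizes)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two interfaces to fit")
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, sizes)) / sxx
    a = y_bar - b * x_bar
    return a, b
```

A growing b with a slowly flattening curve is what "interfaces progressively add up in a hierarchical way" looks like numerically: each new interface reuses more of the residues already seen and contributes fewer new ones.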


Target-Templated de novo Design of Macrocyclic D-/L-Peptides: Inhibitors of the PD-1/PD-L1 Interaction

10.26434/chemrxiv.11663337.v3 ◽

2020 ◽

Author(s):

Salvador Guardiola ◽

Monica Varese ◽

Xavier Roig ◽

Jesús Garcia ◽

Ernest Giralt

Keyword(s):

Protein Interactions ◽

Cyclic Peptides ◽

General Framework ◽

Large Scale ◽

De Novo ◽

Inhibitory Effect ◽

Original Text ◽

Protein Protein Interactions ◽

Retraction Notice ◽

Pharmaceutical Properties

NOTE: This preprint has been retracted by consensus from all authors; the original text is available as Version 1 of the preprint.
Peptides, together with antibodies, are among the most potent biochemical tools to modulate challenging protein-protein interactions. However, current structure-based methods are largely limited to natural peptides and are not suitable for designing target-specific binders with improved pharmaceutical properties, such as macrocyclic peptides. Here we report a general framework that leverages the computational power of Rosetta for large-scale backbone sampling and energy scoring, followed by side-chain composition, to design heterochiral cyclic peptides that bind to a protein surface of interest. To showcase the applicability of our approach, we identified two peptides (PD-i3 and PD-i6) that target PD-1, a key immune checkpoint, and work as protein ligand decoys. A comprehensive biophysical evaluation confirmed their binding mechanism to PD-1 and their inhibitory effect on the PD-1/PD-L1 interaction. Finally, elucidation of their solution structures by NMR served as validation of our de novo design approach. We anticipate that our results will provide a general framework for designing target-specific drug-like peptides.
