A machine learning-based chemoproteomic approach to identify drug targets and binding sites in complex proteomes

2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Ilaria Piazza ◽  
Nigel Beaton ◽  
Roland Bruderer ◽  
Thomas Knobloch ◽  
Crystel Barbisan ◽  
...  
2020 ◽  
Vol 27 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Camila Rizzotto ◽  
Walter Filgueira de Azevedo Junior

Background: Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models to evaluate binding affinity and thermodynamic state functions. Application of machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance when compared with classical scoring functions available in docking programs. Objective: Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. Method: SAnDReS implements machine learning methods available in the scikit-learn library. This program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding, and thermodynamic data to create targeted scoring functions. Results: Recent applications of the program SAnDReS to drug targets such as coagulation factor Xa, cyclin-dependent kinases, and HIV-1 protease were able to create targeted scoring functions to predict inhibition of these proteins. These targeted models outperform classical scoring functions. Conclusion: Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of the SAnDReS-developed models when compared with classical scoring functions available in programs such as AutoDock4, Molegro Virtual Docker, and AutoDock Vina.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Ryan Feehan ◽  
Meghan W. Franklin ◽  
Joanna S. G. Slusky

Abstract: Metalloenzymes make up 40% of all enzymes and can perform all seven classes of enzyme reactions. Because of the physicochemical similarities between the active sites of metalloenzymes and inactive metal binding sites, it is challenging to differentiate between them. Yet distinguishing these two classes is critical for the identification of both native and designed enzymes. Because of similarities between catalytic and non-catalytic metal binding sites, finding physicochemical features that distinguish these two types of metal sites can indicate aspects that are critical to enzyme function. In this work, we develop the largest structural dataset of enzymatic and non-enzymatic metalloprotein sites to date. We then use a decision-tree ensemble machine learning model to classify metals bound to proteins as enzymatic or non-enzymatic with 92.2% precision and 90.1% recall. Our model scores electrostatic and pocket lining features as more important than pocket volume, despite the fact that volume is the most quantitatively different feature between enzymatic and non-enzymatic sites. Finally, we find our model has overall better performance in a side-to-side comparison against other methods that differentiate enzymatic from non-enzymatic sequences. We anticipate that our model’s ability to correctly identify which metal sites are responsible for enzymatic activity could enable identification of new enzymatic mechanisms and de novo enzyme design.


2021 ◽  
Author(s):  
Nigel Beaton ◽  
Yuehan Feng ◽  
Roland Bruderer ◽  
Adam Hendricks ◽  
Ghaith Hamza ◽  
...  

2020 ◽  
Author(s):  
Ben Geoffrey A S ◽  
Pavan Preetham Valluri ◽  
Akhil Sanker ◽  
Rafal Madaj ◽  
Host Antony Davidd ◽  
...  

Network data is composed of nodes and edges. Machine learning and deep learning algorithms have been applied successfully to network data for node classification and link prediction in social networks, where they deliver highly customized suggestions to users. Similarly, one can apply machine learning/deep learning algorithms to biological network data to generate scientifically useful predictions. In the present work, a compound-drug target interaction data set from BindingDB is used to train machine learning/deep learning algorithms that predict the drug targets for any PubChem compound queried by the user. The user inputs the PubChem Compound ID (CID) of the compound of interest, and the tool outputs the RCSB PDB IDs of the predicted drug targets. The tool also incorporates a feature to perform automated in silico modelling of the compound and the predicted drug targets to uncover their protein-ligand interaction profiles. The program fetches the structures of the compound and the predicted drug targets, prepares them for molecular docking using the standard AutoDock scripts that are part of MGLTools, performs molecular docking and protein-ligand interaction profiling, and stores the visualized results in the user's working folder. The program is hosted, supported, and maintained at the following GitHub repository: https://github.com/bengeof/Compound2Drug


2015 ◽  
Vol 11 (12) ◽  
pp. 3362-3377 ◽  
Author(s):  
Vinay Randhawa ◽  
Anil Kumar Singh ◽  
Vishal Acharya

Network-based and cheminformatics approaches identify novel lead molecules for CXCR4, a key gene prioritized in oral cancer.


2018 ◽  
Author(s):  
Eric J. Lachacz ◽  
Zhi Fen Wu ◽  
John L. Bixby ◽  
Vance P. Lemmon ◽  
Sofia D. Merajver ◽  
...  

Molecules ◽  
2020 ◽  
Vol 25 (22) ◽  
pp. 5277
Author(s):  
Lauv Patel ◽  
Tripti Shukla ◽  
Xiuzhen Huang ◽  
David W. Ussery ◽  
Shanzhi Wang

Advances in information technology and related processing techniques have created a fertile base for progress in many scientific fields and industries. In the fields of drug discovery and development, machine learning techniques have been used for the development of novel drug candidates. The methods for designing drug targets and novel drug discovery now routinely combine machine learning and deep learning algorithms to enhance the efficiency, efficacy, and quality of developed outputs. The generation and incorporation of big data, through technologies such as high-throughput screening and high-throughput computational analysis of databases used for both lead and target discovery, has increased the reliability of techniques that incorporate machine learning and deep learning. The use of virtual screening and of the online information it draws on has also been highlighted in developing lead synthesis pathways. In this review, machine learning and deep learning algorithms utilized in drug discovery and associated techniques will be discussed. Applications and methods that produce promising results will be reviewed.


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. 5043-5043
Author(s):  
Andrew W Hahn ◽  
Edwin Lin ◽  
John Esther ◽  
Neysi Anderson ◽  
Nityam Rathi ◽  
...  

5043 Background: mCRPC carries a poor prognosis, and targeted therapies have had minimal success in mCRPC. Novel genomic targets could improve drug development. To date, large ctDNA studies in metastatic prostate cancer have been descriptive with limited or no clinical annotation. Herein, we hypothesize that profiles of genomic alterations (GAs) in ctDNA not only differ significantly between mCRPC and mHSPC, but can also be used to predict mCRPC vs. mHSPC. These findings could help identify new drug targets for mCRPC treatment. Methods: Men with mHSPC or mCRPC who underwent NGS of ctDNA using G360 (Guardant Health Inc.) at the Huntsman Cancer Institute were included. Men were classified as mCRPC or mHSPC (patients with current or no prior ADT). G360 detects somatic mutations in selected exons of 73 genes, amplifications in 18 genes, and selected fusions in 6 genes. A two-sided Student's t-test was used to compare the %cfDNA and total GAs. The chi-squared test was used to compare the frequency of each GA. Machine learning (ML) algorithms were trained on GAs and benchmarked by cross-validated performance. GAs contributing to mCRPC vs. mHSPC classification were measured by ML feature importance (e.g. odds ratios, regression coefficients). Results: Of the 259 men included, 119 men had mHSPC and 140 had mCRPC. Men with mCRPC had more GAs (4.5 vs. 1.86, p<0.0001) and higher %cfDNA (9.56% vs. 5.02%, p=0.02). In mHSPC, there was no significant difference in the number of GAs or %cfDNA between men on ADT and those who had not yet started ADT. ML algorithms used GAs to predict mCRPC with 78.1% sensitivity, 64.0% specificity, 76.7% PPV, 65.1% NPV, and 70.3% overall accuracy. mCRPC was enriched with GAs in AR, ARID1A, BRAF, BRCA2, CCNE1, CTNNB1, EGFR, FGFR1, KIT, MET, MYC, PDGFRB, PIK3CA, and TP53. Of note, many of these genes are involved in MAP/ERK signaling. Conclusions: Men with mCRPC have more GAs, higher %cfDNA, and enrichment of GAs in the MAP/ERK pathway compared to men with mHSPC. The distinct GAs seen in mCRPC represent novel therapeutic targets, especially in the MAP/ERK pathway. We also show that machine learning can differentiate mHSPC and mCRPC based on GAs detected in ctDNA.
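
The abstract above reports sensitivity, specificity, PPV, NPV, and overall accuracy for the classifier. As a minimal sketch of how these five figures relate to one another, the Python below derives them from the four cells of a binary confusion matrix. The names `ConfusionCounts` and `diagnostic_metrics` are hypothetical, and the counts are illustrative only: they are not the study's data, whose figures come from cross-validation and cannot be reconstructed from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (positive class = mCRPC here)."""
    tp: int  # positives predicted positive
    fp: int  # negatives predicted positive
    tn: int  # negatives predicted negative
    fn: int  # positives predicted negative


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the five summary metrics as fractions in [0, 1].

    Assumes every denominator is non-zero, i.e. there is at least one
    actual positive, one actual negative, one predicted positive, and
    one predicted negative.
    """
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # recall on actual positives
        "specificity": c.tn / (c.tn + c.fp),  # recall on actual negatives
        "ppv": c.tp / (c.tp + c.fp),          # precision of positive calls
        "npv": c.tn / (c.tn + c.fn),          # precision of negative calls
        "accuracy": (c.tp + c.tn) / total,
    }


# Hypothetical counts chosen for easy arithmetic, not taken from the abstract.
counts = ConfusionCounts(tp=80, fp=20, tn=60, fn=40)
metrics = diagnostic_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity depend only on how each true class is classified, whereas PPV and NPV also depend on the class balance of the cohort, so the latter two would shift if the same classifier were applied to a population with a different mCRPC-to-mHSPC ratio.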

