Abstract 21: LiP-MS, a machine learning-based chemoproteomic approach to identify drug targets in complex proteomes

Author(s):  
Nigel Beaton ◽  
Yuehan Feng ◽  
Roland Bruderer ◽  
Adam Hendricks ◽  
Ghaith Hamza ◽  
...  
2020 ◽  
Vol 27 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Camila Rizzotto ◽  
Walter Filgueira de Azevedo Junior

Background: Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models to evaluate binding affinity and thermodynamic state functions. Application of machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance when compared with classical scoring functions available in docking programs. Objective: Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. Method: SAnDReS implements machine learning methods available in the scikit-learn library. This program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding data, and thermodynamic data to create targeted scoring functions. Results: Recent applications of the program SAnDReS to drug targets such as coagulation factor Xa, cyclin-dependent kinases, and HIV-1 protease created targeted scoring functions to predict inhibition of these proteins. These targeted models outperform classical scoring functions. Conclusion: Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of the SAnDReS-developed models when compared with classical scoring functions available in programs such as AutoDock4, Molegro Virtual Docker, and AutoDock Vina.


2020 ◽  
Author(s):  
Ben Geoffrey A S ◽  
Pavan Preetham Valluri ◽  
Akhil Sanker ◽  
Rafal Madaj ◽  
Host Antony Davidd ◽  
...  

<p>Network data is composed of nodes and edges. Machine learning/deep learning algorithms have been applied successfully to network data for node classification and link prediction in the area of social networks, where they are used to offer highly customized suggestions to social network users. Similarly, one can attempt to use machine learning/deep learning algorithms on biological network data to generate scientifically useful predictions. In the present work, a compound-drug target interaction data set from BindingDB has been used to train machine learning/deep learning algorithms, which are used to predict the drug targets for any PubChem compound queried by the user. The user inputs the PubChem Compound ID (CID) of the compound whose predicted biological activity is of interest, and the tool outputs the RCSB PDB IDs of the predicted drug targets. The tool also incorporates a feature to perform automated <i>in silico</i> modelling for the compound and the predicted drug targets to uncover their protein-ligand interaction profiles. The program fetches the structures of the compound and the predicted drug targets, prepares them for molecular docking using standard AutoDock scripts that are part of MGLTools, performs molecular docking and protein-ligand interaction profiling of the targets and the compound, and stores the visualized results in the working folder of the user. The program is hosted, supported and maintained at the following GitHub repository </p> <p><a href="https://github.com/bengeof/Compound2Drug">https://github.com/bengeof/Compound2Drug</a></p>


2015 ◽  
Vol 11 (12) ◽  
pp. 3362-3377 ◽  
Author(s):  
Vinay Randhawa ◽  
Anil Kumar Singh ◽  
Vishal Acharya

Network-based and cheminformatics approaches identify novel lead molecules for CXCR4, a key gene prioritized in oral cancer.


2018 ◽  
Author(s):  
Eric J. Lachacz ◽  
Zhi Fen Wu ◽  
John L. Bixby ◽  
Vance P. Lemmon ◽  
Sofia D. Merajver ◽  
...  

Molecules ◽  
2020 ◽  
Vol 25 (22) ◽  
pp. 5277
Author(s):  
Lauv Patel ◽  
Tripti Shukla ◽  
Xiuzhen Huang ◽  
David W. Ussery ◽  
Shanzhi Wang

Advances in information technology and related processing techniques have created a fertile base for progress in many scientific fields and industries. In drug discovery and development, machine learning techniques have been used to develop novel drug candidates. The methods for designing drug targets and discovering novel drugs now routinely combine machine learning and deep learning algorithms to enhance the efficiency, efficacy, and quality of developed outputs. The generation and incorporation of big data, through technologies such as high-throughput screening and high-throughput computational analysis of databases used for both lead and target discovery, has increased the reliability of techniques that incorporate machine learning and deep learning. The use of virtual screening and comprehensive online information has also been highlighted in developing lead synthesis pathways. In this review, machine learning and deep learning algorithms utilized in drug discovery and associated techniques will be discussed. The applications and methods that produce promising results will be reviewed.


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. 5043-5043
Author(s):  
Andrew W Hahn ◽  
Edwin Lin ◽  
John Esther ◽  
Neysi Anderson ◽  
Nityam Rathi ◽  
...  

5043 Background: mCRPC carries a poor prognosis, and targeted therapies have had minimal success in mCRPC. Novel genomic targets could improve drug development. To date, large ctDNA studies in metastatic prostate cancer have been descriptive with limited or no clinical annotation. Herein, we hypothesize that profiles of genomic alterations (GAs) in ctDNA not only differ significantly between mCRPC and mHSPC, but can also be used to predict mCRPC vs. mHSPC. These findings could help identify new drug targets for mCRPC treatment. Methods: Men with mHSPC or mCRPC who underwent NGS of ctDNA using G360 (Guardant Health Inc.) at the Huntsman Cancer Institute were included. Men were classified as mCRPC or mHSPC (patients with current or no prior ADT). G360 detects somatic mutations in selected exons of 73 genes, amplifications in 18 genes, and selected fusions in 6 genes. A two-sided Student's t-test was used to compare the %cfDNA and total GAs. The chi-squared test was used to compare the frequency of each GA. Machine learning (ML) algorithms were trained on GAs and benchmarked by cross-validated performance. GAs contributing to mCRPC vs. mHSPC classification were measured by ML feature importance (e.g. odds ratios, regression coefficients). Results: Of the 259 men included, 119 men had mHSPC and 140 had mCRPC. Men with mCRPC had more GAs (4.5 vs. 1.86, p<0.0001) and higher %cfDNA (9.56% vs. 5.02%, p=0.02). In mHSPC, there was no significant difference in the number of GAs or %cfDNA between men on ADT and those who hadn’t yet started ADT. ML algorithms used GAs to predict mCRPC with 78.1% sensitivity, 64.0% specificity, 76.7% PPV, 65.1% NPV, and 70.3% overall accuracy. mCRPC was enriched with GAs in AR, ARID1A, BRAF, BRCA2, CCNE1, CTNNB1, EGFR, FGFR1, KIT, MET, MYC, PDGFRB, PIK3CA, and TP53. Of note, many of these genes are involved in MAP/ERK signaling. Conclusions: Men with mCRPC have more GAs, higher %cfDNA, and enrichment of GAs in the MAP/ERK pathway compared to men with mHSPC. The distinct GAs seen in mCRPC represent novel therapeutic targets, especially in the MAP/ERK pathway. We also show that machine learning can differentiate mHSPC and mCRPC based on GAs detected in ctDNA.
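
For readers unfamiliar with the performance measures quoted in this abstract (sensitivity, specificity, PPV, NPV, overall accuracy), the following is a minimal sketch of how each is derived from confusion-matrix counts. The function name `classification_metrics` and the counts passed to it are hypothetical illustrations; they are not the study's data, and the abstract's cross-validated figures cannot be reproduced from this example.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positive cases predicted positive/negative
    tn/fp: negative cases predicted negative/positive
    """
    return {
        "sensitivity": tp / (tp + fn),   # fraction of true positives recovered
        "specificity": tn / (tn + fp),   # fraction of true negatives recovered
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts chosen only for illustration (NOT taken from the abstract).
metrics = classification_metrics(tp=8, fp=2, tn=6, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that PPV and NPV depend on the proportion of positive cases in the cohort, whereas sensitivity and specificity do not.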


2021 ◽  
Author(s):  
Elisabetta Manduchi ◽  
Trang T. Le ◽  
Weixuan Fu ◽  
Jason H. Moore

Abstract Machine Learning (ML) approaches are increasingly being used in biomedical applications. Important challenges of ML include choosing the right algorithm and tuning the parameters for optimal performance. Automated ML (AutoML) methods, such as Tree-based Pipeline Optimization Tool (TPOT), have been developed to take some of the guesswork out of ML thus making this technology available to users from more diverse backgrounds. The goals of this study were to assess applicability of TPOT to genomics and to identify combinations of single nucleotide polymorphisms (SNPs) associated with coronary artery disease (CAD), with a focus on genes with high likelihood of being good CAD drug targets. We leveraged public functional genomic resources to group SNPs into biologically meaningful sets to be selected by TPOT. We applied this strategy to data from the UK Biobank, detecting a strikingly recurrent signal stemming from a group of 28 SNPs. Importance analysis of these uncovered functional relevance of the top SNPs to genes whose association with CAD is supported in the literature and other resources. Furthermore, we employed game-theory based metrics to study SNP contributions to individual level TPOT predictions and discover distinct clusters of well-predicted CAD cases. The latter indicates a promising approach towards precision medicine.


2019 ◽  
Author(s):  
Zoltan Dezso ◽  
Michele Ceccarelli

Abstract Background The selection and prioritization of drug targets is a central problem in drug discovery. Computational approaches can leverage the growing number of large-scale human genomics and proteomics data sets to perform in silico target identification, reducing the cost and the time needed. Results We developed a machine learning approach to score proteins and generate a druggability score for novel targets. In our model we incorporated 70 protein features, which included properties derived from the sequence, features characterizing protein functions, and network properties derived from the protein-protein interaction network. The advantage of this approach is that it is unbiased: even less-studied proteins with limited information about their function can score well, as most of the features are independent of the accumulated literature. We built models on a training set consisting of targets with approved drugs and a negative set of non-drug targets. The machine learning techniques help to identify the most important combination of features differentiating validated targets from non-targets. We validated our predictions on an independent set of clinical trial drug targets, achieving a high accuracy characterized by an AUC of 0.89. Our most predictive features included biological function of proteins, network centrality measures, protein essentiality, tissue specificity, localization and solvent accessibility. Our predictions, based on a small set of 102 validated oncology targets, recovered the majority of known drug targets and identified a novel set of proteins as drug target candidates. Conclusions We developed a machine learning approach to prioritize proteins according to their similarity to approved drug targets. We have shown that the proposed method is highly predictive on a validation dataset consisting of 277 targets of clinical trial drugs, confirming that our computational approach is an efficient and cost-effective tool for drug target discovery and prioritization. Our predictions were based on oncology targets and cancer-relevant biological functions, resulting in significantly higher scores for targets of oncology clinical trial drugs compared to the scores of targets of trial drugs for other indications. Our approach can be used to make indication-specific drug-target predictions by combining generic druggability features with indication-specific biological functions.
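
The AUC reported in this abstract summarizes how well a score separates validated targets from non-targets. As a minimal sketch of what that number means, the snippet below computes AUC as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (ties counted as one half). The function name `roc_auc` and the scores are hypothetical and unrelated to the authors' model or data.

```python
def roc_auc(positive_scores, negative_scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    Counts the fraction of (positive, negative) pairs in which the positive
    example scores higher; tied pairs contribute 0.5.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical druggability scores, for illustration only.
targets = [0.9, 0.6]       # assumed scores for known drug targets
non_targets = [0.7, 0.1]   # assumed scores for non-targets
print(roc_auc(targets, non_targets))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic pairwise loop is chosen for clarity; it is only practical for small score lists.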

