Abstract 4647: Identifying drug targets in sarcoma using machine learning and cell phenotype-based compound screening

Background: Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models to evaluate binding affinity and thermodynamic state functions. Application of machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance when compared with classical scoring functions available in docking programs. Objective: Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. Method: SAnDReS implements machine learning methods available in the scikit-learn library. This program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding, and thermodynamic data to create targeted scoring functions. Results: Recent applications of the program SAnDReS to drug targets such as Coagulation factor Xa, cyclin-dependent kinases, and HIV-1 protease were able to create targeted scoring functions to predict inhibition of these proteins. These targeted models outperform classical scoring functions. Conclusion: Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of the SAnDReS-developed models when compared with classical scoring functions available in the programs such as AutoDock4, Molegro Virtual Docker, and AutoDock Vina.
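As a rough illustration of what a targeted scoring function can look like, the sketch below fits a scikit-learn regression model to per-complex descriptors and estimates its performance by cross-validation. It is not SAnDReS code: the function name, the choice of ridge regression, the number of energy terms, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score


def fit_targeted_scoring_function(features, affinities, alpha=1.0):
    """Fit a linear model mapping per-complex descriptors to binding affinity.

    Hypothetical helper for illustration; not part of SAnDReS.
    """
    model = Ridge(alpha=alpha)
    # 5-fold cross-validated R^2 gives a rough estimate of predictive performance.
    cv_r2 = cross_val_score(model, features, affinities, cv=5, scoring="r2")
    model.fit(features, affinities)
    return model, cv_r2


# Synthetic placeholder data: 60 hypothetical complexes, 4 hypothetical energy terms.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
true_weights = np.array([1.5, -0.8, 0.3, 0.0])
# Stand-in for experimental affinities (for example pKi); not real measurements.
y = X @ true_weights + rng.normal(scale=0.1, size=60)

model, cv_r2 = fit_targeted_scoring_function(X, y)
print("fitted weights:", model.coef_.round(2))
print("mean cross-validated R^2:", round(float(cv_r2.mean()), 3))
```

In a real study the descriptors would come from crystallographic structures of one target and the affinities from experimental binding data, which is what makes the resulting function "targeted".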

Machine Learning for Prediction of Drug Targets in Microbe Associated Cardiovascular Diseases by Incorporating Host‐pathogen Interaction Network Parameters

Molecular Informatics ◽

10.1002/minf.202100115 ◽

2021 ◽

pp. 2100115

Author(s):

Nirupma Singh ◽

Sonika Bhatnagar

Keyword(s):

Machine Learning ◽

Cardiovascular Diseases ◽

Drug Targets ◽

Interaction Network ◽

Host Pathogen Interaction ◽

Network Parameters ◽

Host Pathogen

Abstract 21: LiP-MS, a machine learning-based chemoproteomic approach to identify drug targets in complex proteomes

10.1158/1538-7445.am2021-21 ◽

2021 ◽

Author(s):

Nigel Beaton ◽

Yuehan Feng ◽

Roland Bruderer ◽

Adam Hendricks ◽

Ghaith Hamza ◽

...

Keyword(s):

Machine Learning ◽

Drug Targets

Compound2Drug – a Machine/deep Learning Tool for Predicting the Bioactivity of PubChem Compounds

10.26434/chemrxiv.13052951 ◽

2020 ◽

Author(s):

Ben Geoffrey A S ◽

Pavan Preetham Valluri ◽

Akhil Sanker ◽

Rafal Madaj ◽

Host Antony Davidd ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Molecular Docking ◽

Drug Target ◽

Drug Targets ◽

Learning Algorithms ◽

Network Data ◽

Ligand Interaction ◽

Pubchem Compound ◽

Protein Ligand Interaction

Network data is composed of nodes and edges. Machine learning and deep learning algorithms have been applied successfully to network data for node classification and link prediction in social networks, where they drive highly customized suggestions to users. Similarly, one can apply these algorithms to biological network data to generate scientifically useful predictions. In the present work, a compound-drug target interaction data set from BindingDB has been used to train machine learning/deep learning algorithms that predict the drug targets for any PubChem compound queried by the user. The user inputs the PubChem Compound ID (CID) of the compound of interest, and the tool outputs the RCSB PDB IDs of the predicted drug targets. The tool also incorporates a feature to perform automated in silico modelling for the compound and the predicted drug targets to uncover their protein-ligand interaction profiles. The program fetches the structures of the compound and the predicted drug targets, prepares them for molecular docking using standard AutoDock scripts that are part of MGLTools, performs molecular docking and protein-ligand interaction profiling of the targets and the compound, and stores the visualized results in the user's working folder. The program is hosted, supported and maintained at the following GitHub repository: https://github.com/bengeof/Compound2Drug
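To make the idea of link prediction on compound-target network data concrete, here is a minimal sketch that ranks candidate targets for a query compound with a Jaccard-similarity neighbour vote. This is a deliberately simple stand-in heuristic, not the machine learning/deep learning model used by Compound2Drug; the function name and the compound and target identifiers (C1, T1, ...) are hypothetical placeholders rather than real PubChem CIDs or PDB IDs.

```python
from collections import defaultdict


def predict_targets(known_pairs, query, top_k=3):
    """Rank unseen targets for `query` by similarity-weighted neighbour votes."""
    # Build the bipartite graph as a compound -> set-of-targets mapping.
    targets_of = defaultdict(set)
    for compound, target in known_pairs:
        targets_of[compound].add(target)

    query_targets = targets_of.get(query, set())
    scores = defaultdict(float)
    for other, other_targets in targets_of.items():
        if other == query:
            continue
        union = query_targets | other_targets
        if not union:
            continue
        # Jaccard similarity between the two compounds' known target sets.
        similarity = len(query_targets & other_targets) / len(union)
        # Each target the neighbour hits (and the query does not) receives a vote.
        for target in other_targets - query_targets:
            scores[target] += similarity

    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(target, score) for target, score in ranked if score > 0][:top_k]


# Hypothetical edges of a compound-target interaction network.
pairs = [
    ("C1", "T1"), ("C1", "T2"),
    ("C2", "T1"), ("C2", "T2"), ("C2", "T3"),
    ("C3", "T2"), ("C3", "T4"),
    ("C4", "T5"),
]
ranked = predict_targets(pairs, "C1")
print(ranked)  # T3 ranks first (score 2/3), then T4 (score 1/3)
```

A heuristic like this only works for compounds that already have some known targets; a trained model on compound descriptors, as the abstract describes, is needed to score an arbitrary queried compound.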

The Application of Machine Learning Techniques in Protein Drugs and Drug Targets Recognition

Current Drug Metabolism ◽

10.2174/138920022003190424105144 ◽

2019 ◽

Vol 20 (3) ◽

pp. 168-169

Author(s):

Hui Ding

Keyword(s):

Machine Learning ◽

Drug Targets ◽

Machine Learning Techniques ◽

Protein Drugs ◽

Learning Techniques

A systematic approach to prioritize drug targets using machine learning, a molecular descriptor-based classification model, and high-throughput screening of plant derived molecules: a case study in oral cancer

Molecular BioSystems ◽

10.1039/c5mb00468c ◽

2015 ◽

Vol 11 (12) ◽

pp. 3362-3377 ◽

Cited By ~ 4

Author(s):

Vinay Randhawa ◽

Anil Kumar Singh ◽

Vishal Acharya

Keyword(s):

Machine Learning ◽

Oral Cancer ◽

High Throughput ◽

High Throughput Screening ◽

Drug Targets ◽

Systematic Approach ◽

Molecular Descriptor ◽

Classification Model

Network-based and cheminformatics approaches identify novel lead molecules for CXCR4, a key gene prioritized in oral cancer.

Machine Learning Methods in Drug Discovery

Molecules ◽

10.3390/molecules25225277 ◽

2020 ◽

Vol 25 (22) ◽

pp. 5277

Author(s):

Lauv Patel ◽

Tripti Shukla ◽

Xiuzhen Huang ◽

David W. Ussery ◽

Shanzhi Wang

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Drug Discovery ◽

High Throughput Screening ◽

Drug Targets ◽

Learning Algorithms ◽

Machine Learning Techniques ◽

Online Information ◽

Drug Candidates ◽

Novel Drug

Advances in information technology and related processing techniques have created a fertile base for progress in many scientific fields and industries. In drug discovery and development, machine learning techniques have been used to develop novel drug candidates. Methods for designing drug targets and discovering novel drugs now routinely combine machine learning and deep learning algorithms to enhance the efficiency, efficacy, and quality of developed outputs. The generation and incorporation of big data, through technologies such as high-throughput screening and high-throughput computational analysis of databases used for both lead and target discovery, has increased the reliability of techniques that incorporate machine learning and deep learning. The use of virtual screening and of encompassing online information has also been highlighted in developing lead synthesis pathways. This review discusses the machine learning and deep learning algorithms used in drug discovery and the associated techniques, and reviews the applications and methods that produce promising results.

Genomic landscape of metastatic hormone sensitive prostate cancer (mHSPC) vs. metastatic castration-refractory prostate cancer (mCRPC) by circulating tumor DNA (ctDNA).

Journal of Clinical Oncology ◽

10.1200/jco.2019.37.15_suppl.5043 ◽

2019 ◽

Vol 37 (15_suppl) ◽

pp. 5043-5043

Author(s):

Andrew W Hahn ◽

Edwin Lin ◽

John Esther ◽

Neysi Anderson ◽

Nityam Rathi ◽

...

Keyword(s):

Prostate Cancer ◽

Machine Learning ◽

Drug Targets ◽

Circulating Tumor Dna ◽

Erk Pathway ◽

Genomic Landscape ◽

Significant Difference ◽

Chi Squared ◽

Clinical Annotation ◽

Hormone Sensitive

Background: mCRPC carries a poor prognosis, and targeted therapies have had minimal success in mCRPC. Novel genomic targets could improve drug development. To date, large ctDNA studies in metastatic prostate cancer have been descriptive with limited or no clinical annotation. Herein, we hypothesize that profiles of genomic alterations (GAs) in ctDNA not only differ significantly between mCRPC and mHSPC, but can also be used to predict mCRPC vs. mHSPC. These findings could help identify new drug targets for mCRPC treatment. Methods: Men with mHSPC or mCRPC who underwent NGS of ctDNA using G360 (Guardant Health Inc.) at the Huntsman Cancer Institute were included. Men were classified as mCRPC or mHSPC (patients with current or no prior ADT). G360 detects somatic mutations in selected exons of 73 genes, amplifications in 18 genes, and selected fusions in 6 genes. A two-sided Student's t-test was used to compare the %cfDNA and total GAs. The chi-squared test was used to compare the frequency of each GA. Machine learning (ML) algorithms were trained on GAs and benchmarked by cross-validated performance. GAs contributing to mCRPC vs. mHSPC classification were measured by ML feature importance (e.g. odds ratios, regression coefficients). Results: Of the 259 men included, 119 men had mHSPC and 140 had mCRPC. Men with mCRPC had more GAs (4.5 vs. 1.86, p<0.0001) and higher %cfDNA (9.56% vs. 5.02%, p=0.02). In mHSPC, there was no significant difference in the number of GAs or %cfDNA between men on ADT and those who hadn’t yet started ADT. ML algorithms used GAs to predict mCRPC with 78.1% sensitivity, 64.0% specificity, 76.7% PPV, 65.1% NPV, and 70.3% overall accuracy. mCRPC was enriched with GAs in AR, ARID1A, BRAF, BRCA2, CCNE1, CTNNB1, EGFR, FGFR1, KIT, MET, MYC, PDGFRB, PIK3CA, and TP53. Of note, many of these genes are involved in MAP/ERK signaling. Conclusions: Men with mCRPC have more GAs, higher %cfDNA, and enrichment of GAs in the MAP/ERK pathway compared to men with mHSPC. The distinct GAs seen in mCRPC represent novel therapeutic targets, especially in the MAP/ERK pathway. We also show that machine learning can differentiate mHSPC and mCRPC based on GAs detected in ctDNA.
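For readers unfamiliar with the performance figures quoted above, the following sketch shows how sensitivity, specificity, PPV, NPV, and accuracy are derived from a binary confusion matrix. The counts are hypothetical and chosen only to give round numbers; they are not the study's data, and the abstract does not report the underlying confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all actual positives
        "specificity": tn / (tn + fp),  # true negatives among all actual negatives
        "ppv": tp / (tp + fp),          # true positives among predicted positives
        "npv": tn / (tn + fn),          # true negatives among predicted negatives
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, with "positive" standing for a predicted mCRPC label.
metrics = classification_metrics(tp=80, fp=40, tn=60, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these placeholder counts the function returns 80.0% sensitivity, 60.0% specificity, 66.7% PPV, 75.0% NPV, and 70.0% accuracy.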

Genetic analysis of coronary artery disease using tree-based automated machine learning informed by biology-based feature selection

10.1101/2021.03.23.436652 ◽

2021 ◽

Author(s):

Elisabetta Manduchi ◽

Trang T. Le ◽

Weixuan Fu ◽

Jason H. Moore

Keyword(s):

Machine Learning ◽

Coronary Artery Disease ◽

Coronary Artery ◽

Drug Targets ◽

Nucleotide Polymorphisms ◽

Individual Level ◽

Importance Analysis ◽

Artery Disease ◽

The Uk ◽

The Right

Machine Learning (ML) approaches are increasingly being used in biomedical applications. Important challenges of ML include choosing the right algorithm and tuning the parameters for optimal performance. Automated ML (AutoML) methods, such as Tree-based Pipeline Optimization Tool (TPOT), have been developed to take some of the guesswork out of ML thus making this technology available to users from more diverse backgrounds. The goals of this study were to assess applicability of TPOT to genomics and to identify combinations of single nucleotide polymorphisms (SNPs) associated with coronary artery disease (CAD), with a focus on genes with high likelihood of being good CAD drug targets. We leveraged public functional genomic resources to group SNPs into biologically meaningful sets to be selected by TPOT. We applied this strategy to data from the UK Biobank, detecting a strikingly recurrent signal stemming from a group of 28 SNPs. Importance analysis of these uncovered functional relevance of the top SNPs to genes whose association with CAD is supported in the literature and other resources. Furthermore, we employed game-theory based metrics to study SNP contributions to individual level TPOT predictions and discover distinct clusters of well-predicted CAD cases. The latter indicates a promising approach towards precision medicine.
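The abstract does not name the game-theory based metric it uses. Assuming a Shapley-value style attribution, the sketch below computes exact Shapley values for a tiny toy model, showing how an interaction between two features is shared between them in a single prediction. The SNP names and the toy risk function are hypothetical, and exact enumeration like this is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values for a set function `value(frozenset) -> float`."""
    n = len(features)
    result = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of `feature` when added to the coalition.
                total += weight * (value(coalition | {feature}) - value(coalition))
        result[feature] = total
    return result


def toy_risk(present):
    """Hypothetical risk score: two SNPs with an interaction, one inert SNP."""
    score = 0.0
    if "snp_a" in present:
        score += 1.0
    if "snp_b" in present:
        score += 1.0
    if {"snp_a", "snp_b"} <= present:
        score += 2.0  # interaction term, split evenly by the Shapley values
    return score


values = shapley_values(["snp_a", "snp_b", "snp_c"], toy_risk)
print(values)
```

The attributions sum to the difference between the full-model score and the empty-model score, which is the property that makes per-individual contributions comparable across cases.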

Disease-relevant single cell photonic signatures identify S100β stem cells and their myogenic progeny in vascular lesions

10.1101/2020.05.13.093518 ◽

2020 ◽

Author(s):

Claire Molony ◽

Damien King ◽

Mariana Di Luca ◽

Abidemi Olayinka ◽

Roya Hakimjavadi ◽

...

Keyword(s):

Machine Learning ◽

Stem Cells ◽

Single Cell ◽

Ex Vivo ◽

Lineage Tracing ◽

Supervised Machine Learning ◽

Vascular Lesions ◽

Cell Phenotype ◽

Genetic Lineage ◽

Collagen Iii

A hallmark of subclinical atherosclerosis is the accumulation of vascular smooth muscle cell (SMC)-like cells leading to intimal thickening and lesion formation. While medial SMCs contribute to vascular lesions, the involvement of resident vascular stem cells (vSCs) remains unclear. We evaluated single cell photonics as a discriminator of cell phenotype in vitro before the presence of vSC within vascular lesions was assessed ex vivo using supervised machine learning and further validated using lineage tracing analysis. Using a novel Lab-on-a-Disk (Load) platform, label-free single cell photonic emissions from normal and injured vessels ex vivo were interrogated and compared to freshly isolated aortic SMCs, cultured Movas SMCs, macrophages, B-cells, S100β+ mVSc, bone marrow derived mesenchymal stem cells (MSC) and their respective myogenic progeny across five broadband light wavelengths (λ465 - λ670 ± 20 nm). We found that profiles were of sufficient coverage, specificity, and quality to clearly distinguish medial SMCs from different vascular beds (carotid vs aorta), discriminate normal carotid medial SMCs from lesional SMC-like cells ex vivo following flow restriction, and identify SMC differentiation of a series of multipotent stem cells following treatment with transforming growth factor beta 1 (TGF-β1), the Notch ligand Jagged1, and Sonic Hedgehog using multivariate analysis, in part, due to photonic emissions from enhanced collagen III and elastin expression. Supervised machine learning supported genetic lineage tracing analysis of S100β+ vSCs and identified the presence of S100β+ vSC-derived myogenic progeny within vascular lesions. We conclude disease-relevant photonic signatures may have predictive value for vascular disease.
