Machine learning prediction of oncology drug targets based on protein and network properties

2019 ◽  
Author(s):  
Zoltan Dezso ◽  
Michele Ceccarelli

Abstract Background The selection and prioritization of drug targets is a central problem in drug discovery. Computational approaches can leverage the growing amount of large-scale human genomics and proteomics data to perform in silico target identification, reducing the cost and time required. Results We developed a machine learning approach that scores proteins to generate a druggability score for novel targets. Our model incorporated 70 protein features, including properties derived from the sequence, features characterizing protein function, and network properties derived from the protein-protein interaction network. The advantage of this approach is that it is unbiased: even less-studied proteins with limited information about their function can score well, because most of the features are independent of the accumulated literature. We built models on a training set consisting of targets of approved drugs and a negative set of non-drug targets. Machine learning techniques helped identify the combination of features that best differentiates validated targets from non-targets. We validated our predictions on an independent set of clinical trial drug targets, achieving high accuracy (AUC of 0.89). Our most predictive features included biological function of proteins, network centrality measures, protein essentiality, tissue specificity, localization and solvent accessibility. Our predictions, based on a small set of 102 validated oncology targets, recovered the majority of known drug targets and identified a novel set of proteins as drug target candidates. Conclusions We developed a machine learning approach to prioritize proteins according to their similarity to approved drug targets.
We have shown that the proposed method is highly predictive on a validation dataset of 277 targets of clinical trial drugs, confirming that our computational approach is an efficient and cost-effective tool for drug target discovery and prioritization. Our predictions were based on oncology targets and cancer-relevant biological functions, resulting in significantly higher scores for targets of oncology clinical trial drugs than for targets of trial drugs for other indications. Our approach can be used to make indication-specific drug-target predictions by combining generic druggability features with indication-specific biological functions.
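The validation result above is reported as an AUC. As a minimal sketch (not the authors' code), the AUC of any druggability score can be computed from the scores assigned to validation targets (positives) and non-targets (negatives) using the pairwise Mann-Whitney formulation: the fraction of positive/negative pairs in which the positive scores higher, with ties counting one half. The function name `roc_auc` and the toy scores are illustrative assumptions, not data from the study.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive (e.g. a validation
    drug target) scores higher than a randomly chosen negative (a non-target).
    Ties contribute 0.5, matching the Mann-Whitney U formulation."""
    if len(pos_scores) == 0 or len(neg_scores) == 0:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical toy scores, not data from the study.
targets = [0.9, 0.8, 0.4]
non_targets = [0.7, 0.3, 0.2]
print(round(roc_auc(targets, non_targets), 3))  # 0.889
```

The double loop is O(P·N) and is meant only to make the definition explicit; for large candidate sets a rank-based computation or scikit-learn's `roc_auc_score` is the usual choice.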


2020 ◽  
Author(s):  
Zoltan Dezso ◽  
Michele Ceccarelli

Abstract Background The selection and prioritization of drug targets is a central problem in drug discovery. Computational approaches can leverage the growing amount of large-scale human genomics and proteomics data to perform in silico target identification, reducing the cost and time required. Results We developed a machine learning approach that scores proteins to generate a druggability score for novel targets. Our model incorporated 70 protein features, including properties derived from the sequence, features characterizing protein function, and network properties derived from the protein-protein interaction network. The advantage of this approach is that it is unbiased: even less-studied proteins with limited information about their function can score well, because most of the features are independent of the accumulated literature. We built models on a training set consisting of targets of approved drugs and a negative set of non-drug targets. Machine learning techniques helped identify the combination of features that best differentiates validated targets from non-targets. We validated our predictions on an independent set of clinical trial drug targets, achieving high accuracy (AUC of 0.89). Our most predictive features included biological function of proteins, network centrality measures, protein essentiality, tissue specificity, localization and solvent accessibility. Our predictions, based on a small set of 102 validated oncology targets, recovered the majority of known drug targets and identified a novel set of proteins as drug target candidates. Conclusions We developed a machine learning approach to prioritize proteins according to their similarity to approved drug targets.
We have shown that the proposed method is highly predictive on a validation dataset of 277 targets of clinical trial drugs, confirming that our computational approach is an efficient and cost-effective tool for drug target discovery and prioritization. Our predictions were based on oncology targets and cancer-relevant biological functions, resulting in significantly higher scores for targets of oncology clinical trial drugs than for targets of trial drugs for other indications. Our approach can be used to make indication-specific drug-target predictions by combining generic druggability features with indication-specific biological functions.


2020 ◽  
Author(s):  
Joseph Giorgio ◽  
William J Jagust ◽  
Suzanne Baker ◽  
Susan M. Landau ◽  
Peter Tino ◽  
...  

Abstract The earliest stages of Alzheimer’s disease (AD) involve interactions between multiple pathophysiological processes. Although these processes are well studied, we still lack robust tools to predict individualised trajectories of disease progression. Here, we employ a robust and interpretable machine learning approach to combine multimodal biological data and predict future tau accumulation, translating predictive information from deep phenotyping cohorts at early stages of AD to cognitively normal individuals. In particular, we use machine learning to quantify interactions between key pathological markers (β-amyloid, medial temporal atrophy, tau and APOE ε4) at early and asymptomatic stages of AD. We next derive a predictive index that stratifies individuals based on future pathological tau accumulation, highlighting two critical features for optimal clinical trial design. First, future tau accumulation provides a better outcome measure than changes in cognition. Second, stratification based on multimodal data, rather than β-amyloid alone, reduces the sample size required to detect a clinically meaningful change in tau accumulation. Further, we extend our machine learning approach to derive individualised trajectories of future pathological tau accumulation in early AD patients and accurately predict the regional rate of future tau accumulation in an independent sample of cognitively unimpaired individuals. Our results provide a robust approach for fine-scale stratification and prognostication with translational impact for clinical trial design at asymptomatic and early stages of AD.
One Sentence Summary: Our machine learning approach combines baseline multimodal data to make individualised predictions of future pathological tau accumulation at prodromal and asymptomatic stages of Alzheimer’s disease with high accuracy and regional specificity.
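The abstract links better stratification to a smaller required trial size. The sketch below is not the authors' calculation; it is the standard normal-approximation sample-size formula for a two-sided comparison of means between two trial arms, used here with hypothetical inputs to show the mechanism. The required number per arm scales with (sd / effect)^2, so enrolling a group in which the outcome (here, tau accumulation) is less variable relative to the change one wants to detect shrinks the trial. The function name `n_per_arm` and all numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist


def n_per_arm(effect: float, sd: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Participants per arm for a two-sided, two-sample comparison of means
    (normal approximation): n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / effect)^2."""
    if effect <= 0 or sd <= 0:
        raise ValueError("effect and sd must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


# Hypothetical numbers: same target change, outcome SD halved in an enriched group.
print(n_per_arm(effect=0.5, sd=1.0))  # 63
print(n_per_arm(effect=0.5, sd=0.5))  # 16
```

A t-distribution-based calculation gives slightly larger numbers for small samples; the normal approximation is kept here because it makes the (sd / effect)^2 scaling explicit.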


2019 ◽  
Author(s):  
Coryandar Gilvary ◽  
Neel S. Madhukar ◽  
Kaitlyn Gayvert ◽  
Miguel Foronda ◽  
Alexendar Perez ◽  
...  

Abstract Loss-of-function (LoF) screens have the potential to reveal novel cancer-specific vulnerabilities, prioritize drug treatments, and inform precision medicine therapeutics. These screens were traditionally performed using shRNAs, but the recent emergence of CRISPR technology has shifted the methodology. However, recent analyses have found large inconsistencies between CRISPR and shRNA essentiality results. Here, we examined the DepMap project, the largest cancer LoF effort undertaken to date, and found a lack of correlation between CRISPR and shRNA LoF results; we further characterized differences between genes found to be essential by either platform. We then introduce ECLIPSE, a machine learning approach that combines genomic, cell line, and experimental design features to predict essential genes and platform-specific essential genes in specific cancer cell lines. We applied ECLIPSE to known drug targets and found that our approach strongly differentiated drugs approved for cancer from those that were not, and can thus be leveraged to identify potential cancer repurposing opportunities. Overall, ECLIPSE allows a more comprehensive analysis of gene essentiality and drug development than either platform can achieve alone.


Methods ◽  
2015 ◽  
Vol 74 ◽  
pp. 65-70 ◽  
Author(s):  
Weixiang Shao ◽  
Clive E. Adams ◽  
Aaron M. Cohen ◽  
John M. Davis ◽  
Marian S. McDonagh ◽  
...  

2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Neel S. Madhukar ◽  
Prashant K. Khade ◽  
Linda Huang ◽  
Kaitlyn Gayvert ◽  
Giuseppe Galletti ◽  
...  

Abstract Drug target identification is a crucial step in drug development, yet it is also among the most complex. To address this, we develop BANDIT, a Bayesian machine-learning approach that integrates multiple data types to predict drug binding targets. Integrating public data, BANDIT achieved ~90% accuracy when benchmarked on 2000+ small molecules. Applied to 14,000+ compounds without known targets, BANDIT generated ~4,000 previously unknown molecule-target predictions. From this set we validate 14 novel microtubule inhibitors, including 3 with activity on resistant cancer cells. We applied BANDIT to ONC201, an anti-cancer compound in clinical development whose target had remained elusive. We identified and validated DRD2 as ONC201’s target, and this information is now being used for precise clinical trial design. Finally, BANDIT identifies connections between different drug classes, elucidating previously unexplained clinical observations and suggesting new drug repositioning opportunities. Overall, BANDIT represents an efficient and accurate platform to accelerate drug discovery and direct clinical application.
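A common way to integrate evidence from several data types in a Bayesian setting is to convert each type into a likelihood ratio and multiply the ratios under a conditional-independence (naive Bayes) assumption, then apply Bayes' rule in odds form. The sketch below illustrates that general idea; it is not presented as BANDIT's exact implementation, and the data-type names, prior, and ratio values are hypothetical.

```python
import math
from typing import Mapping


def combined_likelihood_ratio(lrs: Mapping[str, float]) -> float:
    """Multiply per-data-type likelihood ratios (summed in log space for
    numerical stability), assuming the data types are conditionally independent."""
    log_total = 0.0
    for data_type, lr in lrs.items():
        if lr <= 0:
            raise ValueError(f"likelihood ratio for {data_type!r} must be positive")
        log_total += math.log(lr)
    return math.exp(log_total)


def posterior_probability(prior: float, lrs: Mapping[str, float]) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * combined LR."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    post_odds = prior / (1.0 - prior) * combined_likelihood_ratio(lrs)
    return post_odds / (1.0 + post_odds)


# Hypothetical evidence that two compounds share a binding target.
evidence = {
    "structure_similarity": 3.0,
    "expression_similarity": 2.0,
    "side_effect_similarity": 1.5,
}
print(round(combined_likelihood_ratio(evidence), 6))  # 9.0
print(round(posterior_probability(0.1, evidence), 3))  # 0.5
```

The independence assumption is what makes the product valid; correlated data types would over-count shared evidence, which is a known limitation of this style of integration.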


2020 ◽  
Author(s):  
Joseph Giorgio ◽  
William Jagust ◽  
Suzanne Baker ◽  
Susan Landau ◽  
Peter Tino ◽  
...  

Abstract The earliest stages of Alzheimer’s disease (AD) involve interactions between multiple pathophysiological processes. Although these processes are well studied, we still lack robust tools to predict individualised trajectories of disease progression. Here, we employ a robust and interpretable machine learning approach to combine multimodal biological data and predict future tau accumulation, translating predictive information from deep phenotyping cohorts at early stages of AD to cognitively normal individuals. In particular, we use machine learning to quantify interactions between key pathological markers (β-amyloid, medial temporal atrophy, tau and APOE ε4) at early and asymptomatic stages of AD. We next derive a predictive index that stratifies individuals based on future pathological tau accumulation, highlighting two critical features for optimal clinical trial design. First, future tau accumulation provides a better outcome measure than changes in cognition. Second, stratification based on multimodal data, rather than β-amyloid alone, reduces the sample size required to detect a clinically meaningful change in tau accumulation. Further, we extend our machine learning approach to derive individualised trajectories of future pathological tau accumulation in early AD patients and accurately predict the regional rate of future tau accumulation in an independent sample of cognitively unimpaired individuals. Our results provide a robust approach for fine-scale stratification and prognostication with translational impact for clinical trial design at asymptomatic and early stages of AD.

