A Bayesian machine learning approach for drug target identification using diverse data types

2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Neel S. Madhukar ◽  
Prashant K. Khade ◽  
Linda Huang ◽  
Kaitlyn Gayvert ◽  
Giuseppe Galletti ◽  
...  

Abstract: Drug target identification is a crucial step in development, yet it is also among the most complex. To address this, we develop BANDIT, a Bayesian machine-learning approach that integrates multiple data types to predict drug binding targets. Integrating public data, BANDIT achieved ~90% accuracy in benchmarks on 2000+ small molecules. Applied to 14,000+ compounds without known targets, BANDIT generated ~4,000 previously unknown molecule-target predictions. From this set we validate 14 novel microtubule inhibitors, including 3 with activity on resistant cancer cells. We applied BANDIT to ONC201, an anti-cancer compound in clinical development whose target had remained elusive. We identified and validated DRD2 as ONC201’s target, and this information is now being used for precise clinical trial design. Finally, BANDIT identifies connections between different drug classes, elucidating previously unexplained clinical observations and suggesting new drug repositioning opportunities. Overall, BANDIT represents an efficient and accurate platform to accelerate drug discovery and direct clinical application.
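The idea of integrating several data types in a Bayesian framework can be illustrated with a minimal sketch. It assumes that each data type yields a likelihood ratio for the hypothesis that two drugs share a target, and that the data types are conditionally independent. The function name, the data-type keys, and the prior are hypothetical; this is not the published BANDIT implementation.

```python
import math


def posterior_shared_target(likelihood_ratios, prior=0.01):
    """Combine per-data-type likelihood ratios into a posterior probability.

    likelihood_ratios maps a data-type name (hypothetical keys) to
    P(evidence | shared target) / P(evidence | no shared target).
    prior is the assumed prior probability that two drugs share a target.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    # Work in log-odds so that many ratios can be multiplied stably.
    log_odds = math.log(prior / (1.0 - prior))
    for name, ratio in likelihood_ratios.items():
        if ratio <= 0.0:
            raise ValueError(f"likelihood ratio for {name!r} must be positive")
        log_odds += math.log(ratio)
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


# Illustrative values only: two data types that each favor a shared target.
evidence = {"structure": 9.0, "growth_inhibition": 11.0}
print(round(posterior_shared_target(evidence, prior=0.01), 3))
```

Under the independence assumption, a data type that carries no information has a likelihood ratio of 1 and leaves the posterior equal to the prior.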

2019 ◽  
Author(s):  
Sheng Wang ◽  
Jianzhu Ma ◽  
Samson Fong ◽  
Stefano Rensi ◽  
Jiawei Han ◽  
...  

Abstract: Gene functional enrichment is a mainstay of genomics, but it relies on manually curated databases of gene functions that are incomplete and unaware of the biological context. Here we present an alternative machine learning approach, Deep Functional Synthesis (DeepSyn), which moves beyond gene function databases to dynamically infer the functions of a gene set from its associated network of literature and data, conditioned on the disease and drug context of the current experiment. Using a knowledge graph with 3,048,803 associations between genes, diseases, drugs, and functions, DeepSyn obtained accurate performance (range 0.74 AUC to 0.96 AUC) on a variety of biological applications including drug target identification, gene set functional enrichment, and disease gene prediction. Availability: The DeepSyn codebase is available on GitHub at http://github.com/wangshenguiuc/DeepSyn/ under an open source distribution license.


2021 ◽  
Vol 22 (10) ◽  
pp. 5118
Author(s):  
Matthieu Najm ◽  
Chloé-Agathe Azencott ◽  
Benoit Playe ◽  
Véronique Stoven

Identification of the protein targets of hit molecules is essential in the drug discovery process. Target prediction with machine learning algorithms can help accelerate this search, limiting the number of required experiments. However, the drug-target interaction databases used for training exhibit strong statistical bias, leading to a high number of false positives and thus increasing the time and cost of experimental validation campaigns. To minimize the number of false positives among predicted targets, we propose a new scheme for choosing negative examples, so that each protein and each drug appears an equal number of times in positive and negative examples. We artificially reproduce the process of target identification for three specific drugs, and more globally for 200 approved drugs. For the three detailed drug examples, and for the larger set of 200 drugs, training with the proposed scheme for the choice of negative examples improved target prediction results: the average number of false positives among the top-ranked predicted targets decreased, and overall, the rank of the true targets improved. Our method corrects the statistical bias of the databases and reduces the number of false positive predictions, and therefore the number of useless experiments potentially undertaken.
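The balancing constraint described above (each drug and each protein appearing as often in negative examples as in positive ones) can be sketched with a simple greedy sampler. This is an illustrative assumption, not the authors' algorithm: the function name and the toy identifiers are hypothetical, and a greedy pass can fall short of perfect balance when no valid pairing remains.

```python
import random
from collections import Counter


def sample_balanced_negatives(positives, seed=0):
    """Greedily pick negative (drug, protein) pairs so that each drug and each
    protein appears in the negatives at most as often as in the positives,
    and exactly as often whenever the greedy pass succeeds."""
    rng = random.Random(seed)
    positive_set = set(positives)
    drug_quota = Counter(drug for drug, _ in positives)
    protein_quota = Counter(protein for _, protein in positives)
    negatives = set()
    # Serve the most frequent proteins first: they are the hardest to balance.
    for protein in sorted(protein_quota, key=lambda p: (-protein_quota[p], p)):
        while protein_quota[protein] > 0:
            candidates = [
                drug for drug in sorted(drug_quota)
                if drug_quota[drug] > 0
                and (drug, protein) not in positive_set
                and (drug, protein) not in negatives
            ]
            if not candidates:
                break  # perfect balance is not reachable greedily here
            # Prefer drugs that still need the most negatives; break ties randomly.
            most_needed = max(drug_quota[drug] for drug in candidates)
            drug = rng.choice([d for d in candidates if drug_quota[d] == most_needed])
            negatives.add((drug, protein))
            drug_quota[drug] -= 1
            protein_quota[protein] -= 1
    return sorted(negatives)


# Toy example with hypothetical identifiers.
positives = [("d1", "p1"), ("d2", "p2")]
print(sample_balanced_negatives(positives))
```

In the toy example each drug and each protein has one positive pair, so the only balanced choice is to swap partners, giving one negative pair per drug and per protein.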


2017 ◽  
Author(s):  
Neel S. Madhukar ◽  
Prashant K. Khade ◽  
Linda Huang ◽  
Kaitlyn Gayvert ◽  
Giuseppe Galletti ◽  
...  

Abstract: Drug target identification is one of the most important aspects of pre-clinical development, yet it is also among the most complex, labor-intensive, and costly. This represents a major issue, as lack of proper target identification can be detrimental in determining the clinical application of a bioactive small molecule. To improve target identification, we developed BANDIT, a novel paradigm that integrates multiple data types within a Bayesian machine-learning framework to predict the targets and mechanisms for small molecules with unprecedented accuracy and versatility. Using only public data, BANDIT achieved an accuracy of approximately 90% over 2000 different small molecules, substantially better than any other published target identification platform. We applied BANDIT to a library of small molecules with no known targets and generated ∼4,000 novel molecule-target predictions. From this set we identified and experimentally validated a set of novel microtubule inhibitors, including three with activity on cancer cells resistant to clinically used anti-microtubule therapies. We next applied BANDIT to ONC201, an active anti-cancer small molecule in clinical development whose target has remained elusive since its discovery in 2009. BANDIT identified dopamine receptor 2 as the unexpected target of ONC201, a prediction that we experimentally validated. Not only does this open the door for clinical trials focused on target-based selection of patient populations, but it also represents a novel way to target GPCRs in cancer. Additionally, BANDIT identified previously undocumented connections between approved drugs with disparate indications, shedding light onto previously unexplained clinical observations and suggesting new uses of marketed drugs. Overall, BANDIT represents an efficient and highly accurate platform that can be used as a resource to accelerate drug discovery and direct the clinical application of small molecule therapeutics with improved precision.


2019 ◽  
Author(s):  
Zoltan Dezso ◽  
Michele Ceccarelli

Abstract. Background: The selection and prioritization of drug targets is a central problem in drug discovery. Computational approaches can leverage the growing number of large-scale human genomics and proteomics data to perform in silico target identification, reducing the cost and the time needed. Results: We developed a machine learning approach that scores proteins to generate a druggability score for novel targets. In our model we incorporated 70 protein features, which included properties derived from the sequence, features characterizing protein functions, and network properties derived from the protein-protein interaction network. The advantage of this approach is that it is unbiased: even less studied proteins with limited information about their function can score well, as most of the features are independent of the accumulated literature. We built models on a training set that consists of targets with approved drugs and a negative set of non-drug targets. The machine learning techniques help to identify the most important combination of features differentiating validated targets from non-targets. We validated our predictions on an independent set of clinical trial drug targets, achieving a high accuracy characterized by an AUC of 0.89. Our most predictive features included biological function of proteins, network centrality measures, protein essentiality, tissue specificity, localization and solvent accessibility. Our predictions, based on a small set of 102 validated oncology targets, recovered the majority of known drug targets and identified a novel set of proteins as drug target candidates. Conclusions: We developed a machine learning approach to prioritize proteins according to their similarity to approved drug targets. We have shown that the proposed method is highly predictive on a validation dataset consisting of 277 targets of clinical trial drugs, confirming that our computational approach is an efficient and cost-effective tool for drug target discovery and prioritization. Our predictions were based on oncology targets and cancer-relevant biological functions, resulting in significantly higher scores for targets of oncology clinical trial drugs compared to the scores of targets of trial drugs for other indications. Our approach can be used to make indication-specific drug-target predictions by combining generic druggability features with indication-specific biological functions.
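The AUC reported above summarizes how well a score separates targets from non-targets. A minimal sketch of the pairwise (Mann-Whitney) definition of AUC is given here, assuming binary labels where 1 marks a known target; the function name and the toy scores are hypothetical and do not reproduce the authors' data or model.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (target, non-target) pairs in which the target
    receives the higher score, counting ties as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy druggability scores (illustrative values only).
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic loop is kept for clarity; a rank-based formulation scales better on large validation sets.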


2019 ◽  
Vol 20 (3) ◽  
pp. 209-216 ◽  
Author(s):  
Yang Hu ◽  
Tianyi Zhao ◽  
Ningyi Zhang ◽  
Ying Zhang ◽  
Liang Cheng

Background: From a therapeutic viewpoint, understanding how drugs bind and regulate the functions of their target proteins to protect against disease is crucial. The identification of drug targets plays a significant role in drug discovery and in studying the mechanisms of diseases. Therefore, the development of methods to identify drug targets has become a popular issue. Methods: We systematically review recent work on identifying drug targets from the viewpoints of data and method. We compiled several databases that collect data more comprehensively and introduced several commonly used databases. We then divided the methods into two categories, biological experiments and machine learning, each of which is subdivided into different subclasses and described in detail. Results: Machine learning algorithms make up the majority of new methods. Generally, an optimal set of features is chosen to predict successful new drug targets with similar properties. The most widely used features include sequence properties, network topological features, structural properties, and subcellular locations. Since various machine learning methods exist, improving their performance requires combining a better subset of features and choosing the appropriate model for the various datasets involved. Conclusion: The application of experimental and computational methods in protein drug target identification has become increasingly popular in recent years. Current biological and computational methods still have many limitations due to unbalanced and incomplete datasets or imperfect feature selection methods.


2021 ◽  
Vol 73 (09) ◽  
pp. 44-45
Author(s):  
Chris Carpenter

This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper SPE 201698, “Finding a Trend Out of Chaos: A Machine-Learning Approach for Well-Spacing Optimization,” by Zheren Ma, Ehsan Davani, SPE, and Xiaodan Ma, SPE, Quantum Reservoir Impact, et al., prepared for the 2020 SPE Annual Technical Conference and Exhibition, originally scheduled to be held in Denver, Colorado, 5–7 October. The paper has not been peer reviewed.

Data-driven decisions powered by machine-learning (ML) methods are increasing in popularity when optimizing field development in unconventional reservoirs. However, because well performance is affected by many factors, the challenge is to uncover trends within all the noise. By leveraging basin-level knowledge captured by big data sculpting, integrating private and public data with the use of uncertainty quantification, a process the authors describe as augmented artificial intelligence (AI) can provide quick, science-based answers for well spacing and fracturing optimization and can assess the full potential of an asset in unconventional reservoirs. A case study in the Midland Basin is detailed in the complete paper.

Introduction

Augmented AI is a process wherein ML and human expertise are coupled to improve solutions. The augmented AI work flow (Fig. 1) starts with data sculpting, which includes information retrieval; data cleaning and standardization; and smart, deep, and systematic data quality control (QC). Feature engineering generates all relevant parameters entering the ML model. More than 50 features have been generated for this work and categorized. The final step is to perform model tuning and ensemble, evaluating model robustness and generating model explanation and uncertainty quantification.

Geology

The complete paper provides a detailed geological background of the Permian Basin and its Wolfcamp unconventional layer, an organic-rich shale formation with tight reservoir properties. To find a solution for the multidimensional well-spacing problem in the Permian Basin, multiple sources and types of data were gathered using publicly available sources. The detailed geological attributes, including structure, petrophysics, geochemistry, basin-level features, and cultural information (such as counties or lease boundaries) have been combined in an integrated database to extract and generate features for the ML algorithm. Most attributes are available either in a limited number of wells, mostly vertical, or through the low number of available cored wells across the basin. Therefore, a significant amount of data imputation has been processed with mapping exercises using geostatistical modeling techniques. The mapping process augmented the ML attribute-generation step because these features were distributed in both vertical and lateral dimensions. All horizontal wells within the area of interest across the Permian Basin have been resampled with the logged and mapped information. The geological features also are reengineered into multiple indices to reduce the number of labeled features to include in the ML process. This feature-reduction process also has helped in ranking and selecting the most-important parameters relevant to the well-spacing problem. Here, a key attribute called the shale-oil index was introduced, which is generated for the ML-driven process and is used in understanding the level of contribution of geological sweet spots to well-spacing optimization. In addition, the initial well, reservoir, or laboratory data, including logs, have been normalized before mapping and modeling to eliminate potential bias. This study has focused on Wolfcamp layers; however, both geological and engineering attribute generation work flows used for this practical ML methodology to find optimization solutions for common problems are highly applicable to other unconventional layers, such as Bone Spring or Spraberry.

