On the use of machine learning methods for modern drug discovery

2011 ◽  
Vol 24 (1) ◽  
pp. 99-100
Author(s):  
Axel J. Soto

2020 ◽  
Author(s):  
Edwin Tse ◽  
Laksh Aithani ◽  
Mark Anderson ◽  
Jonathan Cardoso-Silva ◽  
Giovanni Cincilla ◽  
...  

The discovery of new antimalarial medicines with novel mechanisms of action is key to combating increasing resistance to our frontline treatments. The Open Source Malaria (OSM) consortium has been developing compounds ("Series 4") that have potent activity against Plasmodium falciparum in vitro and in vivo and that have been suggested to act through inhibition of PfATP4, an essential membrane ion pump that regulates the parasite’s intracellular Na+ concentration. The structure of PfATP4 has yet to be determined. In the absence of structural information about this target, a public competition was created to develop a model that would predict anti-PfATP4 activity among Series 4 compounds, thereby reducing the project costs associated with unnecessary synthesis of inactive compounds.

In the first round, in 2016, six participants used the open data collated by OSM to develop moderately predictive models using diverse methods. Notably, all submitted models were available to all other participants in real time. Since then, further bioactivity data have been acquired and machine learning methods have developed rapidly, so a second round of the competition was undertaken in 2019, again with freely donated models that other participants could see. The best-performing models from this second round were used to predict novel inhibitory molecules, several of which were synthesised and evaluated against the parasite. One such compound, containing a motif that human chemists familiar with this series would have dismissed as ill-advised, was active. The project demonstrated the ability of new machine learning methods to predict active compounds when no biological target structure is available, which is frequently the central problem in phenotypic drug discovery.
Since all data and participant interactions remain in the public domain, this research project “lives” and may be improved by others.


2020 ◽  
Author(s):  
Thomas R. Lane ◽  
Daniel H. Foil ◽  
Eni Minerali ◽  
Fabio Urbina ◽  
Kimberley M. Zorn ◽  
...  

Machine learning methods are attracting considerable attention from the pharmaceutical industry for use in drug discovery and beyond. In recent studies we have applied multiple machine learning algorithms and modeling metrics, and in some cases compared molecular descriptors, to build models for individual targets or properties on a relatively small scale. Several research groups have used large numbers of datasets from public databases such as ChEMBL to evaluate machine learning methods of interest to them; the largest of these studies used on the order of 1400 datasets. We have now extracted well over 5000 datasets from ChEMBL for use with the ECFP6 fingerprint and compared our proprietary software Assay Central™ with random forest, k-nearest neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and deep neural networks (3 levels). Model performance was assessed using an array of five-fold cross-validation metrics, including area under the curve (AUC), F1 score, Cohen’s kappa and Matthews correlation coefficient. Based on ranked normalized scores for the metrics or datasets, all methods appeared comparable, while the distance from the top indicated that Assay Central™ and support vector classification were comparable. Unlike prior studies, which have placed considerable emphasis on deep neural networks (deep learning), we saw no advantage for deep learning in this case, where minimal tuning was performed for any of the methods. If anything, Assay Central™ may have been at a slight advantage, as the activity cutoff for each of the over 5000 datasets (representing over 570,000 unique compounds) was based on Assay Central™ performance, but support vector classification appears to be a strong competitor. We also apply Assay Central™ to prospective predictions for PXR and hERG to further validate these models.
This work currently appears to be the largest comparison of machine learning algorithms to date. Future studies will likely evaluate additional databases, descriptors and algorithms, and further refine methods for evaluating and comparing models.
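The evaluation above rests on five-fold cross-validation scored with AUC, F1, Cohen's kappa and the Matthews correlation coefficient. As a minimal sketch of what those metrics compute, the plain-Python functions below work on binary labels, assuming 1 = active and 0 = inactive. The function names are hypothetical. This is not the Assay Central™ implementation or the authors' pipeline, and the unstratified shuffled split is an assumption, since the abstract does not say how folds were formed.

```python
import math
import random


def k_fold_splits(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices are shuffled once with a fixed seed and dealt into k folds.
    Stratification by activity class is NOT applied (an assumption).
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, 1 = active, 0 = inactive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1(tp, fp, fn, tn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohens_kappa(tp, fp, fn, tn):
    """Observed agreement corrected for the agreement expected by chance."""
    n = tp + fp + fn + tn
    p_obs = (tp + tn) / n
    # Chance agreement from the predicted/actual marginals of the confusion matrix.
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp) if p_exp != 1 else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as P(score of a random active > score of a random inactive); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For example, with y_true = [1, 1, 1, 0, 0, 0, 0, 1] and y_pred = [1, 0, 1, 0, 1, 0, 0, 1], the counts are TP = 3, FP = 1, FN = 1, TN = 3. That gives F1 = 0.75 and kappa = MCC = 0.5. Kappa and MCC both discount agreement that a trivial classifier could reach, which matters for the imbalanced active/inactive splits typical of ChEMBL-derived datasets.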


Author(s):  
Duc Anh Nguyen ◽  
Canh Hao Nguyen ◽  
Hiroshi Mamitsuka

Abstract. Motivation: Studies of adverse drug reactions (ADRs), or drug side effects, play a crucial role in drug discovery. With the recent rapid increase of both clinical and non-clinical data, machine learning methods have emerged as prominent tools for analyzing and predicting ADRs. Nonetheless, challenges remain in ADR studies. Results: In this paper, we summarize ADR data sources and review ADR studies in three tasks: drug–ADR benchmark data creation, drug–ADR prediction and ADR mechanism analysis. We focus on the machine learning methods used in each task and then compare the performance of these methods on the drug–ADR prediction task. Finally, we discuss open problems for further ADR studies. Availability: Data and code are available at https://github.com/anhnda/ADRPModels.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Khushnood Abbas ◽  
Alireza Abbasi ◽  
Shi Dong ◽  
Ling Niu ◽  
Laihang Yu ◽  
...  

Abstract. Background: Technological and research advances have produced large volumes of biomedical data. When represented as a network (graph), these data become useful for modeling entities and interactions in biological and similar complex systems. In network biology and network medicine, there is particular interest in predicting results from drug–drug, drug–disease, and protein–protein interactions to speed up drug discovery. Existing data and modern computational methods make it possible to identify potentially beneficial and harmful interactions and, therefore, to narrow drug trials ahead of actual clinical trials. Such automated, data-driven investigation relies on machine learning techniques. However, traditional machine learning approaches require extensive preprocessing of the data, which makes them impractical for large datasets. This study presents a wide range of machine learning methods for predicting outcomes from biomedical interactions and compares the performance of traditional methods with that of more recent network-based approaches. Results: We applied 32 different network-based machine learning models to five commonly available biomedical datasets and evaluated their performance using three evaluation metrics: AUROC, AUPR, and F1-score. To do so, we cast link prediction as a binary classification problem, treating existing links as positive examples and randomly sampling negative examples from the set of non-existent links. In our experimental evaluation, ProNE, ACT and $LRW_5$ were the three best performers on all five datasets. Conclusions: This work presents a comparative evaluation of well-known network-based machine learning algorithms for predicting network links, with applications in the prediction of drug–target and drug–drug interactions.
Our work can help guide researchers in selecting appropriate machine learning methods for pharmaceutical tasks.
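The study casts link prediction as binary classification, with existing links as positives and randomly sampled non-links as negatives. The plain-Python sketch below illustrates that set-up under stated assumptions. A simple common-neighbors count stands in as a hypothetical scorer for the 32 network-based models; it is not ProNE, ACT or $LRW_5$. The held-out split, the 20% test fraction and all function names are illustrative choices, not the authors' protocol. Only AUROC is computed here.

```python
import random
from itertools import combinations


def sample_negatives(nodes, edges, n_samples, rng):
    """Randomly draw node pairs that are NOT linked anywhere in the full graph.

    Enumerating all pairs is O(|V|^2): fine for a sketch, not for large graphs.
    Raises ValueError if fewer than n_samples non-links exist.
    """
    existing = {frozenset(e) for e in edges}
    candidates = [p for p in combinations(sorted(nodes), 2) if frozenset(p) not in existing]
    return rng.sample(candidates, n_samples)


def common_neighbors(adj, u, v):
    """Hypothetical stand-in scorer: number of neighbors u and v share."""
    return len(adj.get(u, set()) & adj.get(v, set()))


def auroc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos_scores for q in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def evaluate_link_prediction(edges, test_fraction=0.2, seed=0):
    """Hold out some links, score them against sampled non-links, return AUROC."""
    rng = random.Random(seed)
    nodes = {n for e in edges for n in e}
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test_pos, train = shuffled[:n_test], shuffled[n_test:]

    # Adjacency is built from training links only, so held-out links stay hidden.
    adj = {}
    for u, v in train:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Positives: held-out existing links. Negatives: equal number of random non-links.
    test_neg = sample_negatives(nodes, edges, len(test_pos), rng)
    pos = [common_neighbors(adj, u, v) for u, v in test_pos]
    neg = [common_neighbors(adj, u, v) for u, v in test_neg]
    return auroc(pos, neg)
```

Negatives are sampled from the non-links of the full graph rather than the training graph, so a held-out true link is never mislabeled as a negative. Swapping `common_neighbors` for an embedding-based scorer would leave the rest of the evaluation loop unchanged.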


2021 ◽  
pp. 245-279
Author(s):  
Olga A. Tarasova ◽  
Anastasia V. Rudik ◽  
Sergey M. Ivanov ◽  
Alexey A. Lagunin ◽  
Vladimir V. Poroikov ◽  
...  
