On the use of machine learning methods for modern drug discovery

The discovery of new antimalarial medicines with novel mechanisms of action is key to combating the problem of increasing resistance to our frontline treatments. The Open Source Malaria (OSM) consortium has been developing compounds ("Series 4") that have potent activity against Plasmodium falciparum in vitro and in vivo and that have been suggested to act through the inhibition of PfATP4, an essential membrane ion pump that regulates the parasite’s intracellular Na+ concentration. The structure of PfATP4 is yet to be determined. In the absence of structural information about this target, a public competition was created to develop a model that would allow the prediction of anti-PfATP4 activity among Series 4 compounds, thereby reducing project costs associated with the unnecessary synthesis of inactive compounds.In the first round, in 2016, six participants used the open data collated by OSM to develop moderately predictive models using diverse methods. Notably, all submitted models were available to all other participants in real time. Since then further bioactivity data have been acquired and machine learning methods have rapidly developed, so a second round of the competition was undertaken, in 2019, again with freely-donated models that other participants could see. The best-performing models from this second round were used to predict novel inhibitory molecules, of which several were synthesised and evaluated against the parasite. One such compound, containing a motif that the human chemists familiar with this series would have dismissed as ill-advised, was active. The project demonstrated the abilities of new machine learning methods in the prediction of active compounds where there is no biological target structure, frequently the central problem in phenotypic drug discovery. Since all data and participant interactions remain in the public domain, this research project “lives” and may be improved by others.

Download Full-text

Machine Learning Methods in Chemoinformatics for Drug Discovery

Practical Chemoinformatics ◽

10.1007/978-81-322-1780-0_3 ◽

2014 ◽

pp. 133-194 ◽

Cited By ~ 6

Author(s):

Muthukumarasamy Karthikeyan ◽

Renu Vyas

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Learning Methods ◽

Machine Learning Methods

Download Full-text

A Very Large-Scale Bioactivity Comparison of Deep Learning and Multiple Machine Learning Algorithms for Drug Discovery

10.26434/chemrxiv.12781241 ◽

2020 ◽

Author(s):

Thomas R. Lane ◽

Daniel H. Foil ◽

Eni Minerali ◽

Fabio Urbina ◽

Kimberley M. Zorn ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Drug Discovery ◽

Deep Neural Networks ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

Machine learning methods are attracting considerable attention from the pharmaceutical industry for use in drug discovery and applications beyond. In recent studies we have applied multiple machine learning algorithms, modeling metrics and in some cases compared molecular descriptors to build models for individual targets or properties on a relatively small scale. Several research groups have used large numbers of datasets from public databases such as ChEMBL in order to evaluate machine learning methods of interest to them. The largest of these types of studies used on the order of 1400 datasets. We have now extracted well over 5000 datasets from CHEMBL for use with the ECFP6 fingerprint and comparison of our proprietary software Assay CentralTM with random forest, k-Nearest Neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and deep neural networks (3 levels). Model performance <a>was</a> assessed using an array of five-fold cross-validation metrics including area-under-the-curve, F1 score, Cohen’s kappa and Matthews correlation coefficient. <a>Based on ranked normalized scores for the metrics or datasets all methods appeared comparable while the distance from the top indicated Assay CentralTM and support vector classification were comparable. </a>Unlike prior studies which have placed considerable emphasis on deep neural networks (deep learning), no advantage was seen in this case where minimal tuning was performed of any of the methods. If anything, Assay CentralTM may have been at a slight advantage as the activity cutoff for each of the over 5000 datasets representing over 570,000 unique compounds was based on Assay CentralTMperformance, but support vector classification seems to be a strong competitor. We also apply Assay CentralTM to prospective predictions for PXR and hERG to further validate these models. This work currently appears to be the largest comparison of machine learning algorithms to date. Future studies will likely evaluate additional databases, descriptors and algorithms, as well as further refining methods for evaluating and comparing models.

Download Full-text

A survey on adverse drug reaction studies: data, tasks and machine learning methods

Briefings in Bioinformatics ◽

10.1093/bib/bbz140 ◽

2019 ◽

Cited By ~ 1

Author(s):

Duc Anh Nguyen ◽

Canh Hao Nguyen ◽

Hiroshi Mamitsuka

Keyword(s):

Machine Learning ◽

Adverse Drug Reaction ◽

Drug Reaction ◽

Drug Discovery ◽

Data Sources ◽

Mechanism Analysis ◽

Learning Methods ◽

Open Problems ◽

Machine Learning Methods ◽

Benchmark Data

Abstract Motivation Adverse drug reaction (ADR) or drug side effect studies play a crucial role in drug discovery. Recently, with the rapid increase of both clinical and non-clinical data, machine learning methods have emerged as prominent tools to support analyzing and predicting ADRs. Nonetheless, there are still remaining challenges in ADR studies. Results In this paper, we summarized ADR data sources and review ADR studies in three tasks: drug-ADR benchmark data creation, drug–ADR prediction and ADR mechanism analysis. We focused on machine learning methods used in each task and then compare performances of the methods on the drug–ADR prediction task. Finally, we discussed open problems for further ADR studies. Availability Data and code are available at https://github.com/anhnda/ADRPModels.

Download Full-text

Comparison of Deep Learning With Multiple Machine Learning Methods and Metrics Using Diverse Drug Discovery Data Sets

Molecular Pharmaceutics ◽

10.1021/acs.molpharmaceut.7b00578 ◽

2017 ◽

Vol 14 (12) ◽

pp. 4462-4475 ◽

Cited By ~ 92

Author(s):

Alexandru Korotcov ◽

Valery Tkachenko ◽

Daniel P. Russo ◽

Sean Ekins

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Drug Discovery ◽

Data Sets ◽

Learning Methods ◽

Machine Learning Methods

Download Full-text

Application of network link prediction in drug discovery

BMC Bioinformatics ◽

10.1186/s12859-021-04082-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Khushnood Abbas ◽

Alireza Abbasi ◽

Shi Dong ◽

Ling Niu ◽

Laihang Yu ◽

...

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Link Prediction ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Biomedical Data ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Wide Range

Abstract Background Technological and research advances have produced large volumes of biomedical data. When represented as a network (graph), these data become useful for modeling entities and interactions in biological and similar complex systems. In the field of network biology and network medicine, there is a particular interest in predicting results from drug–drug, drug–disease, and protein–protein interactions to advance the speed of drug discovery. Existing data and modern computational methods allow to identify potentially beneficial and harmful interactions, and therefore, narrow drug trials ahead of actual clinical trials. Such automated data-driven investigation relies on machine learning techniques. However, traditional machine learning approaches require extensive preprocessing of the data that makes them impractical for large datasets. This study presents wide range of machine learning methods for predicting outcomes from biomedical interactions and evaluates the performance of the traditional methods with more recent network-based approaches. Results We applied a wide range of 32 different network-based machine learning models to five commonly available biomedical datasets, and evaluated their performance based on three important evaluations metrics namely AUROC, AUPR, and F1-score. We achieved this by converting link prediction problem as binary classification problem. In order to achieve this we have considered the existing links as positive example and randomly sampled negative examples from non-existant set. After experimental evaluation we found that Prone, ACT and $$LRW_5$$ L R W 5 are the top 3 best performers on all five datasets. Conclusions This work presents a comparative evaluation of network-based machine learning algorithms for predicting network links, with applications in the prediction of drug-target and drug–drug interactions, and applied well known network-based machine learning methods. Our work is helpful in guiding researchers in the appropriate selection of machine learning methods for pharmaceutical tasks.

Download Full-text

Machine Learning Methods in Antiviral Drug Discovery

10.1007/7355_2021_121 ◽

2021 ◽

pp. 245-279

Author(s):

Olga A. Tarasova ◽

Anastasia V. Rudik ◽

Sergey M. Ivanov ◽

Alexey A. Lagunin ◽

Vladimir V. Poroikov ◽

...

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Antiviral Drug ◽

Learning Methods ◽

Machine Learning Methods

Download Full-text

A Very Large-Scale Bioactivity Comparison of Deep Learning and Multiple Machine Learning Algorithms for Drug Discovery

10.26434/chemrxiv.12781241.v1 ◽

2020 ◽

Author(s):

Thomas R. Lane ◽

Daniel H. Foil ◽

Eni Minerali ◽

Fabio Urbina ◽

Kimberley M. Zorn ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Drug Discovery ◽

Deep Neural Networks ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

Machine learning methods are attracting considerable attention from the pharmaceutical industry for use in drug discovery and applications beyond. In recent studies we have applied multiple machine learning algorithms, modeling metrics and in some cases compared molecular descriptors to build models for individual targets or properties on a relatively small scale. Several research groups have used large numbers of datasets from public databases such as ChEMBL in order to evaluate machine learning methods of interest to them. The largest of these types of studies used on the order of 1400 datasets. We have now extracted well over 5000 datasets from CHEMBL for use with the ECFP6 fingerprint and comparison of our proprietary software Assay CentralTM with random forest, k-Nearest Neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and deep neural networks (3 levels). Model performance <a>was</a> assessed using an array of five-fold cross-validation metrics including area-under-the-curve, F1 score, Cohen’s kappa and Matthews correlation coefficient. <a>Based on ranked normalized scores for the metrics or datasets all methods appeared comparable while the distance from the top indicated Assay CentralTM and support vector classification were comparable. </a>Unlike prior studies which have placed considerable emphasis on deep neural networks (deep learning), no advantage was seen in this case where minimal tuning was performed of any of the methods. If anything, Assay CentralTM may have been at a slight advantage as the activity cutoff for each of the over 5000 datasets representing over 570,000 unique compounds was based on Assay CentralTMperformance, but support vector classification seems to be a strong competitor. We also apply Assay CentralTM to prospective predictions for PXR and hERG to further validate these models. This work currently appears to be the largest comparison of machine learning algorithms to date. Future studies will likely evaluate additional databases, descriptors and algorithms, as well as further refining methods for evaluating and comparing models.

Download Full-text

An Open Drug Discovery Competition: Experimental Validation of Predictive Models in a Series of Novel Antimalarials

10.26434/chemrxiv.13194755.v1 ◽

2020 ◽

Author(s):

Edwin Tse ◽

Laksh Aithani ◽

Mark Anderson ◽

Jonathan Cardoso-Silva ◽

Giovanni Cincilla ◽

...

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Predictive Models ◽

Structural Information ◽

Open Data ◽

Ion Pump ◽

Learning Methods ◽

Machine Learning Methods

The discovery of new antimalarial medicines with novel mechanisms of action is key to combating the problem of increasing resistance to our frontline treatments. The Open Source Malaria (OSM) consortium has been developing compounds ("Series 4") that have potent activity against Plasmodium falciparum in vitro and in vivo and that have been suggested to act through the inhibition of PfATP4, an essential membrane ion pump that regulates the parasite’s intracellular Na+ concentration. The structure of PfATP4 is yet to be determined. In the absence of structural information about this target, a public competition was created to develop a model that would allow the prediction of anti-PfATP4 activity among Series 4 compounds, thereby reducing project costs associated with the unnecessary synthesis of inactive compounds.In the first round, in 2016, six participants used the open data collated by OSM to develop moderately predictive models using diverse methods. Notably, all submitted models were available to all other participants in real time. Since then further bioactivity data have been acquired and machine learning methods have rapidly developed, so a second round of the competition was undertaken, in 2019, again with freely-donated models that other participants could see. The best-performing models from this second round were used to predict novel inhibitory molecules, of which several were synthesised and evaluated against the parasite. One such compound, containing a motif that the human chemists familiar with this series would have dismissed as ill-advised, was active. The project demonstrated the abilities of new machine learning methods in the prediction of active compounds where there is no biological target structure, frequently the central problem in phenotypic drug discovery. Since all data and participant interactions remain in the public domain, this research project “lives” and may be improved by others.

Download Full-text

Advanced Interpretable Machine Learning Methods for Clinical NGS Big Data of Complex Hereditary Diseases

10.3389/978-2-88966-274-6 ◽

2020 ◽

Keyword(s):

Machine Learning ◽

Big Data ◽

Hereditary Diseases ◽

Learning Methods ◽

Machine Learning Methods ◽

Interpretable Machine Learning

Download Full-text