Combining Metabolite-Based Pharmacophores with Bayesian Machine Learning Models for Mycobacterium tuberculosis Drug Discovery

PLoS ONE ◽  
2015 ◽  
Vol 10 (10) ◽  
pp. e0141076 ◽  
Author(s):  
Sean Ekins ◽  
Peter B. Madrid ◽  
Malabika Sarker ◽  
Shao-Gang Li ◽  
Nisha Mittal ◽  
...  
2018 ◽  
Vol 15 (10) ◽  
pp. 4346-4360 ◽  
Author(s):  
Thomas Lane ◽  
Daniel P. Russo ◽  
Kimberley M. Zorn ◽  
Alex M. Clark ◽  
Alexandru Korotcov ◽  
...  

2020 ◽  
Author(s):  
Victor O. Gawriljuk ◽  
Phyo Phyo Kyaw Zin ◽  
Daniel H. Foil ◽  
Jean Bernatchez ◽  
Sungjun Beck ◽  
...  

Abstract: With the ongoing SARS-CoV-2 pandemic, there is an urgent need to discover a treatment for coronavirus disease (COVID-19). Drug repurposing is one of the most rapid strategies for addressing this need, and several groups have already selected numerous compounds for in vitro testing. This testing has led to a growing database of molecules with in vitro activity against the virus. Machine learning models can assist drug discovery by predicting the best compounds based on previously published data. Herein we have implemented several machine learning methods to develop predictive models from recent SARS-CoV-2 in vitro inhibition data and used them to prioritize additional FDA-approved compounds, selected from our in-house compound library, for in vitro testing. Of the compounds predicted with a Bayesian machine learning model, CPI1062 and CPI1155 showed antiviral activity in HeLa-ACE2 cell-based assays and represent potential repurposing opportunities for COVID-19. This approach can be greatly expanded to virtually screen available molecules exhaustively for predicted activity against this virus, and it can serve as a prioritization tool for SARS-CoV-2 antiviral drug discovery programs. The very latest model for SARS-CoV-2 is available at www.assaycentral.org.
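As a rough illustration of how a Bayesian classifier can rank compounds for testing, the sketch below fits a Laplace-smoothed Bernoulli naive Bayes model on binary fingerprints and sorts candidates by their log-odds of being active. It is not the model described in the abstract, whose details are not given here; the fingerprints, activity labels, candidate names and function names are all hypothetical toy values.

```python
import math

def train_bernoulli_nb(fingerprints, labels, n_bits, alpha=1.0):
    """Fit a Laplace-smoothed Bernoulli naive Bayes model.

    fingerprints: list of sets holding the indices of "on" bits (toy encoding).
    labels: list of 0/1 activity flags, one per fingerprint.
    Returns {class: (log_prior, per-bit probability of the bit being on)}.
    """
    model = {}
    for cls in (0, 1):
        rows = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        prior = (len(rows) + alpha) / (len(labels) + 2 * alpha)
        bit_prob = [
            (sum(1 for fp in rows if bit in fp) + alpha) / (len(rows) + 2 * alpha)
            for bit in range(n_bits)
        ]
        model[cls] = (math.log(prior), bit_prob)
    return model

def log_odds_active(model, fp, n_bits):
    """Return log P(active | fp) minus log P(inactive | fp)."""
    scores = {}
    for cls, (log_prior, bit_prob) in model.items():
        total = log_prior
        for bit in range(n_bits):
            p = bit_prob[bit]
            total += math.log(p if bit in fp else 1.0 - p)
        scores[cls] = total
    return scores[1] - scores[0]

# Hypothetical training set: three actives and three inactives over 4 bits.
train_fps = [{0, 1}, {0, 2}, {0, 1, 3}, {2, 3}, {3}, {2}]
train_y = [1, 1, 1, 0, 0, 0]
model = train_bernoulli_nb(train_fps, train_y, n_bits=4)

# Hypothetical untested candidates, ranked from most to least likely active.
candidates = {"cand_A": {0, 1}, "cand_B": {2, 3}, "cand_C": {0, 3}}
ranked = sorted(
    candidates,
    key=lambda name: log_odds_active(model, candidates[name], 4),
    reverse=True,
)
print(ranked)
```

In practice the top of such a ranking would be the short list sent for in vitro testing; the smoothing term `alpha` keeps unseen bits from producing zero probabilities.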


In pharmaceutical research, the traditional drug discovery process is time-consuming and expensive because many compounds must be experimentally tested for their biological activity. A series of laboratory experiments is conducted to analyze a newly synthesized drug’s pharmaceutical activity and its biological effects on humans. For each newly discovered drug, the required clinical properties can be predicted using machine learning models, which greatly reduces the experimental cost. This paper explores parametric and non-parametric machine learning models to classify the administration properties of drugs and their toxicity. The multinomial classification of drugs was based on their physicochemical and ADMET properties. Balanced data samples were drawn from ChEMBL and pre-processed. Features were reduced using Recursive Feature Elimination, and the attributes were ranked by importance to reduce highly correlated attributes. The performance of parametric and non-parametric machine learning models was analyzed on cheminformatic data that include physicochemical, biological and pharmaceutical properties of the drug molecules. Selecting a potent drug candidate along with its administration properties greatly reduces wet-lab experimental time and cost. Multiclass classification can be performed efficiently using a non-parametric machine learning model. Optimal feature engineering, hyperparameter tuning and hybrid algorithms should yield more accurate predictions for cheminformatics data in the future.
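The idea of using an importance ranking to prune highly correlated attributes can be sketched as a greedy filter. This is a simpler stand-in, not Recursive Feature Elimination itself: it walks the features from most to least important and keeps one only if its absolute Pearson correlation with every feature already kept is below a threshold. The descriptor names, values, importance scores and the 0.9 threshold are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

def drop_correlated(columns, importance, threshold=0.9):
    """Keep features in descending importance, skipping near-duplicates."""
    kept = []
    for name in sorted(columns, key=lambda n: importance[n], reverse=True):
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical descriptor table: two size-related columns and one unrelated one.
columns = {
    "mol_weight": [180.0, 250.0, 310.0, 420.0, 500.0],
    "heavy_atoms": [13.0, 18.0, 22.0, 30.0, 36.0],
    "logp": [1.2, 3.4, 0.5, 2.8, 1.9],
}
# Hypothetical importance scores, e.g. taken from a previously fitted model.
importance = {"mol_weight": 0.5, "heavy_atoms": 0.3, "logp": 0.2}

kept = drop_correlated(columns, importance)
print(kept)
```

Here `heavy_atoms` is dropped because it is almost a linear function of `mol_weight`, while `logp` survives.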


mSystems ◽  
2020 ◽  
Vol 5 (1) ◽  
Author(s):  
D. Aytan-Aktug ◽  
P. T. L. C. Clausen ◽  
V. Bortolaia ◽  
F. M. Aarestrup ◽  
O. Lund

ABSTRACT Machine learning has proven to be a powerful method to predict antimicrobial resistance (AMR) without using prior knowledge for selected bacterial species-antimicrobial combinations. To date, only species-specific machine learning models have been developed, and to the best of our knowledge, the inclusion of information from multiple species has not been attempted. The aim of this study was to determine the feasibility of including information from multiple bacterial species to predict AMR for an individual species, since this may make it easier to train and update resistance predictions for multiple species and may lead to improved predictions. Whole-genome sequence data and susceptibility profiles from 3,528 Mycobacterium tuberculosis, 1,694 Escherichia coli, 658 Salmonella enterica, and 1,236 Staphylococcus aureus isolates were included. We developed machine learning models, trained on the features detected by the PointFinder and ResFinder programs, to predict binary (susceptible/resistant) AMR profiles. We tested four feature representation methods to determine the most efficient way of introducing features into the models. When training the model only on the Mycobacterium tuberculosis isolates, high prediction performance was obtained for the six AMR profiles included. Adding information on ciprofloxacin from the additional 3,588 isolates did not reduce performance for the other antimicrobials but increased performance for ciprofloxacin AMR profile prediction for Mycobacterium tuberculosis and Escherichia coli. In conclusion, the species-independent models can predict multi-AMR profiles for multiple species without losing any robustness.
IMPORTANCE Machine learning is a proven method to predict AMR; however, the performance of any machine learning model depends on the quality of the input data. Therefore, we evaluated different methods of representing information about mutations as well as mobilizable genes, so that the information can serve as input for a robust model. We combined data from multiple bacterial species in order to develop species-independent machine learning models that can predict resistance profiles for multiple antimicrobials and species with high performance.
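One simple way to picture a shared input representation across species is a binary presence/absence matrix built over the union of all detected features. The abstract does not specify the four representations that were tested, so this is only an assumed example; the isolate identifiers and feature names below are placeholders, not real PointFinder or ResFinder output.

```python
def build_feature_matrix(isolates):
    """Encode each isolate as a 0/1 vector over the union of detected features.

    isolates: dict mapping isolate id -> set of detected feature names.
    Returns (vocabulary, matrix), where matrix maps isolate id -> list of 0/1.
    """
    vocabulary = sorted(set().union(*isolates.values()))
    matrix = {
        isolate_id: [1 if feature in features else 0 for feature in vocabulary]
        for isolate_id, features in isolates.items()
    }
    return vocabulary, matrix

# Placeholder isolates from three species sharing one feature space.
isolates = {
    "mtb_001": {"point:gene_A_mut1"},
    "eco_001": {"point:gene_A_mut1", "acquired:gene_X"},
    "sau_001": set(),
}

vocabulary, matrix = build_feature_matrix(isolates)
print(vocabulary)
for isolate_id, row in matrix.items():
    print(isolate_id, row)
```

Because every isolate is projected onto the same vocabulary, rows from different species can be stacked into a single training set for one model.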


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 145564-145576 ◽  
Author(s):  
Amirhosein Mosavi ◽  
Farzaneh Sajedi Hosseini ◽  
Bahram Choubin ◽  
Massoud Goodarzi ◽  
Adrienn A. Dineva

2020 ◽  
Author(s):  
Guo Liang Gan ◽  
Matthew Nguyen ◽  
Elijah Willie ◽  
Brian Lee ◽  
Cedric Chauve ◽  
...  

Abstract: The efficacy of antibiotic drug treatments in tuberculosis (TB) is significantly threatened by the development of drug resistance. There is a need for a robust diagnostic system that can accurately predict drug resistance in patients. In recent years, researchers have been taking advantage of whole-genome sequencing (WGS) data to infer antibiotic resistance. In this work we investigate the power of machine learning tools in inferring drug resistance from WGS data on three distinct datasets differing in their geographical diversity. We analyzed data from the Relational Sequencing TB Data Platform, which comprises global isolates from 32 different countries, the PATRIC database, containing isolates contributed by researchers around the world, and isolates collected by the British Columbia Centre for Disease Control in Canada. We predicted drug resistance to the first-line drugs: isoniazid, rifampicin, ethambutol, pyrazinamide, and streptomycin. We focused on the genes which previous evidence suggests are involved in drug resistance in TB. We called single-nucleotide polymorphisms using the Snippy pipeline, then applied different machine learning models. Following best practices, we chose the best parameters for each model via cross-validation on the training set and evaluated the performance via the sensitivity-specificity tradeoffs on the testing set. To the best of our knowledge, our study is the first to predict antibiotic resistance in TB across multiple datasets. We obtained a performance comparable to that seen in previous studies, but observed that performance may be negatively affected when training on one dataset and testing on another, suggesting the importance of geographical heterogeneity in drug resistance predictions. In addition, we investigated the importance of each gene within each model, and recapitulated some previously known biology of drug resistance. This study paves the way for further investigations, with the ultimate goal of creating an accurate, interpretable and globally generalizable model for predicting drug resistance in TB.
Author summary: Drug resistance in pathogenic bacteria such as Mycobacterium tuberculosis can be predicted by an application of machine learning models to next-generation sequencing data. The received wisdom is that following standard protocols for training commonly used machine learning models should produce accurate drug resistance predictions. In this paper, we propose an important caveat to this idea. Specifically, we show that considering geographical diversity is critical for making accurate predictions, and that different geographic regions may have disparate drug resistance mechanisms that are predominant. By comparing the results within and across a regional dataset and two international datasets, we show that model performance may vary dramatically between settings. In addition, we propose a new method for extracting the most important variants responsible for predicting resistance to each first-line drug, and show that it is able to recapitulate a large amount of what is known about the biology of drug resistance in Mycobacterium tuberculosis.
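The evaluation metric mentioned above, sensitivity versus specificity on a held-out test set, can be computed as in the following sketch. The labels and predictions are invented toy values (1 = resistant, 0 = susceptible) chosen only to show the calculation for a within-dataset and a cross-dataset test; they are not results from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = resistant."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Hypothetical test labels and two hypothetical sets of predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
within_pred = [1, 1, 0, 0, 0, 0, 1, 1]  # model trained on the same dataset
cross_pred = [1, 0, 0, 0, 0, 1, 1, 0]   # model trained on a different dataset

within = sensitivity_specificity(y_true, within_pred)
cross = sensitivity_specificity(y_true, cross_pred)
print("within-dataset:", within)
print("cross-dataset:", cross)
```

Reporting both numbers per drug and per train/test pairing makes it easy to see where a model trained in one setting fails to transfer to another.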

