Machine Learning Tools For off-Target Early Safety Assessment of Small Molecules In Drug Discovery (Single Task Neural Networks Vs Automated Machine Learning)

Author(s):  
Doha Naga ◽  
Wolfgang Muster ◽  
Eunice Musvasva ◽  
Gerhard F. Ecker

Abstract Unpredicted drug safety issues constitute the majority of failures in the pharmaceutical industry, according to several studies [1-3]. Some of these preclinical safety issues can be attributed to the non-selective binding of compounds to targets other than their intended therapeutic target, causing undesired adverse events. Consequently, pharmaceutical companies, including Roche, routinely run in vitro safety screens to detect off-target activities prior to preclinical and clinical studies. Here we present a machine learning framework that predicts the activities of ~4000 compounds against our in-house panel of 50 off-targets [4], directly from their structure. This framework is intended to guide chemists in the drug design process prior to synthesis and to accelerate drug discovery. It incorporates different ML approaches such as deep learning and automated machine learning. Outcomes from the different methods are compared in terms of efficiency and efficacy. The most important challenges and factors affecting model construction and performance, along with suggestions on how to overcome them, are also discussed.

Author(s):  
Ke Wang ◽  
Qingwen Xue ◽  
Jian John Lu

Identifying high-risk drivers before an accident happens is necessary for traffic accident control and prevention. Due to the class-imbalanced nature of driving data, high-risk samples, as the minority class, are usually handled poorly by standard classification algorithms. Instead of applying preset sampling or cost-sensitive learning, this paper proposes a novel automated machine learning framework that simultaneously and automatically searches for the optimal sampling, cost-sensitive loss function, and probability calibration to handle the class-imbalance problem in recognizing risky drivers. The hyperparameters that control sampling ratio and class weight, along with other hyperparameters, are optimized by Bayesian optimization. To demonstrate the performance of the proposed automated learning framework, we establish a risky driver recognition model as a case study, using video-extracted vehicle trajectory data of 2427 private cars on a German highway. Based on rear-end collision risk evaluation, only 4.29% of all drivers are labeled as risky drivers. The inputs of the recognition model are the discrete Fourier transform coefficients of the target vehicle’s longitudinal speed, lateral speed, and the gap between the target vehicle and its preceding vehicle. Among 12 sampling methods, 2 cost-sensitive loss functions, and 2 probability calibration methods, the result of automated machine learning is consistent with manual searching but much more computationally efficient. We find that the combination of Support Vector Machine-based Synthetic Minority Oversampling TEchnique (SVMSMOTE) sampling, a cost-sensitive cross-entropy loss function, and isotonic regression can significantly improve the recognition ability and reduce the error of the predicted probability.
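As a rough illustration of the input representation described in this abstract, the sketch below turns three trajectory series into a fixed-length feature vector of discrete Fourier transform coefficients. It is not the authors' code: the function names, the choice of four coefficients per series, the use of coefficient magnitudes, and the sample values are all assumptions made for the example.

```python
import cmath
import math


def dft_coefficients(signal, n_coeffs):
    """Return the first n_coeffs DFT coefficients of a real-valued sequence.

    Coefficients are divided by the sequence length, so coefficient 0
    equals the mean of the signal.
    """
    n = len(signal)
    coeffs = []
    for k in range(n_coeffs):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        coeffs.append(total / n)
    return coeffs


def trajectory_features(longitudinal_speed, lateral_speed, gap, n_coeffs=4):
    """Concatenate DFT magnitudes of the three series (hypothetical layout)."""
    features = []
    for series in (longitudinal_speed, lateral_speed, gap):
        features.extend(abs(c) for c in dft_coefficients(series, n_coeffs))
    return features


# Made-up sample trajectory with 8 time steps (illustrative values only).
speed = [30.0] * 8                                      # constant speed
lateral = [0.0, 0.1, 0.0, -0.1, 0.0, 0.1, 0.0, -0.1]    # periodic weaving
gap = [25.0, 24.0, 23.0, 22.0, 21.0, 20.0, 19.0, 18.0]  # closing gap

features = trajectory_features(speed, lateral, gap)
```

With this layout, a driver who holds a steady speed contributes energy only to the zero-frequency coefficient, while periodic lateral weaving shows up in a higher-frequency coefficient, which is presumably the kind of behavioural signal a classifier could pick up.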


2018 ◽  
Author(s):  
Soumya Banerjee

We outline an automated computational and machine learning framework that predicts disease severity and stratifies patients. We apply our framework to available clinical data. Our algorithm automatically generates insights and predicts disease severity with minimal operator intervention. The computational framework presented here can be used to stratify patients, predict disease severity and propose novel biomarkers for disease. Insights from machine learning algorithms coupled with clinical data may help guide therapy, personalize treatment and help clinicians understand the change in disease over time. Computational techniques like these can be used in translational medicine in close collaboration with clinicians and healthcare providers. Our models are also interpretable, allowing clinicians with minimal machine learning experience to engage in model building. This work is a step towards automated machine learning in the clinic.


2021 ◽  
Author(s):  
Sian Xiao ◽  
Hao Tian ◽  
Peng Tao

Allostery is a fundamental process in regulating proteins’ activity. The discovery, design and development of allosteric drugs demand better identification of allosteric sites. Several computational methods have been developed previously to predict allosteric sites using static pocket features and protein dynamics. Here, we present a computational model using automated machine learning for allosteric site prediction. Our model, PASSer2.0, improves on previous results and performs well across multiple indicators, with 89.2% of allosteric pockets appearing among the top 3 positions. The trained machine learning model has been integrated with the Protein Allosteric Sites Server (https://passer.smu.edu) to facilitate allosteric drug discovery.
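A top-3 figure of this kind can be read as a ranking metric: for each protein, pockets are sorted by predicted score, and the protein counts as a hit if a known allosteric pocket falls within the first three. The sketch below shows one way to compute such a rate. The abstract does not spell out the exact definition used for PASSer2.0, so the function name, the data layout, and the scores here are hypothetical.

```python
def top_k_hit_rate(proteins, k=3):
    """Fraction of proteins whose known allosteric pocket ranks in the top k.

    `proteins` maps a protein ID to a list of (score, is_allosteric) pairs,
    one pair per detected pocket (assumed layout, for illustration only).
    """
    hits = 0
    for pockets in proteins.values():
        # Rank pockets from highest to lowest predicted score.
        ranked = sorted(pockets, key=lambda pocket: pocket[0], reverse=True)
        if any(is_allosteric for _, is_allosteric in ranked[:k]):
            hits += 1
    return hits / len(proteins)


# Made-up predictions for four proteins.
predictions = {
    "protein_a": [(0.91, True), (0.40, False), (0.12, False)],
    "protein_b": [(0.80, False), (0.75, False), (0.60, False), (0.55, True)],
    "protein_c": [(0.30, False), (0.65, True)],
    "protein_d": [(0.20, False), (0.90, False), (0.50, True), (0.10, False)],
}

top3 = top_k_hit_rate(predictions, k=3)
top1 = top_k_hit_rate(predictions, k=1)
```

Reporting the rate at several values of k shows how much of the performance depends on the pocket being ranked first versus merely near the top.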


2020 ◽  
Author(s):  
Edwin Tse ◽  
Laksh Aithani ◽  
Mark Anderson ◽  
Jonathan Cardoso-Silva ◽  
Giovanni Cincilla ◽  
...  

The discovery of new antimalarial medicines with novel mechanisms of action is key to combating the problem of increasing resistance to our frontline treatments. The Open Source Malaria (OSM) consortium has been developing compounds ("Series 4") that have potent activity against Plasmodium falciparum in vitro and in vivo and that have been suggested to act through the inhibition of PfATP4, an essential membrane ion pump that regulates the parasite’s intracellular Na⁺ concentration. The structure of PfATP4 is yet to be determined. In the absence of structural information about this target, a public competition was created to develop a model that would allow the prediction of anti-PfATP4 activity among Series 4 compounds, thereby reducing project costs associated with the unnecessary synthesis of inactive compounds. In the first round, in 2016, six participants used the open data collated by OSM to develop moderately predictive models using diverse methods. Notably, all submitted models were available to all other participants in real time. Since then, further bioactivity data have been acquired and machine learning methods have rapidly developed, so a second round of the competition was undertaken in 2019, again with freely donated models that other participants could see. The best-performing models from this second round were used to predict novel inhibitory molecules, of which several were synthesised and evaluated against the parasite. One such compound, containing a motif that the human chemists familiar with this series would have dismissed as ill-advised, was active. The project demonstrated the abilities of new machine learning methods in the prediction of active compounds where there is no biological target structure, frequently the central problem in phenotypic drug discovery. Since all data and participant interactions remain in the public domain, this research project “lives” and may be improved by others.


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 168053-168060 ◽  
Author(s):  
Pouya Soltani Zarrin ◽  
Niels Roeckendorf ◽  
Christian Wenger

2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Pikee Priya ◽  
N. R. Aluru

Abstract We use machine learning tools for the design and discovery of ABO3-type perovskite oxides for various energy applications, using over 7000 data points from the literature. We demonstrate a robust learning framework for efficient and accurate prediction of total conductivity of perovskites and their classification based on the type of charge carrier at different conditions of temperature and environment. After evaluating a set of >100 features, we identify average ionic radius, minimum electronegativity, minimum atomic mass, minimum formation energy of oxides for all B-site, and B-site dopant ions of the perovskite as the crucial and relevant predictors for determining conductivity and the type of charge carriers. The models are validated by predicting the conductivity of compounds absent in the training set. We screen 1793 undoped and 95,832 A-site and B-site doped perovskites to report the perovskites with high conductivities, which can be used for different energy applications, depending on the type of the charge carriers.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Tiago J. S. Lopes ◽  
Ricardo Rios ◽  
Tatiane Nogueira ◽  
Rodrigo F. Mello

Abstract Hemophilia A is a relatively rare hereditary coagulation disorder caused by a defective F8 gene resulting in a dysfunctional Factor VIII protein (FVIII). This condition impairs the coagulation cascade, and if left untreated, it causes permanent joint damage and poses a risk of fatal intracranial hemorrhage in case of traumatic events. To develop prophylactic therapies with longer half-lives and that do not trigger the development of inhibitory antibodies, it is essential to have a deep understanding of the structure of the FVIII protein. In this study, we explored alternative ways of representing the FVIII protein structure and designed a machine-learning framework to improve the understanding of the relationship between the protein structure and the disease severity. We verified a close agreement between in silico, in vitro and clinical data. Finally, we predicted the severity of all possible mutations in the FVIII structure – including those not yet reported in the medical literature. We identified several hotspots in the FVIII structure where mutations are likely to induce detrimental effects to its activity. The combination of protein structure analysis and machine learning is a powerful approach to predict and understand the effects of mutations on the disease outcome.

