Predicting biochemical and physiological effects of natural products from molecular structures using machine learning

2020 ◽

Vol 9 (3) ◽

pp. 2021-2027

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Biological Activities ◽

Biological Effects ◽

Recursive Feature Elimination ◽

Drug Candidate ◽

Learning Models ◽

Machine Learning Models ◽

Non Parametric

In pharmaceutical research, the traditional drug discovery process is time-consuming and expensive because several compounds must be experimentally tested for their biological activities. A series of lab experiments is conducted to analyze a newly synthesized drug’s pharmaceutical activities and its biological effects on humans. For each newly discovered drug, the required clinical properties can instead be predicted with machine learning models, which greatly reduces the experimental cost. This paper explores parametric and non-parametric machine learning models for classifying the administration properties of drugs and their toxicity. The multinomial classification of drugs was based on their physicochemical and ADMET properties. Balanced data samples were drawn from ChEMBL and pre-processed. Features were reduced using Recursive Feature Elimination, and the attributes were ranked by importance to remove highly correlated attributes. The performance of parametric and non-parametric machine learning models was analyzed on cheminformatics data that includes physicochemical, biological and pharmaceutical properties of the drug molecules. Selecting a potent drug candidate along with its administration properties greatly reduces wet-lab experimental time and cost. Multiclass classification can be performed efficiently using a non-parametric machine learning model. Optimal feature engineering, hyperparameter tuning and hybrid algorithms would result in more accurate predictions for cheminformatics data in the future.
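The Recursive Feature Elimination step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which estimator or data layout was used, so everything here is an assumption: a plain linear model fitted by gradient descent stands in for the estimator, the data are synthetic, and the function names are hypothetical. In practice a library implementation such as scikit-learn's `RFE` would normally be used instead.

```python
import random

def fit_linear(X, y, epochs=500, lr=0.1):
    """Fit y ~ X.w + b by batch gradient descent and return the weights w."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sum(wi * xi for wi, xi in zip(w, row)) + b - target
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w

def recursive_feature_elimination(X, y, n_keep):
    """Refit repeatedly, dropping the feature with the smallest |weight|."""
    active = list(range(len(X[0])))  # indices of features still in play
    while len(active) > n_keep:
        sub = [[row[j] for j in active] for row in X]
        w = fit_linear(sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        active.pop(weakest)
    return active

# Synthetic data (an assumption): only features 0 and 2 influence the target.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(60)]
y = [3.0 * r[0] - 2.0 * r[2] + rng.gauss(0, 0.05) for r in X]
selected = recursive_feature_elimination(X, y, n_keep=2)
print(selected)
```

Using the absolute weight as the importance score is only meaningful here because all synthetic features share the same scale; real descriptors would need to be standardized first.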

Applying Machine Learning Techniques to Predict the Properties of Energetic Materials

10.26434/chemrxiv.5883157.v2 ◽

2018 ◽

Cited By ~ 1

Author(s):

Daniel Elton ◽

Zois Boukouvalas ◽

Mark S. Butrico ◽

Mark D. Fuge ◽

Peter W. Chung

Keyword(s):

Machine Learning ◽

Energetic Materials ◽

Molecular Structures ◽

Machine Learning Techniques ◽

Small Data ◽

Detonation Pressure ◽

Learning Models ◽

Data Set ◽

Learning Techniques ◽

Machine Learning Models

We present a proof of concept that machine learning techniques can be used to predict the properties of CNOHF energetic molecules from their molecular structures. We focus on a small but diverse dataset consisting of 109 molecular structures spread across ten compound classes. Up until now, candidate molecules for energetic materials have been screened using predictions from expensive quantum simulations and thermochemical codes. We present a comprehensive comparison of machine learning models and several molecular featurization methods - sum over bonds, custom descriptors, Coulomb matrices, bag of bonds, and fingerprints. The best featurization was sum over bonds (bond counting), and the best model was kernel ridge regression. Despite having a small data set, we obtain acceptable errors and Pearson correlations for the prediction of detonation pressure, detonation velocity, explosive energy, heat of formation, density, and other properties out of sample. By including another dataset with 309 additional molecules in our training we show how the error can be pushed lower, although the convergence with number of molecules is slow. Our work paves the way for future applications of machine learning in this domain, including automated lead generation and interpreting machine learning models to obtain novel chemical insights.
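As a rough illustration of the best-performing combination named in this abstract (sum over bonds with kernel ridge regression), the sketch below counts bond types and fits a small kernel ridge model in pure Python. The bond vocabulary, the RBF kernel, the hyperparameters and the target values are all assumptions made for the example; the targets are placeholders, not measured properties, and the authors' actual pipeline is not reproduced here.

```python
import math
from collections import Counter

BOND_TYPES = ["C-H", "C-C", "C-N", "N-O", "N=O"]  # illustrative vocabulary

def sum_over_bonds(bonds):
    """Featurize a molecule as the count of each bond type it contains."""
    counts = Counter(bonds)
    return [float(counts[b]) for b in BOND_TYPES]

def rbf_kernel(a, b, gamma=0.1):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def krr_fit(X, y, lam=1e-6):
    """Kernel ridge regression: solve (K + lam * I) alpha = y."""
    K = [[rbf_kernel(a, b) for b in X] for a in X]
    for i in range(len(X)):
        K[i][i] += lam
    return solve(K, y)

def krr_predict(X_train, alpha, x):
    """Predict as a kernel-weighted sum over the training points."""
    return sum(a * rbf_kernel(xt, x) for a, xt in zip(alpha, X_train))

# Bond lists from simple Lewis structures; targets are placeholder numbers.
molecules = {
    "nitromethane": ["C-H"] * 3 + ["C-N", "N=O", "N-O"],
    "ethane": ["C-H"] * 6 + ["C-C"],
    "nitroethane": ["C-H"] * 5 + ["C-C", "C-N", "N=O", "N-O"],
}
X_train = [sum_over_bonds(b) for b in molecules.values()]
y_train = [1.0, 0.0, 0.8]
alpha = krr_fit(X_train, y_train)
print(krr_predict(X_train, alpha, X_train[0]))
```

With a regularization strength this small the model nearly interpolates its training points, so the sketch shows only the mechanics; out-of-sample error, which is what the abstract reports, would require held-out molecules.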

Direct Prediction of Bioaccumulation of Organic Contaminants in Plant Roots from Soils with Machine Learning Models Based on Molecular Structures

Environmental Science & Technology ◽

10.1021/acs.est.1c02376 ◽

2021 ◽

Author(s):

Feng Gao ◽

Yike Shen ◽

Jonathan Brett Sallach ◽

Hui Li ◽

Cun Liu ◽

...

Keyword(s):

Machine Learning ◽

Organic Contaminants ◽

Molecular Structures ◽

Plant Roots ◽

Learning Models ◽

Machine Learning Models

Implicitly perturbed Hamiltonian: a class of versatile and general-purpose molecular representations for machine leaning

10.33774/chemrxiv-2021-6kqbw ◽

2021 ◽

Author(s):

Amin Alibakhshi ◽

Bernd Hartke

Keyword(s):

Machine Learning ◽

General Purpose ◽

Molecular Structures ◽

Learning Models ◽

Molecular Systems ◽

Molecular Features ◽

Scientific Disciplines ◽

Machine Leaning ◽

Solvation Free Energies ◽

Machine Learning Models

Unraveling challenging problems by machine learning has recently become a hot topic in many scientific disciplines. For developing rigorous machine-learning models to study problems of interest in molecular sciences, translating molecular structures to quantitative representations as suitable machine-learning inputs plays a central role. Many different molecular representations and the state-of-the-art ones, although efficient in studying numerous molecular features, still are sub-optimal in many challenging cases, as discussed in the context of present research. The main aim of the present study is to introduce the Implicitly Perturbed Hamiltonian (ImPerHam) as a class of versatile representations for more efficient machine learning of challenging problems in molecular sciences. ImPerHam representations are defined as energy attributes of the molecular Hamiltonian, implicitly perturbed by a number of hypothetic or real arbitrary solvents based on continuum solvation models. We demonstrate outstanding performance of machine-learning models based on ImPerHam representations for three diverse and challenging cases of predicting inhibition of the CYP450 enzyme, high precision and transferrable evaluation of conformational energy of molecular systems and accurately reproducing solvation free energies for large benchmark sets.

Multitask machine learning models for predicting lipophilicity (logP) in the SAMPL7 challenge

Journal of Computer-Aided Molecular Design ◽

10.1007/s10822-021-00405-6 ◽

2021 ◽

Author(s):

Eelke B. Lenselink ◽

Pieter F. W. Stouten

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Drug Discovery ◽

Message Passing ◽

Learning Model ◽

Molecular Structures ◽

Learning Models ◽

Final Model ◽

Machine Learning Model ◽

Machine Learning Models

Accurate prediction of lipophilicity (logP) based on molecular structures is a well-established field. Predictions of logP are often used to drive forward drug discovery projects. Driven by the SAMPL7 challenge, in this manuscript we describe the steps that were taken to construct a novel machine learning model that can predict and generalize well. This model is based on the recently described Directed-Message Passing Neural Networks (D-MPNNs). Further enhancements included: both the inclusion of additional datasets from ChEMBL (RMSE improvement of 0.03), and the addition of helper tasks (RMSE improvement of 0.04). To the best of our knowledge, the concept of adding predictions from other models (Simulations Plus logP and logD@pH7.4, respectively) as helper tasks is novel and could be applied in a broader context. The final model that we constructed and used to participate in the challenge ranked 2/17 ranked submissions with an RMSE of 0.66, and an MAE of 0.48 (submission: Chemprop). On other datasets the model also works well, especially retrospectively applied to the SAMPL6 challenge where it would have ranked number one out of all submissions (RMSE of 0.35). Despite the fact that our model works well, we conclude with suggestions that are expected to improve the model even further.
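The two error metrics quoted in this abstract, RMSE and MAE, can be computed as in the short sketch below. The numbers are made-up illustration values, not logP data from the challenge, and the function names are chosen for the example only.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error: penalizes large deviations more heavily."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def mae(y_true, y_pred):
    """Mean absolute error: the average magnitude of the deviations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative values only (assumed, not taken from any dataset).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.0, 4.5]
print(rmse(y_true, y_pred), mae(y_true, y_pred))
```

RMSE is never smaller than MAE on the same predictions, which is consistent with the reported pair (0.66 and 0.48); a large gap between the two suggests that a few predictions are far off.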

Machine learning models to select potential inhibitors of acetylcholinesterase activity from SistematX: a natural products database

Molecular Diversity ◽

10.1007/s11030-021-10245-z ◽

2021 ◽

Author(s):

Chonny Herrera-Acevedo ◽

Camilo Perdomo-Madrigal ◽

Kenyi Herrera-Acevedo ◽

Ericsson Coy-Barrera ◽

Luciana Scotti ◽

...

Keyword(s):

Machine Learning ◽

Natural Products ◽

Acetylcholinesterase Activity ◽

Learning Models ◽

Potential Inhibitors ◽

Machine Learning Models

Applying Machine Learning Techniques to Predict the Properties of Energetic Materials

10.26434/chemrxiv.5883157.v1 ◽

2018 ◽

Author(s):

Daniel Elton ◽

Zois Boukouvalas ◽

Mark S. Butrico ◽

Mark D. Fuge ◽

Peter W. Chung

Keyword(s):

Machine Learning ◽

Energetic Materials ◽

Molecular Structures ◽

Machine Learning Techniques ◽

Small Data ◽

Detonation Pressure ◽

Learning Models ◽

Data Set ◽

Learning Techniques ◽

Machine Learning Models

We present a proof of concept that machine learning techniques can be used to predict the properties of CNOHF energetic molecules from their molecular structures. We focus on a small but diverse dataset consisting of 109 molecular structures spread across ten compound classes. Up until now, candidate molecules for energetic materials have been screened using predictions from expensive quantum simulations and thermochemical codes. We present a comprehensive comparison of machine learning models and several molecular featurization methods - sum over bonds, custom descriptors, Coulomb matrices, Bag of Bonds, and fingerprints. The best featurization was sum over bonds (bond counting), and the best model was kernel ridge regression. Despite having a small data set, we obtain acceptable errors and Pearson correlations for the prediction of detonation pressure, detonation velocity, explosive energy, heat of formation, density, and other properties out of sample. By including another dataset with 309 additional molecules in our training we show how the error can be pushed lower, although the convergence with number of molecules is slow. Our work paves the way for future applications of machine learning in this domain, including automated lead generation and interpreting machine learning models to obtain novel chemical insights.

Applying Machine Learning Techniques to Predict the Properties of Energetic Materials

10.26434/chemrxiv.5883157 ◽

2018 ◽

Author(s):

Daniel Elton ◽

Zois Boukouvalas ◽

Mark S. Butrico ◽

Mark D. Fuge ◽

Peter W. Chung

Keyword(s):

Machine Learning ◽

Energetic Materials ◽

Molecular Structures ◽

Machine Learning Techniques ◽

Small Data ◽

Detonation Pressure ◽

Learning Models ◽

Data Set ◽

Learning Techniques ◽

Machine Learning Models

We present a proof of concept that machine learning techniques can be used to predict the properties of CNOHF energetic molecules from their molecular structures. We focus on a small but diverse dataset consisting of 109 molecular structures spread across ten compound classes. Up until now, candidate molecules for energetic materials have been screened using predictions from expensive quantum simulations and thermochemical codes. We present a comprehensive comparison of machine learning models and several molecular featurization methods - sum over bonds, custom descriptors, Coulomb matrices, bag of bonds, and fingerprints. The best featurization was sum over bonds (bond counting), and the best model was kernel ridge regression. Despite having a small data set, we obtain acceptable errors and Pearson correlations for the prediction of detonation pressure, detonation velocity, explosive energy, heat of formation, density, and other properties out of sample. By including another dataset with 309 additional molecules in our training we show how the error can be pushed lower, although the convergence with number of molecules is slow. Our work paves the way for future applications of machine learning in this domain, including automated lead generation and interpreting machine learning models to obtain novel chemical insights.

Direct prediction of bioaccumulation of organic contaminants in plant roots from soils with machine learning models based on molecular structures

10.21203/rs.3.rs-240794/v1 ◽

2021 ◽

Author(s):

Feng Gao ◽

Yike Shen ◽

J. Brett Sallach ◽

Hui Li ◽

Cun Liu ◽

...

Keyword(s):

Machine Learning ◽

Organic Contaminants ◽

Plant Root ◽

Ecological Impacts ◽

Molecular Structures ◽

Gradient Boosting ◽

Learning Models ◽

Root Interactions ◽

Chemical Soil ◽

Machine Learning Models

Root concentration factor is an important substance-specific characterization parameter for plant uptake of organic contaminants from soils in life cycle impact assessment (LCIA); however, the availability of a reliable dataset and building of robust predictive models remain challenging due to the complexity of chemical-soil-plant root interactions. Here we developed end-to-end machine learning models to devolve the interaction complexity by training on a unified dataset with 341 data points covering 72 chemicals. The gradient boosting regression tree (GBRT) model based on the extended connectivity fingerprints (ECFP) demonstrated a superior prediction performance with R-squared of 0.77 and Mean Absolute Error (MAE) of 0.22. In addition, partial dependence analysis was used to determine the nonlinear relationships in the chemical-soil-plant root system. Feature importance analysis revealed the relationship between and chemical topological structures. Stemming from its simplicity and universality, the GBRT-ECFP model provides a promising tool for LCIA to better characterize the human and ecological impacts of chemicals in the environment.

Implicitly perturbed Hamiltonian: a class of versatile and general-purpose molecular representations for machine learning

10.21203/rs.3.rs-961540/v1 ◽

2021 ◽

Author(s):

Amin Alibakhshi ◽

Bernd Hartke

Keyword(s):

Machine Learning ◽

General Purpose ◽

Molecular Structures ◽

Learning Models ◽

Molecular Systems ◽

Molecular Features ◽

Scientific Disciplines ◽

Solvation Models ◽

Solvation Free Energies ◽

Machine Learning Models

Unraveling challenging problems by machine learning has recently become a hot topic in many scientific disciplines. For developing rigorous machine-learning models to study problems of interest in molecular sciences, translating molecular structures to quantitative representations as suitable machine-learning inputs plays a central role. Many different molecular representations and the state-of-the-art ones, although efficient in studying numerous molecular features, still are suboptimal in many challenging cases, as discussed in the context of the present research. The main aim of the present study is to introduce the Implicitly Perturbed Hamiltonian (ImPerHam) as a class of versatile representations for more efficient machine learning of challenging problems in molecular sciences. ImPerHam representations are defined as energy attributes of the molecular Hamiltonian, implicitly perturbed by a number of hypothetic or real arbitrary solvents based on continuum solvation models. We demonstrate the outstanding performance of machine-learning models based on ImPerHam representations for three diverse and challenging cases: predicting inhibition of the CYP450 enzyme, high-precision and transferrable evaluation of conformational energy of molecular systems, and accurately reproducing solvation free energies for large benchmark sets.
