Predicting Ruthenium Catalysed Hydrogenation of Esters using Machine Learning

Catalytic hydrogenation of esters is a sustainable approach for the production of fine chemicals and pharmaceutical drugs. However, the efficiency and cost of catalysts are often the bottlenecks in the commercialization of such technologies. The conventional approach to catalyst discovery is based on empiricism, which makes the discovery process time-consuming and expensive. There is an urgent need to develop effective approaches to discover efficient catalysts for hydrogenation reactions. We demonstrate here a machine learning approach for predicting the outcomes of the catalytic hydrogenation of esters. Our models can predict the reaction yields with high mean accuracies of up to 91% (test set) and suggest that the selective use of certain chemical descriptors can result in a more accurate model. Furthermore, catalysts and some of their corresponding descriptors can also be predicted with mean accuracies of 85% and >90%, respectively.

Catalytic asymmetric hydrogenation reaction by in situ formed ultra-fine metal nanoparticles in live thermophilic hydrogen-producing bacteria

Nanoscale ◽

10.1039/d1nr00327e ◽

2021 ◽

Author(s):

Wei Bing ◽

Faming Wang ◽

Yuhuan Sun ◽

Jinsong Ren ◽

Xiaogang Qu

Keyword(s):

Metal Nanoparticles ◽

Catalytic Hydrogenation ◽

Asymmetric Hydrogenation ◽

Environmentally Friendly ◽

Hydrogenation Reaction ◽

Highly Efficient ◽

Hydrogenation Reactions ◽

Live Bacteria ◽

Catalytic Asymmetric Hydrogenation

An environmentally friendly biomimetic strategy has been presented and validated for the catalytic hydrogenation reaction in live bacteria. In situ formed ultra-fine metal nanoparticles can realize highly efficient asymmetric hydrogenation reactions.

Structural, QSAR, machine learning and molecular docking studies of 5-thiophen-2-yl pyrazole derivatives as potent and selective cannabinoid-1 receptor antagonists

New Journal of Chemistry ◽

10.1039/d1nj02261j ◽

2021 ◽

Author(s):

Riad Hanachi ◽

Ridha Ben Said ◽

Hamza Allal ◽

Seyfeddine Rahali ◽

Mohammed A. M. Alkhalifah ◽

...

Keyword(s):

Machine Learning ◽

Biological Activity ◽

Molecular Docking ◽

Theoretical Analysis ◽

Structural Study ◽

Docking Studies ◽

Pyrazole Derivatives ◽

Molecular Docking Studies ◽

Cannabinoid 1 Receptor ◽

Chemical Descriptors

We performed a structural study followed by a theoretical analysis of the chemical descriptors and the biological activity of a series of 5-thiophen-2-yl pyrazole derivatives as potent and selective Cannabinoid-1...

Exploring sex-specific patterns of mortality predictors among patients undergoing cardiac resynchronization therapy: a machine learning approach

European Heart Journal ◽

10.1093/ehjci/ehaa946.0996 ◽

2020 ◽

Vol 41 (Supplement_2) ◽

Author(s):

M Tokodi ◽

A Behon ◽

E.D Merkel ◽

A Kovacs ◽

Z Toser ◽

...

Keyword(s):

Machine Learning ◽

Sex Differences ◽

Cardiac Resynchronization Therapy ◽

Cardiac Resynchronization ◽

Funding Source ◽

Resynchronization Therapy ◽

Test Set ◽

All Cause Mortality ◽

Mortality Predictors

Background: The relative importance of variables explaining sex differences in outcomes is scarcely explored in patients undergoing cardiac resynchronization therapy (CRT). Purpose: We sought to implement and evaluate machine learning (ML) algorithms for the prediction of 1- and 3-year all-cause mortality in patients undergoing CRT implantation. We also aimed to assess the sex-specific differences and similarities in the predictors of mortality using ML approaches. Methods: A retrospective registry of 2191 CRT patients (75% males) was used in the current analysis. ML models were implemented in 6 partially overlapping patient subsets (all patients, females or males with 1- or 3-year follow-up data available). Each cohort was randomly split into a training set (80%) and a test set (20%). After hyperparameter tuning with 10-fold cross-validation in the training set, the best performing algorithm was also evaluated in the test set. Model discrimination was quantified using the area under the receiver-operating characteristic curve (AUC) and the associated 95% confidence intervals. The most important predictors were identified using the permutation feature importance method. Results: Conditional inference random forest exhibited the best performance with AUCs of 0.728 [0.645–0.802] and 0.732 [0.681–0.784] for the prediction of 1- and 3-year mortality, respectively. Etiology of heart failure, NYHA class, left ventricular ejection fraction and QRS morphology had higher predictive power in females, whereas hemoglobin was less important than in males. The importance of atrial fibrillation and age increased, whereas the relevance of serum creatinine decreased from 1- to 3-year follow-up in both sexes. Conclusions: Using advanced ML techniques in combination with easily obtainable clinical features, our models effectively predicted 1- and 3-year all-cause mortality in patients undergoing CRT implantation. The in-depth analysis of features has revealed marked sex differences in mortality predictors. These results support the use of ML-based approaches for the risk stratification of patients undergoing CRT implantation. Funding Acknowledgement: Type of funding source: Public grant(s) – National budget only. Main funding source(s): National Research, Development and Innovation Office of Hungary
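
The AUC used above to quantify model discrimination can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below illustrates that reading in plain Python; it is not the authors' code, and the function name `roc_auc`, the labels, and the scores are made up for illustration.

```python
from itertools import product


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair. Labels are assumed to be 0 or 1.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = event occurred, 0 = no event.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations only; library implementations typically sort the scores instead.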

Development and validation of a difficult laryngoscopy prediction model using machine learning of neck circumference and thyromental height

BMC Anesthesiology ◽

10.1186/s12871-021-01343-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Jong Ho Kim ◽

Haewon Kim ◽

Ji Su Jang ◽

Sung Mi Hwang ◽

So Young Lim ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Confidence Interval ◽

Neck Circumference ◽

Difficult Laryngoscopy ◽

Gradient Boosting ◽

Test Set ◽

Equal Distribution ◽

Light Gradient ◽

Extreme Gradient Boosting

Background: Predicting a difficult airway is challenging in patients with limited airway evaluation. The aim of this study is to develop and validate a model that predicts difficult laryngoscopy by machine learning of neck circumference and thyromental height as predictors that can be used even for patients with limited airway evaluation. Methods: Variables for prediction of difficult laryngoscopy included age, sex, height, weight, body mass index, neck circumference, and thyromental distance. Difficult laryngoscopy was defined as Grade 3 and 4 by the Cormack-Lehane classification. The preanesthesia and anesthesia data of 1677 patients who had undergone general anesthesia at a single center were collected. The data set was randomly stratified into a training set (80%) and a test set (20%), with equal distribution of difficult laryngoscopy. Models were trained on the training set with five algorithms (logistic regression, multilayer perceptron, random forest, extreme gradient boosting, and light gradient boosting machine). The prediction models were validated on the test set. Results: The random forest model performed best (area under receiver operating characteristic curve = 0.79 [95% confidence interval: 0.72–0.86], area under precision-recall curve = 0.32 [95% confidence interval: 0.27–0.37]). Conclusions: Machine learning can predict difficult laryngoscopy through a combination of several predictors including neck circumference and thyromental height. The performance of the model can be improved with more data, a new variable, and a combination of models.
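
The stratified 80/20 split described in the Methods keeps the share of difficult cases the same in both sets. The following is a minimal sketch of that idea, not the study's implementation; `stratified_split`, the seed, and the toy label counts are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with class shares preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train, test = [], []
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)  # shuffle within each class before cutting
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


# Hypothetical toy labels: 20 difficult cases (1) and 80 easy cases (0).
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
print(sum(labels[i] for i in test_idx) / len(test_idx))
```

Splitting within each class, rather than over the whole data set, is what prevents a rare outcome from being under-represented in the test set by chance.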

RegioML: Predicting the regioselectivity of electrophilic aromatic substitution reactions using machine learning

10.33774/chemrxiv-2021-l2fvl ◽

2021 ◽

Author(s):

Nicolai Ree ◽

Andreas H. Göller ◽

Jan H. Jensen

Keyword(s):

Machine Learning ◽

Tight Binding ◽

Reaction Centers ◽

Gradient Boosting ◽

Electrophilic Aromatic Substitution ◽

Aromatic Substitution ◽

Substitution Reactions ◽

Test Set ◽

Light Gradient ◽

Out Of Sample

We present RegioML, an atom-based machine learning model for predicting the regioselectivities of electrophilic aromatic substitution reactions. The model relies on CM5 atomic charges computed using semiempirical tight binding (GFN1-xTB) combined with the ensemble decision tree variant light gradient boosting machine (LightGBM). The model is trained and tested on 21,201 bromination reactions with 101K reaction centers, which are split into training, test, and out-of-sample datasets with 58K, 15K, and 27K reaction centers, respectively. The accuracy is 93% for the test set and 90% for the out-of-sample set, while the precision (the percentage of positive predictions that are correct) is 88% and 80%, respectively. The test-set performance is very similar to the graph-based WLN method developed by Struble et al. (React. Chem. Eng. 2020, 5, 896), though the comparison is complicated by the possibility that some of the test and out-of-sample molecules are used to train WLN. RegioML outperforms our physics-based RegioSQM20 method (J. Cheminform. 2021, 13:10), where the precision is only 75%. Even for the out-of-sample dataset, RegioML slightly outperforms RegioSQM20. The good performance of RegioML and WLN is in large part due to the large datasets available for this type of reaction. However, for reactions where there is little experimental data, physics-based approaches like RegioSQM20 can be used to generate synthetic data for model training. We demonstrate this by showing that the performance of RegioSQM20 can be reproduced by an ML model trained on RegioSQM20-generated data.

Selecting Machine-Learning Scoring Functions for Structure-Based Virtual Screening

10.26434/chemrxiv.12967160 ◽

2020 ◽

Author(s):

Pedro Ballester

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Virtual Screening ◽

Predictive Accuracy ◽

Scoring Function ◽

3D Models ◽

Large Datasets ◽

Scoring Functions ◽

Discovery Process ◽

Drug Discovery Process

Interest in docking technologies has grown in parallel with the ever-increasing number and diversity of 3D models for macromolecular therapeutic targets. Structure-Based Virtual Screening (SBVS) aims at leveraging these experimental structures to discover the necessary starting points for the drug discovery process. It is now established that Machine Learning (ML) can strongly enhance the predictive accuracy of scoring functions for SBVS by exploiting large datasets from targets, molecules and their associations. However, with greater choice, the question of which ML-based scoring function is the most suitable for prospective use on a given target has gained importance. Here we analyse two approaches to select an existing scoring function for the target, along with a third approach consisting of generating a scoring function tailored to the target. These analyses required discussing the limitations of popular SBVS benchmarks, the alternatives to benchmark scoring functions for SBVS, and how to generate or use them with freely available software.

Heterogeneous Catalytic Hydrogenation Reactions in Continuous-Flow Reactors

ChemSusChem ◽

10.1002/cssc.201000354 ◽

2011 ◽

Vol 4 (3) ◽

pp. 300-316 ◽

Cited By ~ 235

Author(s):

Muhammad Irfan ◽

Toma N. Glasnov ◽

C. Oliver Kappe

Keyword(s):

Catalytic Hydrogenation ◽

Continuous Flow ◽

Flow Reactors ◽

Heterogeneous Catalytic ◽

Hydrogenation Reactions ◽

Continuous Flow Reactors ◽

Heterogeneous Catalytic Hydrogenation

Diagnostic AI Modeling and Pseudo Time Series Profiling of AD and PD Based on Individualized Serum Proteome Data

Frontiers in Bioinformatics ◽

10.3389/fbinf.2021.764497 ◽

2021 ◽

Vol 1 ◽

Author(s):

Jianhu Zhang ◽

Xiuli Zhang ◽

Yuan Sh ◽

Benliang Liu ◽

Zhiyuan Hu

Keyword(s):

Machine Learning ◽

Neurodegenerative Diseases ◽

Early Stage ◽

Group Versus ◽

Blood Protein ◽

Diagnostic Model ◽

Feature Engineering ◽

Ann Model ◽

Test Set ◽

Upstream Regulators

Background: Parkinson’s disease (PD) and Alzheimer’s disease (AD) are common neurodegenerative diseases, while mild cognitive impairment (MCI) may occur in the early stage of AD or PD. Blood biomarkers are considered to be less invasive, less costly and more convenient, and they have tremendous potential for the diagnosis and prediction of neurodegenerative diseases. Artificial intelligence (AI), a field that has recently attracted attention, is often applied in biology and shows excellent results. In this article, we use AI to model PD, AD, and MCI data and analyze the possible connections between them. Method: Human blood protein microarray profiles including 156 CT, 50 MCI, 132 PD, and 50 AD samples were collected from Gene Expression Omnibus (GEO). First, we used bioinformatics methods and feature engineering in machine learning to screen important features, constructed artificial neural network (ANN) classifier models based on these features to distinguish samples, and evaluated the models’ performance with classification accuracy and Area Under Curve (AUC). Second, we used Ingenuity Pathway Analysis (IPA) methods to analyse the pathways and functions in early stage and late stage samples of different diseases, and identified potential targets for drug intervention by predicting upstream regulators. Result: We used different classifiers to construct the model and found that the ANN model outperformed the traditional machine learning models. In summary, three different classifiers were constructed for use in different application scenarios. First, we incorporated 6 indicators, including EPHA2, MRPL19, and SGK2, to build a diagnostic model for AD with a test set accuracy of up to 98.07%. Second, we incorporated 15 indicators such as ERO1LB, FAM73B, and IL1RN to build a diagnostic model for PD, with a test set accuracy of 97.05%. Then, 15 indicators such as XG, FGFR3 and CDC37 were incorporated to establish a four-category diagnostic model for both AD and PD, with a test set accuracy of 98.71%. All classifier models have an AUC value greater than 0.95. Then, we verified that our feature engineering procedure selected fewer features that nevertheless carried more information, which helped to build a better model. In addition, by classifying the disease types more carefully into early and late stages of AD, MCI, and PD, respectively, we found that early PD may occur earlier than early MCI. Finally, there are 24 proteins that are both differentially expressed proteins and upstream regulators in the disease group versus the normal group, and these proteins may serve as potential therapeutic targets and targets for subsequent studies. Conclusion: The feature engineering we built allows better extraction of information while reducing the number of features, which may help in subsequent applications. Building a classifier based on blood protein profiles using deep learning methods can achieve better classification performance, and it can help us to diagnose the disease early. Overall, it is important for us to study neurodegenerative diseases from both diagnostic and interventional aspects.

Evaluation of Machine-Learning Tools for Predicting Sand Production

10.2118/207193-ms ◽

2021 ◽

Author(s):

Afungchwi Ronald Ngwashi ◽

David O. Ogbe ◽

Dickson O. Udebhulu

Keyword(s):

Machine Learning ◽

Niger Delta ◽

Oil And Gas ◽

Back Propagation ◽

Oil And Gas Industry ◽

Learning Tools ◽

Sand Production ◽

Data Set ◽

Test Set ◽

Gas Industry

Data analytics has only recently piqued the interest of the oil and gas industry, as it has made data visualization much simpler, faster, and more cost-effective. This is driven by promising innovative techniques in developing artificial intelligence and machine-learning tools to provide sustainable solutions to the ever-increasing problems of petroleum industry activities. Sand production is one of these real issues faced by the oil and gas industry. Understanding whether a well will produce sand or not is the foundation of every completion job in sandstone formations. The Niger Delta Province is a region characterized by friable and unconsolidated sandstones and is therefore more prone to sanding. It is economically unattractive in this region to design sand equipment for a well that will not produce sand. This paper is aimed at developing a fast and more accurate machine-learning algorithm to predict sanding in sandstone formations. A two-layered Artificial Neural Network (ANN) with a back-propagation algorithm was developed using the Python programming language. The algorithm uses 11 geological and reservoir parameters that are associated with the onset of sanding. These parameters include depth, overburden, pore pressure, maximum and minimum horizontal stresses, well azimuth, well inclination, Poisson's ratio, Young's Modulus, friction angle, and shale content. Data typical of the Niger Delta were collected to validate the algorithm. The data were further split into a training set (70%) and a test set (30%). Statistical analyses of the data yielded correlations between the parameters, which were plotted for better visualization. The accuracy of the ANN algorithm is found to depend on the number of parameters, the number of epochs, and the size of the data set. For a completion engineer, the answer to the question of whether or not a well will require sand production control is binary: either a well will produce sand or it will not. Support vector machines (SVM) are known to be better suited as machine-learning tools for binary identification. This study also presents a comparative analysis between ANN and SVM models as tools for predicting sand production. Analysis of the Niger Delta data set indicated that SVM outperformed the ANN model even when the training data set is sparse. Using the 30% test set, ANN gives an accuracy, precision, recall, and F1 score of about 80%, while the SVM performance was 100% for the four metrics. It is then concluded that machine learning tools such as ANN with back-propagation and SVM are simple, accurate, and easy-to-use tools for effectively predicting sand production.
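
The four metrics reported for the binary sanding prediction (accuracy, precision, recall, and F1 score) all follow from the confusion-matrix counts. The sketch below shows the standard definitions on made-up labels; it is illustrative only, not the paper's code, and `binary_metrics` and the toy predictions are assumptions.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels.

    Undefined ratios (zero denominator) are reported as 0.0 by convention here.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data: 1 = well produces sand, 0 = it does not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when one class is rare, since a model can reach high accuracy while missing most positive cases.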

Single-atom catalysts for thermal- and electro-catalytic hydrogenation reactions

Journal of Materials Chemistry A ◽

10.1039/d1ta07910g ◽

2021 ◽

Author(s):

Jingfang Zhang ◽

Hongjuan Zhang ◽

Yongmeng Wu ◽

Cuibo Liu ◽

Yi Huang ◽

...

Keyword(s):

Economic Growth ◽

Catalytic Hydrogenation ◽

Environmental Sustainability ◽

Chemical Energy ◽

Single Atom ◽

Hydrogenation Reactions ◽

New Generation

Hydrogenation reactions are among the most significant transformations in chemical, energy, and environmental industries, which calls for a new generation of promising catalysts towards economic growth and environmental sustainability. Single-atom...
