Machine Learning Reactivity in the Chemical Space Surrounding Vaska's Complex

Author(s):  
Pascal Friederich ◽  
Gabriel dos Passos Gomes ◽  
Riccardo De Bin ◽  
Alan Aspuru-Guzik ◽  
David Balcells

Machine learning models, including neural networks, Bayesian optimization, gradient boosting and Gaussian processes, were trained with DFT data for the accurate, affordable and explainable prediction of hydrogen activation barriers in the chemical space surrounding Vaska's complex.

2019 ◽  
Author(s):  
Pascal Friederich ◽  
Gabriel dos Passos Gomes ◽  
Riccardo De Bin ◽  
Alan Aspuru-Guzik ◽  
David Balcells

Machine learning models, including neural networks, Bayesian optimization, gradient boosting and Gaussian processes, were trained with DFT data for the accurate, affordable and explainable prediction of hydrogen activation barriers in the chemical space surrounding Vaska's complex.


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Van-Hai Nguyen ◽  
Tien-Thinh Le ◽  
Hoanh-Son Truong ◽  
Minh Vuong Le ◽  
Van-Luc Ngo ◽  
...  

This paper deals with the prediction of surface roughness in manufacturing polycarbonate (PC) by applying Bayesian optimization for machine learning models. The input variables of ultraprecision turning—namely, feed rate, depth of cut, spindle speed, and vibration of the X-, Y-, and Z-axis—are the main factors affecting surface quality. In this research, six machine learning- (ML-) based models—artificial neural network (ANN), Cat Boost Regression (CAT), Support Vector Machine (SVR), Gradient Boosting Regression (GBR), Decision Tree Regression (DTR), and Extreme Gradient Boosting Regression (XGB)—were applied to predict the surface roughness (Ra). The predictive performance of the baseline models was quantitatively assessed through error metrics: root means square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2). The overall results indicate that the XGB and CAT models predict Ra with the greatest accuracy. In improving baseline models such as XGB and CAT, the Bayesian optimization (BO) is next used to determine their best hyperparameters, and the results indicate that XGB is the best model according to the evaluation metrics. Results have shown that the performance of the models has been improved significantly with BO. For example, the values of RMSE and MAE of XGB have decreased from 0.0076 to 0.0047 and from 0.0063 to 0.0027, respectively, for the training dataset. Using the testing dataset, the values of RMSE and MAE of XGB have decreased from 0.4033 to 0.2512 and from 0.2845 to 0.2225, respectively. Moreover, the vibrations of the X, Y, and Z axes and feed rate are the most significant feature in predicting the results, which is in high accordance with the literature. We find that, in a specified value domain, the vibration of the axes has a greater influence on the surface quality than does the cutting condition.


Electronics ◽  
2019 ◽  
Vol 8 (5) ◽  
pp. 579 ◽  
Author(s):  
Baosu Guo ◽  
Jingwen Hu ◽  
Wenwen Wu ◽  
Qingjin Peng ◽  
Fenghe Wu

Machine learning algorithms have been widely used to deal with a variety of practical problems such as computer vision and speech processing. But the performance of machine learning algorithms is primarily affected by their hyper-parameters, as without good hyper-parameter values the performance of these algorithms will be very poor. Unfortunately, for complex machine learning models like deep neural networks, it is very difficult to determine their hyper-parameters. Therefore, it is of great significance to develop an efficient algorithm for hyper-parameter automatic optimization. In this paper, a novel hyper-parameter optimization methodology is presented to combine the advantages of a Genetic Algorithm and Tabu Search to achieve the efficient search for hyper-parameters of learning algorithms. This method is defined as the Tabu_Genetic Algorithm. In order to verify the performance of the proposed algorithm, two sets of contrast experiments are conducted. The Tabu_Genetic Algorithm and other four methods are simultaneously used to search for good values of hyper-parameters of deep convolutional neural networks. Experimental results show that, compared to Random Search and Bayesian optimization methods, the proposed Tabu_Genetic Algorithm finds a better model in less time. Whether in a low-dimensional or high-dimensional space, the Tabu_Genetic Algorithm has better search capabilities as an effective method for finding the hyper-parameters of learning algorithms. The presented method in this paper provides a new solution for solving the hyper-parameters optimization problem of complex machine learning models, which will provide machine learning algorithms with better performance when solving practical problems.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Moojung Kim ◽  
Young Jae Kim ◽  
Sung Jin Park ◽  
Kwang Gi Kim ◽  
Pyung Chun Oh ◽  
...  

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine leaning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.


SLEEP ◽  
2021 ◽  
Vol 44 (Supplement_2) ◽  
pp. A164-A164
Author(s):  
Pahnwat Taweesedt ◽  
JungYoon Kim ◽  
Jaehyun Park ◽  
Jangwoon Park ◽  
Munish Sharma ◽  
...  

Abstract Introduction Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder with an estimation of one billion people. Full-night polysomnography is considered the gold standard for OSA diagnosis. However, it is time-consuming, expensive and is not readily available in many parts of the world. Many screening questionnaires and scores have been proposed for OSA prediction with high sensitivity and low specificity. The present study is intended to develop models with various machine learning techniques to predict the severity of OSA by incorporating features from multiple questionnaires. Methods Subjects who underwent full-night polysomnography in Torr sleep center, Texas and completed 5 OSA screening questionnaires/scores were included. OSA was diagnosed by using Apnea-Hypopnea Index ≥ 5. We trained five different machine learning models including Deep Neural Networks with the scaled principal component analysis (DNN-PCA), Random Forest (RF), Adaptive Boosting classifier (ABC), and K-Nearest Neighbors classifier (KNC) and Support Vector Machine Classifier (SVMC). Training:Testing subject ratio of 65:35 was used. All features including demographic data, body measurement, snoring and sleepiness history were obtained from 5 OSA screening questionnaires/scores (STOP-BANG questionnaires, Berlin questionnaires, NoSAS score, NAMES score and No-Apnea score). Performance parametrics were used to compare between machine learning models. Results Of 180 subjects, 51.5 % of subjects were male with mean (SD) age of 53.6 (15.1). One hundred and nineteen subjects were diagnosed with OSA. Area Under the Receiver Operating Characteristic Curve (AUROC) of DNN-PCA, RF, ABC, KNC, SVMC, STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score, and No-Apnea score were 0.85, 0.68, 0.52, 0.74, 0.75, 0.61, 0.63, 0,61, 0.58 and 0,58 respectively. DNN-PCA showed the highest AUROC with sensitivity of 0.79, specificity of 0.67, positive-predictivity of 0.93, F1 score of 0.86, and accuracy of 0.77. Conclusion Our result showed that DNN-PCA outperforms OSA screening questionnaires, scores and other machine learning models. Support (if any):


Energies ◽  
2021 ◽  
Vol 14 (23) ◽  
pp. 7834
Author(s):  
Christopher Hecht ◽  
Jan Figgener ◽  
Dirk Uwe Sauer

Electric vehicles may reduce greenhouse gas emissions from individual mobility. Due to the long charging times, accurate planning is necessary, for which the availability of charging infrastructure must be known. In this paper, we show how the occupation status of charging infrastructure can be predicted for the next day using machine learning models— Gradient Boosting Classifier and Random Forest Classifier. Since both are ensemble models, binary training data (occupied vs. available) can be used to provide a certainty measure for predictions. The prediction may be used to adapt prices in a high-load scenario, predict grid stress, or forecast available power for smart or bidirectional charging. The models were chosen based on an evaluation of 13 different, typically used machine learning models. We show that it is necessary to know past charging station usage in order to predict future usage. Other features such as traffic density or weather have a limited effect. We show that a Gradient Boosting Classifier achieves 94.8% accuracy and a Matthews correlation coefficient of 0.838, making ensemble models a suitable tool. We further demonstrate how a model trained on binary data can perform non-binary predictions to give predictions in the categories “low likelihood” to “high likelihood”.


2021 ◽  
Vol 11 (19) ◽  
pp. 9296
Author(s):  
Talha Mahboob Alam ◽  
Mubbashar Mushtaq ◽  
Kamran Shaukat ◽  
Ibrahim A. Hameed ◽  
Muhammad Umer Sarwar ◽  
...  

Lack of education is a major concern in underdeveloped countries because it leads to poor human and economic development. The level of education in public institutions varies across all regions around the globe. Current disparities in access to education worldwide are mostly due to systemic regional differences and the distribution of resources. Previous research focused on evaluating students’ academic performance, but less has been done to measure the performance of educational institutions. Key performance indicators for the evaluation of institutional performance differ from student performance indicators. There is a dire need to evaluate educational institutions’ performance based on their disparities and academic results on a large scale. This study proposes a model to measure institutional performance based on key performance indicators through data mining techniques. Various feature selection methods were used to extract the key performance indicators. Several machine learning models, namely, J48 decision tree, support vector machines, random forest, rotation forest, and artificial neural networks were employed to build an efficient model. The results of the study were based on different factors, i.e., the number of schools in a specific region, teachers, school locations, enrolment, and availability of necessary facilities that contribute to school performance. It was also observed that urban regions performed well compared to rural regions due to the improved availability of educational facilities and resources. The results showed that artificial neural networks outperformed other models and achieved an accuracy of 82.9% when the relief-F based feature selection method was used. This study will help support efforts in governance for performance monitoring, policy formulation, target-setting, evaluation, and reform to address the issues and challenges in education worldwide.


2022 ◽  
Vol 14 (1) ◽  
pp. 229
Author(s):  
Jiarui Shi ◽  
Qian Shen ◽  
Yue Yao ◽  
Junsheng Li ◽  
Fu Chen ◽  
...  

Chlorophyll-a concentrations in water bodies are one of the most important environmental evaluation indicators in monitoring the water environment. Small water bodies include headwater streams, springs, ditches, flushes, small lakes, and ponds, which represent important freshwater resources. However, the relatively narrow and fragmented nature of small water bodies makes it difficult to monitor chlorophyll-a via medium-resolution remote sensing. In the present study, we first fused Gaofen-6 (a new Chinese satellite) images to obtain 2 m resolution images with 8 bands, which was approved as a good data source for Chlorophyll-a monitoring in small water bodies as Sentinel-2. Further, we compared five semi-empirical and four machine learning models to estimate chlorophyll-a concentrations via simulated reflectance using fused Gaofen-6 and Sentinel-2 spectral response function. The results showed that the extreme gradient boosting tree model (one of the machine learning models) is the most accurate. The mean relative error (MRE) was 9.03%, and the root-mean-square error (RMSE) was 4.5 mg/m3 for the Sentinel-2 sensor, while for the fused Gaofen-6 image, MRE was 6.73%, and RMSE was 3.26 mg/m3. Thus, both fused Gaofen-6 and Sentinel-2 could estimate the chlorophyll-a concentrations in small water bodies. Since the fused Gaofen-6 exhibited a higher spatial resolution and Sentinel-2 exhibited a higher temporal resolution.


Author(s):  
Maicon Herverton Lino Ferreira da Silva Barros ◽  
Geovanne Oliveira Alves ◽  
Lubnnia Morais Florêncio Souza ◽  
Élisson da Silva Rocha ◽  
João Fausto Lorenzato de Oliveira ◽  
...  

Tuberculosis (TB) is an airborne infectious disease caused by organisms in the Mycobacterium tuberculosis (Mtb) complex. In many low and middle-income countries, TB remains a major cause of morbidity and mortality. Once a patient has been diagnosed with TB, it is critical that healthcare workers make the most appropriate treatment decision given the individual conditions of the patient and the likely course of the disease based on medical experience. Depending on the prognosis, delayed or inappropriate treatment can result in unsatisfactory results including the exacerbation of clinical symptoms, poor quality of life, and increased risk of death. This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed cases and deaths related to TB in the State of Amazonas. The goal is to predict the probability of death by TB thus aiding the prognosis of TB and associated treatment decision making process. In its original form, the data set comprised 36,228 records and 130 fields but suffered from missing, incomplete, or incorrect data. Following data cleaning and preprocessing, a revised data set was generated comprising 24,015 records and 38 fields, including 22,876 reported cured TB patients and 1,139 deaths by TB. To explore how the data imbalance impacts model performance, two controlled experiments were designed using (1) imbalanced and (2) balanced data sets. The best result is achieved by the Gradient Boosting (GB) model using the balanced data set to predict TB-mortality, and the ensemble model composed by the Random Forest (RF), GB and Multi-layer Perceptron (MLP) models is the best model to predict the cure class.


2021 ◽  
Author(s):  
Victor Fung ◽  
Jiaxin Zhang ◽  
Eric Juarez ◽  
Bobby Sumpter

Graph neural networks (GNNs) have received intense interest as a rapidly expanding class of machine learning models remarkably well-suited for materials applications. To date, a number of successful GNNs have been proposed and demonstrated for systems ranging from crystal stability to electronic property prediction and to surface chemistry and heterogeneous catalysis. However, a consistent benchmark of these models remains lacking, hindering the development and consistent evaluation of new models in the materials field. Here, we present a workflow and testing platform, MatDeepLearn, for quickly and reproducibly assessing and comparing GNNs and other machine learning models. We use this platform to optimize and evaluate a selection of top performing GNNs on several representative datasets in computational materials chemistry. From our investigations we note the importance of hyperparameter selection and find roughly similar performances for the top models once optimized. We identify several strengths in GNNs over conventional models in cases with compositionally diverse datasets and in its overall flexibility with respect to inputs, due to learned rather than defined representations. Meanwhile several weaknesses of GNNs are also observed including high data requirements, and suggestions for further improvement for applications in materials chemistry are proposed.


Sign in / Sign up

Export Citation Format

Share Document