The prediction of molecule atomization energy using neural network and extreme gradient boosting

2021 ◽  
Vol 2072 (1) ◽  
pp. 012005
Author(s):  
M Sumanto ◽  
M A Martoprawiro ◽  
A L Ivansyah

Abstract Machine learning is a branch of artificial intelligence in which a system learns automatically from experience without being explicitly programmed. The learning process starts from observing the data and then looking for patterns in it; the main purpose is to make computers learn automatically. In this study, we use machine learning to predict molecular atomization energies. Among the many available methods, we use two: a neural network and extreme gradient boosting. Both methods have several parameters that must be tuned so that the predicted atomization energy has the lowest possible error, and we search for suitable parameter values for both. For the neural network, finding good parameter values is difficult because training a model long enough to judge whether it is good or bad takes considerable time, whereas the extreme gradient boosting model trains much faster, so suitable parameter values are easier to find. This study also examines the effect of modifying the dataset by transforming the outputs with normalization and standardization, removing molecules containing Br atoms, and setting entries of the Coulomb matrix to 0 when the distance between atoms in the molecule exceeds 2 angstroms.
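
A minimal sketch, in Python with hypothetical array names, of two of the dataset modifications described above: zeroing Coulomb-matrix entries for atom pairs more than 2 angstroms apart, and applying standardization or normalization to the atomization-energy targets. The random arrays merely stand in for real molecular data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

def truncate_coulomb_matrix(coulomb, distances, cutoff=2.0):
    """Zero out Coulomb-matrix entries for atom pairs farther apart than `cutoff` (angstroms)."""
    truncated = coulomb.copy()
    truncated[distances > cutoff] = 0.0
    return truncated

# Hypothetical arrays for a single molecule: Coulomb matrix and pairwise distances.
rng = np.random.default_rng(0)
coulomb = rng.random((23, 23))
distances = rng.random((23, 23)) * 4.0
coulomb_cut = truncate_coulomb_matrix(coulomb, distances)

# Output transformations of the atomization-energy targets.
energies = rng.random((100, 1)) * -2000.0                # placeholder target values
standardized = StandardScaler().fit_transform(energies)  # zero mean, unit variance
normalized = MinMaxScaler().fit_transform(energies)      # rescaled to [0, 1]
```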

2021 ◽  
Vol 13 (6) ◽  
pp. 1147
Author(s):  
Xiangqian Li ◽  
Wenping Yuan ◽  
Wenjie Dong

To forecast the terrestrial carbon cycle and monitor food security, vegetation growth must be accurately predicted; however, current process-based ecosystem and crop-growth models are limited in their effectiveness. This study developed a machine learning model using the extreme gradient boosting method to predict vegetation growth throughout the growing season in China from 2001 to 2018. The model used satellite-derived vegetation data for the first month of each growing season, CO2 concentration, and several meteorological factors as data sources for the explanatory variables. Results showed that the model could reproduce the spatiotemporal distribution of vegetation growth as represented by the satellite-derived normalized difference vegetation index (NDVI). The predictive error for the growing season NDVI was less than 5% for more than 98% of vegetated areas in China; the model represented seasonal variations in NDVI well. The coefficient of determination (R2) between the monthly observed and predicted NDVI was 0.83, and more than 69% of vegetated areas had an R2 > 0.8. The effectiveness of the model was examined for a severe drought year (2009), and results showed that the model could reproduce the spatiotemporal distribution of NDVI even under extreme conditions. This model provides an alternative method for predicting vegetation growth and has great potential for monitoring vegetation dynamics and crop growth.
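
For illustration, a minimal sketch of an XGBoost regression of the kind described above, with a synthetic feature matrix standing in for the first-month NDVI, CO2 concentration, and meteorological drivers; the feature list and hyperparameters are assumptions, not the authors' exact configuration.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n = 5000
# Columns stand in for: first-month NDVI, CO2, temperature, precipitation, radiation.
X = rng.random((n, 5))
y = 0.4 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.02, n)  # synthetic growing-season NDVI

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=6)
model.fit(X_train, y_train)
print("test R2:", r2_score(y_test, model.predict(X_test)))
```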


2020 ◽  
Vol 58 (6) ◽  
pp. 413-422
Author(s):  
Jinyeong Yu ◽  
Myoungjae Lee ◽  
Young Hoon Moon ◽  
Yoojeong Noh ◽  
Taekyung Lee

Electropulse-induced heating has attracted attention due to its high energy efficiency. However, the process gives rise to a nonlinear temperature variation, which is difficult to predict using a traditional physics model. As an alternative, this study employed machine-learning technology to predict such temperature variation for the first time. Mg alloy was exposed to a single electropulse with a variety of pulse magnitudes and durations for this purpose. Nine machine-learning models were established using artificial neural network (ANN), deep neural network (DNN), and extreme gradient boosting (XGBoost) algorithms. The ANN models showed an insufficient predictive capability with respect to the region of peak temperature, where temperature varied most significantly. The DNN models were built by increasing model complexity, enhancing architectures, and tuning hyperparameters. They exhibited a remarkable improvement in predictive capability at the heating-cooling boundary as well as in overall estimation. As a result, the DNN-2 model in this group showed the best prediction of nonlinear temperature variation among the machine-learning models built in this study. The XGBoost model exhibited poor predictive performance when default hyperparameters were applied. However, hyperparameter tuning of learning rates and maximum depths resulted in a decent predictive capability with this algorithm. Furthermore, the XGBoost models exhibited an extreme reduction in learning time compared with the ANN and DNN models. This advantage is expected to be useful for predicting more complicated cases including various materials, multi-step electropulses, and electrically assisted forming.
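
As a minimal sketch of the tuning step mentioned above, the snippet below grid-searches an XGBoost regressor over learning rate and maximum depth; the feature names (pulse magnitude, pulse duration, elapsed time) and the synthetic temperature curve are placeholders, not the study's actual data.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

rng = np.random.default_rng(1)
# Columns stand in for: pulse magnitude, pulse duration, elapsed time.
X = rng.random((2000, 3))
y = 25 + 300 * X[:, 0] * X[:, 1] * np.exp(-X[:, 2])  # synthetic heating-cooling curve

param_grid = {"learning_rate": [0.01, 0.05, 0.1, 0.3],
              "max_depth": [3, 5, 7, 9]}
search = GridSearchCV(XGBRegressor(n_estimators=400), param_grid,
                      scoring="neg_mean_squared_error", cv=5)
search.fit(X, y)
print("best hyperparameters:", search.best_params_)
```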


2021 ◽  
Author(s):  
Eric Sonny Mathew ◽  
Moussa Tembely ◽  
Waleed AlAmeri ◽  
Emad W. Al-Shalabi ◽  
Abdul Ravoof Shaik

Abstract A meticulous interpretation of steady-state or unsteady-state relative permeability (Kr) experimental data is required to determine a complete set of Kr curves. In this work, three different machine learning models were developed to assist in a faster estimation of these curves from steady-state drainage coreflooding experimental runs. The three models that were tested and compared were extreme gradient boosting (XGB), deep neural network (DNN) and recurrent neural network (RNN) algorithms. Based on existing mathematical models, a leading-edge framework was developed in which a large database of Kr and Pc curves was generated. This database was used to perform thousands of coreflood simulation runs representing oil-water drainage steady-state experiments. The results obtained from these simulation runs, mainly pressure drop along with other conventional core analysis data, were utilized to estimate Kr curves based on Darcy's law. These analytically estimated Kr curves, along with the previously generated Pc curves, were fed as features into the machine learning models. The entire data set was split into 80% for training and 20% for testing. A k-fold cross-validation technique was applied to increase model accuracy by splitting the 80% training portion into 10 folds; in this manner, for each of the 10 experiments, 9 folds were used for training and the remaining one was used for model validation. Once trained and validated, the model was subjected to blind testing on the remaining 20% of the data set. The machine learning model learns to capture fluid flow behavior inside the core from the training dataset. The trained/tested model was thereby employed to estimate Kr curves based on available experimental results. The performance of the developed models was assessed using the coefficient of determination (R2) along with the loss calculated during training/validation. The respective cross plots, along with comparisons of ground-truth versus AI-predicted curves, indicate that the model is capable of making accurate predictions, with error percentages between 0.2 and 0.6% on history matching experimental data for all three tested ML techniques (XGB, DNN, and RNN). This implies that the AI-based model exhibits better efficiency and reliability in determining Kr curves when compared to conventional methods. The results also include a comparison between classical machine learning approaches and shallow and deep neural networks in terms of accuracy in predicting the final Kr curves. The various models discussed in this research work currently focus on the prediction of Kr curves for drainage steady-state experiments; however, the work can be extended to capture the imbibition cycle as well.
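
A minimal sketch of the splitting scheme described above: an 80/20 train/blind-test split with 10-fold cross-validation inside the training portion. The feature matrix and targets are random placeholders for the pressure-drop/Pc features and Kr-curve values, and XGBoost stands in for any of the three tested algorithms.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(2)
X, y = rng.random((1000, 12)), rng.random(1000)  # placeholder features / Kr targets

# 80% for training, 20% held out for blind testing.
X_train, X_blind, y_train, y_blind = train_test_split(X, y, test_size=0.2, random_state=0)

# 10-fold cross-validation within the training portion.
for fold, (tr, va) in enumerate(KFold(n_splits=10, shuffle=True, random_state=0).split(X_train)):
    model = XGBRegressor(n_estimators=300).fit(X_train[tr], y_train[tr])
    print(f"fold {fold}: R2 =", r2_score(y_train[va], model.predict(X_train[va])))

# Final blind test on the held-out 20%.
final = XGBRegressor(n_estimators=300).fit(X_train, y_train)
print("blind-test R2:", r2_score(y_blind, final.predict(X_blind)))
```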


Author(s):  
Dmitriy D. Matyushin ◽  
Anastasia Yu. Sholokhova ◽  
Aleksey K. Buryak

The estimation of gas chromatographic retention indices based on compound structures is an important problem. Predicted retention indices can be used in a mass spectral library search for the identification of unknowns. Various machine learning methods are used for this task, but methods based on decision trees, in particular gradient boosting, are not widely used. The aim of this work is to examine the usability of this method for retention index prediction. 177 molecular descriptors computed with the Chemistry Development Kit are used as the input representation of a molecule. Random subsets of the whole NIST 17 database are used as training, test and validation sets. 8000 trees with 6 leaves each are used. A neural network with one hidden layer (90 hidden nodes) is used for comparison. The same data sets and set of descriptors are used for the neural network and gradient boosting. The model based on gradient boosting outperforms the neural network with one hidden layer for subsets of NIST 17 and for the set of essential oils. The performance of this model is comparable to or better than the performance of other modern retention prediction models. The average relative deviation is ~3.0% and the median relative deviation is ~1.7% for subsets of NIST 17. The median absolute deviation is ~34 retention index units. Only non-polar liquid stationary phases (such as polydimethylsiloxane, 5% phenyl 95% polydimethylsiloxane, and squalane) are considered. Errors obtained with different machine learning algorithms and with the same representation of the molecule strongly correlate with each other.
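
A minimal sketch of a gradient-boosting retention-index model with the tree configuration quoted above (8000 trees of 6 leaves each). The 177 CDK descriptors are stood in for by a random matrix, and the choice of library (scikit-learn here) is an assumption rather than the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.random((500, 177))                          # placeholder for 177 CDK descriptors
y = 1000 + 2000 * X[:, 0] + rng.normal(0, 50, 500)  # synthetic retention indices

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
# 8000 trees with at most 6 leaves each, as in the configuration quoted above
# (training this many trees is slow; the tiny synthetic set keeps the demo short).
model = GradientBoostingRegressor(n_estimators=8000, max_leaf_nodes=6, learning_rate=0.05)
model.fit(X_tr, y_tr)

relative_deviation = np.abs(model.predict(X_te) - y_te) / y_te * 100
print("median relative deviation (%):", np.median(relative_deviation))
```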


Author(s):  
B. Meena Preethi ◽ 
R. Gowtham ◽  
S. Aishvarya ◽  
S. Karthick ◽  
...  

The project entitled “Rainfall Prediction using Machine Learning & Deep Learning Algorithms” is a research project developed in the Python language, with the dataset stored in Microsoft Excel. The prediction applies various machine learning and deep learning algorithms to find which algorithm predicts most accurately. Rainfall prediction can be framed as binary classification under data mining. Predicting rainfall is very important for a country in several respects and can help prevent serious natural disasters. For this prediction, an artificial neural network using forward and backward propagation, AdaBoost, Gradient Boosting, and XGBoost algorithms are used in this model. There are five modules in this project. The Data Analysis module analyses the datasets and finds the missing values in them. The Data Pre-processing module includes data cleaning, the process of filling in the missing values. The Feature Transformation module is used to modify the features of the dataset. The Data Mining module trains the models on the dataset with a given algorithm so that they learn its patterns. The Model Evaluation module measures the performance of each model and determines the best overall accuracy for the prediction. The dataset used in this prediction covers Australia. The main aim of the project is to compare the various boosting algorithms with the neural network and find the best algorithm among them. This prediction can be a major advantage to farmers, helping them plant types of crops according to their need for water. Overall, we analyse which algorithm is feasible for qualitatively predicting rainfall.
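
As a minimal sketch of the comparison described above, the snippet below trains AdaBoost, Gradient Boosting, and XGBoost classifiers on a synthetic binary "rain tomorrow" target and compares their accuracies; the real Australian weather features, the neural network baseline, and the pre-processing modules are omitted, and all names here are placeholders.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(4)
X = rng.random((5000, 10))                                                   # placeholder weather features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.2, 5000) > 0.9).astype(int)  # synthetic "rain tomorrow" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
for name, clf in [("AdaBoost", AdaBoostClassifier()),
                  ("Gradient Boosting", GradientBoostingClassifier()),
                  ("XGBoost", XGBClassifier(eval_metric="logloss"))]:
    clf.fit(X_tr, y_tr)
    print(name, "accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```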


Author(s):  
Robin Ghosh ◽  
Anirudh Reddy Cingreddy ◽  
Venkata Melapu ◽  
Sravanthi Joginipelli ◽  
Supratik Kar

Alzheimer's disease (AD) is one of the most common forms of dementia and the sixth-leading cause of death in older adults. The presented study has illustrated the applications of deep learning (DL) and associated methods, which could have a broader impact on identifying dementia stages and may guide therapy in the future for multiclass image detection. The studied datasets contain around 6,400 magnetic resonance imaging (MRI) images, each segregated by severity into four Alzheimer's classes: mild dementia, very mild dementia, non-dementia, and moderate dementia. These four image classes were used to classify the dementia stage of each patient by applying the convolutional neural network (CNN) algorithm. Employing the CNN-based in silico model, the authors successfully classified and predicted the different AD stages and achieved around 97.19% accuracy. In addition, machine learning (ML) techniques like extreme gradient boosting (XGB), support vector machine (SVM), k-nearest neighbor (KNN), and artificial neural network (ANN) offered accuracies of 96.62%, 96.56%, 94.62%, and 89.88%, respectively.
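
A minimal sketch of a four-class CNN classifier of the kind described above, written with Keras. The input size, layer widths, and random placeholder tensors are assumptions for illustration only; real MRI slices and labels would replace them.

```python
import numpy as np
from tensorflow.keras import layers, models

num_classes = 4  # mild, very mild, non-, and moderate dementia
X = np.random.rand(64, 128, 128, 1).astype("float32")  # placeholder MRI slices
y = np.random.randint(0, num_classes, 64)               # placeholder class labels

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```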


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mustafa Abed ◽  
Monzur Alam Imteaz ◽  
Ali Najah Ahmed ◽  
Yuk Feng Huang

Abstract Evaporation is a key element for water resource management, hydrological modelling, and irrigation system design. Monthly evaporation (Ep) was projected by deploying three machine learning (ML) models, namely Extreme Gradient Boosting, ElasticNet Linear Regression, and Long Short-Term Memory, and two empirical techniques, namely Stephens-Stewart and Thornthwaite. The aim of this study is to develop a reliable generalised model to predict evaporation throughout Malaysia. In this context, monthly meteorological statistics from two weather stations in Malaysia were utilised for training and testing the models on the basis of climatic variables such as maximum temperature, mean temperature, minimum temperature, wind speed, relative humidity, and solar radiation for the period 2000–2019. For every approach, multiple models were formulated by utilising various combinations of input parameters and other model factors. The performance of the models was assessed using standard statistical measures. The outcomes indicated that the three machine learning models outclassed the empirical models and could considerably enhance the precision of monthly Ep estimates even with the same combinations of inputs. In addition, the performance assessment showed that the Long Short-Term Memory neural network (LSTM) offered the most precise monthly Ep estimations of all the studied models for both stations. The LSTM-10 model performance measures were (R2 = 0.970, MAE = 0.135, MSE = 0.027, RMSE = 0.166, RAE = 0.173, RSE = 0.029) for Alor Setar and (R2 = 0.986, MAE = 0.058, MSE = 0.005, RMSE = 0.074, RAE = 0.120, RSE = 0.013) for Kota Bharu.
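
A minimal sketch of an LSTM regressor for monthly Ep of the kind described above, written with Keras. The six climatic inputs, the 12-month window, and the random arrays are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from tensorflow.keras import layers, models

n_samples, window, n_features = 200, 12, 6  # 6 inputs: Tmax, Tmean, Tmin, wind, RH, radiation
X = np.random.rand(n_samples, window, n_features).astype("float32")  # placeholder monthly sequences
y = np.random.rand(n_samples).astype("float32")                       # placeholder monthly Ep

model = models.Sequential([
    layers.LSTM(64, input_shape=(window, n_features)),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
```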


2019 ◽  
Vol 11 (23) ◽  
pp. 2801 ◽  
Author(s):  
Yonghong Zhang ◽  
Taotao Ge ◽  
Wei Tian ◽  
Yuei-An Liou

Debris flows have always been a serious problem in mountain areas. Research on the assessment of debris flow susceptibility (DFS) is useful for preventing and mitigating debris flow risks. The main purpose of this work is to study the DFS in the Shigatse area of Tibet using machine learning methods, after assessing the main triggering factors of debris flows. Remote sensing and geographic information systems (GIS) are used to obtain datasets of topography, vegetation, human activity, and soil factors for local debris flows. The problem of debris flow susceptibility level imbalances in the datasets is addressed by the Borderline-SMOTE method. Five machine learning methods, i.e., back propagation neural network (BPNN), one-dimensional convolutional neural network (1D-CNN), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost), have been used to analyze and fit the relationship between debris flow triggering factors and occurrence, and to evaluate the weight of each triggering factor. The ANOVA and Tukey HSD tests revealed that the XGBoost model exhibited the best mean accuracy (0.924) on ten-fold cross-validation and that its performance was significantly better than that of the BPNN (0.871), DT (0.816), and RF (0.901). However, the performance of XGBoost did not significantly differ from that of the 1D-CNN (0.914). This is also the first comparison experiment between the XGBoost and 1D-CNN methods in a DFS study. The DFS maps have been verified by five evaluation measures: precision, recall, F1 score, accuracy, and area under the curve (AUC). Experiments show that XGBoost has the best score, and the factors that have a greater impact on debris flows are aspect, annual average rainfall, profile curvature, and elevation.
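
A minimal sketch of the resampling-plus-classification pipeline described above: Borderline-SMOTE to rebalance the classes, followed by XGBoost evaluated with ten-fold cross-validation. The nine feature columns and the binary susceptibility labels are synthetic placeholders for the topography, vegetation, human-activity, and soil factors.

```python
import numpy as np
from imblearn.over_sampling import BorderlineSMOTE
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

rng = np.random.default_rng(5)
X = rng.random((3000, 9))                                             # placeholder triggering factors
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.3, 3000) > 1.7).astype(int)  # imbalanced susceptibility labels

# Rebalance the classes before fitting the classifier.
X_res, y_res = BorderlineSMOTE(random_state=0).fit_resample(X, y)
scores = cross_val_score(XGBClassifier(eval_metric="logloss"), X_res, y_res,
                         cv=10, scoring="accuracy")
print("mean 10-fold accuracy:", scores.mean())
```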


2021 ◽  
Vol 9 (4) ◽  
pp. 376 ◽  
Author(s):  
Yunfei Yang ◽  
Haiwen Tu ◽  
Lei Song ◽  
Lin Chen ◽  
De Xie ◽  
...  

Resistance is one of the important performance indicators of ships. In this paper, a prediction method based on the radial basis function neural network (RBFNN) is proposed to predict the resistance of a 13,500 twenty-foot equivalent unit (13500TEU) container ship at different drafts. Prediction at a draft within the known range is called interpolation prediction; otherwise, it is extrapolation prediction. First, ship features are extracted to predict the resistance Rt. The resistance prediction results show that the performance of the RBFNN is significantly better than that of the other four machine learning models: backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost). Then, the ship data are made dimensionless, and the models mentioned above are used to predict the total resistance coefficient Ct of the container ship. The prediction results show that the RBFNN model still performs well, and good results can be obtained by the RBFNN in interpolation prediction even when using only part of the dimensionless features. Finally, the accuracy of the prediction method based on the RBFNN is greatly improved compared with the modified admiralty coefficient.
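
A minimal sketch of a radial basis function network regressor of the kind used above: k-means centres, Gaussian activations, and a linear (ridge) output layer. The ship features, the resistance targets, and the kernel width are placeholder assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def rbf_features(X, centers, gamma):
    """Gaussian activations for every sample/centre pair."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(6)
X = rng.random((800, 4))               # placeholder ship features (e.g. draft, speed)
y = 50 + 200 * X[:, 0] * X[:, 1] ** 2  # synthetic resistance values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
centers = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X_tr).cluster_centers_
model = Ridge(alpha=1e-3).fit(rbf_features(X_tr, centers, gamma=5.0), y_tr)
pred = model.predict(rbf_features(X_te, centers, gamma=5.0))
print("test RMSE:", float(np.sqrt(np.mean((pred - y_te) ** 2))))
```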

