Estimation of rainfall erosivity factor in Italy and Switzerland using Bayesian optimization based machine learning models

CATENA ◽  
2022 ◽  
Vol 211 ◽  
pp. 105957
Author(s):  
Seoro Lee ◽  
Joo Hyun Bae ◽  
Jiyeong Hong ◽  
Dongseok Yang ◽  
Panos Panagos ◽  
...  
2021 ◽  
Vol 23 (2) ◽  
pp. 359-370
Author(s):  
Michał Matuszczak ◽  
Mateusz Żbikowski ◽  
Andrzej Teodorczyk

The article proposes an approach based on deep learning and machine learning models to predict component failure, as an enhancement of the condition-based maintenance scheme of a turbofan engine, and reviews the prognostics approaches currently used in the aviation industry. A component degradation scale representing life consumption is proposed, and the condition data collected in this way are combined with engine sensor and environmental data. Using data manipulation techniques, a framework for model training is created, and the models' hyperparameters are obtained through Bayesian optimization. The models predict a continuous variable representing the component condition based on the input. The best-performing model is identified by determining its score on the holdout set. Deep learning models achieved an MSE of 0.71 (ensemble meta-model of neural networks) and significantly outperformed the machine learning models, whose best score was 1.75. The deep learning models demonstrated that they can predict the component condition to within less than 1 unit of error on the rank scale.
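The selection step described in this abstract (score each candidate on a holdout set, keep the one with the lowest MSE) can be sketched as follows. This is only an illustration: the function names, the averaging meta-model, and all numbers are assumptions made up for the example, not the authors' models or data.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("length mismatch")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def average_ensemble(predictions):
    """Average several models' predictions element-wise (a very simple meta-model)."""
    return [sum(column) / len(column) for column in zip(*predictions)]


def best_model(y_holdout, candidates):
    """Return (name, score) of the candidate with the lowest holdout MSE."""
    scores = {name: mse(y_holdout, preds) for name, preds in candidates.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


# Made-up degradation ranks and predictions, for illustration only.
y_holdout = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "net_a": [1.5, 2.5, 2.5, 4.5],
    "net_b": [0.5, 1.5, 3.5, 4.0],
}
candidates["ensemble"] = average_ensemble([candidates["net_a"], candidates["net_b"]])
winner, score = best_model(y_holdout, candidates)
```

In this toy data the two networks err in opposite directions, so their average scores better than either one, which is the usual motivation for an ensemble meta-model.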


2021 ◽  
Vol 2069 (1) ◽  
pp. 012143
Author(s):  
Sorana Ozaki ◽  
Ryozo Ooka ◽  
Shintaro Ikeda

Abstract The operational energy of buildings makes up one of the highest proportions of life-cycle carbon emissions. A more efficient operation of facilities would result in significant energy savings but requires computational models that predict a building’s future energy demands with high precision. To this end, various machine learning models have been proposed in recent years. The prediction accuracy of these models, however, depends strongly on their internal structure and hyperparameters. The time and expertise required for fine-tuning them call for a more efficient solution. In the context of a case study, this paper describes the relationship between a machine learning model’s prediction accuracy and its hyperparameters. The data are time-stamped recordings of outdoor temperatures and electricity demands of a hospital in Japan, recorded every 30 minutes for more than four years. Using a deep neural network (DNN) ensemble model, electricity demands were predicted for the following sixty time steps. Specifically, we used automatic hyperparameter tuning methods, such as grid search, random search, and Bayesian optimization. For predictions a single time step ahead, all tuning methods reduced the RMSE to less than 50% of the value obtained without optimized tuning. The results attest to machine learning models’ reliance on hyperparameters and the effectiveness of their automatic tuning.
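To make the difference between two of the tuning methods named above concrete, here is a minimal sketch of grid search and random search. The function `validation_rmse` is a hypothetical stand-in for "train the model and score it on validation data"; its shape, the two hyperparameters, and their ranges are assumptions chosen for illustration and do not come from the paper.

```python
import itertools
import random


def validation_rmse(learning_rate, hidden_units):
    """Hypothetical validation RMSE; the minimum is at learning_rate=0.01, hidden_units=64."""
    return 1.0 + 100.0 * (learning_rate - 0.01) ** 2 + ((hidden_units - 64) / 64.0) ** 2


def grid_search(rates, units):
    """Evaluate every combination on the grid and keep the best (rmse, rate, units)."""
    trials = ((validation_rmse(lr, hu), lr, hu) for lr, hu in itertools.product(rates, units))
    return min(trials, key=lambda item: item[0])


def random_search(n_trials, seed=0):
    """Sample configurations at random and keep the best (rmse, rate, units)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-4, -1)  # log-uniform between 1e-4 and 1e-1
        hu = rng.randint(8, 256)        # inclusive integer range
        trials.append((validation_rmse(lr, hu), lr, hu))
    return min(trials, key=lambda item: item[0])


baseline = validation_rmse(0.1, 8)  # an arbitrary, untuned configuration
grid_best = grid_search([0.001, 0.01, 0.1], [16, 64, 256])
random_best = random_search(n_trials=30)
```

Grid search costs one evaluation per grid cell (nine here) and can only return values that were placed on the grid, while random search spends a fixed budget anywhere in the range. Bayesian optimization differs from both in that it uses earlier results to choose the next configuration.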


2019 ◽  
Author(s):  
Pascal Friederich ◽  
Gabriel dos Passos Gomes ◽  
Riccardo De Bin ◽  
Alan Aspuru-Guzik ◽  
David Balcells

Machine learning models, including neural networks, Bayesian optimization, gradient boosting and Gaussian processes, were trained with DFT data for the accurate, affordable and explainable prediction of hydrogen activation barriers in the chemical space surrounding Vaska's complex.


2021 ◽  
Vol 7 ◽  
Author(s):  
Qin-Yu Zhao ◽  
Le-Ping Liu ◽  
Jing-Chao Luo ◽  
Yan-Wei Luo ◽  
Huan Wang ◽  
...  

Background: Sepsis-induced coagulopathy (SIC) denotes an increased mortality rate and poorer prognosis in septic patients. Objectives: Our study aimed to develop and validate machine-learning models to dynamically predict the risk of SIC in critically ill patients with sepsis. Methods: Machine-learning models were developed and validated based on two public databases named Medical Information Mart for Intensive Care (MIMIC)-IV and the eICU Collaborative Research Database (eICU-CRD). Dynamic prediction of SIC involved an evaluation of the risk of SIC each day after the diagnosis of sepsis using 15 predictive models. The best model was selected based on its accuracy and area under the receiver operating characteristic curve (AUC), followed by fine-grained hyperparameter adjustment using the Bayesian Optimization Algorithm. A compact model was developed, based on 15 features selected according to their importance and clinical availability. These two models were compared with Logistic Regression and SIC scores in terms of SIC prediction. Results: Of 11,362 patients in MIMIC-IV included in the final cohort, a total of 6,744 (59%) patients developed SIC during sepsis. The model named Categorical Boosting (CatBoost) had the greatest AUC in our study (0.869; 95% CI: 0.850–0.886). Coagulation profile and renal function indicators were the most important features for predicting SIC. A compact model was developed with an AUC of 0.854 (95% CI: 0.832–0.872), while the AUCs of Logistic Regression and SIC scores were 0.746 (95% CI: 0.735–0.755) and 0.709 (95% CI: 0.687–0.733), respectively. A cohort of 35,252 septic patients in eICU-CRD was analyzed. The AUCs of the full and the compact models in the external validation were 0.842 (95% CI: 0.837–0.846) and 0.803 (95% CI: 0.798–0.809), respectively, which were still larger than those of Logistic Regression (0.660; 95% CI: 0.653–0.667) and SIC scores (0.752; 95% CI: 0.747–0.757). Prediction results were illustrated by SHapley Additive exPlanations (SHAP) values, which made our models clinically interpretable. Conclusions: We developed two models which were able to dynamically predict the risk of SIC in septic patients better than conventional Logistic Regression and SIC scores.
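Since the models above are compared by AUC, a short sketch of what that number measures may help. It uses the rank interpretation of AUC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The labels and scores are made up for illustration and are unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = event, 0 = no event) and predicted risks.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

Here eight of the nine positive/negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of cases; it is written for clarity, not for cohorts of the size reported above.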


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Van-Hai Nguyen ◽  
Tien-Thinh Le ◽  
Hoanh-Son Truong ◽  
Minh Vuong Le ◽  
Van-Luc Ngo ◽  
...  

This paper deals with the prediction of surface roughness in the manufacturing of polycarbonate (PC) by applying Bayesian optimization to machine learning models. The input variables of ultraprecision turning (feed rate, depth of cut, spindle speed, and vibration along the X-, Y-, and Z-axes) are the main factors affecting surface quality. In this research, six machine learning (ML) based models were applied to predict the surface roughness (Ra): artificial neural network (ANN), CatBoost Regression (CAT), Support Vector Regression (SVR), Gradient Boosting Regression (GBR), Decision Tree Regression (DTR), and Extreme Gradient Boosting Regression (XGB). The predictive performance of the baseline models was quantitatively assessed through error metrics: root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²). The overall results indicate that the XGB and CAT models predict Ra with the greatest accuracy. Bayesian optimization (BO) is then used to determine the best hyperparameters of the XGB and CAT baseline models, and the results indicate that XGB is the best model according to the evaluation metrics. The performance of the models improved significantly with BO. For example, on the training dataset the RMSE and MAE of XGB decreased from 0.0076 to 0.0047 and from 0.0063 to 0.0027, respectively. On the testing dataset, the RMSE and MAE of XGB decreased from 0.4033 to 0.2512 and from 0.2845 to 0.2225, respectively. Moreover, the vibrations of the X, Y, and Z axes and the feed rate are the most significant features in predicting the results, which agrees well with the literature. We find that, in a specified value domain, the vibration of the axes has a greater influence on the surface quality than the cutting conditions do.
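The three error metrics named above have standard definitions, sketched here in plain Python together with a small helper that turns a before/after pair into a relative reduction. The sample vectors are made up; only the two RMSE and MAE pairs passed to `relative_reduction` are taken from the abstract's reported test-set values.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_reduction(before, after):
    """Fraction by which a metric dropped, e.g. 0.25 for a 25% reduction."""
    return (before - after) / before


# Made-up values to exercise the metric functions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
sample_rmse = rmse(y_true, y_pred)
sample_mae = mae(y_true, y_pred)
sample_r2 = r_squared(y_true, y_pred)

# Test-set values of XGB as reported in the abstract (baseline, after BO).
rmse_drop = relative_reduction(0.4033, 0.2512)
mae_drop = relative_reduction(0.2845, 0.2225)
```

Applied to the reported test-set figures, the helper gives a drop of roughly 38% in RMSE and roughly 22% in MAE. The single large error in the sample vector also shows why RMSE (1.0) exceeds MAE (0.5): squaring weights large residuals more heavily.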


2018 ◽  
Vol 10 (1) ◽  
Author(s):  
Wang-Chi Cheung ◽  
Weiwen Zhang ◽  
Yong Liu ◽  
Feng Yang ◽  
Rick-Siow-Mong Goh

Recent studies have revealed the success of data-driven machine health monitoring, which motivates the use of machine learning models in machine health prognostic tasks. Although the machine learning approach to health monitoring is gaining importance, the construction of machine learning models is often impeded by the difficulty of choosing the underlying hyper-parameter configuration (HP-config), which governs the construction of the machine learning model. An effective HP-config can be chosen with human effort, but such an effort is often time-consuming and requires domain knowledge. In this paper, we consider the use of Bayesian optimization algorithms, which automate an effective choice of HP-config by solving the associated hyperparameter optimization problem. Numerical experiments on the data from the PHM 2016 Data Challenge demonstrate the salience of the proposed automatic framework and show improvement over the default HP-configs in standard machine learning packages and over those chosen by a human agent.
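For readers unfamiliar with how Bayesian optimization automates the choice of an HP-config, the loop can be sketched in one dimension: fit a Gaussian-process surrogate to the losses seen so far, then evaluate the candidate with the highest expected improvement. Everything below is an assumption made for illustration (the RBF kernel and its length scale, the candidate grid, the toy `objective`, a single hyperparameter scaled to [0, 1]); it is not the algorithm used in the paper.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a constant-mean GP at x_new."""
    offset = sum(ys) / len(ys)
    gram = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    weights = solve(gram, [y - offset for y in ys])
    mean = offset + sum(k * w for k, w in zip(k_star, weights))
    reduction = sum(k * v for k, v in zip(k_star, solve(gram, k_star)))
    return mean, math.sqrt(max(1.0 - reduction, 0.0))


def expected_improvement(mean, std, best):
    """Expected improvement over the best (lowest) loss observed so far."""
    if std == 0.0:
        return 0.0
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mean) * cdf + std * pdf


def bayesian_optimize(objective, init_points, n_iter=10, grid_size=101):
    """Minimize objective on [0, 1] by repeatedly maximizing expected improvement."""
    xs = list(init_points)
    ys = [objective(x) for x in xs]
    candidates = [i / (grid_size - 1) for i in range(grid_size)]
    for _ in range(n_iter):
        best = min(ys)
        scored = [(expected_improvement(*gp_posterior(xs, ys, c), best), c)
                  for c in candidates if c not in xs]
        _, x_next = max(scored)
        xs.append(x_next)
        ys.append(objective(x_next))
    best_index = ys.index(min(ys))
    return xs[best_index], ys[best_index]


def objective(x):
    """Hypothetical validation loss over one normalized hyperparameter."""
    return (x - 0.3) ** 2 + 0.1


best_x, best_y = bayesian_optimize(objective, init_points=[0.0, 0.5, 1.0])
```

Each iteration costs one evaluation of the objective, which in practice means one training run, so the appeal of the method is that the surrogate decides where that expensive evaluation is spent. Real HP-configs are multi-dimensional and often mix continuous and categorical values, which this sketch does not handle.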


2020 ◽  
Author(s):  
Qin-Yu Zhao ◽  
Le-Ping Liu ◽  
Jing-Chao Luo ◽  
Yan-Wei Luo ◽  
Huan Wang ◽  
...  

Abstract Background: Sepsis-induced coagulopathy (SIC) denotes an increased mortality rate and poorer prognosis in septic patients. Methods: Machine-learning models were developed based on septic patients who were older than 18 years and stayed in intensive care units (ICUs) for more than 24 hours in Medical Information Mart for Intensive Care (MIMIC)-IV. Eighty-eight potential predictors were extracted, and 15 machine-learning models assessed the daily risk of SIC. The most potent model was selected based on its accuracy and area under the receiver operating characteristic curve (AUC), followed by fine-grained hyperparameter adjustment using the Bayesian Optimization Algorithm. The effects of features on prediction scores were measured using SHapley Additive exPlanations (SHAP) values. A compact model was developed, based on 15 features selected according to their importance and clinical availability. The two models were compared with Logistic Regression and SIC scores in terms of SIC prediction. Additionally, an external validation was performed in the eICU Collaborative Research Database (eICU-CRD). Results: Of 11362 patients in MIMIC-IV included in the final cohort, a total of 6744 (59%) patients had SIC during sepsis, and 16183 samples were extracted. The model named Categorical Boosting (CatBoost) had the greatest AUC in our study (0.869 [0.850, 0.886]). Coagulation profile and renal function indicators were the most important features for predicting SIC. A compact model was developed with an AUC of 0.854 [0.832, 0.872], while the AUCs of Logistic Regression and SIC scores were 0.746 [0.735, 0.755] and 0.709 [0.687, 0.733], respectively. A cohort of 35252 septic patients in eICU-CRD was analyzed. The AUCs of the full and the compact models in external validation were 0.842 [0.837, 0.846] and 0.803 [0.798, 0.809], respectively, which were still larger than those of Logistic Regression (0.660 [0.653, 0.667]) and SIC scores (0.752 [0.747, 0.757]). Prediction results can be illustrated at the instance level by using SHAP values, which makes our models clinically interpretable. Conclusions: We developed two models which were able to dynamically predict the risk of SIC in septic patients better than conventional Logistic Regression and SIC scores. Prediction results of our two models can be interpreted by using SHAP values.
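The instance-level interpretation mentioned above rests on Shapley values, which split one prediction's difference from a baseline among the input features. A minimal sketch of the exact computation follows; it averages each feature's marginal contribution over all feature orderings. The `toy_risk` function, its three binary features, and its coefficients are hypothetical and are not the study's model. The SHAP library uses faster approximations and model-specific algorithms rather than this brute-force enumeration.

```python
import itertools
import math


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(instance)
    contributions = [0.0] * n
    for order in itertools.permutations(range(n)):
        current = list(baseline)
        previous = model(current)
        for feature in order:
            # Switch this feature from its baseline value to the instance's value.
            current[feature] = instance[feature]
            value = model(current)
            contributions[feature] += value - previous
            previous = value
    return [c / math.factorial(n) for c in contributions]


def toy_risk(features):
    """Hypothetical risk score with one interaction term; not a clinical model."""
    platelets_low, creatinine_high, inr_high = features
    return (0.1 + 0.3 * inr_high + 0.2 * platelets_low
            + 0.1 * platelets_low * creatinine_high)


instance = [1.0, 1.0, 1.0]   # all three hypothetical flags present
baseline = [0.0, 0.0, 0.0]   # reference patient with none of them
phi = shapley_values(toy_risk, instance, baseline)
```

The interaction term is split evenly between the two features involved, and the values add up to the gap between the instance's prediction and the baseline's. The number of orderings grows factorially, so exact enumeration is only practical for a handful of features.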


2021 ◽  
Author(s):  
Mohammadtaghi Avand ◽  
Maziar Mohammadi ◽  
Fahimeh Mirchooli ◽  
Ataollah Kavian ◽  
John P Tiefenbacher

Abstract Despite advances in artificial intelligence modelling, the lack of soil erosion data and other watershed information is still one of the important factors limiting soil-erosion modelling. Additionally, the limited number of parameters and the lack of evaluation criteria are major disadvantages of empirical soil-erosion models. To overcome these limitations, we introduce a new approach that integrates empirical and artificial intelligence models. Erosion-prone locations (erosion ≥16 tons/ha/year) are identified using the RUSLE model, and a soil-erosion map is prepared using random forest (RF), artificial neural network (ANN), classification tree analysis (CTA), and generalized linear model (GLM). This study uses 13 factors affecting soil erosion in the Talar watershed, Iran, to increase prediction accuracy. The results reveal that the RF model has the highest prediction performance (AUC=0.95, Kappa=0.87, Accuracy=0.93, and Bias=0.88), outperforming the other three machine-learning models. The results show that slope angle, land use/land cover, elevation, and rainfall erosivity are the factors that contribute the most to soil erosion propensity in the watershed. Curvature and topographic position index (TPI) were removed from the analysis due to multicollinearity with other factors. The results can be used to improve the identification of hot spots of soil erosion, especially in watersheds for which soil-erosion data are limited.
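Two of the scores reported above, accuracy and Cohen's kappa, can both be read off a binary confusion matrix. The sketch below shows the standard formulas; the counts are made up for illustration and have no connection to the Talar watershed results.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Accuracy and Cohen's kappa from the four cells of a binary confusion matrix.

    Kappa rescales accuracy by the agreement expected from chance alone,
    computed from the row and column totals. It is undefined when that
    chance agreement equals 1.
    """
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Made-up counts: 100 locations, half of them truly erosion-prone.
accuracy, kappa = accuracy_and_kappa(tp=45, fp=5, fn=5, tn=45)
```

With balanced classes chance agreement is 0.5, so an accuracy of 0.9 corresponds to a kappa of 0.8. On imbalanced data the two can diverge sharply, which is one reason to report both.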


Electronics ◽  
2019 ◽  
Vol 8 (5) ◽  
pp. 579 ◽  
Author(s):  
Baosu Guo ◽  
Jingwen Hu ◽  
Wenwen Wu ◽  
Qingjin Peng ◽  
Fenghe Wu

Machine learning algorithms have been widely used to deal with a variety of practical problems such as computer vision and speech processing. However, the performance of machine learning algorithms depends primarily on their hyper-parameters: without good hyper-parameter values, these algorithms perform very poorly. Unfortunately, for complex machine learning models like deep neural networks, it is very difficult to determine their hyper-parameters. Therefore, it is of great significance to develop an efficient algorithm for automatic hyper-parameter optimization. In this paper, a novel hyper-parameter optimization methodology is presented that combines the advantages of a Genetic Algorithm and Tabu Search to search efficiently for the hyper-parameters of learning algorithms. This method is defined as the Tabu_Genetic Algorithm. In order to verify the performance of the proposed algorithm, two sets of contrast experiments are conducted. The Tabu_Genetic Algorithm and four other methods are simultaneously used to search for good values of hyper-parameters of deep convolutional neural networks. Experimental results show that, compared to Random Search and Bayesian optimization methods, the proposed Tabu_Genetic Algorithm finds a better model in less time. Whether in a low-dimensional or high-dimensional space, the Tabu_Genetic Algorithm has better search capabilities as an effective method for finding the hyper-parameters of learning algorithms. The method presented in this paper provides a new solution to the hyper-parameter optimization problem of complex machine learning models, which will provide machine learning algorithms with better performance when solving practical problems.
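The abstract does not spell out how the two techniques are combined, so the following is only a loose sketch of one way to pair them: a genetic search (selection, crossover, mutation) that keeps a tabu list of configurations already evaluated and never spends a second evaluation on any of them. It should not be read as the authors' Tabu_Genetic Algorithm. The `loss` function, the two integer hyper-parameters, and all settings are assumptions for illustration.

```python
import random


def loss(config):
    """Hypothetical validation loss over two integer hyper-parameters."""
    layers, filters = config
    return (layers - 4) ** 2 + ((filters - 32) / 8) ** 2


def tabu_genetic_search(bounds, generations=20, population_size=8, seed=0):
    """Genetic search that never re-evaluates a configuration on the tabu list."""
    rng = random.Random(seed)
    tabu = {}  # configuration -> loss; doubles as the evaluation cache

    def random_config():
        return tuple(rng.randint(lo, hi) for lo, hi in bounds)

    def mutate(config):
        child = list(config)
        i = rng.randrange(len(child))
        lo, hi = bounds[i]
        child[i] = min(hi, max(lo, child[i] + rng.choice([-2, -1, 1, 2])))
        return tuple(child)

    def crossover(a, b):
        return tuple(rng.choice(pair) for pair in zip(a, b))

    def fresh(make, attempts=20):
        """Draw candidates until one is not tabu; give up after a few attempts."""
        for _ in range(attempts):
            candidate = make()
            if candidate not in tabu:
                return candidate
        return None

    population = []
    while len(population) < population_size:
        config = fresh(random_config)
        if config is None:
            break
        tabu[config] = loss(config)
        population.append(config)

    for _ in range(generations):
        # Keep the better half as parents, then breed non-tabu children from them.
        parents = sorted(population, key=tabu.get)[: max(2, population_size // 2)]
        children = []
        for _ in range(population_size):
            child = fresh(lambda: mutate(crossover(*rng.sample(parents, 2))))
            if child is not None:
                tabu[child] = loss(child)
                children.append(child)
        population = sorted(parents + children, key=tabu.get)[:population_size]

    best = min(tabu, key=tabu.get)
    return best, tabu[best], len(tabu)


best_config, best_loss, evaluations = tabu_genetic_search([(1, 8), (8, 64)])
```

The tabu list matters most when each evaluation is a full training run: skipping repeats means the evaluation budget is spent only on new configurations. The search space here is small (456 configurations), so the sketch assumes the initial population can always be filled with distinct configurations.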

