An Interpretable Extreme Gradient Boosting Model to Predict Ash Fusion Temperatures

Minerals ◽  
2020 ◽  
Vol 10 (6) ◽  
pp. 487
Author(s):  
Maciej Rzychoń ◽  
Alina Żogała ◽  
Leokadia Róg

The hemispherical temperature (HT) is the most important indicator of ash fusion temperatures (AFTs) used in Polish industry to assess the suitability of coal for combustion and gasification. For safe operation and energy saving, it is important to know, or to be able to predict, the value of this parameter. In this study, a non-linear model predicting the HT value from the ash oxide content of 360 coal samples from the Upper Silesian Coal Basin was developed. The proposed model was built with a machine learning method, the extreme gradient boosting (XGBoost) regressor. An important feature of models based on the XGBoost algorithm is the ability to determine the impact of individual input parameters on the predicted value using the feature importance (FI) technique. This method identified the ash oxides with the greatest impact on the predicted HT. The partial dependence plot (PDP) technique was then used to visualize the effect of individual oxides on the predicted value. The results indicate that the proposed model can estimate the value of HT with high accuracy: the coefficient of determination (R2) of the prediction reached 0.88.
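The partial dependence idea mentioned in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the stand-in model `toy_ht_model`, its coefficients, the feature names `oxide_a` and `oxide_b`, and the sample values are all invented for the example. In practice `predict` would be a trained regressor.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average model prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite the chosen feature in every row; other inputs stay as observed.
        modified = [{**row, feature: value} for row in rows]
        curve.append(mean(predict(row) for row in modified))
    return curve


# Hypothetical stand-in for a trained HT regressor (coefficients are invented).
def toy_ht_model(row):
    return 1200.0 + 8.0 * row["oxide_a"] - 5.0 * row["oxide_b"]


# Invented oxide contents (wt%) for three imaginary samples.
samples = [
    {"oxide_a": 20.0, "oxide_b": 10.0},
    {"oxide_a": 25.0, "oxide_b": 6.0},
    {"oxide_a": 30.0, "oxide_b": 8.0},
]

pdp_curve = partial_dependence(toy_ht_model, samples, "oxide_a", [20.0, 30.0])
print(pdp_curve)
```

Plotting `pdp_curve` against the grid values gives the kind of curve a PDP shows: how the average prediction moves as one input changes while the others keep their observed values.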

2020 ◽  
Vol 10 (18) ◽  
pp. 6619
Author(s):  
Po-Jiun Wen ◽  
Chihpin Huang

Noise prediction using machine learning has recently received increased attention, particularly for workplaces with noise pollution, where laborers face increased noise exposure. This study analyzes the noise equivalent level (Leq) at the National Synchrotron Radiation Research Center (NSRRC) facility and establishes a machine learning model for noise prediction. A gradient boosting model (GBM) was used as the learning model, integrating past noise measurement records and many other features to make a prediction. The study analyzed the time duration and frequency of the collected Leq and also investigated the impact of training data selection. The results indicate that the proposed prediction model works well for almost all noise sensors and frequencies. Moreover, the model performed especially well for sensor 8 (125 Hz), which past noise measurements had identified as a serious noise zone. The results also show that the root-mean-square error (RMSE) of the predicted harmful noise was less than 1 dBA and the coefficient of determination (R2) was greater than 0.7; that is, the proposed method showed favorable noise prediction performance in the working field. This positive result demonstrates the ability of the proposed approach to predict noise and thus to notify laborers so that they can avoid long-term exposure. In addition, the proposed model accurately predicts future noise pollution, which is essential for laborers in high-noise environments and would help keep employees healthy by steering them away from positions with harmful noise.
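The two accuracy measures quoted in this abstract, RMSE and the coefficient of determination, follow from their standard definitions. The sketch below uses invented noise levels purely for illustration; none of the numbers come from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented Leq readings (dBA) and predictions, for illustration only.
observed = [60.0, 62.0, 64.0, 66.0]
predicted = [61.0, 62.0, 63.0, 66.0]

error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
print(round(error, 4), round(fit, 4))
```

RMSE is expressed in the units of the target (here dBA), while R2 is unitless, which is why abstracts such as this one report both.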


2021 ◽  
Vol 13 (7) ◽  
pp. 3727
Author(s):  
Fatema Rahimi ◽  
Abolghasem Sadeghi-Niaraki ◽  
Mostafa Ghodousi ◽  
Soo-Mi Choi

In dangerous circumstances, knowledge of population distribution at the best spatial-temporal resolution is essential for urban infrastructure architecture, policy-making, and urban planning. The present study investigated the spatial-temporal modeling of the population distribution in the case study area. The numbers of generated and absorbed trips were first calculated from taxi pick-up and drop-off location data, and the census population was then allocated to each neighborhood. Finally, the spatial-temporal distribution of the population was calculated using the developed model. To evaluate the model, a regression analysis between the census population and the predicted population for the period from 21:00 to 23:00 was used. The numbers of generated and absorbed trips showed different spatial distributions at different hours of the day, and the spatial pattern of the population distribution during the day differed from that at night. The coefficient of determination (R2) of the regression analysis for the model was 0.9998, and the mean squared error was 10.78. The regression analysis showed that the model works well for the nighttime population at the neighborhood level, so the proposed model will be suitable for the daytime population.


Author(s):  
Irfan Ullah Khan ◽  
Nida Aslam ◽  
Malak Aljabri ◽  
Sumayh S. Aljameel ◽  
Mariam Moataz Aly Kamaleldin ◽  
...  

The COVID-19 outbreak is currently one of the biggest challenges facing countries around the world. Millions of people have lost their lives due to COVID-19. Therefore, the accurate early detection and identification of severe COVID-19 cases can reduce the mortality rate and the likelihood of further complications. Machine Learning (ML) and Deep Learning (DL) models have been shown to be effective in the detection and diagnosis of several diseases, including COVID-19. This study used ML algorithms, namely Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and K-Nearest Neighbor (KNN), and a DL model (containing six layers with ReLU activation and an output layer with sigmoid activation) to predict the mortality rate in COVID-19 cases. Models were trained using data on confirmed COVID-19 patients from 146 countries. A comparative analysis of the ML and DL models was performed using a reduced feature set. The best results were achieved using the proposed DL model, with an accuracy of 0.97. Experimental results reveal the significance of the proposed model over the baseline study in the literature with the reduced feature set.


2021 ◽  
Vol 13 (6) ◽  
pp. 1147
Author(s):  
Xiangqian Li ◽  
Wenping Yuan ◽  
Wenjie Dong

To forecast the terrestrial carbon cycle and monitor food security, vegetation growth must be accurately predicted; however, current process-based ecosystem and crop-growth models are limited in their effectiveness. This study developed a machine learning model using the extreme gradient boosting method to predict vegetation growth throughout the growing season in China from 2001 to 2018. The model used satellite-derived vegetation data for the first month of each growing season, CO2 concentration, and several meteorological factors as data sources for the explanatory variables. Results showed that the model could reproduce the spatiotemporal distribution of vegetation growth as represented by the satellite-derived normalized difference vegetation index (NDVI). The predictive error for the growing season NDVI was less than 5% for more than 98% of vegetated areas in China; the model represented seasonal variations in NDVI well. The coefficient of determination (R2) between the monthly observed and predicted NDVI was 0.83, and more than 69% of vegetated areas had an R2 > 0.8. The effectiveness of the model was examined for a severe drought year (2009), and results showed that the model could reproduce the spatiotemporal distribution of NDVI even under extreme conditions. This model provides an alternative method for predicting vegetation growth and has great potential for monitoring vegetation dynamics and crop growth.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Arturo Moncada-Torres ◽  
Marissa C. van Maaren ◽  
Mathijs P. Hendriks ◽  
Sabine Siesling ◽  
Gijs Geleijnse

Cox Proportional Hazards (CPH) analysis is the standard for survival analysis in oncology. Recently, several machine learning (ML) techniques have been adapted for this task. Although they have been shown to yield results at least as good as classical methods, they are often disregarded because of their lack of transparency and little to no explainability, which are key for their adoption in clinical settings. In this paper, we used data from the Netherlands Cancer Registry of 36,658 non-metastatic breast cancer patients to compare the performance of CPH with ML techniques (Random Survival Forests, Survival Support Vector Machines, and Extreme Gradient Boosting [XGB]) in predicting survival using the c-index. We demonstrated that in our dataset, ML-based models can perform at least as well as the classical CPH regression (c-index ∼ 0.63), and in the case of XGB even better (c-index ∼ 0.73). Furthermore, we used Shapley Additive Explanation (SHAP) values to explain the models’ predictions. We concluded that the difference in performance can be attributed to XGB’s ability to model nonlinearities and complex interactions. We also investigated the impact of specific features on the models’ predictions as well as their corresponding insights. Lastly, we showed that explainable ML can generate explicit knowledge of how models make their predictions, which is crucial in increasing the trust and adoption of innovative ML techniques in oncology and healthcare overall.
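The c-index used here as the comparison metric can be illustrated with a simplified version of Harrell's concordance index. This sketch is an assumption-laden toy, not the paper's evaluation code: it ignores tied survival times, and the times, event flags, and risk scores are invented.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's c-index: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had an observed event strictly
    before time j. It is concordant when the model gave i the higher risk.
    Tied risk scores count as half. Tied times are not handled in this sketch.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Invented follow-up times, event indicators (1 = event, 0 = censored),
# and model risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]

c_index = concordance_index(times, events, risks)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported values of roughly 0.63 and 0.73.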


Healthcare ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 39
Author(s):  
Yuqing Liang ◽  
Wanwan Zheng ◽  
Woon-Seek Lee

Background: although China’s total health expenditure has increased dramatically so that the country can cope with its aging population, inequalities among individuals in terms of their medical expenditures (relative to their income level) have exacerbated health problems among older adults. This study examines the nonlinear associations of medical expenditure, perceived medical attitude, and sociodemographics with older adults’ self-rated health (SRH), using data from the 2018 China Family Panel Studies survey. Method: we used the extreme gradient boosting model to explore the nonlinear association between various factors and older adults’ SRH outcomes. We then used partial dependence plots to examine the threshold effects of each factor on older adults’ SRH. Results: older adults’ medical expenditure exceeded their overall income. Body mass index (BMI) and personal health expenditure play an essential role in predicting older adults’ SRH outcomes. We found age, physical exercise status, and residential location to be robust predictors of SRH outcomes in older adults. Partial dependence plots visualized the nonlinear association between variables and the threshold effects of factors on older adults’ SRH outcomes. Conclusions: findings from this study underscore the importance of medical expenditure, perceived medical attitudes, and BMI as predictors of health benefits in older adults. The potential threshold effects of medical expenditure on older adults’ SRH outcomes can inform appropriate medical policy interventions that balance government and personal medical expenditure to promote health benefits among older adults.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Hengrui Chen ◽  
Hong Chen ◽  
Ruiyu Zhou ◽  
Zhizhen Liu ◽  
Xiaoke Sun

Safety has become a critical obstacle that cannot be ignored in the marketization of autonomous vehicles (AVs). The objective of this study is to explore the mechanism of AV-involved crashes and analyze the impact of each feature on crash severity. We use the Apriori algorithm to explore the causal relationships among multiple factors and thereby the mechanism of crashes. We use various machine learning models, including support vector machine (SVM), classification and regression tree (CART), and eXtreme Gradient Boosting (XGBoost), to analyze crash severity. In addition, we apply Shapley Additive Explanations (SHAP) to interpret the importance of each factor. The results indicate that XGBoost obtains the best result (recall = 75%; G-mean = 67.82%). Both XGBoost and the Apriori algorithm provided meaningful insights into AV-involved crash characteristics and their relationships. Among all the features, vehicle damage, weather conditions, accident location, and driving mode are the most critical. We found that most rear-end crashes involve conventional vehicles bumping into the rear of AVs. Drivers should be extremely cautious when driving in fog, snow, and insufficient light. Drivers should also be careful when driving near intersections, especially in the autonomous driving mode.
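Recall and G-mean, the two scores reported above, are commonly computed as follows for a two-class problem. This sketch assumes a binary severity label and the usual definition of G-mean as the geometric mean of sensitivity and specificity; the abstract does not state which variant the authors used, and the labels below are invented.

```python
import math


def recall_and_gmean(y_true, y_pred, positive=1):
    """Return (recall, G-mean) for binary labels.

    Recall is TP / (TP + FN). G-mean is sqrt(recall * specificity),
    where specificity is TN / (TN + FP).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return recall, math.sqrt(recall * specificity)


# Invented labels: 1 = severe crash, 0 = non-severe crash.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 0, 0, 1, 1]

recall, g_mean = recall_and_gmean(actual, predicted_labels)
print(round(recall, 4), round(g_mean, 4))
```

G-mean drops sharply when either class is predicted poorly, which makes it a common companion to recall on imbalanced data such as crash records.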


2017 ◽  
Vol 8 (4) ◽  
pp. 34
Author(s):  
Ra’ed Masa’deh ◽  
Mohammed Abdullah Nasseef ◽  
Ala Alkoudary ◽  
Hanaa Mansour ◽  
Mervat Aldarabah

The aim of this research is to explore the associations among motivation for attendance to Aqaba city, destination satisfaction, and destination loyalty. The research surveyed a sample of 200 respondents and used a Structural Equation Model for analysis and testing. The results show that motivation for attendance to Aqaba city positively affects tourists’ destination loyalty. The motivation for attendance also positively affects destination satisfaction, and tourists’ destination satisfaction affects tourists’ destination loyalty. Furthermore, the coefficients of determination (R²) for the endogenous variables, tourists’ destination satisfaction and tourists’ destination loyalty, were 0.46 and 0.66 respectively, which indicates that the model moderately accounts for the variation in these variables while leaving room for further research.


2019 ◽  
Vol 8 (7) ◽  
pp. 315 ◽  
Author(s):  
Fei Sun ◽  
Run Wang ◽  
Bo Wan ◽  
Yanjun Su ◽  
Qinghua Guo ◽  
...  

Imbalanced learning is a methodological challenge in the remote sensing community, especially in complex areas where land covers are spectrally similar. Obtaining high-confidence classification results for imbalanced classes is highly important in practice. In this paper, extreme gradient boosting (XGB), a novel tree-based ensemble system, is employed to classify land cover types in very-high-resolution (VHR) images with imbalanced training data. We introduce an extended margin criterion and disagreement performance to evaluate the efficiency of XGB in imbalanced learning situations and examine the effect of minority class spectral separability on model performance. The results suggest that the uncertainty of XGB associated with correct classification is stable. The average probability-based margin of correct classification provided by XGB is 0.82, which is about 46.30% higher than that of the random forest (RF) method (0.56). Moreover, the performance uncertainty of XGB is insensitive to spectral separability once the sample imbalance reaches a certain level (minority:majority > 10:100). The impact of sample imbalance on the minority class is also related to its spectral separability, and XGB performs better than RF in terms of user accuracy for the minority class with imperfect separability. The disagreement components of XGB are better and more stable than those of RF with imbalanced samples, especially for complex areas with more land cover types. In addition, appropriate sample imbalance helps to improve the trade-off between the recognition accuracy of XGB and the sample cost. According to our analysis, this margin-based uncertainty assessment and disagreement performance can help users identify the confidence level and error component when classification performance (overall, producer, and user accuracies) is similar.


Energies ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 4300
Author(s):  
Kosuke Sasakura ◽  
Takeshi Aoki ◽  
Masayoshi Komatsu ◽  
Takeshi Watanabe

Data centers (DCs) have become increasingly important in recent years, and highly efficient and reliable operation and management of DCs is now required. The heat density generated by racks and information and communication technology (ICT) equipment is predicted to increase in the future, so it is crucial to maintain an appropriate temperature environment in the server room, where high heat is generated, in order to ensure continuous service. It is especially important to predict changes in rack intake temperature in the server room when the computer room air conditioner (CRAC) is shut down, which can cause a rapid rise in temperature. However, it is quite difficult to predict the rack temperature accurately, which in turn makes it difficult to determine the impact on service in advance. In this research, we propose a model that predicts the rack intake temperature after the CRAC is shut down. Specifically, we use machine learning to construct a gradient boosting decision tree model with data from the CRAC, ICT equipment, and rack intake temperature. Experimental results demonstrate that the proposed method has a very high prediction accuracy: the coefficient of determination was 0.90 and the root mean square error (RMSE) was 0.54. Our model makes it possible to evaluate the impact on service and determine whether action to maintain the temperature environment is required. We also clarify the effect of the explanatory variables and training data on the model accuracy.

