Mineral grade estimation using gradient boosting regression trees

Gradient Boosting with Piece-Wise Linear Regression Trees

2019 ◽

Cited By ~ 3

Author(s):

Yu Shi ◽

Jian Li ◽

Zhize Li

Keyword(s):

Linear Regression ◽

Learning Algorithm ◽

Piecewise Linear ◽

Regression Trees ◽

Gradient Boosting ◽

Training Algorithms ◽

Training Time ◽

Modern Computer ◽

Multiple Data ◽

Boosted Decision Trees

Gradient Boosted Decision Trees (GBDT) is a very successful ensemble learning algorithm widely used across a variety of applications. Recently, several variants of GBDT training algorithms and implementations have been designed and heavily optimized in popular open-source toolkits, including XGBoost, LightGBM and CatBoost. In this paper, we show that both the accuracy and efficiency of GBDT can be further enhanced by using more complex base learners. Specifically, we extend gradient boosting to use piecewise linear regression trees (PL Trees), instead of piecewise constant regression trees, as base learners. We show that PL Trees can accelerate the convergence of GBDT and improve its accuracy. We also propose some optimization tricks to substantially reduce the training time of PL Trees, with little sacrifice of accuracy. Moreover, we propose several implementation techniques to speed up our algorithm on modern computer architectures with powerful Single Instruction Multiple Data (SIMD) parallelism. The experimental results show that GBDT with PL Trees can provide very competitive testing accuracy with comparable or less training time.
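
To make the difference between piecewise constant and piecewise linear base learners concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm: it assumes a single feature, squared loss, depth-1 trees (stumps), a fixed grid of candidate thresholds and a shrinkage of 1.0, all chosen only to keep the toy example short. The function names (`fit_leaf`, `fit_stump`, `boost`) are hypothetical.

```python
def fit_leaf(xs, rs, linear):
    """Least-squares fit of one leaf; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_r = sum(rs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if not linear or sxx == 0.0:
        return mean_r, 0.0  # piecewise constant leaf: predict the mean residual
    slope = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, rs)) / sxx
    return mean_r - slope * mean_x, slope


def leaf_sse(xs, rs, leaf):
    """Sum of squared errors of a leaf model on its own samples."""
    intercept, slope = leaf
    return sum((r - (intercept + slope * x)) ** 2 for x, r in zip(xs, rs))


def fit_stump(xs, rs, linear, thresholds):
    """Pick the threshold whose two leaves give the lowest squared error."""
    best = None
    for t in thresholds:
        left = [(x, r) for x, r in zip(xs, rs) if x <= t]
        right = [(x, r) for x, r in zip(xs, rs) if x > t]
        if len(left) < 2 or len(right) < 2:
            continue
        lx, lres = zip(*left)
        rx, rres = zip(*right)
        left_leaf = fit_leaf(lx, lres, linear)
        right_leaf = fit_leaf(rx, rres, linear)
        err = leaf_sse(lx, lres, left_leaf) + leaf_sse(rx, rres, right_leaf)
        if best is None or err < best[0]:
            best = (err, t, left_leaf, right_leaf)
    return best[1:]


def boost(xs, ys, rounds, linear, shrinkage=1.0):
    """Gradient boosting for squared loss; returns training MSE per round."""
    thresholds = [k / 10 for k in range(1, 10)]  # assumed candidate splits
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    curve = []
    for _ in range(rounds):
        # For squared loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_leaf, right_leaf = fit_stump(xs, residuals, linear, thresholds)
        for i, x in enumerate(xs):
            intercept, slope = left_leaf if x <= t else right_leaf
            preds[i] += shrinkage * (intercept + slope * x)
        curve.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return curve


xs = [i / 49 for i in range(50)]
ys = [x * x for x in xs]  # smooth toy target
constant_curve = boost(xs, ys, rounds=10, linear=False)
linear_curve = boost(xs, ys, rounds=10, linear=True)
print("constant leaves, MSE after 10 rounds:", constant_curve[-1])
print("linear leaves,   MSE after 1 round:  ", linear_curve[0])
```

On this toy target a single round with linear leaves already reaches a lower training error than ten rounds with constant leaves, which illustrates the faster convergence the abstract describes. The paper's multivariate leaves, training-time optimizations and SIMD techniques are not reproduced here.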

Multistep-ahead daily inflow forecasting using the ERA-Interim reanalysis data set based on gradient-boosting regression trees

Hydrology and Earth System Sciences ◽

10.5194/hess-24-2343-2020 ◽

2020 ◽

Vol 24 (5) ◽

pp. 2343-2363

Author(s):

Shengli Liao ◽

Zhanwei Liu ◽

Benxi Liu ◽

Chuntian Cheng ◽

Xinfeng Jin ◽

...

Keyword(s):

Extreme Values ◽

Regression Trees ◽

Reanalysis Data ◽

Gradient Boosting ◽

Support Vector ◽

Lead Times ◽

Feature Subset ◽

Data Set ◽

Inflow Forecasting ◽

Maximal Information Coefficient

Abstract. Inflow forecasting plays an essential role in reservoir management and operation. The impacts of climate change and human activities have made accurate inflow prediction increasingly difficult, especially for longer lead times. In this study, a new hybrid inflow forecast framework – using the ERA-Interim reanalysis data set as input and adopting gradient-boosting regression trees (GBRT) and the maximal information coefficient (MIC) – is developed for multistep-ahead daily inflow forecasting. Firstly, the ERA-Interim reanalysis data set provides more information for the framework, allowing it to discover inflow for longer lead times. Secondly, MIC can identify an effective feature subset from massive features that significantly affects inflow; therefore, the framework can reduce computational burden, distinguish key attributes from unimportant ones and provide a concise understanding of inflow. Lastly, GBRT is a prediction model in the form of an ensemble of decision trees, and it has a strong ability to more fully capture nonlinear relationships between input and output at longer lead times. The Xiaowan hydropower station, located in Yunnan Province, China, was selected as the study area. Six evaluation criteria, namely the mean absolute error (MAE), the root-mean-squared error (RMSE), the Pearson correlation coefficient (CORR), Kling–Gupta efficiency (KGE) scores, the percent bias in the flow duration curve high-segment volume (BHV) and the index of agreement (IA) are used to evaluate the established models utilizing historical daily inflow data (1 January 2017–31 December 2018). The performance of the presented framework is compared to that of artificial neural network (ANN), support vector regression (SVR) and multiple linear regression (MLR) models. The results indicate that reanalysis data enhance the accuracy of inflow forecasting for all of the lead times studied (1–10 d), and the method developed generally performs better than other models, especially for extreme values and longer lead times (4–10 d).
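
Five of the six evaluation criteria named above can be sketched in a few lines of plain Python. The formulas below are the commonly used textbook definitions (KGE in its original three-component form, IA in Willmott's form); the abstract does not state which variants the authors used, so treat these as assumptions. BHV is omitted because it depends on a choice of high-flow segment that the abstract does not specify.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    """Root-mean-squared error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def corr(obs, sim):
    """Pearson correlation coefficient."""
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    return cov / (_std(obs) * _std(sim))


def kge(obs, sim):
    """Kling-Gupta efficiency: correlation, variability ratio, bias ratio."""
    r = corr(obs, sim)
    alpha = _std(sim) / _std(obs)
    beta = _mean(sim) / _mean(obs)
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def index_of_agreement(obs, sim):
    """Willmott's index of agreement (1 means perfect agreement)."""
    mo = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - mo) + abs(o - mo)) ** 2 for o, s in zip(obs, sim))
    return 1.0 - num / den


# Made-up inflow values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [2.0, 3.0, 4.0, 5.0]
for name, fn in [("MAE", mae), ("RMSE", rmse), ("CORR", corr),
                 ("KGE", kge), ("IA", index_of_agreement)]:
    print(name, fn(observed, forecast))
```

The example forecast is a constant offset of the observations, so the correlation is perfect while KGE and IA are penalized for the bias; this is why the study reports several criteria rather than one.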

Multi-step ahead daily inflow forecasting using ERA-Interim reanalysis dataset based on gradient boosting regression trees

10.5194/hess-2019-610 ◽

2019 ◽

Author(s):

Shengli Liao ◽

Zhanwei Liu ◽

Benxi Liu ◽

Chuntian Cheng ◽

Xinfeng Jin ◽

...

Keyword(s):

Regression Trees ◽

Reanalysis Data ◽

Gradient Boosting ◽

Support Vector ◽

Lead Times ◽

Feature Subset ◽

Efficiency Coefficient ◽

Reanalysis Dataset ◽

Maximum Information ◽

Inflow Forecasting

Abstract. Inflow forecasting plays an essential role in reservoir management and operation. The impacts of climate change and human activities make accurate inflow prediction increasingly difficult, especially for longer lead times. In this study, a new hybrid inflow forecast framework, with ERA-Interim reanalysis data as input and adopting gradient boosting regression trees (GBRT) and the maximum information coefficient (MIC), was developed for multi-step ahead daily inflow forecasting. Firstly, the ERA-Interim reanalysis dataset provides enough information for the framework to discover inflow for longer lead times. Secondly, MIC can identify an effective feature subset from massive features that significantly affects inflow, so that the framework can avoid over-fitting, distinguish key attributes from unimportant ones and provide a concise understanding of inflow. Lastly, GBRT is a prediction model in the form of an ensemble of decision trees and has a strong ability to capture nonlinear relationships between input and output more fully at long lead times. The Xiaowan hydropower station, located in Yunnan Province, China, was selected as the study area. Four evaluation criteria, the mean absolute error (MAE), the root mean square error (RMSE), the Nash-Sutcliffe efficiency coefficient (NSE) and the Pearson correlation coefficient (CORR), were used to evaluate the established models using historical daily inflow data (1 January 2017–31 December 2018). The performance of the presented framework was compared to that of artificial neural network (ANN), support vector regression (SVR) and multiple linear regression (MLR) models. The experimental results indicate that the developed method generally performs better than the other models and significantly improves the accuracy of inflow forecasting at lead times of 5–10 days. The reanalysis data also enhance the accuracy of inflow forecasting, except for one-day-ahead forecasts.
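
As a rough illustration of what "multi-step ahead" means for model inputs, the sketch below builds training pairs for one lead time from a daily series and scores a forecast with the Nash-Sutcliffe efficiency. It assumes a direct strategy (one model per lead time, fed with lagged values); the abstract does not say whether the authors used a direct or a recursive scheme, and the function names are hypothetical.

```python
def make_direct_pairs(series, n_lags, lead):
    """Pair the last n_lags values up to day t with the value at day t + lead."""
    features, targets = [], []
    for t in range(n_lags - 1, len(series) - lead):
        features.append(series[t - n_lags + 1 : t + 1])
        targets.append(series[t + lead])
    return features, targets


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


# Made-up daily series, for illustration only.
inflow = [float(v) for v in range(10)]
X, y = make_direct_pairs(inflow, n_lags=3, lead=2)
print(X[0], "->", y[0])
print("NSE of a perfect forecast:", nse(y, y))
```

In the framework described above, the feature rows would also carry the reanalysis variables retained by MIC, and one regressor would be fitted per lead time from 1 to 10 days under this assumption.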

Review of “Multi-step ahead daily inflow forecasting using ERA-Interim reanalysis dataset based on gradient boosting regression trees” – HESS-2019-610

10.5194/hess-2019-610-rc3 ◽

2020 ◽

Author(s):

Anonymous

Keyword(s):

Regression Trees ◽

Gradient Boosting ◽

Reanalysis Dataset ◽

Inflow Forecasting

Intercomparisons of liquid water path based on SEVIRI images and gradient boosting regression trees with in-situ observations and satellite-derived products

10.5194/egusphere-egu2020-18806 ◽

2020 ◽

Author(s):

Miae Kim ◽

Jan Cermak ◽

Hendrik Andersen ◽

Julia Fuchs ◽

Roland Stirnberg

Keyword(s):

Machine Learning ◽

Liquid Water ◽

Climate Models ◽

Regression Trees ◽

Boosted Regression Trees ◽

Gradient Boosting ◽

Liquid Water Path ◽

Water Path ◽

First Results

This contribution presents a technique for the machine-learning-based retrieval of cloud liquid water path. Cloud effects are among the major uncertainties in climate models for estimating and predicting the Earth’s energy budget. The study of cloud processes requires information on cloud physical properties, such as the liquid water path (LWP), which is commonly retrieved from satellite sensors using look-up table approaches. However, the accuracy of LWP varies temporally and spatially, also due to assumptions inherent in any physical retrieval. The aim of this study is to improve the accuracy of LWP and analyze quantitatively the accuracy and its errors. To this end, a statistical LWP retrieval was developed using spectral information from geostationary satellite channels (Meteosat Spinning-Enhanced Visible and Infrared Imager, SEVIRI), and satellite viewing geometry. The machine-learning method chosen is gradient-boosted regression trees (GBRTs), which is an ensemble of decision trees but more effective than traditional tree-based models. This study reports on first results, as well as a comparison between the GBRT-derived LWP estimates and those from the SEVIRI-based products of the Climate Monitoring Satellite Application Facility (CM-SAF, CLAAS-A2), as well as MODIS products. We use case studies for individual in-situ measurement sites in Europe under varying meteorological conditions to determine the factors influencing LWP retrieval quality.

P2726 Extremely boosted prediction of cardiac amyloidosis by routine laboratory parameters

European Heart Journal ◽

10.1093/eurheartj/ehz748.1043 ◽

2019 ◽

Vol 40 (Supplement_1) ◽

Author(s):

A Agibetov ◽

B Seirer ◽

S Aschauer ◽

D Dalos ◽

R Rettl ◽

...

Keyword(s):

Machine Learning ◽

Cardiac Amyloidosis ◽

Missing Values ◽

Prediction Models ◽

Regression Trees ◽

Gradient Boosting ◽

Routine Laboratory ◽

Laboratory Parameters ◽

Extreme Gradient Boosting ◽

Laboratory Results

Abstract. Background/Introduction: Cardiac amyloidosis (CA) is a rare and complex condition with poor prognosis. Novel therapies have been shown to improve outcomes; however, most of the affected individuals remain undiagnosed, mainly due to a lack of awareness among clinicians. One approach to overcome this issue is to use automated diagnostic algorithms that act on routinely available laboratory results. Purpose: We tested the performance of flexible machine learning and traditional statistical prediction models for non-invasive CA diagnosis based on routinely collected laboratory parameters. Since laboratory routines vary between hospitals and other health care providers, special attention was paid to adaptive and dynamic parameter selection, and to dealing with the frequent occurrence of missing values. Methods: Our cohort consisted of 376 clinically accepted patients with various types of heart failure. Of these, 69 were diagnosed with CA via endomyocardial biopsy (positives), and 307 had unrelated cardiac disorders (negatives). A total of 63 routine laboratory parameters were collected from these patients, with a high incidence of missing values (on average 60% of patients for each parameter). We tested the performance of two prediction models: logistic regression, and extreme gradient boosting with regression trees. To deal with missing values we adopted two strategies: a) finding an optimal overlap of parameters and deleting all patients with missing values (reduction of parameters and samples), and b) retaining all features and imputing missing values with parameter-wise means. To fairly assess the performance of the prediction models we employed 10-fold cross-validation (stratified to preserve the sample class ratio). Finally, the area under the receiver operating characteristic curve (ROC AUC) was used as our final performance measure. Results: A complex machine learning model based on forests of regression trees proved to be the most performant (ROC AUC 0.94±4%) and robust to missing values. The best logistic regression model was obtained with the 25 most frequent variables and patient deletion in case of missing values (ROC AUC 0.82±0.8%). While progressive inclusion of predictor variables worsened the performance of the logistic regression, it increased that of the machine learning approach. Conclusions: Extreme gradient boosting of regression trees on routine laboratory parameters achieved staggering accuracy results for the automated diagnosis of CA. Our data suggest that implementations of such algorithms as independent interpreters of routine laboratory results may help to establish or suggest the diagnosis of CA in patients with heart failure symptoms, even in the absence of specialized experts.
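
The two missing-value strategies and the stratified fold assignment described in the Methods can be sketched as follows. This is an illustrative reading, not the authors' code: missing values are represented as `None`, the "optimal overlap" of strategy (a) is simplified to "keep the k most frequently observed parameters", and every function name is hypothetical.

```python
import random


def most_observed_columns(rows, k):
    """Indices of the k columns with the fewest missing values."""
    n_cols = len(rows[0])
    counts = [sum(row[j] is not None for row in rows) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: -counts[j])
    return sorted(ranked[:k])


def complete_cases(rows, labels, columns):
    """Strategy (a): keep the chosen columns, drop patients with any gap."""
    kept_rows, kept_labels = [], []
    for row, label in zip(rows, labels):
        values = [row[j] for j in columns]
        if all(v is not None for v in values):
            kept_rows.append(values)
            kept_labels.append(label)
    return kept_rows, kept_labels


def impute_column_means(rows):
    """Strategy (b): keep everything, fill gaps with the column mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def stratified_folds(labels, k, seed=0):
    """Deal each class round-robin into k folds to preserve the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        for position, i in enumerate(indices):
            folds[position % k].append(i)
    return folds


# Made-up laboratory values, for illustration only.
rows = [[1.0, None, 5.0], [3.0, 4.0, None], [None, 8.0, None], [5.0, 6.0, None]]
labels = [1, 0, 0, 1]
cols = most_observed_columns(rows, k=2)
print(complete_cases(rows, labels, cols))
print(impute_column_means(rows))

# Cohort sizes from the abstract: 69 positives and 307 negatives.
cohort_folds = stratified_folds([1] * 69 + [0] * 307, k=10)
print([len(fold) for fold in cohort_folds])
```

One design point worth stating: in a cross-validated evaluation the column means would normally be computed on the training folds only, so that no information from the held-out fold leaks into the imputation. The abstract does not say how the authors handled this.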

Mixture Optimization of Recycled Aggregate Concrete Using Hybrid Machine Learning Model

Materials ◽

10.3390/ma13194331 ◽

2020 ◽

Vol 13 (19) ◽

pp. 4331

Author(s):

Itzel Nunez ◽

Afshin Marani ◽

Moncef L. Nehdi

Keyword(s):

Machine Learning ◽

Compressive Strength ◽

Regression Trees ◽

Mixture Design ◽

Recycled Aggregate Concrete ◽

Construction And Demolition Waste ◽

Recycled Aggregate ◽

Gradient Boosting ◽

Environmental Footprint ◽

Aggregate Concrete

Recycled aggregate concrete (RAC) contributes to mitigating the depletion of natural aggregates, alleviating the carbon footprint of concrete construction, and averting the landfilling of colossal amounts of construction and demolition waste. However, complexities in the mixture optimization of RAC due to the variability of recycled aggregates and lack of accuracy in estimating its compressive strength require novel and sophisticated techniques. This paper aims at developing state-of-the-art machine learning models to predict the RAC compressive strength and optimize its mixture design. Results show that the developed models, including Gaussian processes, deep learning, and gradient boosting regression, achieved robust predictive performance, with the gradient boosting regression trees yielding the highest prediction accuracy. Furthermore, a particle swarm optimization algorithm coupled with the gradient boosting regression trees model was developed to optimize the mixture design of RAC for various compressive strength classes. The hybrid model achieved cost-saving RAC mixture designs with lower environmental footprint for different target compressive strength classes. The model could be further harnessed to achieve sustainable concrete with optimal recycled aggregate content, least cost, and least environmental footprint.
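
The coupling of a particle swarm optimizer with a strength predictor can be sketched as below. Everything numeric here is an assumption made for illustration: `predict_strength` is a hypothetical stand-in for a trained gradient boosting model, `mixture_cost` is a made-up cost function, and the two design variables, their bounds, the penalty weight and the swarm settings are not taken from the paper.

```python
import random


def predict_strength(cement, ra_ratio):
    """HYPOTHETICAL stand-in for a trained GBRT strength model (MPa)."""
    return 0.12 * cement - 10.0 * ra_ratio


def mixture_cost(cement, ra_ratio):
    """HYPOTHETICAL cost: cement dominates, natural aggregate costs extra."""
    return 0.10 * cement + 8.0 * (1.0 - ra_ratio)


def objective(x, target):
    """Cost plus a quadratic penalty when predicted strength misses target."""
    cement, ra_ratio = x
    shortfall = max(0.0, target - predict_strength(cement, ra_ratio))
    return mixture_cost(cement, ra_ratio) + 100.0 * shortfall ** 2


def pso(target, bounds, n_particles=20, n_iter=50, seed=0,
        w=0.7, c1=1.5, c2=1.5):
    """Basic global-best particle swarm with positions clamped to bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p, target) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            value = objective(pos[i], target)
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = pos[i][:], value
        history.append(gbest_val)
    return gbest, gbest_val, history


# Assumed variables: cement content (kg/m^3) and recycled aggregate ratio.
design_bounds = [(200.0, 500.0), (0.0, 1.0)]
best_mix, best_value, history = pso(target=40.0, bounds=design_bounds)
print("best mixture:", best_mix)
print("objective:", best_value)
```

In a setup like the one the abstract describes, the stand-in predictor would be replaced by the fitted regression model and the search would run once per target strength class; how the authors encoded constraints and costs is not stated in the abstract.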

Modeling Genetic Polymorphisms and Sickle Cell Associated Vasoocclusive Events Using Classification and Regression Trees (CART) and Stochastic Gradient Boosting (SGB)

American Journal of Epidemiology ◽

10.1093/aje/163.suppl_11.s130-a ◽

2006 ◽

Vol 163 (suppl_11) ◽

pp. S130-S130

Author(s):

VG Nolan ◽

M Wilcox ◽

P Sebastiani ◽

C Baldwin ◽

D Wyszynski ◽

...

Keyword(s):

Genetic Polymorphisms ◽

Sickle Cell ◽

Regression Trees ◽

Classification And Regression Trees ◽

Stochastic Gradient ◽

Gradient Boosting ◽

Stochastic Gradient Boosting ◽

Classification And Regression

Two-Level Regression Method Using Ensembles of Trees with Optimal Divergence

Doklady Mathematics ◽

10.1134/s1064562421040177 ◽

2021 ◽

Author(s):

Yu. I. Zhuravlev ◽

O. V. Senko ◽

A. A. Dokukin ◽

N. N. Kiselyova ◽

I. A. Saenko

Keyword(s):

Regression Analysis ◽

Decision Trees ◽

Regression Trees ◽

Regression Method ◽

Random Regression ◽

Gradient Boosting ◽

Analysis Method ◽

Boosting Method ◽

Corrective Procedures ◽

Decision Forest

Abstract. The article discusses a new two-level regression analysis method in which a corrective procedure is applied to optimal ensembles of regression trees. Optimization is carried out based on the simultaneous achievement of the divergence of the algorithms in the forecast space and a good approximation of the data by individual algorithms of the ensemble. Simple averaging, random regression forest, and gradient boosting are used as corrective procedures. Experiments are presented comparing the proposed method with the standard decision forest and the standard gradient boosting method for decision trees.

Predicting shared-car use and examining nonlinear effects using gradient boosting regression trees

International Journal of Sustainable Transportation ◽

10.1080/15568318.2020.1827316 ◽

2020 ◽

pp. 1-15

Author(s):

Tao Wang ◽

Songhua Hu ◽

Yuan Jiang

Keyword(s):

Regression Trees ◽

Nonlinear Effects ◽

Gradient Boosting ◽

Car Use
