Label Aggregation of Gradient Boosting Decision Trees

Linear rankSVM is a widely used method for learning to rank. Although its performance may be inferior to nonlinear methods such as kernel rankSVM and gradient boosting decision trees, linear rankSVM is useful for quickly producing a baseline model. Furthermore, following its recent development for classification, linear rankSVM may give competitive performance for large and sparse data. Many works have studied linear rankSVM, focusing on computational efficiency when the number of preference pairs is large. In this letter, we systematically study existing works, discuss their advantages and disadvantages, and propose an efficient algorithm. We discuss different implementation issues and extensions with detailed experiments. Finally, we develop a robust linear rankSVM tool for public use.
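
To make the role of preference pairs in the abstract above concrete, the following is a minimal sketch of the standard L1-loss linear rankSVM objective, 0.5*||w||^2 + C * sum over pairs of max(0, 1 - w.(x_i - x_j)). It is not the algorithm proposed in the letter; the function names, the toy data, and the choice of the L1 hinge loss are assumptions made for illustration only.

```python
from itertools import combinations


def preference_pairs(relevance, query_ids):
    """Return (i, j) index pairs where item i should rank above item j.

    Pairs are formed only within the same query and only when the
    relevance labels differ, so the pair count can grow quadratically
    with the number of items per query.
    """
    pairs = []
    for i, j in combinations(range(len(relevance)), 2):
        if query_ids[i] != query_ids[j] or relevance[i] == relevance[j]:
            continue
        pairs.append((i, j) if relevance[i] > relevance[j] else (j, i))
    return pairs


def dot(w, x):
    """Plain dot product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def ranksvm_objective(w, X, pairs, C=1.0):
    """L1-loss linear rankSVM objective over the given preference pairs."""
    regulariser = 0.5 * dot(w, w)
    hinge = sum(max(0.0, 1.0 - (dot(w, X[i]) - dot(w, X[j]))) for i, j in pairs)
    return regulariser + C * hinge


# Hypothetical toy data: five items, two features, two queries.
X = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
relevance = [2, 1, 0, 1, 0]
query_ids = [0, 0, 0, 1, 1]

pairs = preference_pairs(relevance, query_ids)
print(pairs)
print(ranksvm_objective([0.0, 0.0], X, pairs))
print(ranksvm_objective([1.0, 0.0], X, pairs))
```

Enumerating pairs explicitly, as done here, is the naive approach; its cost is what motivates the efficiency concerns the abstract mentions.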

Prediction of heart disease using apache spark analysing decision trees and gradient boosting algorithm

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/263/4/042078 ◽

2017 ◽

Vol 263 ◽

pp. 042078

Author(s):

Saryu Chugh ◽

K Arivu Selvan ◽

RK Nadesh

Keyword(s):

Heart Disease ◽

Decision Trees ◽

Apache Spark ◽

Gradient Boosting ◽

Boosting Algorithm

Step-wise multi-grained augmented gradient boosting decision trees for credit scoring

Engineering Applications of Artificial Intelligence ◽

10.1016/j.engappai.2020.104036 ◽

2021 ◽

Vol 97 ◽

pp. 104036

Author(s):

Wanan Liu ◽

Hong Fan ◽

Min Xia

Keyword(s):

Decision Trees ◽

Credit Scoring ◽

Gradient Boosting

Machine learning techniques for short-term solar power stations operational mode planning

E3S Web of Conferences ◽

10.1051/e3sconf/20185102004 ◽

2018 ◽

Vol 51 ◽

pp. 02004 ◽

Cited By ~ 3

Author(s):

Stanislav Eroshenko ◽

Alexandra Khalyasmaa ◽

Denis Snegirev

Keyword(s):

Decision Trees ◽

Solar Power ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Operational Mode ◽

Mathematical Methods ◽

Short Term ◽

Advantages And Disadvantages ◽

Power Stations ◽

Operational Forecasting

The paper presents an operational model, developed by the authors, for very-short-term generation forecasting at solar power stations (SPS). The model is based on weather information and is built into an existing software product as a separate module for SPS operational forecasting. Gradient boosting on decision trees was found to be one of the most suitable mathematical methods for SPS generation operational forecasting. The paper describes the basic principles of operational forecasting based on boosted decision trees and the main advantages and disadvantages of implementing this algorithm. It also presents an example implementation, analysed on data from an existing SPS and used to forecast its generation.
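
As a rough illustration of the basic principle the abstract above refers to, here is a minimal sketch of gradient boosting with squared loss, where each round fits a two-leaf decision tree (a stump) to the current residuals. It is not the authors' forecasting module; the single "hour of day" feature, the generation values, and all function names are made-up assumptions.

```python
def fit_stump(x, residuals):
    """Fit a two-leaf regression tree on one feature by least squares."""
    best = None
    thresholds = sorted(set(x))
    for lo, hi in zip(thresholds, thresholds[1:]):
        t = (lo + hi) / 2.0  # candidate split between neighbouring values
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1], best[2], best[3]


def fit_gbdt(x, y, n_rounds=100, learning_rate=0.1):
    """Boost stumps on residuals, which are the negative gradient of squared loss."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((t, left_mean, right_mean))
        pred = [p + learning_rate * (left_mean if xi <= t else right_mean)
                for xi, p in zip(x, pred)]
    return {"base": base, "lr": learning_rate, "stumps": stumps}


def predict(model, xi):
    """Sum the base value and the shrunken contribution of every stump."""
    total = model["base"]
    for t, left_mean, right_mean in model["stumps"]:
        total += model["lr"] * (left_mean if xi <= t else right_mean)
    return total


def mse(model, x, y):
    """Mean squared error of the model on the given data."""
    return sum((predict(model, xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)


# Made-up toy data: hour of day versus generation in arbitrary units.
hours = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
power = [0.0, 0.4, 1.1, 1.9, 2.6, 3.0, 3.1, 2.9, 2.4, 1.7, 0.9, 0.3, 0.0]

baseline = fit_gbdt(hours, power, n_rounds=0)   # predicts the mean only
boosted = fit_gbdt(hours, power, n_rounds=100)
print(mse(baseline, hours, power), mse(boosted, hours, power))
```

The comparison is on training data only, so it shows that boosting fits the residuals progressively, not that the model generalises; a real forecasting setup would hold out later time periods for evaluation.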

Building more accurate decision trees with the additive tree

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1816748116 ◽

2019 ◽

Vol 116 (40) ◽

pp. 19887-19893 ◽

Cited By ~ 15

Author(s):

José Marcio Luna ◽

Efstathios D. Gennatas ◽

Lyle H. Ungar ◽

Eric Eaton ◽

Eric S. Diffenderfer ◽

...

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Ensemble Methods ◽

Predictive Performance ◽

Additive Models ◽

Gradient Boosting ◽

Clear Understanding ◽

High Stakes ◽

Additive Tree ◽

Full Interaction

The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full interaction models, such as CART, have been investigated largely in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization for the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it also can produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches.

A Novel Ensemble Approach for Click-Through Rate Prediction Based on Factorization Machines and Gradient Boosting Decision Trees

Web and Big Data - Lecture Notes in Computer Science ◽

10.1007/978-3-030-26075-0_12 ◽

2019 ◽

pp. 152-162 ◽

Cited By ~ 1

Author(s):

Xiaochen Wang ◽

Gang Hu ◽

Haoyang Lin ◽

Jiayu Sun

Keyword(s):

Decision Trees ◽

Gradient Boosting ◽

Rate Prediction ◽

Ensemble Approach ◽

Click Through Rate

A mobile recommendation system based on logistic regression and Gradient Boosting Decision Trees

2016 International Joint Conference on Neural Networks (IJCNN) ◽

10.1109/ijcnn.2016.7727431 ◽

2016 ◽

Cited By ~ 21

Author(s):

Yaozheng Wang ◽

Dawei Feng ◽

Dongsheng Li ◽

Xinyuan Chen ◽

Yunxiang Zhao ◽

...

Keyword(s):

Logistic Regression ◽

Decision Trees ◽

Recommendation System ◽

Gradient Boosting

Risk stratification for COVID-19 hospitalization: a multivariable model based on gradient-boosting decision trees

CMAJ Open ◽

10.9778/cmajo.20210036 ◽

2021 ◽

Vol 9 (4) ◽

pp. E1223-E1231

Author(s):

Jahir M. Gutierrez ◽

Maksims Volkovs ◽

Tomi Poutanen ◽

Tristan Watson ◽

Laura C. Rosella

Keyword(s):

Risk Stratification ◽

Decision Trees ◽

Gradient Boosting ◽

Multivariable Model ◽

Model Based

Evaluation of machine learning algorithms for classification of primary biological aerosol using a new UV-LIF spectrometer

Atmospheric Measurement Techniques ◽

10.5194/amt-10-695-2017 ◽

2017 ◽

Vol 10 (2) ◽

pp. 695-708 ◽

Cited By ~ 25

Author(s):

Simon Ruske ◽

David O. Topping ◽

Virginia E. Foot ◽

Paul H. Kaye ◽

Warren R. Stanley ◽

...

Keyword(s):

Neural Networks ◽

Decision Trees ◽

Supervised Learning ◽

Ensemble Methods ◽

Gradient Boosting ◽

Support Vector ◽

Data Sets ◽

Data Set ◽

Shape Information ◽

Accuracy Of Measurements

Abstract. Characterisation of bioaerosols has important implications within environment and public health sectors. Recent developments in ultraviolet light-induced fluorescence (UV-LIF) detectors such as the Wideband Integrated Bioaerosol Spectrometer (WIBS) and the newly introduced Multiparameter Bioaerosol Spectrometer (MBS) have allowed for the real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification. For unsupervised learning we tested hierarchical agglomerative clustering with various different linkages. For supervised learning, 11 methods were tested, including decision trees, ensemble methods (random forests, gradient boosting and AdaBoost), two implementations for support vector machines (libsvm and liblinear) and Gaussian methods (Gaussian naïve Bayesian, quadratic and linear discriminant analysis, the k-nearest neighbours algorithm and artificial neural networks). The methods were applied to two different data sets produced using the new MBS, which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. The first data set contained mixed PSLs and the second contained a variety of laboratory-generated aerosol. Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 67.6 and 91.1 % for the two data sets respectively. For supervised learning the gradient boosting algorithm was found to be the most effective, on average correctly classifying 82.8 and 98.27 % of the testing data, respectively, across the two data sets. A possible alternative to gradient boosting is neural networks. We do however note that this method requires much more user input than the other methods, and we suggest that further research should be conducted using this method, especially using parallelised hardware such as the GPU, which would allow for larger networks to be trained, which could possibly yield better results. We also saw that some methods, such as clustering, failed to utilise the additional shape information provided by the instrument, whilst for others, such as the decision trees, ensemble methods and neural networks, improved performance could be attained with the inclusion of such information.