Practical Federated Gradient Boosting Decision Trees

2020 ◽  
Vol 34 (04) ◽  
pp. 4642-4649 ◽  
Author(s):  
Qinbin Li ◽  
Zeyi Wen ◽  
Bingsheng He

Gradient Boosting Decision Trees (GBDTs) have become very successful in recent years, winning many machine learning and data mining competitions. There have been several recent studies on how to train GBDTs in the federated learning setting. In this paper, we focus on horizontal federated learning, where data samples with the same features are distributed among multiple parties. However, existing studies are not efficient or effective enough for practical use. They suffer either from inefficiency due to costly data transformations such as secret sharing and homomorphic encryption, or from low model accuracy due to differential privacy designs. In this paper, we study a practical federated environment with relaxed privacy constraints. In this environment, a dishonest party might obtain some information about the other parties' data, but it is still impossible for the dishonest party to derive the actual raw data of other parties. Specifically, each party boosts a number of trees by exploiting similarity information based on locality-sensitive hashing. We prove that our framework is secure without exposing the original records to other parties, while the computation overhead in the training process is kept low. Our experimental studies show that, compared with normal training using only the local data of each party, our approach can significantly improve the predictive accuracy, and achieve accuracy comparable to the original GBDT trained with the data from all parties.
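
The similarity primitive in this framework is locality-sensitive hashing. The sketch below is a rough illustration, not the authors' protocol: it uses random-hyperplane LSH so that parties can match similar instances by comparing compact hash signatures rather than exchanging raw records. The party data, hash size, and variable names are made up.

```python
import numpy as np

def lsh_signatures(X, n_planes=16, seed=0):
    """Hash each row of X to a binary signature; similar rows agree on most bits."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(X.shape[1], n_planes))   # random hyperplanes
    return (X @ planes > 0).astype(np.uint8)           # sign of each projection

# Hypothetical local data of two parties sharing the same feature space.
rng = np.random.default_rng(1)
party_a = rng.random((100, 8))
party_b = rng.random((120, 8))

sig_a = lsh_signatures(party_a)
sig_b = lsh_signatures(party_b)

# Count matching hash bits between every pair of instances; each party-A record
# is matched to its most similar party-B record using signatures alone, so raw
# records never leave their owner.
matches = (sig_a[:, None, :] == sig_b[None, :, :]).sum(axis=2)   # shape (100, 120)
best_match_in_b = matches.argmax(axis=1)
```

In the framework described above, similarity information of this kind then guides how each party boosts its share of the trees.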

2014 ◽  
Vol 26 (4) ◽  
pp. 781-817 ◽  
Author(s):  
Ching-Pei Lee ◽  
Chih-Jen Lin

Linear rankSVM is one of the widely used methods for learning to rank. Although its performance may be inferior to nonlinear methods such as kernel rankSVM and gradient boosting decision trees, linear rankSVM is useful for quickly producing a baseline model. Furthermore, following its recent development for classification, linear rankSVM may give competitive performance for large and sparse data. A great deal of work has studied linear rankSVM, with a focus on computational efficiency when the number of preference pairs is large. In this letter, we systematically study existing works, discuss their advantages and disadvantages, and propose an efficient algorithm. We discuss different implementation issues and extensions with detailed experiments. Finally, we develop a robust linear rankSVM tool for public use.
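
For context, the sketch below shows the standard pairwise reduction that underlies linear rankSVM: each preference pair within a query becomes a difference vector fed to a linear SVM. The data are toy values, and the naive double loop materializes all pairs, which is exactly the quadratic cost that efficient rankSVM training methods avoid.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.random((200, 10))                 # document feature vectors
y = rng.integers(0, 3, size=200)          # relevance labels
q = rng.integers(0, 5, size=200)          # query ids

pairs, labels = [], []
for i in range(len(X)):
    for j in range(len(X)):
        if q[i] == q[j] and y[i] > y[j]:  # i preferred over j within the same query
            pairs.append(X[i] - X[j]); labels.append(1)
            pairs.append(X[j] - X[i]); labels.append(-1)

# Learn w such that w . (x_i - x_j) > 0 whenever i is preferred over j.
model = LinearSVC(C=1.0, fit_intercept=False)
model.fit(np.array(pairs), np.array(labels))
ranking_scores = X @ model.coef_.ravel()  # higher score means ranked higher
```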


2017 ◽  
Vol 16 (06) ◽  
pp. 1707-1727 ◽  
Author(s):  
Morteza Mashayekhi ◽  
Robin Gras

Decision trees are examples of easily interpretable models whose predictive accuracy is normally low. In comparison, decision tree ensembles (DTEs) such as random forest (RF) exhibit high predictive accuracy while being regarded as black-box models. We propose three new algorithms for extracting rules from DTEs. The RF+DHC method, a hill climbing method with downhill moves (DHC), is used to search for a rule set that dramatically decreases the number of rules. In the RF+SGL and RF+MSGL methods, the sparse group lasso (SGL) method and the multiclass SGL (MSGL) method are employed, respectively, to find a sparse weight vector corresponding to the rules generated by RF. Experimental results with 24 data sets show that the proposed methods outperform similar state-of-the-art methods in terms of human comprehensibility, by greatly reducing the number of rules and limiting the number of antecedents in the retained rules, while preserving the same level of accuracy.
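
To make the pipeline concrete, here is a minimal sketch of the first step shared by such methods: enumerating candidate rules (root-to-leaf conjunctions) from a trained random forest. A plain Lasso over the rule-activation matrix is used only as a simple stand-in for the DHC/SGL/MSGL selection stage; the dataset and hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=20, max_depth=3, random_state=0).fit(X, y)

def extract_rules(tree):
    """Return each root-to-leaf path as a list of (feature, threshold, '<=' or '>')."""
    t = tree.tree_
    rules = []
    def walk(node, conds):
        if t.children_left[node] == -1:          # leaf node: path complete
            rules.append(conds)
            return
        f, thr = t.feature[node], t.threshold[node]
        walk(t.children_left[node], conds + [(f, thr, "<=")])
        walk(t.children_right[node], conds + [(f, thr, ">")])
    walk(0, [])
    return rules

rules = [r for est in rf.estimators_ for r in extract_rules(est)]

def activations(rule, X):
    """Binary vector indicating which samples satisfy all antecedents of a rule."""
    mask = np.ones(len(X), dtype=bool)
    for f, thr, op in rule:
        mask &= (X[:, f] <= thr) if op == "<=" else (X[:, f] > thr)
    return mask.astype(float)

R = np.column_stack([activations(r, X) for r in rules])  # rule-activation matrix
weights = Lasso(alpha=0.01).fit(R, y).coef_              # sparse rule weights
kept_rules = [rules[i] for i in np.flatnonzero(weights)] # retained, interpretable rules
```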


Risks ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 202
Author(s):  
Ge Gao ◽  
Hongxin Wang ◽  
Pengbin Gao

In China, small and medium-sized enterprises (SMEs) face financing difficulties, and commercial banks and financial institutions are their main financing channels. A reasonable and efficient credit risk assessment system is therefore important for credit markets. Based on traditional statistical methods and AI technology, a soft voting fusion model, which incorporates logistic regression, support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), is constructed to improve the predictive accuracy of SMEs' credit risk. To verify the feasibility and effectiveness of the proposed model, we use data from 123 SMEs nationwide that worked with a Chinese bank from 2016 to 2020, including financial information and default records. The results show that the accuracy of the soft voting fusion model is higher than that of any single machine learning (ML) algorithm, which provides a theoretical basis for the government to control credit risk in the future and offers important references for banks making credit decisions.
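
A minimal sketch of such a soft-voting fusion, built with scikit-learn, XGBoost, and LightGBM, is shown below. The bank's SME data are not public, so a synthetic imbalanced default-prediction dataset and placeholder hyperparameters stand in.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Synthetic stand-in for SME financial indicators with a minority "default" class.
X, y = make_classification(n_samples=500, n_features=20, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

fusion = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True)),          # probability=True is needed for soft voting
        ("rf", RandomForestClassifier(n_estimators=200)),
        ("xgb", XGBClassifier(eval_metric="logloss")),
        ("lgbm", LGBMClassifier()),
    ],
    voting="soft",                               # average the predicted class probabilities
)
fusion.fit(X_tr, y_tr)
print("fusion accuracy:", fusion.score(X_te, y_te))
```

Soft voting averages the five models' predicted probabilities, so a confident minority model can still influence the fused decision, which is the usual rationale for preferring it over hard majority voting.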


2021 ◽  
Author(s):  
Javad Iskandarov ◽  
George Fanourgakis ◽  
Waleed Alameri ◽  
George Froudakis ◽  
Georgios Karanikolos

Conventional foam modelling techniques require tuning of many parameters and long computation times to provide accurate predictions. Therefore, there is a need for alternative methodologies for the efficient and reliable prediction of foam performance. Foams are sensitive to various operational conditions and reservoir parameters. This research aims to apply machine learning (ML) algorithms to experimental data in order to correlate the important influencing parameters with foam rheology. In this way, optimum operational conditions for CO2 foam enhanced oil recovery (EOR) can be determined. To achieve this, five different ML algorithms were applied to rheology data from various experimental studies. It was concluded that the Gradient Boosting (GB) algorithm could successfully fit the training data and gave the most accurate predictions for unseen cases.
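
As a rough illustration of this workflow (not the study's code or data), the sketch below fits gradient boosting to synthetic rheology-like measurements and compares it against other common regressors by cross-validation; the feature names are invented placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Placeholder inputs: shear rate, foam quality, pressure, temperature, salinity.
X = rng.uniform(0.1, 1.0, size=(150, 5))
# Synthetic "apparent viscosity" target with shear-thinning-like behaviour plus noise.
y = 10 * X[:, 1] / (X[:, 0] ** 0.3) + rng.normal(scale=0.5, size=150)

models = {
    "gradient boosting": GradientBoostingRegressor(n_estimators=300, learning_rate=0.05),
    "random forest": RandomForestRegressor(n_estimators=300),
    "ridge": Ridge(alpha=1.0),
    "svr": SVR(C=10.0),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean R^2 = {score:.3f}")
```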


Author(s):  
J. Andrew Onesimu ◽  
Karthikeyan J. ◽  
D. Samuel Joshua Viswas ◽  
Robin D Sebastian

Deep learning has attracted considerable research attention in recent years owing to its advantages in fields such as healthcare, medicine, and automobiles. A huge amount of data is required for deep learning to achieve good accuracy; thus, it is important to protect the data from security and privacy breaches. In this chapter, a comprehensive survey of security and privacy challenges in deep learning is presented. Security attacks such as poisoning attacks, evasion attacks, and black-box attacks are explored along with their prevention and defence techniques. A comparative analysis is done on various techniques for protecting the data from such security attacks. Privacy is another major challenge in deep learning. The authors present an in-depth survey of privacy-preserving techniques for deep learning, such as differential privacy, homomorphic encryption, secret sharing, and secure multi-party computation. A detailed comparison table of the various privacy-preserving techniques and approaches is also presented.
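
As one concrete example of the surveyed techniques, the sketch below applies the differential-privacy recipe (per-example gradient clipping plus Gaussian noise, as in DP-SGD) to a toy logistic model in plain NumPy. The clipping norm and noise multiplier are illustrative values, and no formal privacy accounting is included.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = (X[:, 0] + 0.1 * rng.normal(size=256) > 0).astype(float)
w = np.zeros(10)

clip_norm, noise_multiplier, lr = 1.0, 1.1, 0.1
for _ in range(100):
    preds = 1.0 / (1.0 + np.exp(-X @ w))                  # sigmoid predictions
    per_example_grads = (preds - y)[:, None] * X          # one gradient per record
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)  # bound each record's influence
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=w.shape)
    w -= lr * (clipped.sum(axis=0) + noise) / len(X)      # noisy averaged gradient step
print("training accuracy:", ((X @ w > 0) == y).mean())
```

Clipping bounds any single record's contribution to the update, and the added noise masks what remains, which is the intuition behind the differential-privacy guarantee.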


2018 ◽  
Vol 51 ◽  
pp. 02004 ◽  
Author(s):  
Stanislav Eroshenko ◽  
Alexandra Khalyasmaa ◽  
Denis Snegirev

The paper presents an operational model for very-short-term forecasting of solar power station (SPS) generation developed by the authors; the model is based on weather information and is built into an existing software product as a separate module for operational SPS forecasting. The analysis revealed that gradient boosting on decision trees is one of the most suitable mathematical methods for operational SPS generation forecasting. The paper describes the basic principles of operational forecasting based on boosting of decision trees, along with the main advantages and disadvantages of implementing this algorithm. Moreover, the paper analyzes an example implementation of this algorithm on data from an existing SPS, forecasting its generation.
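
A rough sketch of this kind of forecasting setup, with assumed weather features and synthetic data rather than the authors' module, using gradient boosting on decision trees:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Placeholder inputs: irradiance, cloud cover, temperature, and recent (lagged) output.
irradiance = rng.uniform(0, 1000, n)
cloud_cover = rng.uniform(0, 1, n)
temperature = rng.uniform(-10, 35, n)
lagged_output = 0.8 * irradiance * (1 - 0.6 * cloud_cover)
X = np.column_stack([irradiance, cloud_cover, temperature, lagged_output])
y = 0.82 * irradiance * (1 - 0.7 * cloud_cover) + rng.normal(scale=20, size=n)

# Keep chronological order when splitting, as is usual for forecasting tasks.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False)
model = GradientBoostingRegressor(n_estimators=400, learning_rate=0.05, max_depth=3)
model.fit(X_tr, y_tr)
print("test MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```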


2019 ◽  
Vol 116 (40) ◽  
pp. 19887-19893 ◽  
Author(s):  
José Marcio Luna ◽  
Efstathios D. Gennatas ◽  
Lyle H. Ungar ◽  
Eric Eaton ◽  
Eric S. Diffenderfer ◽  
...  

The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full interaction models, such as CART, have been investigated largely in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization for the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it also can produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches.
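
For reference, the two extremes of that spectrum can be reproduced with off-the-shelf tools: a single interaction-rich CART-style tree versus gradient-boosted depth-1 stumps, which form a purely additive model. The dataset and depths below are arbitrary choices, and the additive tree itself is not implemented here.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One interaction-rich tree: the CART end of the spectrum.
cart = DecisionTreeRegressor(max_depth=6).fit(X_tr, y_tr)
# Many depth-1 trees (boosted stumps): the purely additive end of the spectrum.
stumps = GradientBoostingRegressor(max_depth=1, n_estimators=500,
                                   learning_rate=0.1).fit(X_tr, y_tr)

print("CART R^2:          ", cart.score(X_te, y_te))
print("boosted stumps R^2:", stumps.score(X_te, y_te))
```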

