Practical Federated Gradient Boosting Decision Trees

2020 ◽  
Vol 34 (04) ◽  
pp. 4642-4649 ◽  
Author(s):  
Qinbin Li ◽  
Zeyi Wen ◽  
Bingsheng He

Gradient Boosting Decision Trees (GBDTs) have become very successful in recent years, winning many machine learning and data mining competitions. There have been several recent studies on how to train GBDTs in the federated learning setting. In this paper, we focus on horizontal federated learning, where data samples with the same features are distributed among multiple parties. However, existing studies are not efficient or effective enough for practical use. They suffer either from inefficiency due to costly data transformations such as secret sharing and homomorphic encryption, or from low model accuracy due to differential privacy designs. In this paper, we study a practical federated environment with relaxed privacy constraints. In this environment, a dishonest party might obtain some information about the other parties' data, but it is still impossible for the dishonest party to derive the actual raw data of other parties. Specifically, each party boosts a number of trees by exploiting similarity information based on locality-sensitive hashing. We prove that our framework is secure without exposing the original records to other parties, while the computation overhead in the training process is kept low. Our experimental studies show that, compared with normal training using only the local data of each party, our approach can significantly improve the predictive accuracy, and achieve accuracy comparable to the original GBDT trained with the data from all parties.
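
The similarity primitive in this framework is locality-sensitive hashing. The sketch below is a rough illustration, not the authors' protocol: it uses random-hyperplane LSH so that parties can match similar instances by comparing compact hash signatures rather than exchanging raw records. The party data, hash size, and variable names are made up.

```python
import numpy as np

def lsh_signatures(X, n_planes=16, seed=0):
    """Hash each row of X to a binary signature; similar rows agree on most bits."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(X.shape[1], n_planes))   # random hyperplanes
    return (X @ planes > 0).astype(np.uint8)           # sign of each projection

# Hypothetical local data of two parties sharing the same feature space.
rng = np.random.default_rng(1)
party_a = rng.random((100, 8))
party_b = rng.random((120, 8))

sig_a = lsh_signatures(party_a)
sig_b = lsh_signatures(party_b)

# Count matching hash bits between every pair of instances; each party-A record
# is matched to its most similar party-B record using signatures alone, so raw
# records never leave their owner.
matches = (sig_a[:, None, :] == sig_b[None, :, :]).sum(axis=2)   # shape (100, 120)
best_match_in_b = matches.argmax(axis=1)
```

In the framework described above, similarity information of this kind then guides how each party boosts its share of the trees.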

2014 ◽  
Vol 26 (4) ◽  
pp. 781-817 ◽  
Author(s):  
Ching-Pei Lee ◽  
Chih-Jen Lin

Linear rankSVM is one of the widely used methods for learning to rank. Although its performance may be inferior to nonlinear methods such as kernel rankSVM and gradient boosting decision trees, linear rankSVM is useful for quickly producing a baseline model. Furthermore, following its recent development for classification, linear rankSVM may give competitive performance for large and sparse data. A great deal of work has studied linear rankSVM, with a focus on computational efficiency when the number of preference pairs is large. In this letter, we systematically study existing works, discuss their advantages and disadvantages, and propose an efficient algorithm. We discuss different implementation issues and extensions with detailed experiments. Finally, we develop a robust linear rankSVM tool for public use.
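
For context, the sketch below shows the standard pairwise reduction that underlies linear rankSVM: each preference pair within a query becomes a difference vector fed to a linear SVM. The data are toy values, and the naive double loop materializes all pairs, which is exactly the quadratic cost that efficient rankSVM training methods avoid.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.random((200, 10))                 # document feature vectors
y = rng.integers(0, 3, size=200)          # relevance labels
q = rng.integers(0, 5, size=200)          # query ids

pairs, labels = [], []
for i in range(len(X)):
    for j in range(len(X)):
        if q[i] == q[j] and y[i] > y[j]:  # i preferred over j within the same query
            pairs.append(X[i] - X[j]); labels.append(1)
            pairs.append(X[j] - X[i]); labels.append(-1)

# Learn w such that w . (x_i - x_j) > 0 whenever i is preferred over j.
model = LinearSVC(C=1.0, fit_intercept=False)
model.fit(np.array(pairs), np.array(labels))
ranking_scores = X @ model.coef_.ravel()  # higher score means ranked higher
```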


2017 ◽  
Vol 16 (06) ◽  
pp. 1707-1727 ◽  
Author(s):  
Morteza Mashayekhi ◽  
Robin Gras

Decision trees are examples of easily interpretable models whose predictive accuracy is normally low. In comparison, decision tree ensembles (DTEs) such as random forest (RF) exhibit high predictive accuracy while being regarded as black-box models. We propose three new algorithms for extracting rules from DTEs. The RF+DHC method, a hill climbing method with downhill moves (DHC), is used to search for a rule set that dramatically decreases the number of rules. In the RF+SGL and RF+MSGL methods, the sparse group lasso (SGL) method and the multiclass SGL (MSGL) method are employed, respectively, to find a sparse weight vector corresponding to the rules generated by RF. Experimental results with 24 data sets show that the proposed methods outperform similar state-of-the-art methods in terms of human comprehensibility, by greatly reducing the number of rules and limiting the number of antecedents in the retained rules, while preserving the same level of accuracy.
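
To make the pipeline concrete, here is a minimal sketch of the first step shared by such methods: enumerating candidate rules (root-to-leaf conjunctions) from a trained random forest. A plain Lasso over the rule-activation matrix is used only as a simple stand-in for the DHC/SGL/MSGL selection stage; the dataset and hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=20, max_depth=3, random_state=0).fit(X, y)

def extract_rules(tree):
    """Return each root-to-leaf path as a list of (feature, threshold, '<=' or '>')."""
    t = tree.tree_
    rules = []
    def walk(node, conds):
        if t.children_left[node] == -1:          # leaf node: path complete
            rules.append(conds)
            return
        f, thr = t.feature[node], t.threshold[node]
        walk(t.children_left[node], conds + [(f, thr, "<=")])
        walk(t.children_right[node], conds + [(f, thr, ">")])
    walk(0, [])
    return rules

rules = [r for est in rf.estimators_ for r in extract_rules(est)]

def activations(rule, X):
    """Binary vector indicating which samples satisfy all antecedents of a rule."""
    mask = np.ones(len(X), dtype=bool)
    for f, thr, op in rule:
        mask &= (X[:, f] <= thr) if op == "<=" else (X[:, f] > thr)
    return mask.astype(float)

R = np.column_stack([activations(r, X) for r in rules])  # rule-activation matrix
weights = Lasso(alpha=0.01).fit(R, y).coef_              # sparse rule weights
kept_rules = [rules[i] for i in np.flatnonzero(weights)] # retained, interpretable rules
```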


Risks ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 202
Author(s):  
Ge Gao ◽  
Hongxin Wang ◽  
Pengbin Gao

In China, small and medium-sized enterprises (SMEs) face financing difficulties, and commercial banks and financial institutions are their main financing channels. A reasonable and efficient credit risk assessment system is therefore important for credit markets. Based on traditional statistical methods and AI technology, a soft voting fusion model, which incorporates logistic regression, support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), is constructed to improve the predictive accuracy of SMEs' credit risk. To verify the feasibility and effectiveness of the proposed model, we use data from 123 SMEs nationwide that worked with a Chinese bank from 2016 to 2020, including financial information and default records. The results show that the accuracy of the soft voting fusion model is higher than that of any single machine learning (ML) algorithm, which provides a theoretical basis for the government to control credit risk in the future and offers important references for banks making credit decisions.
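
A minimal sketch of such a soft-voting fusion, built with scikit-learn, XGBoost, and LightGBM, is shown below. The bank's SME data are not public, so a synthetic imbalanced default-prediction dataset and placeholder hyperparameters stand in.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Synthetic stand-in for SME financial indicators with a minority "default" class.
X, y = make_classification(n_samples=500, n_features=20, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

fusion = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True)),          # probability=True is needed for soft voting
        ("rf", RandomForestClassifier(n_estimators=200)),
        ("xgb", XGBClassifier(eval_metric="logloss")),
        ("lgbm", LGBMClassifier()),
    ],
    voting="soft",                               # average the predicted class probabilities
)
fusion.fit(X_tr, y_tr)
print("fusion accuracy:", fusion.score(X_te, y_te))
```

Soft voting averages the five models' predicted probabilities, so a confident minority model can still influence the fused decision, which is the usual rationale for preferring it over hard majority voting.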


2021 ◽  
Author(s):  
Javad Iskandarov ◽  
George Fanourgakis ◽  
Waleed Alameri ◽  
George Froudakis ◽  
Georgios Karanikolos

Conventional foam modelling techniques require tuning of many parameters and long computation times to provide accurate predictions. Therefore, there is a need for alternative methodologies for the efficient and reliable prediction of foam performance. Foams are sensitive to various operational conditions and reservoir parameters. This research aims to apply machine learning (ML) algorithms to experimental data in order to correlate the important influencing parameters with foam rheology. In this way, optimum operational conditions for CO2 foam enhanced oil recovery (EOR) can be determined. To achieve this, five different ML algorithms were applied to rheology data from various experimental studies. It was concluded that the Gradient Boosting (GB) algorithm could successfully fit the training data and gave the most accurate predictions for unseen cases.
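
As a rough illustration of this workflow (not the study's code or data), the sketch below fits gradient boosting to synthetic rheology-like measurements and compares it against other common regressors by cross-validation; the feature names are invented placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Placeholder inputs: shear rate, foam quality, pressure, temperature, salinity.
X = rng.uniform(0.1, 1.0, size=(150, 5))
# Synthetic "apparent viscosity" target with shear-thinning-like behaviour plus noise.
y = 10 * X[:, 1] / (X[:, 0] ** 0.3) + rng.normal(scale=0.5, size=150)

models = {
    "gradient boosting": GradientBoostingRegressor(n_estimators=300, learning_rate=0.05),
    "random forest": RandomForestRegressor(n_estimators=300),
    "ridge": Ridge(alpha=1.0),
    "svr": SVR(C=10.0),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean R^2 = {score:.3f}")
```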


Author(s):  
J. Andrew Onesimu ◽  
Karthikeyan J. ◽  
D. Samuel Joshua Viswas ◽  
Robin D Sebastian

Deep learning has attracted considerable research attention in recent years owing to its advantages in fields such as healthcare, medicine, and automobiles. A huge amount of data is required for deep learning to achieve good accuracy; thus, it is important to protect the data from security and privacy breaches. In this chapter, a comprehensive survey of security and privacy challenges in deep learning is presented. Security attacks such as poisoning attacks, evasion attacks, and black-box attacks are explored along with their prevention and defence techniques. A comparative analysis is done on various techniques for protecting the data from such security attacks. Privacy is another major challenge in deep learning. The authors present an in-depth survey of privacy-preserving techniques for deep learning, such as differential privacy, homomorphic encryption, secret sharing, and secure multi-party computation. A detailed comparison table of the various privacy-preserving techniques and approaches is also presented.
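
As one concrete example of the surveyed techniques, the sketch below applies the differential-privacy recipe (per-example gradient clipping plus Gaussian noise, as in DP-SGD) to a toy logistic model in plain NumPy. The clipping norm and noise multiplier are illustrative values, and no formal privacy accounting is included.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = (X[:, 0] + 0.1 * rng.normal(size=256) > 0).astype(float)
w = np.zeros(10)

clip_norm, noise_multiplier, lr = 1.0, 1.1, 0.1
for _ in range(100):
    preds = 1.0 / (1.0 + np.exp(-X @ w))                  # sigmoid predictions
    per_example_grads = (preds - y)[:, None] * X          # one gradient per record
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)  # bound each record's influence
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=w.shape)
    w -= lr * (clipped.sum(axis=0) + noise) / len(X)      # noisy averaged gradient step
print("training accuracy:", ((X @ w > 0) == y).mean())
```

Clipping bounds any single record's contribution to the update, and the added noise masks what remains, which is the intuition behind the differential-privacy guarantee.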


2018 ◽  
Vol 51 ◽  
pp. 02004 ◽  
Author(s):  
Stanislav Eroshenko ◽  
Alexandra Khalyasmaa ◽  
Denis Snegirev

The paper presents an operational model for very-short-term forecasting of solar power station (SPS) generation developed by the authors; the model is based on weather information and is built into an existing software product as a separate module for operational SPS forecasting. The analysis revealed that gradient boosting on decision trees is one of the most suitable mathematical methods for operational SPS generation forecasting. The paper describes the basic principles of operational forecasting based on boosting of decision trees, along with the main advantages and disadvantages of implementing this algorithm. Moreover, the paper analyzes an example implementation of this algorithm on data from an existing SPS, forecasting its generation.
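
A rough sketch of this kind of forecasting setup, with assumed weather features and synthetic data rather than the authors' module, using gradient boosting on decision trees:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Placeholder inputs: irradiance, cloud cover, temperature, and recent (lagged) output.
irradiance = rng.uniform(0, 1000, n)
cloud_cover = rng.uniform(0, 1, n)
temperature = rng.uniform(-10, 35, n)
lagged_output = 0.8 * irradiance * (1 - 0.6 * cloud_cover)
X = np.column_stack([irradiance, cloud_cover, temperature, lagged_output])
y = 0.82 * irradiance * (1 - 0.7 * cloud_cover) + rng.normal(scale=20, size=n)

# Keep chronological order when splitting, as is usual for forecasting tasks.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False)
model = GradientBoostingRegressor(n_estimators=400, learning_rate=0.05, max_depth=3)
model.fit(X_tr, y_tr)
print("test MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```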


2019 ◽  
Vol 116 (40) ◽  
pp. 19887-19893 ◽  
Author(s):  
José Marcio Luna ◽  
Efstathios D. Gennatas ◽  
Lyle H. Ungar ◽  
Eric Eaton ◽  
Eric S. Diffenderfer ◽  
...  

The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full interaction models, such as CART, have been investigated largely in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization for the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it also can produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches.
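
For reference, the two extremes of that spectrum can be reproduced with off-the-shelf tools: a single interaction-rich CART-style tree versus gradient-boosted depth-1 stumps, which form a purely additive model. The dataset and depths below are arbitrary choices, and the additive tree itself is not implemented here.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One interaction-rich tree: the CART end of the spectrum.
cart = DecisionTreeRegressor(max_depth=6).fit(X_tr, y_tr)
# Many depth-1 trees (boosted stumps): the purely additive end of the spectrum.
stumps = GradientBoostingRegressor(max_depth=1, n_estimators=500,
                                   learning_rate=0.1).fit(X_tr, y_tr)

print("CART R^2:          ", cart.score(X_te, y_te))
print("boosted stumps R^2:", stumps.score(X_te, y_te))
```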

