Data classification with binary response through the Boosting algorithm and logistic regression

2017 ◽  
Vol 69 ◽  
pp. 62-73 ◽  
Author(s):  
Fortunato S. de Menezes ◽  
Gilberto R. Liska ◽  
Marcelo A. Cirillo ◽  
Mário J.F. Vivanco

2021 ◽  
Vol 16 ◽  
pp. 705-714
Author(s):  
Abela Chairunissa ◽  
Solimun Solimun ◽  
Adji Achmad Rinaldo Fernandes

Credit risk is the risk with the greatest likelihood of occurring in banking, and the volume of bad loans affects bank performance, so the banking sector needs to know whether a prospective creditor is risky or not. The purpose of this study is to classify creditors, to compare the classification results of logistic regression fitted by maximum likelihood with those of the Boosting algorithm, specifically AdaBoost, and to select a model using the Boosting algorithm. Credit scoring aims to classify prospective creditors into two classes, good prospective creditors (Performing Loan) and bad prospective creditors (Non Performing Loan), based on certain characteristics. The method often used for classifying creditors is logistic regression, but it is less robust and less accurate than data mining methods, so methods that provide greater accuracy are needed. Among the methods that have been proposed is Boosting, which operates sequentially by applying a classification algorithm to reweighted versions of the training data set. This study uses 5 datasets. The first is secondary data on non-subsidized home-ownership creditors of Bank X, Malang City, while the other datasets are simulated with sample sizes of 10, 500, and 1000. The results indicate that ensemble boosting logistic regression is more suitable for describing binary response problems, especially creditor classification, because it provides more accurate information. For high-dimensional data, represented here by a sample size of 10, ensemble logistic regression produces fairly accurate predictions, with an accuracy of up to 80%, whereas ordinary logistic regression returns NA because the number of samples is smaller than the number of independent variables. Boosting is preferred because it focuses on misclassified observations and tends to reach higher accuracy.
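
The study's actual data and settings are not reproduced here, but a minimal sketch of the comparison it describes, plain logistic regression versus AdaBoost with a logistic-regression base learner on a synthetic two-class credit dataset, might look as follows (the dataset, class weights, and hyperparameters are illustrative assumptions, not the Bank X data):

```python
# Hedged sketch: AdaBoost with a logistic-regression base learner vs. plain
# logistic regression on a synthetic binary "credit scoring" dataset.
# make_classification stands in for the performing / non-performing labels.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# AdaBoost re-fits the base learner on reweighted versions of the training
# set, upweighting previously misclassified creditors at each round.
boosted = AdaBoostClassifier(LogisticRegression(max_iter=1000),
                             n_estimators=50, random_state=0).fit(X_tr, y_tr)

print("logistic regression accuracy:", accuracy_score(y_te, logit.predict(X_te)))
print("boosted logistic regression accuracy:", accuracy_score(y_te, boosted.predict(X_te)))
```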


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Baofeng Shi ◽  
Guotai Chi

This paper presents an approach to recognizing key factors in data classification. Using collinearity diagnostics to delete factors carrying repeated information and logistic regression significance tests to select the factors that effectively distinguish the two kinds of samples, the paper builds a model for recognizing key factors. The proposed model is demonstrated on 2044 observations from financial engineering. The experimental results show that 13 indicators, such as "marital status," "net income of borrower," and "Engel's coefficient," are the key factors that distinguish good customers from bad customers. Analysis of the experimental results verifies the performance of the proposed model. Moreover, the proposed method is simple and easy to implement.
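
A hedged sketch of the two-stage screening described above, collinearity diagnostics (here measured by variance inflation factors) to remove repeated-information factors, followed by logistic-regression significance to keep discriminating factors, could look like this; the column names, simulated data, and thresholds are illustrative assumptions, not the paper's actual indicator system:

```python
# Hedged sketch of the two-stage key-factor screening: drop indicators that
# repeat information (high VIF), then keep those a logistic regression finds
# significant. Simulated data stand in for the 2044 observations.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 5)),
                 columns=["marital_status", "net_income", "engel_coeff",
                          "age", "redundant"])
X["redundant"] = X["net_income"] * 0.95 + rng.normal(scale=0.1, size=300)  # repeated information
y = (X["net_income"] + X["engel_coeff"] + rng.normal(size=300) > 0).astype(int)

# Stage 1: collinearity diagnostics -- iteratively drop the factor with the largest VIF.
while True:
    exog = sm.add_constant(X)
    vif = pd.Series([variance_inflation_factor(exog.values, i)
                     for i in range(1, exog.shape[1])], index=X.columns)
    if vif.max() < 10:
        break
    X = X.drop(columns=vif.idxmax())

# Stage 2: logistic regression significance -- keep factors with p-value below 0.05.
fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
pvals = fit.pvalues.drop("const")
print("key factors:", pvals[pvals < 0.05].index.tolist())
```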


2017 ◽  
Vol 29 (9) ◽  
pp. 1806-1819 ◽  
Author(s):  
Miho Ohsaki ◽  
Peng Wang ◽  
Kenji Matsuda ◽  
Shigeru Katagiri ◽  
Hideyuki Watanabe ◽  
...  

Risks ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 70 ◽  
Author(s):  
Jessica Pesantez-Narvaez ◽  
Montserrat Guillen ◽  
Manuela Alcañiz

XGBoost is recognized as an algorithm with exceptional predictive capacity. Models for a binary response indicating the existence of accident claims versus no claims can be used to identify the determinants of traffic accidents. This study compared the relative performances of logistic regression and XGBoost approaches for predicting the existence of accident claims using telematics data. The dataset contained information from an insurance company about the individuals' driving patterns, including total annual distance driven and percentage of total distance driven in urban areas. Our findings showed that logistic regression is a suitable model given its interpretability and good predictive capacity. XGBoost requires numerous model-tuning procedures to match the predictive performance of the logistic regression model, and it demands greater interpretation effort.
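
A minimal sketch of this kind of comparison, assuming simulated telematics-style features (annual distance, share of urban driving) in place of the insurer's data, might be:

```python
# Hedged comparison sketch: logistic regression versus a gradient-boosted tree
# model (XGBClassifier) for a binary "claim / no claim" response on simulated
# telematics-style features. Features and hyperparameters are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

rng = np.random.default_rng(1)
n = 2000
km_year = rng.gamma(shape=5, scale=2000, size=n)   # total annual distance (km)
urban_pct = rng.beta(2, 5, size=n)                 # share of distance in urban areas
logits = -4 + 0.0001 * km_year + 2.0 * urban_pct
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))     # 1 = at least one claim
X = np.column_stack([km_year, urban_pct])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# XGBoost typically needs tuning (tree depth, learning rate, number of trees)
# to match a well-specified GLM on low-dimensional tabular data.
xgb = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.05,
                    eval_metric="logloss").fit(X_tr, y_tr)

print("logit AUC:", roc_auc_score(y_te, logit.predict_proba(X_te)[:, 1]))
print("xgboost AUC:", roc_auc_score(y_te, xgb.predict_proba(X_te)[:, 1]))
```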


2018 ◽  
Vol 24 (109) ◽  
pp. 535
Author(s):  
اياد حبيب شمال

Abstract: This paper discusses the problem of near multicollinearity in a nonlinear regression model (the multiple logistic regression model) in which the dependent variable is qualitative, with a binary response equal to one for a response and zero for no response. The problem is addressed through iterative principal component estimators based on normal weights and on conditional Bayes weights. The estimators are applied to a model using two drug concentrations, including the concentration of Ciprodar (variable X1), for a number of patients with renal disease; the dependent variable indicates whether the person recovered from the disease or not. Comparing the estimators by mean squared error (MSE), the results show that the iterative principal component estimators based on conditional Bayes weights are preferable to those based on normal weights.
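
The paper's iterative, weighted estimators are not reproduced here, but a minimal sketch of the underlying principal-component remedy for collinear predictors in a logistic regression, using simulated drug-concentration data and fitted-probability MSE as the comparison criterion, could be:

```python
# Minimal sketch of the plain (non-iterative, unweighted) principal-component
# idea that the estimators above build on: when predictors such as drug
# concentrations are nearly collinear, fit the logistic regression on leading
# principal components instead of the raw variables. Data are simulated.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)                            # e.g. concentration of drug 1
x2 = x1 * 0.98 + rng.normal(scale=0.05, size=200)    # nearly collinear second concentration
X = np.column_stack([x1, x2])
y = rng.binomial(1, 1 / (1 + np.exp(-(x1 + x2))))    # 1 = recovered, 0 = not recovered

pc_logit = make_pipeline(StandardScaler(), PCA(n_components=1),
                         LogisticRegression()).fit(X, y)
plain_logit = LogisticRegression().fit(X, y)

print("PC logistic MSE:", mean_squared_error(y, pc_logit.predict_proba(X)[:, 1]))
print("plain logistic MSE:", mean_squared_error(y, plain_logit.predict_proba(X)[:, 1]))
```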

