Creditor Classification Logistic Regression Ensemble Boosting And Logistic Regression In Creditor Classification With Binary Response

Credit risk is the risk most likely to occur in banking, and the number of bad loans affects bank performance. The banking sector therefore needs to know whether a prospective creditor is risky. The purpose of this study is to classify creditors, to compare the classification results of logistic regression fitted by maximum likelihood with those of the Boosting algorithm (specifically AdaBoost), and to select a model with the Boosting algorithm. Credit scoring aims to classify prospective creditors into two classes, good (Performing Loan) and bad (Non Performing Loan), based on certain characteristics. The method most often used for classifying creditors is logistic regression, but this method is less robust and less accurate than data mining methods. Thus, there is a need for methods that provide greater accuracy. Among the methods that have been proposed is Boosting, which operates sequentially by applying a classification algorithm to reweighted versions of the training data set. This study uses 5 datasets. The first is secondary data on non-subsidized homeownership creditors of Bank X in Malang City; the others are simulated data with sample sizes of 10, 500, and 1000. The results indicate that ensemble boosting logistic regression is better suited to binary response problems, especially creditor classification, because it provides more accurate information. For high-dimensional data, represented here by a sample size of 10, ensemble logistic regression produced fairly accurate predictions, with accuracy of up to 80%, whereas the ordinary logistic regression model returned NA because the number of samples was smaller than the number of independent variables. Boosting is preferred because it focuses on misclassified observations and tends to reach higher accuracy.
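
The reweighting idea described in the abstract can be illustrated with a minimal sketch of discrete AdaBoost. This is not the study's implementation: for brevity the base learner here is a one-dimensional decision stump rather than logistic regression, labels are coded as -1/+1, and the toy data and all function names (`fit_stump`, `adaboost`, `predict`) are hypothetical.

```python
import math

def fit_stump(xs, ys, w):
    """Return (weighted_error, threshold, polarity) of the best stump.

    A stump predicts `polarity` when x >= threshold and `-polarity` otherwise.
    """
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= t else -polarity for x in xs]
            err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def adaboost(xs, ys, rounds=3):
    """Fit `rounds` stumps sequentially, reweighting the data after each one."""
    n = len(xs)
    w = [1.0 / n] * n                      # start with uniform weights
    model = []
    for _ in range(rounds):
        err, t, polarity = fit_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this stump
        model.append((alpha, t, polarity))
        preds = [polarity if x >= t else -polarity for x in xs]
        # Misclassified points (y * p == -1) gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * y * p) for wi, y, p in zip(w, ys, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Weighted majority vote of all stumps."""
    score = sum(a * (pol if x >= t else -pol) for a, t, pol in model)
    return 1 if score >= 0 else -1

# Hypothetical toy data that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=3)
accuracy = sum(predict(model, x) == y for x, y in zip(xs, ys)) / len(xs)
```

On this toy set a single stump misclassifies three of the ten points, while the weighted vote of three stumps classifies all ten correctly, which shows how later rounds concentrate on earlier mistakes.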

Data classification with binary response through the Boosting algorithm and logistic regression

Expert Systems with Applications · 10.1016/j.eswa.2016.08.014 · 2017 · Vol 69 · pp. 62-73 · Cited By ~ 25

Author(s): Fortunato S. de Menezes, Gilberto R. Liska, Marcelo A. Cirillo, Mário J.F. Vivanco

Keyword(s): Logistic Regression, Data Classification, Binary Response, Boosting Algorithm

Early Detection of Severe Functional Impairment Among Adolescents With Major Depression Using Logistic Classifier

Frontiers in Public Health · 10.3389/fpubh.2020.622007 · 2021 · Vol 8

Author(s): I.-Ming Chiu, Wenhua Lu, Fangming Tian, Daniel Hart

Keyword(s): Logistic Regression, Regression Model, Logistic Model, Logistic Regression Model, Age Groups, Recall Rate, Training Data, Statistical Tool, Data Set, Severe Impairment

Machine learning is about finding patterns and making predictions from raw data. In this study, we aimed to achieve two goals by using the logistic regression model as both a statistical tool and a classifier. First, we analyzed the associations between Major Depressive Episode with Severe Impairment (MDESI) in adolescents and a list of broadly defined sociodemographic characteristics. Using findings from the logistic model, the second and ultimate goal was to identify potential MDESI cases using a logistic model as a classifier (i.e., a predictive mechanism). Data on adolescents aged 12–17 years who participated in the National Survey on Drug Use and Health (NSDUH), 2011–2017, were pooled and analyzed. The logistic regression model revealed that, compared with males and adolescents aged 12-13, females and those in the age groups of 14-15 and 16-17 had a higher risk of MDESI. Blacks and Asians had a lower risk of MDESI than Whites. Living in a single-parent household, having less authoritative parents, and having negative school experiences further increased adolescents' risk of having MDESI. The predictive model successfully identified 66% of the MDESI cases (recall rate) and accurately identified 72% of the MDESI and MDESI-free cases (accuracy rate) in the training data set. The rates of both recall and accuracy remained about the same (66% and 72%) using the test data. Results from this study confirmed that the logistic model, when used as a classifier, can identify potential cases of MDESI in adolescents with acceptable recall and reasonable accuracy rates. The algorithmic identification of adolescents at risk for depression may improve prevention and intervention.
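
The two rates quoted in the abstract measure different things: recall is the share of actual cases that the classifier flags, while accuracy is the share of all predictions that are correct. A minimal sketch of both follows, assuming labels coded as 1 (case) and 0 (non-case). The function name and the small label vectors are hypothetical and do not come from the NSDUH data.

```python
def recall_and_accuracy(y_true, y_pred):
    """Compute recall and accuracy for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-cases cleared
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / len(pairs)
    return recall, accuracy

# Hypothetical example: 3 true cases among 8 adolescents.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
recall, accuracy = recall_and_accuracy(y_true, y_pred)
```

Here two of the three true cases are found (recall 2/3) and six of the eight predictions are correct (accuracy 0.75). When cases are rare, accuracy can stay high even if recall is poor, which may be why the abstract reports both.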

A logistic regression model for predicting the occurrence of intense geomagnetic storms

Annales Geophysicae · 10.5194/angeo-23-2969-2005 · 2005 · Vol 23 (9) · pp. 2969-2974 · Cited By ~ 22

Author(s): N. Srivastava

Keyword(s): Logistic Regression, Regression Model, Logistic Regression Model, Geomagnetic Storms, Training Data, Validation Dataset, Model Parameters, Validation Data, Data Set, Logistic Regression Models

Abstract. A logistic regression model is implemented for predicting the occurrence of intense/super-intense geomagnetic storms. A binary dependent variable, indicating the occurrence of intense/super-intense geomagnetic storms, is regressed against a series of independent model variables that define a number of solar and interplanetary properties of geo-effective CMEs. The model parameters (regression coefficients) are estimated from a training data set which was extracted from a dataset of 64 geo-effective CMEs observed during 1996-2002. The trained model is validated by predicting the occurrence of geomagnetic storms from a validation dataset, also extracted from the same data set of 64 geo-effective CMEs, recorded during 1996-2002, but not used for training the model. The model predicts 78% of the geomagnetic storms from the validation data set. In addition, the model predicts 85% of the geomagnetic storms from the training data set. These results indicate that logistic regression models can be effectively used for predicting the occurrence of intense geomagnetic storms from a set of solar and interplanetary factors.
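
The train-then-validate workflow described above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's procedure: it uses a single hypothetical standardized predictor instead of the solar and interplanetary variables, it fits the coefficients by plain gradient ascent on the log-likelihood (the abstract does not say which estimation routine was used), and the data are invented, so the resulting rates do not reproduce the 78% and 85% figures.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient ascent on the mean log-likelihood."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        w += lr * sum(r * x for r, x in zip(residuals, xs)) / n
        b += lr * sum(residuals) / n
    return w, b

def hit_rate(w, b, xs, ys):
    """Fraction of actual events (y == 1) that the model predicts as events."""
    events = [x for x, y in zip(xs, ys) if y == 1]
    hits = sum(1 for x in events if sigmoid(w * x + b) >= 0.5)
    return hits / len(events)

# Hypothetical data: the coefficients are estimated on the training set only,
# then the fitted model is applied to events it has not seen.
train_x = [-4, -3, -2, -1, 1, 2, 3, 4]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
valid_x = [-2.5, -1.5, 1.5, 2.5]
valid_y = [0, 0, 1, 1]

w, b = fit_logistic(train_x, train_y)
train_hits = hit_rate(w, b, train_x, train_y)
valid_hits = hit_rate(w, b, valid_x, valid_y)
```

Reporting the hit rate separately for the training and validation sets, as the abstract does, indicates how much of the model's apparent skill carries over to unseen events.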

Learning from Imbalanced Multi-label Data Sets by Using Ensemble Strategies

Computer Engineering and Applications Journal · 10.18495/comengapp.v4i1.109 · 2015 · Vol 4 (1) · pp. 61-81

Author(s): Mohammad Masoud Javidi

Keyword(s): Logistic Regression, Ensemble Learning, Nearest Neighbor, Imbalanced Data, Classification Performance, Training Data, Data Sets, K Nearest Neighbor, Data Set, Stable Algorithm

Multi-label classification is an extension of conventional classification in which a single instance can be associated with multiple labels. Problems of this type are ubiquitous in everyday life; for example, a movie can be categorized as action, crime, and thriller. Most multi-label classification algorithms are designed for balanced data and don't work well on imbalanced data. In real applications, however, most datasets are imbalanced. Therefore, we focus on improving multi-label classification performance on imbalanced datasets. In this paper, a state-of-the-art multi-label classification algorithm called IBLR_ML is employed. This algorithm combines the k-nearest neighbor and logistic regression algorithms. The logistic regression part of this algorithm is combined with two ensemble learning algorithms, Bagging and Boosting. The proposed approach is called IB-ELR. In this paper, for the first time, the ensemble bagging method with a stable learner as the base learner and imbalanced data sets as the training data is examined. Finally, to evaluate the proposed methods, they are implemented in Java. Experimental results show the effectiveness of the proposed methods. Keywords: Multi-label classification, Imbalanced data set, Ensemble learning, Stable algorithm, Logistic regression, Bagging, Boosting

LSO-AdaBoost Based Face Detection for IP-CAM Video

Applied Mechanics and Materials · 10.4028/www.scientific.net/amm.284-287.3543 · 2013 · Vol 284-287 · pp. 3543-3548 · Cited By ~ 3

Author(s): Chuang Jan Chang, Shu Lin Hwang

Keyword(s): Video Surveillance, Face Detection, Motion Detection, Detection Rate, Detection System, Training Data, Surveillance Systems, High Detection Rate, Data Set, Adaboost Algorithm

The IP-CAM plays a major role in digital video surveillance systems. Face detection can add extra value and contribute towards an intelligent video surveillance system. The cascaded AdaBoost-based face detection system proposed by Viola can support real-time detection with a high detection rate. The performance of the Alt2 cascade (from OpenCV) on IP-CAM video is worse than on static images because the training data set behind Alt2 does not account for the localized characteristics of a particular IP-CAM video. Therefore, this study presents an enhanced training method using the AdaBoost algorithm that is capable of obtaining the localized sampling optimum (LSO) from a local IP-CAM video. In addition, we use an improved motion detection algorithm that cooperates with the face detector to reduce processing time and achieve a better detection rate at video-rate processing speed. The proposed solution has been developed around the cascaded AdaBoost approach, using the OpenCV library, with an LSO from a local IP-CAM video. An efficient motion detection model is adopted for practical applications. Using 30% local samples, overall system performance can be improved to a 97.9% detection rate, with detection time reduced by 54.5% relative to the Alt2 cascade.

Prediction Model of Anastomotic Leakage Among Esophageal Cancer Patients After Receiving an Esophagectomy: Machine Learning Approach

JMIR Medical Informatics · 10.2196/27110 · 2021 · Vol 9 (7) · pp. e27110

Author(s): Ziran Zhao, Xi Cheng, Xiao Sun, Shanrui Ma, Hao Feng, ...

Keyword(s): Machine Learning, Logistic Regression, Prediction Model, Anastomotic Leakage, Cancer Patients, Risk Prediction, Risk Prediction Model, Training Data, Data Set, Selection Operator

Background: Anastomotic leakage (AL) is one of the severe postoperative adverse events (5%-30%), and it is related to increased medical costs in cancer patients who undergo esophagectomies. Machine learning (ML) methods show good performance at predicting risk for AL. However, AL risk prediction based on ML models among the Chinese population is unavailable. Objective: This study uses ML techniques to develop and validate a risk prediction model to screen patients with emerging AL risk factors. Methods: Analyses were performed using medical records from 710 patients who underwent esophagectomies at the National Clinical Research Center for Cancer between January 2010 and May 2015. We randomly split (9:1) the data set into a training data set of 639 patients and a testing data set of 71 patients using a computer algorithm. We assessed multiple classification tools to create a multivariate risk prediction model. Our ML algorithms comprised decision tree, random forest, naive Bayes, and logistic regression with least absolute shrinkage and selection operator. The optimal AL prediction model was selected based on model evaluation metrics. Results: The final risk panel included 36 independent risk features. Of those, 10 features were significantly identified by the logistic model, including aortic calcification (OR 2.77, 95% CI 1.32-5.81), celiac trunk calcification (OR 2.79, 95% CI 1.20-6.48), forced expiratory volume 1% (OR 0.51, 95% CI 0.30-0.89), TLco (OR 0.56, 95% CI 0.27-1.18), peripheral vascular disease (OR 4.97, 95% CI 1.44-17.07), laparoscope (OR 3.92, 95% CI 1.23-12.51), postoperative length of hospital stay (OR 1.17, 95% CI 1.13-1.21), vascular permeability activity (OR 0.46, 95% CI 0.14-1.48), and fat liquefaction of incisions (OR 4.36, 95% CI 1.86-10.21). Logistic regression with least absolute shrinkage and selection operator offered the highest prediction quality, with an area under the receiver operating characteristic curve of 72% in the training data set. The testing model also achieved similarly high performance. Conclusions: Our model offered a prediction of AL with high accuracy, assisting in AL prevention and treatment. A personalized ML prediction model with a purely data-driven selection of features is feasible and effective in predicting AL in patients who underwent esophagectomy.
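
The 9:1 random split mentioned in the Methods can be sketched as follows. The abstract only says that a computer algorithm was used, so the shuffling approach, the fixed seed, and the function name `split_indices` are assumptions made for illustration.

```python
import random

def split_indices(n, train_parts=9, total_parts=10, seed=0):
    """Shuffle record indices 0..n-1 and split them train_parts : (total - train)."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    idx = list(range(n))
    rng.shuffle(idx)
    # Integer arithmetic avoids floating-point rounding in the cut point.
    n_train = n * train_parts // total_parts
    return idx[:n_train], idx[n_train:]

# 710 patients split 9:1, matching the counts given in the abstract.
train_idx, test_idx = split_indices(710)
```

With 710 records this yields 639 training indices and 71 testing indices, and no record appears in both sets.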

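Predicting Customer Churn Rates at Telecommunications Companies with the AdaBoost Algorithm (original title: PREDIKSI TINGKAT PELANGGAN CHURN PADA PERUSAHAAN TELEKOMUNIKASI DENGAN ALGORITMA ADABOOST)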

Jurnal Informatika · 10.30873/ji.v21i1.2867 · 2021 · Vol 21 (1) · pp. 34-43

Author(s): Iqbal Muhammad Latief, Agus Subekti, Windu Gata

Keyword(s): Data Mining, Customer Service, Secondary Data, Telecommunications Industry, Customer Relationships, Training Data, Churn Prediction, Customer Data, Adaboost Algorithm, Customer Churn

With the rapid advancement of the telecommunications industry and increasing competition between telecommunications companies, companies need to predict customer behavior to gauge customer loyalty. One way is to analyze customer data through customer churn prediction, which is an important business strategy: acquiring new customers costs much more than retaining existing ones. The ease of switching operators is one of the serious challenges the telecommunications industry faces. By predicting customer churn, companies can take immediate action to retain customers. To retain existing customers, a company must improve customer service and product quality and must know in advance which customers are likely to leave. Prediction can be done by analyzing customer data using data mining techniques. In line with this, gathering information from the telecommunications business can help predict whether customers will leave the company. The data used in this study are secondary data comprising 7,403 customer records with 21 variables. This study proposes using the ensemble methods AdaBoost, XGBoost, and random forest and comparing them. The algorithms are validated with training and testing data at a ratio of 80:20. Using Python tools, the AdaBoost algorithm was found to have an accuracy of 80%. Keywords: accuracy, adaboost, churn prediction, compare model, data mining.

Discrimination of civet coffee using visible spectroscopy

Jurnal Teknologi dan Sistem Komputer · 10.14710/jtsiskom.2020.13734 · 2020 · Vol 8 (3) · pp. 239-245

Author(s): Graciella Mae L Adier, Charlene A Reyes, Edwin R Arboleda

Keyword(s): Logistic Regression, Training Data, Coffee Bean, Visible Spectroscopy, Data Set, Training Time, Data Mining Algorithms, Visible Spectra, Specialty Coffee, Mining Algorithms

Civet coffee is considered highly marketable and rare. This specialty coffee has a special flavor and a higher price than regular coffee, and it is restricted in supply. Establishing a straightforward and efficient approach to distinguish civet coffee is fundamental for quality assurance and consumer protection. This study used visible spectroscopy as a quick, non-destructive technique to obtain the absorbance, from 450 nm to 650 nm, of civet coffee and non-civet coffee samples. Overall, 160 samples were analyzed, yielding 960 spectra in total. The data gathered from the first 120 samples were fed to the classification learner application and used as a training data set. The remaining samples were used for testing the classification algorithm. The study shows that civet coffee bean samples have lower absorbance values in the visible spectrum than non-civet coffee bean samples. The process yields classification scores of 96.7% to 100% for quadratic discriminant analysis and logistic regression. Of the two classification algorithms, logistic regression had the shorter training time, 14.050 seconds. Visible spectroscopy combined with data mining algorithms is effective in discriminating civet coffee from non-civet coffee.

TRANSFER LEARNING BASED ON LOGISTIC REGRESSION

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences · 10.5194/isprsarchives-xl-3-w3-145-2015 · 2015 · Vol XL-3/W3 · pp. 145-152

Author(s): A. Paul, F. Rottensteiner, C. Heipke

Keyword(s): Remote Sensing, Logistic Regression, Transfer Learning, Domain Adaptation, Real Data, Training Data, Model Parameters, Target Domain, Training Set, Data Set

In this paper we address the problem of classification of remote sensing images in the framework of transfer learning with a focus on domain adaptation. The main novel contribution is a method for transductive transfer learning in remote sensing on the basis of logistic regression. Logistic regression is a discriminative probabilistic classifier of low computational complexity, which can deal with multiclass problems. This research area deals with methods that solve problems in which labelled training data sets are assumed to be available only for a source domain, while classification is needed in a target domain with different, yet related characteristics. Classification takes place with a model of weight coefficients for hyperplanes which separate features in the transformed feature space. In terms of logistic regression, our domain adaptation method adjusts the model parameters by iterative labelling of the target test data set. These labelled data features are iteratively added to the current training set which, at the beginning, contains only source features; simultaneously, a number of source features are deleted from the current training set. Experimental results based on a test series with synthetic and real data constitute a first proof-of-concept of the proposed method.

Cyber crimes in the banking sector: Case study of Vietnam

International Journal of Social Science and Economics Invention · 10.23958/ijssei/vol06-i05/207 · 2020 · Vol 6 (05)

Author(s): LE Thanh Tam, Nguyen Minh Chau, Pham Ngoc Mai, Ngo Ha Phuong, Vu Khanh Huyen Tran

Keyword(s): Financial Literacy, Commercial Banks, Banking Sector, Secondary Data, Legal Framework, Economic Sectors, Technological Revolution, Survey Results, Cyber Crimes

The technological revolution 4.0 brings great opportunities, but also cybercrimes, to economic sectors, especially banks. Using secondary data and survey results from 305 bank clients, the main findings of this paper are: (i) there are several types of cybercrimes in the banking sector; (ii) Vietnam is one of the top countries worldwide both for having hackers and for being attacked by hackers, especially in the banking sector. The three most common attacks are skimming, hacking and phishing, and the number of cybercrime attacks in Vietnam is increasing rapidly over the years; (iii) Vietnamese customers are very vulnerable to cybercrime in banking: although more than 58% report having heard about cybercrimes and about how banks provide services to inform them of their transactions, more than 50% do not have any deep knowledge of, or any measures for, preventing cybercrime; (iv) customers believe in banks, but do not think that banks can deal with cybercrime issues well. They still feel traditional transactions are more secure than e-transactions; (v) the reasons for the high level of cybercrime come from commercial banks (low management and human capacity), the supporting environment (inadequate), the legal framework (not yet strong and strict enough on cybercrimes), and clients (low level of financial literacy). Therefore, several solutions should be carried out by all stakeholders to improve cybersecurity in Vietnamese banks.
