Improving Logistic Regression/Credit Scorecards Using Random Forests: Applications with Credit Card and Home Equity Datasets

Author(s):  
Dhruv Sharma

2019 ◽  
Author(s):  
Oskar Flygare ◽  
Jesper Enander ◽  
Erik Andersson ◽  
Brjánn Ljótsson ◽  
Volen Z Ivanov ◽  
...  

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test whether it is possible to reliably predict remission from BDD in a sample of 88 individuals who had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower in subsequent follow-ups (68%, 66% and 61% correctly classified at 3-, 12- and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in the prediction of outcomes of patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.



Author(s):  
Kazutaka Uchida ◽  
Junichi Kouno ◽  
Shinichi Yoshimura ◽  
Norito Kinjo ◽  
Fumihiro Sakakibara ◽  
...  

Abstract: In conjunction with recent advancements in machine learning (ML), such technologies have been applied in various fields owing to their high predictive performance. We sought to develop a prehospital stroke scale using ML. We conducted a multicenter retrospective and prospective cohort study. The training cohort comprised eight centers in Japan from June 2015 to March 2018, and the test cohort comprised 13 centers from April 2019 to March 2020. We used three different ML algorithms (logistic regression, random forests, XGBoost) to develop models. The main outcomes were large vessel occlusion (LVO), intracranial hemorrhage (ICH), subarachnoid hemorrhage (SAH), and cerebral infarction (CI) other than LVO. Predictive ability was validated in the test cohort using accuracy, positive predictive value, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and F score. The training cohort included 3178 patients with 337 LVO, 487 ICH, 131 SAH, and 676 CI cases, and the test cohort included 3127 patients with 183 LVO, 372 ICH, 90 SAH, and 577 CI cases. The overall accuracies were 0.65, and the positive predictive values, sensitivities, specificities, AUCs, and F scores were stable in the test cohort. The classification abilities were also fair for all ML models. The AUCs for LVO of logistic regression, random forests, and XGBoost were 0.89, 0.89, and 0.88, respectively, in the test cohort, and these values were higher than those of previously reported prediction models for LVO. The ML models developed to predict the probability and types of stroke at the prehospital stage had superior predictive abilities.
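As a rough illustration of the validation metrics listed in this abstract, the sketch below computes accuracy, positive predictive value, sensitivity, specificity, and an F score from binary labels. It is not the authors' code: the function name `binary_metrics` and the toy labels are hypothetical, the F score is assumed to be F1 (the abstract does not say which variant was used), and AUC is omitted because it requires predicted probabilities rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Compute common validation metrics for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    ppv = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": safe_div(tn, tn + fp),
        # Assumption: F score taken as F1, the harmonic mean of PPV and sensitivity.
        "f1": safe_div(2 * ppv * sensitivity, ppv + sensitivity),
    }


# Hypothetical toy labels, not data from the study.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
```

For a multi-class outcome such as LVO, ICH, SAH, and CI, one common choice is to compute these metrics one class at a time, treating that class as positive and all others as negative.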



Author(s):  
M. A. Al-Shabi

Fraudulent credit card transactions remain one of the problems facing companies and the banking sector, costing them billions of dollars every year. Designing an efficient algorithm is one of the most important challenges in this area. This paper proposes an efficient approach that automatically detects credit card fraud related to insurance companies using a deep learning algorithm called an autoencoder. The effectiveness of the proposed method was demonstrated by identifying fraud in actual data from transactions made with credit cards in September 2013 by European cardholders. In addition, the paper provides a solution for data imbalance, which affects most current algorithms. The suggested solution relies on training the autoencoder to reconstruct normal data. Anomalies are detected by defining a reconstruction error threshold and treating cases whose error exceeds the threshold as anomalies. The algorithm detected 64% of fraudulent transactions at a threshold of 5, 79% at a threshold of 3, and 91% at a threshold of 0.7, outperforming logistic regression, which detected 57% on the unbalanced dataset.
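The thresholding step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the autoencoder itself is not implemented, the reconstructions are hypothetical hand-written values standing in for its output, and mean squared error is assumed as the reconstruction error (the abstract does not name the error measure). The function names are likewise hypothetical.

```python
def reconstruction_error(original, reconstructed):
    """Mean squared error between one input row and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def flag_anomalies(originals, reconstructions, threshold):
    """Return True for each row whose reconstruction error exceeds the threshold."""
    return [
        reconstruction_error(o, r) > threshold
        for o, r in zip(originals, reconstructions)
    ]


# Hypothetical toy rows; a trained autoencoder would supply the reconstructions.
originals = [[0.1, 0.2, 0.3], [0.0, 0.1, 0.0], [4.0, -3.0, 5.0], [2.0, 2.0, 2.0]]
reconstructions = [[0.1, 0.25, 0.3], [0.05, 0.1, 0.0], [0.5, 0.2, 0.4], [0.5, 0.5, 0.5]]

strict = flag_anomalies(originals, reconstructions, threshold=5)
loose = flag_anomalies(originals, reconstructions, threshold=0.7)
```

In this toy data the last row has an error of 2.25, so it is flagged at a threshold of 0.7 but not at 5. This is the general reason a lower threshold catches more anomalies, at the likely cost of more false alarms.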



2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e18552-e18552
Author(s):  
Syed Hussaini ◽  
Mia Dana ◽  
Lauren Nicholas

e18552 Background: Cancer is the second most common cause of death in the country, eclipsed only by heart disease. Cancer care is increasingly characterized by financial toxicity related to high-cost treatments, though it is unknown whether other chronic conditions impose similar financial harms. Methods: We conducted a retrospective analysis of Health and Retirement Study participants interviewed between 2012 and 2018. This is a national, longitudinal survey, conducted every two years, of adults 50 and older and their spouses. We used fixed effect regression models to compare changes in financial debt among households with a new diagnosis of cancer, other major chronic conditions (diabetes, stroke, or heart disease), and no new health diagnosis (or health shock). Since more affluent households may respond to health shocks differently, we estimated separate comparisons for households above versus below median wealth in 2012, prior to new health conditions. We assessed use of any non-housing financial debt, credit card debt, and home equity lines of credit among the subset of homeowning households. Results: In this study of 14,153 households, average age at interview was 62 years, with 43% male, 70% White, 22% Black, 13% Hispanic, and 70% with up to high school education. Of this population, 25% held credit card debt, 70% owned a home, 18% had a home equity line of credit, and 9% used a home equity line of credit. Among households with below-median wealth when they entered the study in 2012 (< $23,000 in 2016 dollars), a new cancer diagnosis was associated with a 4.7 percentage point increase in financial debt (12.5% effect size, p < 0.05). Participants diagnosed with a chronic condition (heart condition, stroke, or diabetes) were 3.6 percentage points more likely to develop financial debt (9.6%, p < 0.05) compared to households that did not develop a new chronic condition. These differences disappeared among participants in households with above-median wealth. There was no difference in credit card debt, availability of a home equity line of credit, or use of a home equity line of credit for participants with a new diagnosis. Conclusions: A new diagnosis of cancer or a chronic condition was associated with increased financial debt for older Americans living in households with below-median wealth.



Author(s):  
Wei Mingjun ◽  
Chai Lei ◽  
Wei Renying ◽  
Huo Wang

Our team won the Grand Champion award (tie) in the PAKDD-2007 data mining competition. The data mining task is to score credit card customers of a consumer finance company according to the likelihood that they take up the home loans offered by the company. This report presents our solution to this business problem. TreeNet and logistic regression are the data mining algorithms used in this project. The final score is based on a cross-algorithm ensemble of two within-algorithm ensembles, one of TreeNet models and one of logistic regression models. Finally, we discuss some aspects of our solution.
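The two-level ensemble mentioned in this abstract might be sketched as below. The abstract does not say how scores were combined, so simple averaging at both levels is an assumption, as are the function names, the `weight_a` parameter, and the toy scores. The models themselves (TreeNet and logistic regression) are not implemented; each inner list stands in for one model's scores over the same three customers.

```python
from statistics import mean


def within_algorithm_ensemble(model_scores):
    """Average per-customer scores across several models of the same algorithm."""
    return [mean(customer_scores) for customer_scores in zip(*model_scores)]


def cross_algorithm_ensemble(ensemble_a, ensemble_b, weight_a=0.5):
    """Blend two within-algorithm ensembles with a weighted average (assumed rule)."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(ensemble_a, ensemble_b)]


# Hypothetical scores: two boosted-tree models and two logistic regression models.
boosted_scores = [[0.2, 0.8, 0.4], [0.4, 0.6, 0.6]]
logistic_scores = [[0.1, 0.9, 0.5], [0.3, 0.7, 0.3]]

boosted_ensemble = within_algorithm_ensemble(boosted_scores)
logistic_ensemble = within_algorithm_ensemble(logistic_scores)
final_scores = cross_algorithm_ensemble(boosted_ensemble, logistic_ensemble)
```

If the two algorithms produce scores on different scales, averaging ranks instead of raw scores is a common alternative.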



2019 ◽  
Vol 7 (4) ◽  
pp. 309-317
Author(s):  
Sarita Gupta ◽  
Dr. Sanjay Kumar

Purpose: With old-age social security stretched, governments and policymakers are looking to housing wealth as a means of sustainable livelihood for elderly homeowners. The current study attempts to identify which demographic and financial factors are significant determinants of home equity liquidation through reverse mortgage (RM) among Indians in later life. Methodology: Binary logistic regression was applied, using SPSS software, to survey-based primary data from 410 elderly homeowners. Main Findings: The binary logistic regression model indicates that elderly homeowners who are female, older, in poor health, childless or with daughters only, expecting a long life, resident in a metro city, employed, cash-constrained, without any kind of insurance cover, or whose children are financially well off are significantly more willing to opt for an RM scheme. Implication: The study has implications for the Government and NHB, which could provide a refinancing facility to commercial banks so that home equity liquidation products such as reverse mortgages can meet the income needs of greying India. Novelty/Originality: A large body of research in European and other Western countries has explored the attitudes of older homeowners toward housing wealth liquidation, but the Indian context is largely untapped: it is not known how older Indian homeowners perceive their housing wealth or which factors influence them to deplete it. The current study attempts to bridge this research gap.



10.2196/15601 ◽  
2019 ◽  
Vol 7 (4) ◽  
pp. e15601 ◽  
Author(s):  
Quazi Abidur Rahman ◽  
Tahir Janmohamed ◽  
Hance Clarke ◽  
Paul Ritvo ◽  
Jane Heffernan ◽  
...  

Background: Pain volatility is an important factor in chronic pain experience and adaptation. Previously, we employed machine-learning methods to define and predict pain volatility levels from users of the Manage My Pain app. Reducing the number of features is important to help increase interpretability of such prediction models. Prediction results also need to be consolidated from multiple random subsamples to address the class imbalance issue. Objective: This study aimed to: (1) increase the interpretability of previously developed pain volatility models by identifying the most important features that distinguish high from low volatility users; and (2) consolidate prediction results from models derived from multiple random subsamples while addressing the class imbalance issue. Methods: A total of 132 features were extracted from the first month of app use to develop machine learning–based models for predicting pain volatility at the sixth month of app use. Three feature selection methods were applied to identify features that were significantly better predictors than other members of the large features set used for developing the prediction models: (1) Gini impurity criterion; (2) information gain criterion; and (3) Boruta. We then combined the three groups of important features determined by these algorithms to produce the final list of important features. Three machine learning methods were then employed to conduct prediction experiments using the selected important features: (1) logistic regression with ridge estimators; (2) logistic regression with least absolute shrinkage and selection operator; and (3) random forests. Multiple random under-sampling of the majority class was conducted to address class imbalance in the dataset. Subsequently, a majority voting approach was employed to consolidate prediction results from these multiple subsamples. The total number of users included in this study was 879, with a total number of 391,255 pain records.
Results: A threshold of 1.6 was established using clustering methods to differentiate between 2 classes: low volatility (n=694) and high volatility (n=185). The overall prediction accuracy is approximately 70% for both random forests and logistic regression models when using 132 features. Overall, 9 important features were identified using 3 feature selection methods. Of these 9 features, 2 are from the app use category and the other 7 are related to pain statistics. After consolidating models that were developed using random subsamples by majority voting, logistic regression models performed equally well using 132 or 9 features. Random forests performed better than logistic regression methods in predicting the high volatility class. The consolidated accuracy of random forests does not drop significantly (601/879; 68.4% vs 618/879; 70.3%) when only 9 important features are included in the prediction model. Conclusions: We employed feature selection methods to identify important features in predicting future pain volatility. To address class imbalance, we consolidated models that were developed using multiple random subsamples by majority voting. Reducing the number of features did not result in a significant decrease in the consolidated prediction accuracy.
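The combination of repeated random under-sampling and majority voting described in this abstract can be sketched as below. This is a minimal illustration, not the study's pipeline: only the class sizes (694 and 185) come from the abstract, while the single synthetic feature, the toy midpoint classifier standing in for logistic regression or random forests, the choice of 11 subsamples, and all function names are assumptions.

```python
import random


def undersample(majority, minority, rng):
    """Draw a random majority subsample the same size as the minority class."""
    return rng.sample(majority, len(minority)) + list(minority)


def fit_threshold(samples):
    """Toy stand-in classifier: midpoint between the two class means of one feature."""
    high = [x for x, label in samples if label == 1]
    low = [x for x, label in samples if label == 0]
    return (sum(high) / len(high) + sum(low) / len(low)) / 2


def majority_vote(thresholds, x):
    """Predict class 1 if more than half of the per-subsample models vote for it."""
    votes = sum(1 for t in thresholds if x > t)
    return 1 if votes * 2 > len(thresholds) else 0


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Synthetic one-feature data; only the class sizes mirror the abstract.
majority = [(rng.gauss(0.0, 0.3), 0) for _ in range(694)]
minority = [(rng.gauss(1.0, 0.3), 1) for _ in range(185)]

# An odd number of balanced subsamples avoids tied votes.
thresholds = [fit_threshold(undersample(majority, minority, rng)) for _ in range(11)]
prediction_high = majority_vote(thresholds, 1.5)
prediction_low = majority_vote(thresholds, -0.5)
```

Each model sees a balanced training set, so no single model is dominated by the majority class, and voting across subsamples reduces the dependence on any one random draw.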



2021 ◽  
Vol 11 (19) ◽  
pp. 8977
Author(s):  
Wook-Yeon Hwang ◽  
Jong-Seok Lee

Two-way cooperative collaborative filtering (CF) has been known to be crucial for binary market basket data. We propose an improved two-way logistic regression approach, a Pearson correlation-based score, a random forests (RF) R-square-based score, an RF Pearson correlation-based score, and a CF scheme based on the RF R-square-based score. The main idea is to utilize as much predictive information as possible within the two-way prediction in order to cope with the cold-start problem. All of the proposed methods work better than the existing two-way cooperative CF approach in terms of the experimental results.


