Modelling Interaction Effects by Using Extended WOE Variables with Applications to Credit Scoring

Mathematics ◽  
2021 ◽  
Vol 9 (16) ◽  
pp. 1903
Author(s):  
Carlos Giner-Baixauli ◽  
Juan Tinguaro Rodríguez ◽  
Alejandro Álvaro-Meca ◽  
Daniel Vélez

The term credit scoring refers to the application of formal statistical tools to support or automate loan-issuing decision-making processes. One of the most widely used methodologies for credit scoring involves fitting logistic regression models with Weight of Evidence (WOE) explanatory variables, which are obtained by discretizing the original inputs with classification trees. However, this WOE-based methodology has difficulty modelling interactions between explanatory variables. In this paper, an extension of the WOE-based methodology for credit scoring is proposed that allows constructing a new kind of WOE variable designed to capture interaction effects. In particular, these new WOE variables are obtained through the simultaneous discretization of pairs of explanatory variables in a single classification tree. Moreover, the proposed extension of the WOE-based methodology can be complemented as usual by balance scorecards, which make it possible to explain, from the fitted logistic models, why individual loans are or are not granted. Such explainability of loan decisions is essential for credit scoring, all the more so in light of recent legal developments such as the European Union’s GDPR. An extensive computational study demonstrates the feasibility of the proposed approach, which also improves the predictive capability of the standard WOE-based methodology.
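To make the WOE encoding concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes the bins have already been produced (in the paper they come from classification trees; here they are simply given as labels), codes the target as 1 = bad and 0 = good, and uses the convention WOE = ln(share of goods in the bin / share of bads in the bin). Some texts use the reciprocal, which only flips the sign. The smoothing constant `eps` and the names `woe_table` and `woe_encode` are hypothetical choices for illustration. An interaction WOE variable in the spirit of the paper is obtained by treating each pair of bin labels as one joint bin.

```python
import math
from collections import Counter


def woe_table(bins, target, eps=0.5):
    """Return {bin_label: WOE} for one discretized variable.

    bins   -- bin label per observation (any hashable; a tuple for a joint bin)
    target -- 1 for a bad loan (default), 0 for a good loan
    eps    -- additive smoothing (assumption) so an empty cell never gives log(0)
    """
    goods = Counter(b for b, y in zip(bins, target) if y == 0)
    bads = Counter(b for b, y in zip(bins, target) if y == 1)
    total_good, total_bad = sum(goods.values()), sum(bads.values())
    labels = set(bins)
    k = len(labels)
    table = {}
    for b in labels:
        share_good = (goods[b] + eps) / (total_good + eps * k)
        share_bad = (bads[b] + eps) / (total_bad + eps * k)
        # WOE > 0 means the bin has a lower bad rate than the sample overall
        table[b] = math.log(share_good / share_bad)
    return table


def woe_encode(bins, table):
    """Replace each bin label by its WOE value (the regressor fed to the logistic model)."""
    return [table[b] for b in bins]


# Usage: one variable on its own, then a joint (interaction) bin built from two variables.
age_bin = ["young", "young", "young", "old", "old", "old"]
income_bin = ["low", "high", "low", "high", "low", "high"]
default = [0, 0, 1, 0, 1, 1]

age_woe = woe_table(age_bin, default, eps=0.0)
joint_bins = list(zip(age_bin, income_bin))
joint_woe = woe_table(joint_bins, default)  # smoothed, since some joint cells are tiny
x_interaction = woe_encode(joint_bins, joint_woe)
```

Treating the pair as a single categorical key is what lets one WOE column carry the joint effect of both inputs, instead of two additive per-variable WOE columns.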

Author(s):  
Morten W. Fagerland ◽  
David W. Hosmer

Ordinal regression models are used to describe the relationship between an ordered categorical response variable and one or more explanatory variables. Several ordinal logistic models are available in Stata, such as the proportional odds, adjacent-category, and constrained continuation-ratio models. In this article, we present a command (ologitgof) that calculates four goodness-of-fit tests for assessing the overall adequacy of these models. These tests include an ordinal version of the Hosmer–Lemeshow test, the Pulkstenis–Robinson chi-squared and deviance tests, and the Lipsitz likelihood-ratio test. Together, these tests can detect several different types of lack of fit, including wrongly specified continuous terms, omission of different types of interaction terms, and an unordered response variable.
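The ordinal Hosmer–Lemeshow test reported by ologitgof generalizes the familiar binary Hosmer–Lemeshow test. As background only, the Python sketch below computes the binary version: observations are sorted by fitted probability, split into `groups` roughly equal-sized groups, and observed and expected event counts are compared with a chi-squared statistic on groups - 2 degrees of freedom. It is not the ologitgof implementation, and the function names are hypothetical. To stay within the standard library, the p-value helper handles only even degrees of freedom (the default of 10 groups gives 8), for which the chi-squared survival function has a closed form.

```python
import math


def chi2_sf_even(x, df):
    """Upper-tail chi-squared probability P(X > x) for an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this helper only supports positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, p, groups=10):
    """Binary Hosmer-Lemeshow test; returns (statistic, df, p_value).

    y -- observed 0/1 outcomes; p -- fitted probabilities from a logistic model.
    Assumes no group has fitted probabilities that are all exactly 0 or all exactly 1.
    """
    if len(y) != len(p) or len(y) < groups:
        raise ValueError("need equal-length inputs with at least `groups` observations")
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        # (O1-E1)^2/E1 + (O0-E0)^2/E0 simplifies to (O-E)^2 / (E * (1 - E/n_g))
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    df = groups - 2
    return stat, df, chi2_sf_even(stat, df)


# Usage: a perfectly calibrated toy example gives a statistic of 0 and p-value 1.
y = [0, 1] * 10
p = [0.5] * 20
stat, df, p_value = hosmer_lemeshow(y, p)
```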


2009 ◽  
Vol 48 (03) ◽  
pp. 306-310 ◽  
Author(s):  
C. E. Minder ◽  
G. Gillmann

Summary. Objectives: This paper is concerned with checking the goodness-of-fit of binary logistic regression models. For practitioners of data analysis, the broad classes of goodness-of-fit procedures available in the literature are described. The challenges of model checking in the context of binary logistic regression are reviewed. As a viable solution, a simple graphical procedure for checking goodness-of-fit is proposed. Methods: The proposed graphical procedure relies on pieces of information available from any logistic analysis; the focus is on combining and presenting these in an informative way. Results: The information gained with this approach is illustrated with three examples. In the discussion, the proposed method is put into context and compared with other graphical procedures for checking the goodness-of-fit of binary logistic models available in the literature. Conclusion: A simple graphical method can significantly improve the understanding of any logistic regression analysis and help to prevent faulty conclusions.


2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Gulsah Gurkan ◽  
Yoav Benjamini ◽  
Henry Braun

Abstract. Employing nested sequences of models is a common practice when exploring the extent to which one set of variables mediates the impact of another set. Such an analysis in the context of logistic regression models confronts two challenges: (i) direct comparisons of coefficients across models are generally biased because of the changes in scale that accompany changes in the set of explanatory variables; (ii) conducting a large number of tests induces a multiplicity problem that can lead to spurious findings of significance if not heeded. This article aims to illustrate a practical strategy for conducting analyses in the face of these challenges. The challenges, and how to address them, are illustrated using a subset of the findings reported by Braun (Large-scale Assess Educ 6(4):1–52, 2018. 10.1186/s40536-018-0058-x), drawn from the Programme for the International Assessment of Adult Competencies (PIAAC), an international, large-scale assessment of adults. For each country in the dataset, a nested pair of logistic regression models was fit to investigate the role of Educational Attainment and Cognitive Skills in mediating the impact of family background and demographic characteristics on the location of an individual’s annual income in the national income distribution. A modified version of the Karlson–Holm–Breen (KHB) method was employed to obtain unbiased estimates of the true differences in the coefficients between nested logistic models. To address multiplicity, a recent generalization of the Benjamini–Hochberg (BH) False Discovery Rate (FDR)-controlling procedure to hierarchically structured hypotheses was employed and compared with two conventional methods. The differences between the changes in coefficients calculated conventionally and with the KHB adjustment ranged from negligible to very substantial.
Taking into account the actual magnitudes of the coefficients, we concluded that the more proximal factors do act as strong mediators for the background factors, but less so for Age, and hardly at all for Gender. With respect to multiplicity, applying the FDR-controlling procedure yielded results very similar to those of a standard per-comparison procedure, but quite a few more discoveries than the Bonferroni procedure. The KHB methodology illustrated here can be applied wherever there is interest in comparing nested logistic regressions. Modifications to account for probability sampling are feasible. The categorization of variables and the order of entry should be determined by substantive considerations. The BH procedure, for its part, is fully general and can be applied to multiplicity issues in a broad range of settings.
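The study uses a hierarchical generalization of the Benjamini–Hochberg procedure, which is not reproduced here. As a sketch of the basic building block only, the Python code below implements the standard (non-hierarchical) BH step-up procedure alongside the Bonferroni correction, so the gap in the number of discoveries described above can be seen on a toy set of p-values. The p-values are invented for illustration, and the function names are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Standard BH step-up procedure controlling the FDR at level q.

    Sort the m p-values, find the largest rank k with p_(k) <= k*q/m,
    and reject the hypotheses with the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank  # step-up: keep the largest rank that passes
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


def bonferroni(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha/m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


# Toy p-values (made up for illustration).
pvals = [0.005, 0.035, 0.025, 0.015, 0.20]
bh = benjamini_hochberg(pvals)   # four discoveries
bonf = bonferroni(pvals)         # one discovery
```

Because BH compares the k-th smallest p-value with k*q/m rather than with a single q/m cutoff, it tolerates a controlled share of false discoveries and so rejects more hypotheses than Bonferroni when many effects are real.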


2020 ◽  
Vol 7 (10) ◽  
pp. 150-160
Author(s):  
Fabrício Pelizer Almeida ◽  
Moisés Keniel Guilherme de Lima ◽  
Demóstenes Coutinho Gomes ◽  
Esther Ferreira de Souza

Author(s):  
Yong Peng ◽  
Shuangling Peng ◽  
Xinghua Wang ◽  
Shiyang Tan

This study aims to identify the effects of vehicle, roadway, driver, and environmental characteristics on driver fatality in vehicle–fixed object accidents on expressways in the Changsha–Zhuzhou–Xiangtan district of Hunan province, China, by developing multinomial logistic regression models. For this purpose, 121 vehicle–fixed object accidents from 2011 to 2017 are included in the modeling process. First, a descriptive statistical analysis is performed to understand the main characteristics of vehicle–fixed object crashes. Then, 19 explanatory variables are selected, and a pairwise correlation analysis is conducted to choose the variables to be included. Finally, five multinomial logistic regression models with different independent variables are compared, and the model with the best fit and predictive capability is chosen as the final model. The results showed that the turning direction taken when avoiding fixed objects raised the probability that drivers would die. About 64% of the drivers who died in these accidents were ejected from the car, and 50% of those had not used a seatbelt. Drivers are more likely to die when they encounter bad weather on the expressway. Drivers with less than 10 years of driving experience are more likely to die in these accidents. Fatigued or distracted driving is also a significant factor in driver fatality. Findings from this research provide insight into reducing driver fatalities in vehicle–fixed object accidents.


Author(s):  
Ghazal Aarabi ◽  
Richelle Valdez ◽  
Kristin Spinler ◽  
Carolin Walther ◽  
Udo Seedorf ◽  
...  

High costs are an important reason why patients postpone dental visits, which can lead to serious medical consequences. However, little longitudinal evidence exists on the determinants of postponing visits due to financial constraints. Thus, the purpose of this study was to examine longitudinally the determinants of postponing dental visits due to costs among older adults in Germany. Data from waves 5 and 6 of the Survey of Health, Ageing, and Retirement in Europe were used. The occurrence of postponed dental visits due to costs in the last 12 months served as the outcome measure. Socioeconomic and health-related explanatory variables were included. Conditional fixed effects logistic regression models were used (n = 362). Regressions showed that the likelihood of postponing dental visits due to costs increased with younger age, fewer chronic diseases, and lower income. The outcome measure was associated with neither marital status nor self-rated health. Identifying the factors associated with postponed dental visits due to costs might help to mitigate this challenge. In the long term, this might help to maintain the well-being of older individuals.


2009 ◽  
Vol 39 (11) ◽  
pp. 2224-2233 ◽  
Author(s):  
Tristan D. Huff ◽  
John D. Bailey

Worldwide, snags are an important, but often lacking, component of forest ecosystems. We revisited artificially topped Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco) trees 16–18 years after treatment in a replicated experiment in western Oregon. Some trees had been topped such that no live crown was retained (fatally topped), while others retained some portion of their live crown after topping (nonfatally topped). Topped trees were created under three different silvicultural regimes: clearcut, two-story, and group selection. Twenty-three percent (61 of 262) of nonfatally topped trees remained alive 16–18 years after treatment; 4% (19 of 482) of fatally topped trees had broken at some point along the bole within 16–18 years of treatment. Silvicultural regime, post-treatment height, stem diameter, stem lean, and ground slope were considered as potential explanatory variables in logistic regression models explaining mortality and breakage. A nonfatally topped tree’s odds of surviving 16–18 years after treatment were greater in the mature matrix of group selection stands than in clearcuts or two-story stands. A fatally topped tree’s odds of breaking within 16–18 years of treatment decreased as diameter at breast height (DBH) increased. When carried out carefully, artificial topping can be a useful silvicultural tool to increase structural heterogeneity.


2002 ◽  
Vol 32 (1) ◽  
pp. 219-245 ◽  
Author(s):  
Kazuo Yamaguchi

This paper describes linear regression models with parametrically weighted explanatory variables and related logistic regression models that estimate parameters characterizing (1) the effects of weighted variables on the dependent variable and (2) weights for the components of weighted variables. The models also characterize parsimoniously the interaction effects between weighted variables and covariates on the dependent variable by the use of various constraints on parameters. In particular, the models are concerned with testing the significance of variation with covariates in the weights of weighted variables separately from the significance of variation with those covariates in the effects of weighted variables. The usefulness of these models in sociological research is demonstrated by an illustrative analysis of the class identifications of married working women using education, occupational prestige, and income as three variables weighted between own and spousal attributes, and using year, age, race, part-time–full-time distinction, and employment status as covariates.
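To illustrate what a parametrically weighted explanatory variable is, the Python sketch below fits the simplest case, y = a + b*(w*x_own + (1 - w)*x_spouse), with one weighted variable and no covariates. This is a deliberate simplification and not Yamaguchi's estimation method: the weight w is found by a grid search over [0, 1] (profile least squares), with a and b obtained in closed form by ordinary least squares at each candidate w. The function name `fit_weighted_variable` and the toy data are hypothetical.

```python
def fit_weighted_variable(y, x_own, x_spouse, grid_steps=100):
    """Profile least squares for y = a + b * (w*x_own + (1-w)*x_spouse).

    For each candidate weight w on an evenly spaced grid over [0, 1], build the
    weighted variable z, fit y on z by simple OLS in closed form, and keep the w
    with the smallest residual sum of squares. Returns (w, a, b, sse).
    """
    n = len(y)
    y_bar = sum(y) / n
    best = None
    for step in range(grid_steps + 1):
        w = step / grid_steps
        z = [w * o + (1 - w) * s for o, s in zip(x_own, x_spouse)]
        z_bar = sum(z) / n
        szz = sum((zi - z_bar) ** 2 for zi in z)
        if szz == 0:
            continue  # z is constant for this w, so b is not identified
        b = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / szz
        a = y_bar - b * z_bar
        sse = sum((yi - a - b * zi) ** 2 for zi, yi in zip(z, y))
        if best is None or sse < best[3]:
            best = (w, a, b, sse)
    return best


# Toy data generated with w = 0.3, a = 1, b = 2 (no noise).
x_own = [1, 2, 3, 4, 5, 6]
x_spouse = [3, 1, 4, 1, 5, 9]
y = [1 + 2 * (0.3 * o + 0.7 * s) for o, s in zip(x_own, x_spouse)]
w, a, b, sse = fit_weighted_variable(y, x_own, x_spouse)
```

The weight w describes how the two components are mixed, while b describes the effect of the mixed variable; keeping them as separate parameters is what allows the two kinds of covariate variation mentioned above to be tested separately.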


2020 ◽  
Vol 30 (Supplement_5) ◽  
Author(s):  
G Migliara ◽  
V Baccolini ◽  
L M Salvatori ◽  
A Angelozzi ◽  
C Isonne ◽  
...  

Abstract. Background: Healthcare-associated infections (HAIs) represent a significant burden in terms of mortality, morbidity, length of stay and costs for patients in intensive care units (ICUs). In this study, we analyzed the predictors of HAI development and assessed the association of HAIs with mortality. Data were retrieved from the active surveillance system of a general ICU in a large teaching hospital in Rome. Methods: Logistic regression models were built to quantify the association between demographic and clinical factors and the development of HAIs, device-related HAIs and multidrug-resistant (MDR)-associated HAIs. The independent predictors of HAIs were used to create propensity scores (PS) specific to each model, which were subsequently used to adjust the association between these conditions and mortality in logistic regression models. Results: From May 2016 to September 2019, 864 patients were included in the surveillance system, of whom 236 (27.3%) had at least one HAI during their hospitalization. Specifically, 162 (18.8%) patients had at least one device-related HAI, and the overall mortality rate was 34.3%. Factors associated with HAIs and device-related HAIs were mechanical ventilation and admission for trauma. The PS-adjusted logistic models showed that both HAIs and device-related HAIs were associated with mortality (OR 1.82, 95% CI 1.30-2.54; OR 2.03, 95% CI 1.40-2.95, respectively). MDR-associated HAIs had a significant association with diabetes mellitus; however, these infections were not associated with mortality (OR 1.42, 95% CI 0.98-2.08), even in the subgroup of infected patients (OR 0.99, 95% CI 0.56-1.73). Conclusions: The study confirms the association of HAIs and device-related HAIs with mortality in ICUs. The subset of MDR-associated infections does not appear to have a specific association with mortality. However, given the extra effort these infections require to be managed, they should be adequately monitored and countered.
Key messages: Healthcare-associated infections are strongly associated with mortality in the ICU. MDR-associated infections do not seem to be specifically associated with mortality in our setting.
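The odds ratios reported above come from logistic regression, where OR = exp(β). Assuming the 95% intervals are Wald intervals (the abstract does not say), each interval is exp(β ± 1.96·SE), so it is symmetric on the log-odds scale. The short Python sketch below shows that relation in both directions; applied to the reported interval 1.30-2.54 it recovers a point estimate of about 1.82, consistent with the abstract. Backing out β and SE from a published interval is only approximate because the bounds are rounded, and the helper names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # 97.5% quantile of the standard normal distribution


def or_with_ci(beta, se, z=Z_95):
    """Odds ratio and Wald confidence interval from a logistic coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_ci(lower, upper, z=Z_95):
    """Back out (beta, se) from a reported OR confidence interval, assuming a Wald interval."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    beta = (log_lo + log_hi) / 2  # the interval is symmetric on the log-odds scale
    se = (log_hi - log_lo) / (2 * z)
    return beta, se


# Reported in the abstract: HAI vs mortality, OR 1.82 (95% CI 1.30-2.54).
beta, se = beta_se_from_ci(1.30, 2.54)
odds_ratio, lo, hi = or_with_ci(beta, se)
```

The same symmetry explains why an interval such as 0.98-2.08 signals a non-significant result: on the log scale it straddles 0, that is, the OR interval contains 1.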

