Dealing with Separation in Logistic Regression Models

2016 ◽  
Vol 24 (3) ◽  
pp. 339-355 ◽  
Author(s):  
Carlisle Rainey

When facing small numbers of observations or rare events, political scientists often encounter separation, in which explanatory variables perfectly predict binary events or nonevents. In this situation, maximum likelihood provides implausible estimates and the researcher might want to incorporate some form of prior information into the model. The most sophisticated research uses Jeffreys’ invariant prior to stabilize the estimates. While Jeffreys’ prior has the advantage of being automatic, I show that it often provides too much prior information, producing smaller point estimates and narrower confidence intervals than even highly skeptical priors. To help researchers assess the amount of information injected by the prior distribution, I introduce the concept of a partial prior distribution and develop the tools required to compute the partial prior distribution of quantities of interest, estimate the subsequent model, and summarize the results.
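The separation problem described in this abstract can be illustrated with a minimal sketch. It is not the author's code or method: the four-point data set, the no-intercept one-coefficient model, and the function names are assumptions chosen for illustration. The sketch shows that under perfect separation the log-likelihood keeps rising as the coefficient grows, so no finite maximum likelihood estimate exists, while adding the log of Jeffreys' prior (half the log of the Fisher information) yields an objective that eventually falls again.

```python
import math

# Hypothetical toy data: x perfectly separates y (y = 1 exactly when x > 0).
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

def log_likelihood(beta, x, y):
    """Bernoulli log-likelihood of a no-intercept logistic model."""
    total = 0.0
    for xi, yi in zip(x, y):
        z = beta * xi
        s = z if yi == 1 else -z          # log sigmoid(s) is this point's contribution
        total += -math.log1p(math.exp(-s))
    return total

def jeffreys_penalized(beta, x, y):
    """Log-likelihood plus 0.5 * log Fisher information (log of Jeffreys' prior)."""
    info = 0.0
    for xi in x:
        p = 1.0 / (1.0 + math.exp(-beta * xi))
        info += xi * xi * p * (1.0 - p)
    return log_likelihood(beta, x, y) + 0.5 * math.log(info)

betas = (1.0, 5.0, 20.0)
plain = [log_likelihood(b, x, y) for b in betas]
penalized = [jeffreys_penalized(b, x, y) for b in betas]

for b, ll, pl in zip(betas, plain, penalized):
    print(f"beta={b:5.1f}  loglik={ll:.6f}  penalized={pl:.6f}")
```

In this toy setup the unpenalized values climb toward zero as beta increases, whereas the penalized values decrease over the same grid, which is the stabilizing behavior the abstract attributes to Jeffreys' prior. Whether that amount of shrinkage is appropriate is the question the article itself addresses.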

Author(s):  
Yong Peng ◽  
Shuangling Peng ◽  
Xinghua Wang ◽  
Shiyang Tan

This study aims to identify the effects of vehicle, roadway, driver, and environment characteristics on driver fatality in vehicle–fixed object accidents on expressways in the Changsha–Zhuzhou–Xiangtan district of Hunan Province, China, by developing multinomial logistic regression models. For this purpose, 121 vehicle–fixed object accidents from 2011 to 2017 are included in the modeling process. First, descriptive statistical analysis is carried out to understand the main characteristics of the vehicle–fixed object crashes. Then, 19 explanatory variables are selected, and pairwise correlation analysis is conducted to choose the variables to be included. Finally, five multinomial logistic regression models with different independent variables are compared, and the model with the best fit and predictive capability is chosen as the final model. The results show that the turning direction taken to avoid fixed objects raised the likelihood that drivers would die. About 64% of the drivers who died had been ejected from the car, and 50% of those had not been wearing a seatbelt before the fatal accident. Drivers are more likely to die when they encounter bad weather on the expressway. Drivers with less than 10 years of driving experience are more likely to die in these accidents. Fatigued or distracted driving is also a significant factor in driver fatality. Findings from this research provide insight into reducing driver fatality in vehicle–fixed object accidents.


2018 ◽  
Vol 49 (2) ◽  
pp. 498-525 ◽  
Author(s):  
Jouni Kuha ◽  
Colin Mills

It is widely believed that regression models for binary responses are problematic if we want to compare estimated coefficients from models for different groups or with different explanatory variables. This concern has two forms. The first arises if the binary model is treated as an estimate of a model for an unobserved continuous response and the second when models are compared between groups that have different distributions of other causes of the binary response. We argue that these concerns are usually misplaced. The first of them is only relevant if the unobserved continuous response is really the subject of substantive interest. If it is, the problem should be addressed through better measurement of this response. The second concern refers to a situation which is unavoidable but unproblematic, in that causal effects and descriptive associations are inherently group dependent and can be compared as long as they are correctly estimated.


2017 ◽  
Author(s):  
Jouni Kuha ◽  
Colin Mills

It is widely believed that regression models for binary responses are problematic if we want to compare estimated coefficients from models for different groups or with different explanatory variables. This concern has two forms. The first arises if the binary model is treated as an estimate of a model for an unobserved continuous response, and the second when models are compared between groups which have different distributions of other causes of the binary response. We argue that these concerns are usually misplaced. The first of them is only relevant if the unobserved continuous response is really the subject of substantive interest. If it is, the problem should be addressed through better measurement of this response. The second concern refers to a situation which is unavoidable but unproblematic, in that causal effects and descriptive associations are inherently group-dependent and can be compared as long as they are correctly estimated.


2012 ◽  
Vol 21 (1) ◽  
pp. 1 ◽  
Author(s):  
Travis Woolley ◽  
David C. Shaw ◽  
Lisa M. Ganio ◽  
Stephen Fitzgerald

Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed burns and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate and interpret logistic regression models; explanatory variables in logistic regression models; factors influencing scope of inference and model limitations; model validation; and management applications. Logistic regression is currently the most widely used and available technique for predicting post-fire tree mortality. Over 100 logistic regression models have been developed to predict post-fire tree mortality for 19 coniferous species following wild and prescribed fires. The most widely used explanatory variables in post-fire tree mortality logistic regression models have been measurements of crown (e.g. crown scorch) and stem (e.g. bole char) injury. Prediction of post-fire tree mortality improves when crown and stem variables are used collectively. Logistic regression models that predict post-fire tree mortality are the basis of simple field tools and contribute to larger fire-effects models. Future post-fire tree mortality prediction models should include consistent definition of model variables, model validation and direct incorporation of physiological responses that link to process modelling efforts.

