Logistic Regression in Rare Events Data

2003 ◽  
Vol 8 (2) ◽  
Author(s):  
Gary King ◽  
Langche Zeng


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Marjan Faghih ◽  
Zahra Bagheri ◽  
Dejan Stevanovic ◽  
Seyyed Mohammad Taghi Ayatollahi ◽  
Peyman Jafari

The logistic regression (LR) model for assessing differential item functioning (DIF) depends heavily on asymptotic sampling distributions. However, for rare events data, maximum likelihood estimation may be biased and the asymptotic distributions may not be reliable. In this study, the performance of regular maximum likelihood (ML) estimation is compared with two bias correction methods, weighted logistic regression (WLR) and Firth's penalized maximum likelihood (PML), for assessing DIF in imbalanced or rare events data. The power and type I error rate of the LR model for detecting DIF were investigated under different combinations of sample size, moderate and severe magnitudes of uniform DIF (DIF = 0.4 and 0.8), sample size ratio, number of items, and degree of imbalance (τ). Under severe imbalance (τ = 0.069), the power of PML and ML was lower than that of WLR by approximately 30% and 24%, respectively, for DIF = 0.4, and by approximately 27% and 23%, respectively, for DIF = 0.8. The study shows that WLR outperforms both the ML and PML estimation methods when logistic regression is used to evaluate DIF for imbalanced or rare events data.


2001 ◽  
Vol 9 (2) ◽  
pp. 137-163 ◽  
Author(s):  
Gary King ◽  
Langche Zeng

We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
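As a hedged illustration of the design this abstract describes (keep every event, sample a small fraction of nonevents, then correct the estimates), the Python sketch below applies the intercept prior correction and the observation weights associated with King and Zeng's approach. The function names, the example values, and the assumption that the population event fraction `tau` is known are illustrative and not taken from the abstract.

```python
import math


def prior_corrected_intercept(b0_hat, tau, ybar):
    """Shift a logit intercept fitted on an events-enriched sample.

    tau  : assumed known fraction of events in the population
    ybar : fraction of events in the analyzed sample
    """
    if not (0.0 < tau < 1.0 and 0.0 < ybar < 1.0):
        raise ValueError("tau and ybar must lie strictly between 0 and 1")
    return b0_hat - math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))


def sampling_weights(tau, ybar):
    """Return (weight for events, weight for nonevents) for a weighted fit."""
    if not (0.0 < tau < 1.0 and 0.0 < ybar < 1.0):
        raise ValueError("tau and ybar must lie strictly between 0 and 1")
    return tau / ybar, (1.0 - tau) / (1.0 - ybar)


def logistic(z):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical numbers: events are 1% of the population but 50% of the sample.
tau, ybar = 0.01, 0.5
b0 = prior_corrected_intercept(0.0, tau, ybar)
w1, w0 = sampling_weights(tau, ybar)
baseline = logistic(b0)  # intercept-only event probability after correction
```

With these hypothetical values, the corrected intercept implies a baseline event probability of about 0.01 rather than the 0.5 seen in the enriched sample. The prior correction leaves slope coefficients unchanged; the weights would instead be passed to a weighted logistic fit.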


2018 ◽  
Author(s):  
Itzel Coca Rios

Between 2012 and 2014, ten events in Mexico City were repressed through arbitrary arrests that affected 365 people. Analysis of protest data from that period confirms a change in police strategy toward more selective tactics of repression and protest disarticulation. A sample of mass demonstrations with more than 2,000 attendees was used to test the hypothesis that repression responds to two main characteristics of the events: (1) the protest is directed at the federal level, with which the local government cannot negotiate, and (2) it threatens public order and the status quo through violence, multiple claims directed at many authorities, and radical demands. Binomial logistic regression with a “rare events” package and QCA tests reveal that the federal scope of the claim and the presence of violence by protesters are necessary conditions for repression to occur, while radicalism and variety of claims receive partial support. The study concludes with a nested analysis of the cases of December 1st, 2012 and 2013.


2015 ◽  
Vol 34 (3) ◽  
pp. 230-239 ◽  
Author(s):  
Raffaella Calabrese ◽  
Silvia Angela Osmetti

Author(s):  
Rainer Puhr ◽  
Georg Heinze ◽  
Mariana Nold ◽  
Lara Lusa ◽  
Angelika Geroldinger
