Fixed effects in rare events data: a penalized maximum likelihood solution

2018 ◽  
Vol 8 (1) ◽  
pp. 92-105 ◽  
Author(s):  
Scott J. Cook ◽  
Jude C. Hays ◽  
Robert J. Franzese

Abstract Most agree that models of binary time-series-cross-sectional data in political science often possess unobserved unit-level heterogeneity. Despite this, there is no clear consensus on how best to account for these potential unit effects, with many of the issues confronted seemingly misunderstood. For example, one oft-discussed concern with rare events data is the elimination of no-event units from the sample when estimating fixed effects models. Many argue that this is a reason to eschew fixed effects in favor of pooled or random effects models. We revisit this issue and clarify that the main concern with fixed effects models of rare events data is not inaccurate or inefficient coefficient estimation, but instead biased marginal effects. In short, only evaluating event-experiencing units gives an inaccurate estimate of the baseline risk, yielding inaccurate (often inflated) estimates of predictor effects. As a solution, we propose a penalized maximum likelihood fixed effects (PML-FE) estimator, which retains the complete sample by providing finite estimates of the fixed effects for each unit. We explore the small sample performance of PML-FE versus common alternatives via Monte Carlo simulations, evaluating the accuracy of both parameter and effects estimates. Finally, we illustrate our method with a model of civil war onset.
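The baseline-risk point in this abstract can be illustrated with simple arithmetic. The sketch below uses a made-up panel (10 units, 20 periods, two events in total); the numbers and the names `events` and `event_rate` are hypothetical and are not taken from the paper.

```python
# Hypothetical panel: 10 units observed for 20 periods each.
# Only units 0 and 1 ever experience an event (one event apiece).
n_units, n_periods = 10, 20
events = [[0] * n_periods for _ in range(n_units)]
events[0][5] = 1
events[1][12] = 1


def event_rate(rows):
    """Share of unit-periods in `rows` that record an event."""
    total = sum(len(r) for r in rows)
    return sum(sum(r) for r in rows) / total


# Baseline risk using every unit in the sample.
full_sample = event_rate(events)

# A standard fixed effects logit cannot use units whose outcome never varies,
# so only the event-experiencing units remain.
retained = [r for r in events if 0 < sum(r) < len(r)]
event_units_only = event_rate(retained)

print(full_sample, event_units_only)
```

In this toy case the retained units have a raw event rate five times that of the full sample, which is the kind of inflated baseline the abstract warns about.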

2019 ◽  
Vol 63 (3) ◽  
pp. 357-369 ◽  
Author(s):  
Terrence D. Hill ◽  
Andrew P. Davis ◽  
J. Micah Roos ◽  
Michael T. French

Although fixed-effects models for panel data are now widely recognized as powerful tools for longitudinal data analysis, the limitations of these models are not well known. We provide a critical discussion of 12 limitations, including a culture of omission, low statistical power, limited external validity, restricted time periods, measurement error, time invariance, undefined variables, unobserved heterogeneity, erroneous causal inferences, imprecise interpretations of coefficients, imprudent comparisons with cross-sectional models, and questionable contributions vis-à-vis previous work. Instead of discouraging the use of fixed-effects models, we encourage more critical applications of this rigorous and promising methodology. The most important deficiencies—Type II errors, biased coefficients and imprecise standard errors, misleading p values, misguided causal claims, and various theoretical concerns—should be weighed against the likely presence of unobserved heterogeneity in other regression models. Ultimately, we must do a better job of communicating the pitfalls of fixed-effects models to our colleagues and students.


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Marjan Faghih ◽  
Zahra Bagheri ◽  
Dejan Stevanovic ◽  
Seyyed Mohammad Taghi Ayatollahi ◽  
Peyman Jafari

The logistic regression (LR) model for assessing differential item functioning (DIF) is highly dependent on the asymptotic sampling distributions. However, for rare events data, the maximum likelihood estimation method may be biased and the asymptotic distributions may not be reliable. In this study, the performance of regular maximum likelihood (ML) estimation is compared with two bias correction methods, weighted logistic regression (WLR) and Firth's penalized maximum likelihood (PML), to assess DIF for imbalanced or rare events data. The power and type I error rate of the LR model for detecting DIF were investigated under different combinations of sample size, moderate and severe magnitudes of uniform DIF (DIF = 0.4 and 0.8), sample size ratio, number of items, and the degree of imbalance (τ). Compared with WLR under severe imbalance (τ = 0.069), the power of PML and ML was lower by approximately 30% and 24%, respectively, under DIF = 0.4, and by 27% and 23% under DIF = 0.8. The present study revealed that WLR outperforms both the ML and PML estimation methods when logistic regression is used to evaluate DIF for imbalanced or rare events data.


Biometrics ◽  
1997 ◽  
Vol 53 (3) ◽  
pp. 983 ◽  
Author(s):  
Michael G. Kenward ◽  
James H. Roger

2021 ◽  
pp. 008117502110160
Author(s):  
Scott W. Duxbury

Panel data analysis is common in the social sciences. Fixed effects models are a favorite among sociologists because they control for unobserved heterogeneity (unexplained variation) among cross-sectional units, but estimates are biased when there is unobserved heterogeneity in the underlying time trends. Two-way fixed effects models adjust for unobserved time heterogeneity but are inefficient, cannot include unit-invariant variables, and eliminate common trends: the portion of variance in a time-varying variable that is invariant across cross-sectional units. This article introduces a general panel model that can include unit-invariant variables, corrects for unobserved time heterogeneity, and provides the effect of common trends while also allowing for unobserved unit heterogeneity, time-varying coefficients, and time-invariant variables. One-way and two-way fixed effects models are shown to be restrictive forms of this general model. Other restrictive forms are also derived that offer all the usual advantages of one-way and two-way fixed effects models but account for unobserved time heterogeneity. The author uses the models to examine the increase in state incarceration rates between 1970 and 2015.
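The statement that two-way fixed effects models cannot include unit-invariant variables can be checked numerically. The following is a minimal sketch assuming a balanced panel stored as a units-by-periods array; `two_way_demean` is an illustrative helper, not code from the article.

```python
import numpy as np

n_units, n_periods = 4, 6
rng = np.random.default_rng(0)


def two_way_demean(x):
    """Two-way within transformation for a balanced (units x periods) array:
    subtract unit means and period means, then add back the grand mean."""
    unit_means = x.mean(axis=1, keepdims=True)
    period_means = x.mean(axis=0, keepdims=True)
    return x - unit_means - period_means + x.mean()


# A unit-invariant variable: identical across units within each period.
common_trend = np.tile(np.linspace(0.0, 1.0, n_periods), (n_units, 1))
# A variable that varies over both units and time.
varying = rng.normal(size=(n_units, n_periods))

trend_is_wiped_out = np.allclose(two_way_demean(common_trend), 0.0)
varying_is_wiped_out = np.allclose(two_way_demean(varying), 0.0)
print(trend_is_wiped_out, varying_is_wiped_out)
```

After the transformation the common trend has no remaining variation, so its coefficient is not identified, while the variable that differs across units and periods keeps usable variation.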


Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 788
Author(s):  
Marcin Fałdziński ◽  
Magdalena Osińska ◽  
Wojciech Zalewski

This paper uses the Extreme Value Theory (EVT) to model the rare events that appear as delivery delays in road transport. Transport delivery delays occur stochastically. Therefore, modeling such events should be done using appropriate tools due to the economic consequences of these extreme events. Additionally, we provide the estimates of the extremal index and the return level with the confidence interval to describe the clustering behavior of rare events in deliveries. The Generalized Extreme Value Distribution (GEV) parameters are estimated using the maximum likelihood method and the penalized maximum likelihood method for better small-sample properties. The findings demonstrate the advantages of EVT-based prediction and its readiness for application.


Author(s):  
Liam F. Beiser-McGrath

Abstract When separation is a problem in binary dependent variable models, many researchers use Firth's penalized maximum likelihood in order to obtain finite estimates (Firth, 1993; Zorn, 2005; Rainey, 2016). In this paper, I show that this approach can lead to inferences in the opposite direction of the separation when the number of observations is sufficiently large and both the dependent and independent variables are rare events. As large datasets with rare events are frequently used in political science, such as dyadic data measuring interstate relations, a lack of awareness of this problem may lead to inferential issues. Simulations and an empirical illustration show that the use of independent “weakly-informative” prior distributions centered at zero, for example, the Cauchy prior suggested by Gelman et al. (2008), can avoid this issue. More generally, the results caution researchers to be aware of how the choice of prior interacts with the structure of their data, when estimating models in the presence of separation.
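For readers unfamiliar with how Firth's penalty yields finite estimates under separation, the following is a minimal NumPy sketch of Firth-type logistic regression using the modified score X'(y - p + h(0.5 - p)), where h holds the hat-matrix diagonals. The function name `firth_logit`, the toy data, and the plain (unsafeguarded) iteration are assumptions made for illustration; this is not the code used in the paper, and it omits step-halving and standard errors.

```python
import numpy as np


def firth_logit(X, y, max_iter=200, tol=1e-8):
    """Firth-type penalized logistic regression via modified-score iterations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])          # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Hat-matrix diagonals: h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth's modified score adds h_i * (0.5 - p_i) to each residual.
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Toy data with complete separation: y = 1 exactly when x > 0,
# so the ordinary ML slope estimate would diverge.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])

beta = firth_logit(X, y)
print(beta)
```

On this toy data the penalized slope is finite and positive, whereas unpenalized ML has no finite maximum. The abstract's caution concerns what such estimates imply in large samples with rare outcomes and rare predictors, which this small example does not reproduce.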


2020 ◽  
Author(s):  
Sonia Zaharia ◽  
Shibani Ghosh ◽  
Robin Shrestha ◽  
Swetha Manohar ◽  
Andrew Thorne-Lyman ◽  
...  

Abstract In resource-constrained countries, animal-sourced foods (ASFs) are an important nutrient-dense source of vitamins, minerals and macronutrients. While several studies have suggested the value of ASFs to child growth, most empirical evidence is based on cross-sectional data, which can only provide information about the contemporaneous relationship between diet and anthropometric outcomes. This study uses longitudinal panel data for Nepal, Bangladesh, and Uganda to assess the association between contemporaneous as well as past ASF consumption and linear growth of children aged 6-24 months. Fixed effects models found that ASF consumption was significantly correlated with lower stunting, with a decline in stunting prevalence as high as 10% in Nepali children who had consumed any ASF in the previous year. Consuming two or more ASFs showed an even higher magnitude of association, ranging from a 10% decline in prevalence of stunting associated with lagged consumption in Bangladesh to a 16% decline in Nepal.

