Fixed effects in rare events data: a penalized maximum likelihood solution

Abstract Most agree that binary time-series cross-sectional data in political science often exhibit unobserved unit-level heterogeneity. Despite this, there is no clear consensus on how best to account for these potential unit effects, and many of the issues involved appear to be misunderstood. For example, one oft-discussed concern with rare events data is that estimating a fixed effects model eliminates every unit that never experiences an event. Many argue that this is a reason to eschew fixed effects in favor of pooled or random effects models. We revisit this issue and clarify that the main concern with fixed effects models of rare events data is not inaccurate or inefficient coefficient estimation, but biased marginal effects. In short, evaluating only event-experiencing units gives an inaccurate estimate of the baseline risk, which yields inaccurate (often inflated) estimates of predictor effects. As a solution, we propose a penalized maximum likelihood fixed effects (PML-FE) estimator, which retains the complete sample by providing finite estimates of the fixed effect for every unit. We explore the small-sample performance of PML-FE against common alternatives via Monte Carlo simulations, evaluating the accuracy of both parameter and effect estimates. Finally, we illustrate the method with a model of civil war onset.
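
The abstract does not state which penalty PML-FE uses. This sketch assumes Firth's (1993) penalty, a common way to obtain finite logit estimates, which maximizes ℓ(β) + ½ log|I(β)|. It is a pure-Python fit of such a logit with one dummy per unit plus a shared covariate. The helpers `solve` and `firth_logit` and the toy panel are hypothetical, and the code illustrates the idea rather than reproducing the authors' implementation. It shows the point the abstract makes: a unit with no events, whose fixed effect would diverge under ordinary maximum likelihood, receives a finite estimate and so stays in the sample.

```python
import math


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def firth_logit(X, y, max_iter=200, tol=1e-10, max_step=5.0):
    """Firth-penalized logit (hypothetical helper, not the authors' code).

    X holds one 0/1 dummy per unit (the fixed effects, no global
    intercept) followed by any covariates; y holds 0/1 outcomes.
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(x, beta)))) for x in X]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information I(beta) = X' W X
        info = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
                for a in range(k)]
        # Columns of I^{-1}, needed for the hat values h_i = w_i x_i' I^{-1} x_i
        cols = [solve(info, [1.0 if r == c else 0.0 for r in range(k)]) for c in range(k)]
        h = [w[i] * sum(X[i][a] * cols[b][a] * X[i][b] for a in range(k) for b in range(k))
             for i in range(n)]
        # Firth's modified score: U*(beta) = X' (y - p + h * (1/2 - p))
        score = [sum(X[i][a] * (y[i] - p[i] + h[i] * (0.5 - p[i])) for i in range(n))
                 for a in range(k)]
        # Fisher-scoring step I^{-1} U*, shortened if it is very long
        step = solve(info, score)
        size = max(abs(s) for s in step)
        if size > max_step:
            step = [s * max_step / size for s in step]
        beta = [b + s for b, s in zip(beta, step)]
        if size < tol:
            return beta
    raise RuntimeError("Firth iterations did not converge")


if __name__ == "__main__":
    # Toy panel (hypothetical data): three units observed for six periods.
    # Unit C never experiences the event, so its ordinary ML fixed effect
    # diverges to -infinity and the unit drops out of a standard FE fit.
    x = [0.5, -1.0, 1.5, 0.0, -0.5, 1.0,    # unit A
         1.0, 0.2, -0.3, 0.8, -1.2, 0.4,    # unit B
         -0.4, 0.9, -1.1, 0.3, 0.6, -0.8]   # unit C
    y = [1, 0, 1, 0, 0, 0,
         0, 1, 0, 0, 0, 1,
         0, 0, 0, 0, 0, 0]
    X = [[1.0 if j == i // 6 else 0.0 for j in range(3)] + [x[i]] for i in range(18)]
    a_fe, b_fe, c_fe, slope = firth_logit(X, y)
    print(f"unit FEs: {a_fe:.3f} {b_fe:.3f} {c_fe:.3f}; slope: {slope:.3f}")
```

Consider the case with only unit dummies and no covariates. There the estimate for unit j reduces to logit((Y_j + 0.5)/(n_j + 1)), where Y_j is the unit's event count and n_j its number of observations. A no-event unit is thus pulled toward a small positive baseline risk rather than to zero.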

Small-Sample Methods for Cluster-Robust Variance Estimation and Hypothesis Testing in Fixed Effects Models

Journal of Business and Economic Statistics, 2017, Vol 36 (4), pp. 672-683. DOI: 10.1080/07350015.2016.1247004

Cited By ~ 23

Author(s): James E. Pustejovsky, Elizabeth Tipton

Keyword(s): Hypothesis Testing, Fixed Effects, Variance Estimation, Small Sample, Robust Variance, Fixed Effects Models, Robust Variance Estimation

Limitations of Fixed-Effects Models for Panel Data

Sociological Perspectives, 2019, Vol 63 (3), pp. 357-369. DOI: 10.1177/0731121419863785

Cited By ~ 16

Author(s): Terrence D. Hill, Andrew P. Davis, J. Micah Roos, Michael T. French

Keyword(s): Panel Data, Fixed Effects, Statistical Power, Unobserved Heterogeneity, Critical Discussion, Causal Inferences, Cross Sectional, Fixed Effects Models, Time Invariance, Type II Errors

Although fixed-effects models for panel data are now widely recognized as powerful tools for longitudinal data analysis, the limitations of these models are not well known. We provide a critical discussion of 12 limitations, including a culture of omission, low statistical power, limited external validity, restricted time periods, measurement error, time invariance, undefined variables, unobserved heterogeneity, erroneous causal inferences, imprecise interpretations of coefficients, imprudent comparisons with cross-sectional models, and questionable contributions vis-à-vis previous work. Instead of discouraging the use of fixed-effects models, we encourage more critical applications of this rigorous and promising methodology. The most important deficiencies—Type II errors, biased coefficients and imprecise standard errors, misleading p values, misguided causal claims, and various theoretical concerns—should be weighed against the likely presence of unobserved heterogeneity in other regression models. Ultimately, we must do a better job of communicating the pitfalls of fixed-effects models to our colleagues and students.

A Comparative Study of the Bias Correction Methods for Differential Item Functioning Analysis in Logistic Regression with Rare Events Data

BioMed Research International, 2020, Vol 2020, pp. 1-12. DOI: 10.1155/2020/1632350

Author(s): Marjan Faghih, Zahra Bagheri, Dejan Stevanovic, Seyyed Mohhamad Taghi Ayatollahi, Peyman Jafari

Keyword(s): Logistic Regression, Maximum Likelihood, Sample Size, Differential Item Functioning, Bias Correction, Rare Events, Estimation Methods, Type I, Item Functioning, Events Data

The logistic regression (LR) model for assessing differential item functioning (DIF) relies heavily on asymptotic sampling distributions. For rare events data, however, maximum likelihood estimates may be biased and the asymptotic distributions may be unreliable. In this study, regular maximum likelihood (ML) estimation is compared with two bias correction methods, weighted logistic regression (WLR) and Firth's penalized maximum likelihood (PML), for assessing DIF in imbalanced or rare events data. The power and type I error rate of the LR model for detecting DIF were investigated under different combinations of sample size, moderate and severe magnitudes of uniform DIF (DIF = 0.4 and 0.8), sample size ratio, number of items, and degree of imbalance (τ). Under severe imbalance (τ = 0.069), for example, the power of PML and ML was approximately 30% and 24% lower, respectively, than that of WLR when DIF = 0.4, and approximately 27% and 23% lower when DIF = 0.8. The study shows that WLR outperforms both ML and PML estimation when logistic regression is used to evaluate DIF in imbalanced or rare events data.
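
The uniform-DIF logistic model and Firth's penalty can be written as below. This is the standard formulation, assumed rather than taken from the paper:

- θ_i is the matching criterion for respondent i (often the total test score).
- G_i is a 0/1 group indicator.
- x_i = (1, θ_i, G_i).
- π_i is the fitted probability.
- W = diag{π_i(1 − π_i)}.

The abstract does not describe how WLR weights the observations, so WLR is not shown.

```latex
% Uniform DIF model for a single item (notation assumed, not from the abstract)
\operatorname{logit}\Pr(Y_i = 1 \mid \theta_i, G_i) = \beta_0 + \beta_1 \theta_i + \beta_2 G_i,
\qquad H_0:\ \beta_2 = 0 \quad \text{(no uniform DIF)}

% Firth's penalized log-likelihood and its modified score equations
\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2}\log\det I(\beta),
\qquad
U^{*}_r(\beta) = \sum_{i=1}^{n} \Bigl\{ y_i - \pi_i + h_i \bigl(\tfrac{1}{2} - \pi_i\bigr) \Bigr\} x_{ir} = 0,
\qquad
h_i = \bigl[ W^{1/2} X (X^{\top} W X)^{-1} X^{\top} W^{1/2} \bigr]_{ii}
```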

Estimating group fixed effects in panel data with a binary dependent variable: How the LPM outperforms logistic regression in rare events data

Social Science Research, 2021, Vol 93, pp. 102486. DOI: 10.1016/j.ssresearch.2020.102486

Author(s): Joan C. Timoneda

Keyword(s): Logistic Regression, Panel Data, Fixed Effects, Rare Events, Binary Dependent Variable, Events Data

Small Sample Inference for Fixed Effects from Restricted Maximum Likelihood

Biometrics, 1997, Vol 53 (3), pp. 983. DOI: 10.2307/2533558

Cited By ~ 2535

Author(s): Michael G. Kenward, James H. Roger

Keyword(s): Maximum Likelihood, Fixed Effects, Restricted Maximum Likelihood, Small Sample

A General Panel Model for Unobserved Time Heterogeneity with Application to the Politics of Mass Incarceration

Sociological Methodology, 2021. DOI: 10.1177/00811750211016033

Author(s): Scott W. Duxbury

Keyword(s): Fixed Effects, Unobserved Heterogeneity, Panel Data Analysis, Time Varying, Cross Sectional, Common Trends, Varying Coefficients, Panel Model, Fixed Effects Models, Time Varying Coefficients

Panel data analysis is common in the social sciences. Fixed effects models are a favorite among sociologists because they control for unobserved heterogeneity (unexplained variation) among cross-sectional units, but estimates are biased when there is unobserved heterogeneity in the underlying time trends. Two-way fixed effects models adjust for unobserved time heterogeneity but are inefficient, cannot include unit-invariant variables, and eliminate common trends: the portion of variance in a time-varying variable that is invariant across cross-sectional units. This article introduces a general panel model that can include unit-invariant variables, corrects for unobserved time heterogeneity, and provides the effect of common trends while also allowing for unobserved unit heterogeneity, time-varying coefficients, and time-invariant variables. One-way and two-way fixed effects models are shown to be restrictive forms of this general model. Other restrictive forms are also derived that offer all the usual advantages of one-way and two-way fixed effects models but account for unobserved time heterogeneity. The author uses the models to examine the increase in state incarceration rates between 1970 and 2015.
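
The abstract says two-way fixed effects eliminate common trends, and this can be seen directly in the within transformation. The sketch assumes a balanced panel stored as nested lists and uses a hypothetical helper, `two_way_demean`, to apply the usual two-way demeaning. Any variable that takes the same value for every unit in a given period is mapped to zero, so its coefficient cannot be estimated. The sketch illustrates this restriction only, not the author's general model.

```python
def two_way_demean(panel):
    """Two-way within transformation for a balanced panel (hypothetical helper).

    panel[i][t] is the value for unit i in period t. The result
    z_it - zbar_i - zbar_t + zbar removes any additive unit effect and any
    additive period effect, which is what two-way fixed effects do in a
    balanced panel.
    """
    n_units, n_periods = len(panel), len(panel[0])
    unit_mean = [sum(row) / n_periods for row in panel]
    period_mean = [sum(panel[i][t] for i in range(n_units)) / n_units
                   for t in range(n_periods)]
    grand_mean = sum(unit_mean) / n_units
    return [[panel[i][t] - unit_mean[i] - period_mean[t] + grand_mean
             for t in range(n_periods)]
            for i in range(n_units)]


if __name__ == "__main__":
    trend = [1.0, 2.5, 4.0, 3.0]            # a common trend shared by all units
    common = [trend[:] for _ in range(3)]    # unit-invariant variable, 3 units
    varied = [[1.0, 2.0, 0.5, 3.0],          # variable with unit-specific paths
              [2.0, 1.0, 4.0, 0.0],
              [0.0, 3.5, 1.0, 2.0]]
    print(two_way_demean(common))  # only (near) zeros remain
    print(two_way_demean(varied))  # unit-specific deviations survive
```

Applied to the common trend, the transformation leaves no variation at all, which is why a general model that retains such variables needs a different identification strategy.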

Extreme Value Theory in Application to Delivery Delays

Entropy, 2021, Vol 23 (7), pp. 788. DOI: 10.3390/e23070788

Author(s): Marcin Fałdziński, Magdalena Osińska, Wojciech Zalewski

Keyword(s): Maximum Likelihood, Extreme Value Theory, Maximum Likelihood Method, Rare Events, Value Theory, Extreme Value, Small Sample, Generalized Extreme Value Distribution, Return Level, Likelihood Method

This paper uses extreme value theory (EVT) to model rare events in the form of delivery delays in road transport. Because delivery delays occur stochastically and these extreme events have economic consequences, they should be modeled with appropriate tools. We also estimate the extremal index and the return level, with confidence intervals, to describe the clustering of rare events in deliveries. The parameters of the generalized extreme value (GEV) distribution are estimated by maximum likelihood and, for better small-sample properties, by penalized maximum likelihood. The findings demonstrate the advantages of EVT-based prediction and its readiness for application.
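
The abstract does not say which penalty is used. This sketch assumes a shape penalty of the kind proposed by Coles and Dixon (1999), which leaves ξ ≤ 0 unpenalized, shrinks 0 < ξ < 1 and rules out ξ ≥ 1; the paper may use a different one. The code provides:

- the GEV log-density;
- the penalized log-likelihood to be maximized;
- the return level z_p, which solves G(z_p) = 1 − p.

The function names, the defaults α = λ = 1 and the toy block maxima are hypothetical.

```python
import math

GUMBEL_EPS = 1e-12  # treat |xi| below this as the Gumbel (xi = 0) case


def gev_logpdf(z, mu, sigma, xi):
    """Log density of GEV(mu, sigma, xi); -inf outside the support."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if abs(xi) < GUMBEL_EPS:
        return -math.log(sigma) - s - math.exp(-s)
    t = 1.0 + xi * s
    if t <= 0:
        return -math.inf
    return -math.log(sigma) - (1.0 + 1.0 / xi) * math.log(t) - t ** (-1.0 / xi)


def gev_cdf(z, mu, sigma, xi):
    """G(z) = exp(-[1 + xi (z - mu) / sigma]^(-1/xi))."""
    s = (z - mu) / sigma
    if abs(xi) < GUMBEL_EPS:
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0:
        # Below the lower bound (xi > 0) or above the upper bound (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def log_shape_penalty(xi, alpha=1.0, lam=1.0):
    """Assumed Coles-Dixon-type log-penalty: 0 if xi <= 0, -inf if xi >= 1."""
    if xi <= 0:
        return 0.0
    if xi >= 1:
        return -math.inf
    return -lam * (1.0 / (1.0 - xi) - 1.0) ** alpha


def penalized_loglik(data, mu, sigma, xi):
    """Objective to maximize over (mu, sigma, xi) for penalized ML."""
    return sum(gev_logpdf(z, mu, sigma, xi) for z in data) + log_shape_penalty(xi)


def return_level(p, mu, sigma, xi):
    """Level exceeded by a block maximum with probability p (1/p-block return level)."""
    y_p = -math.log(1.0 - p)
    if abs(xi) < GUMBEL_EPS:
        return mu - sigma * math.log(y_p)
    return mu - sigma / xi * (1.0 - y_p ** (-xi))


if __name__ == "__main__":
    maxima = [12.1, 9.8, 15.3, 11.0, 10.4, 13.7, 18.2, 10.9]  # hypothetical block maxima
    for xi in (-0.1, 0.2, 0.6):
        print(xi, penalized_loglik(maxima, mu=11.0, sigma=2.0, xi=xi))
    print("100-block return level:", return_level(0.01, 11.0, 2.0, 0.2))
```

Maximizing `penalized_loglik` over (μ, σ, ξ) with any numerical optimizer gives the penalized estimates. For ξ ≤ 0 the objective coincides with the ordinary log-likelihood.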

Separation and Rare Events

Political Science Research and Methods, 2020, pp. 1-10. DOI: 10.1017/psrm.2020.46

Author(s): Liam F. Beiser-McGrath

Keyword(s): Maximum Likelihood, Rare Events, Large Datasets, Dyadic Data, Interstate Relations, Penalized Maximum Likelihood, Independent Variables, Binary Dependent Variable, Informative Prior Distributions, Empirical Illustration

Abstract When separation is a problem in binary dependent variable models, many researchers use Firth's penalized maximum likelihood to obtain finite estimates (Firth, 1993; Zorn, 2005; Rainey, 2016). In this paper, I show that this approach can lead to inferences in the direction opposite to the separation when the number of observations is sufficiently large and both the dependent and independent variables are rare events. Because large datasets with rare events, such as dyadic data measuring interstate relations, are frequently used in political science, a lack of awareness of this problem may lead to inferential errors. Simulations and an empirical illustration show that independent "weakly informative" prior distributions centered at zero, for example the Cauchy prior suggested by Gelman et al. (2008), can avoid this issue. More generally, the results caution researchers to consider how the choice of prior interacts with the structure of their data when estimating models in the presence of separation.
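
The remedy can be illustrated in the simplest separated case. The sketch uses a hypothetical one-coefficient logit with no intercept (a simplification) and perfectly separated toy data. For such data the log-likelihood keeps increasing in β, so no finite MLE exists. Adding an independent Cauchy log-prior centered at zero with scale 2.5 gives a finite posterior mode with the sign of the separation. That scale is the default Gelman et al. (2008) suggest for coefficients of rescaled inputs. The helpers `log_posterior_grad` and `cauchy_map` are hypothetical, and bisection is used only because the problem is one-dimensional.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log_posterior_grad(beta, x, y, scale=2.5):
    """d/dbeta of [logit log-likelihood + Cauchy(0, scale) log-prior]."""
    grad_lik = sum(xi * (yi - sigmoid(beta * xi)) for xi, yi in zip(x, y))
    grad_prior = -2.0 * beta / (scale ** 2 + beta ** 2)
    return grad_lik + grad_prior


def cauchy_map(x, y, scale=2.5, lo=-50.0, hi=50.0, tol=1e-12):
    """Posterior mode by bisection on the gradient (assumes one sign change on [lo, hi])."""
    g_lo = log_posterior_grad(lo, x, y, scale)
    if g_lo * log_posterior_grad(hi, x, y, scale) > 0:
        raise ValueError("gradient does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = log_posterior_grad(mid, x, y, scale)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Perfectly separated toy data: y = 1 exactly when x = 1
    x = [1.0] * 5 + [-1.0] * 5
    y = [1] * 5 + [0] * 5
    print("posterior mode:", cauchy_map(x, y))
```

With a very large `scale` the prior becomes nearly flat and the gradient stays positive far out. This reproduces the divergence that penalization or a prior is meant to prevent.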

The association between individual-level social capital and health: cross-sectional, prospective cohort and fixed-effects models

Journal of Epidemiology & Community Health, 2015, Vol 70 (1), pp. 25-30. DOI: 10.1136/jech-2015-205962

Cited By ~ 14

Author(s): Takashi Oshio

Keyword(s): Social Capital, Prospective Cohort, Fixed Effects, Cross Sectional, Individual Level, Fixed Effects Models

Young Children Who Eat Animal Sourced Foods Grow Less Stunted: Findings of Contemporaneous and Lagged Analyses from Nepal, Uganda and Bangladesh

2020. DOI: 10.21203/rs.3.rs-74484/v1

Author(s): Sonia Zaharia, Shibani Ghosh, Robin Shrestha, Swetha Manohar, Andrew Thorne-Lyman, ...

Keyword(s): Panel Data, Young Children, Empirical Evidence, Fixed Effects, Linear Growth, Child Growth, Resource Constrained, Cross Sectional, Fixed Effects Models

Abstract In resource-constrained countries, animal-sourced foods (ASFs) are an important nutrient-dense source of vitamins, minerals and macronutrients. While several studies have suggested that ASFs benefit child growth, most empirical evidence is based on cross-sectional data, which can only describe the contemporaneous relationship between diet and anthropometric outcomes. This study uses longitudinal panel data from Nepal, Bangladesh, and Uganda to assess the association of both contemporaneous and past ASF consumption with linear growth in children aged 6-24 months. Fixed effects models found that ASF consumption was significantly correlated with lower stunting, with a decline in stunting prevalence of up to 10% among Nepali children who had consumed any ASF in the previous year. Consuming two or more ASFs showed an even larger association, ranging from a 10% decline in stunting prevalence associated with lagged consumption in Bangladesh to a 16% decline in Nepal.
