How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It

“Robust standard errors” are used in a vast array of scholarship to correct standard errors for model misspecification. However, when misspecification is bad enough to make classical and robust standard errors diverge, assuming that it is nevertheless not so bad as to bias everything else requires considerable optimism. And even if the optimism is warranted, settling for a misspecified model, with or without robust standard errors, will still bias estimators of all but a few quantities of interest. The resulting cavernous gap between theory and practice suggests that considerable gains in applied statistics may be possible. We seek to help researchers realize these gains via a more productive way to understand and use robust standard errors; a new general and easier-to-use “generalized information matrix test” statistic that can formally assess misspecification (based on differences between robust and classical variance estimates); and practical illustrations via simulations and real examples from published research. How robust standard errors are used needs to change, but instead of jettisoning this popular tool we show how to use it to provide effective clues about model misspecification, likely biases, and a guide to considerably more reliable, and defensible, inferences. Accompanying this article is software that implements the methods we describe.
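The divergence between classical and robust standard errors that this abstract treats as a diagnostic can be illustrated with a minimal sketch. It assumes a simple one-predictor least-squares fit and the HC0 (White) sandwich estimator, which is only one of several robust variants. The function name and the six data points are hypothetical; they are not taken from the article or its accompanying software, and this is not the article's generalized information matrix test.

```python
import math

def slope_standard_errors(x, y):
    """Fit y = a + b*x by least squares and return the slope with its
    classical and HC0 (heteroskedasticity-robust) standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dev = [xi - x_bar for xi in x]              # centred predictor
    sxx = sum(d * d for d in dev)
    slope = sum(d * yi for d, yi in zip(dev, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical variance: one common error variance for all observations.
    classical_var = sum(e * e for e in resid) / (n - 2) / sxx
    # HC0 sandwich variance: each squared residual is weighted by its own
    # squared deviation of x from the mean.
    robust_var = sum((d * e) ** 2 for d, e in zip(dev, resid)) / sxx ** 2
    return slope, math.sqrt(classical_var), math.sqrt(robust_var)

# Hypothetical data that curve around the fitted line, so the linear
# model is misspecified (an omitted quadratic term).
x = [1, 2, 3, 4, 5, 6]
y = [5, 4, 6, 8, 10, 15]
slope, se_classical, se_robust = slope_standard_errors(x, y)
print(f"slope={slope:.3f} classical SE={se_classical:.4f} robust SE={se_robust:.4f}")
```

In this toy case the two standard errors differ because the residuals follow a systematic pattern in x. On the abstract's argument, such a gap is a prompt to revisit the model specification rather than a reason to simply report the robust figure.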

Asymptotic Expansions of the Information Matrix Test Statistic

Econometrica ◽

10.2307/2938228 ◽

1991 ◽

Vol 59 (3) ◽

pp. 787 ◽

Cited By ~ 61

Author(s):

Andrew Chesher ◽

Richard Spady

Keyword(s):

Asymptotic Expansions ◽

Information Matrix ◽

Test Statistic ◽

Information Matrix Test

An Information Matrix Test for the Collapsing of Categories Under the Partial Credit Model

Journal of Educational and Behavioral Statistics ◽

10.3102/1076998618787478 ◽

2018 ◽

Vol 43 (6) ◽

pp. 721-750

Author(s):

Daphna Harel ◽

Russell J. Steele

Keyword(s):

Goodness Of Fit ◽

Model Misspecification ◽

Information Matrix ◽

Partial Credit Model ◽

Partial Credit ◽

Goodness Of Fit Test ◽

Data Set ◽

Data Reduction Technique ◽

Information Matrix Test

Collapsing categories is a commonly used data reduction technique; however, to date there do not exist principled methods to determine whether collapsing categories is appropriate in practice. With ordinal responses under the partial credit model, when collapsing categories, the true model for the collapsed data is no longer a partial credit model, and therefore refitting a partial credit model may result in model misspecification. This article details the implementation and performance of an information matrix test (IMT) to assess the implications of collapsing categories for a given data set under the partial credit model and compares its performance to the application of a nominal response model (NRM) and the S-X² goodness-of-fit statistic. The IMT and NRM-based test are able to correctly determine the true number of categories for an item, given reasonable power through this goodness-of-fit test. We conclude by applying the test to a well-studied data set from the literature.

Assessing Person Fit With the Information Matrix Test

Methodology ◽

10.1027/1614-2241/a000085 ◽

2015 ◽

Vol 11 (1) ◽

pp. 3-12 ◽

Cited By ~ 2

Author(s):

Jochen Ranger ◽

Jörg-Tobias Kuhn

Keyword(s):

Simulation Study ◽

Type I Error ◽

Information Matrix ◽

Small Samples ◽

Type I ◽

Person Fit ◽

Power Of The Test ◽

Order Expansion ◽

Trait Stability ◽

Information Matrix Test

In this manuscript, a new approach to the analysis of person fit is presented that is based on the information matrix test of White (1982). This test can be interpreted as a test of trait stability during the measurement situation. The test follows approximately a χ²-distribution. In small samples, the approximation can be improved by a higher-order expansion. The performance of the test is explored in a simulation study. This simulation study suggests that the test adheres to the nominal Type-I error rate well, although it tends to be conservative in very short scales. The power of the test is compared to the power of four alternative tests of person fit. This comparison corroborates that the power of the information matrix test is similar to the power of the alternative tests. Advantages and areas of application of the information matrix test are discussed.
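The idea behind an information matrix test can be sketched with a deliberately simple stand-in model. Under a correctly specified likelihood, the information estimated from the Hessian and the information estimated from the outer product of the scores should agree, and the test asks whether their difference is larger than chance allows. The sketch below assumes an intercept-only Poisson model, not the item response model used in the paper, and its function name and data are hypothetical. It computes only the two information estimates; a formal test statistic would also need the sampling variance of their difference, which is omitted here.

```python
def poisson_information_estimates(counts):
    """Return the Poisson MLE of the mean and two estimates of the
    per-observation Fisher information evaluated at that MLE."""
    n = len(counts)
    lam = sum(counts) / n                        # MLE of the Poisson mean
    # Hessian-based estimate: average of -d2/dlam2 log f(y) = y / lam^2.
    hessian_info = sum(y / lam ** 2 for y in counts) / n
    # Outer-product estimate: average squared score, score = y / lam - 1.
    opg_info = sum((y / lam - 1.0) ** 2 for y in counts) / n
    return lam, hessian_info, opg_info

# Hypothetical overdispersed counts: the variance exceeds the mean, so
# the Poisson assumption is violated and the two estimates disagree.
counts = [0, 0, 1, 1, 2, 8]
lam, hessian_info, opg_info = poisson_information_estimates(counts)
print(f"lambda={lam:.2f} hessian={hessian_info:.4f} outer-product={opg_info:.4f}")
```

For this model the ratio of the outer-product estimate to the Hessian estimate equals the sample variance divided by the sample mean, so the comparison reduces to a familiar check for overdispersion.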

Understanding, Choosing, and Unifying Multilevel and Fixed Effect Approaches

Political Analysis ◽

10.1017/pan.2020.41 ◽

2020 ◽

pp. 1-20

Author(s):

Chad Hazlett ◽

Leonard Wainstein

Keyword(s):

Random Effects ◽

Fixed Effects ◽

Multilevel Models ◽

Standard Errors ◽

Point Estimate ◽

FE Model ◽

Grouped Data ◽

Coefficient Estimates ◽

Political Science Education ◽

Robust Standard Errors

When working with grouped data, investigators may choose between “fixed effects” models (FE) with specialized (e.g., cluster-robust) standard errors, or “multilevel models” (MLMs) employing “random effects.” We review the claims given in published works regarding this choice, then clarify how these approaches work and compare by showing that: (i) random effects employed in MLMs are simply “regularized” fixed effects; (ii) unmodified MLMs are consequently susceptible to bias, but there is a longstanding remedy; and (iii) the “default” MLM standard errors rely on narrow assumptions that can lead to undercoverage in many settings. Our review of over 100 papers using MLM in political science, education, and sociology shows that these “known” concerns have been widely ignored in practice. We describe how to debias MLM’s coefficient estimates, and provide an option to more flexibly estimate their standard errors. Most illuminating, once MLMs are adjusted in these two ways the point estimate and standard error for the target coefficient are exactly equal to those of the analogous FE model with cluster-robust standard errors. For investigators working with observational data and who are interested only in inference on the target coefficient, either approach is equally appropriate and preferable to uncorrected MLM.

Efficiency of robust standard errors for regression coefficients

Communications in Statistics - Theory and Methods ◽

10.1080/03610929108830479 ◽

1991 ◽

Vol 20 (1) ◽

pp. 1-11 ◽

Cited By ~ 4

Author(s):

D.V. Hinkley ◽

S. Wang

Keyword(s):

Regression Coefficients ◽

Standard Errors ◽

Robust Standard Errors

PRACTITIONERS’ CORNER: Computing Robust Standard Errors for Within-groups Estimators*

Oxford Bulletin of Economics and Statistics ◽

10.1111/j.1468-0084.1987.mp49004006.x ◽

2009 ◽

Vol 49 (4) ◽

pp. 431-434 ◽

Cited By ~ 609

Author(s):

M. Arellano

Keyword(s):

Standard Errors ◽

Robust Standard Errors

Sample-size calculation and reestimation for a semiparametric analysis of recurrent event data taking robust standard errors into account

Biometrical Journal ◽

10.1002/bimj.201300090 ◽

2014 ◽

Vol 56 (4) ◽

pp. 631-648 ◽

Cited By ~ 5

Author(s):

Katharina Ingel ◽

Antje Jahn-Eimermacher

Keyword(s):

Sample Size ◽

Sample Size Calculation ◽

Recurrent Event ◽

Standard Errors ◽

Event Data ◽

Recurrent Event Data ◽

Semiparametric Analysis ◽

Robust Standard Errors

Regression Discontinuity and Heteroskedasticity Robust Standard Errors: Evidence from a Fixed-Bandwidth Approximation

Journal of Econometric Methods ◽

10.1515/jem-2016-0007 ◽

2018 ◽

Vol 8 (1) ◽

Cited By ~ 1

Author(s):

Otávio Bartalotti

Keyword(s):

Bias Correction ◽

Asymptotic Theory ◽

Regression Discontinuity ◽

Traditional Approach ◽

Standard Errors ◽

Test Coverage ◽

Regression Discontinuity Designs ◽

Theoretical Results ◽

Robust Standard Errors

In regression discontinuity designs (RD), for a given bandwidth, researchers can estimate standard errors based on different variance formulas obtained under different asymptotic frameworks. In the traditional approach the bandwidth shrinks to zero as sample size increases; alternatively, the bandwidth could be treated as fixed. The main theoretical results for RD rely on the former, while most applications in the literature treat the estimates as parametric, implementing the usual heteroskedasticity-robust standard errors. This paper develops the “fixed-bandwidth” alternative asymptotic theory for RD designs, which sheds light on the connection between both approaches. I provide alternative formulas (approximations) for the bias and variance of common RD estimators, and conditions under which both approximations are equivalent. Simulations document the improvements in test coverage that fixed-bandwidth approximations achieve relative to traditional approximations, especially when there is local heteroskedasticity. Feasible estimators of fixed-bandwidth standard errors are easy to implement and are akin to treating RD estimators as locally parametric, validating the common empirical practice of using heteroskedasticity-robust standard errors in RD settings. Bias mitigation approaches are discussed and a novel bootstrap higher-order bias correction procedure based on the fixed bandwidth asymptotics is suggested.

The association between WASH, nutrition, and early childhood growth faltering in rural Cambodia: a cross-sectional risk factor analysis

10.1101/2021.05.20.21257338 ◽

2021 ◽

Author(s):

Amanda Justine Lai ◽

Ramya Ambikapathi ◽

Oliver Cumming ◽

Krisna Seng ◽

Irene Velez ◽

...

Keyword(s):

Drinking Water ◽

Regression Models ◽

Water Source ◽

Community Level ◽

Standard Errors ◽

Drinking Water Source ◽

Household Level ◽

Growth Outcomes ◽

Growth Faltering ◽

Robust Standard Errors

Background Inadequate nutrition in early life and exposure to sanitation-related enteric pathogens have been linked to poor growth outcomes in children. Despite rapid development in Cambodia, a high prevalence of growth faltering and stunting persists among children. This study aimed to assess nutrition and WASH variables and their association with nutritional status of children under 24 months in rural Cambodia.
Methods We conducted surveys in 491 villages across 55 rural communes in Cambodia in September 2016 to measure associations between child, household, and community-level risk factors for stunting and length-for-age z-score (LAZ). A primary survey measured child-level variables, including anthropometric measures and risk factors for growth faltering and stunting, for 4,036 children under 24 months of age from 3,877 households (approximately 8 households per village). A secondary survey of 5,341 households, including the same households from the primary survey and an additional 1,464 households (approximately 3 additional households per village) from the same villages, assessed village-level WASH variables to understand community water, sanitation, and hygiene (WASH) conditions that may influence child growth outcomes. For LAZ, we calculated bivariate and adjusted associations (as mean differences) with 95% confidence intervals using generalized estimating equations (GEEs) to fit linear regression models with robust standard errors. For stunting, we calculated unadjusted and adjusted prevalence ratios (PRs) with 95% confidence intervals using GEEs to fit Poisson regression models with robust standard errors. For all models assessing effects of household-level variables, we used GEEs to account for clustering at the village level.
Findings After adjustment for potential confounders, presence of water and soap at a household's handwashing station was found to be significantly associated (p<0.05) with increased LAZ (adjusted mean difference in LAZ +0.10, 95% CI 0.03 to 0.16), and household use of an improved drinking water source was associated with less stunting in children compared to households that did not use an improved source of drinking water (aPR 0.81, 95% CI 0.66 to 0.98); breastfeeding and community-level access to an improved drinking water source were associated with a lower LAZ score (-0.16, 95% CI -0.27 to -0.05; -0.13, 95% CI -0.26 to 0.00). No other nutrition (i.e., dietary diversity, meal frequency) or sanitation variables (i.e., household's safe disposal of child stools, household-level sanitation, community-level sanitation) were measured to be associated with LAZ scores or stunting in children under 24 months of age.
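Why a Poisson model for a binary outcome such as stunting is paired with robust standard errors can be shown in the simplest possible case. The sketch below assumes a single binary exposure, no covariates, and independent observations, so it ignores the village-level clustering that the study handled with GEEs. In that special case the Poisson fit returns the ratio of the two observed prevalences, and the sandwich variance of the log prevalence ratio has a closed form. The function name and the counts are hypothetical and are not taken from the study.

```python
import math

def prevalence_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Prevalence ratio for a binary exposure, with model-based (Poisson)
    and robust (sandwich) standard errors on the log scale."""
    p1 = events_exposed / n_exposed
    p0 = events_unexposed / n_unexposed
    log_pr = math.log(p1 / p0)
    # Poisson model-based variance: treats the binary outcome as a count,
    # which overstates its variance.
    var_model = 1 / events_exposed + 1 / events_unexposed
    # Sandwich variance: uses the observed binomial-type variability.
    var_robust = var_model - 1 / n_exposed - 1 / n_unexposed
    se_model, se_robust = math.sqrt(var_model), math.sqrt(var_robust)
    # 95% confidence interval based on the robust standard error.
    ci = (math.exp(log_pr - 1.96 * se_robust), math.exp(log_pr + 1.96 * se_robust))
    return math.exp(log_pr), se_model, se_robust, ci

# Hypothetical counts: 30 of 100 exposed and 40 of 100 unexposed children stunted.
pr, se_model, se_robust, ci = prevalence_ratio(30, 100, 40, 100)
print(f"PR={pr:.2f} model SE={se_model:.3f} robust SE={se_robust:.3f} "
      f"95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

The model-based standard error is the larger of the two here, which is the usual motivation for the robust correction in this setting: without it the confidence interval for the prevalence ratio would be too wide.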
