Bridging Finite and Super Population Causal Inference

Abstract: There are two general views in causal analysis of experimental data: the super population view that the units are an independent sample from some hypothetical infinite population, and the finite population view that the potential outcomes of the experimental units are fixed and the randomness comes solely from the treatment assignment. These two views differ conceptually and mathematically, resulting in different sampling variances of the usual difference-in-means estimator of the average causal effect. Practically, however, these two views result in identical variance estimators. By recalling a variance decomposition and exploiting a completeness-type argument, we establish a connection between these two views in completely randomized experiments. This alternative formulation could serve as a template for bridging finite and super population causal inference in other scenarios.
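To make the contrast concrete, the following minimal sketch enumerates every assignment of a tiny completely randomized experiment. The four units, their potential outcomes, and the names `enumerate_design`, `Y1`, and `Y0` are hypothetical and chosen only for illustration; they do not come from the paper. The sketch compares the exact finite population sampling variance of the difference-in-means estimator with the average value of the usual variance estimator (within-arm sample variances divided by arm sizes, then summed).

```python
from fractions import Fraction
from itertools import combinations
from statistics import mean, pvariance, variance

# Hypothetical potential outcomes for four units (illustrative values only).
Y1 = [Fraction(v) for v in (3, 5, 6, 8)]  # outcomes under treatment
Y0 = [Fraction(v) for v in (1, 2, 4, 5)]  # outcomes under control
N_TREATED = 2


def enumerate_design(y1, y0, n_treated):
    """Return the estimate and the usual variance estimate for every
    possible assignment of a completely randomized experiment."""
    units = range(len(y1))
    estimates, variance_estimates = [], []
    for treated in combinations(units, n_treated):
        t = [y1[i] for i in treated]
        c = [y0[i] for i in units if i not in treated]
        estimates.append(mean(t) - mean(c))
        variance_estimates.append(variance(t) / len(t) + variance(c) / len(c))
    return estimates, variance_estimates


estimates, variance_estimates = enumerate_design(Y1, Y0, N_TREATED)

# Exact randomization (finite population) variance of the estimator:
# all assignments are equally likely, so use the population variance.
true_variance = pvariance(estimates)

# Average of the usual variance estimator over all assignments.
expected_estimate = mean(variance_estimates)

print("finite population variance:", true_variance)
print("expected variance estimate:", expected_estimate)
```

In this toy example the usual estimator averages 23/6 while the exact randomization variance is 15/4, so it is conservative under the finite population view. The gap of 1/12 equals the sample variance of the unit-level effects divided by the number of units.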


Treatment Effects on Ordinal Outcomes: Causal Estimands and Sharp Bounds

Journal of Educational and Behavioral Statistics ◽

10.3102/1076998618776435 ◽

2018 ◽

Vol 43 (5) ◽

pp. 540-567 ◽

Cited By ~ 3

Author(s):

Jiannan Lu ◽

Peng Ding ◽

Tirthankar Dasgupta

Keyword(s):

Causal Effect ◽

Real Life ◽

Causal Effects ◽

Potential Outcomes ◽

Randomized Experiments ◽

Sharp Bounds ◽

Average Causal Effect ◽

Behavioral Studies ◽

And Control ◽

Potential Outcomes Framework

Assessing the causal effects of interventions on ordinal outcomes is an important objective of many educational and behavioral studies. Under the potential outcomes framework, we can define causal effects as comparisons between the potential outcomes under treatment and control. Unfortunately, the average causal effect, often the parameter of interest, is difficult to interpret for ordinal outcomes. To address this challenge, we propose to use two causal parameters, which are defined as the probabilities that the treatment is beneficial and strictly beneficial for the experimental units. However, although well-defined for any outcomes and of particular interest for ordinal outcomes, the two aforementioned parameters depend on the association between the potential outcomes and are therefore not identifiable from the observed data without additional assumptions. Echoing recent advances in the econometrics and biostatistics literature, we present the sharp bounds of the aforementioned causal parameters for ordinal outcomes, under fixed marginal distributions of the potential outcomes. Because the causal estimands and their corresponding sharp bounds are based on the potential outcomes themselves, the proposed framework can be flexibly incorporated into any chosen models of the potential outcomes and is directly applicable to randomized experiments, unconfounded observational studies, and randomized experiments with noncompliance. We illustrate our methodology via numerical examples and three real-life applications related to educational and behavioral research.
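The non-identifiability described above can be illustrated with a small sketch. It builds two joint distributions of the potential outcomes that share the same marginals but differ in their association, and evaluates the two parameters under each. The three-level outcome, the probabilities, and the names `COMONOTONE`, `INDEPENDENT`, `marginals`, and `prob_beneficial` are assumptions made for illustration; the sketch does not compute the paper's sharp bounds.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)
NINTH = Fraction(1, 9)

# joint[j][k] = P(Y(1) = j, Y(0) = k) for an ordinal outcome with levels 0, 1, 2.
# Hypothetical coupling 1: potential outcomes always agree (perfect association).
COMONOTONE = [
    [THIRD, 0, 0],
    [0, THIRD, 0],
    [0, 0, THIRD],
]
# Hypothetical coupling 2: potential outcomes are independent.
INDEPENDENT = [[NINTH] * 3 for _ in range(3)]


def marginals(joint):
    """Return the marginal distributions of Y(1) (rows) and Y(0) (columns)."""
    y1_marginal = [sum(row) for row in joint]
    y0_marginal = [sum(col) for col in zip(*joint)]
    return y1_marginal, y0_marginal


def prob_beneficial(joint, strict=False):
    """P(Y(1) >= Y(0)), or P(Y(1) > Y(0)) when strict is True."""
    total = Fraction(0)
    for j, row in enumerate(joint):
        for k, p in enumerate(row):
            if j > k or (j == k and not strict):
                total += p
    return total


# Both couplings imply the same observable marginals ...
print(marginals(COMONOTONE) == marginals(INDEPENDENT))
# ... but different values of the two causal parameters.
for name, joint in (("comonotone", COMONOTONE), ("independent", INDEPENDENT)):
    print(name, prob_beneficial(joint), prob_beneficial(joint, strict=True))
```

Because randomized data reveal only the marginals, both couplings are compatible with the same observed data, which is why bounds rather than point estimates are the natural target.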


Causal inference under multiple versions of treatment

Journal of Causal Inference ◽

10.1515/jci-2012-0002 ◽

2013 ◽

Vol 1 (1) ◽

pp. 1-20 ◽

Cited By ~ 58

Author(s):

Tyler J. VanderWeele ◽

Miguel A. Hernan

Keyword(s):

Causal Inference ◽

Treatment Effect ◽

Indirect Effects ◽

Causal Effect ◽

Potential Outcomes ◽

Direct And Indirect Effects ◽

Treatment Variable ◽

Effect Of Treatment ◽

New Treatment ◽

Potential Outcomes Framework

Abstract: In this article, we discuss causal inference when there are multiple versions of treatment. The potential outcomes framework, as articulated by Rubin, makes an assumption of no multiple versions of treatment, and here we discuss an extension of this potential outcomes framework to accommodate causal inference under violations of this assumption. A variety of examples are discussed in which the assumption may be violated. Identification results are provided for the overall treatment effect and the effect of treatment on the treated when multiple versions of treatment are present and also for the causal effect comparing a version of one treatment to some other version of the same or a different treatment. Further identification and interpretative results are given for cases in which the version precedes the treatment as when an underlying treatment variable is coarsened or dichotomized to create a new treatment variable for which there are effectively “multiple versions”. Results are also given for effects defined by setting the version of treatment to a prespecified distribution. Some of the identification results bear resemblance to identification results in the literature on direct and indirect effects. We describe some settings in which ignoring multiple versions of treatment, even when present, will not lead to incorrect inferences.


Data-Adaptive Causal Effects and Superefficiency

Journal of Causal Inference ◽

10.1515/jci-2016-0007 ◽

2016 ◽

Vol 4 (2) ◽

Cited By ~ 1

Author(s):

Peter M. Aronow

Keyword(s):

Causal Inference ◽

Empirical Distribution ◽

Causal Effect ◽

Short Note ◽

Causal Effects ◽

Asymptotically Normal ◽

Average Causal Effect ◽

Population Average ◽

Data Adaptive ◽

Local Average

Abstract: Recent approaches in causal inference have proposed estimating average causal effects that are local to some subpopulation, often for reasons of efficiency. These inferential targets are sometimes data-adaptive, in that they are dependent on the empirical distribution of the data. In this short note, we show that if researchers are willing to adapt the inferential target on the basis of efficiency, then extraordinary gains in precision can potentially be obtained. Specifically, when causal effects are heterogeneous, any asymptotically normal and root-$n$ consistent estimator of the population average causal effect is superefficient for a data-adaptive local average causal effect.


Emulating Target Trials to Improve Causal Inference from Agent-Based Models

American Journal of Epidemiology ◽

10.1093/aje/kwab040 ◽

2021 ◽

Author(s):

Eleanor J Murray ◽

Brandon D L Marshall ◽

Ashley L Buchanan

Keyword(s):

Causal Inference ◽

Disease Transmission ◽

Causal Effect ◽

Potential Outcomes ◽

Emergent Properties ◽

Target Trial ◽

Agent Based Models ◽

Agent Based ◽

Effect Estimation ◽

Potential Outcomes Framework

Abstract: Agent-based models are a key tool for investigating the emergent properties of population health settings, such as infectious disease transmission, where the exposure often violates the key ‘no interference’ assumption of traditional causal inference under the potential outcomes framework. Agent-based models and other simulation-based modeling approaches have generally been viewed as a separate knowledge-generating paradigm from the potential outcomes framework, but this can lead to confusion about how to interpret the results of these models in real-world settings. By explicitly incorporating the target trial framework into the development of an agent-based or other simulation model, we can clarify the causal parameters of interest, as well as make explicit the assumptions required for valid causal effect estimation within or between populations. In this paper, we describe the use of the target trial framework for designing agent-based models when the goal is estimation of causal effects in the presence of interference, or spillover.


Causal Inference

Sociology ◽

10.1093/obo/9780199756384-0240 ◽

2020 ◽

Author(s):

Pablo Geraldo Bastías ◽

Jennie E. Brand

Keyword(s):

Causal Inference ◽

Statistical Methods ◽

Quantitative Methods ◽

Fundamental Problem ◽

Causal Effect ◽

College Education ◽

Directed Acyclic Graphs ◽

Structural Equations ◽

Potential Outcomes ◽

Formal Representation

Causal inference is a growing interdisciplinary subfield in statistics, computer science, economics, epidemiology, and the social sciences. In contrast with both traditional quantitative methods and cutting-edge approaches like machine learning, causal inference questions are defined in relation to potential outcomes, or variable values that are counterfactual to the observed world and therefore cannot be answered from joint probabilities alone, even with infinite data. The fact that one can possibly observe at most one potential outcome among those of interest is known as the “fundamental problem of causal inference.” For example, in this framework, the economic return to college education can be defined as a comparison between two potential outcomes: the wages of an individual with a college education versus the wages that the same individual would have received had he or she not attended college. In general, researchers are interested in estimating such effects for certain groups and comparing the effects for different subpopulations. Critical to causal inference is recognizing that, to answer causal questions from observed data, one has to rely on untestable assumptions about how the data were generated. In other words, there is no particular statistical method that would render a conclusion “causal”; the validity of such an interpretation depends on a combination of data, assumptions about the data-generating process based on expert judgment, and estimation techniques. In the last several decades, our understanding of causality has improved enormously, owing to a conceptual apparatus and a mathematical language that enables rigorous conceptualization of causal quantities and formal representation of causal assumptions, while still employing familiar statistical methods. 
Potential outcomes or the Neyman-Rubin causal model and structural equations encoded as directed acyclic graphs (DAGs, also known as structural causal models) are two common approaches for conceptualizing causal relationships. The symbiosis of both languages offers a powerful framework to address causal questions. This review covers developments in both causal identification (i.e., deciding if a quantity of interest would be recoverable from infinite data, based on our assumptions) and causal effect estimation (i.e., the use of statistical methods to approximate that answer with finite, although potentially big, data). The literature is presented following the type of assumptions and questions frequently encountered in empirical research, ending with a discussion of promising new directions in the field.


Insights on Variance Estimation for Blocked and Matched Pairs Designs

Journal of Educational and Behavioral Statistics ◽

10.3102/1076998620946272 ◽

2020 ◽

pp. 107699862094627

Author(s):

Nicole E. Pashley ◽

Luke W. Miratrix

Keyword(s):

Variance Estimation ◽

Control Unit ◽

Potential Outcomes ◽

Matched Pairs ◽

Randomized Experiments ◽

Multiple Treatment ◽

Variance Estimators ◽

Control Units ◽

Average Impact ◽

And Control

Evaluating blocked randomized experiments from a potential outcomes perspective has two primary branches of work. The first focuses on larger blocks, with multiple treatment and control units in each block. The second focuses on matched pairs, with a single treatment and control unit in each block. These literatures not only provide different estimators for the standard errors of the estimated average impact, but they are also built on different sets of assumptions. Neither literature handles cases with blocks of varying size that contain singleton treatment or control units, a case which can occur in a variety of contexts, such as with different forms of matching or poststratification. In this article, we reconcile the literatures by carefully examining the performance of variance estimators under several different frameworks. We then use these insights to derive novel variance estimators for experiments containing blocks of different sizes.
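As a rough illustration of why the two literatures do not cover singleton blocks, the sketch below implements two standard textbook variance estimators: a blocked estimator that combines within-block sample variances, and a matched pairs estimator based on the variance of pair differences. These are assumed here as representative of the two branches and are not the novel estimators derived in the article. All data and the names `blocked_variance`, `matched_pairs_variance`, `BIG_BLOCKS`, `PAIRS`, and `SINGLETON` are hypothetical.

```python
from fractions import Fraction
from statistics import StatisticsError, variance


def blocked_variance(blocks):
    """Blocked estimator: sum over blocks of (n_b / n)^2 times
    (s_t^2 / n_t + s_c^2 / n_c). Each block is (treated, control).
    Needs at least two treated and two control units per block."""
    n = sum(len(t) + len(c) for t, c in blocks)
    total = Fraction(0)
    for t, c in blocks:
        t = [Fraction(v) for v in t]
        c = [Fraction(v) for v in c]
        weight = Fraction(len(t) + len(c), n)
        total += weight ** 2 * (variance(t) / len(t) + variance(c) / len(c))
    return total


def matched_pairs_variance(pairs):
    """Matched pairs estimator: sample variance of the within-pair
    differences divided by the number of pairs."""
    diffs = [Fraction(t) - Fraction(c) for t, c in pairs]
    return variance(diffs) / len(diffs)


# Hypothetical outcomes (illustrative values only).
BIG_BLOCKS = [([5, 7], [2, 4]), ([6, 10, 8], [3, 5, 4])]
PAIRS = [(5, 2), (4, 3), (6, 4), (9, 5)]
SINGLETON = [([5], [2, 4])]  # one treated unit, two control units

print("blocked:", blocked_variance(BIG_BLOCKS))
print("matched pairs:", matched_pairs_variance(PAIRS))

try:
    blocked_variance(SINGLETON)
except StatisticsError as err:
    # A within-arm sample variance is undefined for a single unit.
    print("singleton block fails:", err)
```

The blocked formula breaks down as soon as one arm of a block has a single unit, and the pairs formula assumes every block is exactly one treated and one control unit, so blocks of varying size with singletons fall between the two.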


A Note on Posttreatment Selection in Studying Racial Discrimination in Policing

American Political Science Review ◽

10.1017/s0003055421000654 ◽

2021 ◽

pp. 1-14

Author(s):

Qingyuan Zhao ◽

Luke J Keele ◽

Dylan S Small ◽

Marshall M Joffe

Keyword(s):

Causal Inference ◽

Racial Discrimination ◽

Risk Ratio ◽

Causal Effect ◽

Similar Data ◽

Police Violence ◽

Administrative Records ◽

Average Causal Effect ◽

Naive Estimator ◽

Administrative Datasets

We discuss some causal estimands that are used to study racial discrimination in policing. A central challenge is that not all police–civilian encounters are recorded in administrative datasets and available to researchers. One possible solution is to consider the average causal effect of race conditional on the civilian already being detained by the police. We find that such an estimand can be quite different from the more familiar ones in causal inference and needs to be interpreted with caution. We propose using an estimand that is new for this context: the causal risk ratio, which has a more transparent interpretation and requires weaker identification assumptions. We demonstrate this through a reanalysis of the NYPD Stop-and-Frisk dataset. Our reanalysis shows that the naive estimator that ignores the posttreatment selection in administrative records may severely underestimate the disparity in police violence between minorities and whites in these and similar data.


Causal Inference in Studying the Long-term Health Effects of Disasters: Challenges and Potential Solutions

American Journal of Epidemiology ◽

10.1093/aje/kwab064 ◽

2021 ◽

Author(s):

Koichiro Shiba ◽

Takuya Kawahara ◽

Jun Aida ◽

Katsunori Kondo ◽

Naoki Kondo ◽

...

Keyword(s):

Causal Inference ◽

Health Effects ◽

Selection Bias ◽

Proportional Hazards ◽

Cox Regression ◽

Causal Effect ◽

Time Varying ◽

Average Causal Effect ◽

Effect Estimation

Abstract: Two frequently encountered but underrecognized challenges for causal inference in studying the long-term health effects of disasters among survivors include: (a) time-varying effects of disasters on a time-to-event outcome and (b) selection bias due to selective attrition. We review approaches to overcome these challenges and show application of the approaches to real-world longitudinal data of older adults who were directly impacted by the 2011 earthquake and tsunami (n=4,857). To illustrate the problem of time-varying effects of disasters, we examined the association between degree of damage due to the tsunami and all-cause mortality. We compared results from Cox regression assuming proportional hazards versus adjusted parametric survival curves allowing for time-varying hazard ratios. To illustrate the problem of selection bias, we examined the association between proximity to the coast (a proxy for housing damage from the tsunami) and depressive symptoms. We corrected for selection bias due to attrition in the two post-disaster follow-up surveys (conducted in 2013 and 2016) using multivariable adjustment, inverse probability censoring weighting, and survivor average causal effect estimation. Our results demonstrate that the analytic approaches ignoring time-varying effects on mortality and selection bias due to selective attrition may underestimate the long-term health effects of disasters.
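The idea behind inverse probability censoring weighting can be sketched with a toy example. The records, strata names, and outcome values below are entirely hypothetical and are not taken from the study; the sketch also assumes that dropout depends only on the stratum, and it estimates retention probabilities by simple stratum proportions rather than by a fitted model.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical follow-up records: (stratum, still_in_study, outcome or None).
RECORDS = [
    ("near_coast", True, 6),
    ("near_coast", True, 8),
    ("near_coast", False, None),
    ("near_coast", False, None),
    ("far_from_coast", True, 2),
    ("far_from_coast", True, 2),
    ("far_from_coast", True, 4),
    ("far_from_coast", True, 4),
]


def retention_probabilities(records):
    """Estimate P(still in study | stratum) as the retained proportion."""
    totals, retained = defaultdict(int), defaultdict(int)
    for stratum, observed, _ in records:
        totals[stratum] += 1
        retained[stratum] += int(observed)
    return {s: Fraction(retained[s], totals[s]) for s in totals}


def naive_mean(records):
    """Mean outcome among retained participants, ignoring attrition."""
    outcomes = [y for _, observed, y in records if observed]
    return Fraction(sum(outcomes), len(outcomes))


def ipcw_mean(records):
    """Weighted mean in which each retained participant is weighted by
    the inverse of the estimated retention probability of the stratum."""
    probs = retention_probabilities(records)
    numerator = denominator = Fraction(0)
    for stratum, observed, y in records:
        if observed:
            weight = 1 / probs[stratum]
            numerator += weight * y
            denominator += weight
    return numerator / denominator


print("naive mean:", naive_mean(RECORDS))
print("IPCW mean:", ipcw_mean(RECORDS))
```

Here the stratum with heavier dropout also has higher outcomes, so the naive mean (13/3) understates the weighted mean (5), which mirrors the direction of bias the abstract warns about.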


Learning Heterogeneity in Causal Inference Using Sufficient Dimension Reduction

Journal of Causal Inference ◽

10.1515/jci-2018-0015 ◽

2019 ◽

Vol 7 (1) ◽

Author(s):

Wei Luo ◽

Wenbo Wu ◽

Yeying Zhu

Keyword(s):

Causal Inference ◽

Dimension Reduction ◽

Causal Effect ◽

Real Data ◽

Potential Outcomes ◽

Estimation Accuracy ◽

Sufficient Dimension Reduction ◽

Simulation Studies ◽

The Mean ◽

The Individual

Abstract: Often the research interest in causal inference is on the regression causal effect, which is the mean difference in the potential outcomes conditional on the covariates. In this paper, we use sufficient dimension reduction to estimate a lower dimensional linear combination of the covariates that is sufficient to model the regression causal effect. Compared with the existing applications of sufficient dimension reduction in causal inference, our approaches are more efficient in reducing the dimensionality of covariates, and avoid estimating the individual outcome regressions. The proposed approaches can be used in three ways to assist modeling the regression causal effect: to conduct variable selection, to improve the estimation accuracy, and to detect the heterogeneity. Their usefulness is illustrated by both simulation studies and a real data example.


Estimating the Magnitude of the Relation Between Bullying, E-Bullying, and Suicidal Behaviors Among United States Youth, 2015

Crisis ◽

10.1027/0227-5910/a000544 ◽

2019 ◽

Vol 40 (3) ◽

pp. 157-165 ◽

Cited By ~ 3

Author(s):

Kevin S. Kuehn ◽

Annelise Wagner ◽

Jennifer Velloza

Keyword(s):

Adolescent Suicide ◽

Suicide Attempts ◽

Causal Effect ◽

Strong Association ◽

Cross Sectional ◽

Average Causal Effect ◽

Effects Of Bullying ◽

Pc Algorithm ◽

Nationally Representative ◽

Significant Covariates

Abstract. Background: Suicide is the second leading cause of death among US adolescents aged 12–19 years. Researchers would benefit from a better understanding of the direct effects of bullying and e-bullying on adolescent suicide to inform intervention work. Aims: To explore the direct and indirect effects of bullying and e-bullying on adolescent suicide attempts (SAs) and to estimate the magnitude of these effects controlling for significant covariates. Method: This study uses data from the 2015 Youth Risk Behavior Surveillance Survey (YRBS), a nationally representative sample of US high school youth. We quantified the association between bullying and the likelihood of SA, after adjusting for covariates (e.g., sexual orientation, obesity, sleep) identified with the PC algorithm. Results: Bullying and e-bullying were significantly associated with SA in logistic regression analyses. Bullying had an estimated average causal effect (ACE) of 2.46%, while e-bullying had an ACE of 4.16%. Limitations: Data are cross-sectional and temporal precedence is not known. Conclusion: These findings highlight the strong association between bullying, e-bullying, and SA.
