From Controlled to Undisciplined Data: Estimating Causal Effects in the Era of Data Science Using a Potential Outcome Framework

2021
Author(s): Francesca Dominici, Falco J. Bargagli-Stoffi, Fabrizia Mealli

2021
Vol 15 (5), pp. 1-46
Author(s): Liuyi Yao, Zhixuan Chu, Sheng Li, Yaliang Li, Jing Gao, ...

Causal inference has been a critical research topic for decades across many domains, such as statistics, computer science, education, public policy, and economics. Estimating causal effects from observational data has become an appealing research direction because large amounts of data are available and the budget requirement is low compared with randomized controlled trials. Driven by the rapid development of machine learning, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well-known causal inference frameworks. The methods are divided into two categories depending on whether or not they require all three assumptions of the potential outcome framework. For each category, both the traditional statistical methods and the recent machine-learning-enhanced methods are discussed and compared. Plausible applications of these methods are also presented, including applications in advertising, recommendation, medicine, and so on. Moreover, the commonly used benchmark datasets and open-source code are summarized, which helps researchers and practitioners explore, evaluate, and apply the causal inference methods.


2018
Vol 14 (2), pp. 37-56
Author(s): David Michael Vock, Laura Frances Boehm Vock

Offensive performance in baseball depends on a number of correlated factors: the pitches the batter faces, the batter’s choice to swing, and the batter’s hitting ability. Recently a renewed focus on the effect of plate discipline on batter performance has emerged. Plate discipline has traditionally been summarized as the proportion of pitches inside and outside of the strike zone a player swings at; however, there have been few metrics proposed to assess the effect of plate discipline directly on batters’ outcomes. In this paper, we focus on estimating a batter’s performance if he were able to adopt a different plate discipline. Because we wish to assess the effect of a counterfactual plate discipline, we use a potential outcome framework and show how the G-computation algorithm can be used to isolate the effect of plate discipline separately from a batter’s hitting ability or the types of pitches the batter faces. As an example, we implement our approach using data collected with the PITCHf/x system over the 2012–2014 seasons to identify the improvement Starlin Castro would expect to see in offensive performance were he able to adopt Andrew McCutchen’s plate discipline. We estimate that had Castro adopted McCutchen’s discipline his batting average, on-base percentage, and slugging percentage would have increased 0.017 (se = 0.004), 0.040 (se = 0.006), and 0.028 (se = 0.008), respectively.
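
The G-computation idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' baseball model: the data are simulated, the outcome model is a plain linear regression, and every name here (`g_computation_ate`, `x`, `a`, `y`) is hypothetical. The sketch fits an outcome model, predicts each unit's outcome under both treatment values while holding the covariate at its observed value, and averages the difference.

```python
import numpy as np

def g_computation_ate(y, a, x):
    """G-computation (standardization) estimate of E[Y(1)] - E[Y(0)].

    Assumes a linear outcome model with a treatment-covariate interaction;
    this modeling choice is an illustration, not the paper's specification.
    """
    n = len(y)
    # Outcome model: regress Y on intercept, A, X, and A*X.
    design = np.column_stack([np.ones(n), a, x, a * x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Predict every unit's outcome with A set to 1, then to 0, keeping X as observed.
    d1 = np.column_stack([np.ones(n), np.ones(n), x, x])
    d0 = np.column_stack([np.ones(n), np.zeros(n), x, np.zeros(n)])
    return float(np.mean(d1 @ beta) - np.mean(d0 @ beta))

# Simulated data (hypothetical): X confounds both treatment A and outcome Y.
rng = np.random.default_rng(0)
n = 20000
x = rng.binomial(1, 0.5, size=n).astype(float)
a = rng.binomial(1, 0.2 + 0.6 * x).astype(float)   # treatment is more likely when X = 1
y = 1.0 + 2.0 * a + 3.0 * x + rng.normal(0.0, 1.0, size=n)  # true effect of A is 2

naive = float(y[a == 1].mean() - y[a == 0].mean())  # confounded comparison
ate = g_computation_ate(y, a, x)
print(f"naive difference: {naive:.2f}, G-computation estimate: {ate:.2f}")
```

In this simulation the naive difference in means overstates the effect because treated units are more likely to have X = 1, while the standardized estimate should land close to the true value of 2.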


2020
Vol 58 (4), pp. 1129-1179
Author(s): Guido W. Imbens

In this essay I discuss potential outcome and graphical approaches to causality, and their relevance for empirical work in economics. I review some of the work on directed acyclic graphs, including the recent The Book of Why (Pearl and Mackenzie 2018). I also discuss the potential outcome framework developed by Rubin and coauthors (e.g., Rubin 2006), building on work by Neyman (1990 [1923]). I then discuss the relative merits of these approaches for empirical work in economics, focusing on the questions each framework answers well, and why much of the work in economics is closer in spirit to the potential outcome perspective. (JEL C31, C36, I26)


2007
Vol 37 (1), pp. 393-434
Author(s): Jennie E. Brand, Yu Xie

We develop an approach to identifying and estimating causal effects in longitudinal settings with time-varying treatments and time-varying outcomes. The classic potential outcome approach to causal inference generally involves two time periods: units of analysis are exposed to one of two possible values of the causal variable, treatment or control, at a given point in time, and values for an outcome are assessed some time subsequent to exposure. In this paper, we develop a potential outcome approach for longitudinal situations in which both exposure to treatment and the effects of treatment are time-varying. In this longitudinal setting, the research interest centers not on only two potential outcomes, but on a whole matrix of potential outcomes, requiring a complicated conceptualization of many potential counterfactuals. Motivated by sociological applications, we develop a simplification scheme—a weighted composite causal effect that allows identification and estimation of effects with a number of possible solutions. Our approach is illustrated via an analysis of the effects of disability on subsequent employment status using panel data from the Wisconsin Longitudinal Study.


2020
Vol 34 (06), pp. 10210-10217
Author(s): Sanghack Lee, Juan Correa, Elias Bareinboim

The process of transporting and synthesizing experimental findings from heterogeneous data collections to construct causal explanations is arguably one of the most central and challenging problems in modern data science. This problem has been studied in the causal inference literature under the rubric of causal effect identifiability and transportability (Bareinboim and Pearl 2016). In this paper, we investigate a general version of this challenge where the goal is to learn conditional causal effects from an arbitrary combination of datasets collected under different conditions, observational or experimental, and from heterogeneous populations. Specifically, we introduce a unified graphical criterion that characterizes the conditions under which conditional causal effects can be uniquely determined from the disparate data collections. We further develop an efficient, sound, and complete algorithm that outputs an expression for the conditional effect whenever it exists, which synthesizes the available causal knowledge and empirical evidence; if the algorithm is unable to find a formula, then such synthesis is provably impossible, unless further parametric assumptions are made. Finally, we prove that do-calculus (Pearl 1995) is complete for this task, i.e., the inexistence of a do-calculus derivation implies the impossibility of constructing the targeted causal explanation.


2021
Author(s): Young Keun Lee, Jisoo Kim, Sung Wook Seo

Background: The recent explosion of cancer genomics provides extensive information about mutations and gene expression changes in cancer. However, most of the identified gene mutations are not clinically utilized. It remains uncertain whether the presence of a certain genetic alteration will affect treatment response. Conventional statistics have limitations for causal inference and struggle to achieve sufficient power in genomic datasets. Here, we developed and evaluated an algorithm for searching for the causal genes that maximize the effect of the treatment. Methods: The algorithm was developed based on the potential outcome framework and Bayesian posterior updating. The precision of the algorithm was validated using a simulation dataset. The algorithm was applied to a cBioPortal dataset. The genes discovered by the algorithm were externally validated with CancerSCAN screening data from Samsung Medical Center. Results: Simulation data analysis showed that the C-search algorithm was able to identify nine causal genes out of ten. The discovery rate of the C-search algorithm increased rapidly up to 1500 data points, whereas the log-rank test showed a slower increase in performance. The C-search algorithm was able to suggest nine causal genes from the cBioPortal Metabric dataset. Treating the patients with the causal genes was associated with better survival outcomes in both the cBioPortal dataset and the CancerSCAN dataset, which was used for external validation. Conclusions: Our C-search algorithm identified causal effects of genes better than multiple log-rank test analyses, especially with a limited amount of data. The results suggest that C-search can discover causal genes from various genetic datasets in which the number of samples is limited compared to the number of variables.


Author(s): Damien Bol

This chapter discusses experiments. For decades, social scientists were convinced that experimentation was not for them. Consequently, the use of comparative analysis was recommended as a substitute. Yet, since 1990, experiments have become increasingly popular in the social sciences. Experiments have two important advantages compared to observational methods. First, they allow the researcher to clearly identify the causal variable X and the outcome Y. Second, with observational methods the precision of the estimates depends on the extent to which the researcher manages to control for the differences between the cases. When the researcher cannot entirely capture these differences, the estimates are likely to be inflated, underestimated, or simply wrong. The chapter then considers the ‘Neyman-Rubin potential-outcome framework’ and looks at the two broad types of experiments: experiments in the field (including survey experiments), and in the lab. It also addresses ethical experiments.


2021
pp. 183-192
Author(s): Katherine J. Hoggatt, Tyler J. VanderWeele, Sander Greenland

This chapter provides an introduction to causal inference theory for public health research. Causal inference can be viewed as a prediction problem, addressing the question of what the likely outcome will be under one action vs. an alternative action. To answer this question usefully requires clarity and precision in both the statement of the causal hypothesis and the techniques used to attempt an answer. This chapter reviews considerations that have been invoked in discussions of causality based on epidemiologic evidence. It then describes the potential-outcome (counterfactual) framework for cause and effect, which shows how measures of effect and association can be distinguished. The potential-outcome framework illustrates problems inherent in attempts to quantify the changes in health expected under different actions or interventions. The chapter concludes with a discussion of how research findings may be translated into policy.
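
The distinction between measures of effect and measures of association can be written out in standard potential-outcome notation. The symbols below are assumptions introduced for illustration and are not taken from the chapter: $A$ is a binary action, $Y(a)$ is the outcome that would occur under action $a$, and $Y$ is the observed outcome, with $Y = Y(A)$ (consistency).

```latex
% Measure of effect: contrast of potential outcomes in one population
\text{effect} = E[Y(1)] - E[Y(0)]

% Measure of association: contrast of observed outcomes across two groups
\text{association} = E[Y \mid A = 1] - E[Y \mid A = 0]

% Under consistency, the association splits into an effect among the
% treated plus a bias term comparing the two groups under no action
E[Y \mid A = 1] - E[Y \mid A = 0]
  = \underbrace{E[Y(1) - Y(0) \mid A = 1]}_{\text{effect among the treated}}
  + \underbrace{E[Y(0) \mid A = 1] - E[Y(0) \mid A = 0]}_{\text{confounding bias}}
```

When the two groups would have had the same average outcome under no action, as randomization ensures in expectation, the bias term is zero and the association equals the effect among the treated.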

