Causal inference via string diagram surgery

Author(s):  
Bart Jacobs ◽  
Aleks Kissinger ◽  
Fabio Zanasi

Abstract Extracting causal relationships from observed correlations is a growing area in probabilistic reasoning, originating with the seminal work of Pearl and others from the early 1990s. This paper develops a new, categorically oriented view based on a clear distinction between syntax (string diagrams) and semantics (stochastic matrices), connected via interpretations as structure-preserving functors. A key notion in the identification of causal effects is that of an intervention, whereby a variable is forcefully set to a particular value independent of any prior propensities. We represent the effect of such an intervention as an endo-functor which performs ‘string diagram surgery’ within the syntactic category of string diagrams. This diagram surgery in turn yields a new, interventional distribution via the interpretation functor. While in general there is no way to compute interventional distributions purely from observed data, we show that this is possible in certain special cases using a calculational tool called comb disintegration. We demonstrate the use of this technique on two well-known toy examples. In the first, we predict the causal effect of smoking on cancer in the presence of a confounding common cause, and we show that comb disintegration provides simple sufficient conditions for computing interventions that apply to a wide variety of situations considered in the causal inference literature. The second illustrates counterfactual reasoning, where the same interventional techniques are applied in a ‘twinned’ set-up, with two versions of the world – one factual and one counterfactual – joined together via exogenous variables that capture the uncertainties at hand.
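For readers less familiar with the categorical machinery, the intervention itself has a concrete reading on finite discrete models: delete the mechanism that generates the target variable and replace it with a point distribution. The fragment below is only a rough, non-categorical sketch of that idea (none of the string-diagram or comb-disintegration apparatus is reproduced), contrasting the observational conditional P(C | S) with the interventional P(C | do(S)) in a toy smoking/cancer model with a hidden common cause; all probabilities and variable names are invented for illustration.

```python
# A rough, non-categorical sketch of an intervention on a discrete causal model
# with a hidden confounder H -> S (smoking) and H -> C (cancer), plus S -> C.
# All probabilities below are invented for illustration.
P_H = {0: 0.7, 1: 0.3}                        # P(H = h)
P_S_given_H = {0: {0: 0.8, 1: 0.2},           # P(S = s | H = h), keyed as [h][s]
               1: {0: 0.3, 1: 0.7}}
P_C1_given_SH = {(0, 0): 0.1, (0, 1): 0.3,    # P(C = 1 | S = s, H = h), keyed as (s, h)
                 (1, 0): 0.4, (1, 1): 0.6}

def observe(s):
    """P(C = 1 | S = s): conditioning keeps H correlated with S."""
    num = sum(P_H[h] * P_S_given_H[h][s] * P_C1_given_SH[(s, h)] for h in P_H)
    den = sum(P_H[h] * P_S_given_H[h][s] for h in P_H)
    return num / den

def intervene(s):
    """P(C = 1 | do(S = s)): the mechanism for S is cut, so H is averaged over P(H)."""
    return sum(P_H[h] * P_C1_given_SH[(s, h)] for h in P_H)

for s in (0, 1):
    print(f"S={s}: P(C=1 | S={s}) = {observe(s):.3f}   P(C=1 | do(S={s})) = {intervene(s):.3f}")
```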

2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Lola Étiévant ◽  
Vivian Viallon

Abstract Many causal models of interest in epidemiology involve longitudinal exposures, confounders and mediators. However, repeated measurements are not always available or used in practice, leading analysts to overlook the time-varying nature of exposures and work under over-simplified causal models. Our objective is to assess whether – and how – causal effects identified under such misspecified causal models relate to true causal effects of interest. We derive sufficient conditions ensuring that the quantities estimated in practice under over-simplified causal models can be expressed as weighted averages of longitudinal causal effects of interest. Unsurprisingly, these sufficient conditions are very restrictive, and our results indicate that the quantities estimated in practice should generally be interpreted with caution, as they usually do not relate to any longitudinal causal effect of interest. Our simulations further illustrate that the bias between the quantities estimated in practice and the weighted averages of longitudinal causal effects of interest can be substantial. Overall, our results confirm the need for repeated measurements to conduct proper analyses, and/or for the development of sensitivity analyses when such measurements are not available.
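As a rough illustration of the kind of discrepancy described here (not the authors' own simulation design), the sketch below generates a two-period exposure, analyses only the final measurement, and shows that the resulting coefficient matches neither the effect of the later exposure nor the total effect of the earlier one; all parameters are invented.

```python
# A hypothetical sketch (not the authors' simulation) of how ignoring the
# time-varying nature of an exposure can yield an estimate that matches no
# longitudinal causal effect of interest.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
b0, b1 = 1.0, 0.2                                 # true effects of A0 and A1 on Y

A0 = rng.normal(size=n)                           # early exposure
A1 = 0.8 * A0 + rng.normal(scale=0.6, size=n)     # later exposure, driven by A0
Y = b0 * A0 + b1 * A1 + rng.normal(size=n)

# 'Over-simplified' analysis: only the final exposure measurement is used.
naive = np.polyfit(A1, Y, 1)[0]

print(f"naive cross-sectional slope : {naive:.2f}")
print(f"effect of A1 alone (b1)     : {b1:.2f}")
print(f"total effect of A0          : {b0 + 0.8 * b1:.2f}")
```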


2019 ◽  
Vol 188 (9) ◽  
pp. 1682-1685 ◽  
Author(s):  
Hailey R Banack

Abstract Authors aiming to estimate causal effects from observational data frequently discuss 3 fundamental identifiability assumptions for causal inference: exchangeability, consistency, and positivity. However, too often, studies fail to acknowledge the importance of measurement bias in causal inference. In the presence of measurement bias, the aforementioned identifiability conditions are not sufficient to estimate a causal effect. The most fundamental requirement for estimating a causal effect is knowing who is truly exposed and unexposed. In this issue of the Journal, Caniglia et al. (Am J Epidemiol. 2019;000(00):000–000) present a thorough discussion of methodological challenges when estimating causal effects in the context of research on distance to obstetrical care. Their article highlights empirical strategies for examining nonexchangeability due to unmeasured confounding and selection bias and potential violations of the consistency assumption. In addition to the important considerations outlined by Caniglia et al., authors interested in estimating causal effects from observational data should also consider implementing quantitative strategies to examine the impact of misclassification. The objective of this commentary is to emphasize that you can’t drive a car with only three wheels, and you also cannot estimate a causal effect in the presence of exposure misclassification bias.
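One simple quantitative strategy of the kind suggested here is matrix-style bias analysis for exposure misclassification: back-correct the observed 2×2 table with assumed sensitivity and specificity and recompute the effect estimate. The sketch below is only illustrative; the counts and classification parameters are invented.

```python
# A minimal sketch of quantitative bias analysis for non-differential exposure
# misclassification: back-correct an observed 2x2 table using assumed
# sensitivity (Se) and specificity (Sp). Counts and parameters are invented.
def corrected_counts(exposed, unexposed, se, sp):
    """Return (true exposed, true unexposed) for one row of the 2x2 table."""
    total = exposed + unexposed
    true_exposed = (exposed - (1 - sp) * total) / (se + sp - 1)
    return true_exposed, total - true_exposed

# Observed counts: cases (a, b) and non-cases (c, d); assumed Se/Sp of exposure
a, b = 200, 800        # cases: classified exposed / unexposed
c, d = 300, 1700       # non-cases: classified exposed / unexposed
se, sp = 0.85, 0.95

A, B = corrected_counts(a, b, se, sp)
C, D = corrected_counts(c, d, se, sp)

or_observed = (a * d) / (b * c)
or_corrected = (A * D) / (B * C)
print(f"observed OR  : {or_observed:.2f}")
print(f"corrected OR : {or_corrected:.2f}")
```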


2019 ◽  
Vol 24 (3) ◽  
pp. 109-112 ◽  
Author(s):  
Steven D Stovitz ◽  
Ian Shrier

Evidence-based medicine (EBM) calls on clinicians to incorporate the ‘best available evidence’ into clinical decision-making. For decisions regarding treatment, the best evidence is that which determines the causal effect of treatments on the clinical outcomes of interest. Unfortunately, research often provides evidence where associations are not due to cause-and-effect, but rather due to non-causal reasons. These non-causal associations may provide valid evidence for diagnosis or prognosis, but biased evidence for treatment effects. Causal inference aims to determine when we can infer that associations are or are not due to causal effects. Since recommending treatments that do not have beneficial causal effects will not improve health, causal inference can advance the practice of EBM. The purpose of this article is to familiarise clinicians with some of the concepts and terminology that are being used in the field of causal inference, including graphical diagrams known as ‘causal directed acyclic graphs’. In order to demonstrate some of the links between causal inference methods and clinical treatment decision-making, we use a clinical vignette of assessing treatments to lower cardiovascular risk. As the field of causal inference advances, clinicians familiar with the methods and terminology will be able to improve their adherence to the principles of EBM by distinguishing causal effects of treatment from results due to non-causal associations that may be a source of bias.
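As a toy numerical companion to the vignette (not taken from the article), the sketch below shows confounding by indication: a drug that truly lowers the outcome looks harmful in the crude comparison because it is preferentially prescribed to high-risk patients, and the beneficial causal effect only reappears after adjusting for the confounder; all numbers are invented.

```python
# A toy illustration (not from the article) of a non-causal association
# misleading a treatment decision: a risk-lowering drug is prescribed mainly
# to high-risk patients, so the crude comparison suggests harm, while
# adjusting for baseline risk recovers the true beneficial effect.
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
risk = rng.normal(size=n)                                # baseline cardiovascular risk
treated = rng.random(n) < 1 / (1 + np.exp(-2 * risk))    # sicker patients get the drug
outcome = risk - 0.5 * treated + rng.normal(size=n)      # drug truly lowers outcome by 0.5

crude = outcome[treated].mean() - outcome[~treated].mean()
design = np.column_stack([np.ones(n), treated, risk])
adjusted = np.linalg.lstsq(design, outcome, rcond=None)[0][1]

print(f"crude difference   : {crude:+.2f}   (looks harmful)")
print(f"adjusted difference: {adjusted:+.2f}   (true effect is -0.5)")
```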


2021 ◽  
Author(s):  
Kentaro Fukumoto ◽  
Charles T. McClean ◽  
Kuninori Nakagawa

Abstract As COVID-19 spread in 2020, most countries shut down schools in the hopes of slowing the pandemic. Yet, studies have not reached a consensus about the effectiveness of these policies partly because they lack rigorous causal inference. Our study aims to estimate the causal effects of school closures on the number of confirmed cases. To do so, we apply matching methods to municipal-level data in Japan. We do not find that school closures caused a reduction in the spread of the coronavirus. Our results suggest that policies on school closures should be reexamined given the potential negative consequences for children and parents.
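The sketch below gives a rough flavour of a matching analysis of this kind, not the authors' exact specification: each municipality that closed schools is paired with the nearest control on pre-treatment covariates, and subsequent case counts are compared; the data and covariates are synthetic.

```python
# A rough, hypothetical sketch of covariate matching (not the authors' exact
# specification): each 'closed schools' municipality is matched to its nearest
# control on pre-treatment covariates, and case counts are then compared.
import numpy as np

rng = np.random.default_rng(1)
n = 500
covariates = rng.normal(size=(n, 3))          # e.g. population, density, prior cases
closed = rng.integers(0, 2, size=n).astype(bool)
cases = 10 + covariates @ np.array([2.0, 1.5, 3.0]) + rng.normal(size=n)  # no true effect

treated, controls = covariates[closed], covariates[~closed]
y_treated, y_controls = cases[closed], cases[~closed]

# 1-nearest-neighbour matching on Euclidean distance (with replacement)
dists = ((treated[:, None, :] - controls[None, :, :]) ** 2).sum(axis=2)
match = dists.argmin(axis=1)

att = (y_treated - y_controls[match]).mean()
print(f"matched estimate of the effect of closure: {att:.2f}")
```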


Author(s):  
Negar Hassanpour

To identify the appropriate action to take, an intelligent agent must infer the causal effects of every possible action choice. A prominent example is precision medicine, which attempts to identify which medical procedure will benefit each individual patient the most. This requires answering counterfactual questions such as: "Would this patient have lived longer, had she received an alternative treatment?" In my PhD, I attempt to explore ways to address the challenges associated with causal effect estimation, with a focus on devising methods that enhance performance according to individual-based measures (as opposed to population-based measures).


2021 ◽  
Author(s):  
Jonathan Sulc ◽  
Jenny Sjaarda ◽  
Zoltan Kutalik

Causal inference is a critical step in improving our understanding of biological processes and Mendelian randomisation (MR) has emerged as one of the foremost methods to efficiently interrogate diverse hypotheses using large-scale, observational data from biobanks. Although many extensions have been developed to address the three core assumptions of MR-based causal inference (relevance, exclusion restriction, and exchangeability), most approaches implicitly assume that any putative causal effect is linear. Here we propose PolyMR, an MR-based method which provides a polynomial approximation of an (arbitrary) causal function between an exposure and an outcome. We show that this method provides accurate inference of the shape and magnitude of causal functions with greater accuracy than existing methods. We applied this method to data from the UK Biobank, testing for effects between anthropometric traits and continuous health-related phenotypes and found most of these (84%) to have causal effects which deviate significantly from linear. These deviations ranged from slight attenuation at the extremes of the exposure distribution, to large changes in the magnitude of the effect across the range of the exposure (e.g. a 1 kg/m2 change in BMI having stronger effects on glucose levels if the initial BMI was higher), to non-monotonic causal relationships (e.g. the effects of BMI on cholesterol forming an inverted U shape). Finally, we show that the linearity assumption of the causal effect may lead to the misinterpretation of health risks at the individual level or heterogeneous effect estimates when using cohorts with differing average exposure levels.
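PolyMR's exact estimator is not reproduced here; as a rough illustration of estimating a non-linear exposure-outcome causal function with a genetic instrument, the sketch below uses a control-function-style two-stage polynomial regression on simulated data, which recovers the quadratic causal term despite confounding under the stated additive assumptions.

```python
# A rough illustration (not PolyMR's exact estimator) of recovering a
# non-linear causal function with a genetic instrument via a control-function
# style two-stage polynomial regression; all data are simulated.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
G = rng.normal(size=n)                     # genetic score (instrument)
U = rng.normal(size=n)                     # unobserved confounder
X = 0.5 * G + U + rng.normal(size=n)       # exposure (e.g. BMI)
Y = 0.3 * X**2 + U + rng.normal(size=n)    # outcome with a non-linear causal effect

# First stage: residualise the exposure on the instrument.
v_hat = X - np.polyval(np.polyfit(G, X, 1), G)

# Second stage: outcome on a polynomial of the exposure plus the control function.
design = np.column_stack([np.ones(n), X, X**2, v_hat])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(f"estimated causal function: {coef[1]:+.2f}*X {coef[2]:+.2f}*X^2")
```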


2021 ◽  
Author(s):  
Eric W. Bridgeford ◽  
Michael Powell ◽  
Gregory Kiar ◽  
Ross Lawrence ◽  
Brian Caffo ◽  
...  

Abstract Batch effects, undesirable sources of variance across multiple experiments, present a substantial hurdle for scientific and clinical discoveries. Specifically, the presence of batch effects can create both spurious discoveries and hide veridical signals, contributing to the ongoing reproducibility crisis. Typical approaches to dealing with batch effects conceptualize ‘batches’ as an associational effect, rather than a causal effect, despite the fact that the sources of variance that comprise the batch – potentially including experimental design and population demographics – causally impact downstream inferences. We therefore cast batch effects as a causal problem rather than an associational problem. This reformulation enables us to make explicit the assumptions and limitations of existing approaches for dealing with batch effects. We then develop causal batch effect strategies – CausalDcorr for discovering batch effects and CausalComBat for mitigating them – which build upon existing statistical associational methods by incorporating modern causal inference techniques. We apply these strategies to a large mega-study of human connectomes assembled by the Consortium for Reliability and Reproducibility, consisting of 24 batches and over 1700 individuals, to illustrate that existing approaches create more spurious discoveries (false positives) and miss more veridical signals (true positives) than our proposed approaches. Our work therefore introduces a conceptual framing, as well as open source code, for combining multiple distinct datasets to increase confidence in claims of scientific and clinical discoveries.
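CausalDcorr and CausalComBat themselves are not reproduced here; as a rough sketch of the underlying idea (treat 'batch' as an exposure and balance the demographic covariates that confound it before comparing or harmonising measurements), the fragment below matches subjects across two synthetic batches on age before contrasting their measurements.

```python
# A rough sketch of the idea behind a causal treatment of batch effects (not
# the CausalDcorr/CausalComBat implementations): treat 'batch' as an exposure,
# balance the covariates that confound it by matching subjects across batches,
# and only then compare measurements. All data are synthetic.
import numpy as np

rng = np.random.default_rng(3)
n = 2_000
age = rng.uniform(18, 80, size=n)
batch = (age + rng.normal(scale=10, size=n)) > 50          # recruitment differs by age
signal = 0.05 * age + rng.normal(scale=0.5, size=n)        # age effect, no true batch effect

# Naive comparison confounds batch with age.
naive_gap = signal[batch].mean() - signal[~batch].mean()

# Match each batch-1 subject to the closest batch-0 subject on age.
idx = np.abs(age[batch][:, None] - age[~batch][None, :]).argmin(axis=1)
matched_gap = (signal[batch] - signal[~batch][idx]).mean()

print(f"naive batch difference   : {naive_gap:.3f}")
print(f"matched batch difference : {matched_gap:.3f}")
```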


2019 ◽  
Vol 109 (9) ◽  
pp. 3307-3338 ◽  
Author(s):  
Simon Freyaldenhoven ◽  
Christian Hansen ◽  
Jesse M. Shapiro

We consider a linear panel event-study design in which unobserved confounds may be related both to the outcome and to the policy variable of interest. We provide sufficient conditions to identify the causal effect of the policy by exploiting covariates related to the policy only through the confounds. Our model implies a set of moment equations that are linear in parameters. The effect of the policy can be estimated by 2SLS, and causal inference is valid even when endogeneity leads to pre-event trends (“pre-trends”) in the outcome. Alternative approaches perform poorly in our simulations. (JEL C23, C26)
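The following is only a generic two-stage least squares sketch on simulated data, not the paper's specific moment conditions or event-study design: the policy variable is endogenous because of an unobserved confound, and an excluded instrument is used to recover the causal effect.

```python
# A generic two-stage least squares sketch (not the paper's specific moment
# conditions): the policy x is endogenous because of an unobserved confound,
# and z is an excluded instrument for it. All data are simulated.
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
u = rng.normal(size=n)                      # unobserved confound
z = rng.normal(size=n)                      # instrument: moves x, excluded from y
x = 0.7 * z + u + rng.normal(size=n)        # policy of interest
y = 2.0 * x + 3.0 * u + rng.normal(size=n)  # outcome; true policy effect = 2

ols = np.polyfit(x, y, 1)[0]                # biased by the confound

# 2SLS: first stage x on z, second stage y on the fitted x.
x_hat = np.polyval(np.polyfit(z, x, 1), z)
tsls = np.polyfit(x_hat, y, 1)[0]

print(f"OLS  estimate: {ols:.2f}")
print(f"2SLS estimate: {tsls:.2f}")
```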


2016 ◽  
Vol 4 (2) ◽  
Author(s):  
Peter M. Aronow

Abstract Recent approaches in causal inference have proposed estimating average causal effects that are local to some subpopulation, often for reasons of efficiency. These inferential targets are sometimes data-adaptive, in that they are dependent on the empirical distribution of the data. In this short note, we show that if researchers are willing to adapt the inferential target on the basis of efficiency, then extraordinary gains in precision can potentially be obtained. Specifically, when causal effects are heterogeneous, any asymptotically normal and root-$n$ consistent estimator of the population average causal effect is superefficient for a data-adaptive local average causal effect.
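A quick numerical illustration of this point (not the paper's derivation): under complete randomisation with heterogeneous effects, the difference-in-means estimator concentrates more tightly around the sample average treatment effect, a data-adaptive target, than around the population average treatment effect; all parameters below are invented.

```python
# A quick numerical illustration (not the paper's derivation): with
# heterogeneous effects, the difference-in-means estimator is closer to the
# sample average treatment effect (a data-adaptive target) than to the
# population average treatment effect.
import numpy as np

rng = np.random.default_rng(5)
n, reps = 200, 20_000
pop_ate = 1.0                                   # population average effect

err_sate, err_pate = [], []
for _ in range(reps):
    tau = rng.normal(pop_ate, 2.0, size=n)      # heterogeneous unit-level effects
    y0 = rng.normal(size=n)
    y1 = y0 + tau
    treat = rng.permutation(np.arange(n)) < n // 2
    est = y1[treat].mean() - y0[~treat].mean()
    err_sate.append((est - tau.mean()) ** 2)    # error w.r.t. sample ATE
    err_pate.append((est - pop_ate) ** 2)       # error w.r.t. population ATE

print(f"MSE around sample ATE    : {np.mean(err_sate):.4f}")
print(f"MSE around population ATE: {np.mean(err_pate):.4f}")
```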


