Why Propensity Scores Should Not Be Used for Matching

2019 ◽  
Vol 27 (4) ◽  
pp. 435-454 ◽  
Author(s):  
Gary King ◽  
Richard Nielsen

We show that propensity score matching (PSM), an enormously popular method of preprocessing data for causal inference, often accomplishes the opposite of its intended goal—thus increasing imbalance, inefficiency, model dependence, and bias. The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods. Moreover, in data balanced enough to approximate complete randomization, either to begin with or after pruning some observations, PSM approximates random matching which, we show, increases imbalance even relative to the original data. Although these results suggest researchers replace PSM with one of the other available matching methods, propensity scores have other productive uses.
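For readers unfamiliar with the procedure under critique, the following is a minimal sketch of greedy 1:1 nearest-neighbor matching on a propensity score. It is not code from the article. It assumes the scores have already been estimated, and the function name `match_on_propensity`, the caliper value, and the unit ids and scores are all hypothetical. Note that the matcher sees only the one-dimensional score, never the covariates behind it.

```python
def match_on_propensity(treated, controls, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping unit id -> estimated propensity score
    (assumed to be given; estimation is outside this sketch).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # controls not yet used in a pair
    pairs = []
    # Visit treated units from highest to lowest score (one common heuristic).
    for t_id, t_score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        # Closest remaining control on the score scale only.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
        # Otherwise the treated unit is pruned (left unmatched).
    return pairs


# Hypothetical scores for illustration only.
treated = {"t1": 0.80, "t2": 0.55, "t3": 0.30}
controls = {"c1": 0.78, "c2": 0.50, "c3": 0.52, "c4": 0.05}
pairs = match_on_propensity(treated, controls)
print(pairs)
```

In this toy input, `t3` has no control within the caliper and is pruned. Two units with very different covariates can still share a score and be paired, which is the kind of residual imbalance the abstract describes.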

2008 ◽  
Vol 24 (3) ◽  
pp. 165-173 ◽  
Author(s):  
Niko Kohls ◽  
Harald Walach

Validation studies of standard scales in the particular sample that one is studying are essential for accurate conclusions. We investigated the differences in answering patterns of the Brief-Symptom-Inventory (BSI), Transpersonal Trust Scale (TPV), Sense of Coherence Questionnaire (SOC), and a Social Support Scale (F-SoZu) for a matched sample of spiritually practicing (SP) and nonpracticing (NSP) individuals at two measurement points (t1, t2). Applying a sample matching procedure based on propensity scores, we selected two sociodemographically balanced subsamples of N = 120 out of a total sample of N = 431. Employing repeated measures ANOVAs, we found an intersample difference in means only for TPV and an intrasample difference for F-SoZu. Additionally, a group × time interaction effect was found for TPV. While Cronbach’s α was acceptable and comparable for both samples, a significantly lower test-retest reliability for the BSI was found in the SP sample (rSP = .62; rNSP = .78). Thus, when researching the effects of spiritual practice, one should not only look at differences in means but also consider time stability. We recommend propensity score matching as an alternative to randomization for variables that defy experimental manipulation, such as spirituality.


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 408-408
Author(s):  
Si Young Song ◽  
Hey Jung Jun ◽  
Sun Ah Lee

The purpose of this study is to explore the effect of employment on depression and life satisfaction among older adults. Using the 12th (2017) and 13th (2018) waves of the Korean Welfare Panel Study (KoWePS), we conducted three stages of analysis. First, using propensity score matching (PSM), we matched units with similar propensity scores between the group that did not work in the 12th wave but worked in the 13th wave (experimental group, N=180) and the group that did not work in either wave (comparison group, N=180). Second, the matched sample was used for multiple regression analysis with the group dummy variable (experimental group vs. comparison group) as the independent variable and depression and life satisfaction as the dependent variables. Third, a combined model of PSM and the double difference (DD) method was used to more appropriately derive the net effect of employment. The multiple regression results after propensity matching showed that employment reduced depression (B = -1.70, p < .01) and increased life satisfaction (B = .12, p < .01) among older adults. Furthermore, in the combined PSM and DD model, life satisfaction improved among the employed compared with the non-employed (B = .15, p < .05). These results are meaningful in that the effect of employment in old age is derived more clearly by addressing selection bias and endogeneity problems. This study may also provide a reference for establishing employment-related welfare policies for older adults.
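The double difference (DD) step can be illustrated with a minimal sketch, assuming a matched sample with an outcome measured in both waves. This is not the study's model, which is regression-based. The function name `diff_in_diff` and all numbers below are hypothetical.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Double difference: change in the treated group minus change in the
    comparison group, each measured as post-wave mean minus pre-wave mean."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical life-satisfaction scores for a matched sample (not study data).
treated_pre = [3.0, 3.2, 2.8]
treated_post = [3.5, 3.6, 3.4]
control_pre = [3.1, 2.9, 3.0]
control_post = [3.2, 3.0, 3.1]

dd = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
print(round(dd, 3))
```

Subtracting the comparison group's change removes time trends common to both groups, which is why DD is often paired with matching when selection into treatment is a concern.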


2020 ◽  
Author(s):  
Youmi Suk ◽  
Hyunseung Kang

Recently, there has been growing interest in using machine learning (ML) methods for causal inference due to their automatic and flexible abilities to model the propensity score and the outcome model. However, almost all the ML methods for causal inference have been studied under the assumption of no unmeasured confounding and there is little work on handling omitted/unmeasured variable bias. This paper focuses on an ML method based on random forests known as Causal Forests and presents five simple modifications for tuning Causal Forests so that they are robust to cluster-level unmeasured confounding. Our simulation study finds that adjusting the algorithm with the propensity score from fixed effects logistic regression and using demeaned variables make the estimates more robust to cluster-level unmeasured confounding. In particular, using demeaned variables is useful when we are not sure of the functional form of the propensity scores. We conclude by demonstrating our proposals in a real data study concerning the effect of taking an eighth-grade algebra course on math achievement scores from the Early Childhood Longitudinal Study.
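The idea of demeaned variables can be sketched as follows, assuming "demeaned" means subtracting each cluster's mean from its members' values. This shows only the preprocessing step, not the authors' Causal Forests modifications. The function name `demean_by_cluster` and the data are hypothetical.

```python
from collections import defaultdict


def demean_by_cluster(values, clusters):
    """Subtract each cluster's mean from the values belonging to that cluster.

    values: list of floats; clusters: list of cluster labels, same length.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, c in zip(values, clusters):
        sums[c] += v
        counts[c] += 1
    return [v - sums[c] / counts[c] for v, c in zip(values, clusters)]


# Hypothetical covariate for four students in two schools.
values = [1.0, 3.0, 10.0, 14.0]
clusters = ["a", "a", "b", "b"]
demeaned = demean_by_cluster(values, clusters)
print(demeaned)  # [-1.0, 1.0, -2.0, 2.0]
```

After demeaning, any quantity that is constant within a cluster contributes nothing to the transformed variable, which is the intuition behind using it against cluster-level confounding.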


2019 ◽  
Vol 47 (11) ◽  
pp. 5601-5612
Author(s):  
Jian-Bo Zhou ◽  
Jing Yuan ◽  
Xing-Yao Tang ◽  
Wei Zhao ◽  
Fu-Qiang Luo ◽  
...  

Objective: To our knowledge, the independent association between central obesity, defined by waist circumference (WC) or waist-to-hip ratio (WHR), and diabetic retinopathy (DR) remains unknown in Chinese individuals. Method: The study was conducted in two stages. First, the relationship between WC or WHR and DR was estimated in a case-control set (DR vs. non-DR) for the whole population before and after propensity score matching. Subsequently, a systematic review and meta-analysis was performed on evidence from the literature to validate the relationship. Results: Of 511 eligible patients, DR (N = 156) and non-DR (N = 156) patients with similar propensity scores were included in the propensity score matching analyses. Central obesity (defined by WC) was associated with risk of DR (odds ratio [OR] 1.07, 95% confidence interval [CI] 1.03–1.10). The meta-analysis showed that central obesity significantly increased the risk of DR by 12% (OR 1.12, 95% CI 1.02–1.22). Analysis of data from 18 studies showed a significant association between continuous body mass index and risk of proliferative DR (OR 0.95, 95% CI 0.93–0.98; I² = 50%). Conclusion: Central obesity, particularly as defined by WC, is associated with the risk of DR in the Chinese population.


2020 ◽  
Vol 29 (3) ◽  
pp. 644-658 ◽  
Author(s):  
Anais Andrillon ◽  
Romain Pirracchio ◽  
Sylvie Chevret

Propensity score (PS) matching is a very popular causal estimator usually used to estimate the average treatment effect on the treated (ATT) from observational data. However, opting for this estimator may raise some efficiency issues when the sample size is limited. Therefore, we aimed to evaluate the performance of propensity score matching in this context. We started with a motivating example based on a cohort of 66 children with sickle cell anemia who received either allogeneic bone-marrow transplant or chronic transfusion. We found substantial differences in the ATT estimate according to the model selected for propensity score estimation and subsequent matching. Then, we assessed the performance of the different propensity score matching methods and post-matching analyses to estimate the ATT using a simulation study. Although all selected propensity score matching methods were based on previous recommendations, we found important discrepancies in the estimation of treatment effect between them, underlining the importance of thorough sensitivity analyses when using propensity score matching in the context of small sample sizes.


2013 ◽  
Vol 3 (2) ◽  
pp. 1 ◽  
Author(s):  
William R. Shadish ◽  
Peter M. Steiner ◽  
Thomas D. Cook

Peikes, Moreno and Orzol (2008) sensibly caution researchers that propensity score analysis may not lead to valid causal inference in field applications. But at the same time, they made the far stronger claim to have performed an ideal test of whether propensity score matching in quasi-experimental data is capable of approximating the results of a randomized experiment in their dataset, and that this ideal test showed that such matching could not do so. In this article we show that their study does not support that conclusion because it failed to meet a number of basic criteria for an ideal test. By implication, many other purported tests of the effectiveness of propensity score analysis probably also fail to meet these criteria, and are therefore questionable contributions to the literature on the effects of propensity score analysis. DOI:10.2458/azu_jmmss_v3i2_shadish

