Why Propensity Scores Should Not Be Used for Matching

We show that propensity score matching (PSM), an enormously popular method of preprocessing data for causal inference, often accomplishes the opposite of its intended goal—thus increasing imbalance, inefficiency, model dependence, and bias. The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods. Moreover, in data balanced enough to approximate complete randomization, either to begin with or after pruning some observations, PSM approximates random matching which, we show, increases imbalance even relative to the original data. Although these results suggest researchers replace PSM with one of the other available matching methods, propensity scores have other productive uses.
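The abstract's point that PSM can overlook covariate imbalance can be illustrated with a toy sketch. It assumes a hypothetical, known propensity model (logit e(x) = x1 + x2) and three made-up units; none of these values come from the paper. Two units can share the same propensity score while sitting far apart in covariate space, so matching on the score and matching on the covariates can pick different controls.

```python
import math

def propensity(x1, x2):
    # Hypothetical propensity model (an assumption): logit e(x) = x1 + x2
    return 1.0 / (1.0 + math.exp(-(x1 + x2)))

def ps_distance(u, v):
    # Distance used by propensity score matching: difference in scores
    return abs(propensity(*u) - propensity(*v))

def covariate_distance(u, v):
    # Euclidean distance in covariate space, a stand-in for covariate-based matching
    return math.dist(u, v)

treated = (2.0, -2.0)
control_a = (-2.0, 2.0)  # identical propensity score, very different covariates
control_b = (1.9, -1.6)  # nearby covariates, slightly different score
controls = [control_a, control_b]

# Each rule picks the control that minimizes its own distance to the treated unit
ps_pick = min(controls, key=lambda c: ps_distance(treated, c))
cov_pick = min(controls, key=lambda c: covariate_distance(treated, c))

print("matched on propensity score:", ps_pick)
print("matched on covariates:      ", cov_pick)
```

In this toy case the score-based rule selects the control that differs by four units on both covariates, while the covariate-based rule selects the near neighbor.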


Validating Four Standard Scales in Spiritually Practicing and Nonpracticing Samples Using Propensity Score Matching

European Journal of Psychological Assessment ◽

10.1027/1015-5759.24.3.165 ◽

2008 ◽

Vol 24 (3) ◽

pp. 165-173 ◽

Cited By ~ 8

Author(s):

Niko Kohls ◽

Harald Walach

Keyword(s):

Propensity Score ◽

Propensity Score Matching ◽

Repeated Measures ◽

Propensity Scores ◽

Total Sample ◽

Experimental Manipulation ◽

Brief Symptom Inventory ◽

Time Interaction ◽

Social Support Scale ◽

Lower Test

Validation studies of standard scales in the particular sample that one is studying are essential for accurate conclusions. We investigated the differences in response patterns on the Brief-Symptom-Inventory (BSI), Transpersonal Trust Scale (TPV), Sense of Coherence Questionnaire (SOC), and a Social Support Scale (F-SoZu) for a matched sample of spiritually practicing (SP) and nonpracticing (NSP) individuals at two measurement points (t1, t2). Applying a sample matching procedure based on propensity scores, we selected two sociodemographically balanced subsamples of N = 120 out of a total sample of N = 431. Employing repeated measures ANOVAs, we found an intersample difference in means only for TPV and an intrasample difference for F-SoZu. Additionally, a group × time interaction effect was found for TPV. While Cronbach’s α was acceptable and comparable for both samples, a significantly lower test-retest reliability for the BSI was found in the SP sample (rSP = .62; rNSP = .78). Thus, when researching the effects of spiritual practice, one should not only look at differences in means but also consider time stability. We recommend propensity score matching as an alternative to randomization for variables that defy experimental manipulation, such as spirituality.


The Effects of Employment on Depression and Life Satisfaction Among Old-Aged Using the DD Method Combined With PSM

Innovation in Aging ◽

10.1093/geroni/igaa057.1315 ◽

2020 ◽

Vol 4 (Supplement_1) ◽

pp. 408-408

Author(s):

Si Young Song ◽

Hey Jung Jun ◽

Sun Ah Lee

Keyword(s):

Life Satisfaction ◽

Propensity Score ◽

Propensity Score Matching ◽

Multiple Regression ◽

Propensity Scores ◽

Panel Study ◽

Double Difference ◽

Combined Model ◽

Comparative Group ◽

Experimental Group

The purpose of this study is to explore the effect of employment on depression and life satisfaction among older adults. Using the 12th (2017) and 13th (2018) waves of the Korean Welfare Panel Study (KoWePS), three stages of analysis were conducted. First, using propensity score matching (PSM), the group that did not work in the 12th wave but worked in the 13th wave (experimental group, N=180) was matched on similar propensity scores with the group that worked in neither the 12th nor the 13th wave (comparative group, N=180). Second, the matched sample was used to conduct multiple regression analysis with the group dummy variable (experimental group, comparative group) as the independent variable and depression and life satisfaction as the dependent variables. Third, a combined model of PSM and the double difference (DD) method was estimated to derive the net effect of employment more appropriately. The multiple regression on the matched sample showed that employment reduced depression (B= -1.70, p< .01) and increased life satisfaction (B= .12, p< .01) among older adults. Furthermore, in the combined PSM and DD model, life satisfaction improved for the employed relative to the non-employed (B= .15, p< .05). The results are meaningful in that the effect of employment in old age is identified more clearly by addressing selection bias and endogeneity problems. This study may also provide a reference for establishing welfare policies related to employment among older adults.
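The matching and double-difference stages described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the propensity scores are assumed to be already estimated, the matching rule (greedy 1:1 nearest neighbor without replacement, with a hypothetical caliper of 0.05) is one common choice among many, and all unit ids and outcome values are invented.

```python
from statistics import mean

def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the propensity score,
    without replacement. Inputs map unit id -> propensity score."""
    available = dict(controls)
    pairs = []
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control on the propensity score
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs

def double_difference(pairs, before, after):
    """Mean change among matched treated units minus mean change
    among their matched controls."""
    treated_change = mean(after[t] - before[t] for t, _ in pairs)
    control_change = mean(after[c] - before[c] for _, c in pairs)
    return treated_change - control_change

# Invented propensity scores (assumed to come from a prior model)
treated_ps = {"t1": 0.30, "t2": 0.55, "t3": 0.90}
control_ps = {"c1": 0.28, "c2": 0.57, "c3": 0.40, "c4": 0.10}

# Invented outcome scores at the earlier and later wave
before = {"t1": 10, "t2": 8, "c1": 9, "c2": 8}
after = {"t1": 7, "t2": 6, "c1": 9, "c2": 7}

pairs = greedy_match(treated_ps, control_ps)
dd = double_difference(pairs, before, after)
print(pairs)
print(dd)
```

Here `t3` stays unmatched because no remaining control lies within the caliper, which shows how matching can prune observations before the double difference is computed.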


Do Top Social Apps Effect Voice Call? Evidence from Propensity Score Matching Methods

Lecture Notes in Computer Science - Smart Computing and Communication ◽

10.1007/978-3-030-34139-8_14 ◽

2019 ◽

pp. 136-149

Author(s):

Hao Jiang ◽

Min Lin ◽

Bingqing Liu ◽

Huifang Liu ◽

Yuanyuan Zeng ◽

...

Keyword(s):

Propensity Score ◽

Propensity Score Matching ◽

Voice Call ◽

Matching Methods


Tuning Random Forests for Causal Inference Under Cluster-Level Unmeasured Confounding

10.31234/osf.io/36w72 ◽

2020 ◽

Author(s):

Youmi Suk ◽

Hyunseung Kang

Keyword(s):

Propensity Score ◽

Causal Inference ◽

Random Forests ◽

Fixed Effects ◽

Propensity Scores ◽

Real Data ◽

Unmeasured Confounding ◽

Variable Bias ◽

Almost All ◽

Cluster Level

Recently, there has been growing interest in using machine learning (ML) methods for causal inference due to their automatic and flexible abilities to model the propensity score and the outcome model. However, almost all the ML methods for causal inference have been studied under the assumption of no unmeasured confounding and there is little work on handling omitted/unmeasured variable bias. This paper focuses on an ML method based on random forests known as Causal Forests and presents five simple modifications for tuning Causal Forests so that they are robust to cluster-level unmeasured confounding. Our simulation study finds that adjusting the algorithm with the propensity score from fixed effects logistic regression and using demeaned variables make the estimates more robust to cluster-level unmeasured confounding. In particular, using demeaned variables is useful when we are not sure of the functional form of the propensity scores. We conclude by demonstrating our proposals in a real data study concerning the effect of taking an eighth-grade algebra course on math achievement scores from the Early Childhood Longitudinal Study.
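As a rough illustration of what "demeaned variables" means here, the sketch below subtracts each cluster's mean from its members' values, which removes anything constant within a cluster. The function name, the school labels, and the scores are hypothetical, and this is only the centering step, not the authors' Causal Forests modifications.

```python
from collections import defaultdict
from statistics import mean

def demean_by_cluster(values, clusters):
    """Subtract the cluster mean from each value (cluster-mean centering)."""
    groups = defaultdict(list)
    for v, c in zip(values, clusters):
        groups[c].append(v)
    cluster_means = {c: mean(vs) for c, vs in groups.items()}
    return [v - cluster_means[c] for v, c in zip(values, clusters)]

# Invented data: four students in two schools
scores = [10, 14, 20, 26]
schools = ["A", "A", "B", "B"]

demeaned = demean_by_cluster(scores, schools)
print(demeaned)
```

After centering, the between-school difference in average score is gone and only within-school variation remains.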


Correction to: Applied comparison of large-scale propensity score matching and cardinality matching for causal inference in observational research

BMC Medical Research Methodology ◽

10.1186/s12874-021-01365-z ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Stephen P. Fortin ◽

Stephen S. Johnston ◽

Martijn J. Schuemie

Keyword(s):

Propensity Score ◽

Causal Inference ◽

Propensity Score Matching ◽

Large Scale ◽

Observational Research


Is central obesity associated with diabetic retinopathy in Chinese individuals? An exploratory study

Journal of International Medical Research ◽

10.1177/0300060519874909 ◽

2019 ◽

Vol 47 (11) ◽

pp. 5601-5612

Author(s):

Jian-Bo Zhou ◽

Jing Yuan ◽

Xing-Yao Tang ◽

Wei Zhao ◽

Fu-Qiang Luo ◽

...

Keyword(s):

Diabetic Retinopathy ◽

Propensity Score ◽

Propensity Score Matching ◽

Central Obesity ◽

Propensity Scores ◽

Meta Analysis ◽

Before And After ◽

Independent Association ◽

Two Stages ◽

The Relationship

Objective: To our knowledge, the independent association between central obesity, defined by waist circumference (WC) or waist-to-hip ratio (WHR), and diabetic retinopathy (DR) remains unknown in Chinese individuals. Method: The study was conducted in two stages. First, the relationship between WC or WHR and DR was estimated in a case-control set (DR vs. non-DR) for the whole population before and after propensity score matching. Subsequently, a systematic review and meta-analysis was performed on evidence from the literature to validate the relationship. Results: Of 511 eligible patients, DR (N = 156) and non-DR (N = 156) patients with similar propensity scores were included in the propensity score matching analyses. Central obesity (defined by WC) was associated with risk of DR (odds ratio [OR] 1.07, 95% confidence interval [CI] 1.03–1.10). The meta-analysis showed that central obesity significantly increased the risk of DR by 12% (OR 1.12, 95% CI 1.02–1.22). Analysis of data from 18 studies showed a significant association between continuous body mass index and risk of proliferative DR (OR 0.95, 95% CI 0.93–0.98; I² = 50%). Conclusion: Central obesity, particularly as defined by WC, is associated with the risk of DR in the Chinese population.


Performance of propensity score matching to estimate causal effects in small samples

Statistical Methods in Medical Research ◽

10.1177/0962280219887196 ◽

2020 ◽

Vol 29 (3) ◽

pp. 644-658 ◽

Cited By ~ 2

Author(s):

Anais Andrillon ◽

Romain Pirracchio ◽

Sylvie Chevret

Keyword(s):

Propensity Score ◽

Propensity Score Matching ◽

Treatment Effect ◽

Marrow Transplant ◽

Small Sample ◽

Average Treatment Effect ◽

Sensitivity Analyses ◽

Small Samples ◽

Allogeneic Bone ◽

Matching Methods

Propensity score (PS) matching is a very popular causal estimator usually used to estimate the average treatment effect on the treated (ATT) from observational data. However, opting for this estimator may raise some efficiency issues when the sample size is limited. Therefore, we aimed to evaluate the performance of propensity score matching in this context. We started with a motivating example based on a cohort of 66 children with sickle cell anemia who received either allogeneic bone-marrow transplant or chronic transfusion. We found substantial differences in the ATT estimate according to the model selected for propensity score estimation and subsequent matching. Then, we assessed the performance of the different propensity score matching methods and post-matching analyses to estimate the ATT using a simulation study. Although all selected propensity score matching methods were based on previous recommendations, we found important discrepancies between them in the estimated treatment effect, underlining the importance of thorough sensitivity analyses when using propensity score matching with small sample sizes.


Propensity Scores and Propensity Score Matching for Assessing Multiple Confounders

Statistical Analysis of Clinical Data on a Pocket Calculator, Part 2 - SpringerBriefs in Statistics ◽

10.1007/978-94-007-4704-3_5 ◽

2012 ◽

pp. 15-19

Author(s):

Ton J. Cleophas ◽

Aeilko H. Zwinderman

Keyword(s):

Propensity Score ◽

Propensity Score Matching ◽

Propensity Scores


FRI0298 Application of propensity score-matching methods to compare data from long-term extension trials with data from an existing lupus registry

10.1136/annrheumdis-2017-eular.5071 ◽

2017 ◽

Author(s):

MB Urowitz ◽

R Wielage ◽

KA Kelton ◽

RL Ohsfeldt ◽

Y Asukai ◽

...

Keyword(s):

Propensity Score ◽

Propensity Score Matching ◽

Matching Methods


A Case Study About Why It Can Be Difficult To Test Whether Propensity Score Analysis Works in Field Experiments

Journal of Methods and Measurement in the Social Sciences ◽

10.2458/v3i2.16475 ◽

2013 ◽

Vol 3 (2) ◽

pp. 1 ◽

Cited By ~ 3

Author(s):

William R. Shadish ◽

Peter M. Steiner ◽

Thomas D. Cook

Keyword(s):

Experimental Data ◽

Propensity Score ◽

Propensity Score Matching ◽

Field Experiments ◽

Propensity Score Analysis ◽

Randomized Experiment ◽

Score Analysis ◽

Quasi Experimental ◽

Do So

Peikes, Moreno and Orzol (2008) sensibly caution researchers that propensity score analysis may not lead to valid causal inference in field applications. At the same time, however, they made the far stronger claim that they had performed an ideal test of whether propensity score matching in quasi-experimental data can approximate the results of a randomized experiment in their dataset, and that this ideal test showed that such matching could not do so. In this article we show that their study does not support that conclusion because it failed to meet a number of basic criteria for an ideal test. By implication, many other purported tests of the effectiveness of propensity score analysis probably also fail to meet these criteria, and are therefore questionable contributions to the literature on the effects of propensity score analysis. DOI:10.2458/azu_jmmss_v3i2_shadish
