Causal Subclassification Tree Algorithm and Robust Causal Effect Estimation via Subclassification

2020 ◽  
Vol 10 (1) ◽  
pp. 40
Author(s):  
Tomoshige Nakamura ◽  
Mihoko Minami

In observational studies, the existence of confounding variables should be attended to, and propensity score weighting methods are often used to eliminate their effects. Although many causal estimators have been proposed based on propensity scores, these estimators generally assume that the propensity scores are properly estimated. However, researchers have found that even a slight misspecification of the propensity score model can result in a bias of estimated treatment effects. Model misspecification problems may occur in practice, and hence, using a robust estimator for causal effect is recommended. One such estimator is a subclassification estimator. Wang, Zhang, Richardson, & Zhou (2020) presented the conditions necessary for subclassification estimators to have $\sqrt{N}$-consistency and to be asymptotically well-defined, and suggested an idea for how to construct subclasses.
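
As a rough illustration of the subclassification estimator mentioned above, the sketch below cuts the sample into equal-sized subclasses by estimated propensity score and averages the within-subclass treated-minus-control differences. The equal-frequency cut, the function name `subclassification_ate`, and its parameters are assumptions made for illustration; this is not the tree algorithm of the paper, nor the construction suggested by Wang et al.

```python
from statistics import mean


def subclassification_ate(scores, treated, outcomes, n_subclasses=5):
    """Subclassification estimate of the average treatment effect.

    Units are sorted by propensity score and cut into equal-sized
    subclasses. Within each subclass the treated-minus-control mean
    difference is computed, and the differences are averaged with
    weights proportional to subclass size.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    estimate = 0.0
    for k in range(n_subclasses):
        # Indices of the k-th block of the score-sorted sample.
        block = order[k * n // n_subclasses:(k + 1) * n // n_subclasses]
        y1 = [outcomes[i] for i in block if treated[i] == 1]
        y0 = [outcomes[i] for i in block if treated[i] == 0]
        if not y1 or not y0:
            # The estimator is undefined when a subclass lacks either group.
            raise ValueError(f"subclass {k} lacks treated or control units")
        estimate += len(block) / n * (mean(y1) - mean(y0))
    return estimate
```

The explicit error for a subclass without both treated and control units reflects why the number and placement of subclasses matter: too many subclasses can leave the estimator undefined.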

2021 ◽  
pp. 096228022098351
Author(s):  
Yan Li ◽  
Liang Li

Inverse probability weighting is an important propensity score weighting method to estimate the average treatment effect. Recent literature shows that it can be easily combined with covariate balancing constraints to reduce the detrimental effects of excessively large weights and improve balance. Other methods are available to derive weights that balance covariate distributions between the treatment groups without the involvement of propensity scores. We conducted comprehensive Monte Carlo experiments to study whether the use of covariate balancing constraints circumvents the need for correct propensity score model specification, and whether the use of a propensity score model further improves the estimation performance among methods that use similar covariate balancing constraints. We compared simple inverse probability weighting, two propensity score weighting methods with balancing constraints (covariate balancing propensity score, covariate balancing scoring rule), and two weighting methods with balancing constraints but without using the propensity scores (entropy balancing and kernel balancing). We observed that correct specification of the propensity score model remains important even when the constraints effectively balance the covariates. We also observed evidence suggesting that, with similar covariate balance constraints, the use of a propensity score model improves the estimation performance when the dimension of covariates is large. These findings suggest that it is important to develop flexible data-driven propensity score models that satisfy covariate balancing conditions.
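
For readers unfamiliar with simple inverse probability weighting, a minimal sketch follows. It assumes the propensity scores have already been estimated and lie strictly between 0 and 1; the function name `ipw_ate` and the `normalize` switch are hypothetical and are not taken from the paper. The balancing-constraint methods compared in the study are not implemented here.

```python
def ipw_ate(scores, treated, outcomes, normalize=True):
    """Inverse probability weighting estimate of the average treatment effect.

    Treated units get weight 1/e and control units 1/(1 - e), where e is
    the propensity score. With normalize=True the weights in each arm are
    rescaled to sum to one (the normalized, or Hajek, form); otherwise the
    weighted sums are divided by the sample size (Horvitz-Thompson form).
    """
    w1 = [t / e for t, e in zip(treated, scores)]
    w0 = [(1 - t) / (1 - e) for t, e in zip(treated, scores)]
    num1 = sum(w * y for w, y in zip(w1, outcomes))
    num0 = sum(w * y for w, y in zip(w0, outcomes))
    if normalize:
        return num1 / sum(w1) - num0 / sum(w0)
    return (num1 - num0) / len(outcomes)
```

Scores close to 0 or 1 produce very large weights in this formula, which is the instability that balancing constraints are meant to reduce.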


2019 ◽  
Author(s):  
Donna Coffman ◽  
Jiangxiu Zhou ◽  
Xizhen Cai

Abstract Background Causal effect estimation with observational data is subject to bias due to confounding, which is often controlled for using propensity scores. One unresolved issue in propensity score estimation is how to handle missing values in covariates. Method Several approaches have been proposed for handling covariate missingness, including multiple imputation (MI), multiple imputation with missingness pattern (MIMP), and treatment mean imputation. However, there are other potentially useful approaches that have not been evaluated, including single imputation (SI) + prediction error (PE), SI+PE + parameter uncertainty (PU), and Generalized Boosted Modeling (GBM), which is a nonparametric approach for estimating propensity scores in which missing values are automatically handled in the estimation using a surrogate split method. To evaluate the performance of these approaches, a simulation study was conducted. Results Results suggested that SI+PE, SI+PE+PU, MI, and MIMP perform almost equally well and better than treatment mean imputation and GBM in terms of bias; however, MI and MIMP account for the additional uncertainty of imputing the missingness. Conclusions Applying GBM to the incomplete data and relying on the surrogate split approach resulted in substantial bias. Imputation prior to implementing GBM is recommended.


2017 ◽  
Vol 5 (2) ◽  
Author(s):  
Beth Ann Griffin ◽  
Daniel F. McCaffrey ◽  
Daniel Almirall ◽  
Lane F. Burgette ◽  
Claude Messan Setodji

Abstract: In this article, we carefully examine two important implementation issues when estimating propensity scores using generalized boosted models (GBM), a promising machine learning technique. First, we examine which of the following methods for tuning GBM lead to better covariate balance and inferences about causal effects: pursuing covariate balance between the treatment groups or tuning the propensity score model on the basis of a model fit criterion. Second, we examine how well GBM can handle irrelevant covariates that are included in the estimation model. We find that chasing balance rather than model fit when estimating propensity scores yielded better covariate balance and more accurate treatment effect estimates. Additionally, we find that adding irrelevant covariates to GBM increased imbalance and bias in the treatment effects. The findings from this paper have useful implications for other work focused on improving methods for estimating propensity scores.
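
Tuning toward covariate balance requires a balance statistic to minimize. The abstract does not say which statistic was used; a common choice, assumed here, is the standardized mean difference of each covariate after weighting. The helper names below are hypothetical, and the pooled unweighted standard deviation is one of several possible scaling conventions.

```python
from math import sqrt
from statistics import variance


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def standardized_mean_difference(x, treated, weights):
    """Weighted treated-minus-control mean difference of covariate x,
    scaled by the pooled unweighted standard deviation of x."""
    x1 = [v for v, t in zip(x, treated) if t == 1]
    x0 = [v for v, t in zip(x, treated) if t == 0]
    w1 = [w for w, t in zip(weights, treated) if t == 1]
    w0 = [w for w, t in zip(weights, treated) if t == 0]
    pooled_sd = sqrt((variance(x1) + variance(x0)) / 2)
    return (weighted_mean(x1, w1) - weighted_mean(x0, w0)) / pooled_sd
```

Under this sketch, a tuning loop would pick the model setting whose weights bring the absolute value of this statistic closest to zero across covariates, rather than the setting with the best predictive fit.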


Biometrika ◽  
2020 ◽  
Vol 107 (3) ◽  
pp. 533-554 ◽  
Author(s):  
Yang Ning ◽  
Sida Peng ◽  
Kosuke Imai

Summary We propose a robust method to estimate the average treatment effects in observational studies when the number of potential confounders is possibly much greater than the sample size. Our method consists of three steps. We first use a class of penalized $M$-estimators for the propensity score and outcome models. We then calibrate the initial estimate of the propensity score by balancing a carefully selected subset of covariates that are predictive of the outcome. Finally, the estimated propensity score is used to construct the inverse probability weighting estimator. We prove that the proposed estimator, which we call the high-dimensional covariate balancing propensity score, has the sample boundedness property, is root-$n$ consistent, asymptotically normal, and semiparametrically efficient when the propensity score model is correctly specified and the outcome model is linear in covariates. More importantly, we show that our estimator remains root-$n$ consistent and asymptotically normal so long as either the propensity score model or the outcome model is correctly specified. We provide valid confidence intervals in both cases and further extend these results to the case where the outcome model is a generalized linear model. In simulation studies, we find that the proposed methodology often estimates the average treatment effect more accurately than existing methods. We also present an empirical application, in which we estimate the average causal effect of college attendance on adulthood political participation. An open-source software package is available for implementing the proposed methodology.
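
The property that an estimator stays consistent when either the propensity score model or the outcome model is correct can be illustrated with the standard augmented inverse probability weighting (AIPW) estimator. This is a different, simpler estimator than the high-dimensional covariate balancing propensity score proposed in the paper, swapped in purely for illustration. The function name `aipw_ate` is hypothetical, and `mu1` and `mu0` are assumed to be outcome-model predictions under treatment and control that were fitted elsewhere.

```python
def aipw_ate(scores, treated, outcomes, mu1, mu0):
    """Augmented inverse probability weighting estimate of the ATE.

    Each unit contributes its outcome-model contrast mu1 - mu0 plus an
    inverse-probability-weighted residual correction for the arm it was
    actually observed in.
    """
    total = 0.0
    for e, t, y, m1, m0 in zip(scores, treated, outcomes, mu1, mu0):
        total += m1 - m0                       # outcome-model contrast
        total += t * (y - m1) / e              # correction for treated units
        total -= (1 - t) * (y - m0) / (1 - e)  # correction for control units
    return total / len(outcomes)
```

If the outcome predictions match the observed outcomes exactly, the residual corrections vanish and the propensity scores no longer matter; if the outcome predictions are all zero, the expression reduces to unnormalized inverse probability weighting.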


2015 ◽  
Vol 26 (4) ◽  
pp. 1654-1670 ◽  
Author(s):  
Peter C Austin ◽  
Elizabeth A Stuart

There is increasing interest in estimating the causal effects of treatments using observational data. Propensity-score matching methods are frequently used to adjust for differences in observed characteristics between treated and control individuals in observational studies. Survival or time-to-event outcomes occur frequently in the medical literature, but the use of propensity score methods in survival analysis has not been thoroughly investigated. This paper compares two approaches for estimating the Average Treatment Effect (ATE) on survival outcomes: Inverse Probability of Treatment Weighting (IPTW) and full matching. The performance of these methods was compared in an extensive set of simulations that varied the extent of confounding and the amount of misspecification of the propensity score model. We found that both IPTW and full matching resulted in estimation of marginal hazard ratios with negligible bias when the ATE was the target estimand and the treatment-selection process was weak to moderate. However, when the treatment-selection process was strong, both methods resulted in biased estimation of the true marginal hazard ratio, even when the propensity score model was correctly specified. When the propensity score model was correctly specified, bias tended to be lower for full matching than for IPTW. These biases, and the differences between the two methods, appeared to be due to some extreme weights generated by each method. Both methods tended to produce more extreme weights as the magnitude of the effects of covariates on treatment selection increased. Furthermore, more extreme weights were observed for IPTW than for full matching. However, the poorer performance of both methods in the presence of a strong treatment-selection process was mitigated by the use of IPTW with restriction and full matching with a caliper restriction when the propensity score model was correctly specified.


2019 ◽  
Vol 29 (3) ◽  
pp. 659-676 ◽  
Author(s):  
Jing Dong ◽  
Junni L Zhang ◽  
Shuxi Zeng ◽  
Fan Li

This paper concerns estimation of subgroup treatment effects with observational data. Existing propensity score methods are mostly developed for estimating the overall treatment effect. Although the true propensity scores balance covariates in any subpopulation, the estimated propensity scores may result in severe imbalance in subgroup samples. Indeed, subgroup analysis amplifies a bias-variance tradeoff, whereby increasing complexity of the propensity score model may help to achieve covariate balance within subgroups, but it also increases variance. We propose a new method, the subgroup balancing propensity score, to ensure good subgroup balance as well as to control the variance inflation. For each subgroup, the subgroup balancing propensity score chooses to use either the overall sample or the subgroup (sub)sample to estimate the propensity scores for the units within that subgroup, in order to optimize a criterion accounting for a set of covariate-balancing moment conditions for both the overall sample and the subgroup samples. We develop two versions of the subgroup balancing propensity score, corresponding to matching and weighting, respectively. We devise a stochastic search algorithm to estimate the subgroup balancing propensity score when the number of subgroups is large. We demonstrate through simulations that the subgroup balancing propensity score improves the performance of propensity score methods in estimating subgroup treatment effects. We apply the subgroup balancing propensity score method to the Italy Survey of Household Income and Wealth (SHIW) to estimate the causal effects of having a debit card on household consumption for different income groups.


2017 ◽  
Vol 27 (10) ◽  
pp. 3126-3138 ◽  
Author(s):  
Jiaqi Li ◽  
Anil Vachani ◽  
Andrew Epstein ◽  
Nandita Mitra

Estimation of common cost–effectiveness measures, including the incremental cost–effectiveness ratio and the net monetary benefit, is complicated by the need to account for informative censoring and inherent skewness of the data. In addition, since the two components of these measures, medical costs and survival, are often collected from observational claims data, one must account for potential confounders. We propose a novel doubly robust, unbiased estimator for cost–effectiveness based on propensity scores that allow the incorporation of cost history and time-varying covariates. Further, we use an ensemble machine learning approach to obtain improved predictions from parametric and non-parametric cost and propensity score models. Our simulation studies demonstrate that the proposed doubly robust approach performs well even under mis-specification of either the propensity score model or the outcome model. We apply our approach to a cost–effectiveness analysis of two competing lung cancer surveillance procedures, CT vs. chest X-ray, using SEER-Medicare data.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Fatema Tuj Johara ◽  
Andrea Benedetti ◽  
Robert Platt ◽  
Dick Menzies ◽  
Piret Viiklepp ◽  
...  

Abstract Background Individual-patient data meta-analysis (IPD-MA) is an increasingly popular approach because of its analytical benefits. IPD-MA of observational studies must overcome the problem of confounding, otherwise biased estimates of treatment effect may be obtained. One approach to reducing confounding bias could be the use of propensity score matching (PSM). IPD-MA can be considered as two-stage clustered data (patients within studies) and propensity score matching can be implemented within studies, across studies, or by combining both. Methods This article focuses on implementation of four PSM-based approaches for the analysis of data structure that exploit IPD-MA in two ways: (i) estimation of propensity score model using single-level or random-effects logistic regression; and (ii) matching of propensity scores (PS) across studies, within studies or preferential-within studies. We investigated the performance of these approaches through a simulation study, which considers an IPD-MA that examined the success of different treatments for multidrug-resistant tuberculosis (MDR-TB). The simulation parameters were varied according to three treatment prevalences (according to studies, 50% and 30%), three levels of heterogeneity between studies (low, moderate and high) and three levels of pooled odds ratio (1, 1.5, 3). Results All approaches showed greater biases at the higher levels of heterogeneity regardless of the choices of treatment prevalences. However, within-study and preferential-within-study matching of propensity scores performed better than matching across studies when treatment prevalence varied across studies. For fixed prevalences, a random-effects propensity score model to estimate propensity scores followed by matching of propensity scores across studies achieved lower biases compared to other PSM-based approaches.
Conclusions Propensity score matching has wide application in health research, while only limited literature is available on the implementation of PSM methods in IPD-MA, and until now the methodological performance of PSM methods has not been examined. We believe this work offers an intuition to the applied researcher for the choice of the PSM-based approaches.


2020 ◽  
Author(s):  
Donna Coffman ◽  
Jiangxiu Zhou ◽  
Xizhen Cai

Abstract Background: Causal effect estimation with observational data is subject to bias due to confounding, which is often controlled for using propensity scores. One unresolved issue in propensity score estimation is how to handle missing values in covariates. Method: Several approaches have been proposed for handling covariate missingness, including multiple imputation (MI), multiple imputation with missingness pattern (MIMP), and treatment mean imputation. However, there are other potentially useful approaches that have not been evaluated, including single imputation (SI) + prediction error (PE), SI+PE + parameter uncertainty (PU), and Generalized Boosted Modeling (GBM), which is a nonparametric approach for estimating propensity scores in which missing values are automatically handled in the estimation using a surrogate split method. To evaluate the performance of these approaches, a simulation study was conducted. Results: Results suggested that SI+PE, SI+PE+PU, MI, and MIMP perform almost equally well and better than treatment mean imputation and GBM in terms of bias; however, MI and MIMP account for the additional uncertainty of imputing the missingness. Conclusions: Applying GBM to the incomplete data and relying on the surrogate split approach resulted in substantial bias. Imputation prior to implementing GBM is recommended.

