Data-adaptive longitudinal model selection in causal inference with collaborative targeted minimum loss-based estimation

Biometrics ◽  
2019 ◽  
Vol 76 (1) ◽  
pp. 145-157
Author(s):  
Mireille E. Schnitzer ◽  
Joel Sango ◽  
Steve Ferreira Guerra ◽  
Mark J. van der Laan


Author(s):  
Iván Díaz

Summary In recent decades, the fields of statistical and machine learning have seen a revolution in the development of data-adaptive regression methods that have optimal performance under flexible, sometimes minimal, assumptions on the true regression functions. These developments have impacted all areas of applied and theoretical statistics and have allowed data analysts to avoid the biases incurred under the pervasive practice of parametric model misspecification. In this commentary, I discuss issues around the use of data-adaptive regression in estimation of causal inference parameters. To ground ideas, I focus on two estimation approaches with roots in semi-parametric estimation theory: targeted minimum loss-based estimation (TMLE; van der Laan and Rubin, 2006) and double/debiased machine learning (DML; Chernozhukov and others, 2018). This commentary is not comprehensive, the literature on these topics is rich, and there are many subtleties and developments which I do not address. These two frameworks represent only a small fraction of an increasingly large number of methods for causal inference using machine learning. To my knowledge, they are the only methods grounded in statistical semi-parametric theory that also allow unrestricted use of data-adaptive regression techniques.
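To make the connection between these frameworks and data-adaptive regression concrete, below is a minimal sketch of a cross-fitted, doubly robust estimator of the average treatment effect in the spirit of DML. It is an illustration under assumed data structures (arrays X, A, Y), with scikit-learn gradient boosting standing in for an arbitrary data-adaptive learner; it is not the commentary's own implementation, and a TMLE would replace the augmentation step with a targeted fluctuation of the initial outcome fit.

    # Illustrative cross-fitted AIPW / DML-style estimate of the average
    # treatment effect. Learner choices and function names are assumptions
    # for illustration only, not the commentary's implementation.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
    from sklearn.model_selection import KFold

    def dml_ate(X, A, Y, n_splits=5, seed=0):
        """Cross-fitted augmented IPW estimate of E[Y(1)] - E[Y(0)]."""
        n = len(Y)
        psi = np.zeros(n)  # per-observation efficient-influence-function terms
        for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
            # Nuisance regressions are fit data-adaptively on the training folds only.
            g = GradientBoostingClassifier().fit(X[train], A[train])
            q1 = GradientBoostingRegressor().fit(X[train][A[train] == 1], Y[train][A[train] == 1])
            q0 = GradientBoostingRegressor().fit(X[train][A[train] == 0], Y[train][A[train] == 0])
            ps = np.clip(g.predict_proba(X[test])[:, 1], 0.01, 0.99)  # propensity score
            m1, m0 = q1.predict(X[test]), q0.predict(X[test])
            a, y = A[test], Y[test]
            # Doubly robust (AIPW) contributions evaluated on the held-out fold.
            psi[test] = m1 - m0 + a * (y - m1) / ps - (1 - a) * (y - m0) / (1 - ps)
        return psi.mean(), psi.std(ddof=1) / np.sqrt(n)  # estimate and standard error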


2021 ◽  
pp. 004912412199555
Author(s):  
Michael Baumgartner ◽  
Mathias Ambühl

Consistency and coverage are two core parameters of model fit used by configurational comparative methods (CCMs) of causal inference. Among causal models that perform equally well in other respects (e.g., robustness or compliance with background theories), those with higher consistency and coverage are typically considered preferable. Finding the optimally obtainable consistency and coverage scores for a given data set has, so far, been a matter of repeatedly applying CCMs to the data while varying threshold settings. This article introduces a procedure called ConCovOpt that calculates, prior to any actual CCM analysis, the consistency and coverage scores that can optimally be obtained by models inferred from the data. Moreover, we show how models reaching these optimal scores can be methodically built in the case of crisp-set and multi-value data. ConCovOpt is a tool not for blindly maximizing model fit but for rendering transparent the space of viable models at optimal fit scores, in order to facilitate informed model selection, which, as we demonstrate with various data examples, may have substantive modeling implications.
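For orientation, the following is a minimal sketch of how consistency and coverage are computed for a crisp-set sufficiency claim X -> Y, using the standard definitions from the configurational literature; it illustrates the fit parameters themselves, not the ConCovOpt procedure.

    # Consistency and coverage of a crisp-set sufficiency claim X -> Y.
    # Standard definitions from the configurational literature; an
    # illustrative sketch only, not the ConCovOpt procedure.
    import numpy as np

    def consistency(x, y):
        """Share of X-cases that are also Y-cases: |X and Y| / |X|."""
        x, y = np.asarray(x, bool), np.asarray(y, bool)
        return (x & y).sum() / x.sum()

    def coverage(x, y):
        """Share of Y-cases accounted for by X: |X and Y| / |Y|."""
        x, y = np.asarray(x, bool), np.asarray(y, bool)
        return (x & y).sum() / y.sum()

    # Example: X is sufficient for Y in 4 of 5 X-cases (consistency 0.8)
    # and accounts for 4 of 6 Y-cases (coverage about 0.67).
    x = [1, 1, 1, 1, 1, 0, 0, 0]
    y = [1, 1, 1, 1, 0, 1, 1, 0]
    print(consistency(x, y), coverage(x, y))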


2016 ◽  
Vol 12 (1) ◽  
pp. 97-115 ◽  
Author(s):  
Mireille E. Schnitzer ◽  
Judith J. Lok ◽  
Susan Gruber

Abstract This paper investigates the appropriateness of integrating flexible propensity score modeling (nonparametric or machine learning approaches) into semiparametric models for the estimation of a causal quantity, such as the mean outcome under treatment. We begin with an overview of some of the issues involved in knowledge-based and statistical variable selection in causal inference and the potential pitfalls of automated selection based on the fit of the propensity score. Using a simple example, we directly show the consequences of adjusting for pure causes of the exposure when using inverse probability of treatment weighting (IPTW). Such variables are likely to be selected when using a naive approach to model selection for the propensity score. We describe how the method of collaborative targeted minimum loss-based estimation (C-TMLE; van der Laan and Gruber, 2010) capitalizes on the collaborative double robustness property of semiparametric efficient estimators to select covariates for the propensity score based on the error in the conditional outcome model. Finally, we compare several approaches to automated variable selection in low- and high-dimensional settings through a simulation study. From this simulation study, we conclude that using IPTW with flexible prediction for the propensity score can result in inferior estimation, while targeted minimum loss-based estimation (TMLE) and C-TMLE may benefit from flexible prediction and remain robust to the presence of variables that are highly correlated with treatment. However, in our study, standard influence-function-based methods for the variance underestimated the standard errors, resulting in poor coverage under certain data-generating scenarios.
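The instrument-adjustment pitfall described above can be reproduced in a few lines. The following simulation sketch is my own illustration, not the paper's code: Z is a pure cause of treatment with no effect on the outcome, yet including it in the propensity score noticeably inflates the variability of the IPTW estimate.

    # Illustrative simulation (not the paper's code): adjusting the propensity
    # score for Z, a pure cause of treatment, inflates the variability of the
    # IPTW estimator even though Z is not a confounder. The true ATE is 1.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def iptw_ate(covariates, a, y):
        ps = LogisticRegression().fit(covariates, a).predict_proba(covariates)[:, 1]
        ps = np.clip(ps, 0.01, 0.99)
        w = a / ps + (1 - a) / (1 - ps)                    # inverse probability weights
        return np.average(y, weights=a * w) - np.average(y, weights=(1 - a) * w)

    rng = np.random.default_rng(1)
    with_z, without_z = [], []
    for _ in range(200):
        n = 500
        conf = rng.normal(size=n)                          # true confounder
        z = rng.normal(size=n)                             # pure cause of treatment
        a = rng.binomial(1, 1 / (1 + np.exp(-(conf + 2 * z))))
        y = a + conf + rng.normal(size=n)
        with_z.append(iptw_ate(np.column_stack([conf, z]), a, y))
        without_z.append(iptw_ate(conf.reshape(-1, 1), a, y))
    print("SD adjusting for Z:", np.std(with_z), "SD without Z:", np.std(without_z))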


2015 ◽  
Vol 3 (2) ◽  
pp. 207-236 ◽  
Author(s):  
Denis Talbot ◽  
Geneviève Lefebvre ◽  
Juli Atherton

Abstract Estimating causal exposure effects in observational studies ideally requires the analyst to have a vast knowledge of the domain of application. Investigators often bypass difficulties related to the identification and selection of confounders through the use of fully adjusted outcome regression models. However, since such models likely contain more covariates than required, the variance of the regression coefficient for exposure may be unnecessarily large. Instead of using a fully adjusted model, model selection can be attempted. Most classical statistical model selection approaches, such as Bayesian model averaging, do not readily address causal effect estimation. We present a new model-averaged approach to causal inference, Bayesian causal effect estimation (BCEE), which is motivated by the graphical framework for causal inference. BCEE aims to unbiasedly estimate the causal effect of a continuous exposure on a continuous outcome while being more efficient than a fully adjusted approach.
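Schematically, and only as a generic model-averaging form rather than the exact BCEE algorithm, the exposure effect is estimated by weighting the effect estimates obtained under candidate adjustment sets $\alpha$,
$$\hat{\Delta} \;=\; \sum_{\alpha} \hat{w}(\alpha \mid \text{data})\, \hat{\Delta}_{\alpha},$$
where $\hat{\Delta}_{\alpha}$ is the estimated exposure effect under adjustment set $\alpha$ and the weights $\hat{w}(\alpha \mid \text{data})$ are, in BCEE, constructed so that mass concentrates on adjustment sets compatible with the graphical (back-door style) reasoning that motivates the method.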


2016 ◽  
Vol 4 (2) ◽  
Author(s):  
Peter M. Aronow

Abstract Recent approaches in causal inference have proposed estimating average causal effects that are local to some subpopulation, often for reasons of efficiency. These inferential targets are sometimes data-adaptive, in that they are dependent on the empirical distribution of the data. In this short note, we show that if researchers are willing to adapt the inferential target on the basis of efficiency, then extraordinary gains in precision can potentially be obtained. Specifically, when causal effects are heterogeneous, any asymptotically normal and root-$n$ consistent estimator of the population average causal effect is superefficient for a data-adaptive local average causal effect.
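In symbols, and as a heuristic paraphrase rather than a quotation from the note: write $\tau(X)$ for the conditional average causal effect, $\psi = E[\tau(X)]$ for the population average effect, and $\psi_n = n^{-1}\sum_{i=1}^{n}\tau(X_i)$ for a data-adaptive, in-sample target. For a root-$n$ consistent, asymptotically normal estimator $\hat{\psi}$ of $\psi$,
$$\sqrt{n}\,(\hat{\psi} - \psi) \;=\; \sqrt{n}\,(\hat{\psi} - \psi_n) \;+\; \sqrt{n}\,(\psi_n - \psi),$$
and the second term alone contributes variance $\mathrm{Var}\{\tau(X)\}$. Heuristically, the variance of $\hat{\psi}$ around the data-adaptive target $\psi_n$ is therefore smaller than its variance around $\psi$ by the between-unit effect heterogeneity, which is strictly positive whenever effects are heterogeneous; this is the superefficiency phenomenon described in the abstract.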


2017 ◽  
Vol 28 (4) ◽  
pp. 1044-1063 ◽  
Author(s):  
Cheng Ju ◽  
Richard Wyss ◽  
Jessica M Franklin ◽  
Sebastian Schneeweiss ◽  
Jenny Häggström ◽  
...  

Propensity score-based estimators are increasingly used for causal inference in observational studies. However, model selection for propensity score estimation in high-dimensional data has received little attention. In these settings, propensity score models have traditionally been selected based on the goodness-of-fit for the treatment mechanism itself, without consideration of the causal parameter of interest. Collaborative minimum loss-based estimation is a novel methodology for causal inference that takes into account information on the causal parameter of interest when selecting a propensity score model. This “collaborative learning” considers variable associations with both treatment and outcome when selecting a propensity score model in order to minimize a bias-variance tradeoff in the estimated treatment effect. In this study, we introduce a novel approach for collaborative model selection when using the LASSO estimator for propensity score estimation in high-dimensional covariate settings. To demonstrate the importance of selecting the propensity score model collaboratively, we designed quasi-experiments based on a real electronic healthcare database, where only the potential outcomes were manually generated, and the treatment and baseline covariates remained unchanged. Results showed that the collaborative minimum loss-based estimation algorithm outperformed other competing estimators for both point estimation and confidence interval coverage. In addition, the propensity score model selected by collaborative minimum loss-based estimation could be applied to other propensity score-based estimators, which also resulted in substantial improvements in both point estimation and confidence interval coverage. We illustrate the discussed concepts through an empirical example comparing the effects of non-selective nonsteroidal anti-inflammatory drugs with selective COX-2 inhibitors on gastrointestinal complications in a population of Medicare beneficiaries.
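A greatly simplified sketch of the collaborative selection idea described above follows: the LASSO penalty for the propensity score is chosen by the cross-validated loss of a single TMLE-style fluctuation of an initial outcome fit, rather than by the fit of the treatment model itself. All names and details are illustrative assumptions and do not reproduce the authors' algorithm.

    # Simplified sketch of collaborative penalty selection: the L1 penalty for
    # the propensity score is picked by the loss of the *targeted outcome fit*,
    # not by the treatment model's own goodness-of-fit. Illustrative only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression, LinearRegression
    from sklearn.model_selection import KFold

    def collaborative_lambda(W, A, Y, Q_hat, lambdas, n_splits=5):
        """Return the penalty whose propensity score, after one TMLE-style
        fluctuation of the initial outcome predictions Q_hat = (Q0, Q1),
        yields the smallest cross-validated squared-error loss."""
        Q0, Q1 = Q_hat
        QA = np.where(A == 1, Q1, Q0)         # initial prediction at observed A
        losses = []
        for lam in lambdas:
            fold_loss = []
            for tr, te in KFold(n_splits, shuffle=True, random_state=0).split(W):
                ps_model = LogisticRegression(penalty="l1", C=1.0 / lam, solver="liblinear")
                ps = np.clip(ps_model.fit(W[tr], A[tr]).predict_proba(W)[:, 1], 0.01, 0.99)
                H = A / ps - (1 - A) / (1 - ps)   # "clever covariate" for the ATE
                # One least-squares fluctuation step, fitted on the training fold.
                eps = LinearRegression(fit_intercept=False).fit(
                    H[tr].reshape(-1, 1), Y[tr] - QA[tr]).coef_[0]
                Q_star = QA + eps * H             # targeted outcome predictions
                fold_loss.append(np.mean((Y[te] - Q_star[te]) ** 2))
            losses.append(np.mean(fold_loss))
        return lambdas[int(np.argmin(losses))]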


2010 ◽  
Vol 21 (1) ◽  
pp. 7-30 ◽  
Author(s):  
Stijn Vansteelandt ◽  
Maarten Bekaert ◽  
Gerda Claeskens
