Big Data for Finite Population Inference: Applying Quasi-Random Approaches to Naturalistic Driving Data Using Bayesian Additive Regression Trees

2020 ◽  
Vol 8 (1) ◽  
pp. 148-180
Author(s):  
Ali Rafei ◽  
Carol A C Flannagan ◽  
Michael R Elliott

Abstract Big Data are a “big challenge” for finite population inference. Lack of control over data-generating processes by researchers in the absence of a known random selection mechanism may lead to biased estimates. Further, larger sample sizes increase the relative contribution of selection bias to squared or absolute error. One approach to mitigate this issue is to treat Big Data as a random sample and estimate the pseudo-inclusion probabilities through a benchmark survey with a set of relevant auxiliary variables common to the Big Data. Since the true propensity model is usually unknown, and Big Data tend to be poor in such variables that fully govern the selection mechanism, the use of flexible non-parametric models seems to be essential. Traditionally, a weighted logistic model is recommended to account for the sampling weights in the benchmark survey when estimating the propensity scores. However, handling weights is a hurdle when seeking a broader range of predictive methods. To further protect against model misspecification, we propose using an alternative pseudo-weighting approach that allows us to fit more flexible modern predictive tools such as Bayesian Additive Regression Trees (BART), which automatically detect non-linear associations as well as high-order interactions. In addition, the posterior predictive distribution generated by BART makes it easier to quantify the uncertainty due to pseudo-weighting. Our simulation findings reveal further reduction in bias by our approach compared with conventional propensity adjustment method when the true model is unknown. Finally, we apply our method to the naturalistic driving data from the Safety Pilot Model Deployment using the National Household Travel Survey as a benchmark.

2010 ◽  
Vol 4 (1) ◽  
pp. 266-298 ◽  
Author(s):  
Hugh A. Chipman ◽  
Edward I. George ◽  
Robert E. McCulloch

2021 ◽  
pp. 395-414
Author(s):  
Carlos M. Carvalho ◽  
Edward I. George ◽  
P. Richard Hahn ◽  
Robert E. McCulloch

2020 ◽  
Vol 29 (11) ◽  
pp. 3218-3234 ◽  
Author(s):  
Liangyuan Hu ◽  
Chenyang Gu ◽  
Michael Lopez ◽  
Jiayi Ji ◽  
Juan Wisnivesky

There is a dearth of robust methods to estimate the causal effects of multiple treatments when the outcome is binary. This paper uses two unique sets of simulations to propose and evaluate the use of Bayesian additive regression trees in such settings. First, we compare Bayesian additive regression trees to several approaches that have been proposed for continuous outcomes, including inverse probability of treatment weighting, targeted maximum likelihood estimator, vector matching, and regression adjustment. Results suggest that under conditions of non-linearity and non-additivity of both the treatment assignment and outcome generating mechanisms, Bayesian additive regression trees, targeted maximum likelihood estimator, and inverse probability of treatment weighting using generalized boosted models provide better bias reduction and smaller root mean squared error. Bayesian additive regression trees and targeted maximum likelihood estimator provide more consistent 95% confidence interval coverage and better large-sample convergence property. Second, we supply Bayesian additive regression trees with a strategy to identify a common support region for retaining inferential units and for avoiding extrapolating over areas of the covariate space where common support does not exist. Bayesian additive regression trees retain more inferential units than the generalized propensity score-based strategy, and shows lower bias, compared to targeted maximum likelihood estimator or generalized boosted model, in a variety of scenarios differing by the degree of covariate overlap. A case study examining the effects of three surgical approaches for non-small cell lung cancer demonstrates the methods.


2019 ◽  
Vol 38 (25) ◽  
pp. 5048-5069 ◽  
Author(s):  
Yaoyuan Vincent Tan ◽  
Jason Roy

Sign in / Sign up

Export Citation Format

Share Document