Combining propensity score-weighting and multiple imputation is not a trivial task

2021 ◽  
Vol 50 (Supplement_1) ◽  
Author(s):  
Andreas Halgreen Eiset ◽  
Morten Frydenberg

Abstract Background Propensity score (PS)-weighting and multiple imputation are two widely used statistical methods. Combining the two is not trivial and has received little attention in theory and practice. We present our considerations for their combination with application to a study of long-distance migration and post-traumatic stress disorder. We elaborate on the assumptions underlying the methods and discuss the methodological and practical implications of our choices and alternatives. Methods We made a number of choices a priori: to use logistic regression-based PS to produce “standardised mortality ratio”-weights and SMC-FCS to multiply impute missing data. We present a methodology to combine the methods by choosing the PS model based on covariate balance, using this model as the substantive model in the multiple imputation, producing and averaging the point estimates from each multiply imputed data set to give the estimate of association, and computing the percentile confidence interval by bootstrapping. Results In our application, a simple PS model was chosen as the substantive model; 10 data sets were imputed with 40 iterations each, and the entire procedure was repeated 999 times to obtain a bootstrap confidence interval. Computing time was approximately 36 hours. Conclusions Our structured approach is demanding in both workload and computational time. We do not consider the former a drawback: it makes some of the underlying assumptions explicit, and the latter may be a nuisance that diminishes with time. Key messages Combining propensity score-weighting and multiple imputation is not a trivial task.

2021 ◽  
Author(s):  
Andreas Halgreen Eiset ◽  
Morten Frydenberg

We present our considerations for using multiple imputation to account for missing data in propensity score-weighted analysis with a bootstrap percentile confidence interval. We outline the assumptions underlying each of the methods, discuss the methodological and practical implications of our choices, and briefly point to alternatives. We made a number of choices a priori, for example to use logistic regression-based propensity scores to produce standardized mortality ratio-weights and Substantive Model Compatible Fully Conditional Specification (SMC-FCS) to multiply impute missing data (given no violation of underlying assumptions). We present a methodology to combine these methods by choosing the propensity score model based on covariate balance, using this model as the substantive model in the multiple imputation, producing and averaging the point estimates from each multiply imputed data set to give the estimate of association, and computing the percentile confidence interval by bootstrapping. The described methodology is demanding in both workload and computational time; however, we do not consider the former a drawback: it makes some of the underlying assumptions explicit, and the latter may be a nuisance that will diminish with faster computers and better implementations.
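To make the workflow concrete, the sketch below strings the described steps together in Python on hypothetical data (columns `exposed`, `outcome` and a set of covariates, all invented). The authors use SMC-FCS for the imputation; scikit-learn's generic IterativeImputer is only a chained-equations stand-in here, and the exposure and outcome are assumed fully observed, so this illustrates the pipeline rather than reproducing their implementation.

```python
# Sketch of the combined workflow on hypothetical data: impute missing
# covariates, fit the PS model, build SMR weights, average the estimate over
# imputations, and bootstrap the whole pipeline for a percentile CI.
# NOTE: the authors use SMC-FCS (e.g. the R package 'smcfcs'); scikit-learn's
# IterativeImputer is only a generic chained-equations stand-in here, and the
# exposure and outcome columns are assumed fully observed.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression

def smr_weighted_estimate(df, covariates):
    """One analysis pass: PS model -> SMR weights -> weighted risk difference."""
    ps = (LogisticRegression(max_iter=1000)
          .fit(df[covariates], df["exposed"])
          .predict_proba(df[covariates])[:, 1])
    # SMR weights (exposed as the standard): exposed get 1, unexposed ps/(1-ps)
    w = np.where(df["exposed"] == 1, 1.0, ps / (1.0 - ps))
    exp_mask = (df["exposed"] == 1).to_numpy()
    p1 = np.average(df["outcome"].to_numpy()[exp_mask], weights=w[exp_mask])
    p0 = np.average(df["outcome"].to_numpy()[~exp_mask], weights=w[~exp_mask])
    return p1 - p0

def mi_estimate(df, covariates, n_imputations=10, max_iter=40):
    """Multiply impute missing covariates and average the point estimates."""
    estimates = []
    for m in range(n_imputations):
        imputer = IterativeImputer(max_iter=max_iter, sample_posterior=True,
                                   random_state=m)
        filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
        filled[["exposed", "outcome"]] = df[["exposed", "outcome"]].to_numpy()
        estimates.append(smr_weighted_estimate(filled, covariates))
    return float(np.mean(estimates))

def bootstrap_ci(df, covariates, n_boot=999, alpha=0.05):
    """Percentile CI: resample, then rerun imputation and weighting each time."""
    boot = [mi_estimate(df.sample(frac=1, replace=True, random_state=b),
                        covariates)
            for b in range(n_boot)]
    return np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```

With 10 imputations of 40 iterations each nested inside 999 bootstrap replicates, as in the application above, the cost of this nested loop is what drives a runtime on the order of 36 hours.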


2015 ◽  
Vol 26 (4) ◽  
pp. 1824-1837 ◽  
Author(s):  
Yenny Webb-Vargas ◽  
Kara E Rudolph ◽  
David Lenis ◽  
Peter Murakami ◽  
Elizabeth A Stuart

Although covariate measurement error is likely the norm rather than the exception, methods for handling it in propensity score analyses have not been widely investigated. We consider a multiple imputation-based approach, multiple imputation for external calibration, that uses an external calibration sample with information on the true and mismeasured covariates to correct for the measurement error, and we investigate its performance using simulation studies. As expected, using the covariate measured with error leads to bias in the treatment effect estimate. In contrast, the multiple imputation for external calibration method can eliminate almost all of the bias. We confirm that the outcome must be used in the imputation process to obtain good results, a finding related to the idea of congenial imputation and analysis in the broader multiple imputation literature. We illustrate the multiple imputation for external calibration approach using a motivating example estimating the effects of living in a disadvantaged neighborhood on mental health and substance use outcomes among adolescents. These results show that estimating the propensity score using covariates measured with error leads to biased estimates of treatment effects, but when a calibration data set is available, multiple imputation for external calibration can be used to help correct for such bias.
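A rough sketch of the idea, with invented variable names and a deliberately simplified imputation model: the calibration sample contains both the true covariate and its error-prone measurement, an imputation model for the true covariate (including the outcome, as stressed above) is fitted there, the true covariate is multiply imputed in the main sample, and the propensity score is estimated on the imputed values. A proper implementation would also draw the imputation-model parameters from their posterior in each imputation.

```python
# Hypothetical sketch of multiple imputation for external calibration (MI-EC):
# 'calib' holds both the true covariate x_true and its error-prone version w;
# 'main' holds only w, the treatment a and the outcome y. The outcome is
# included in the imputation model, as the abstract emphasizes.
import numpy as np
import statsmodels.api as sm

def mi_ec_effect(main, calib, n_imputations=20, seed=0):
    rng = np.random.default_rng(seed)
    # Imputation model fitted on the calibration sample: x_true ~ w + y
    design_c = sm.add_constant(np.column_stack([calib["w"], calib["y"]]))
    imp_fit = sm.OLS(calib["x_true"], design_c).fit()
    sigma = np.sqrt(imp_fit.scale)

    design_m = sm.add_constant(np.column_stack([main["w"], main["y"]]))
    y, a = np.asarray(main["y"]), np.asarray(main["a"])
    effects = []
    for _ in range(n_imputations):
        # Draw the true covariate for the main sample (a proper MI-EC would
        # also draw the imputation-model parameters from their posterior)
        x_imp = design_m @ imp_fit.params + rng.normal(0.0, sigma, size=len(y))
        # Propensity score estimated on the imputed true covariate
        ps_model = sm.Logit(a, sm.add_constant(x_imp)).fit(disp=0)
        ps = ps_model.predict(sm.add_constant(x_imp))
        w_iptw = np.where(a == 1, 1.0 / ps, 1.0 / (1.0 - ps))
        # Simple weighted mean difference as the treatment-effect estimate
        effects.append(np.average(y[a == 1], weights=w_iptw[a == 1])
                       - np.average(y[a == 0], weights=w_iptw[a == 0]))
    return np.mean(effects)
```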


2008 ◽  
Vol 47 (05) ◽  
pp. 448-453 ◽  
Author(s):  
M. D. Chirlaque ◽  
C. Navarro ◽  
M. Márquez Cid

Summary Objectives: Record linkage between data sets is relatively simple when unique, universal, permanent, and common variables exist in each data set. This situation occurs infrequently; thus, there is a need to apply probabilistic methods to identify corresponding records. DataLink has been tested to determine whether the use of clustering techniques improves performance with a minimal decrease in accuracy. Methods: The study uses cancer registry data, including hospital discharge and pathology reports from two hospitals in the Murcia Region for the years 2002-2003. These data are standardized prior to running DataLink. The original version of DataLink compares all of the records one by one, whereas two later versions of the software apply clustering, filtering on one or more variables. Computing time and the proportion of detected matches were investigated for each version. Results: The clustering versions achieve 96.1% and 96.2% accuracy, respectively. Compared with the original, the two clustering versions reduce the computational time by 97.3% and 98.6%, respectively, while losing 0.36% and 1.07% of real duplicates. Conclusions: DataLink implements deterministic and probabilistic record linkage to eliminate duplicates and to merge new information with existing cases. The standardization of variables to a common format has been adapted to the characteristics of Spanish-language data. Clustering techniques minimize computational time and maximize accuracy in the detection of corresponding records.
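The clustering (blocking) idea can be illustrated with a short sketch; the field names and the scoring rule are invented, and DataLink's actual probabilistic weights are not reproduced. Records are grouped by one or more blocking variables and only pairs within the same group are compared, which is what trades a small fraction of missed duplicates for the large reduction in computing time reported above.

```python
# Hypothetical sketch of blocking-based linkage: instead of the all-pairs
# comparison of the original DataLink version, records are grouped by one or
# more blocking variables and only within-block pairs are compared.
from collections import defaultdict
from itertools import combinations

def candidate_pairs(records, block_keys):
    """records: list of dicts; block_keys: field names used for blocking."""
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[tuple(rec.get(k) for k in block_keys)].append(i)
    for ids in blocks.values():
        yield from combinations(ids, 2)

def similarity(a, b, fields=("surname", "given_name", "birth_date")):
    """Crude agreement count; a real linker uses weighted probabilistic scores."""
    return sum(a.get(f) == b.get(f) and a.get(f) is not None for f in fields)

def link(records, block_keys=("birth_year",), threshold=2):
    """Return index pairs whose within-block similarity meets the threshold."""
    return [(i, j) for i, j in candidate_pairs(records, block_keys)
            if similarity(records[i], records[j]) >= threshold]
```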


Author(s):  
A Salman Avestimehr ◽  
Seyed Mohammadreza Mousavi Kalan ◽  
Mahdi Soltanolkotabi

Abstract Dealing with the sheer size and complexity of today’s massive data sets requires computational platforms that can analyze data in a parallelized and distributed fashion. A major bottleneck that arises in such modern distributed computing environments is that some of the worker nodes may run slow. These nodes, a.k.a. stragglers, can significantly slow down computation, as the slowest node may dictate the overall computational time. A recent computational framework, called encoded optimization, creates redundancy in the data to mitigate the effect of stragglers. In this paper, we develop a novel mathematical understanding of this framework, demonstrating its effectiveness in much broader settings than was previously understood. We also analyze the convergence behavior of iterative encoded optimization algorithms, allowing us to characterize fundamental trade-offs between convergence rate, size of data set, accuracy, computational load (or data redundancy) and straggler toleration in this framework.
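A toy numpy simulation of the basic mechanism, under invented dimensions and a plain random Gaussian encoding (not the paper's analysis or guarantees): the data are premultiplied by a redundant encoding matrix, the encoded rows are split across workers, and each gradient step uses only the blocks from the workers that respond first, so stragglers are simply dropped.

```python
# Toy simulation of encoded optimization for least squares min_x ||Ax - b||^2:
# encode (A, b) with a redundant random matrix S, split the encoded rows across
# workers, and at every iteration aggregate gradients only from the workers
# that respond first, ignoring the stragglers.
import numpy as np

rng = np.random.default_rng(1)
n, d, workers, straggler_frac = 2000, 20, 10, 0.3

A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

# Redundant encoding: S has twice as many rows as A, so losing the blocks of
# a few straggling workers still leaves a well-conditioned encoded system.
redundancy = 2.0
m = int(redundancy * n)
S = rng.normal(size=(m, n)) / np.sqrt(m)
A_enc, b_enc = S @ A, S @ b
blocks = np.array_split(np.arange(m), workers)

x = np.zeros(d)
step = 0.5
for it in range(200):
    # Workers that respond this round; the rest are treated as stragglers
    fast = rng.choice(workers, size=int((1 - straggler_frac) * workers),
                      replace=False)
    rows = np.concatenate([blocks[w] for w in fast])
    grad = A_enc[rows].T @ (A_enc[rows] @ x - b_enc[rows]) / len(rows)
    x -= step * grad

print("error vs. true solution:", np.linalg.norm(x - x_true))
```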


2021 ◽  
Vol 83 ◽  
pp. 56-62
Author(s):  
Beth Ann Griffin ◽  
Marika Suttorp Booth ◽  
Monica Busse ◽  
Edward J. Wild ◽  
Claude Setodji ◽  
...  

Author(s):  
Kazuhiko Kido ◽  
Christopher Bianco ◽  
Marco Caccamo ◽  
Wei Fang ◽  
George Sokos

Background: Only limited data are available that address the association between body mass index (BMI) and clinical outcomes in patients with heart failure with reduced ejection fraction who are receiving sacubitril/valsartan. Methods: We performed a retrospective multi-center cohort study comparing 3 BMI groups (normal weight, overweight, and obese) in patients with heart failure with reduced ejection fraction receiving sacubitril/valsartan. The follow-up period was at least 1 year. Propensity score weighting was performed. The primary outcomes were hospitalization for heart failure and all-cause mortality. Results: Of the 721 patients in the original cohort, propensity score weighting generated a cohort of 540 patients in 3 groups: normal weight (n = 78), overweight (n = 181), and obese (n = 281). All baseline characteristics were well balanced among the 3 groups after propensity score weighting. We found no significant differences in hospitalization for heart failure (normal weight versus overweight: average hazard ratio [AHR] 1.29, 95% confidence interval [CI] = 0.76-2.20, P = 0.35; normal weight versus obese: AHR 1.04, 95% CI = 0.63-1.70, P = 0.88; overweight versus obese: AHR 0.81, 95% CI = 0.54-1.20, P = 0.29) or all-cause mortality (normal weight versus overweight: AHR 0.99, 95% CI = 0.59-1.67, P = 0.97; normal weight versus obese: AHR 0.87, 95% CI = 0.53-1.42, P = 0.57; overweight versus obese: AHR 0.87, 95% CI = 0.58-1.32, P = 0.52). Conclusion: We identified no significant associations between BMI and clinical outcomes in patients with heart failure with reduced ejection fraction treated with sacubitril/valsartan. A large-scale study should be performed to verify these results.
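A minimal sketch of how a propensity-score-weighted hazard ratio for two of these BMI groups might be computed, with invented column names and a plain inverse-probability-weighted Cox model from the lifelines package; the study's actual weighting scheme and its average hazard ratio estimator may differ from this illustration.

```python
# Hypothetical sketch: PS-weighted hazard ratio for, e.g., normal weight vs.
# overweight, using invented column names. A plain weighted Cox model stands in
# for the average hazard ratio estimator reported in the abstract.
import numpy as np
from lifelines import CoxPHFitter
from sklearn.linear_model import LogisticRegression

def weighted_hr(df, covariates, group_col="overweight",
                time_col="time_to_hf_hosp", event_col="hf_hosp"):
    # Propensity of belonging to the comparison group given baseline covariates
    ps = (LogisticRegression(max_iter=1000)
          .fit(df[covariates], df[group_col])
          .predict_proba(df[covariates])[:, 1])
    # Inverse-probability-of-treatment weights
    df = df.assign(iptw=np.where(df[group_col] == 1, 1.0 / ps, 1.0 / (1.0 - ps)))
    cph = CoxPHFitter()
    cph.fit(df[[time_col, event_col, group_col, "iptw"]],
            duration_col=time_col, event_col=event_col,
            weights_col="iptw", robust=True)
    # Hazard ratio and its confidence interval on the log scale
    return np.exp(cph.params_[group_col]), cph.confidence_intervals_
```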


Aerospace ◽  
2021 ◽  
Vol 8 (5) ◽  
pp. 138
Author(s):  
Giuseppe Gallo ◽  
Adriano Isoldi ◽  
Dario Del Gatto ◽  
Raffaele Savino ◽  
Amedeo Capozzoli ◽  
...  

The present work is focused on a detailed description of an in-house particle-in-cell (PIC) code developed by the authors, whose main aim is to perform highly accurate plasma simulations on an off-the-shelf computing platform in a relatively short computational time, despite the large number of macroparticles employed in the computation. A smart strategy to set up the code is proposed, and in particular, parallel calculation on the GPU is explored as a possible solution for the reduction in computing time. An application to a Hall-effect thruster is shown to validate the PIC numerical model and to highlight the strengths of introducing highly accurate schemes for the electric field interpolation and the macroparticle trajectory integration in time. A further application to a helicon double-layer thruster is presented, in which the PIC code is used as a fast tool to analyze the performance of these specific electric motors.
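A minimal 1D numpy sketch of the two kernels highlighted above, field interpolation at the macroparticle positions and trajectory integration in time, here with simple linear (cloud-in-cell) weighting and a leapfrog push on a periodic domain; the in-house code described in the abstract uses higher-accuracy schemes and runs these loops on the GPU.

```python
# Minimal 1D sketch of two PIC kernels: gather the grid electric field at each
# macroparticle with linear (cloud-in-cell) weights, then advance the particles
# with a leapfrog step. Domain is assumed periodic; units are arbitrary.
import numpy as np

def gather_field(x_p, E_grid, dx):
    """Linear interpolation of the grid field E_grid to particle positions x_p."""
    idx = np.floor(x_p / dx).astype(int)
    frac = x_p / dx - idx
    idx_right = (idx + 1) % E_grid.size          # periodic wrap of the grid
    return (1.0 - frac) * E_grid[idx] + frac * E_grid[idx_right]

def leapfrog_push(x_p, v_p, E_grid, dx, dt, q_over_m=-1.0):
    """One leapfrog step: update half-step-offset velocities, then positions."""
    v_p = v_p + q_over_m * gather_field(x_p, E_grid, dx) * dt
    x_p = (x_p + v_p * dt) % (E_grid.size * dx)   # periodic wrap of the domain
    return x_p, v_p
```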


Author(s):  
Ajay Jasra ◽  
Maria De Iorio ◽  
Marc Chadeau-Hyam

In this paper, we consider a simulation technique for stochastic trees. One of the most important areas in computational genetics is the calculation and subsequent maximization of the likelihood function associated with such models, which typically relies on importance sampling and sequential Monte Carlo techniques. The approach proceeds by simulating the tree backward in time, from the observed data to the most recent common ancestor. However, in many cases, the computational time and the variance of the estimators are too high for standard approaches to be useful. We propose instead to stop the simulation before the most recent common ancestor is reached, which yields biased estimates of the likelihood surface. The bias is investigated from a theoretical point of view. Results from simulation studies are also given to investigate the balance between loss of accuracy, savings in computing time and variance reduction.
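A toy numpy illustration of the trade-off described above, under an entirely invented model (coalescent-style waiting times with a Gaussian observation of the tree height), not the authors' construction: the genealogy is simulated backward from a proposal and reweighted toward the target, and stopping before the most recent common ancestor replaces the unsimulated part of the tree with a cheap approximation, trading bias for less computing time and lower variance.

```python
# Toy illustration: importance-sampling estimate of p(y_obs | theta) where the
# latent coalescent waiting times are drawn under theta0 and reweighted toward
# theta. Setting stop_at > 1 stops the backward simulation early and plugs in
# expected waiting times for the unsimulated events (biased, but cheaper).
import numpy as np

rng = np.random.default_rng(0)
n, theta, theta0, sigma, y_obs = 20, 1.0, 1.5, 0.5, 3.0

def coalescent_rates(k_values, theta):
    return k_values * (k_values - 1) / 2.0 / theta

def likelihood_estimate(n_particles=5000, stop_at=1):
    ks = np.arange(n, stop_at, -1)                  # lineage counts simulated
    rates_q = coalescent_rates(ks, theta0)          # proposal rates
    rates_p = coalescent_rates(ks, theta)           # target rates
    # Waiting times for each particle, drawn under the proposal
    T = rng.exponential(1.0 / rates_q, size=(n_particles, ks.size))
    # Importance weight: ratio of exponential densities, product over events
    log_w = np.sum(np.log(rates_p) - np.log(rates_q)
                   - (rates_p - rates_q) * T, axis=1)
    # Tree height: simulated part plus (if stopped early) expected remainder
    ks_rest = np.arange(stop_at, 1, -1)
    height = T.sum(axis=1) + np.sum(1.0 / coalescent_rates(ks_rest, theta0))
    log_lik_y = (-0.5 * ((y_obs - height) / sigma) ** 2
                 - np.log(sigma * np.sqrt(2 * np.pi)))
    return np.mean(np.exp(log_lik_y + log_w))

print("full simulation :", likelihood_estimate(stop_at=1))
print("stopped early   :", likelihood_estimate(stop_at=10))
```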

