Targeting Key Survey Variables at the Unit Nonresponse Treatment Stage

Author(s):
David Haziza
Sixia Chen
Yimeng Gao

Abstract In the presence of nonresponse, unadjusted estimators are vulnerable to nonresponse bias when the characteristics of the respondents differ from those of the nonrespondents. To reduce this bias, it is common practice to postulate a nonresponse model linking the response indicators to a set of fully observed variables. Estimated response probabilities are obtained by fitting the selected model and are then used to adjust the base weights. The resulting estimator, referred to as the propensity score-adjusted estimator, is consistent provided the nonresponse model is correctly specified. In this article, we propose a weighting procedure that may improve the efficiency of propensity score estimators for survey variables identified as key variables by making more extensive use of the auxiliary information available at the nonresponse treatment stage. Results from a simulation study suggest that the proposed procedure performs well in terms of efficiency when the data are missing at random and also achieves an effective bias reduction when the data are not missing at random. We further apply the proposed methods to the 2017–2018 National Health and Nutrition Examination Survey.
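As a rough illustration of the propensity score-adjusted estimator described above, the following is a minimal sketch with simulated data and a simple logistic response model (not the paper's proposed procedure, which makes fuller use of auxiliary information): estimated response probabilities are obtained from the fitted model, and each respondent's base weight is divided by its estimated probability.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                      # fully observed auxiliary variable
y = 2.0 + 1.5 * x + rng.normal(size=n)      # survey variable (seen for respondents only)
p_resp = 1 / (1 + np.exp(-(0.5 + x)))       # true response probabilities (MAR given x)
r = rng.uniform(size=n) < p_resp            # response indicators
d = np.full(n, 1.0)                         # base (design) weights; SRS for simplicity

# Fit the logistic response model (intercept + x) by Newton-Raphson.
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    phat = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (r - phat)                 # score of the Bernoulli log-likelihood
    hess = -(X * (phat * (1 - phat))[:, None]).T @ X
    beta -= np.linalg.solve(hess, grad)

phat = 1 / (1 + np.exp(-X @ beta))
w = d[r] / phat[r]                          # propensity score-adjusted weights
psa_mean = np.sum(w * y[r]) / np.sum(w)     # adjusted estimator of the population mean
naive_mean = y[r].mean()                    # unadjusted respondent mean (biased here)
```

With the response probability increasing in x, the unadjusted respondent mean overstates the true mean of 2.0, while the propensity score-adjusted estimator largely removes that bias.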

2018, Vol 34 (1), pp. 107-120
Author(s):
Phillip S. Kott
Dan Liao

Abstract When adjusting for unit nonresponse in a survey, it is common to assume that the response/nonresponse mechanism is a function of variables known either for the entire sample before unit response or at the aggregate level for the frame or population. Often, however, some of the variables governing the response/nonresponse mechanism can only be proxied by variables on the frame while they are measured (more) accurately on the survey itself. For example, an address-based sampling frame may contain area-level estimates for the median annual income and the fraction of home ownership in a Census block group, while a household's annual income category and ownership status are reported on the survey itself for the housing units responding to the survey. A relatively new calibration-weighting technique allows a statistician to calibrate the sample using proxy variables while assuming the response/nonresponse mechanism is a function of the analogous survey variables. We will demonstrate how this can be done with data from the Residential Energy Consumption Survey National Pilot, a nationally representative web-and-mail survey of American households sponsored by the U.S. Energy Information Administration.
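To make the calibration-weighting idea concrete, here is a minimal sketch of generic linear (GREG-type) calibration, not the proxy-variable technique of the paper itself: respondent weights are adjusted so that weighted totals of auxiliary variables reproduce known frame totals. All variable names and the simple-random-sample setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5000
z = rng.normal(1.0, 0.5, size=N)            # frame-level auxiliary (e.g. an area-level proxy)
pop_totals = np.array([N, z.sum()])         # known totals: population count and total of z

# Pretend an equal-probability sample of n units with base weight N/n.
n = 400
idx = rng.choice(N, size=n, replace=False)
d = np.full(n, N / n)
X = np.column_stack([np.ones(n), z[idx]])   # calibration variables for the sampled units

# Linear calibration: w_i = d_i * (1 + x_i' lam), with lam chosen so that
# the weighted sample totals of X equal the known population totals.
T = (X * d[:, None]).T @ X
lam = np.linalg.solve(T, pop_totals - d @ X)
w = d * (1 + X @ lam)

calibrated_totals = w @ X                   # matches pop_totals exactly by construction
```

The calibration constraints hold exactly (up to floating point), since w @ X = d @ X + T @ lam = pop_totals; the technique in the paper replaces the calibration variables with frame proxies while modelling response as a function of the analogous survey variables.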


2017, Vol 33 (3), pp. 709-734
Author(s):
Carl-Erik Särndal
Peter Lundquist

Abstract One objective of adaptive data collection is to secure a better balanced survey response. Methods exist for this purpose, including balancing with respect to selected auxiliary variables. Such variables are also used at the estimation stage for (calibrated) nonresponse weighting adjustment. Earlier research has shown that the use of auxiliary information at the estimation stage can reduce bias, perhaps considerably, but without eliminating it. The question is: would it have contributed further to bias reduction if, prior to estimation, that information had also been used in data collection to secure a more balanced set of respondents? If the answer is yes, there is a clear incentive, from the point of view of better accuracy in the estimates, to practice adaptive survey design; otherwise perhaps not. A key question is how the regression relationship between the survey variable and the auxiliary vector presents itself in the sample as opposed to the response. Strength in the relationship is helpful but is not the only consideration. The dilemma with nonresponse is one of inconsistent regression: a regression model appropriate for the sample often fails for the responding subset, because nonresponse is selective, non-random. In this article, we examine how nonresponse bias in survey estimates depends on regression inconsistency, both seen as functions of response imbalance. As a measure of bias we use the deviation of the calibration-adjusted estimator from the unbiased estimate under full response. We study how the deviation and the regression inconsistency depend on the imbalance. We observe in empirical work that both can be reduced, to a degree, by efforts to reduce imbalance through adaptive data collection.
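One simple way to quantify response imbalance (our own simplification, not the authors' exact indicator) is the distance between the design-weighted mean of the auxiliary vector over the respondents and over the full sample; a perfectly balanced response set reproduces the sample mean, making the distance zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = np.column_stack([np.ones(n), rng.normal(size=n)])   # auxiliary vector (with a constant)
d = np.full(n, 5.0)                                     # design weights
r = rng.uniform(size=n) < 1 / (1 + np.exp(-x[:, 1]))    # selective, non-random response

xbar_s = (d[:, None] * x).sum(axis=0) / d.sum()         # full-sample weighted mean
xbar_r = (d[r, None] * x[r]).sum(axis=0) / d[r].sum()   # respondent weighted mean

imbalance = np.linalg.norm(xbar_r - xbar_s)             # one simple distance measure
```

Because response here depends on the second auxiliary, the respondent mean drifts away from the sample mean and the imbalance is clearly positive; adaptive data collection aims to drive this quantity toward zero before estimation.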


2017, Vol 18 (2), pp. 113-128
Author(s):
Juho Kopra
Juha Karvanen
Tommi Härkänen

In epidemiological surveys, data missing not at random (MNAR) due to survey nonresponse may lead to bias in the risk factor estimates. We propose an approach based on Bayesian data augmentation and survival modelling to reduce the nonresponse bias. The approach requires additional information based on follow-up data. We present a case study of smoking prevalence using FINRISK data collected between 1972 and 2007, with follow-up to the end of 2012, and compare it to other commonly applied missing at random (MAR) imputation approaches. A simulation experiment is carried out to study the validity of the approaches. Our approach appears to reduce the nonresponse bias substantially, whereas MAR imputation was not successful in bias reduction.
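A toy illustration of why MAR methods fail in this setting (our own simplification, not the authors' Bayesian survival approach): if smokers respond less often than non-smokers, the missingness depends on the unobserved outcome itself, and any adjustment using only observed covariates, such as a MAR imputation, leaves the respondent-based prevalence biased downward.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20000
smoker = rng.uniform(size=n) < 0.3           # true smoking prevalence: 30%
p_resp = np.where(smoker, 0.4, 0.7)          # MNAR: response depends on smoking itself
resp = rng.uniform(size=n) < p_resp          # who actually answers the survey

true_prev = smoker.mean()                    # target quantity
naive_prev = smoker[resp].mean()             # respondent prevalence, biased low
```

With no covariates predictive of smoking available, MAR imputation simply reproduces the respondent distribution, so the bias in `naive_prev` persists; the paper's remedy is to bring in follow-up (survival) information that is related to the unobserved risk factor.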


2008, Vol 17 (6), pp. 546-555
Author(s):
Soko Setoguchi
Sebastian Schneeweiss
M. Alan Brookhart
Robert J. Glynn
E. Francis Cook

2018, Vol 28 (12), pp. 3534-3549
Author(s):
Arman Alam Siddique
Mireille E Schnitzer
Asma Bahamyirou
Guanbo Wang
Timothy H Holtz
...

This paper investigates different approaches for causal estimation under multiple concurrent medications. Our parameter of interest is the marginal mean counterfactual outcome under different combinations of medications. We explore parametric and non-parametric methods to estimate the generalized propensity score. We then apply three causal estimation approaches (inverse probability of treatment weighting, propensity score adjustment, and targeted maximum likelihood estimation) to estimate the causal parameter of interest. Focusing on the estimation of the expected outcome under the most prevalent regimens, we compare the results obtained using these methods in a simulation study with four potentially concurrent medications. We perform a second simulation study in which some combinations of medications may occur rarely or not occur at all in the dataset. Finally, we apply the methods explored to contrast the probability of patient treatment success for the most prevalent regimens of antimicrobial agents for patients with multidrug-resistant pulmonary tuberculosis.
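As a minimal sketch of the first of the three estimation approaches, inverse probability of treatment weighting (IPTW), here for a single binary treatment rather than the paper's combinations of concurrent medications: units are weighted by the inverse of their (here, true; in practice estimated) propensity score to recover the marginal mean counterfactual outcome.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
c = rng.normal(size=n)                       # confounder
p_a = 1 / (1 + np.exp(-c))                   # true propensity score P(A=1 | C)
a = rng.uniform(size=n) < p_a                # treatment assignment
y = 1.0 + 2.0 * a + c + rng.normal(size=n)   # outcome; true E[Y(a=1)] = 3.0

# IPTW (Hajek form) for the marginal mean under treatment a = 1; the true
# propensity score is used here, whereas the paper estimates it with
# parametric and non-parametric methods.
w1 = a / p_a
iptw_mean_treated = np.sum(w1 * y) / np.sum(w1)
naive_mean_treated = y[a].mean()             # confounded comparison, biased upward
```

Because the confounder raises both treatment probability and the outcome, the unweighted mean among the treated overstates E[Y(a=1)] = 3.0, while the IPTW estimator corrects for the imbalance.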


Author(s):
Tra My Pham
Irene Petersen
James Carpenter
Tim Morris

Abstract
Background: Ethnicity is an important factor to be considered in health research because of its association with inequality in disease prevalence and the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and hence is available in large UK primary care databases such as The Health Improvement Network (THIN). However, since primary care data are routinely collected for clinical purposes, a large amount of data that are relevant for research, including ethnicity, is often missing. A popular approach for missing data is multiple imputation (MI). However, the conventional MI method assuming data are missing at random does not give plausible estimates of the ethnicity distribution in THIN compared to the general UK population. This might be due to the fact that ethnicity data in primary care are likely to be missing not at random.
Objectives: I propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables.
Methods: Weighted MI combines MI and probability weights which are calculated using external data sources. Census summary statistics for ethnicity can be used to form weights in weighted MI such that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study, which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared to the conventional MI and other traditional missing data methods, including complete case analysis and single imputation.
Results: While a small bias was still present in ethnicity coefficient estimates under weighted MI, it was less severe compared to MI assuming missing at random. Complete case analysis and single imputation were inadequate to handle data that are missing not at random in ethnicity.
Conclusions: Although not a total cure, weighted MI represents a pragmatic approach that has potential applications not only in ethnicity but also in other incomplete categorical health indicators in electronic health records.
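A schematic toy version of the weighting idea behind weighted MI (our own single-imputation simplification, not the thesis's full procedure): draws for the missing categorical variable are taken not from the observed-data distribution, as a MAR-style imputation would, but from that distribution reweighted so that the completed data recover a known external (census) marginal breakdown.

```python
import numpy as np

rng = np.random.default_rng(4)
census_margin = np.array([0.6, 0.3, 0.1])      # known population breakdown (3 categories)

n = 10000
true_eth = rng.choice(3, size=n, p=census_margin)
# MNAR missingness: the minority category is recorded far less often.
p_obs = np.array([0.8, 0.8, 0.2])[true_eth]
obs = rng.uniform(size=n) < p_obs

obs_dist = np.bincount(true_eth[obs], minlength=3) / obs.sum()
obs_share = obs.mean()

# Choose draw probabilities q for the missing cases so that
# obs_share * obs_dist + (1 - obs_share) * q = census_margin.
q = (census_margin - obs_share * obs_dist) / (1 - obs_share)
q = np.clip(q, 0, None); q /= q.sum()          # guard against tiny negatives

imputed = true_eth.copy()
imputed[~obs] = rng.choice(3, size=(~obs).sum(), p=q)

completed_dist = np.bincount(imputed, minlength=3) / n   # tracks the census margin
mar_dist = obs_dist                            # what a MAR-style imputation would target
```

The completed data reproduce the census breakdown up to sampling noise, whereas the observed-data distribution badly understates the under-recorded category, which is exactly the implausibility of conventional MAR imputation noted above.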

