Small-Area Estimation with Zero-Inflated Data – a Simulation Study

2016 ◽  
Vol 32 (4) ◽  
pp. 963-986 ◽  
Author(s):  
Sabine Krieg ◽  
Harm Jan Boonstra ◽  
Marc Smeets

Abstract Many target variables in official statistics follow a semicontinuous distribution with a mixture of zeros and continuously distributed positive values. Such variables are called zero inflated. When reliable estimates for subpopulations with small sample sizes are required, model-based small-area estimators can be used, which improve the accuracy of the estimates by borrowing information from other subpopulations. In this article, three small-area estimators are investigated. The first estimator is the EBLUP, which can be considered the most common small-area estimator and is based on a linear mixed model that assumes normal distributions. Therefore, the EBLUP is model misspecified in the case of zero-inflated variables. The other two small-area estimators are based on a model that takes zero inflation explicitly into account. Both the Bayesian and the frequentist approach are considered. These small-area estimators are compared with each other and with design-based estimation in a simulation study with zero-inflated target variables. Both a simulation with artificial data and a simulation with real data from the Dutch Household Budget Survey are carried out. It is found that the small-area estimators improve the accuracy compared to the design-based estimator. The amount of improvement strongly depends on the properties of the population and the subpopulations of interest.
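The semicontinuous structure described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' estimator: it only simulates a zero-inflated variable and splits its mean into the share of positive values times the mean of the positive values, which is the decomposition that two-part models build on. The lognormal choice for the positive part and all names and parameter values (`simulate_zero_inflated`, `p_positive`, `mu_log`, `sigma_log`) are assumptions made for illustration.

```python
import random


def simulate_zero_inflated(n, p_positive, mu_log, sigma_log, seed=0):
    """Draw n values that are 0 with probability 1 - p_positive and
    lognormal (an assumed choice for the positive part) otherwise."""
    rng = random.Random(seed)
    return [
        rng.lognormvariate(mu_log, sigma_log) if rng.random() < p_positive else 0.0
        for _ in range(n)
    ]


def two_part_summary(values):
    """Return (share of positives, mean of positives, product of the two)."""
    positives = [v for v in values if v > 0]
    share_positive = len(positives) / len(values)
    mean_positive = sum(positives) / len(positives) if positives else 0.0
    return share_positive, mean_positive, share_positive * mean_positive


# Hypothetical parameter values, chosen only for the illustration.
sample = simulate_zero_inflated(n=500, p_positive=0.3, mu_log=1.0, sigma_log=0.5)
share, mean_pos, two_part_mean = two_part_summary(sample)
overall_mean = sum(sample) / len(sample)
```

Within a single sample the product of the two parts reproduces the ordinary sample mean exactly. A model-based approach differs in that each part would be modelled separately across areas, which this sketch does not attempt.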

2018 ◽  
Vol 34 (2) ◽  
pp. 523-542 ◽  
Author(s):  
Thomas Zimmermann ◽  
Ralf Thomas Münnich

Abstract The demand for reliable business statistics at disaggregated levels, such as industry classes, has increased considerably in recent years. Owing to small sample sizes for some of the domains, design-based methods may not provide estimates with adequate precision. Hence, model-based small area estimation techniques that increase the effective sample size by borrowing strength are needed. Business data are frequently characterised by skewed distributions, with a few large enterprises that account for the majority of the total for the variable of interest, for example turnover. Moreover, the relationship between the variable of interest and the auxiliary variables is often non-linear on the original scale. In many cases, a lognormal mixed model provides a reasonable approximation of this relationship. In this article, we extend the empirical best prediction (EBP) approach to compensate for informative sampling, by incorporating design information among the covariates via an augmented modelling approach. This gives rise to the EBP under the augmented model. We propose to select the augmenting variable based on a joint assessment of a measure of predictive accuracy and a check of the normality assumptions. Finally, we compare our approach with alternatives in a model-based simulation study under different informative sampling mechanisms.
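One reason a lognormal model needs care when predicting on the original scale is the back-transformation: if log Y is normal with mean mu and variance sigma2, then E[Y] = exp(mu + sigma2 / 2), so simply exponentiating the predicted log value understates the mean. The sketch below shows only this standard identity. It is not the EBP of the abstract, which additionally involves random effects and design information, and the function names are hypothetical.

```python
import math


def naive_back_transform(mu):
    """Exponentiate the log-scale mean; this gives the median of Y, not its mean."""
    return math.exp(mu)


def lognormal_mean(mu, sigma2):
    """Mean of Y on the original scale when log Y ~ N(mu, sigma2)."""
    return math.exp(mu + sigma2 / 2.0)


# Hypothetical log-scale parameters, for illustration only.
mu, sigma2 = 2.0, 1.0
ratio = lognormal_mean(mu, sigma2) / naive_back_transform(mu)  # equals exp(sigma2 / 2)
```

The ratio between the two predictions depends only on the log-scale variance, so the understatement grows with the skewness of the data.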


2020 ◽  
Vol 18 (1) ◽  
pp. 2-22
Author(s):  
Kusman Sadik ◽  
Rahma Anisa ◽  
Euis Aqmaliyah

The most commonly used method of small area estimation (SAE) is the empirical best linear unbiased prediction method based on a linear mixed model. However, it is not appropriate in the case of a zero-inflated target variable with a mixture of zeros and continuously distributed positive values. Therefore, various model-based SAE methods for zero-inflated data have been developed, such as the frequentist approach and the Bayesian approach. Both approaches are compared with the survey regression (SR) method, which ignores the presence of zero inflation in the data. The results show that the two SAE approaches for zero-inflated data are capable of yielding more accurate area mean estimates than the SR method.


Author(s):  
María Dolores Esteban ◽  
María José Lombardía ◽  
Esther López-Vizcaíno ◽  
Domingo Morales ◽  
Agustín Pérez

Author(s):  
Benmei Liu ◽  
Isaac Dompreh ◽  
Anne M Hartman

Abstract Background The workplace and home are sources of exposure to secondhand smoke (SHS), a serious health hazard for nonsmoking adults and children. Smoke-free workplace policies and home rules protect nonsmoking individuals from SHS and help individuals who smoke to quit smoking. However, estimated population coverages of smoke-free workplace policies and home rules are not typically available at small geographic levels such as counties. Model-based small area estimation techniques are needed to produce such estimates. Methods Self-reported smoke-free workplace policies and home rules data came from the 2014-2015 Tobacco Use Supplement to the Current Population Survey. County-level design-based estimates of the two measures were computed and linked to county-level relevant covariates obtained from external sources. Hierarchical Bayesian models were then built and implemented through Markov Chain Monte Carlo methods. Results Model-based estimates of smoke-free workplace policies and home rules were produced for 3,134 (out of 3,143) U.S. counties. In 2014-2015, nearly 80% of U.S. adult workers were covered by smoke-free workplace policies, and more than 85% of U.S. adults were covered by smoke-free home rules. We found large variations within and between states in the coverage of smoke-free workplace policies and home rules. Conclusions The small-area modeling approach efficiently reduced the variability that was attributable to small sample size in the direct estimates for counties with data and predicted estimates for counties without data by borrowing strength from covariates and other counties with similar profiles. The county-level modeled estimates can serve as a useful resource for tobacco control research and intervention. Implications Detailed county- and state-level estimates of smoke-free workplace policies and home rules can help identify coverage disparities and differential impact of smoke-free legislation and related social norms. Moreover, this estimation framework can be useful for modeling different tobacco control variables and applied elsewhere, e.g., to other behavioral, policy, or health related topics.
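The idea of borrowing strength for counties with little or no data can be sketched with the textbook composite form used in area-level models: a weighted average of the direct survey estimate and a covariate-based (synthetic) prediction, with the weight shrinking toward the prediction as the sampling variance grows. This is a simplified stand-in and not the hierarchical Bayesian models fitted in the study. The function names and numeric values are assumptions, and the variances are treated as known.

```python
def shrinkage_weight(model_variance, sampling_variance):
    """Weight on the direct estimate: model variance over total variance."""
    return model_variance / (model_variance + sampling_variance)


def composite_estimate(direct, synthetic, model_variance, sampling_variance):
    """Combine a direct estimate with a synthetic (regression) prediction.

    If the area has no sample (direct is None), fall back on the
    synthetic prediction alone.
    """
    if direct is None:
        return synthetic
    gamma = shrinkage_weight(model_variance, sampling_variance)
    return gamma * direct + (1.0 - gamma) * synthetic


# Hypothetical county: direct coverage estimate 0.80, covariate-based prediction 0.60.
well_sampled = composite_estimate(0.80, 0.60, model_variance=0.01, sampling_variance=0.001)
poorly_sampled = composite_estimate(0.80, 0.60, model_variance=0.01, sampling_variance=0.1)
unsampled = composite_estimate(None, 0.60, model_variance=0.01, sampling_variance=0.1)
```

A well-sampled county keeps an estimate close to its direct value, a poorly sampled one is pulled toward the prediction, and an unsampled one receives the prediction itself.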


Test ◽  
2018 ◽  
Vol 28 (2) ◽  
pp. 565-597 ◽  
Author(s):  
Monique Graf ◽  
J. Miguel Marín ◽  
Isabel Molina

2020 ◽  
Vol 13 (4) ◽  
pp. 901-924
Author(s):  
David Buil-Gil ◽  
Angelo Moretti ◽  
Natalie Shlomo ◽  
Juanjo Medina

Abstract There is growing need for reliable survey-based small area estimates of crime and confidence in police work to design and evaluate place-based policing strategies. Crime and confidence in policing are geographically aggregated and police resources can be targeted to areas with the most problems. High levels of spatial autocorrelation in these variables allow for using spatial random effects to improve small area estimation models and estimates’ reliability. This article introduces the Spatial Empirical Best Linear Unbiased Predictor (SEBLUP), which borrows strength from neighboring areas, to place-based policing. It assesses the SEBLUP under different scenarios of number of areas and levels of spatial autocorrelation and provides an application to confidence in policing in London. The SEBLUP should be applied for place-based policing strategies when the variable’s spatial autocorrelation is medium/high, and the number of areas is large. Confidence in policing is higher in Central and West London and lower in Eastern neighborhoods.
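Whether spatial autocorrelation is "medium/high" is commonly judged with Moran's I. The sketch below computes it for area values and a binary neighbourhood matrix, as one possible way to carry out such a check. It is not taken from the article, and the toy adjacency structure (four areas in a row) and the function name are assumptions.

```python
def morans_i(values, weights):
    """Moran's I for area values and a square spatial weight matrix.

    weights[i][j] > 0 marks areas i and j as neighbours; the diagonal
    is expected to be zero.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    sum_squares = sum(d * d for d in deviations)
    return (n / total_weight) * (cross_products / sum_squares)


# Toy example: four areas in a row, each adjacent to the next one.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_gradient = morans_i([1.0, 2.0, 3.0, 4.0], chain)
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)
```

Values that change smoothly across neighbours give a positive statistic, while values that alternate between neighbours give a negative one.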

