Robust Small Area Estimation: An Overview

2020 ◽  
Vol 7 (1) ◽  
pp. 337-360
Author(s):  
Jiming Jiang ◽  
J. Sunil Rao

A small area typically refers to a subpopulation or domain of interest for which a reliable direct estimate, based only on the domain-specific sample, cannot be produced due to small sample size in the domain. While traditional small area methods and models are widely used nowadays, there has also been much work and interest in robust statistical inference for small area estimation (SAE). We survey this work and provide a comprehensive review here. We begin with a brief review of the traditional SAE methods. We then discuss SAE methods that are developed under weaker assumptions and SAE methods that are robust in certain ways, such as in terms of outliers or model failure. Our discussion also includes topics such as nonparametric SAE methods, Bayesian approaches, model selection and diagnostics, and missing data. A brief review of software packages available for implementing robust SAE methods is also given.

Author(s):  
Benmei Liu ◽  
Isaac Dompreh ◽  
Anne M Hartman

Background: The workplace and home are sources of exposure to secondhand smoke (SHS), a serious health hazard for nonsmoking adults and children. Smoke-free workplace policies and home rules protect nonsmoking individuals from SHS and help individuals who smoke to quit smoking. However, estimated population coverages of smoke-free workplace policies and home rules are not typically available at small geographic levels such as counties. Model-based small area estimation techniques are needed to produce such estimates. Methods: Self-reported smoke-free workplace policies and home rules data came from the 2014-2015 Tobacco Use Supplement to the Current Population Survey. County-level design-based estimates of the two measures were computed and linked to county-level relevant covariates obtained from external sources. Hierarchical Bayesian models were then built and implemented through Markov Chain Monte Carlo methods. Results: Model-based estimates of smoke-free workplace policies and home rules were produced for 3,134 (out of 3,143) U.S. counties. In 2014-2015, nearly 80% of U.S. adult workers were covered by smoke-free workplace policies, and more than 85% of U.S. adults were covered by smoke-free home rules. We found large variations within and between states in the coverage of smoke-free workplace policies and home rules. Conclusions: The small-area modeling approach efficiently reduced the variability that was attributable to small sample size in the direct estimates for counties with data and predicted estimates for counties without data by borrowing strength from covariates and other counties with similar profiles. The county-level modeled estimates can serve as a useful resource for tobacco control research and intervention. Implications: Detailed county- and state-level estimates of smoke-free workplace policies and home rules can help identify coverage disparities and differential impact of smoke-free legislation and related social norms. Moreover, this estimation framework can be useful for modeling different tobacco control variables and applied elsewhere, e.g., to other behavioral, policy, or health related topics.


2017 ◽  
Vol 18 (1) ◽  
pp. 1
Author(s):  
Frida Murtinasari ◽  
Alfian Futuhul Hadi ◽  
Dian Anggraeni

Small Area Estimation (SAE) is often used by researchers, especially statisticians, to estimate parameters of a subpopulation that has a small sample size. Empirical Best Linear Unbiased Prediction (EBLUP) is one of the indirect estimation methods in SAE. When outliers are present in the data, these methods are not guaranteed to yield precise predictions. Robust regression is one approach used in SAE models, and applying it to small area estimation is known as Robust Small Area Estimation. Robust Small Area Estimation is divided into several approaches, namely Maximum Likelihood and M-Estimation. In the results, Robust Small Area Estimation with M-Estimation has the smallest RMSE among the methods compared: 1473.7 (with outliers) and 1279.6 (without outliers). The research also indicated that REBLUP with M-Estimation is more robust to outliers: the RMSE of EBLUP became five times as large when only one outlier was included in the data analysis, whereas the REBLUP method gave relatively more stable RMSE results.
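
To make the outlier sensitivity described above concrete, here is a minimal sketch of a Huber M-estimate of location, computed by iteratively reweighted averaging and compared with the ordinary mean. It is not the REBLUP procedure from the paper; the function name `huber_location`, the tuning constant `c=1.345`, and the data values are all illustrative assumptions.

```python
import numpy as np

def huber_location(y, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging.

    Illustrative helper (not from the paper). c is the usual Huber tuning
    constant; the scale is fixed at the normal-consistent MAD.
    """
    y = np.asarray(y, dtype=float)
    mu = np.median(y)
    # Robust scale: median absolute deviation, rescaled for normal data.
    scale = 1.4826 * np.median(np.abs(y - mu))
    if scale == 0:
        return mu
    for _ in range(max_iter):
        r = np.abs(y - mu) / scale
        # Huber weights: 1 for small residuals, c/|r| for large ones.
        w = np.where(r <= c, 1.0, c / np.maximum(r, 1e-12))
        new_mu = np.sum(w * y) / np.sum(w)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu

# Made-up data: eight well-behaved values plus one gross outlier.
clean = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3])
contaminated = np.append(clean, 100.0)

mean_shift = abs(contaminated.mean() - clean.mean())
huber_shift = abs(huber_location(contaminated) - huber_location(clean))
print(f"shift in mean: {mean_shift:.3f}, shift in Huber estimate: {huber_shift:.3f}")
```

A single outlier moves the mean far more than the M-estimate, which is the same qualitative behaviour the abstract reports for EBLUP versus REBLUP.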


2021 ◽  
Vol 5 (1) ◽  
pp. 50-60
Author(s):  
Naima Rakhsyanda ◽  
Kusman Sadik ◽  
Indahwati Indahwati

Small area estimation can be used to predict population parameters when sample sizes are small. In some cases, population units that are spatially close may be more related than units that are further apart. This research studies the use of spatial information such as geographic coordinates. Outlier contamination can affect small area estimates. This study was conducted using simulation methods on generated data with six scenarios. The scenarios are the combination of spatial effects (spatial stationary and spatial non-stationary) with outlier contamination (no outlier, symmetric outliers, and non-symmetric outliers). The purpose of this study was to compare the geographically weighted empirical best linear unbiased predictor (GWEBLUP) and robust GWEBLUP (RGWEBLUP) with the direct estimator, EBLUP, and REBLUP using simulation data. The performance of the predictors is evaluated using relative root mean squared error (RRMSE). The simulation results showed that geographically weighted predictors have the smallest RRMSE values for scenarios with spatial non-stationarity and therefore offer better prediction. For scenarios with outliers, robust predictors have smaller RRMSE values and are more efficient than non-robust predictors.
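
The RRMSE criterion mentioned above can be sketched as follows. The abstract does not give the exact formula, so this assumes one common convention: per-area RMSE over simulation replications divided by the mean true value of that area. The function name `rrmse` and the numbers are hypothetical.

```python
import numpy as np

def rrmse(estimates, truths):
    """Relative RMSE per area (assumed convention, not taken from the paper).

    estimates, truths: arrays of shape (replications, areas).
    Returns one value per area: sqrt(mean squared error) / mean true value.
    """
    estimates = np.asarray(estimates, dtype=float)
    truths = np.asarray(truths, dtype=float)
    rmse = np.sqrt(np.mean((estimates - truths) ** 2, axis=0))
    return rmse / np.mean(truths, axis=0)

# Two replications, two areas (made-up values).
truths = np.array([[100.0, 50.0],
                   [100.0, 50.0]])
estimates = np.array([[110.0, 50.0],
                      [90.0, 50.0]])

result = rrmse(estimates, truths)
print(result)
```

Other conventions exist, for example dividing each squared error by the squared true value before averaging, so comparisons across papers should check which definition was used.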


Author(s):  
Jan Pablo Burgard ◽  
Domingo Morales ◽  
Anna-Lena Wölwer

Socioeconomic indicators play a crucial role in monitoring political actions over time and across regions. Income-based indicators such as the median income of sub-populations can provide information on the impact of measures, e.g., on poverty reduction. Regional information is usually published on an aggregated level. Due to small sample sizes, these regional aggregates are often associated with large standard errors or are missing if the region is unsampled or the estimate is simply not published. For example, if the median income of Hispanic or Latino Americans from the American Community Survey is of interest, some county-year combinations are not available. Therefore, a comparison of different counties or time-points is partly not possible. We propose a new predictor based on small area estimation techniques for aggregated data and bivariate modeling. This predictor provides empirical best predictions for the partially unavailable county-year combinations. We provide an analytical approximation to the mean squared error. The theoretical findings are backed up by a large-scale simulation study. Finally, we return to the problem of estimating the county-year estimates for the median income of Hispanic or Latino Americans and externally validate the estimates.


2017 ◽  
Vol 43 (2) ◽  
pp. 182-224
Author(s):  
Wendy Chan

Policymakers have grown increasingly interested in how experimental results may generalize to a larger population. However, recently developed propensity score–based methods are limited by small sample sizes, where the experimental study is generalized to a population that is at least 20 times larger. This is particularly problematic for methods such as subclassification by propensity score, where limited sample sizes lead to sparse strata. This article explores the potential of small area estimation methods to improve the precision of estimators in sparse strata using population data as a source of auxiliary information to borrow strength. Results from simulation studies identify the conditions under which small area estimators outperform conventional estimators and the limitations of this application to causal generalization studies.


2021 ◽  
pp. 1-21
Author(s):  
Mizanur Rahman ◽  
Deluar J. Moloy ◽  
Sifat Ar Salan

Demand for statistical estimation has increased worldwide: an estimate, or approximation, is a value that can be used for various purposes even when the input data are incomplete, uncertain, or unstable. Different estimation methods have been developed to provide the most accurate estimates, and estimation theory deals with finding estimates with good properties. The demand for small area estimation (SAE) methods has been increasing rapidly around the world because of their reliability compared to traditional direct estimation methods, especially in the case of small sample sizes. This paper mainly focuses on the comparison of several indirect small area estimation methods (post-stratified synthetic, SSD and EB estimates) with the traditional direct estimator, based on a renowned data set. The direct estimator is approximately unbiased, but the SSD and post-stratified synthetic estimators are extremely biased. To cope with this problem, we apply another model-based estimation procedure, the Empirical Bayes (EB) estimator, which is unbiased, and compare the estimators using their coefficients of variation (CV). To check the model assumptions, we used a Q-Q plot and a histogram to confirm normality, along with bivariate correlation and the Akaike information criterion (AIC). JEL classification numbers: C13, C51, C51. Keywords: Small Area Estimation, Direct Estimation, Indirect Estimation, Empirical Bayes Estimator, Poverty Mapping.
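
As a rough illustration of how an EB-type estimator is compared with a direct estimator by CV, the sketch below uses the standard area-level (Fay-Herriot type) shrinkage form with the model variance treated as known. This is an assumption for illustration, not the paper's procedure: the names `eb_shrinkage` and `cv` and all numbers are hypothetical, and the MSE shown ignores the extra uncertainty from estimating the model parameters.

```python
import numpy as np

def eb_shrinkage(direct, synthetic, sampling_var, model_var):
    """Shrink direct estimates toward synthetic (regression) estimates.

    gamma = model_var / (model_var + sampling_var) is the weight on the
    direct estimate; gamma * sampling_var is the MSE of the predictor when
    model_var and the regression coefficients are known (an idealisation).
    """
    direct = np.asarray(direct, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    sampling_var = np.asarray(sampling_var, dtype=float)
    gamma = model_var / (model_var + sampling_var)
    estimate = gamma * direct + (1.0 - gamma) * synthetic
    mse = gamma * sampling_var
    return estimate, gamma, mse

def cv(estimate, mse):
    """Coefficient of variation: root MSE divided by the estimate."""
    return np.sqrt(mse) / np.abs(estimate)

# Made-up inputs for two areas.
direct = np.array([20.0, 30.0])
synthetic = np.array([25.0, 25.0])
sampling_var = np.array([16.0, 4.0])   # area 1 has the smaller sample
model_var = 4.0

eb_est, gamma, eb_mse = eb_shrinkage(direct, synthetic, sampling_var, model_var)
cv_direct = cv(direct, sampling_var)
cv_eb = cv(eb_est, eb_mse)
print("EB estimates:", eb_est)
print("CV direct:", cv_direct, "CV EB:", cv_eb)
```

The area with the larger sampling variance gets the smaller weight on its direct estimate, which is the sense in which such estimators borrow strength for small samples.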


2021 ◽  
Author(s):  
Hukum Chandra ◽  
Saurav Guha ◽  
Meghana Desai ◽  
Saumyadipta Pyne

Achieving food security for all citizens is an important policy issue in India. While the existing data based on socio-economic surveys provide accurate estimates of food insecurity indicators at state and national level, due to small sample sizes, the surveys cannot be used directly to produce reliable estimates at the district or lower administrative levels. The availability of reliable and representative disaggregated measures of food insecurity is necessary for effective policy planning and monitoring, as food insecurity is often distributed unevenly within relatively small areas. This article explores a small area estimation (SAE) approach to derive reliable and representative estimates of food insecurity prevalence (FIP), gap (FIG), and severity (FIS) among people in different districts of the rural areas of the Eastern Indo-Gangetic Plain (EIGP) region by linking the latest round of available data from the Household Consumer Expenditure Survey collected by the National Sample Survey Office of India as well as the latest available Indian Population Census data. District-specific food insecurity indicators such as FIP, FIG, and FIS were estimated based on a recommended threshold of per capita caloric intake of 2400 kilocalories per day, as defined by the Ministry of Health and Family Welfare, Government of India. Spatial maps showing district-level inequality in the distribution of the indicators of food insecurity among the population in the EIGP region are also produced. Our disaggregated estimates can provide district-specific focused insights into food insecurity to policy analysts and decision-makers, and could thereby prove to be useful and relevant to the U.N. Sustainable Development Goal Indicator 2.1.2.
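
The abstract does not state how FIP, FIG, and FIS are computed. The sketch below assumes they follow the familiar Foster-Greer-Thorbecke (FGT) form with exponents 0, 1, and 2 applied to the relative shortfall below the 2400 kcal threshold. The function name `fgt_indicator` and the intake values are hypothetical, and survey weights are omitted for brevity.

```python
import numpy as np

def fgt_indicator(calories, threshold=2400.0, alpha=0):
    """FGT-type indicator (assumed form, unweighted).

    alpha=0: prevalence (share below the threshold)
    alpha=1: gap (mean relative shortfall)
    alpha=2: severity (mean squared relative shortfall)
    """
    calories = np.asarray(calories, dtype=float)
    insecure = calories < threshold
    # Relative shortfall, set to zero for people at or above the threshold.
    shortfall = np.clip((threshold - calories) / threshold, 0.0, None)
    return float(np.mean(insecure * shortfall ** alpha))

# Made-up per capita daily intakes (kcal) for four people in one district.
intake = [1800, 2400, 3000, 1200]

fip = fgt_indicator(intake, alpha=0)
fig = fgt_indicator(intake, alpha=1)
fis = fgt_indicator(intake, alpha=2)
print(fip, fig, fis)
```

In a small area setting these district-level quantities would be predicted from a model linking survey and census data rather than computed directly from a handful of sampled households.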


2018 ◽  
Vol 34 (2) ◽  
pp. 523-542
Author(s):  
Thomas Zimmermann ◽  
Ralf Thomas Münnich

The demand for reliable business statistics at disaggregated levels, such as industry classes, has increased considerably in recent years. Owing to small sample sizes for some of the domains, design-based methods may not provide estimates with adequate precision. Hence, model-based small area estimation techniques that increase the effective sample size by borrowing strength are needed. Business data are frequently characterised by skewed distributions, with a few large enterprises that account for the majority of the total for the variable of interest, for example turnover. Moreover, the relationship between the variable of interest and the auxiliary variables is often non-linear on the original scale. In many cases, a lognormal mixed model provides a reasonable approximation of this relationship. In this article, we extend the empirical best prediction (EBP) approach to compensate for informative sampling, by incorporating design information among the covariates via an augmented modelling approach. This gives rise to the EBP under the augmented model. We propose to select the augmenting variable based on a joint assessment of a measure of predictive accuracy and a check of the normality assumptions. Finally, we compare our approach with alternatives in a model-based simulation study under different informative sampling mechanisms.


2018 ◽  
Author(s):  
Minh Cong Nguyen ◽  
Paul Corral ◽  
Joao Pedro Azevedo ◽  
Qinghua Zhao
