Small area estimation of socioeconomic indicators for sampled and unsampled domains

Author(s): Jan Pablo Burgard, Domingo Morales, Anna-Lena Wölwer

Socioeconomic indicators play a crucial role in monitoring political actions over time and across regions. Income-based indicators such as the median income of subpopulations can provide information on the impact of measures, e.g., on poverty reduction. Regional information is usually published at an aggregated level. Because of small sample sizes, these regional aggregates often have large standard errors, or they are missing because the region is unsampled or the estimate is simply not published. For example, if the median income of Hispanic or Latino Americans from the American Community Survey is of interest, some county-year combinations are not available, so comparisons across counties or time points are only partly possible. We propose a new predictor based on small area estimation techniques for aggregated data and bivariate modeling. This predictor provides empirical best predictions for the county-year combinations that are unavailable. We provide an analytical approximation to the mean squared error. The theoretical findings are backed up by a large-scale simulation study. Finally, we return to the problem of estimating the county-year median income of Hispanic or Latino Americans and externally validate the estimates.

2021, Vol 5 (1), pp. 50-60
Author(s): Naima Rakhsyanda, Kusman Sadik, Indahwati Indahwati

Small area estimation can be used to predict population parameters for domains with small sample sizes. In some cases, population units that are spatially close may be more closely related than units that are farther apart. This research studies the use of spatial information such as geographic coordinates. Outlier contamination can also affect small area estimates. The study was conducted by simulation on generated data under six scenarios, which combine spatial effects (spatial stationarity and spatial non-stationarity) with outlier contamination (no outliers, symmetric outliers, and non-symmetric outliers). Its purpose was to compare the geographically weighted empirical best linear unbiased predictor (GWEBLUP) and robust GWEBLUP (RGWEBLUP) with the direct estimator, EBLUP, and REBLUP on the simulated data. The predictors were evaluated by their relative root mean squared error (RRMSE). The simulation results showed that the geographically weighted predictors have the smallest RRMSE values in the spatially non-stationary scenarios and therefore offer better predictions. In the scenarios with outliers, the robust predictors have smaller RRMSE values and are therefore more efficient than the non-robust predictors.
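The RRMSE used to rank the predictors can be computed from simulation output as sketched below. This is a minimal sketch assuming the common per-area definition RRMSE_d = 100 × sqrt(mean over replicates of (estimate − true value)²) / |true value|; the abstract does not state the exact formula, and the function name `rrmse` and the toy numbers are illustrative only.

```python
import numpy as np


def rrmse(estimates, truth):
    """Relative root mean squared error (in percent) for each area.

    estimates: array of shape (R, D), one row per simulation replicate
    truth:     array of shape (D,), true area parameters (assumed non-zero)
    """
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    # Monte Carlo MSE of each area's predictor over the R replicates
    mse = np.mean((estimates - truth) ** 2, axis=0)
    return 100.0 * np.sqrt(mse) / np.abs(truth)


# Toy comparison of two hypothetical predictors over R = 2 replicates, D = 2 areas
truth = np.array([10.0, 20.0])
pred_a = np.array([[11.0, 18.0], [9.0, 22.0]])
pred_b = np.array([[12.0, 24.0], [8.0, 16.0]])
print(rrmse(pred_a, truth))  # [10. 10.]
print(rrmse(pred_b, truth))  # [20. 20.]
```

Averaging the per-area values over areas is one way to obtain a single figure per scenario when comparing predictors.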


Author(s): Benmei Liu, Isaac Dompreh, Anne M Hartman

Background: The workplace and home are sources of exposure to secondhand smoke (SHS), a serious health hazard for nonsmoking adults and children. Smoke-free workplace policies and home rules protect nonsmoking individuals from SHS and help individuals who smoke to quit. However, estimates of the population coverage of smoke-free workplace policies and home rules are not typically available at small geographic levels such as counties. Model-based small area estimation techniques are needed to produce such estimates.
Methods: Data on self-reported smoke-free workplace policies and home rules came from the 2014-2015 Tobacco Use Supplement to the Current Population Survey. County-level design-based estimates of the two measures were computed and linked to relevant county-level covariates obtained from external sources. Hierarchical Bayesian models were then built and fitted with Markov chain Monte Carlo methods.
Results: Model-based estimates of smoke-free workplace policies and home rules were produced for 3,134 of the 3,143 U.S. counties. In 2014-2015, nearly 80% of U.S. adult workers were covered by smoke-free workplace policies, and more than 85% of U.S. adults were covered by smoke-free home rules. We found large variation within and between states in the coverage of smoke-free workplace policies and home rules.
Conclusions: By borrowing strength from covariates and from other counties with similar profiles, the small area modeling approach efficiently reduced the variability attributable to small sample sizes in the direct estimates for counties with data, and it produced predicted estimates for counties without data. The county-level modeled estimates can serve as a useful resource for tobacco control research and intervention.
Implications: Detailed county- and state-level estimates of smoke-free workplace policies and home rules can help identify coverage disparities and the differential impact of smoke-free legislation and related social norms. Moreover, this estimation framework can be used to model other tobacco control variables and can be applied elsewhere, e.g., to other behavioral, policy, or health-related topics.


2017, Vol 18 (1), pp. 1
Author(s): Frida Murtinasari, Alfian Futuhul Hadi, Dian Anggraeni

Small area estimation (SAE) is often used by researchers, especially statisticians, to estimate the parameters of a subpopulation with a small sample size. Empirical best linear unbiased prediction (EBLUP) is one of the indirect estimation methods in SAE. When the data contain outliers, these methods are not guaranteed to yield precise predictions. Robust regression is one approach used within SAE models, and SAE based on it is known as robust small area estimation. Robust small area estimation comprises several approaches, including maximum likelihood and M-estimation. In the results, robust small area estimation with M-estimation has the smallest RMSE of the methods compared: 1473.7 with outliers and 1279.6 without outliers. The research also indicates that REBLUP with M-estimation is more robust to outliers: including a single outlier in the analysis makes the RMSE of EBLUP about five times larger, whereas the RMSE of REBLUP remains relatively stable.
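M-estimation limits the influence of outlying observations by down-weighting large residuals. A minimal sketch of the idea for a single location parameter, using Huber's ψ-function with the conventional tuning constant c = 1.345, is given below. It is not the REBLUP procedure evaluated in the study, which applies the same kind of bounded influence function to the estimating equations of a linear mixed model; the function name `huber_location`, the fixed MAD scale, and the data are assumptions for illustration.

```python
import numpy as np


def huber_location(y, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of a location parameter via iteratively reweighted means.

    The scale is held fixed at the normalized median absolute deviation (MAD).
    """
    y = np.asarray(y, dtype=float)
    mu = np.median(y)
    scale = 1.4826 * np.median(np.abs(y - mu))
    if scale == 0.0:
        return float(mu)
    for _ in range(max_iter):
        r = np.abs(y - mu) / scale
        # Huber weights: 1 for |r| <= c, c/|r| beyond, so large residuals count less
        w = np.where(r <= c, 1.0, c / np.maximum(r, c))
        mu_new = float(np.sum(w * y) / np.sum(w))
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return float(mu)


incomes = [10, 11, 9, 10, 12, 8, 10, 100]  # hypothetical data with one outlier
print(np.mean(incomes))         # 21.25, pulled up by the outlier
print(huber_location(incomes))  # stays close to 10
```

The single outlier doubles the sample mean, while the M-estimate barely moves; this mirrors, in a much simpler setting, the stability of REBLUP reported above.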


Author(s): Takumi Saegusa, Shonosuke Sugasawa, Partha Lahiri

Various multivariate extensions of the well-known Fay–Herriot model have been proposed in the small area estimation literature. Such multivariate models are quite effective in combining information through correlations among small area survey estimates of related variables, historical survey estimates of the same variable, or both. Although the small area estimation literature is already very rich, the construction of second-order efficient confidence intervals from multivariate models has received little attention. In this article, we develop a parametric bootstrap method for constructing a second-order efficient confidence interval for a general linear combination of small area means using the multivariate Fay–Herriot normal model. The proposed parametric bootstrap method replaces difficult and tedious analytical derivations with efficient algorithms and high-speed computing. Moreover, the proposed method is more versatile than the analytical method because it can easily be applied with any method of model parameter estimation and any specific structure of the variance–covariance matrix of the multivariate Fay–Herriot model, avoiding the cumbersome and time-consuming calculations required by the analytical method. We apply our methodology to construct confidence intervals for the median income of four-person families for the fifty states and the District of Columbia in the United States. Our data analysis demonstrates that the proposed parametric bootstrap method, applied to both multivariate and univariate Fay–Herriot models, generally provides much shorter confidence intervals than the corresponding traditional direct method.
Moreover, the confidence intervals obtained from the multivariate model are generally shorter than the corresponding intervals from the univariate model, indicating the potential advantage of exploiting correlations of the median income of four-person families with the median incomes of three- and five-person families.


Author(s): J. Iseh Matthew, J. Bassey Kufre

This paper considers the challenge of estimating a population mean in small areas with small or zero sample sizes in the presence of unit nonresponse. It presents a calibration estimator, drawn from a class of synthetic estimators and constructed with the calibration approach using an alternative distance measure, that produces reliable estimates under stratified random sampling. In a simulation study comparing the proposed estimator with existing ones under three distributional assumptions (normal, gamma, and exponential), with percent average absolute relative bias, percent average coefficient of variation, and average mean squared error as evaluation criteria, the new estimator gave more reliable estimates of the mean, with less bias and a greater gain in efficiency. A further evaluation of the coefficient of variation under varying nonresponse rates supports the conclusion that the estimator is a suitable alternative for small area estimation. This finding therefore contributes to the development of a reliable estimator for small area estimation in the presence of unit nonresponse.
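Calibration estimators adjust design weights so that weighted sample totals of auxiliary variables reproduce known population totals. The sketch below uses the standard chi-square distance, for which the calibrated weights have a closed form (linear calibration, equivalent to GREG weighting). The paper's alternative distance measure, its treatment of unit nonresponse, and its stratified design are not reproduced here; the function name `linear_calibration_weights` and all numbers are hypothetical.

```python
import numpy as np


def linear_calibration_weights(d, x, totals):
    """Calibrated weights w_i = d_i * (1 + x_i' lam) under the chi-square distance.

    d:      design weights of the responding units, shape (n,)
    x:      auxiliary variables of those units, shape (n, p)
    totals: known population totals of the auxiliary variables, shape (p,)
    """
    d = np.asarray(d, dtype=float)
    x = np.asarray(x, dtype=float)
    totals = np.asarray(totals, dtype=float)
    t_hat = x.T @ d                      # design-weighted totals of x
    T = (x * d[:, None]).T @ x           # sum_i d_i x_i x_i'
    lam = np.linalg.solve(T, totals - t_hat)
    return d * (1.0 + x @ lam)


# Hypothetical example: intercept plus one size variable, N = 50 units, total size 220
d = np.array([10.0, 10.0, 10.0, 10.0])
x = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 6.0]])
w = linear_calibration_weights(d, x, totals=[50.0, 220.0])
y = np.array([20.0, 26.0, 41.0, 50.0])
print(x.T @ w)                # approximately [50, 220], the calibration totals
print(np.sum(w * y) / 50.0)   # calibrated estimate of the mean of y, approximately 37.25
```

The chi-square distance is chosen here only because it gives a closed-form solution; it can produce negative weights, which is one motivation for alternative distance measures such as the one the paper proposes.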


2017, Vol 47 (12), pp. 1577-1589
Author(s): Neil R. Ver Planck, Andrew O. Finley, Emily S. Huff

The National Woodland Owner Survey (NWOS), administered by the USDA Forest Service, provides estimates of private forest ownership characteristics and owners’ attitudes and behaviors at the national, regional, and state levels. Because sample sizes are prescribed for inference at the state level, there are insufficient data to support county-level estimates. However, county-level estimates of NWOS variables are desired because ownership programs and education initiatives often occur at the county level, and such information could help tailor these efforts to county-specific needs and demographics. Here, we present and assess methods to estimate the number of private forest ownerships at the county level for two states, Montana and New Jersey. To assess model performance, true population parameters were derived from cadastral and remote sensing data. Two small area estimation (SAE) models, the Fay-Herriot (FH) model and the FH model with conditional autoregressive random effects (FHCAR), reduced the estimated county-level population mean squared error (MSE) relative to direct estimates. The proposed SAE models use covariates to improve the accuracy and precision of county-level estimates. Results show that the total forest area and 2010 decennial census population density covariates explained a significant portion of the variability in county-level population size. These and other results suggest that the proposed SAE methods provide a statistically robust approach for delivering reliable estimates of private ownership population size and could be extended to other important NWOS variables at the county level.
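Under the basic area-level Fay-Herriot model, each direct estimate is y_d = x_d'β + u_d + e_d, with area effects u_d ~ N(0, σ_u²) and sampling errors e_d ~ N(0, ψ_d) whose variances ψ_d are treated as known. The EBLUP shrinks each direct estimate towards a regression-synthetic estimate with weight γ_d = σ_u² / (σ_u² + ψ_d). The sketch below assumes the Prasad-Rao moment estimator of σ_u² and omits the CAR random effects of the FHCAR variant; the function name `fay_herriot_eblup` and the toy data are hypothetical, and this is not the code used in the study.

```python
import numpy as np


def fay_herriot_eblup(y, X, psi):
    """EBLUP of area means under the basic Fay-Herriot model.

    y:   direct estimates, shape (D,)
    X:   area-level covariates including an intercept column, shape (D, p)
    psi: known sampling variances of the direct estimates, shape (D,)
    Returns (eblup, gamma, sigma2_u).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    psi = np.asarray(psi, dtype=float)
    D, p = X.shape

    # Prasad-Rao moment estimator of sigma_u^2 from OLS residuals, truncated at 0
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ X.T @ y)
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    sigma2_u = max(0.0, (resid @ resid - np.sum(psi * (1.0 - leverage))) / (D - p))

    # GLS estimate of beta given sigma_u^2
    v_inv = 1.0 / (sigma2_u + psi)
    beta = np.linalg.solve(X.T @ (X * v_inv[:, None]), X.T @ (v_inv * y))

    # Shrinkage: gamma_d is the weight placed on the direct estimate
    gamma = sigma2_u / (sigma2_u + psi)
    eblup = gamma * y + (1.0 - gamma) * (X @ beta)
    return eblup, gamma, sigma2_u


# Hypothetical counties: direct estimates, one covariate, and sampling variances
y = np.array([2.0, 4.5, 5.5, 8.5, 9.0, 12.5])
X = np.column_stack([np.ones(6), np.arange(1.0, 7.0)])
psi = np.array([0.05, 0.1, 0.05, 0.2, 0.1, 0.3])
eblup, gamma, sigma2_u = fay_herriot_eblup(y, X, psi)
print(sigma2_u, gamma.round(2), eblup.round(2))
```

Areas with noisier direct estimates (larger ψ_d) receive smaller γ_d and are pulled more strongly towards the covariate-based prediction, which is how the covariates reduce MSE.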


2021, Vol 37 (4), pp. 955-979
Author(s): Stefano Marchetti, Nikos Tzavidis

Small area estimation is receiving considerable attention because of the high demand for small area statistics. Small area estimators of means and totals have been widely studied in the literature, and in recent years small area estimators of quantiles and poverty indicators have also been studied. In contrast, small area estimators of inequality indicators, which are often used in socio-economic studies, have received less attention. In this article, we propose a robust method based on the M-quantile regression model for small area estimation of the Theil index and the Gini coefficient, two popular inequality measures. A non-parametric bootstrap is adopted to estimate the mean squared error. A robust approach is used because inequality is often measured with income or consumption data, which are frequently non-normal and affected by outliers. The proposed methodology is applied to income data to estimate the Theil index and the Gini coefficient for small domains in Tuscany (provinces by age groups), using survey and Census micro-data as auxiliary variables. In addition, a design-based simulation is carried out to study the behaviour of the proposed robust estimators. The performance of the bootstrap mean squared error estimator is also investigated in the simulation study.
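For reference, the two inequality measures themselves can be computed from a vector of incomes as sketched below, using the unweighted population forms G = Σ_i Σ_j |y_i − y_j| / (2 n² ȳ), evaluated through the equivalent sorted-rank formula, and T = (1/n) Σ_i (y_i/ȳ) ln(y_i/ȳ). This is only the plug-in computation of the indices on a single sample, not the M-quantile small area estimator proposed in the article; survey weights are ignored by assumption, and the function names are illustrative.

```python
import numpy as np


def gini(y):
    """Gini coefficient of non-negative incomes (unweighted, population form)."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    ranks = np.arange(1, n + 1)
    # Sorted-rank form of sum_i sum_j |y_i - y_j| / (2 n^2 mean(y))
    return float(2.0 * np.sum(ranks * y) / (n * np.sum(y)) - (n + 1.0) / n)


def theil(y):
    """Theil T index; all incomes must be strictly positive."""
    y = np.asarray(y, dtype=float)
    s = y / y.mean()  # incomes relative to the mean
    return float(np.mean(s * np.log(s)))


print(gini([5, 5, 5, 5]), theil([5, 5, 5, 5]))  # 0.0 0.0 (perfect equality)
print(gini([1, 3]))  # 0.25
```

Both indices are zero under perfect equality; because they depend heavily on the upper tail of the income distribution, a few extreme incomes can move them substantially, which is the motivation for the robust approach described above.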


Mathematics, 2021, Vol 9 (21), pp. 2780
Author(s): Paul Corral, Kristen Himelein, Kevin McGee, Isabel Molina

This paper evaluates the performance of different small area estimation methods using model-based and design-based simulation experiments. The design-based experiments use the Mexican Intra Censal survey as a census of roughly 3.9 million households, from which 500 samples are drawn using a two-stage selection procedure similar to that of Living Standards Measurement Study (LSMS) surveys. The estimation methods considered are that of Elbers, Lanjouw and Lanjouw (2003), the empirical best predictor of Molina and Rao (2010), the twofold nested error extension presented by Marhuenda et al. (2017), and finally an adaptation, presented by Nguyen (2012), that combines unit and area level information and has been proposed as an alternative when the available census data are outdated. The findings show the importance of selecting a proper model and data transformation so that the model assumptions hold; a proper data transformation can lead to a considerable improvement in mean squared error (MSE). Results from the design-based validation show that all small area estimation methods improve on direct estimates in terms of MSE. However, methods that model unit-level welfare using only area-level information suffer from considerable bias. Because the magnitude and direction of this bias are unknown ex ante, methods relying only on aggregated covariates should be used with caution, although they may be an alternative to traditional area-level models when those are not applicable.


2017, Vol 43 (2), pp. 182-224
Author(s): Wendy Chan

Policymakers have grown increasingly interested in how experimental results may generalize to a larger population. However, recently developed propensity score–based methods are limited by small sample sizes when the experimental study is generalized to a population that is at least 20 times larger. This is particularly problematic for methods such as subclassification by propensity score, where limited sample sizes lead to sparse strata. This article explores the potential of small area estimation methods to improve the precision of estimators in sparse strata, using population data as a source of auxiliary information to borrow strength. Results from simulation studies identify the conditions under which small area estimators outperform conventional estimators, as well as the limitations of this application to causal generalization studies.


2020, Vol 7 (1), pp. 337-360
Author(s): Jiming Jiang, J. Sunil Rao

A small area typically refers to a subpopulation or domain of interest for which a reliable direct estimate, based only on the domain-specific sample, cannot be produced because of the small sample size in the domain. While traditional small area methods and models are widely used nowadays, there has also been much work on, and interest in, robust statistical inference for small area estimation (SAE). We survey this work and provide a comprehensive review. We begin with a brief review of traditional SAE methods. We then discuss SAE methods that are developed under weaker assumptions and SAE methods that are robust in certain ways, such as to outliers or model failure. Our discussion also covers topics such as nonparametric SAE methods, Bayesian approaches, model selection and diagnostics, and missing data. A brief review of software packages available for implementing robust SAE methods is also given.

