Robust Estimation of the Theil Index and the Gini Coefficient for Small Areas

2021 ◽  
Vol 37 (4) ◽  
pp. 955-979
Author(s):  
Stefano Marchetti ◽  
Nikos Tzavidis

Abstract Small area estimation is receiving considerable attention due to the high demand for small area statistics. Small area estimators of means and totals have been widely studied in the literature, and in recent years small area estimators of quantiles and poverty indicators have been studied as well. In contrast, small area estimators of inequality indicators, which are often used in socio-economic studies, have received less attention. In this article, we propose a robust method based on the M-quantile regression model for small area estimation of the Theil index and the Gini coefficient, two popular inequality measures. A non-parametric bootstrap is adopted to estimate the mean squared error. A robust approach is used because inequality is usually measured using income or consumption data, which are often non-normal and affected by outliers. The proposed methodology is applied to income data to estimate the Theil index and the Gini coefficient for small domains in Tuscany (provinces by age groups), using survey and Census micro-data as auxiliary variables. In addition, a design-based simulation is carried out to study the behaviour of the proposed robust estimators. The performance of the bootstrap mean squared error estimator is also investigated in the simulation study.
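For orientation, the two inequality measures named in this abstract can be computed from a plain sample as in the sketch below. This is only the textbook, unweighted version of the Gini coefficient and the Theil T index; it is not the M-quantile-based robust small area estimator the authors propose. The function names `gini` and `theil` are illustrative, and the sketch assumes strictly positive incomes.

```python
import numpy as np

def gini(y):
    """Unweighted sample Gini coefficient (assumes non-negative values)."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    ranks = np.arange(1, n + 1)
    # Rank-based form: G = 2 * sum(i * y_(i)) / (n * sum(y)) - (n + 1) / n
    return 2.0 * np.sum(ranks * y) / (n * np.sum(y)) - (n + 1.0) / n

def theil(y):
    """Unweighted Theil T index (assumes strictly positive values)."""
    y = np.asarray(y, dtype=float)
    ratio = y / y.mean()
    # T = mean( (y_i / mean) * ln(y_i / mean) )
    return float(np.mean(ratio * np.log(ratio)))

if __name__ == "__main__":
    incomes = [1.0, 3.0]
    print(gini(incomes), theil(incomes))
```

Both measures equal zero when every unit has the same income and grow as income becomes more concentrated. Because both depend on the sample mean and on the upper tail, a few extreme incomes can move them substantially, which is the sensitivity that motivates a robust approach.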

2021 ◽  
Vol 5 (1) ◽  
pp. 50-60
Author(s):  
Naima Rakhsyanda ◽  
Kusman Sadik ◽  
Indahwati Indahwati

Small area estimation can be used to predict population parameters from small sample sizes. In some cases, population units that are spatially close may be more related than units that are further apart. This research studies the use of spatial information such as geographic coordinates. Outlier contamination can also affect small area estimates. The study was conducted using simulation on generated data under six scenarios, which combine spatial effects (spatial stationarity and spatial non-stationarity) with outlier contamination (no outliers, symmetric outliers, and non-symmetric outliers). The purpose of the study was to compare the geographically weighted empirical best linear unbiased predictor (GWEBLUP) and robust GWEBLUP (RGWEBLUP) with the direct estimator, EBLUP, and REBLUP using simulated data. The performance of the predictors is evaluated using the relative root mean squared error (RRMSE). The simulation results showed that geographically weighted predictors have the smallest RRMSE values in scenarios with spatial non-stationarity and therefore offer better prediction. In scenarios with outliers, robust predictors have smaller RRMSE values and are therefore more efficient than non-robust predictors.
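The evaluation criterion used in this abstract can be sketched as follows. The abstract does not give the exact formula, so this assumes a common simulation definition: for each area, the root of the mean squared error over replications divided by the true area value. The function name `rrmse` and the array layout (replications in rows, areas in columns) are assumptions made for illustration.

```python
import numpy as np

def rrmse(estimates, truth):
    """Per-area relative root mean squared error.

    estimates: array of shape (R, m), R simulation replications for m areas.
    truth:     array of shape (m,), the true area values (assumed non-zero).
    """
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    # Mean squared error over replications, computed separately per area
    mse = np.mean((estimates - truth) ** 2, axis=0)
    return np.sqrt(mse) / np.abs(truth)

if __name__ == "__main__":
    est = [[9.0, 22.0], [11.0, 18.0]]
    print(rrmse(est, [10.0, 20.0]))
```

Dividing by the true value makes areas with different scales comparable, so predictors can be ranked by their average RRMSE across areas.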


Author(s):  
Takumi Saegusa ◽  
Shonosuke Sugasawa ◽  
Partha Lahiri

Abstract Various multivariate extensions to the well-known Fay–Herriot model have been proposed in the small area estimation literature. Such multivariate models are quite effective in combining information through correlations among small area survey estimates of related variables, historical survey estimates of the same variable, or both. Though the literature on small area estimation is already very rich, construction of second-order efficient confidence intervals from multivariate models has received little attention. In this article, we develop a parametric bootstrap method for constructing a second-order efficient confidence interval for a general linear combination of small area means using the multivariate Fay–Herriot normal model. The proposed parametric bootstrap method replaces difficult and tedious analytical derivations with efficient algorithms and high-speed computing. Moreover, the proposed method is more versatile than the analytical method because the parametric bootstrap can be easily applied to any method of model parameter estimation and any specific structure of the variance–covariance matrix of the multivariate Fay–Herriot model, avoiding the cumbersome and time-consuming calculations required by the analytical method. We apply our proposed methodology to construct confidence intervals for the median income of four-person families for the fifty states and the District of Columbia in the United States. Our data analysis demonstrates that the proposed parametric bootstrap method, applied to both multivariate and univariate Fay–Herriot models, generally provides much shorter confidence intervals than the corresponding traditional direct method. Moreover, the confidence intervals obtained from the multivariate model are generally shorter than the corresponding intervals from the univariate model, indicating the potential advantage of exploiting correlations of the median income of four-person families with the median incomes of three- and five-person families.


Author(s):  
J. Iseh Matthew ◽  
J. Bassey Kufre

This paper considers the challenges of estimating the population mean in small areas characterized by small or no sample sizes in the presence of unit nonresponse. It presents a calibration estimator, derived from a class of synthetic estimators using a calibration approach with an alternative distance measure, that produces reliable estimates under stratified random sampling. The proposed estimator was compared with existing ones by simulation under three distributional assumptions (normal, gamma, and exponential), with percent average absolute relative bias, percent average coefficient of variation, and average mean squared error as evaluation criteria. The new estimator gave more reliable estimates of the mean, with less bias and a greater gain in efficiency. Further evaluation using the coefficient of variation under varying nonresponse rates, carried out to validate the results, suggests that the estimator is a suitable alternative for small area estimation. This finding contributes to the development of an ultimate estimator for small area estimation in the presence of unit nonresponse.


2017 ◽  
Vol 47 (12) ◽  
pp. 1577-1589 ◽  
Author(s):  
Neil R. Ver Planck ◽  
Andrew O. Finley ◽  
Emily S. Huff

The National Woodland Owner Survey (NWOS), administered by the USDA Forest Service, provides estimates of private forest ownership characteristics and owners’ attitudes and behaviors at national, regional, and state levels. Because sample sizes are prescribed for inference at the state level, there are insufficient data to support county-level estimates. However, county-level estimates of NWOS variables are desired because ownership programs and education initiatives often occur at the county level, and such information could help tailor these efforts to better match county-specific needs and demographics. Here, we present and assess methods to estimate the number of private forest ownerships at the county level for two states, Montana and New Jersey. To assess model performance, true population parameters were derived from cadastral and remote sensing data. Two small area estimation (SAE) models, the Fay-Herriot (FH) and the FH with conditional autoregressive random effects (FHCAR), improved estimated county-level population mean squared error (MSE) over that achieved by direct estimates. The proposed SAE models use covariates to improve the accuracy and precision of county-level estimates. Results show that total forest area and 2010 decennial census population density covariates explained a significant portion of the variability in county-level population size. These and other results suggest that the proposed SAE methods yield a statistically robust approach to deliver reliable estimates of private ownership population size and could be extended to additional important NWOS variables at the county level.
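To illustrate how an area-level Fay-Herriot model borrows strength from covariates, the sketch below computes the standard shrinkage predictor for the basic FH model. It makes a strong simplifying assumption: the model variance `sigma2_v` is treated as known, whereas in practice it is estimated from the data (which is what makes the predictor "empirical"). The function name `fay_herriot_predict` and its arguments are hypothetical, and the FHCAR spatial extension described in the abstract is not covered.

```python
import numpy as np

def fay_herriot_predict(y, X, D, sigma2_v):
    """Shrinkage predictor for the basic area-level Fay-Herriot model.

    y:        direct estimates, shape (m,)
    X:        area-level covariates including an intercept column, shape (m, p)
    D:        known sampling variances of the direct estimates, shape (m,)
    sigma2_v: model (random effect) variance, assumed known in this sketch
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)
    V = sigma2_v + D                      # marginal variance of each direct estimate
    Xw = X / V[:, None]                   # rows of X scaled by 1 / V_i
    # Weighted least squares: beta = (X' V^-1 X)^-1 X' V^-1 y
    beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    gamma = sigma2_v / V                  # weight given to the direct estimate
    theta = gamma * y + (1.0 - gamma) * (X @ beta)
    return theta, beta, gamma

if __name__ == "__main__":
    X = np.ones((2, 1))                   # intercept-only model
    theta, beta, gamma = fay_herriot_predict([1.0, 3.0], X, [1.0, 1.0], 1.0)
    print(theta, beta, gamma)
```

Areas with a large sampling variance `D` receive a small `gamma`, so their prediction leans on the regression fit, while precisely estimated areas stay close to their direct estimate.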


Author(s):  
Jan Pablo Burgard ◽  
Domingo Morales ◽  
Anna-Lena Wölwer

Abstract Socioeconomic indicators play a crucial role in monitoring political actions over time and across regions. Income-based indicators such as the median income of sub-populations can provide information on the impact of measures, e.g., on poverty reduction. Regional information is usually published at an aggregated level. Due to small sample sizes, these regional aggregates are often associated with large standard errors, or are missing if the region is unsampled or the estimate is simply not published. For example, if the median income of Hispanic or Latino Americans from the American Community Survey is of interest, some county-year combinations are not available. Therefore, a comparison of different counties or time points is not always possible. We propose a new predictor based on small area estimation techniques for aggregated data and bivariate modeling. This predictor provides empirical best predictions for the partially unavailable county-year combinations. We provide an analytical approximation to the mean squared error. The theoretical findings are backed up by a large-scale simulation study. Finally, we return to the problem of estimating the median income of Hispanic or Latino Americans for county-year combinations and externally validate the estimates.


Mathematics ◽  
2021 ◽  
Vol 9 (21) ◽  
pp. 2780
Author(s):  
Paul Corral ◽  
Kristen Himelein ◽  
Kevin McGee ◽  
Isabel Molina

This paper evaluates the performance of different small area estimation methods using model-based and design-based simulation experiments. Design-based simulation experiments are carried out using the Mexican Intra Censal survey as a census of roughly 3.9 million households, from which 500 samples are drawn using a two-stage selection procedure similar to that of Living Standards Measurement Study (LSMS) surveys. The estimation methods considered are those of Elbers, Lanjouw and Lanjouw (2003), the empirical best predictor of Molina and Rao (2010), the twofold nested error extension presented by Marhuenda et al. (2017), and finally an adaptation, presented by Nguyen (2012), that combines unit and area level information and has been proposed as an alternative when the available census data are outdated. The findings show the importance of selecting a proper model and data transformation so that model assumptions hold. A proper data transformation can lead to a considerable improvement in mean squared error (MSE). Results from design-based validation show that all small area estimation methods represent an improvement, in terms of MSE, over direct estimates. However, methods that model unit level welfare using only area level information suffer from considerable bias. Because the magnitude and direction of the bias are unknown ex ante, methods relying only on aggregated covariates should be used with caution, but may be an alternative to traditional area level models when these are not applicable.


2019 ◽  
Author(s):  
David Buil-Gil ◽  
Reka Solymosi ◽  
Angelo Moretti

Open and crowdsourced data are becoming prominent in social sciences research. Crowdsourcing projects harness information from large crowds of citizens who voluntarily participate in one collaborative project, and allow new insights into people’s attitudes and perceptions. However, these data are usually affected by a series of biases that limit their representativeness (e.g. self-selection bias, unequal participation, underrepresentation of certain areas and times). In this chapter we present a two-step method aimed at producing reliable small area estimates from crowdsourced data when no auxiliary information is available at the individual level. A non-parametric bootstrap, used to compute pseudo-sampling weights and bootstrap weighted estimates, is followed by an area-level model-based small area estimation approach, which borrows strength from related areas based on a set of covariates, to improve the small area estimates. To assess the method, a simulation study and an application to safety perceptions in Greater London are conducted. The simulation study shows that the area-level model-based small area estimator under the non-parametric bootstrap improves the small area estimates (in terms of bias and variability) in the majority of areas. The application produces estimates of safety perceptions at a small geographical level in Greater London from Place Pulse 2.0 data. In the application, estimates are validated externally by comparing them to reliable survey estimates. Further simulation experiments and applications are needed to examine whether this method also improves the small area estimates when the sample biases are larger, smaller or show different distributions. A measure of reliability also needs to be developed to estimate the error of the small area estimates under the non-parametric bootstrap.


Author(s):  
Anggun Permatasari ◽  
Khairil Anwar Notodiputro ◽  
Erfiani

Small area estimation (SAE) is an important alternative method for obtaining information about a small area when the sample size is small. In this paper, we propose a parametric bootstrap method to estimate the mean squared error (MSE) of a proportion based on area unit levels. The research focuses on applying the parametric bootstrap method to estimate the MSE in SAE for zero-inflated binomial models (SAE ZIB). The results showed that the bootstrap method produced a smaller MSE than direct estimation, implying that SAE ZIB performs better than direct estimation.

