Estimating the RMSE of Small Area Estimates without the Tears

Stats ◽  
2021 ◽  
Vol 4 (4) ◽  
pp. 931-942
Author(s):  
Diane Hindmarsh ◽  
David Steel

Small area estimation (SAE) methods can provide information that conventional direct survey estimation methods cannot. The use of small area estimates based on linear and generalized linear mixed models is still very limited, possibly because of the perceived complexity of estimating the root mean square errors (RMSEs) of the estimates. This paper outlines a study used to determine the conditions under which the estimated RMSEs produced as part of statistical output (‘plug-in’ estimates of RMSEs) could be considered appropriate for a practical application of SAE methods where one of the main requirements was to use SAS software. We first show that the estimated RMSEs created using an EBLUP (empirical best linear unbiased predictor) model in SAS and those obtained using a parametric bootstrap are similar to the published estimated RMSEs for the corn data in the seminal paper by Battese, Harter and Fuller. We then compare plug-in estimates of RMSEs from SAS procedures used to create EBLUP and EBP (empirical best predictor) estimators against estimates of RMSEs obtained from a parametric bootstrap. For this comparison we created estimates of current smoking in males for 153 local government areas (LGAs) using data from the NSW Population Health Survey in Australia. Demographic variables from the survey data were included as covariates, with LGA-level population proportions, obtained mainly from the Australian Census, used for prediction. For the EBLUP, the plug-in estimates of RMSEs can be used, provided the sample size for the small area is more than seven. For the EBP, the plug-in estimates of RMSEs are suitable for all in-sample areas; out-of-sample areas need estimated RMSEs obtained from the parametric bootstrap.
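The comparison of plug-in and parametric-bootstrap RMSEs described in this abstract can be illustrated with a minimal sketch. It is not the authors' SAS workflow and not the unit-level model of Battese, Harter and Fuller: it assumes a much simpler area-level model, y_i = mu + u_i + e_i, in which the between-area variance `a` and the sampling variances `d` are treated as known. All function and variable names are hypothetical.

```python
import math
import random

def blup(y, d, a):
    """Shrinkage predictor per area; the common mean is estimated by weighted least squares."""
    w = [1.0 / (a + di) for di in d]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    gamma = [a / (a + di) for di in d]  # weight given to the direct estimate
    return [g * yi + (1.0 - g) * mu for g, yi in zip(gamma, y)], mu

def plug_in_rmse(d, a):
    """Analytic RMSE sqrt(g1 + g2); exact only because `a` is assumed known here."""
    sum_w = sum(1.0 / (a + di) for di in d)
    out = []
    for di in d:
        g = a / (a + di)
        g1 = g * di                      # error of the shrinkage step
        g2 = (1.0 - g) ** 2 / sum_w      # extra error from estimating the mean
        out.append(math.sqrt(g1 + g2))
    return out

def bootstrap_rmse(y, d, a, reps=2000, seed=0):
    """Parametric bootstrap: simulate from the fitted model, re-predict, average squared errors."""
    rng = random.Random(seed)
    _, mu_hat = blup(y, d, a)
    sq_err = [0.0] * len(y)
    for _ in range(reps):
        theta = [mu_hat + rng.gauss(0.0, math.sqrt(a)) for _ in d]           # bootstrap true values
        y_star = [t + rng.gauss(0.0, math.sqrt(di)) for t, di in zip(theta, d)]  # bootstrap direct estimates
        pred, _ = blup(y_star, d, a)
        for i, (p, t) in enumerate(zip(pred, theta)):
            sq_err[i] += (p - t) ** 2
    return [math.sqrt(s / reps) for s in sq_err]
```

In this simplified setting the two RMSE estimates should agree up to simulation noise. In practice the variance components are estimated, which is where plug-in and bootstrap estimates can diverge, as the abstract reports for small sample sizes and out-of-sample areas.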

2021 ◽  
Vol 2 (3) ◽  
pp. 1-15
Author(s):  
Cheng Wan ◽  
Andrew W. Mchill ◽  
Elizabeth B. Klerman ◽  
Akane Sano

Circadian rhythms influence multiple essential biological activities, including sleep, performance, and mood. The dim light melatonin onset (DLMO) is the gold standard for measuring human circadian phase (i.e., timing). The collection of DLMO is expensive and time consuming since multiple saliva or blood samples are required overnight in special conditions, and the samples must then be assayed for melatonin. Recently, several computational approaches have been designed for estimating DLMO. These methods collect daily sampled data (e.g., sleep onset/offset times) or frequently sampled data (e.g., light exposure/skin temperature/physical activity collected every minute) to train learning models for estimating DLMO. One limitation of these studies is that they leverage data from only one time scale. We propose a two-step framework for estimating DLMO using data from both time scales. The first step summarizes data from before the current day, whereas the second step combines this summary with frequently sampled data of the current day. We evaluate three moving average models that take sleep timing data as input for the first step and use recurrent neural network models for the second step. The results using data from 207 undergraduates show that our two-step model with features from both time scales has statistically significantly lower root-mean-square errors than models that use either daily sampled data or frequently sampled data alone.


Entropy ◽  
2021 ◽  
Vol 23 (1) ◽  
pp. 107
Author(s):  
Elisavet M. Sofikitou ◽  
Ray Liu ◽  
Huipei Wang ◽  
Marianthi Markatou

Pearson residuals aid the task of identifying model misspecification because they compare the model estimated from the data with the model assumed under the null hypothesis. We present different formulations of the Pearson residual system that account for the measurement scale of the data and study their properties. We further concentrate on the case of mixed-scale data, that is, data measured on both categorical and interval scales. We study the asymptotic properties and the robustness of minimum disparity estimators obtained in the case of mixed-scale data and exemplify the performance of the methods via simulation.
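As a rough illustration of the idea, the sketch below computes Pearson residuals for purely categorical data using a definition common in the minimum disparity literature, delta(x) = d(x) / m(x) - 1, where d is the empirical proportion and m the model probability. The formulations in the paper itself, in particular for mixed-scale data, may differ; the function name and the example model are hypothetical.

```python
from collections import Counter

def pearson_residuals(sample, model_pmf):
    """Return delta(x) = d(x) / m(x) - 1 for each category with positive model probability.

    sample    : list of observed categories
    model_pmf : dict mapping category -> probability under the assumed model
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    n = len(sample)
    counts = Counter(sample)
    residuals = {}
    for category, m in model_pmf.items():
        if m <= 0:
            continue  # residual undefined where the model assigns no mass
        d = counts.get(category, 0) / n  # empirical proportion
        residuals[category] = d / m - 1.0
    return residuals
```

A residual near zero means the data and the model agree for that category; a large positive value flags a category observed more often than the model expects, and a value near -1 flags one that is almost absent from the data.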


2017 ◽  
Vol 10 (5) ◽  
pp. 662-686
Author(s):  
Dimitrios Staikos ◽  
Wenjun Xue

Purpose: With this paper, the authors aim to investigate the drivers behind three of the most important aspects of the Chinese real estate market: housing prices, housing rent and new construction. At the same time, the authors perform a comprehensive empirical test of the popular 4-quadrant model by Wheaton and DiPasquale. Design/methodology/approach: In this paper, the authors utilize panel cointegration estimation methods and data from 35 Chinese metropolitan areas. Findings: The results indicate that the 4-quadrant model is well suited to explain the determinants of housing prices. However, the same is not true regarding housing rent and new construction, suggesting that a more complex theoretical framework may be required for a well-rounded explanation of real estate markets. Originality/value: It is the first time that panel data are used to estimate rent and new construction for China. Also, it is the first time a comprehensive test of the Wheaton and DiPasquale 4-quadrant model is performed using data from China.


2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Anja Vinzelberg ◽  
Benjamin Rainer Auer

Purpose: Motivated by the recent theoretical rehabilitation of mean-variance analysis, the authors revisit the question of whether minimum variance (MinVar) or maximum Sharpe ratio (MaxSR) investment weights are preferable in practical portfolio formation. Design/methodology/approach: The authors answer this question with a focus on mainstream investors which can be modeled by a preference for simple portfolio optimization techniques, a tendency to cling to past asset characteristics and a strong interest in index products. Specifically, in a rolling-window approach, the study compares the out-of-sample performance of MinVar and MaxSR portfolios in two asset universes covering multiple asset classes (via investable indices and their subindices) and for two popular input estimation methods (full covariance and single-index model). Findings: The authors find that, regardless of the setting, there is no statistically significant difference between MinVar and MaxSR portfolio performance. Thus, the choice of approach does not matter for mainstream investors. In addition, the analysis reveals that, contrary to previous research, using a single-index model does not necessarily improve out-of-sample Sharpe ratios. Originality/value: The study is the first to provide an in-depth comparison of MinVar and MaxSR returns which considers (1) multiple asset classes, (2) a single-index model and (3) state-of-the-art bootstrap performance tests.
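For readers unfamiliar with the two weighting schemes, the sketch below computes textbook closed-form MinVar and MaxSR weights under a single full-investment constraint: MinVar weights are proportional to the inverse covariance matrix times a vector of ones, and MaxSR weights to the inverse covariance matrix times expected excess returns. The abstract does not state which constraints (for example, short-sale limits) the study imposes, so this is an assumption, and all names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def min_var_weights(cov):
    """Weights proportional to inverse(cov) @ 1, rescaled to sum to one."""
    raw = solve(cov, [1.0] * len(cov))
    total = sum(raw)
    return [w / total for w in raw]

def max_sharpe_weights(cov, excess_returns):
    """Weights proportional to inverse(cov) @ excess_returns, rescaled to sum to one."""
    raw = solve(cov, list(excess_returns))
    total = sum(raw)
    if total <= 0:
        # The fully invested tangency portfolio is not well defined in this case.
        raise ValueError("sum of unnormalised weights must be positive")
    return [w / total for w in raw]
```

MinVar needs only the covariance matrix, whereas MaxSR also depends on expected returns, which are usually harder to estimate from past data.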


2015 ◽  
Vol 2015 ◽  
pp. 1-14 ◽  
Author(s):  
Nhu-Ty Nguyen ◽  
Thanh-Tuyen Tran

Inflation is a key element of a national economy, and it is also a prominent and important issue influencing the whole economy in terms of marketing. This is a complex problem requiring a large investment of time and wisdom to attain positive results. Thus, appropriate tools for forecasting inflation variables are crucial for policy making. In this study, both clarified value calculation and use of a genetic algorithm to find the optimal parameters are adopted simultaneously to construct improved models: ARIMA, GM(1,1), Verhulst, DGM(1,1), and DGM(2,1), using data on Vietnamese inflation output from January 2005 to November 2013. The MAPE, MSE, RMSE, and MAD are the four criteria by which the results of the various forecasting models are compared. Moreover, to see whether differences exist, Friedman and Wilcoxon tests are applied. Both in-sample and out-of-sample forecast performance results show that the ARIMA model forecasts Raw Materials Price (RMP) and Gold Price (GP) with high accuracy, whereas the calculated results of GM(1,1) and DGM(1,1) are suitable for forecasting the Consumer Price Index (CPI). Therefore, ARIMA, GM(1,1), and DGM(1,1) can handle the forecast accuracy of the issue, and they are suitable for modeling and forecasting inflation in the case of Vietnam.
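The four accuracy criteria named in this abstract can be sketched as follows, using common textbook definitions (MAD is taken here as the mean absolute deviation of the forecast errors). The abstract does not spell out the formulas, so the exact definitions used in the paper are an assumption, and the function name is hypothetical.

```python
import math

def forecast_errors(actual, predicted):
    """Return MAPE (in percent), MSE, RMSE and MAD for two equal-length series.

    MAPE divides by the actual values, so every actual value must be nonzero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    return {
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAD": sum(abs(e) for e in errors) / n,
    }
```

MAPE is scale-free and so can be compared across series such as CPI and gold prices, while MSE, RMSE and MAD are expressed in the units of the series (squared units for MSE).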


2019 ◽  
Vol 53 (1) ◽  
pp. 45-61
Author(s):  
Mossamet Kamrun Nesa

National level indicators of child undernutrition often hide the real scenario across a country. In order to construct a child nutrition map, accurate estimates of undernutrition are required at very small spatial scales, typically the administrative units of a country or a region within a country. Although comprehensive data on child nutrition are collected in national surveys, the small-scale estimates cannot be calculated using the standard estimation methods employed in national surveys, since such methods are designed to produce national or regional level estimates and assume large samples. Small area estimation methods have been widely used to find such micro-level estimates. When unit-level data are lacking, area-level small area estimation methods (e.g., the Fay-Herriot method) are widely used to calculate small-scale estimates. In Bangladesh, a few studies have estimated district-level child nutrition status. The Bangladesh Demographic Health Survey covers all districts, but district-wise sample sizes are too small to yield consistent estimates. In this paper, a Fay-Herriot model has been developed to calculate district-wise estimates with efficient mean squared error. The Bangladesh Demographic Health Survey 2011 and Population Census 2011 are utilized for this study.


2014 ◽  
Vol 11 (1) ◽  
pp. 309-320
Author(s):  
Hui Tian ◽  
Yingpeng Sang ◽  
Hong Shen ◽  
Chunyue Zhou

The traffic matrix is of great help in many network applications. However, it is very difficult to estimate the traffic matrix for a large-scale network, because the estimation problem from limited link measurements is highly underconstrained. We propose a simple probability model for a large-scale practical network. The probability model is then generalized by including random traffic data. Traffic matrix estimation is then conducted under these two models by two minimization methods. It is shown that the Normalized Root Mean Square Errors of these estimates under our model assumption are very small. For a large-scale network, the traffic matrix estimation methods also perform well. The comparison of the two minimization methods shown in the simulation results agrees with the analysis.


2011 ◽  
Vol 11 ◽  
pp. 1812-1820 ◽  
Author(s):  
Karl Peltzer ◽  
Supa Pengpid

Adolescent sexuality is a relevant public health issue, as it affects the risk of contracting HIV and other sexually transmitted infections. The assessment of the prevalence of sexual intercourse among adolescents may guide policies and programmes aimed at reducing the transmission of sexually transmitted infections in this age group. Using data from the Thailand Global School-Based Student Health Survey (GSHS) 2008, we assessed the prevalence of sexual intercourse in the last 12 months and its associated factors among adolescents. Overall, the prevalence of sexual intercourse in the past 12 months was 11.0% (14.6% males and 7.6% females). Variables positively associated with the outcome in multivariable analysis were male gender (95% CI 1.14–242), older age, ≥15 years (95% CI 1.80–3.74), current alcohol use (95% CI 1.46–3.36), psychosocial distress (95% CI 1.44–3.09) and, among females, current smoking (95% CI 1.62–18.48), lifetime drug use (95% CI 1.04–18.3) and lack of parental or guardian bonding (95% CI 0.27–0.97). Efforts to control unhealthy lifestyles (substance use) and psychosocial distress may impact on adolescents' sexual activity.


1998 ◽  
Vol 30 (5) ◽  
pp. 785-816 ◽  
Author(s):  
P Williamson ◽  
M Birkin ◽  
P H Rees

Census data can be represented both as lists and as tabulations of household/individual attributes. List representation of Census data offers greater flexibility, as the exploration of interrelationships between population characteristics is limited only by the quality and scope of the data collected. Unfortunately, the released lists of household/individual attributes (Samples of Anonymised Records, SARs) are spatially referenced only to areas (single or merged districts) with populations of 120 000 or more, whereas released tabulations are available for units as small as single enumeration districts (Small Area Statistics, SAS). Intuitively, it should be possible to derive list-based estimates of enumeration district populations by combining information contained in the SAR and the SAS. In this paper we explore the range of solutions that could be adapted to this problem which, ultimately, is presented as a complex combinatorial optimisation problem. Various techniques of combinatorial optimisation are tested, and preliminary results from the best performing algorithm are evaluated. Through this process, the lack of suitable test statistics for the comparison of observed and expected tabulations of population data is highlighted.

