Measuring the Accuracy of Aggregates Computed from a Statistical Register

The Italian National Statistical Institute (Istat) is currently engaged in a modernization programme that foresees a significant revision of the methods traditionally used for the production of official statistics. The main concept behind this transformation is the use of the Integrated System of Statistical Registers, created by a massive integration of administrative archives and survey data. In this article, we focus on how to measure the accuracy of register estimates of a population total from measurements calculated at the unit level. We propose the global mean squared error (GMSE) as a statistical quantity suitable for measuring accuracy in the context of the production of official statistics. It can be defined to explicitly consider the main sources of uncertainty that may affect registers. The article suggests a feasible calculation strategy for the GMSE that allows National Statistical Institutes to build algorithms that can promptly be applied for each user request, thus improving the relevance, transparency and confidence of official statistics. Through a simulation study, we verified the efficacy of the proposed strategy.
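
For orientation, the standard mean squared error decomposition is sketched below. The abstract does not reproduce the GMSE definition, so this is only the textbook identity that an MSE-type accuracy measure builds on, not the article's formula. The notation is an assumption: \hat{Y} denotes a register-based estimate of a fixed population total Y, and the expectation is taken over whatever sources of randomness are being modelled.

```latex
\mathrm{MSE}(\hat{Y})
  = E\big[(\hat{Y} - Y)^2\big]
  = \mathrm{Var}(\hat{Y}) + \big[E(\hat{Y}) - Y\big]^2
```

The second term is the squared bias, so an estimate can have a small variance and still a large MSE.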


A new generic method to improve machine learning applications in official statistics

Statistical Journal of the IAOS, 2021, pp. 1-16. DOI: 10.3233/sji-210885

Author(s): Kevin Kloos

Keyword(s): Machine Learning, Mean Squared Error, Statistical Properties, Machine Learning Algorithms, Official Statistics, Academic Literature, Misclassification Bias, Squared Error, Machine Learning Applications, Applications Of Machine Learning

The use of machine learning algorithms at national statistical institutes has increased significantly over the past few years. Applications range from new imputation schemes to new statistical output based entirely on machine learning. The results are promising, but recent studies have shown that the use of machine learning in official statistics always introduces a bias, known as misclassification bias. Misclassification bias does not occur in traditional applications of machine learning and has therefore received little attention in the academic literature. In earlier work, we collected existing methods that are able to correct misclassification bias and compared their statistical properties, including bias, variance and mean squared error. In this paper, we present a new generic method to correct misclassification bias for time series and we derive its statistical properties. Moreover, we show numerically that it has a lower mean squared error than the existing alternatives in a wide variety of settings. We believe that our new method may improve machine learning applications in official statistics, and we hope that our work will stimulate further methodological research in this area.
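
To make the notion of misclassification bias concrete, here is a minimal sketch for a binary classifier whose predicted labels are simply counted to estimate a class proportion. It does not implement the paper's correction method. The function names and the parameters `alpha` (true share of the positive class), `sensitivity` and `specificity` are illustrative assumptions.

```python
def expected_classify_and_count(alpha, sensitivity, specificity):
    """Expected share of units labelled positive by an imperfect binary classifier.

    True positives contribute alpha * sensitivity; false positives
    contribute (1 - alpha) * (1 - specificity).
    """
    return alpha * sensitivity + (1.0 - alpha) * (1.0 - specificity)


def misclassification_bias(alpha, sensitivity, specificity):
    """Bias of the plain 'count the predicted labels' estimator of alpha."""
    return expected_classify_and_count(alpha, sensitivity, specificity) - alpha


# A classifier that is 90% accurate in both classes, applied to a
# population in which 20% of the units are truly positive.
bias = misclassification_bias(alpha=0.2, sensitivity=0.9, specificity=0.9)
```

In this setting the bias does not shrink as more units are classified, because it depends only on the error rates and the true proportion, not on the sample size.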


A Statistical Approach to Model Selection for Dynamic Adsorption Columns

Advances in Wastewater Treatment II - Materials Research Foundations, 2021, pp. 128-167. DOI: 10.21741/9781644901397-5

Author(s): P. Musonge

Keyword(s): Mean Squared Error, Scale Up, Absolute Error, Breakthrough Curves, Dynamic Adsorption, Mean Values, Squared Error, Local Mean, Global Mean, Adsorption Systems

A variety of models have been used to describe and predict breakthrough curves for dynamic adsorption systems in order to scale up laboratory and pilot plant systems. There are, however, limitations in the applicability of existing models. This study aims to provide unambiguous approaches for selecting the best-performing model among the Thomas, Yoon-Nelson and Bohart-Adams (B-A) models for three dynamic adsorption systems. Three approaches were implemented using published experimental data of three adsorption systems. The first approach applied statistical analysis to actual and predicted breakthrough curves without modifying the models. The second and third approaches applied local mean values (LMV) and global mean values (GMV) of the empirical constants to predict breakthrough curves. The predictive and generalization performance of the three models was evaluated using the statistical criteria of mean absolute error (MAE), root mean squared error (RMSE) and correlation coefficient (R²).
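
The three criteria named above can be sketched as follows. This is an illustration under assumptions, not the chapter's code: the abstract does not say how R² is computed, so the sketch takes it to be the squared Pearson correlation between actual and predicted values, and the sample breakthrough values are invented for the example.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r_squared(actual, predicted):
    """Squared Pearson correlation (assumed meaning of R² here)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


# Hypothetical normalized outlet concentrations C/C0 along a breakthrough curve.
actual = [0.1, 0.4, 0.8, 1.0]
predicted = [0.2, 0.4, 0.7, 1.0]
scores = (mae(actual, predicted), rmse(actual, predicted), r_squared(actual, predicted))
```

MAE and RMSE are in the units of the response, so lower is better, while R² close to 1 indicates that predictions track the shape of the measured curve.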


Effectively Monitoring the Performance of Integrated Process Control Systems under Nonstationary Disturbances

International Journal of Quality Statistics and Reliability, 2010, Vol 2010, pp. 1-9. DOI: 10.1155/2010/180293. Cited By ~ 4

Author(s): Karin Kandananond

Keyword(s): Process Control, Control Charts, Process Model, Mean Squared Error, Average Run Length, Integrated System, Automatic Process, Statistical Process, Minimum Mean Squared Error, Squared Error

The objective of this paper is to quantify the effect of autocorrelation coefficients, shift magnitude, types of control charts, types of controllers, and types of monitored signals on a control system. Statistical process control (SPC) and automatic process control (APC) were studied under non-stationary stochastic disturbances characterized by the integrated moving average model, ARIMA(0,1,1). A process model was simulated to obtain two responses, mean squared error (MSE) and average run length (ARL). A factorial design experiment was conducted to analyze the simulated results. The results revealed that not only shift magnitude and the level of autocorrelation coefficients, but also the interaction between these two factors, affected the integrated system performance. It was also found that the most appropriate combination of SPC and APC is the minimum mean squared error (MMSE) controller with the Shewhart moving range (MR) chart, while monitoring the control signal (X) from the controller. Integrating SPC and APC can therefore improve the manufacturing process, but the performance of the integrated system is significantly affected by process autocorrelation. If the performance of the integrated system under non-stationary disturbances is correctly characterized, practitioners will have guidelines for achieving the highest possible performance when integrating SPC and APC.
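
The link between an ARIMA(0,1,1) disturbance and minimum mean squared error adjustment can be illustrated with a simplified sketch. It is not the paper's simulation, which also involves controllers and control charts. It relies on the standard result that, for the disturbance z_t = z_{t-1} + a_t - theta * a_{t-1}, the MMSE one-step-ahead forecast is an exponentially weighted moving average with smoothing constant 1 - theta. The function names, the value theta = 0.5, the unit shock variance and the seed are assumptions.

```python
import random


def simulate_ima11(n, theta, seed=0):
    """Simulate z_t = z_{t-1} + a_t - theta * a_{t-1} with N(0, 1) shocks.

    Returns the disturbance series and the shocks that generated it.
    """
    rng = random.Random(seed)
    series, shocks = [], []
    prev_z, prev_a = 0.0, 0.0
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        prev_z = prev_z + a - theta * prev_a
        prev_a = a
        series.append(prev_z)
        shocks.append(a)
    return series, shocks


def ewma_forecast_errors(series, lam):
    """One-step-ahead EWMA forecast errors, starting from a forecast of 0."""
    forecast, errors = 0.0, []
    for value in series:
        error = value - forecast
        errors.append(error)
        forecast += lam * error  # update the forecast with the latest error
    return errors


def mse(values):
    """Mean of squared deviations from a target of 0."""
    return sum(v * v for v in values) / len(values)


theta = 0.5
z, shocks = simulate_ima11(2000, theta)
uncontrolled_mse = mse(z)  # drift left uncorrected
controlled_mse = mse(ewma_forecast_errors(z, 1.0 - theta))  # forecast cancelled
```

Under these assumptions the forecast errors coincide with the underlying shocks, so `controlled_mse` matches the shock variance, while `uncontrolled_mse` grows with the length of the drifting series.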


Use of reflectance spectroscopy to estimate the organic carbon and CaCO3 contents of soils

Agrokémia és Talajtan, 2012, Vol 61 (2), pp. 277-290. DOI: 10.1556/agrokem.60.2012.2.5. Cited By ~ 1

Author(s): Ádám Csorba, Vince Láng, László Fenyvesi, Erika Michéli

Keyword(s): Organic Carbon, Least Squares, Partial Least Squares, Partial Least Squares Regression, Mean Squared Error, Reflectance Spectroscopy, Least Squares Regression, Root Mean Squared Error, Squared Error

There is a growing demand for the development and application of technologies and methods that enable rapid, cost-effective and environmentally friendly soil data collection and evaluation. Reflectance spectroscopy, which is based on reflectance measurements in the visible (VIS) and near-infrared (NIR) range (350–2500 nm) of the electromagnetic spectrum, meets these demands. Because the reflectance spectrum recorded from soils is very rich in information, and many soil constituents have a characteristic spectral "fingerprint" in the studied range, a large number of key soil parameters can be determined simultaneously from a single curve. In this paper, we present the first steps of a methodological development, built on the principles of reflectance spectroscopy, that aims to determine the composition of soils. We built and tested predictive models for estimating the organic carbon and CaCO3 contents of soils, based on multivariate mathematical-statistical methods (partial least squares regression, PLSR). Testing of the models showed that the procedure gave high R² values for both soil parameters [R²(organic carbon) = 0.815; R²(CaCO3) = 0.907]. The root mean squared error (RMSE), which indicates the accuracy of the estimate, can be considered moderate for both parameters [RMSE(organic carbon) = 0.467; RMSE(CaCO3) = 3.508], and could be improved considerably by standardizing the reflectance measurement protocols. Based on our investigations, we conclude that combining reflectance spectroscopy with multivariate chemometric procedures can provide a rapid and cost-effective method for data collection and evaluation.


Minimax Mean-Squared Error Location Estimation Using TOA Measurements

IEICE Transactions on Communications, 2010, Vol E93-B (8), pp. 2223-2225. DOI: 10.1587/transcom.e93.b.2223. Cited By ~ 2

Author(s): Chih-Chang SHEN, Ann-Chen CHANG

Keyword(s): Mean Squared Error, Location Estimation, Squared Error, Error Location


Using Approximation Non-Bayesian Computation with Fuzzy Data to Estimation Inverse Weibull Parameters and Reliability Function

Ibn AL-Haitham Journal For Pure and Applied Science, 2018, pp. 397. DOI: 10.30526/2017.ihsciconf.1811

Author(s): Nadia Hashim Al-Noor, Shurooq A.K. Al-Sultany

Keyword(s): Maximum Likelihood, Expectation Maximization, Mean Squared Error, Maximum Likelihood Estimators, Reliability Function, Fuzzy Data, Monte Carlo Simulation Study, Weibull Parameters, Squared Error, Newton Raphson

In real situations, observations and measurements are not exact numbers but more or less imprecise, also called fuzzy. In this paper, we therefore use approximate non-Bayesian computational methods to estimate the inverse Weibull parameters and reliability function with fuzzy data. The maximum likelihood and moment estimators are obtained as the non-Bayesian estimators. The maximum likelihood estimators are derived numerically using two iterative techniques, namely the Newton-Raphson and Expectation-Maximization techniques. In addition, we compare the obtained estimates of the parameters and of the reliability function numerically through a Monte Carlo simulation study, in terms of their mean squared error and integrated mean squared error values, respectively.


Image Quality Enhancing by Efficient Histogram Equalization

Wasit Journal of Engineering Sciences, 2014, Vol 2 (2), pp. 47-58. DOI: 10.31185/ejuow.vol2.iss2.29

Author(s): Ismail Sh. Baqer

Keyword(s): Image Quality, Contrast Enhancement, Mean Squared Error, Signal To Noise Ratio, Histogram Equalization, Gray Level, Signal To Noise, Squared Error, Noise Ratio

A two-level image quality enhancement is proposed in this paper. In the first level, the Dualistic Sub-Image Histogram Equalization (DSIHE) method decomposes the original image into two sub-images based on the median of the original image. The second level deals with spike-shaped noise that may appear in the image after processing. We present three image enhancement methods, GHE, LHE and the proposed DSIHE, that improve the visual quality of images. Comparative calculations are carried out on these techniques to examine objective and subjective image quality parameters, e.g. peak signal-to-noise ratio (PSNR), entropy (H) and mean squared error (MSE), to measure the quality of enhanced grayscale images. For gray-level images, conventional histogram equalization methods such as GHE and LHE tend to shift the mean brightness of an image to the middle of the gray-level range, which limits their suitability for contrast enhancement in consumer electronics such as TV monitors. The DSIHE method seems to overcome this disadvantage, as it tends to preserve both the brightness and the contrast enhancement. Experimental results show that the proposed technique gives better results in terms of discrete entropy, signal-to-noise ratio and mean squared error than the global and local histogram-based equalization methods.
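
As a rough illustration of the baseline and the quality measures mentioned above, the sketch below implements plain global histogram equalization together with MSE and PSNR for 8-bit grayscale images stored as lists of rows. It is not the paper's DSIHE method and does not include the noise-handling level. The function names, the 256-level assumption and the tiny sample image are illustrative.

```python
import math


def mse(original, processed):
    """Mean squared error between two equally sized grayscale images."""
    diffs = [(o - p) ** 2
             for row_o, row_p in zip(original, processed)
             for o, p in zip(row_o, row_p)]
    return sum(diffs) / len(diffs)


def psnr(original, processed, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(original, processed)
    return math.inf if err == 0 else 10.0 * math.log10(max_value ** 2 / err)


def global_histogram_equalization(image, levels=256):
    """Map gray levels through the normalized cumulative histogram."""
    pixels = [p for row in image for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest level present
    total = len(pixels)
    if total == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in image]
    scale = levels - 1
    lookup = [max(0, round((c - cdf_min) * scale / (total - cdf_min))) for c in cdf]
    return [[lookup[p] for p in row] for row in image]


image = [[10, 20], [30, 40]]  # low-contrast 2x2 sample
equalized = global_histogram_equalization(image)
quality = (mse(image, equalized), psnr(image, equalized))
```

Here MSE and PSNR compare the enhanced image with the original, which is one common way to quantify how far an enhancement moves pixel values.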


Thematic Maps for the Variation of Bearing Capacity of Soil Using SPTs and MATLAB

Geosciences, 2020, Vol 10 (9), pp. 329. DOI: 10.3390/geosciences10090329

Author(s): Mahdi O. Karkush, Mahmood D. Ahmed, Ammar Abdul-Hassan Sheikha, Ayad Al-Rumaithi

Keyword(s): Bearing Capacity, Mean Squared Error, Ground Level, The Other, Thematic Maps, First Order, Squared Error, Order Polynomial, Interpolation Polynomials, Penetration Tests

The current study involves 135 boreholes drilled to a depth of 10 m below the existing ground level. Three standard penetration tests (SPT) are performed at depths of 1.5, 6, and 9.5 m for each borehole. To produce thematic maps with coordinates and depths for the bearing capacity variation of the soil, a numerical analysis was conducted using MATLAB software. Although interpolation polynomials of several orders were used to estimate the bearing capacity of the soil, the first-order polynomial was the best among the trials due to its simplicity and fast calculations. Additionally, the root mean squared error (RMSE) was almost the same for all of the tried models. The results of the study can be summarized by the production of thematic maps showing the variation of the bearing capacity of the soil over the whole area of Al-Basrah city at several depths. The bearing capacity of soil obtained from the suggested first-order polynomial matches well with that calculated from the results of SPTs, with a deviation of ±30% at a 95% confidence interval.


Industrial Control under Non-Ideal Measurements: Data-Based Signal Processing as an Alternative to Controller Retuning

Sensors, 2021, Vol 21 (4), pp. 1237. DOI: 10.3390/s21041237

Author(s): Ivan Pisa, Antoni Morell, Ramón Vilanova, Jose Lopez Vicario

Keyword(s): Mean Squared Error, Control Strategies, Treatment Plant, Industrial Control, Output Data, Complex Processes, Squared Error, Complex Design, Solution Design, Delay Correction

Industrial environments are characterised by the non-linear and highly complex processes they perform. Different control strategies are considered to ensure that these processes are correctly performed. Nevertheless, these strategies are sensitive to noise-corrupted and delayed measurements. For that reason, denoising techniques and delay correction methodologies should be considered, but most of these techniques require a complex design and optimisation process that depends on the scenario where they are applied. To alleviate this, a complete data-based approach devoted to denoising and correcting the delay of measurements is proposed here with a two-fold objective: to simplify the solution design process and to decouple it from the considered control strategy as well as from the scenario, which here is a Wastewater Treatment Plant (WWTP). However, the proposed solution can be adopted in any industrial environment, since neither an optimisation nor a scenario-focused design is required, only pairs of input and output data. Results show that a minimum Root Mean Squared Error (RMSE) improvement of 63.87% is achieved when the newly proposed data-based denoising approach is considered. In addition, the performance of the whole system shows that similar and even better results are obtained when compared to scenario-optimised methodologies.
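
The abstract reports the gain as a percentage RMSE improvement without stating the formula. A minimal sketch, assuming the figure is the relative reduction of RMSE with respect to a baseline (the function name and the sample numbers are hypothetical):

```python
def relative_rmse_improvement(rmse_baseline, rmse_new):
    """Percentage by which rmse_new is lower than rmse_baseline."""
    if rmse_baseline <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (rmse_baseline - rmse_new) / rmse_baseline


# Hypothetical values: halving the RMSE is a 50% improvement.
improvement = relative_rmse_improvement(rmse_baseline=2.0, rmse_new=1.0)
```

Under this reading, an improvement of 63.87% means the denoised signal's RMSE is roughly 36% of the baseline RMSE.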


A Novel Computational Intelligence Approach for Coal Consumption Forecasting in Iran

Sustainability, 2021, Vol 13 (14), pp. 7612. DOI: 10.3390/su13147612

Author(s): Mahdis sadat Jalaee, Alireza Shakibaei, Amin GhasemiNejad, Sayyed Abdolmajid Jalaee, Reza Derakhshani

Keyword(s): Hybrid Method, Computational Intelligence, Energy Demand, Mean Squared Error, Absolute Error, Coal Consumption, Squared Error, Artificial Neural, Socio Economic Variables, Intelligence Approach

Coal, as a fossil and non-renewable fuel, is one of the most valuable energy minerals in the world, with the largest volume of reserves. Artificial neural networks (ANN), despite being one of the greatest breakthroughs in the field of computational intelligence, have some significant disadvantages, such as slow training, susceptibility to falling into local optima, sensitivity to initial weights, and bias. To overcome these shortcomings, this study presents an improved ANN structure that is optimized by a proposed hybrid method. The aim of this study is to propose a novel hybrid method for predicting coal consumption in Iran based on socio-economic variables using the bat and grey wolf optimization algorithm with an artificial neural network (BGWAN). For this purpose, data from 1981 to 2019 have been used for modelling and testing the method. The available data are partly used to find the optimal or near-optimal values of the weighting parameters (1980–2014) and partly to test the model (2015–2019). The performance of the BGWAN is evaluated by mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), standard deviation error (STD), and correlation coefficient (R^2) between the output of the method and the actual dataset. The results of this study showed that BGWAN performance was excellent and proved its efficiency as a useful and reliable tool for monitoring coal consumption or energy demand in Iran.
