Measuring and Communicating the Uncertainty in Official Economic Statistics

2021 ◽  
Vol 37 (2) ◽  
pp. 289-316
Author(s):  
Gian Luigi Mazzi ◽  
James Mitchell ◽  
Florabela Carausu

Abstract Official economic statistics are uncertain, even if not always interpreted or treated as such. From a historical perspective, this article reviews different categorisations of data uncertainty, specifically the traditional typology that distinguishes sampling from nonsampling errors and the newer typology of Manski (2015). Throughout, the importance of measuring and communicating these uncertainties is emphasised, however hard it can prove to measure some sources of data uncertainty, especially those relevant to administrative and big data sets. Accordingly, this article seeks both to encourage further work on the measurement and communication of data uncertainty in general and to introduce the Comunikos (COMmunicating UNcertainty In Key Official Statistics) project at Eurostat. Comunikos is designed to evaluate alternative ways of measuring and communicating data uncertainty, specifically in contexts relevant to official economic statistics.
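Of the two error families in the traditional typology, the sampling-error component is the one most readily quantified. A minimal illustrative sketch (not from the article; data values are invented) of how a point estimate and its sampling-error interval might be computed for a survey-based statistic:

```python
import math

def mean_with_interval(sample, z=1.96):
    """Point estimate plus an approximate 95% sampling-error interval.

    This quantifies only sampling error; nonsampling errors
    (coverage, measurement, nonresponse) are not captured,
    which is exactly the gap the article highlights.
    """
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
    se = math.sqrt(var / n)                               # standard error
    return mean, mean - z * se, mean + z * se

# Hypothetical regional growth-rate readings feeding one estimate
readings = [1.8, 2.1, 1.9, 2.4, 2.0, 1.7, 2.2, 1.9]
est, low, high = mean_with_interval(readings)
print(f"estimate {est:.2f}, interval [{low:.2f}, {high:.2f}]")
```

Communicating the interval alongside the point estimate, rather than the point estimate alone, is the simplest form of the practice the article advocates.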

2019 ◽  
pp. 119-132
Author(s):  
David Rhind

This chapter describes the evolution of UK Official Statistics over an 80-year period under the influence of personalities, politics and government policies, new user needs and changing technology. These have led to changing institutional structures – such as the Statistics Commission – and to periodic oscillations in what statistics are created and how easily the public can access them. The chapter concludes with the impact of the first major statistical legislation for 60 years, particularly as a consequence of its creation of the UK Statistics Authority. This has included major investment in quality assurance of National and Official Statistics and in professional resourcing. These changes are very welcome, as is the statutory specification of government statistics as a public good by the Statistics and Registration Service Act 2007. But problems remain with access to some data sets and with the pre-release of key economic statistics to selected groups of users. Given the widespread societal consequences of the advent of new technologies, what we collect and how we do it will inevitably continue to change rapidly.


Author(s):  
Kees Zeelenberg ◽  
Barteld Braaksma

Big data come in high volume, high velocity and high variety. Their high volume may lead to better accuracy and more detail, their high velocity to more frequent and more timely statistical estimates, and their high variety to opportunities for statistics in new areas. But there are also many challenges: uncontrolled changes in sources threaten continuity and comparability, and the data often refer only indirectly to phenomena of statistical interest. Furthermore, big data may be highly volatile and selective: the coverage of the population to which they refer may change from day to day, leading to inexplicable jumps in time series. And very often, the individual observations in these big data sets lack the variables that would allow them to be linked to other data sets or population frames, which severely limits the possibilities for correcting for selectivity and volatility. In this chapter, we describe and discuss opportunities for big data in official statistics.


2021 ◽  
Vol 37 (1) ◽  
pp. 121-147
Author(s):  
Rob Kitchin ◽  
Samuel Stehle

Abstract In this article we evaluate the viability of using big data produced by smart city systems for creating new official statistics. We assess sixteen sources of urban transportation and environmental big data that are published as open data or were made available to the project for Dublin, Ireland. These data were systematically explored through a process of data checking and wrangling, building tools to display and analyse the data, and evaluating them against sixteen measures of their suitability: access, sustainability and reliability, transparency and interpretability, privacy, fidelity, cleanliness, completeness, spatial granularity, temporal granularity, spatial coverage, coherence, metadata availability, changes over time, standardisation, methodological transparency, and relevance. We assessed how the data could be used to produce key performance indicators and potential new official statistics. Our analysis reveals that, at present, only a limited set of smart city data is suitable for creating new official statistics, though other sources could potentially be made suitable with changes to data management. If these new official statistics are to be realised, National Statistical Institutions need to work closely with the organisations generating the data to implement a robust set of procedures and standards that will produce consistent, long-term data sets.


2014 ◽  
Author(s):  
Pankaj K. Agarwal ◽  
Thomas Moelhave
Keyword(s):  
Big Data ◽  

2020 ◽  
Vol 13 (4) ◽  
pp. 790-797
Author(s):  
Gurjit Singh Bhathal ◽  
Amardeep Singh Dhiman

Background: In the current internet scenario, large amounts of data are generated and processed. The Hadoop framework is widely used to store and process big data in a highly distributed manner, yet it is argued that Hadoop is not mature enough to deal with current cyberattacks on the data. Objective: The main objective of the proposed work is to provide a complete security approach, comprising authorisation and authentication for both users and Hadoop cluster nodes, that secures data at rest as well as in transit. Methods: The proposed algorithm uses the Kerberos network authentication protocol for authorisation and authentication and to validate users and cluster nodes. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is used for data at rest and data in transit: users encrypt files under their own set of attributes and store them on the Hadoop Distributed File System, and only intended users with matching attributes can decrypt them. Results: The proposed algorithm was implemented with data sets of different sizes, processed with and without encryption. The results show little difference in processing time: performance was affected in the range of 0.8% to 3.1%, a figure that also includes the impact of other factors such as system configuration, the number of parallel jobs running and the virtual environment. Conclusion: The solutions available for handling the big data security problems faced in the Hadoop framework are inefficient or incomplete. A complete security framework is proposed for the Hadoop environment and is experimentally shown to have little effect on system performance for data sets of different sizes.
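CP-ABE itself relies on pairing-based cryptography and is not reproduced here; purely to illustrate the access-control model the abstract describes, a sketch of attribute-policy matching, in which a ciphertext carries a policy and only key holders whose attributes satisfy it can decrypt. All names and the policy format are hypothetical, not taken from the paper:

```python
# Illustrative model of CP-ABE access control only: real schemes
# (e.g. Bethencourt-Sahai-Waters) enforce this cryptographically
# rather than with a boolean check.

def satisfies(user_attrs, policy):
    """Policy in conjunctive normal form: a list of OR-clauses,
    every one of which must be met by the user's attribute set."""
    return all(any(a in user_attrs for a in clause) for clause in policy)

# Policy: (dept:finance OR dept:audit) AND role:analyst
policy = [{"dept:finance", "dept:audit"}, {"role:analyst"}]

print(satisfies({"dept:finance", "role:analyst"}, policy))  # True
print(satisfies({"dept:finance", "role:clerk"}, policy))    # False
```

The point of the construction is that the data owner, not the storage cluster, decides who can read a file: HDFS nodes only ever see ciphertext.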


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Hossein Ahmadvand ◽  
Fouzhan Foroutan ◽  
Mahmood Fathy

Abstract Data variety is one of the most important features of Big Data. It results from aggregating data from multiple sources and from the uneven distribution of data, and it causes high variation in the consumption of processing resources such as CPU time. This issue has been overlooked in previous works. To overcome it, in the present work we use Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation, taking two types of deadline as our constraint. Before applying the DVFS technique to compute nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we use a set of data sets and applications. The experimental results show that our proposed approach surpasses the other scenarios in processing real data sets: DV-DVFS can achieve up to a 15% improvement in energy consumption.
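The deadline-driven step described above can be sketched as follows: estimate the cycles a job needs, pick the lowest available frequency that still meets the deadline, and compare dynamic energy, which scales roughly as f² × cycles once voltage is assumed proportional to frequency. The frequency table and the cubic power model are illustrative assumptions, not details from the paper:

```python
def pick_frequency(cycles, deadline_s, freqs_hz):
    """Lowest frequency satisfying cycles / f <= deadline;
    fall back to the maximum if the deadline is infeasible."""
    needed = cycles / deadline_s
    feasible = [f for f in freqs_hz if f >= needed]
    return min(feasible) if feasible else max(freqs_hz)

def relative_energy(cycles, f_hz):
    """Dynamic energy ~ C * V^2 * f * t; with V proportional to f and
    t = cycles / f, this reduces to f^2 * cycles (up to a constant)."""
    return f_hz ** 2 * cycles

freqs = [1.2e9, 1.8e9, 2.4e9]          # hypothetical DVFS states
cycles, deadline = 3.0e9, 2.0          # job needs >= 1.5 GHz
f = pick_frequency(cycles, deadline, freqs)
print(f / 1e9)                          # 1.8: slowest state that meets the deadline
print(relative_energy(cycles, f) / relative_energy(cycles, max(freqs)))
```

Under this model, running at 1.8 GHz instead of 2.4 GHz uses (1.8/2.4)² ≈ 56% of the dynamic energy while still finishing on time, which is the trade-off DV-DVFS exploits.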


2021 ◽  
Vol 37 (1) ◽  
pp. 161-169
Author(s):  
Dominik Rozkrut ◽  
Olga Świerkot-Strużewska ◽  
Gemma Van Halderen

Never has there been a more exciting time to be an official statistician. The data revolution is responding to the demands of the COVID-19 pandemic and a complex sustainable development agenda: to improve how data are produced and used, to close data gaps to prevent discrimination, to build capacity and data literacy, to modernize data collection systems, and to liberate data to promote transparency and accountability. But can all data be liberated in the production and communication of official statistics? This paper explores the UN Fundamental Principles of Official Statistics in the context of eight new and big data sources. The paper concludes that each data source can be used for the production of official statistics in adherence with the Fundamental Principles, and argues that these data sources should be used if National Statistical Systems are to adhere to the first Fundamental Principle: compiling and making available official statistics that honor citizens' entitlement to public information.


2021 ◽  
pp. 1-30
Author(s):  
Lisa Grace S. Bersales ◽  
Josefina V. Almeda ◽  
Sabrina O. Romasoc ◽  
Marie Nadeen R. Martinez ◽  
Dannela Jann B. Galias

With the advancement of technology, digitalization, and the Internet of Things, large amounts of complex data are produced daily. This vast quantity of varied data produced at high speed is referred to as Big Data. Big Data is being used with success in the private sector, yet the public sector seems to be falling behind despite the many potentials Big Data has already presented. In this regard, this paper explores ways in which the government can adopt Big Data for official statistics. It begins by gathering and presenting Big Data-related initiatives and projects across the globe, covering the various types and sources of Big Data implemented. It then discusses the opportunities, challenges, and risks associated with using Big Data, particularly in official statistics, and assesses the current utilization of Big Data in the country through focus group discussions and key informant interviews. Based on desk review, discussions, and interviews, the paper concludes with a proposed framework for how Big Data may be utilized by the government to augment official statistics.


Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 859
Author(s):  
Abdulaziz O. AlQabbany ◽  
Aqil M. Azmi

We are living in the age of big data, a majority of which is stream data, and its real-time processing requires careful consideration from different perspectives. Concept drift, a change in the data's underlying distribution, is a significant issue, especially when learning from data streams, as it requires learners to adapt to dynamic changes. Random forest is an ensemble approach widely used in classical non-streaming settings of machine learning applications, while the Adaptive Random Forest (ARF) is a stream learning algorithm that has shown promising results in terms of accuracy and the ability to deal with various types of drift. The continuity of the incoming instances allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase such streaming algorithms' efficiency by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each with a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed enhancement method yields considerable improvement in most situations.
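The Poisson-based resampling that ARF inherits from online bagging can be sketched in a few lines: each arriving instance is "replayed" Poisson(λ) times per ensemble member, approximating bootstrap resampling on a stream, and tuning λ (rather than fixing it at 1) is the knob the study explores. The Knuth-style sampler below is a standard construction, not code from the paper:

```python
import math
import random

def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for the small lambdas
    used in online bagging (runtime grows with lambda)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def online_bagging_weights(n_instances, lam, rng):
    """Per-instance training weights for one ensemble member:
    weight 0 skips the instance, weight k trains on it k times."""
    return [poisson(lam, rng) for _ in range(n_instances)]

rng = random.Random(42)
weights = online_bagging_weights(100_000, lam=1.0, rng=rng)
print(sum(weights) / len(weights))  # empirical mean, close to lambda
```

Raising λ makes each learner see more (repeated) data per instance, typically improving accuracy at the cost of execution time; the paper's ρ measure is designed to score exactly that trade-off.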

