Big Data and Official Statistics
Big data is a component of the Fourth Industrial Revolution. The deep penetration of digital technology has turned data into an essential input to the production process: data are generated automatically by machines during their operation and during their interactions with humans. This paper describes the concept and composition of big data. Most big data are unstructured and include text, audio and video files, images, emails, log files, etc. Statisticians are more interested in structured data, which are organized according to a predefined database model. Big data offer new sources and opportunities that cannot be discounted. However, their use requires proper assessment against quality dimensions such as accuracy, comparability and methodological soundness. In the debate surrounding big data, some users view big data as a replacement for official statistics. Such a conclusion is premature for at least two reasons. First, only a small part of big data can be used for decision-making. Second, theory and practice show that a small sample drawn using scientific methods can yield much more reliable and accurate estimates than the results obtained from processing large amounts of unstructured data. The paper also assesses the possibility of using big data for monitoring the Sustainable Development Goals (SDGs). SDG monitoring is a nationally owned process, and national statistical offices (NSOs) are accountable for the SDG data they report. If the data are derived from a big data source, the national institutions might question their reliability, irrespective of the level of technical sophistication used in data transformation. The paper concludes that the reliability of data obtained from big data sources hinges on the quality of the tools and methods applied to data transformation. Statisticians can play an important role in alerting society, government decision-making bodies and businesses to the reliability of information derived from different sources.
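The second reason given above can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a synthetic population and a hypothetical big data source that captures units with above-average values twice as often as the rest. This is only one possible mechanism (selection bias) by which a very large data set can mislead, and all names, probabilities and sizes are illustrative assumptions.

```python
import random
import statistics


def simulate(pop_size=200_000, sample_size=1_000, seed=42):
    """Compare a large self-selected data set with a small random sample.

    All parameters are illustrative assumptions, not figures from the paper.
    """
    rng = random.Random(seed)

    # Synthetic population: one numeric characteristic per unit.
    population = [rng.gauss(50.0, 10.0) for _ in range(pop_size)]
    true_mean = statistics.fmean(population)

    # Hypothetical big data source: units with values above 50 are
    # captured with probability 0.6, all others with probability 0.3.
    big_data = [
        y for y in population
        if rng.random() < (0.6 if y > 50.0 else 0.3)
    ]

    # Small simple random sample drawn without replacement.
    srs = rng.sample(population, sample_size)

    return {
        "true_mean": true_mean,
        "big_data_n": len(big_data),
        "big_data_error": abs(statistics.fmean(big_data) - true_mean),
        "srs_n": len(srs),
        "srs_error": abs(statistics.fmean(srs) - true_mean),
    }


result = simulate()
for key, value in result.items():
    print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

Under these assumptions the big data source holds many times more records than the random sample, yet its mean stays further from the population mean, because adding records does not remove the bias in how they were captured. Real sources may be affected by other problems, such as measurement or transformation errors, which this sketch does not model.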