Big Data Techniques for Supporting Official Statistics

Web Services ◽  
2019 ◽  
pp. 728-744 ◽  
Author(s):  
Antonino Virgillito ◽  
Federico Polidoro

Following the advent of Big Data, statistical offices have been extensively exploring the use of the Internet as a data source for modernizing their data collection processes. In particular, several statistical institutes collect prices online through a technique known as web scraping. The objective of the chapter is to discuss the challenges of web scraping for setting up a continuous data collection process, exploring and classifying the most widespread techniques and presenting how they are used in practical cases. The main technical notions behind web scraping are presented and explained so that readers with no background in IT also have sufficient elements to fully comprehend scraping techniques, promoting the building of mixed skills that is at the core of modern data science. Challenges for official statistics deriving from the use of web scraping are briefly sketched. Finally, research ideas for overcoming the limitations of current techniques are presented and discussed.
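To make the parsing step of web scraping concrete, here is a minimal sketch (not the chapter's own method) that extracts product names and prices from an HTML page that has already been downloaded. The page layout (`span` elements with classes `name` and `price`), the euro price format, and the names `PriceParser`, `scrape_prices` and `page` are hypothetical assumptions. The sketch uses only Python's standard `html.parser` module. A production pipeline would also have to fetch pages (for example with `urllib.request`), respect each site's terms and robots.txt, and cope with layout changes over time.

```python
from decimal import Decimal
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects (name, price) pairs from a HYPOTHETICAL listing layout.

    Assumes each product has <span class="name"> followed by
    <span class="price">. Real retailer pages will differ.
    """

    def __init__(self):
        super().__init__()
        self.records = []   # list of (name, Decimal price) tuples
        self._field = None  # "name", "price", or None outside those spans
        self._name = None   # last product name seen, awaiting its price

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "name" in classes:
            self._field = "name"
        elif tag == "span" and "price" in classes:
            self._field = "price"

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "name":
            self._name = text
        elif self._field == "price" and self._name is not None:
            # Assumed format "€ 1,29": drop the currency symbol and use
            # '.' as decimal separator (no thousands separators handled).
            amount = text.replace("€", "").replace(",", ".").strip()
            self.records.append((self._name, Decimal(amount)))
            self._name = None


def scrape_prices(html: str):
    """Return the (name, price) pairs found in an HTML string."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.records


# Hypothetical sample page standing in for a downloaded retailer page
page = """
<ul>
  <li class="product"><span class="name">Pasta 500g</span> <span class="price">€ 1,29</span></li>
  <li class="product"><span class="name">Olive oil 1L</span> <span class="price">€ 7,50</span></li>
</ul>
"""

print(scrape_prices(page))
# [('Pasta 500g', Decimal('1.29')), ('Olive oil 1L', Decimal('7.50'))]
```

Prices are stored as `Decimal` rather than `float` so that values such as 1.29 are kept exactly, which matters when scraped prices later feed into price indices.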


2019 ◽  
Vol 10 (1) ◽  
pp. 1-9
Author(s):  
Colleen Carraher Wolverton ◽  
Brandi N. Guidry Hollier ◽  
Michael W. Totaro ◽  
Lise Anne D. Slatten

Although organizations recognize the potential of “big data,” implementation of data analytics processes can consume a considerable amount of resources. The authors propose that when organizations are considering this costly and often risky investment, they need a systematic method to evaluate the costs of data collection associated with the implementation of a new data and analytics (D & A) strategy or an expansion of an existing effort. Therefore, in this article, a new dimension of big data is proposed which is incorporated into a theoretically justified and systematic method for quantifying the costs and benefits of the data collection process. By estimating the worth of data, organizations can more efficiently focus on streamlining the collection of the most beneficial data and jettisoning less valuable data collection efforts.


2020 ◽  
Vol 36 (4) ◽  
pp. 943-954
Author(s):  
Isnaeni Noviyanti ◽  
Panca D. Prabawa ◽  
Dwi Puspita Sari ◽  
Ade Koswara ◽  
Titi Kanti Lestari ◽  
...  

Nowadays, the use of so-called big data as a new data source to complement official statistics has become an opportunity for statistical organizations. Big data can make data collection more efficient. However, BPS-Statistics Indonesia currently has no standard business process for big data collection and processing. Moreover, adopting technology alone does not determine whether big data is used successfully: it is widely known that big data use can be challenging, since there are issues of data access, quality, and methodology, as well as the need to develop the required skill sets. This paper proposes a framework for a business process that is specifically designed to support the use of big data for official statistics at BPS-Statistics Indonesia, along with how existing technology will support it. The framework builds on the wider Statistical Business Process Framework and Architecture (SBFA) developed by BPS-Statistics Indonesia to describe and manage its overall statistical business processes. The paper explains the implementation of the framework through the example of using Mobile Positioning Data (MPD) as a big data source to delineate Metropolitan Areas in Indonesia.


2022 ◽  
pp. 188-196
Author(s):  
Colleen Carraher Wolverton ◽  
Brandi N. Guidry Hollier ◽  
Michael W. Totaro ◽  
Lise Anne D. Slatten

Although organizations recognize the potential of “big data,” implementation of data analytics processes can consume a considerable amount of resources. The authors propose that when organizations are considering this costly and often risky investment, they need a systematic method to evaluate the costs of data collection associated with the implementation of a new data and analytics (D & A) strategy or an expansion of an existing effort. Therefore, in this article, a new dimension of big data is proposed which is incorporated into a theoretically justified and systematic method for quantifying the costs and benefits of the data collection process. By estimating the worth of data, organizations can more efficiently focus on streamlining the collection of the most beneficial data and jettisoning less valuable data collection efforts.


Author(s):  
Oriana Surya Ningsih ◽  
Teguh Setiawan

The use of the native language in the “Yowis Ben” movie by Fajar Nugros and Bayu Eko Moektito gives rise to various language forms. The research therefore aims to identify the types of code-mixing and code-switching in the movie. This research uses a qualitative approach. The researcher also played a direct role in the data collection process by determining the data source and by listening to and recording the data. Based on the discussion of code-switching and code-mixing in the “Yowis Ben” movie, the conclusions are as follows. First, the code-mixing in the movie occurred between Indonesian and Javanese. Code-mixing serves three functions: respecting the addressee, providing information, and clarifying the speech. Second, the code-switching in the movie occurred in Javanese, because the screenplay depicts characters from Malang, East Java. Code-switching also serves three functions: neutralizing the use of language, establishing humour, and getting an immediate response to the speech.


2020 ◽  
Vol 2 (3) ◽  
pp. 145-150
Author(s):  
Syaifuddin Syaifuddin ◽  
Wildan Suharso

Manual data collection causes problems in the data collection process. This is also the case at the Pasuruan City Education and Culture Office, where data collection is still manual and only a limited number of staff are assigned to collect data. This community service activity therefore provided information system training to employees of the Pasuruan City Education and Culture Office to speed up data collection and reduce the complexity of the data collection process. The training covered a community-based data collection information system, which holds the basic data the Regional Government needs to formulate development plans. Such data provide no benefit unless they are used as a reference in preparing development plans, so training and mentoring are needed to achieve this goal.
Keywords: Information System, Community Based, Data Collection


2021 ◽  
Vol 37 (1) ◽  
pp. 161-169
Author(s):  
Dominik Rozkrut ◽  
Olga Świerkot-Strużewska ◽  
Gemma Van Halderen

Never has there been a more exciting time to be an official statistician. The data revolution is responding to the demands of the COVID-19 pandemic and a complex sustainable development agenda to improve how data are produced and used, to close data gaps to prevent discrimination, to build capacity and data literacy, to modernize data collection systems, and to liberate data to promote transparency and accountability. But can all data be liberated in the production and communication of official statistics? This paper explores the UN Fundamental Principles of Official Statistics in the context of eight new and big data sources. The paper concludes that each data source can be used for the production of official statistics in adherence with the Fundamental Principles, and argues that these data sources should be used if National Statistical Systems are to adhere to the first Fundamental Principle of compiling and making available official statistics that honor citizens’ entitlement to public information.


2021 ◽  
pp. 073889422199574
Author(s):  
Glenn Palmer ◽  
Roseanne W McManus ◽  
Vito D’Orazio ◽  
Michael R Kenwick ◽  
Mikaela Karstens ◽  
...  

This article introduces the latest iteration of the most widely used dataset on interstate conflicts, the Militarized Interstate Dispute (MID) 5 dataset. We begin by outlining the data collection process used in the MID5 project. Next, we discuss some of the most challenging cases that we coded and some updates to the coding manual that resulted. Finally, we provide descriptive statistics for the new years of the MID data.


Author(s):  
Ika Dewi Rozaurrohmah ◽  
Lutfi Syafirullah ◽  
Oman Somantri

Collector business owners currently face problems, namely the absence of records of supplier data and disorganized transaction activities. In addition, administrative data are still recorded manually by the admin, for example with handwritten notes when junk transactions are made and when partners make payments to collectors, and communication errors about junk transactions between suppliers and partners often occur. To overcome these problems, this research proposes the development of a collector administration information system named SIKEPUL using the Laravel framework. The system was developed using the waterfall method. The results showed that the SIKEPUL information system could solve the problems faced. In a questionnaire completed by 30 respondents, 20% rated the system as very good, 52% as good, and 28% as adequate.

