Big Data Techniques for Supporting Official Statistics

2017 ◽

pp. 253-273

Author(s):

Antonino Virgillito ◽

Federico Polidoro

Keyword(s):

Big Data ◽

Data Collection ◽

Data Science ◽

Official Statistics ◽

The Core ◽

Web Scraping ◽

Collection Process ◽

Data Collection Process ◽

Data Source ◽

Use Of Internet

Following the advent of Big Data, statistical offices have been widely exploring the use of the Internet as a data source for modernizing their data collection process. In particular, prices are collected online in several statistical institutes through a technique known as web scraping. The objective of the chapter is to discuss the challenges of web scraping for setting up a continuous data collection process, exploring and classifying the most widespread techniques and presenting how they are used in practical cases. The main technical notions behind web scraping are presented and explained so that readers with no IT background also have sufficient elements to fully comprehend scraping techniques, promoting the building of mixed skills that is at the core of the spirit of modern data science. Challenges for official statistics deriving from the use of web scraping are briefly sketched. Finally, research ideas for overcoming the limitations of current techniques are presented and discussed.

Developing a Method to Valuate the Collection of Big Data

International Journal of Strategic Decision Sciences ◽

10.4018/ijsds.2019010101 ◽

2019 ◽

Vol 10 (1) ◽

pp. 1-9

Author(s):

Colleen Carraher Wolverton ◽

Brandi N. Guidry Hollier ◽

Michael W. Totaro ◽

Lise Anne D. Slatten

Keyword(s):

Big Data ◽

Data Collection ◽

Data Analytics ◽

Costs And Benefits ◽

Systematic Method ◽

Valuable Data ◽

Collection Process ◽

Risky Investment ◽

Data Collection Process

Although organizations recognize the potential of “big data,” implementation of data analytics processes can consume a considerable amount of resources. The authors propose that when organizations are considering this costly and often risky investment, they need a systematic method to evaluate the costs of data collection associated with the implementation of a new data and analytics (D & A) strategy or an expansion of an existing effort. Therefore, in this article, a new dimension of big data is proposed which is incorporated into a theoretically justified and systematic method for quantifying the costs and benefits of the data collection process. By estimating the worth of data, organizations can more efficiently focus on streamlining the collection of the most beneficial data and jettisoning less valuable data collection efforts.

Towards big data as official statistics: Case study of the use of mobile positioning data to delineate metropolitan areas in Indonesia

Statistical Journal of the IAOS ◽

10.3233/sji-200750 ◽

2020 ◽

Vol 36 (4) ◽

pp. 943-954

Author(s):

Isnaeni Noviyanti ◽

Panca D. Prabawa ◽

Dwi Puspita Sari ◽

Ade Koswara ◽

Titi Kanti Lestari ◽

...

Keyword(s):

Big Data ◽

Data Collection ◽

Business Process ◽

Business Processes ◽

Metropolitan Areas ◽

Data Use ◽

Data Access ◽

Official Statistics ◽

Mobile Positioning ◽

Data Source

The use of so-called big data as a new data source to complement official statistics has become an opportunity for organizations focusing on statistics. The use of big data can lead to more efficient data collection. However, there is currently no standard business process for big data collection and processing at BPS-Statistics Indonesia. Moreover, the adoption of technologies alone cannot determine the success of big data use. It is widely known that big data use can be challenging, since there are issues regarding data access, quality, and methodology, as well as the development of required skillsets. This paper proposes a framework for a business process that is specifically designed to support the use of big data for official statistics at BPS-Statistics Indonesia, along with how existing technology will support it. The development of this framework is based on the wider Statistical Business Process Framework and Architecture (SBFA) developed by BPS-Statistics Indonesia to describe and manage its overall statistical business processes. The paper uses the example of Mobile Positioning Data (MPD) as a big data source to delineate Metropolitan Areas in Indonesia to explain the implementation of the framework.

Developing a Method to Valuate the Collection of Big Data

10.4018/978-1-6684-3662-2.ch009 ◽

2022 ◽

pp. 188-196

Author(s):

Colleen Carraher Wolverton ◽

Brandi N. Guidry Hollier ◽

Michael W. Totaro ◽

Lise Anne D. Slatten

Keyword(s):

Big Data ◽

Data Collection ◽

Data Analytics ◽

Costs And Benefits ◽

Systematic Method ◽

Valuable Data ◽

Collection Process ◽

Risky Investment ◽

Data Collection Process

Although organizations recognize the potential of “big data,” implementation of data analytics processes can consume a considerable amount of resources. The authors propose that when organizations are considering this costly and often risky investment, they need a systematic method to evaluate the costs of data collection associated with the implementation of a new data and analytics (D & A) strategy or an expansion of an existing effort. Therefore, in this article, a new dimension of big data is proposed which is incorporated into a theoretically justified and systematic method for quantifying the costs and benefits of the data collection process. By estimating the worth of data, organizations can more efficiently focus on streamlining the collection of the most beneficial data and jettisoning less valuable data collection efforts.

Code Mixing and Code Switching in the “Yowis Ben” Movie: Sociolinguistic Study

International Journal of Linguistics Literature & Translation ◽

10.32996/ijllt.2021.4.4.3 ◽

2021 ◽

Vol 4 (4) ◽

pp. 14-19

Author(s):

Oriana Surya Ningsih ◽

Teguh Setiawan

Keyword(s):

Data Collection ◽

Native Language ◽

Qualitative Approach ◽

Code Switching ◽

Direct Role ◽

Code Mixing ◽

Collection Process ◽

Data Collection Process ◽

Data Source ◽

Immediate Response

The use of native language in the “Yowis Ben” movie by Fajar Nugros and Bayu Eko Moektito gives rise to various language forms. The research therefore aims to identify the types of code-mixing and code-switching in the movie. This research uses a qualitative approach. The researcher also played a direct role in the data collection process by determining the data source and by listening to and recording data. Based on the discussion of code-switching and code-mixing in the “Yowis Ben” movie, the conclusions drawn are as follows. First, the code-mixing in the “Yowis Ben” movie occurred using the Indonesian and Javanese languages. There are three functions of code-mixing: respecting the addressee, providing information, and clarifying the speech. Second, the code-switching in the “Yowis Ben” movie occurred using the Javanese language. This is because the background of the screenplay depicted actors who are from Malang, East Java. Also, there are three functions of code-mixing: neutralizing the use of language, establishing humour, and getting an immediate response to the speech.

Pemanfaatan Sistem Informasi Sebagai Alat Untuk Pendataan Masyarakat Di Pasuruan

BAKTIMAS : Jurnal Pengabdian pada Masyarakat ◽

10.32672/btm.v2i3.2390 ◽

2020 ◽

Vol 2 (3) ◽

pp. 145-150

Author(s):

Syaifuddin Syaifuddin ◽

Wildan Suharso

Keyword(s):

Information System ◽

Data Collection ◽

Human Resources ◽

Regional Government ◽

Community Based ◽

Collection Process ◽

Data Collection Process ◽

Service Activity ◽

Development Plans ◽

Collection Time

Manual data collection causes problems in the data collection process. This is also the case at the Pasuruan City Education and Culture Office, where data collection is still manual and only a limited number of staff (human resources, HR) are assigned to it. This community service activity therefore provided information system training to improve data collection time and reduce the complexity of the data collection process for employees of the Pasuruan City Education and Culture Office. The training covered a community-based data collection information system, which contains the basic data required by the Regional Government for formulating development plans. The data provide no benefit unless they are used as a reference in preparing development plans, so training and assistance are needed to achieve this goal. Keywords: Information System, Community Based, Data Collection

Mapping the United Nations Fundamental Principles of Official Statistics against new and big data sources

Statistical Journal of the IAOS ◽

10.3233/sji-210789 ◽

2021 ◽

Vol 37 (1) ◽

pp. 161-169

Author(s):

Dominik Rozkrut ◽

Olga Świerkot-Strużewska ◽

Gemma Van Halderen

Keyword(s):

Big Data ◽

Public Information ◽

Fundamental Principle ◽

Data Sources ◽

Official Statistics ◽

Development Agenda ◽

Data Gaps ◽

Data Source ◽

Exciting Time ◽

Statistical Systems

Never has there been a more exciting time to be an official statistician. The data revolution is responding to the demands of the COVID-19 pandemic and a complex sustainable development agenda to improve how data is produced and used, to close data gaps to prevent discrimination, to build capacity and data literacy, to modernize data collection systems and to liberate data to promote transparency and accountability. But can all data be liberated in the production and communication of official statistics? This paper explores the UN Fundamental Principles of Official Statistics in the context of eight new and big data sources. The paper concludes that each data source can be used for the production of official statistics in adherence with the Fundamental Principles and argues that these data sources should be used if National Statistical Systems are to adhere to the first Fundamental Principle of compiling and making available official statistics that honor citizens’ entitlement to public information.

The MID5 Dataset, 2011–2014: Procedures, coding rules, and description

Conflict Management and Peace Science ◽

10.1177/0738894221995743 ◽

2021 ◽

pp. 073889422199574

Author(s):

Glenn Palmer ◽

Roseanne W McManus ◽

Vito D’Orazio ◽

Michael R Kenwick ◽

Mikaela Karstens ◽

...

Keyword(s):

Data Collection ◽

Descriptive Statistics ◽

Collection Process ◽

Data Collection Process ◽

Interstate Conflicts ◽

Coding Manual ◽

Militarized Interstate Dispute ◽

Coding Rules

This article introduces the latest iteration of the most widely used dataset on interstate conflicts, the Militarized Interstate Dispute (MID) 5 dataset. We begin by outlining the data collection process used in the MID5 project. Next, we discuss some of the most challenging cases that we coded and some updates to the coding manual that resulted. Finally, we provide descriptive statistics for the new years of the MID data.

SIKEPUL: Sistem Informasi Untuk Administrasi Transaksi Jual Beli Pengepul Rongsokan Menggunakan Metode Waterfall

Journal of Innovation Information Technology and Application (JINITA) ◽

10.35970/jinita.v3i2.670 ◽

2021 ◽

Vol 3 (2) ◽

pp. 103-114

Author(s):

Ika Dewi Rozaurrohmah ◽

Lutfi Syafirullah ◽

Oman Somantri

Keyword(s):

Information System ◽

Data Collection ◽

Administrative Data ◽

Existing Problems ◽

Collection Process ◽

Data Collection Process ◽

Communication Errors ◽

Transaction Activities

Currently, collector business owners are experiencing problems, namely the absence of data collection for suppliers and collapsed transaction activities. In addition, the administrative data collection process is still carried out manually by the admin, for example by using notes when making junk transactions, and when partners make payments to collectors, communication errors in junk transactions between suppliers and partners often occur. To overcome these problems, this research proposes the development of a collector administration information system named SIKEPUL using the Laravel framework. The system was developed using the waterfall method. The results showed that the SIKEPUL information system could solve the problems faced. In a questionnaire completed by 30 respondents, 20% rated the system very good, 52% good, and 28% adequate.

An Assessment And Data Collection Process For Evaluating Student Progress On "a-k" ABET Educational Outcomes

10.18260/1-2--15937 ◽

2020 ◽

Author(s):

Kathleen Ossman

Keyword(s):

Data Collection ◽

Educational Outcomes ◽

Student Progress ◽

Collection Process ◽

Data Collection Process
