Towards Automatic Data Format Transformations: Data Wrangling at Scale

2018, Vol 62 (7), pp. 1044-1060
Author(s): Alex Bogatu, Norman W Paton, Alvaro A A Fernandes, Martin Koehler

Abstract: Data wrangling is the process whereby data are cleaned and integrated for analysis. Even with tool support, data wrangling is typically a labour-intensive process. One aspect of data wrangling involves carrying out format transformations on attribute values, for example so that names or phone numbers are represented consistently. Recent research has developed techniques for synthesizing format transformation programs from examples of the source and target representations. This is valuable, but it still requires a user to provide suitable examples, which may be challenging in applications with huge datasets or numerous data sources. In this paper, we investigate the automatic discovery of examples that can be used to synthesize format transformation programs. In particular, we propose two approaches to identifying candidate data examples and validating the transformations that are synthesized from them. The approaches are evaluated empirically using open government datasets.

2015, Vol 9 (3), pp. 721-753
Author(s): Charalampos Alexopoulos, Euripidis Loukis, Spiros Mouzakitis, Michalis Petychakis, Yannis Charalabidis

2021, Vol 11 (19), pp. 9270
Author(s): Sovit Bhandari, Navin Ranjan, Yeong-Chan Kim, Jong-Do Park, Kwang-Il Hwang, ...

In recent years, governments in many countries have recognized the importance of data in boosting their economies. As a result, they are adopting the philosophy of open government data (OGD), making public data easily and freely available to everyone in standardized formats. Because good-quality OGD can boost a country's economy, whereas poor quality can hinder its efficient use and reuse, it is very important to maintain the quality of the data stored in open government data portals (OGDPs). However, most OGDPs do not indicate the quality of the data they hold, and those that do, do not provide it as a real-time service. Moreover, most recent studies have focused on approaches to quantify the quality of OGD, either qualitatively or quantitatively, but have not offered a way to calculate and visualize it automatically in real time. To partly address this problem, this paper proposes a framework that automatically assesses data quality in the form of a data completeness ratio (DCR) and visualizes it in real time. The framework is validated using the OGD of South Korea, whose DCR is displayed in real time on a Django-based dashboard.


Sensors, 2021, Vol 21 (15), pp. 5204
Author(s): Anastasija Nikiforova

Governments now launch open government data (OGD) portals that provide data that anyone can access and use for their own needs. Although the potential economic value of open (government) data is estimated in the millions and billions, not all open data are reused. Moreover, the open (government) data initiative and users' interest in open (government) data are changing continuously, and today, in line with IoT and smart city trends, real-time and sensor-generated data are of greater interest to users. These "smarter" open (government) data are also considered one of the crucial drivers of a sustainable economy; they might have an impact on information and communication technology (ICT) innovation and become a creativity bridge in developing a new ecosystem in Industry 4.0 and Society 5.0. The paper inspects the OGD portals of 60 countries to understand how well their content corresponds to Society 5.0 expectations. The paper reports on the extent to which countries provide these data, focusing on several factors that facilitate open (government) data success, both for the portal in general and for the data sets of interest in particular. The presence of "smarter" data, their accessibility, availability, currency and timeliness, as well as support for users, are analyzed. A list of the most competitive countries by data category is provided. This makes it possible to understand which OGD portals respond to users' needs and to the demands of Industry 4.0 and Society 5.0 for opening and updating data for further potential reuse, which is essential in the digital, data-driven world.


Author(s): HuiYan Ho, Sheuwen Chuang, Niann-Tzyy Dai, Chia-Hsin Cheng, Wei-Fong Kao

Author(s): Evangelos Kalampokis, Efthimios Tambouris, Konstantinos Tarabanis
