The Challenge of Using Map-Reduce to Query Open Data

Hadoop vs. Spark: Impact on Performance of the Hammer Query Engine for Open Data Corpora

Algorithms ◽

10.3390/a11120209 ◽

2018 ◽

Vol 11 (12) ◽

pp. 209 ◽

Cited By ~ 1

Author(s):

Mauro Pelucchi ◽

Giuseppe Psaila ◽

Maurizio Toccu

Keyword(s):

Open Data ◽

Map Reduce ◽

Data Sets ◽

Retrieval Technique ◽

Experimental Campaign ◽

Query Engine ◽

High Level ◽

The Impact ◽

Spark Framework ◽

Hadoop Framework

The Hammer prototype is a query engine for corpora of Open Data that provides users with the concept of blind querying. Since data sets published on Open Data portals are heterogeneous, users looking for interesting data sets are blind: they cannot fully specify their queries, as they could against a database. Consequently, the query engine is responsible for rewriting and adapting the blind query to the actual data sets by exploiting lexical and semantic similarity. The effectiveness of this approach was discussed in our previous works. In this paper, we report our experience in developing the query engine. In the very first version of the prototype, we realized that the implementation of the retrieval technique was too slow, even though the corpora contained only a few thousand data sets. We decided to adopt the Map-Reduce paradigm in order to parallelize the query engine and improve performance. We went through several versions of the query engine, based either on the Hadoop framework or on the Spark framework. Hadoop and Spark are two very popular frameworks for writing and executing parallel algorithms based on the Map-Reduce paradigm. We present our study of the impact of adopting the Map-Reduce approach and its two best-known frameworks to parallelize the Hammer query engine; we discuss various implementations of the query engine, obtained either without significantly rewriting the algorithm or by completely rewriting it to exploit the high-level abstractions provided by Spark. The experimental campaign we performed shows the benefits provided by each solution studied, with the perspective of moving toward Big Data in the future. The lessons we learned are collected and synthesized into behavioral guidelines for developers approaching the problem of parallelizing algorithms by means of Map-Reduce frameworks.
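As a rough illustration of the Map-Reduce idea mentioned in the abstract, the sketch below scores data sets against the terms of a blind query in plain Python. It is not the Hammer engine's algorithm: the function names (`map_dataset`, `reduce_scores`, `rank_datasets`), the representation of a data set as an identifier plus a list of field names, and the use of `difflib.SequenceMatcher` as the lexical similarity measure are all assumptions made for this example, and semantic similarity is left out entirely.

```python
from difflib import SequenceMatcher
from functools import reduce


def map_dataset(query_terms, dataset):
    """Map step (hypothetical): emit one (dataset_id, similarity) pair per query term.

    The similarity is the best lexical match between the term and any
    field name of the data set, in the range [0.0, 1.0].
    """
    dataset_id, field_names = dataset
    pairs = []
    for term in query_terms:
        best = max(
            (SequenceMatcher(None, term.lower(), name.lower()).ratio()
             for name in field_names),
            default=0.0,  # a data set without fields matches nothing
        )
        pairs.append((dataset_id, best))
    return pairs


def reduce_scores(acc, pair):
    """Reduce step (hypothetical): accumulate score sum and count per data set."""
    dataset_id, score = pair
    total, count = acc.get(dataset_id, (0.0, 0))
    acc[dataset_id] = (total + score, count + 1)
    return acc


def rank_datasets(query_terms, corpus):
    """Run map over every data set, reduce by key, and sort by mean similarity."""
    mapped = [pair for ds in corpus for pair in map_dataset(query_terms, ds)]
    totals = reduce(reduce_scores, mapped, {})
    ranked = [(ds_id, total / count) for ds_id, (total, count) in totals.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    corpus = [
        ("census", ["city", "population"]),
        ("paint", ["color", "weight"]),
    ]
    for ds_id, score in rank_datasets(["city", "population"], corpus):
        print(ds_id, round(score, 3))
```

Because each call to `map_dataset` depends only on its own data set, the map step is the part a framework such as Hadoop or Spark could distribute across workers; the reduce step then merges the per-key partial results.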


Public data and value creation in Italy. The findings from the Open Data 200 study

SOCIOLOGIA DEL LAVORO ◽

10.3280/sl2018-152004 ◽

2018 ◽

pp. 65-83

Author(s):

Francesca De Chiara

Keyword(s):

Value Creation ◽

Open Data ◽

Public Data


Il rapporto tra pubblica amministrazione e cittadini nella città digitale: trasparenza, accountability e open data nei nuovi contesti urbani

SOCIOLOGIA URBANA E RURALE ◽

10.3280/sur2015-107010 ◽

2015 ◽

pp. 135-149 ◽

Cited By ~ 1

Author(s):

Gea Ducci

Keyword(s):

Open Data


Promote Good Governance in Public Financial: The Practice of Local Budget (APBD) Transparency Through Open Data Jakarta in Jakarta Provincial Government

Jurnal Good Governance ◽

10.32834/gg.v15i1.41 ◽

2019 ◽

Vol 15 (1) ◽

Author(s):

Dodi Faedlulloh ◽

Fetty Wiyani

Keyword(s):

Public Finance ◽

Good Governance ◽

Open Data ◽

Public Trust ◽

Provincial Government ◽

Financial Governance ◽

One Stop ◽

Data Portal ◽

Data Transparency ◽

Local Budget

This paper aims to explain public financial governance based on the implementation of good governance in the Jakarta Provincial Government. It specifically discusses the implementation of local budget (APBD) transparency through an open data portal that publishes budget data to the public. In general, financial transparency through open data has met Transparency 2.0 standards, namely encompassing, one-stop, one-click budget accountability and accessibility. However, some shortcomings remain a concern for maintaining the commitment to the principle of transparency, in particular the need to keep data updated through consistent data visualization. Transparency of public finance needs to continue to be developed and improved through various innovations to maintain public trust in the government.

Keywords: Public Finance, Open Data, Transparency


TOWARD A LINKED OPEN DATA REPOSITORY ABOUT VIETNAMESE TOURISM

KỶ YẾU HỘI NGHỊ KHOA HỌC CÔNG NGHỆ QUỐC GIA LẦN THỨ XI NGHIÊN CỨU CƠ BẢN VÀ ỨNG DỤNG CÔNG NGHỆ THÔNG TIN ◽

10.15625/vap.2018.00067 ◽

2018 ◽

Author(s):

Le Anh Tien ◽

Cao Tuan Dung

Keyword(s):

Open Data ◽

Linked Open Data ◽

Data Repository


Efficient Processing of Job by Enhancing Hadoop Map Reduce Framework Using Containers

International Journal on Communications Antenna and Propagation (IRECAP) ◽

10.15866/irecap.v7i6.13604 ◽

2017 ◽

Vol 7 (6) ◽

pp. 521

Author(s):

Shweta V. H. Chaudhari ◽

R. M. Wahul

Keyword(s):

Map Reduce ◽

Efficient Processing


Fast Pruning Algorithm and Task Scheduling under Map/Reduce

International Journal of Performability Engineering ◽

10.23940/ijpe.20.10.p14.16271636 ◽

2020 ◽

Vol 16 (10) ◽

pp. 1627

Author(s):

Pei Shujun ◽

Zhang Yu ◽

Liang Chao

Keyword(s):

Task Scheduling ◽

Map Reduce ◽

Pruning Algorithm ◽

And Task


Towards guaranteed data-integrity: A method of preventing Questionable Research Practices

10.31234/osf.io/rmfzh ◽

2018 ◽

Author(s):

Dick Bierman ◽

Jacob Jolij

Keyword(s):

Real Time ◽

Open Data ◽

Data Integrity ◽

Drop Out ◽

Research Practices ◽

Major Aspect ◽

Questionable Research Practices ◽

Blockchain Technology ◽

Integrity System ◽

Secure Server

We have tested the feasibility of a method to prevent the occurrence of so-called Questionable Research Practices (QRP). Apart from embedded pre-registration, the major aspect of the system is the real-time uploading of data to a secure server. We outline the method, discuss the drop-out treatment, compare it to the Born-open data method, and report on our preliminary experiences. We also discuss extending the data-integrity system from a secure server to the use of blockchain technology.
