The Challenge of Using Map-Reduce to Query Open Data

Author(s):  
Mauro Pelucchi ◽  
Giuseppe Psaila ◽  
Maurizio Toccu
Keyword(s):  
Algorithms ◽  
2018 ◽  
Vol 11 (12) ◽  
pp. 209

The Hammer prototype is a query engine for corpora of Open Data that provides users with the concept of blind querying. Since data sets published on Open Data portals are heterogeneous, users looking for interesting data sets are blind: unlike queries on a database, their queries cannot be fully specified. Consequently, the query engine is responsible for rewriting and adapting the blind query to the actual data sets by exploiting lexical and semantic similarity. The effectiveness of this approach was discussed in our previous works. In this paper, we report our experience in developing the query engine. In the very first version of the prototype, we realized that the implementation of the retrieval technique was too slow, even though corpora contained only a few thousand data sets. We decided to adopt the Map-Reduce paradigm in order to parallelize the query engine and improve performance. We went through several versions of the query engine, based on either the Hadoop framework or the Spark framework, two very popular frameworks for writing and executing parallel algorithms based on the Map-Reduce paradigm. We present our study of the impact of adopting the Map-Reduce approach and its two best-known frameworks to parallelize the Hammer query engine; we discuss various implementations of the query engine, obtained either without significantly rewriting the algorithm or by completely rewriting it to exploit the high-level abstractions provided by Spark. The experimental campaign we performed shows the benefits provided by each studied solution, with the perspective of moving toward Big Data in the future. The lessons we learned are collected and synthesized into behavioral guidelines for developers approaching the problem of parallelizing algorithms by means of Map-Reduce frameworks.


2019 ◽  
Vol 15 (1) ◽  
Author(s):  
Dodi Faedlulloh ◽  
Fetty Wiyani

This paper aims to explain public financial governance based on the implementation of good governance in the Jakarta Provincial Government. It specifically discusses the implementation of transparency for the local budget (APBD) through an open data portal that publishes budget data to the public. In general, financial transparency through open data has met Transparency 2.0 standards, namely encompassing, one-stop, one-click budget accountability and accessibility. However, some shortcomings still need attention in order to maintain commitment to the principle of transparency, namely updating data through consistent data visualization. Transparency of public finance needs to continue to be developed and improved through various innovations to maintain public trust in the government.
Keywords: Public Finance, Open Data, Transparency


2020 ◽  
Vol 16 (10) ◽  
pp. 1627
Author(s):  
Pei Shujun ◽  
Zhang Yu ◽  
Liang Chao

2018 ◽  
Author(s):  
Dick Bierman ◽  
Jacob Jolij

We have tested the feasibility of a method to prevent the occurrence of so-called Questionable Research Practices (QRP). Apart from embedded pre-registration, the major aspect of the system is real-time uploading of data to a secure server. We outline the method, discuss the drop-out treatment and compare it to the Born-open data method, and report on our preliminary experiences. We also discuss the extension of the data-integrity system from a secure server to the use of blockchain technology.

