SPARQL2Flink: Evaluation of SPARQL Queries on Apache Flink

2021 ◽  
Vol 11 (15) ◽  
pp. 7033
Author(s):  
Oscar Ceballos ◽  
Carlos Alberto Ramírez Restrepo ◽  
María Constanza Pabón ◽  
Andres M. Castillo ◽  
Oscar Corcho

Existing SPARQL query engines and triple stores are continuously improved to handle larger datasets. Several approaches have been developed in this context that propose storing and querying RDF data in a distributed fashion, mainly using the MapReduce programming model and Hadoop-based ecosystems. New trends in Big Data technologies have also emerged (e.g., Apache Spark, Apache Flink); they use distributed in-memory processing and promise higher data processing performance. In this paper, we present a formal interpretation of some PACT transformations implemented in the Apache Flink DataSet API. We use this formalization to provide a mapping that translates a SPARQL query into a Flink program. The mapping was implemented in a prototype, which was used to evaluate the correctness and performance of the solution. The source code of the project is available on GitHub under the MIT license.
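The mapping summarized above turns SPARQL graph patterns into dataflow transformations. As a rough illustration of that idea only (it is not the SPARQL2Flink mapping and does not use the Flink DataSet API), the plain-Python sketch below evaluates a basic graph pattern: each triple pattern becomes a filter-and-map step that produces variable bindings, and successive patterns are combined with a join on their shared variables. The functions `match_pattern`, `join`, and `evaluate_bgp` and the sample triples are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(triples: List[Triple], pattern: Triple) -> List[Binding]:
    """Filter + map step: keep triples that match one triple pattern and
    turn each into a variable binding (terms starting with '?' are variables)."""
    results: List[Binding] = []
    for triple in triples:
        binding: Binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated inside one pattern must bind to the same value.
                if binding.get(term, value) != value:
                    ok = False
                    break
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            results.append(binding)
    return results


def join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Join step: merge pairs of bindings that agree on all shared variables."""
    out: List[Binding] = []
    for lb in left:
        for rb in right:
            shared = lb.keys() & rb.keys()
            if all(lb[v] == rb[v] for v in shared):
                out.append({**lb, **rb})
    return out


def evaluate_bgp(triples: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern as a left-deep chain of joins."""
    result = match_pattern(triples, patterns[0])
    for pattern in patterns[1:]:
        result = join(result, match_pattern(triples, pattern))
    return result


# Hypothetical sample data and query: "who does ?x know, and what is their name?"
data = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:alice", "foaf:name", "Alice"),
]
query = [("?x", "foaf:knows", "?y"), ("?y", "foaf:name", "?name")]
print(evaluate_bgp(data, query))
```

In a distributed engine the nested-loop `join` would be replaced by a partitioned (hash or sort-merge) join on the shared variables; the nested loop is used here only to keep the sketch short.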

Author(s):  
Javier Conejero ◽  
Sandra Corella ◽  
Rosa M Badia ◽  
Jesus Labarta

Task-based programming has proven to be a suitable model for high-performance computing (HPC) applications. Several implementations have demonstrated this and have promoted the adoption of task-based programming in the OpenMP standard. Furthermore, in recent years Apache Spark has gained wide popularity in business and research environments as a programming model for addressing emerging big data problems. COMP Superscalar (COMPSs) is a task-based environment that tackles distributed computing (including Clouds) and is a good task-based alternative for programming big data applications. This article describes why we consider task-based programming models a good approach for big data applications. The article compares Spark and COMPSs in terms of architecture, programming model, and performance. It focuses on the structural differences between the two frameworks, on their programmability interfaces, and on their efficiency, measured with three widely known benchmarking kernels: Wordcount, Kmeans, and Terasort. These kernels make it possible to evaluate the most important functionalities of both programming models and to analyze different workflows and conditions. The main results of this comparison are that (1) COMPSs is able to extract the inherent parallelism from the user code with minimal coding effort, as opposed to Spark, which requires existing algorithms to be adapted and rewritten by explicitly using its predefined functions; (2) COMPSs performs better than Spark; and (3) COMPSs has been shown to scale better than Spark in most cases. Finally, we discuss the advantages and disadvantages of both frameworks, highlighting the differences that make each of them unique and thereby helping readers choose the right framework for each particular objective.
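The comparison hinges on the task-based style, in which the program remains an ordinary sequence of function calls and each call is scheduled as an independent task. A minimal sketch of the Wordcount kernel in that style, assuming Python's standard `concurrent.futures` as a stand-in for a task runtime (this is neither the COMPSs nor the Spark API); `count_words`, `merge_counts`, and `wordcount` are hypothetical names:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List


def count_words(block: str) -> Counter:
    """Task body: count the words in one block of text (an ordinary function)."""
    return Counter(block.split())


def merge_counts(a: Counter, b: Counter) -> Counter:
    """Task body: merge two partial word counts."""
    return a + b


def wordcount(blocks: List[str], workers: int = 4) -> Counter:
    """Count words across blocks; each block is processed as a separate task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each submit() creates one independent task that the executor may run in parallel.
        futures = [pool.submit(count_words, block) for block in blocks]
        partials = [f.result() for f in futures]
    # Reduce the partial results (sequentially here, for simplicity).
    return reduce(merge_counts, partials, Counter())


result = wordcount(["to be or", "not to be"])
print(result["to"], result["be"], result["or"], result["not"])
```

The sequential version is just `reduce(merge_counts, map(count_words, blocks), Counter())`; the task-based version changes only how the calls are dispatched, which is the property the abstract attributes to COMPSs.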


2021 ◽  
Vol 2 (1) ◽  
pp. 11-29
Author(s):  
Jasmina Pivar

The cities of the European Union are adopting big data technologies in their development towards smart cities. Because big data technologies are complex and disruptive, it is necessary to determine the importance of the factors, and of their aspects, for the adoption of big data technologies in cities. The aim of this paper is to identify the most important aspects of the technological factors in the adoption of big data technologies in the cities of the European Union. To achieve this aim, a survey was conducted on a sample of European Union cities, and the collected data were used for an importance-performance map analysis of the factors for the adoption of big data technologies. The results show that the aspects of absorption capacity and technological readiness of EU cities are of relatively high importance but have low levels of performance compared with organizational and environmental factors. The contribution of the paper consists of general guidelines for increasing the technological readiness and absorption capacity of cities in order to increase the success of big data technology adoption in the cities of the European Union.


2015 ◽  
Vol 11 (8) ◽  
pp. 271752 ◽  
Author(s):  
Anton Kos ◽  
Sašo Tomažič ◽  
Jakob Salom ◽  
Nemanja Trifunovic ◽  
Mateo Valero ◽  
...  

Author(s):  
A. G. Stepanov ◽  
G. A. Plotnikov ◽  
V. S. Vasilyeva

The article highlights the need to teach students to work with Big Data technologies. Big Data is a promising and fundamental industry that requires a large number of qualified specialists in various fields. The aim of the work is to describe a concept for determining a set of hardware, software, algorithmic, and methodological tools (taking into account the student body and the capabilities of the educational institution) for building a methodology for teaching a discipline devoted to Big Data processing methods. There are two main sectors of stakeholders who need specialists in the field of Big Data. A detailed comparative analysis of software solutions that support Big Data processing is carried out. The article describes a methodology for constructing a course that teaches students technologies for processing and analyzing Big Data. A plan for organizing the lecture course and laboratory practice is proposed, including subtasks for students to perform during training. The composition and methodology of students' independent work in the discipline, using a learning management system such as Moodle, are discussed. An example of data processing in the RapidMiner Studio package is presented, in which a multilayer neural network is trained with the error backpropagation algorithm.
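The abstract's example trains a multilayer neural network with error backpropagation in RapidMiner Studio. To show what that algorithm computes, here is a minimal pure-Python sketch (not RapidMiner's implementation): one hidden layer, sigmoid activations, squared-error loss, and per-sample gradient descent. The XOR toy dataset, layer size, learning rate, epoch count, and the class name `TinyMLP` are illustrative assumptions.

```python
import math
import random
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One hidden layer with sigmoid units, trained by backpropagation."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        # w1[j][i]: weight from input i to hidden unit j; the last entry is the bias.
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        # w2[j]: weight from hidden unit j to the output; the last entry is the bias.
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x: List[float]) -> Tuple[List[float], float]:
        xb = x + [1.0]  # constant input that feeds the bias weight
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

    def train_step(self, x: List[float], target: float, lr: float) -> None:
        h, y = self.forward(x)
        # Output error term for loss 0.5 * (y - target)^2 with a sigmoid output unit.
        delta_out = (y - target) * y * (1 - y)
        # Back-propagate the error to each hidden unit, using the weights before the update.
        delta_h = [delta_out * self.w2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
        # Gradient-descent updates for the hidden-to-output weights.
        hb = h + [1.0]
        for j in range(len(self.w2)):
            self.w2[j] -= lr * delta_out * hb[j]
        # Gradient-descent updates for the input-to-hidden weights.
        xb = x + [1.0]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * delta_h[j] * xb[i]

    def loss(self, data: List[Tuple[List[float], float]]) -> float:
        return sum(0.5 * (self.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


xor = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
net = TinyMLP(n_in=2, n_hidden=4, seed=1)
before = net.loss(xor)
for _ in range(5000):
    for x, t in xor:
        net.train_step(x, t, lr=0.5)
after = net.loss(xor)
print(f"loss {before:.4f} -> {after:.4f}")
```

XOR is used because it is not linearly separable, so the hidden layer, and therefore the backward pass through it, is actually needed.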


Big Data ◽  
2016 ◽  
pp. 1110-1128
Author(s):  
Ruben C. Huacarpuma ◽  
Daniel da C. Rodrigues ◽  
Antonio M. Rubio Serrano ◽  
João Paulo C. Lustosa da Costa ◽  
Rafael T. de Sousa Júnior ◽  
...  

The Brazilian Ministry of Planning, Budget, and Management (MP) manages enormous amounts of data that are generated on a daily basis. Processing all of this data more efficiently can reduce operating costs, thereby making better use of public resources. In this chapter, the authors construct a Big Data framework to deal with data loading and querying problems in distributed data processing. They evaluate the proposed Big Data processes by comparing them with the current centralized process used by MP in its Integrated System for Human Resources Management (in Portuguese: Sistema Integrado de Administração de Pessoal – SIAPE). This study focuses primarily on a NoSQL solution using HBase and Cassandra, which is compared with the relational PostgreSQL implementation used as a baseline. The inclusion of Big Data technologies in the proposed solution noticeably improves loading and querying performance.


2017 ◽  
Vol 113 ◽  
pp. 429-434 ◽  
Author(s):  
Y. Nait Malek ◽  
A. Kharbouch ◽  
H. El Khoukhi ◽  
M. Bakhouya ◽  
V. De Florio ◽  
...  