SPARQL2Flink: Evaluation of SPARQL Queries on Apache Flink

Task-based programming has proven to be a suitable model for high-performance computing (HPC) applications. Different implementations have demonstrated this and have promoted the acceptance of task-based programming in the OpenMP standard. Furthermore, in recent years, Apache Spark has gained wide popularity in business and research environments as a programming model for addressing emerging big data problems. COMP Superscalar (COMPSs) is a task-based environment that tackles distributed computing (including Clouds) and is a good alternative as a task-based programming model for big data applications. This article explains why we consider task-based programming models a good approach for big data applications. The article includes a comparison of Spark and COMPSs in terms of architecture, programming model, and performance. It focuses on the structural differences between the two frameworks, on their programmability interfaces, and on their efficiency, measured with three widely known benchmarking kernels: Wordcount, Kmeans, and Terasort. These kernels make it possible to evaluate the more important functionalities of both programming models and to analyze different workflows and conditions. The main results of this comparison are that (1) COMPSs is able to extract the inherent parallelism from the user code with minimal coding effort, whereas Spark requires existing algorithms to be adapted and rewritten explicitly using its predefined functions, (2) COMPSs improves on Spark in terms of performance, and (3) COMPSs has been shown to scale better than Spark in most cases. Finally, we discuss the advantages and disadvantages of both frameworks, highlighting the differences that make them unique, thereby helping to choose the right framework for each particular objective.

Adoption of big data technologies in smart cities of the European Union: Analysis of the importance and performance of technological factors

Croatian Regional Development Journal ◽

10.2478/crdj-2021-0005 ◽

2021 ◽

Vol 2 (1) ◽

pp. 11-29

Author(s):

Jasmina Pivar

Keyword(s):

European Union ◽

Big Data ◽

Smart Cities ◽

Absorption Capacity ◽

The European Union ◽

Technological Factors ◽

Technological Readiness ◽

Disruptive Technologies ◽

Big Data Technologies ◽

And Performance

The cities of the European Union are adopting big data technologies in their development towards becoming smart cities. Given that big data technologies are complex and disruptive, it is necessary to determine the importance of the factors, and of their individual aspects, that affect the adoption of these technologies in cities. The aim of this paper is to identify the most important aspects of technological factors in the adoption of big data technologies in the cities of the European Union. To achieve this goal, a survey was conducted on a sample of European Union cities, and an importance-performance map analysis of the factors for the adoption of big data technologies was carried out on the collected data. The results show that the absorption capacity and technological readiness of EU cities are of relatively high importance, but have low levels of performance in relation to organizational and environmental factors. The contribution of the paper consists of general guidelines for increasing the technological readiness and absorption capacity of cities in order to increase the success of the adoption of big data technologies in the cities of the European Union.

Industrial track: Architecting railway KPIs data processing with Big Data technologies

2019 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata47090.2019.9006196 ◽

2019 ◽

Author(s):

Alexander Suleykin ◽

Peter Panfilov ◽

Natalya Bakhtadze

Keyword(s):

Big Data ◽

Data Processing ◽

Big Data Technologies

Performance Optimization of Big Data Processing using Clustering Technique in Map Reduces Programming Model

International Journal of Computer Applications ◽

10.5120/ijca2016911748 ◽

2016 ◽

Vol 151 (4) ◽

pp. 42-46 ◽

Cited By ~ 1

Author(s):

Ravindra Singh ◽

Deepak Sain

Keyword(s):

Big Data ◽

Data Processing ◽

Performance Optimization ◽

Programming Model ◽

Big Data Processing ◽

Clustering Technique

New Benchmarking Methodology and Programming Model for Big Data Processing

International Journal of Distributed Sensor Networks ◽

10.1155/2015/271752 ◽

2015 ◽

Vol 11 (8) ◽

pp. 271752 ◽

Cited By ~ 20

Author(s):

Anton Kos ◽

Sašo Tomažič ◽

Jakob Salom ◽

Nemanja Trifunovic ◽

Mateo Valero ◽

...

Keyword(s):

Big Data ◽

Data Processing ◽

Programming Model ◽

Big Data Processing

Approaches to the choice of tools for constructing a methodology for learning to work with Big Data

Informatics and Education ◽

10.32517/0234-0453-2021-36-4-54-62 ◽

2021 ◽

pp. 54-62

Author(s):

A. G. Stepanov ◽

G. A. Plotnikov ◽

V. S. Vasilyeva

Keyword(s):

Big Data ◽

Data Processing ◽

Educational Institution ◽

Back Propagation ◽

Big Data Processing ◽

Error Back Propagation ◽

Network Training ◽

Big Data Technologies ◽

Independent Work ◽

Data Processing Methods

The article highlights the need to teach students to work with Big Data technologies. Big Data is a promising and fundamental industry that requires a large number of qualified specialists in various fields. The aim of the work is to describe an approach to determining a set of hardware, software, algorithmic, and methodological tools (taking into account the student body and the capabilities of the educational institution) for building a methodology for teaching a discipline related to the study of Big Data processing methods. There are two main sectors of stakeholders that need specialists in the field of Big Data. A detailed comparative analysis of software solutions that support Big Data processing is carried out. The article describes the methodology for constructing a course that teaches students technologies for processing and analyzing Big Data. A plan for organizing a lecture course and laboratory practice is proposed, including the subtasks that students perform during training. The composition and methodology of students' independent work in a discipline related to the study of Big Data, using a learning management system such as Moodle, are discussed. An example is presented of data processing in the RapidMiner Studio package that trains a multi-layer neural network with the error back propagation method.

Evaluating NoSQL Databases for Big Data Processing within the Brazilian Ministry of Planning, Budget, and Management

Big Data ◽

10.4018/978-1-4666-9840-6.ch050 ◽

2016 ◽

pp. 1110-1128

Author(s):

Ruben C. Huacarpuma ◽

Daniel da C. Rodrigues ◽

Antonio M. Rubio Serrano ◽

João Paulo C. Lustosa da Costa ◽

Rafael T. de Sousa Júnior ◽

...

Keyword(s):

Big Data ◽

Data Processing ◽

Human Resources Management ◽

Daily Basis ◽

Integrated System ◽

Distributed Data ◽

Distributed Data Processing ◽

Data Framework ◽

Public Resources ◽

Big Data Technologies

The Brazilian Ministry of Planning, Budget, and Management (MP) manages enormous amounts of data that are generated on a daily basis. Processing all of this data more efficiently can reduce operating costs, thereby making better use of public resources. In this chapter, the authors construct a Big Data framework to deal with data loading and querying problems in distributed data processing. They evaluate the proposed Big Data processes by comparing them with the current centralized process used by MP in its Integrated System for Human Resources Management (in Portuguese: Sistema Integrado de Administração de Pessoal – SIAPE). This study focuses primarily on a NoSQL solution using HBase and Cassandra, which is compared to the relational PostgreSQL implementation used as a baseline. The inclusion of Big Data technologies in the proposed solution noticeably improves loading and querying performance.

Evaluating NoSQL Databases for Big Data Processing within the Brazilian Ministry of Planning, Budget, and Management

Artificial Intelligence Technologies and the Evolution of Web 3.0 - Advances in Web Technologies and Engineering ◽

10.4018/978-1-4666-8147-7.ch011 ◽

2015 ◽

pp. 230-247

Author(s):

Ruben C. Huacarpuma ◽

Daniel da C. Rodrigues ◽

Antonio M. Rubio Serrano ◽

João Paulo C. Lustosa da Costa ◽

Rafael T. de Sousa Júnior ◽

...

Keyword(s):

Big Data ◽

Data Processing ◽

Human Resources Management ◽

Daily Basis ◽

Integrated System ◽

Distributed Data ◽

Distributed Data Processing ◽

Data Framework ◽

Public Resources ◽

Big Data Technologies

The Brazilian Ministry of Planning, Budget, and Management (MP) manages enormous amounts of data that are generated on a daily basis. Processing all of this data more efficiently can reduce operating costs, thereby making better use of public resources. In this chapter, the authors construct a Big Data framework to deal with data loading and querying problems in distributed data processing. They evaluate the proposed Big Data processes by comparing them with the current centralized process used by MP in its Integrated System for Human Resources Management (in Portuguese: Sistema Integrado de Administração de Pessoal – SIAPE). This study focuses primarily on a NoSQL solution using HBase and Cassandra, which is compared to the relational PostgreSQL implementation used as a baseline. The inclusion of Big Data technologies in the proposed solution noticeably improves loading and querying performance.
