Evaluating NoSQL Databases for Big Data Processing within the Brazilian Ministry of Planning, Budget, and Management

The Brazilian Ministry of Planning, Budget, and Management (MP) manages enormous amounts of data that are generated on a daily basis. Processing this data more efficiently can reduce operating costs, thereby making better use of public resources. In this chapter, the authors construct a Big Data framework to address data loading and querying problems in distributed data processing. They evaluate the proposed Big Data processes by comparing them with the current centralized process used by MP in its Integrated System for Human Resources Management (in Portuguese: Sistema Integrado de Administração de Pessoal, SIAPE). The study focuses primarily on NoSQL solutions using HBase and Cassandra, which are compared against the relational PostgreSQL implementation used as a baseline. Including Big Data technologies in the proposed solution noticeably improves loading and querying times.



Big Data ◽ 10.4018/978-1-4666-9840-6.ch050 ◽ 2016 ◽ pp. 1110-1128

Author(s): Ruben C. Huacarpuma ◽ Daniel da C. Rodrigues ◽ Antonio M. Rubio Serrano ◽ João Paulo C. Lustosa da Costa ◽ Rafael T. de Sousa Júnior ◽ ...

Keyword(s): Big Data ◽ Data Processing ◽ Human Resources Management ◽ Daily Basis ◽ Integrated System ◽ Distributed Data ◽ Distributed Data Processing ◽ Data Framework ◽ Public Resources ◽ Big Data Technologies


Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data

Information Management and Big Data - Communications in Computer and Information Science ◽ 10.1007/978-3-030-11680-4_13 ◽ 2019 ◽ pp. 121-128

Author(s): Gusseppe Bravo-Rocca ◽ Piero Torres-Robatty ◽ Jose Fiestas-Iquira

Keyword(s): Machine Learning ◽ Big Data ◽ Data Processing ◽ Processing System ◽ Distributed Data ◽ Data Processing System ◽ Distributed Data Processing ◽ Automated Machine Learning


The Berlin Big Data Center (BBDC)

it - Information Technology ◽ 10.1515/itit-2018-0016 ◽ 2018 ◽ Vol 60 (5-6) ◽ pp. 321-326 ◽ Cited By ~ 1

Author(s): Christoph Boden ◽ Tilmann Rabl ◽ Volker Markl

Keyword(s): Big Data ◽ Data Analysis ◽ Data Processing ◽ Deep Understanding ◽ Automatic Parallelization ◽ Second Phase ◽ Distributed Data ◽ Domain Specific ◽ Distributed Data Processing ◽ Large Groups

Abstract: The last decade has been characterized by the collection and availability of unprecedented amounts of data, driven by rapidly decreasing storage costs and the omnipresence of sensors and data-producing global online services. To process and analyze this data deluge, novel distributed data processing systems based on the dataflow paradigm, such as Apache Hadoop, Apache Spark, and Apache Flink, were built and have been scaled to tens of thousands of machines. However, writing efficient implementations of data analysis programs on these systems requires a deep understanding of systems programming, which prevents large groups of data scientists and analysts from using this technology efficiently. In this article, we present some of the main achievements of the research carried out by the Berlin Big Data Center (BBDC). We introduce the two domain-specific languages Emma and LARA, which are deeply embedded in Scala and enable declarative specification and automatic parallelization of data analysis programs; the PEEL Framework for transparent and reproducible benchmark experiments of distributed data processing systems; and approaches to foster the interpretability of machine learning models. Finally, we provide an overview of the challenges to be addressed in the second phase of the BBDC.
