A Review of Polyglot Persistence in the Big Data World

Big data and distributed systems are inevitably linked, because the characteristics of big data cannot easily be handled by a standalone, centralized approach. Among the concepts of distributed systems, the CAP theorem (Consistency, Availability, and Partition Tolerance) highlights the prominent role of eventual consistency in distributed systems. This has prompted the need for other types of databases beyond SQL (Structured Query Language) that offer scalability and availability. NoSQL (Not-Only SQL) databases, mostly following the BASE properties (Basically Available, Soft State, and Eventual consistency), are gaining ground in the big data era, while SQL databases are left trying to keep up with this paradigm shift. However, none of these databases is perfect, as no single model fits all the requirements of data-intensive systems. Polyglot persistence, i.e., using different databases as appropriate for the different components within a single system, is becoming prevalent in data-intensive big data systems, as they are distributed and parallel by nature. This paper reflects on the characteristics of these databases from a conceptual point of view and describes a potential solution for a distributed system: the adoption of polyglot persistence in data-intensive systems in the big data era.
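As an illustration of the idea of polyglot persistence described in this abstract, the following is a minimal sketch in Python, not taken from the paper. It assumes a hypothetical shop application whose components each use a different kind of store: a relational database for orders (SQLite from the standard library), and two in-memory stand-ins, `KeyValueStore` and `DocumentStore`, in place of real NoSQL products. All class and method names are assumptions made for the example.

```python
import json
import sqlite3


class KeyValueStore:
    """In-memory stand-in for a key-value store (hypothetical)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class DocumentStore:
    """In-memory stand-in for a document store; documents are kept as JSON text."""

    def __init__(self):
        self._docs = {}

    def save(self, doc_id, doc):
        self._docs[doc_id] = json.dumps(doc)

    def load(self, doc_id):
        return json.loads(self._docs[doc_id])


class ShopPersistence:
    """One system, three stores, each matched to the component it serves."""

    def __init__(self):
        # Orders need transactions and aggregate queries: relational store.
        self.orders = sqlite3.connect(":memory:")
        self.orders.execute(
            "CREATE TABLE orders ("
            "id INTEGER PRIMARY KEY, "
            "customer TEXT NOT NULL, "
            "total_cents INTEGER NOT NULL)"
        )
        # Sessions are simple lookups by key: key-value store.
        self.sessions = KeyValueStore()
        # Catalog entries have flexible, nested fields: document store.
        self.catalog = DocumentStore()

    def place_order(self, customer, total_cents):
        # The connection used as a context manager commits on success.
        with self.orders:
            cursor = self.orders.execute(
                "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
                (customer, total_cents),
            )
        return cursor.lastrowid

    def order_total(self, customer):
        row = self.orders.execute(
            "SELECT COALESCE(SUM(total_cents), 0) FROM orders WHERE customer = ?",
            (customer,),
        ).fetchone()
        return row[0]
```

A caller would create one `ShopPersistence` object and use `shop.sessions`, `shop.catalog`, and `shop.place_order(...)` side by side; in a real deployment each stand-in would be replaced by a client for an actual database, with the consistency trade-offs the abstract mentions.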

A smart place to work? Big data systems, labour, control, and modern retail stores

DOI: 10.31235/osf.io/z9dgc, 2017

Author(s): Leighton Evans, Rob Kitchin

Keyword(s): Big Data, Control Systems, Retail Store, Retail Stores, Data Systems, Data Intensive, Governance Regime, Working Practices, Big Data Systems

The modern retail store is a complex coded assemblage and data-intensive environment, its operations and management mediated by a number of interlinked big data systems. This paper draws on an ethnography of a superstore in Ireland to examine how these systems modulate the functioning of the store and working practices of employees. It was found that retail work involves a continual movement between a governance regime of control reliant on big data systems which seek to regulate and harness formal labour and automation into enterprise planning, and a disciplinary regime that deals with the symbolic, interactive labour that workers perform and acts as a reserve mode of governmentality if control fails. This continual movement is caused by new systems of control being open to vertical and horizontal fissures. While retail functions as a coded assemblage of control, systems are too brittle to sustain the code/space and governmentality desired.

High-Level Languages for Geospatial Analysis of Big Data

Interdisciplinary Approaches to Spatial Optimization Issues - Advances in Geospatial Technologies, 2021, pp. 62-81

DOI: 10.4018/978-1-7998-1954-7.ch004

Author(s): Symphorien Monsia, Sami Faiz

Keyword(s): Big Data, Query Language, Research Work, Temporal Data, Data Systems, High Level Language, Gis Tools, Spatio Temporal, Big Data Systems, High Level

In recent years, big data has become a major concern for many organizations. An essential component of big data is the spatio-temporal data dimension known as geospatial big data, which designates the application of big data issues to geographic data. One of the major aspects of (geospatial) big data systems is the data query language (i.e., high-level language) that allows non-technical users to easily interact with these systems. In this chapter, the researchers explore high-level languages, focusing in particular on the spatial extensions of Hadoop for geospatial big data queries. Their main objective is to examine three popular open-source implementations of SQL on Hadoop intended for querying geospatial big data: (1) Pigeon of SpatialHadoop, (2) QLSP of Hadoop-GIS, and (3) ESRI Hive of GIS Tools for Hadoop. Along the same lines, the authors present their current research work toward the analysis of geospatial big data.

A Hybrid Scheduler for Many Task Computing in Big Data Systems

International Journal of Applied Mathematics and Computer Science, 2017, Vol 27 (2), pp. 385-399

DOI: 10.1515/amcs-2017-0027

Cited By ~ 5

Author(s): Laura Vasiliu, Florin Pop, Catalin Negru, Mariana Mocanu, Valentin Cristea, ...

Keyword(s): Big Data, Scheduling Algorithm, Rapid Evolution, Data Sets, Data Systems, Data Intensive, Performance Requirements, Huge Data, Hybrid Solution, Big Data Systems

With the rapid evolution of the distributed computing world in the last few years, the amount of data created and processed has rapidly increased to petabyte or even exabyte scale. Such huge data sets need data-intensive computing applications and impose performance requirements on the infrastructures that support them, such as high scalability, storage, and fault tolerance, but also efficient scheduling algorithms. This paper focuses on providing a hybrid scheduling algorithm for many task computing that addresses big data environments with few penalties, taking into consideration the deadlines and satisfying a data dependent task model. The hybrid solution consists of several heuristics and algorithms (min-min, min-max and earliest deadline first) combined in order to provide a scheduling algorithm that matches our problem. The experiments are conducted by simulation and show that the proposed hybrid algorithm behaves very well in terms of meeting deadlines.
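Two of the heuristics named in this abstract, min-min and earliest deadline first, can be sketched in their common textbook forms as follows. This is a minimal illustration in Python under stated assumptions, not the paper's hybrid algorithm: it ignores data dependencies between tasks, the function names `min_min` and `edf` and the input shapes are hypothetical, and `edf` assumes a single machine with non-preemptive tasks that are all available at time zero.

```python
def min_min(exec_time):
    """Textbook min-min heuristic.

    exec_time maps a task name to a list of run times, one per machine.
    Returns (assignment, makespan), where assignment maps task -> machine index.
    """
    if not exec_time:
        return {}, 0.0
    n_machines = len(next(iter(exec_time.values())))
    ready = [0.0] * n_machines  # time at which each machine becomes free
    unscheduled = set(exec_time)
    assignment = {}
    while unscheduled:
        # For every unscheduled task, consider its completion time on every
        # machine, then pick the overall smallest (the "min of the mins").
        finish, task, machine = min(
            (ready[m] + exec_time[t][m], t, m)
            for t in unscheduled
            for m in range(n_machines)
        )
        assignment[task] = machine
        ready[machine] = finish
        unscheduled.remove(task)
    return assignment, max(ready)


def edf(tasks):
    """Earliest deadline first on one machine.

    tasks is a list of (name, duration, deadline) tuples.
    Returns (order, missed): the execution order and the tasks that
    finish after their deadline.
    """
    clock = 0.0
    order, missed = [], []
    for name, duration, deadline in sorted(tasks, key=lambda t: t[2]):
        clock += duration
        order.append(name)
        if clock > deadline:
            missed.append(name)
    return order, missed
```

For example, `min_min({"a": [3, 5], "b": [2, 4], "c": [6, 1]})` places task `c` on machine 1 and tasks `b` and `a` on machine 0, for a makespan of 5. A hybrid scheduler of the kind the abstract describes would combine such heuristics; how the paper does so is not specified here.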
