A Review of Polyglot Persistence in the Big Data World

Information, 2019, Vol 10 (4), pp. 141
Author(s): Pwint Phyu Khine, Zhaoshun Wang

The inevitability of the relationship between big data and distributed systems is indicated by the fact that the characteristics of big data cannot easily be handled by a standalone, centralized approach. Among the concepts of distributed systems, the CAP theorem (Consistency, Availability, and Partition tolerance) explains the prominent use of the eventual consistency property: because a distributed system cannot guarantee both strong consistency and availability while a network partition is occurring, many systems relax consistency in order to remain available. This has prompted the need for databases beyond SQL (Structured Query Language) that offer scalability and availability. NoSQL (Not-Only SQL) databases, most of which follow the BASE (Basically Available, Soft state, Eventual consistency) model, are gaining ground in the big data era, while SQL databases are left trying to keep up with this paradigm shift. However, none of these databases is perfect, as no single model fits all the requirements of data-intensive systems. Polyglot persistence, i.e., using different databases as appropriate for the different components within a single system, is becoming prevalent in data-intensive big data systems, as they are distributed and parallel by nature. This paper reviews the characteristics of these databases from a conceptual point of view and describes a potential solution for a distributed system: the adoption of polyglot persistence in data-intensive systems in the big data era.
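The consistency/availability trade-off behind BASE can be illustrated with a minimal sketch of eventual consistency, assuming a replicated key-value store that resolves conflicts with last-writer-wins (LWW) timestamps. The `Replica` class, its methods, and the `cart:42` key are hypothetical and are not taken from the paper or from any particular NoSQL product. Each replica accepts writes without coordinating with the others (staying available during a partition), and the replicas converge once they exchange state.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical node of a key-value store using last-writer-wins (LWW)."""
    name: str
    # key -> (timestamp, writer_name, value); (timestamp, writer_name) orders writes
    data: dict = field(default_factory=dict)

    def put(self, key, value, ts):
        # Accept the write locally without contacting other replicas,
        # favoring availability over immediate consistency.
        self._apply(key, (ts, self.name, value))

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[2]

    def _apply(self, key, entry):
        current = self.data.get(key)
        # Keep the write with the larger (timestamp, writer_name) pair;
        # the writer name breaks timestamp ties deterministically.
        if current is None or entry[:2] > current[:2]:
            self.data[key] = entry

    def merge_from(self, other):
        # Anti-entropy step: pull every entry from another replica, keep the newest.
        for key, entry in other.data.items():
            self._apply(key, entry)


# Two replicas accept conflicting writes while "partitioned".
a, b = Replica("a"), Replica("b")
a.put("cart:42", ["book"], ts=1)
b.put("cart:42", ["book", "pen"], ts=2)
print(a.get("cart:42"), b.get("cart:42"))  # replicas disagree: ['book'] ['book', 'pen']

# After the partition heals, exchanging state makes them converge.
a.merge_from(b)
b.merge_from(a)
print(a.get("cart:42"), b.get("cart:42"))  # both: ['book', 'pen'] ['book', 'pen']
```

LWW is only one conflict-resolution policy: it converges cheaply but silently discards one of two concurrent writes (here, the one-item cart), which is why some systems keep both versions or apply richer merge rules instead.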

2017
Author(s): Leighton Evans, Rob Kitchin

The modern retail store is a complex coded assemblage and data-intensive environment, its operations and management mediated by a number of interlinked big data systems. This paper draws on an ethnography of a superstore in Ireland to examine how these systems modulate the functioning of the store and the working practices of employees. It was found that retail work involves a continual movement between a governance regime of control, reliant on big data systems that seek to regulate and harness formal labour and automation into enterprise planning, and a disciplinary regime that deals with the symbolic, interactive labour that workers perform and acts as a reserve mode of governmentality if control fails. This continual movement is caused by new systems of control being open to vertical and horizontal fissures. While retail functions as a coded assemblage of control, its systems are too brittle to sustain the desired code/space and governmentality.


Author(s): Symphorien Monsia, Sami Faiz

In recent years, big data has become a major concern for many organizations. An essential component of big data is the spatio-temporal dimension known as geospatial big data, which designates the application of big data issues to geographic data. One of the major aspects of (geospatial) big data systems is the data query language (i.e., a high-level language) that allows non-technical users to interact easily with these systems. In this chapter, the researchers explore high-level languages, focusing in particular on the spatial extensions of Hadoop for geospatial big data queries. Their main objective is to examine three popular open-source SQL-on-Hadoop implementations intended for querying geospatial big data: (1) Pigeon of SpatialHadoop, (2) QLSP of Hadoop-GIS, and (3) ESRI Hive of GIS Tools for Hadoop. Along the same lines, the authors present their current research work toward the analysis of geospatial big data.


2017, Vol 27 (2), pp. 385-399
Author(s): Laura Vasiliu, Florin Pop, Catalin Negru, Mariana Mocanu, Valentin Cristea, ...

With the rapid evolution of the distributed computing world in the last few years, the amount of data created and processed has rapidly grown to the petabyte or even exabyte scale. Such huge data sets require data-intensive computing applications and impose performance requirements on the infrastructures that support them, such as high scalability, storage, and fault tolerance, as well as efficient scheduling algorithms. This paper focuses on providing a hybrid scheduling algorithm for many-task computing that addresses big data environments with few penalties, taking deadlines into consideration and satisfying a data-dependent task model. The hybrid solution combines several heuristics and algorithms (min-min, min-max, and earliest deadline first) to provide a scheduling algorithm that matches our problem. The experiments, conducted by simulation, show that the proposed hybrid algorithm performs very well in terms of meeting deadlines.
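To make two of the named heuristics concrete, the sketch below implements min-min and earliest deadline first (EDF) as greedy list schedulers over a small, made-up set of independent tasks on two machines. It is not the authors' algorithm: the ETC (expected time to compute) matrix, the deadlines, and all function names are hypothetical, the data dependencies of the paper's task model are ignored, and `hybrid` shows only one simple way of combining heuristics (keep whichever schedule misses fewer deadlines).

```python
# Hypothetical ETC matrix: ETC[task][machine] = execution time on that machine.
ETC = {
    "t1": [4, 6],
    "t2": [2, 3],
    "t3": [8, 5],
    "t4": [3, 7],
}
# Hypothetical absolute deadlines for each task.
DEADLINES = {"t1": 6, "t2": 2, "t3": 5, "t4": 10}


def best_machine(task, ready):
    """Return (completion_time, machine) minimizing completion time for task."""
    return min((ready[m] + ETC[task][m], m) for m in range(len(ready)))


def schedule(pick):
    """Greedy list scheduler; `pick` chooses the next task to place."""
    ready = [0, 0]  # time at which each machine becomes free
    finish = {}
    unscheduled = set(ETC)
    while unscheduled:
        task = pick(unscheduled, ready)
        ct, m = best_machine(task, ready)
        ready[m] = ct
        finish[task] = ct
        unscheduled.remove(task)
    return finish


def min_min(unscheduled, ready):
    # Min-min: pick the task whose best completion time is smallest.
    return min(sorted(unscheduled), key=lambda t: best_machine(t, ready)[0])


def edf(unscheduled, ready):
    # EDF: pick the task with the earliest deadline.
    return min(sorted(unscheduled), key=lambda t: DEADLINES[t])


def missed(finish):
    """Tasks that complete after their deadline."""
    return sorted(t for t, ct in finish.items() if ct > DEADLINES[t])


def hybrid():
    # One simple combination (an assumption, not the paper's method):
    # run each heuristic and keep the schedule with the fewest missed deadlines.
    candidates = [schedule(min_min), schedule(edf)]
    return min(candidates, key=lambda f: len(missed(f)))


print(missed(schedule(min_min)))  # ['t1']
print(missed(schedule(edf)))      # []
print(missed(hybrid()))           # []
```

With these inputs, min-min places the short tasks first and finishes t1 at time 9, after its deadline of 6, while EDF schedules the tight-deadline tasks early and meets every deadline; on other inputs min-min can yield a shorter overall makespan, which is the usual motivation for combining heuristics.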

