A Hybrid Scheduler for Many Task Computing in Big Data Systems

2017
Vol 27 (2)
pp. 385-399
Author(s):
Laura Vasiliu
Florin Pop
Catalin Negru
Mariana Mocanu
Valentin Cristea
...

Abstract With the rapid evolution of the distributed computing world in the last few years, the amount of data created and processed has increased rapidly, to the petabyte or even exabyte scale. Such huge data sets require data-intensive computing applications and impose performance requirements on the infrastructures that support them, such as high scalability, storage capacity, and fault tolerance, as well as efficient scheduling algorithms. This paper focuses on providing a hybrid scheduling algorithm for many task computing that addresses big data environments with few penalties, takes deadlines into consideration, and satisfies a data-dependent task model. The hybrid solution combines several heuristics and algorithms (min-min, min-max, and earliest deadline first) to provide a scheduling algorithm that matches our problem. The experiments are conducted by simulation, and the results show that the proposed hybrid algorithm performs very well in terms of meeting deadlines.
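To make the combination concrete, the following is a minimal Python sketch, not the authors' implementation: the task and machine structures, the processing-time model, and the way a min-min style placement follows an earliest-deadline-first ordering are all assumptions made for illustration (the min-max heuristic mentioned in the abstract is omitted here).

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    length: float          # abstract work units (assumed model)
    deadline: float        # absolute deadline

@dataclass
class Machine:
    name: str
    speed: float           # work units per second (assumed model)
    available_at: float = 0.0

def hybrid_schedule(tasks, machines):
    """Illustrative hybrid: order tasks by earliest deadline first, then place
    each task min-min style on the machine giving the earliest completion time.
    Returns the assignment list and the names of tasks that miss their deadline."""
    assignments, missed = [], []
    for task in sorted(tasks, key=lambda t: t.deadline):          # EDF ordering
        best = min(machines,
                   key=lambda m: m.available_at + task.length / m.speed)
        finish = best.available_at + task.length / best.speed      # min-min choice
        best.available_at = finish
        assignments.append((task.name, best.name, finish))
        if finish > task.deadline:
            missed.append(task.name)
    return assignments, missed

if __name__ == "__main__":
    tasks = [Task("t1", 10, 6), Task("t2", 4, 3), Task("t3", 8, 9)]
    machines = [Machine("m1", 2.0), Machine("m2", 1.0)]
    plan, missed = hybrid_schedule(tasks, machines)
    print(plan, "missed:", missed)
```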

2018
Vol 60 (5-6)
pp. 327-333
Author(s):
René Jäkel
Eric Peukert
Wolfgang E. Nagel
Erhard Rahm

Abstract The efficient and intelligent handling of large, often distributed and heterogeneous data sets increasingly determines scientific and economic competitiveness in most application areas. Mobile applications, social networks, multimedia collections, sensor networks, data-intensive scientific experiments, and complex simulations nowadays generate a huge data deluge. Processing and analyzing these data sets with innovative methods opens up new opportunities for their exploitation and yields new insights. However, the resulting resource requirements usually exceed the capabilities of state-of-the-art methods for the acquisition, integration, analysis, and visualization of data; these challenges are summarized under the term big data. ScaDS Dresden/Leipzig, a Germany-wide competence center for collaborative big data research, bundles efforts to realize data-intensive applications for a wide range of use cases in science and industry. In this article, we present the basic concept of the competence center and give insights into some of its research topics.


Information
2019
Vol 10 (4)
pp. 141
Author(s):
Pwint Phyu Khine
Zhaoshun Wang

The inevitability of the relationship between big data and distributed systems is indicated by the fact that big data characteristics cannot easily be handled by a standalone, centralized approach. Among the various concepts of distributed systems, the CAP theorem (Consistency, Availability, and Partition Tolerance) points to the prominent role of eventual consistency in distributed systems. This has prompted the need for database types beyond SQL (Structured Query Language) that offer scalability and availability. NoSQL (Not-Only SQL) databases, mostly built on the BASE model (Basically Available, Soft state, Eventual consistency), are gaining ground in the big data era, while SQL databases are left trying to keep up with this paradigm shift. However, none of these databases is perfect, as there is no model that fits all requirements of data-intensive systems. Polyglot persistence, i.e., using different databases as appropriate for the different components within a single system, is becoming prevalent in data-intensive big data systems, as they are distributed and parallel by nature. This paper reflects on the characteristics of these databases from a conceptual point of view and describes a potential solution for distributed systems: the adoption of polyglot persistence in data-intensive systems in the big data era.
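As a rough illustration of the polyglot persistence idea, the sketch below routes different kinds of data to different stand-in stores. The in-memory classes, the session/order example, and the routing rules are assumptions for illustration only, not a system described in the paper; in practice the stand-ins would be replaced by an actual key-value or document store and a relational database.

```python
from typing import Any, Dict, List

class KeyValueStore:
    """Stand-in for a BASE-style NoSQL key-value store (availability-first)."""
    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
    def put(self, key: str, value: Any) -> None:
        self._data[key] = value
    def get(self, key: str) -> Any:
        return self._data.get(key)

class RelationalStore:
    """Stand-in for an SQL database holding strongly consistent records."""
    def __init__(self) -> None:
        self._rows: List[Dict[str, Any]] = []
    def insert(self, row: Dict[str, Any]) -> None:
        self._rows.append(dict(row))
    def select_all(self) -> List[Dict[str, Any]]:
        return list(self._rows)

class PolyglotRepository:
    """Routes each kind of data to the store best suited to it."""
    def __init__(self) -> None:
        self.sessions = KeyValueStore()   # volatile, availability-first data
        self.orders = RelationalStore()   # transactional, consistency-first data
    def save_session(self, session_id: str, payload: Dict[str, Any]) -> None:
        self.sessions.put(session_id, payload)
    def save_order(self, order: Dict[str, Any]) -> None:
        self.orders.insert(order)

if __name__ == "__main__":
    repo = PolyglotRepository()
    repo.save_session("s-1", {"user": "alice", "cart": ["book"]})
    repo.save_order({"id": 1, "user": "alice", "total": 19.99})
    print(repo.sessions.get("s-1"), repo.orders.select_all())
```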


2017
Author(s):
Leighton Evans
Rob Kitchin

The modern retail store is a complex coded assemblage and data-intensive environment, its operations and management mediated by a number of interlinked big data systems. This paper draws on an ethnography of a superstore in Ireland to examine how these systems modulate the functioning of the store and the working practices of employees. It was found that retail work involves a continual movement between a governance regime of control, reliant on big data systems that seek to regulate and harness formal labour and automation into enterprise planning, and a disciplinary regime that deals with the symbolic, interactive labour that workers perform and acts as a reserve mode of governmentality if control fails. This continual movement is caused by new systems of control being open to vertical and horizontal fissures. While retail functions as a coded assemblage of control, the systems are too brittle to sustain the code/space and governmentality desired.


2021
Vol 18 (4)
pp. 1227-1232
Author(s):
L. R. Aravind Babu
J. Saravana Kumar

Presently, big data is very popular, since it proves helpful in diverse domains such as social media and e-commerce transactions. Cloud computing offers on-demand services, broad network access, resource pooling, rapid elasticity, and measured services. Cloud resources are usually heterogeneous, and the application requirements of end users change rapidly over time, so resource management is a tedious process. At the same time, resource management and scheduling play a vital part in cloud computing (CC) performance, particularly when the environment is used for big data analysis and workloads with little predictability enter the cloud dynamically. Identifying optimal scheduling solutions with diverse variables on varying platforms remains a crucial problem. On a cloud platform, scheduling techniques should be able to adapt quickly to changes in the input workload. In this paper, an improved grey wolf optimization (IGWO) algorithm with an oppositional learning principle is employed to carry out the scheduling task in an effective way. The presented IGWO-based scheduling algorithm achieves optimal cloud resource usage and offers a significantly more effective solution than the compared methods.
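The following Python sketch outlines the general idea of grey wolf optimization with opposition-based initialization applied to task-to-VM scheduling. The task lengths, VM speeds, decoding scheme, and greedy replacement step are assumptions made for illustration and do not reproduce the authors' IGWO.

```python
import random

# Illustrative problem data (assumed): task lengths and VM speeds.
TASK_LENGTHS = [12.0, 7.0, 20.0, 5.0, 9.0, 14.0]
VM_SPEEDS = [2.0, 1.0, 1.5]

def makespan(position):
    """Decode a continuous position into a task-to-VM mapping and return the makespan."""
    load = [0.0] * len(VM_SPEEDS)
    for length, x in zip(TASK_LENGTHS, position):
        vm = min(int(x), len(VM_SPEEDS) - 1)
        load[vm] += length / VM_SPEEDS[vm]
    return max(load)

def opposite(position, lb, ub):
    """Opposition-based learning: reflect a candidate across the search interval."""
    return [lb + ub - x for x in position]

def igwo_schedule(pop_size=20, iters=100, seed=1):
    rng = random.Random(seed)
    dim, lb, ub = len(TASK_LENGTHS), 0.0, float(len(VM_SPEEDS)) - 1e-9

    # Oppositional initialization: sample wolves, add their opposites, keep the fittest half.
    wolves = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(pop_size)]
    wolves += [opposite(w, lb, ub) for w in wolves]
    wolves.sort(key=makespan)
    wolves = wolves[:pop_size]

    for t in range(iters):
        wolves.sort(key=makespan)
        alpha, beta, delta = wolves[0], wolves[1], wolves[2]
        a = 2.0 - 2.0 * t / iters                  # control parameter decreases from 2 to 0
        for i in range(pop_size):
            new_pos = []
            for d in range(dim):
                x_new = 0.0
                for leader in (alpha, beta, delta):
                    r1, r2 = rng.random(), rng.random()
                    A, C = 2 * a * r1 - a, 2 * r2
                    D = abs(C * leader[d] - wolves[i][d])
                    x_new += (leader[d] - A * D) / 3.0
                new_pos.append(min(max(x_new, lb), ub))  # keep inside bounds
            # Greedy replacement (a simplification used in this sketch):
            # the wolf moves only if the new position improves its makespan.
            if makespan(new_pos) < makespan(wolves[i]):
                wolves[i] = new_pos

    best = min(wolves, key=makespan)
    mapping = [min(int(x), len(VM_SPEEDS) - 1) for x in best]
    return mapping, makespan(best)

if __name__ == "__main__":
    mapping, span = igwo_schedule()
    print("task -> VM:", mapping, "makespan:", span)
```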

