Job Scheduling in Big Data-A Survey

Author(s):  
Seethalakshmi V ◽  
Govindasamy V ◽  
Akila V
Keyword(s):  
Big Data ◽  
2016 ◽  
Vol 16 (3) ◽  
pp. 35-51 ◽  
Author(s):  
M. Senthilkumar ◽  
P. Ilango

Abstract: Scheduling for Big Data applications has become an active research area over the last three years. The Hadoop framework has become one of the most popular and widely used frameworks for distributed data processing. Hadoop is also open source software that allows the user to utilize the hardware effectively. The various scheduling algorithms for the MapReduce model in Hadoop differ in design and behavior, and they address issues such as data locality and resource, energy, and time awareness. This paper outlines job scheduling, classifies the schedulers, and compares existing algorithms in terms of their advantages, drawbacks, and limitations. We also discuss various tools and frameworks used for monitoring, and ways to improve performance in MapReduce. This paper helps beginners and researchers understand the scheduling mechanisms used in Big Data.


2015 ◽  
Vol 130 (13) ◽  
pp. 41-49 ◽  
Author(s):  
Sreedhar C. ◽  
N. Kasiviswanath ◽  
P. Chenna

Author(s):  
N. Deshai ◽  
B. V. D. S. Sekhar ◽  
S. Venkataramana ◽  
K. Srinivas ◽  
G. P. S. Varma

2017 ◽  
Vol 2017 ◽  
pp. 1-13 ◽  
Author(s):  
Bao Rong Chang ◽  
Yun-Da Lee ◽  
Po-Hao Liao

The crucial problem in integrating multiple platforms is how to adapt to each platform's computing features so as to execute assignments most efficiently and obtain the best outcome. This paper introduces two new approaches to the big data platform, RHhadoop and SparkR, and integrates them to form a high-performance, multi-platform big data analytics system that serves as part of business intelligence (BI) and carries out rapid data retrieval and analytics with R programming. The paper aims to optimize job scheduling using the MSHEFT algorithm and to implement optimized platform selection based on computing features, in order to improve system throughput significantly. In addition, users simply issue R commands rather than running Java or Scala programs to perform data retrieval and analytics on the proposed platforms. According to the performance index calculated for the various methods, optimized platform selection significantly reduces the execution time for data retrieval and analytics, and scheduling optimization further increases system efficiency.


This paper studies various job scheduling methodologies used in big data systems. MapReduce is a highly efficient distributed job processing strategy for big data systems. Job scheduling is a critical task in any big data system because the volume of jobs to be processed is tremendous. This study goes over the MapReduce process in detail. It also reviews various job scheduling methodologies and attempts to compare them.
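As a rough illustration of the MapReduce process mentioned in the abstract above, the following is a minimal single-process sketch in Python, not code from the paper. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen for illustration; a real Hadoop job would distribute these phases across nodes under the control of a scheduler.

```python
from collections import defaultdict


def map_phase(split):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all intermediate values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits):
    """Run map, shuffle, and reduce sequentially over all input splits."""
    intermediate = []
    for split in splits:
        intermediate.extend(map_phase(split))
    return reduce_phase(shuffle_phase(intermediate))


# Hypothetical input: two splits, each of which would be one map task.
counts = word_count(["big data jobs", "big jobs queue"])
```

In this sketch each input split corresponds to one map task; deciding where and when such tasks run is the job scheduling problem the surveyed methodologies address.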

