Performance Improvement of DAG-Aware Task Scheduling Algorithms with Efficient Cache Management in Spark

Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1874
Author(s):  
Yao Zhao ◽  
Jian Dong ◽  
Hongwei Liu ◽  
Jin Wu ◽  
Yanxin Liu

Directed acyclic graph (DAG)-aware task scheduling algorithms have been studied extensively in recent years, and these algorithms have achieved significant performance improvements in data-parallel analytic platforms. However, current DAG-aware task scheduling algorithms, among which HEFT and GRAPHENE are notable, pay little attention to the cache management policy, which plays a vital role in in-memory data-parallel systems such as Spark. Cache management policies that are designed for Spark exhibit poor performance in DAG-aware task scheduling algorithms, which leads to cache misses and performance degradation. In this study, we propose a new cache management policy known as Long-Running Stage Set First (LSF), which makes full use of the task dependencies to optimize the cache management performance in DAG-aware scheduling algorithms. LSF calculates the caching and prefetching priorities of resilient distributed datasets according to their unprocessed workloads and significance in parallel scheduling, which are key factors in DAG-aware scheduling algorithms. Moreover, we present a cache-aware task scheduling algorithm based on LSF to reduce the resource fragmentation in computing. Experiments demonstrate that, compared to DAG-aware scheduling algorithms with LRU and MRD, the same algorithms with LSF improve the job completion time (JCT) by up to 42% and 30%, respectively. The proposed cache-aware scheduling algorithm also exhibits about a 12% reduction in the average JCT compared to GRAPHENE with LSF.
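The abstract does not give the LSF priority formula, so the following is only a minimal sketch of one of the two factors it names, the unprocessed workload that still depends on a cached dataset. The stage graph, the cost estimates, and the helper names `remaining_workload` and `pick_victim` are all assumptions made for illustration; this is not the authors' algorithm and it ignores the "significance in parallel scheduling" factor entirely.

```python
def remaining_workload(children, work, finished):
    """For each stage, sum the work of unfinished stages reachable from it.

    children: stage -> list of stages that consume its output (hypothetical DAG).
    work:     stage -> estimated cost of running that stage (assumed known).
    finished: set of stages that have already completed.
    """
    def descendants(stage, seen):
        # Depth-first walk that collects every transitive consumer once.
        for child in children.get(stage, []):
            if child not in seen:
                seen.add(child)
                descendants(child, seen)
        return seen

    return {
        s: sum(work[d] for d in descendants(s, set()) if d not in finished)
        for s in work
    }


def pick_victim(cached, priority):
    """Evict the cached output with the least unprocessed downstream work."""
    # sorted() makes tie-breaking deterministic (alphabetical order).
    return min(sorted(cached), key=lambda s: priority[s])


children = {"A": ["C"], "B": ["C", "D"], "C": ["E"]}
work = {"A": 1, "B": 1, "C": 5, "D": 2, "E": 3}
priority = remaining_workload(children, work, finished={"A", "B"})
victim = pick_victim({"A", "B"}, priority)
```

In this toy graph the output of stage B feeds more unfinished work than the output of stage A, so A would be evicted first. A recency-based policy such as LRU has no access to this dependency information.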

Author(s):  
Ahmed Subhi Abdalkafor ◽  
Khattab M. Ali Alheeti

Cloud computing plays an important role in daily life. It has a direct and positive impact on sharing and updating data, knowledge, storage, and scientific resources across regions. Cloud computing performance depends heavily on the job scheduling algorithms used to manage waiting queues in modern scientific applications. Researchers consider cloud computing a popular platform for new developments. These scheduling algorithms help design efficient queue lists in the cloud, and they play a vital role in reducing the time that jobs wait for processing. This paper proposes a novel job scheduling algorithm to enhance cloud computing performance and reduce the delay of jobs waiting in the queue. The proposed algorithm tries to avoid some significant challenges that hinder the development of cloud computing applications. To that end, a smart scheduling technique is proposed to improve processing performance in cloud applications. The experimental results show that the proposed schemes achieve outstanding improvement rates, with a reduction in waiting time for jobs in the queue.


2019 ◽  
Vol 29 (02) ◽  
pp. 2050029 ◽  
Author(s):  
Chi-Chou Kao

In this paper, we propose a resource/performance tradeoff algorithm for task scheduling of parallel reconfigurable architectures. First, it uses unlimited resources to generate an optimal schedule. Then, a relaxation algorithm is applied to satisfy the number of resources under increasing minimum performance. To demonstrate the performance of the proposed algorithm, we not only compare it with existing methods on standard benchmarks but also implement it on physical systems. The experimental results show that the proposed algorithms satisfy the requirements of systems with limited resources.


2020 ◽  
Vol 13 (3) ◽  
pp. 326-335
Author(s):  
Punit Gupta ◽  
Ujjwal Goyal ◽  
Vaishali Verma

Background: Cloud computing is a growing industry that offers secure, low-cost, pay-per-use resources. Efficient resource allocation is a challenging issue in cloud computing environments. Many task scheduling algorithms are used to improve system performance, including ant colony optimization (ACO), genetic algorithms, and Round Robin; they improve performance but are not cost-efficient at the same time. Objective: Earlier task scheduling algorithms do not include network cost, whereas the proposed ACO takes network overhead (cost) into consideration, which improves the efficiency of the algorithm compared to the previous algorithm. The proposed algorithm aims to improve cost and execution time and to reduce network cost. Methods: The proposed cloud task scheduling algorithm uses ACO with network cost and execution cost as the fitness function. This work tries to improve the existing ACO to give better results in terms of performance and execution cost for cloud architectures. Our study includes a comparison between several other algorithms and the proposed ACO model. Results: Performance is measured using two optimization criteria: task completion time and resource operational cost over the duration of execution. The network cost and user requests are also used to measure the performance of the proposed model. Conclusion: The simulation shows that the proposed cost- and time-aware technique outperforms the compared algorithms on the performance measurement parameters (average finish time, resource cost, network cost).


2016 ◽  
Vol 26 (01) ◽  
pp. 1650002 ◽  
Author(s):  
Abhishek Mishra ◽  
Pramod Kumar Mishra

The LOCAL(A, B) randomized task scheduling algorithm is proposed for fully connected multiprocessors. It combines two given task scheduling algorithms (A and B) using local neighborhood search to produce a hybrid of the two. The objective is to show that this type of hybridization can give much better performance in terms of parallel execution time. Two task scheduling algorithms are selected, DSC (Dominant Sequence Clustering) as algorithm A and CPPS (Cluster Pair Priority Scheduling) as algorithm B, and a hybrid is created (LOCAL(DSC, CPPS), or simply the LOCAL task scheduling algorithm). The LOCAL task scheduling algorithm has time complexity O(|V||E|(|V|+|E|)), where V is the set of vertices and E is the set of edges in the task graph. The LOCAL task scheduling algorithm is compared with six other algorithms: CPPS, DCCL (Dynamic Computation Communication Load), DSC, EZ (Edge Zeroing), LC (Linear Clustering), and RDCC (Randomized Dynamic Computation Communication). Performance evaluation of the LOCAL task scheduling algorithm shows that it gives up to 80.47% improvement in NSL (Normalized Schedule Length) over the other algorithms.


Author(s):  
Rajkumar Rajavel ◽  
Sathish Kumar Ravichandran ◽  
Partheeban Nagappan ◽  
Sivakumar Venu

Maintaining the quality of service (QoS) related parameters is an important issue in cloud management systems. The lack of such QoS parameters discourages cloud users from using the services of cloud service providers. The proposed task scheduling algorithms consider QoS parameters such as the latency, makespan, and load balancing to satisfy the user requirements. These parameters cannot sufficiently guarantee the desired user experience or that a task will be completed within a predetermined time. Therefore, this study considered the cost-enabled QoS-aware task (job) scheduling algorithm to enhance user satisfaction and maximize the profit of commercial cloud providers. The proposed scheduling algorithm estimates the cost-enabled QoS metrics of the virtual resources available from the unified resource layer in real time. Moreover, the virtual machine (VM) manager frequently updates the scheduler with up-to-date information about resources so that it can make appropriate decisions. Hence, the proposed approach guarantees profit for cloud providers in addition to providing QoS parameters such as makespan, cloud utilization, and cloud utility, as demonstrated through a comparison with existing time- and cost-based task scheduling algorithms.


2016 ◽  
Vol 25 (10) ◽  
pp. 1650119 ◽  
Author(s):  
Bahman Keshanchi ◽  
Nima Jafari Navimipour

Task scheduling is one of the major issues in achieving high performance in distributed systems such as Grid, Peer-to-Peer, and cloud environments. Generally, heuristic-based task scheduling algorithms in heterogeneous distributed computing systems (HeDCSs) have two phases: task prioritization and processor assignment. Heuristic-based task scheduling algorithms may use different policies to assign priorities to subtasks, which produce different makespans in a heterogeneous computing system. Thus, a suitable scheduling algorithm is one that can efficiently assign priorities to tasks in order to minimize the makespan. Recently, memetic algorithms (MAs) have been used as evolutionary or population-based global search approaches with local search heuristics to optimize NP-complete problems. Recent studies on MAs have shown their success on a wide variety of real-world problems. Since the task scheduling problem is NP-complete, this paper proposes a new task scheduling algorithm for cloud environments that uses multiple priority queues and a memetic algorithm (MPQMA). The proposed method uses a genetic algorithm (GA) along with hill climbing to assign a priority to each subtask, while using a heuristic-based earliest finish time (EFT) approach to search for a solution to the task-to-processor mapping. The basic idea of our approach is to use the advantages of MAs to increase the convergence speed of the solutions. We implemented the algorithm on Azure Cloud Service in C#; the experimental results for a set of randomly generated graphs revealed that the proposed MPQMA algorithm outperformed three existing task scheduling algorithms in terms of makespan, with fast convergence to the optimized solution.
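The second phase described above, mapping tasks to processors by earliest finish time once a priority order is fixed, can be sketched as follows. This is a simplified, non-insertion-based illustration in Python, not the paper's C# implementation; the function name `eft_mapping`, the input layout, and the rule that communication cost is paid only across processors are assumptions.

```python
def eft_mapping(order, preds, cost, comm):
    """Map tasks to processors by earliest finish time (EFT).

    order: tasks in priority order (must respect precedence).
    preds: task -> list of predecessor tasks.
    cost:  task -> list of execution times, one per processor.
    comm:  (pred, task) -> transfer time, paid only when the two
           tasks run on different processors (an assumption here).
    Returns (placement, finish, makespan).
    """
    n_procs = len(next(iter(cost.values())))
    free_at = [0.0] * n_procs            # when each processor becomes idle
    placement, finish = {}, {}
    for task in order:
        best = None
        for p in range(n_procs):
            # Data is ready once every predecessor has finished and,
            # if it ran elsewhere, its output has been transferred.
            data_ready = max(
                (finish[q] + (0.0 if placement[q] == p
                              else comm.get((q, task), 0.0))
                 for q in preds.get(task, [])),
                default=0.0,
            )
            end = max(free_at[p], data_ready) + cost[task][p]
            if best is None or end < best[0]:
                best = (end, p)
        end, p = best
        placement[task], finish[task] = p, end
        free_at[p] = end
    return placement, finish, max(finish.values())


placement, finish, makespan = eft_mapping(
    order=["A", "B", "C"],
    preds={"C": ["A", "B"]},
    cost={"A": [2, 3], "B": [4, 2], "C": [3, 3]},
    comm={("A", "C"): 1, ("B", "C"): 5},
)
```

In this example task C lands on the processor that already holds B's output, because moving that output would cost more than waiting for A's smaller transfer. A search procedure such as the GA with hill climbing mentioned above would vary `order` and keep the ordering with the smallest makespan.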


2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Guan Wang ◽  
Yuxin Wang ◽  
Hui Liu ◽  
He Guo

High-performance heterogeneous computing systems are achieved by the use of efficient application scheduling algorithms. However, most of the current algorithms have low efficiency in scheduling. Aiming at solving this problem, we propose a novel task scheduling algorithm for heterogeneous computing named HSIP (heterogeneous scheduling algorithm with improved task priority) whose functionality relies on three pillars: (1) an improved task priority strategy based on standard deviation with improved magnitude as computation weight and communication cost weight to make scheduling priority more reasonable; (2) an entry task duplication selection policy to make the makespan shorter; and (3) an improved idle time slots (ITS) insertion-based optimizing policy to make the task scheduling more efficient. We evaluate the proposed algorithm on randomly generated DAGs and on some real application DAGs, comparing it with some classical scheduling algorithms. According to the experimental results, the proposed algorithm appears to perform better than the other algorithms in terms of schedule length ratio, efficiency, and frequency of best results.


2017 ◽  
Vol 10 (1) ◽  
pp. 194-200 ◽  
Author(s):  
Sneha Sneha ◽  
Shoney Sebastian

The traditional way of storing huge amounts of data is not convenient, because processing that data in later stages is a very tedious job. Hadoop is therefore now used to store and process large amounts of data. Statistics on the data generated in recent years show that the volume has been very high in the last 2 years. Hadoop is a good framework for storing and processing data efficiently. It processes data in parallel, and its fault tolerance protects against failures and data loss. Job scheduling is an important process in Hadoop MapReduce. Hadoop comes with three types of schedulers, namely FIFO (first in, first out), Fair, and Capacity. The schedulers are now a pluggable component of the Hadoop MapReduce framework. This paper discusses the native job scheduling algorithms in Hadoop. The fair scheduling algorithm is analysed with respect to its response time, throughput, and performance. The advantages and drawbacks of the fair scheduling algorithm are discussed. An improvised fair scheduling algorithm with a new strategy is proposed. Response time, throughput, and performance are compared between naive fair scheduling and improvised fair scheduling. The improvised fair scheduling algorithm is intended for cases where there are jobs with both long and short processing times.
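To illustrate why the choice between FIFO and fair scheduling matters for response time when long and short jobs are mixed, here is a minimal sketch under strong assumptions: a single resource, all jobs arriving at time 0, and idealized equal sharing. It is not Hadoop's Fair Scheduler or the paper's improvised variant, and the function names are hypothetical.

```python
def fifo_completion(jobs):
    """Completion times when jobs run one at a time in arrival order."""
    t, done = 0.0, []
    for work in jobs:
        t += work
        done.append(t)
    return done


def fair_completion(jobs):
    """Completion times under idealized equal sharing of one resource."""
    remaining = {i: float(w) for i, w in enumerate(jobs)}
    t, done = 0.0, [0.0] * len(jobs)
    while remaining:
        n = len(remaining)
        # Every active job progresses at rate 1/n, so the job with the
        # least remaining work finishes after smallest * n time units.
        smallest = min(remaining.values())
        t += smallest * n
        for i in list(remaining):
            remaining[i] -= smallest
            if remaining[i] <= 1e-12:
                done[i] = t
                del remaining[i]
    return done


jobs = [10, 1, 1]                      # one long job ahead of two short ones
fifo = fifo_completion(jobs)
fair = fair_completion(jobs)
mean_fifo = sum(fifo) / len(fifo)
mean_fair = sum(fair) / len(fair)
```

With this workload the two short jobs no longer wait behind the long one under equal sharing, so the mean completion time drops, while the long job finishes no later than the last job does under FIFO.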


2020 ◽  
Vol 3 (4) ◽  
pp. 47-59
Author(s):  
Ahmed A. Hamed ◽  
Rabah A. Ahmed

Hybrid cloud computing has become important in recent years for large and medium enterprises and even for individuals, which increases the need for improvements in its availability. One of the most important factors affecting availability is the task scheduling process. Many task scheduling algorithms exist, and they differ in performance and purpose; the most important aspects that an appropriate scheduling algorithm can improve are the total execution time (makespan), the success rate, and the downtime of live migration. Because working in a real cloud computing environment is costly and complex, we simulated a hybrid cloud environment using a reliable and accurate simulator and used a directed acyclic graph (DAG) as the workflow application. In this paper, we compare scheduling and planning algorithms for cloud computing environments by implementing a framework with WorkflowSim, which is based on the CloudSim simulator, in order to choose the best algorithm and verify the possibility of improving availability.


2015 ◽  
Vol 14 (8) ◽  
pp. 5960-5966 ◽  
Author(s):  
Lalla Singh ◽  
Neha Agarwal

Grid computing is a hardware and software infrastructure that offers economical, distributable, coordinated, and dependable access to powerful computational capabilities [1]. To make optimal use of large distributed systems, effective and efficient scheduling algorithms are needed. Many algorithms have been implemented to reduce total completion time and improve load balancing. In this paper, our goal is to propose a new scheduling algorithm based on a well-known task scheduling algorithm, Min-Min [1]. The proposed algorithm tries to retain the advantages of this basic algorithm and avoid its drawbacks, achieving better grid utilization and a smaller makespan. Compared with existing algorithms such as Min-Min and the improved Min-Min algorithm [1], the proposed algorithm achieves better results for the considered parameters.
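For reference, the baseline Min-Min heuristic that this abstract builds on can be sketched as below. This shows only the classical algorithm, not the authors' proposed variant; the function name `min_min` and the expected-time-to-compute matrix `etc` are illustrative assumptions.

```python
def min_min(etc):
    """Schedule independent tasks with the classical Min-Min heuristic.

    etc[t][m] is the expected time to compute task t on machine m.
    Returns (assignment, makespan), where assignment maps task -> machine.
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines           # time at which each machine is free
    unscheduled = set(range(len(etc)))
    assignment = {}
    while unscheduled:
        # Among all unscheduled tasks, pick the (task, machine) pair with
        # the smallest completion time. That is the task whose minimum
        # completion time is smallest, placed on its best machine.
        best = None
        for t in sorted(unscheduled):
            for m in range(n_machines):
                ct = ready[m] + etc[t][m]
                if best is None or ct < best[0]:
                    best = (ct, t, m)
        ct, t, m = best
        assignment[t] = m
        ready[m] = ct
        unscheduled.remove(t)
    return assignment, max(ready)


etc = [[4, 6], [3, 5], [8, 2]]           # 3 tasks, 2 machines
assignment, makespan = min_min(etc)
```

Because Min-Min always commits the cheapest remaining task first, long tasks tend to be scheduled last and can leave machines unevenly loaded, which is the kind of load-balancing drawback that Min-Min variants usually target.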

