A novel proactive Health Aware Fault Tolerant (HAFT) scheduler for computational grid based on resource failure data analytics

2018 · Vol 41 (5) · pp. 367-377
Author(s): A. Shamila Ebenezer, Elijah Blessing Rajsingh, Baskaran Kaliaperumal

2001 · Vol 24 (15-16) · pp. 1589-1606
Author(s): H.S. Laskaridis, A.A. Veglis, G.I. Papadimitriou, A.S. Pombortsis

Author(s): Zahid Raza, Deo P. Vidyarthi

A Grid is a parallel and distributed computing system that offers high-throughput computing using heterogeneous computing resources spread over multiple administrative domains. Because the Grid operates at a large scale, failures, ranging from hardware to software faults, are always possible, and the penalty paid for these failures may be very large. The system therefore needs to tolerate the various failures that, in spite of many precautions, are bound to happen. Replication is a strategy often used to introduce fault tolerance into a system, ensuring successful execution of a job even when some of the computational resources fail. Though replication incurs a heavy cost, a selective degree of replication can offer a good compromise between performance and cost. This chapter proposes a co-scheduler that can be integrated with a main scheduler for the execution of jobs submitted to a computational Grid. The main scheduler may use any performance optimization criterion; integrating the co-scheduler adds fault tolerance on top of it. The chapter evaluates the performance of the co-scheduler together with a main scheduler designed to minimize the turnaround time of a modular job, introducing module replication to counter the effects of node failures in a Grid. A simulation study reveals that the model works well under various conditions, degrading the scheduler's performance gracefully while improving the overall reliability offered to the job.
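The abstract does not spell out the co-scheduler's algorithm. As a hedged illustration of why a selective degree of replication can trade cost against reliability, the Python sketch that follows assumes independent node failures with known per-node failure probabilities, and greedily adds one replica at a time to whichever module benefits most. The function names `job_reliability` and `selective_replication`, the greedy rule, and the replica budget are all hypothetical assumptions, not the chapter's method.

```python
import math


def job_reliability(modules):
    """Probability that a modular job completes.

    `modules` holds, for each module, the failure probabilities of the
    nodes running its replicas. Failures are ASSUMED independent: a module
    survives if at least one replica survives, and the job survives only
    if every module survives.
    """
    return math.prod(1.0 - math.prod(replicas) for replicas in modules)


def selective_replication(p_fail, target, budget):
    """Hypothetical greedy choice of per-module replica counts.

    Module i is assumed to run on nodes with failure probability p_fail[i].
    Every module starts with one copy; extra replicas are added one at a
    time to whichever module raises job reliability the most, until
    `target` is met or `budget` extra replicas have been spent.
    Returns (copies per module, resulting job reliability).
    """
    def reliability(copies):
        return job_reliability([[p] * k for p, k in zip(p_fail, copies)])

    copies = [1] * len(p_fail)

    def reliability_if_bumped(i):
        # Job reliability if module i received one more replica.
        trial = list(copies)
        trial[i] += 1
        return reliability(trial)

    spent = 0
    while reliability(copies) < target and spent < budget:
        best = max(range(len(copies)), key=reliability_if_bumped)
        copies[best] += 1
        spent += 1
    return copies, reliability(copies)
```

For example, with `p_fail = [0.1, 0.2]` and a target of 0.95, the sketch first replicates the less reliable second module, then the first, reaching a job reliability of about 0.9504 with only two extra copies rather than replicating every module heavily.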
