Fault-tolerance and availability awareness in computational grids

Author(s):  
Xavier Besseron ◽  
Mohamed-Slim Bouguerra ◽  
Thierry Gautier ◽  
Erik Saule ◽  
Denis Trystram
2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
P. Keerthika ◽  
N. Kasthuri

Problem Statement. The advances in human civilization lead to more complications in problem solving. Grid computing serves as an efficient technology in solving those complicated problems. In computational grids, the grid scheduler schedules the task and finds the appropriate resource for each task. The scheduler must consider several factors such as user demand, communication time, failure handling mechanisms, and reduced makespan. Most of the existing algorithms do not consider user satisfaction. Thus a scheduling algorithm that handles failure of resources and achieves user satisfaction gains more importance.Approach. A new bicriteria scheduling algorithm (BSA) that considers user satisfaction along with fault tolerance has been introduced. The main contribution of this paper includes achieving user satisfaction along with fault tolerance and minimizing the makespan of jobs.Results. The performance of this proposed algorithm is evaluated using GridSim based on makespan and number of jobs completed successfully within user deadline.Conclusions/Recommendations. The proposed BSA algorithm achieves reduced makespan and better hit rate with higher user satisfaction and fault tolerance.


2013 ◽  
Vol 3 (1) ◽  
Author(s):  
Mohammed Amoon

AbstractFault tolerance is an important property in computational grids since the resources are geographically distributed. Job checkpointing is one of the most common utilized techniques for providing fault tolerance in computational grids. The efficiency of checkpointing depends on the choice of the checkpoint interval. Inappropriate checkpointing interval can delay job execution. In this paper, a fault-tolerant scheduling system based on checkpointing technique is presented and evaluated. When scheduling a job, the system uses both average failure time and failure rate of grid resources combined with resources response time to generate scheduling decisions. The system uses the failure rate of the assigned resources to calculate the checkpoint interval for each job. Extensive simulation experiments are conducted to quantify the performance of the proposed system. Experiments have shown that the proposed system can considerably improve throughput, turnaround time, grid load and failure tendency of computational grids.


Author(s):  
MALARVIZHI NANDAGOPAL ◽  
S. GAJALAKSHMI ◽  
V. RHYMEND UTHARIARAJ

Computational grids have the potential for solving large-scale scientific applications using heterogeneous and geographically distributed resources. In addition to the challenges of managing and scheduling these applications, reliability challenges arise because of the unreliable nature of grid infrastructure. Two major problems that are critical to the effective utilization of computational resources are efficient scheduling of jobs and providing fault tolerance in a reliable manner. This paper addresses these problems by combining the checkpoint replication based fault tolerance mechanism with minimum total time to release (MTTR) job scheduling algorithm. TTR includes the service time of the job, waiting time in the queue, transfer of input and output data to and from the resource. The MTTR algorithm minimizes the response time by selecting a computational resource based on job requirements, job characteristics, and hardware features of the resources. The fault tolerance mechanism used here sets the job checkpoints based on the resource failure rate. If resource failure occurs, the job is restarted from its last successful state using a checkpoint file from another grid resource. Globus ToolKit is used as the grid middleware to set up a grid environment and evaluate the performance of the proposed approach. The monitoring tools Ganglia and Network Weather Service are used to gather hardware and network details, respectively. The experimental results demonstrate that, the proposed approach effectively schedule the grid jobs with fault-tolerant way thereby reduces TTR of the jobs submitted in the grid. Also, it increases the percentage of jobs completed within specified deadline and making the grid trustworthy.


Entropy ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. 1410
Author(s):  
Murad B. Khorsheed ◽  
Qasim M. Zainel ◽  
Oday A. Hassen ◽  
Saad M. Darwish

This paper applies the entropy-based fractal indexing scheme that enables the grid environment for fast indexing and querying. It addresses the issue of fault tolerance and load balancing-based fractal management to make computational grids more effective and reliable. A fractal dimension of a cloud of points gives an estimate of the intrinsic dimensionality of the data in that space. The main drawback of this technique is the long computing time. The main contribution of the suggested work is to investigate the effect of fractal transform by adding R-tree index structure-based entropy to existing grid computing models to obtain a balanced infrastructure with minimal fault. In this regard, the presented work is going to extend the commonly scheduling algorithms that are built based on the physical grid structure to a reduced logical network. The objective of this logical network is to reduce the searching in the grid paths according to arrival time rate and path’s bandwidth with respect to load balance and fault tolerance, respectively. Furthermore, an optimization searching technique is utilized to enhance the grid performance by investigating the optimum number of nodes extracted from the logical grid. The experimental results indicated that the proposed model has better execution time, throughput, makespan, latency, load balancing, and success rate.


Sign in / Sign up

Export Citation Format

Share Document