Survey on Job Scheduling, Load Balancing and Fault Tolerance Techniques for Computational Grids

Computational grids have the potential to solve large-scale scientific applications using heterogeneous and geographically distributed resources. In addition to the challenges of managing and scheduling these applications, reliability challenges arise because of the unreliable nature of grid infrastructure. Two major problems that are critical to the effective utilization of computational resources are the efficient scheduling of jobs and the provision of reliable fault tolerance. This paper addresses these problems by combining a checkpoint-replication-based fault tolerance mechanism with the minimum total time to release (MTTR) job scheduling algorithm. The total time to release (TTR) includes the service time of the job, the waiting time in the queue, and the transfer of input and output data to and from the resource. The MTTR algorithm minimizes the response time by selecting a computational resource based on job requirements, job characteristics, and the hardware features of the resources. The fault tolerance mechanism sets job checkpoints based on the resource failure rate. If a resource failure occurs, the job is restarted from its last successful state using a checkpoint file from another grid resource. The Globus Toolkit is used as the grid middleware to set up a grid environment and evaluate the performance of the proposed approach. The monitoring tools Ganglia and Network Weather Service are used to gather hardware and network details, respectively. The experimental results demonstrate that the proposed approach schedules grid jobs effectively in a fault-tolerant way, thereby reducing the TTR of jobs submitted to the grid. It also increases the percentage of jobs completed within the specified deadline, making the grid more trustworthy.
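The MTTR selection rule and failure-rate-based checkpointing can be illustrated with a small sketch. It assumes TTR is the simple sum of input transfer, queue wait, service time, and output transfer, and uses hypothetical resource fields (`speed_mips`, `queue_wait_s`, `bandwidth_mbps`, `failure_rate_per_h`) to stand in for the data Ganglia and NWS would supply. The checkpoint interval uses Young's first-order approximation, sqrt(2 × checkpoint cost × MTBF), as a stand-in for the paper's failure-rate-based policy, whose exact formula may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    speed_mips: float          # hypothetical: processing speed (million instructions/s)
    queue_wait_s: float        # estimated wait in the local batch queue (s)
    bandwidth_mbps: float      # link bandwidth to the resource (Mbit/s), e.g. from NWS
    failure_rate_per_h: float  # observed failures per hour


@dataclass
class Job:
    length_mi: float   # job length in million instructions
    input_mb: float    # input data staged to the resource (MB)
    output_mb: float   # output data returned from the resource (MB)


def time_to_release(job: Job, res: Resource) -> float:
    """TTR = input transfer + queue wait + service time + output transfer, in seconds."""
    transfer_in = job.input_mb * 8 / res.bandwidth_mbps
    service = job.length_mi / res.speed_mips
    transfer_out = job.output_mb * 8 / res.bandwidth_mbps
    return transfer_in + res.queue_wait_s + service + transfer_out


def select_resource_mttr(job: Job, resources: list[Resource]) -> Resource:
    """Pick the resource with the minimum total time to release."""
    return min(resources, key=lambda r: time_to_release(job, r))


def checkpoint_interval_s(checkpoint_cost_s: float, failure_rate_per_h: float) -> float:
    """Young's approximation sqrt(2 * C * MTBF): a higher failure rate gives a shorter interval."""
    if failure_rate_per_h <= 0:
        return math.inf  # no observed failures, so no failure-driven checkpoints
    mtbf_s = 3600.0 / failure_rate_per_h
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)


if __name__ == "__main__":
    job = Job(length_mi=36_000, input_mb=100, output_mb=50)
    a = Resource("A", speed_mips=1000, queue_wait_s=10, bandwidth_mbps=100, failure_rate_per_h=0.5)
    b = Resource("B", speed_mips=2000, queue_wait_s=60, bandwidth_mbps=50, failure_rate_per_h=0.1)
    best = select_resource_mttr(job, [a, b])
    print(best.name, time_to_release(job, best))  # A 58.0
```

Here resource A wins (58 s versus 102 s for B) even though B's processor is twice as fast, because B's longer queue and slower link dominate its TTR.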

The Application of Fractal Transform and Entropy for Improving Fault Tolerance and Load Balancing in Grid Computing Environments

Entropy ◽

10.3390/e22121410 ◽

2020 ◽

Vol 22 (12) ◽

pp. 1410

Author(s):

Murad B. Khorsheed ◽

Qasim M. Zainel ◽

Oday A. Hassen ◽

Saad M. Darwish

Keyword(s):

Fault Tolerance ◽

Load Balancing ◽

Grid Computing ◽

Computing Time ◽

Index Structure ◽

Computational Grids ◽

Optimum Number ◽

Logical Network ◽

Proposed Model ◽

Cloud Of Points

This paper applies an entropy-based fractal indexing scheme that enables fast indexing and querying in the grid environment. It addresses fault tolerance and load balancing through fractal-based management to make computational grids more effective and reliable. The fractal dimension of a cloud of points gives an estimate of the intrinsic dimensionality of the data in that space. The main drawback of this technique is its long computing time. The main contribution of the suggested work is to investigate the effect of the fractal transform by adding an entropy measure based on the R-tree index structure to existing grid computing models, in order to obtain a balanced infrastructure with minimal faults. To this end, the work extends commonly used scheduling algorithms, which are built on the physical grid structure, to a reduced logical network. The objective of this logical network is to reduce searching along grid paths according to arrival time rate and path bandwidth, with respect to load balance and fault tolerance, respectively. Furthermore, an optimization search technique is used to enhance grid performance by finding the optimum number of nodes extracted from the logical grid. The experimental results indicate that the proposed model achieves better execution time, throughput, makespan, latency, load balancing, and success rate.
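To make "the fractal dimension of a cloud of points" concrete, the sketch below uses box counting, one common estimator (the paper may use a different one, such as the correlation dimension): cover the space with boxes of side ε, count the occupied boxes N(ε), and fit the slope of log N(ε) against log(1/ε). Points are assumed to be pre-scaled into [0, 1)^d; the function name and the chosen scales are illustrative.

```python
import math


def box_counting_dimension(points, scales):
    """Estimate the box-counting (fractal) dimension of a point cloud.

    points: iterable of coordinate tuples, assumed scaled into [0, 1)^d.
    scales: box side lengths eps, e.g. [1/4, 1/8, 1/16].
    Returns the least-squares slope of log N(eps) versus log(1/eps).
    """
    points = list(points)
    xs, ys = [], []
    for eps in scales:
        # Each point falls into the box indexed by floor(coordinate / eps).
        occupied = {tuple(int(c // eps) for c in p) for p in points}
        xs.append(math.log(1.0 / eps))
        ys.append(math.log(len(occupied)))
    # Ordinary least-squares slope of ys on xs.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    line = [(i / 10_000, i / 10_000) for i in range(10_000)]
    square = [(i / 100, j / 100) for i in range(100) for j in range(100)]
    scales = [1 / 4, 1 / 8, 1 / 16, 1 / 32]
    print(round(box_counting_dimension(line, scales), 3))    # 1.0
    print(round(box_counting_dimension(square, scales), 3))  # 2.0
```

A diagonal line yields a dimension of about 1 and a filled square about 2. Each scale requires a full pass over the points, so the cost grows with the number of points times the number of scales.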

A Hybrid Policy for Job Scheduling and Load Balancing in Heterogeneous Computational Grids

Sixth International Symposium on Parallel and Distributed Computing (ISPDC'07) ◽

10.1109/ispdc.2007.4 ◽

2007 ◽

Cited By ~ 15

Author(s):

Kai Lu ◽

Albert Y. Zomaya

Keyword(s):

Load Balancing ◽

Job Scheduling ◽

Computational Grids

A Review on Load Balancing Model Using Best Partition Technique

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i8.69 ◽

2017 ◽

Vol 7 (8) ◽

pp. 284

Author(s):

M. Chaitanya ◽

K. Durga Charan

Keyword(s):

Cloud Computing ◽

Fault Tolerance ◽

Load Balancing ◽

Load Balance ◽

Large Impact ◽

Cloud Environment ◽

Public Cloud ◽

The Public ◽

Partition Technique ◽

Textual Content

Load balancing makes cloud computing more efficient and can improve client satisfaction. At present, cloud computing is among the most widely used systems, offering data storage at very low cost that is available at all times over the internet. However, it faces significant problems such as security, load management, and fault tolerance. Load balancing in the cloud computing environment has a large impact on performance. The algorithm applies game theory to the load balancing process to improve efficiency in the public cloud environment. This paper presents an enhanced load balancing model for the public cloud based on the cloud partitioning concept, with a switch mechanism that selects different strategies for different situations.
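The partition-based model with a switch mechanism can be sketched as follows. This is a minimal illustration under stated assumptions: each node reports a hypothetical normalized load in [0, 1], a partition's status (idle, normal, overloaded) is derived from its average load using hypothetical thresholds of 0.2 and 0.8, and a main controller never dispatches to an overloaded partition. The switch uses round robin in idle partitions and, as a plain substitute for the game-theoretic strategy the abstract mentions, the least-loaded node in normal partitions.

```python
from dataclasses import dataclass

IDLE, NORMAL, OVERLOADED = "idle", "normal", "overloaded"


@dataclass
class Node:
    name: str
    load: float  # hypothetical normalized load degree in [0, 1]


@dataclass
class Partition:
    name: str
    nodes: list
    idle_threshold: float = 0.2      # hypothetical
    overload_threshold: float = 0.8  # hypothetical
    _next_rr: int = 0                # round-robin cursor

    def status(self) -> str:
        avg = sum(n.load for n in self.nodes) / len(self.nodes)
        if avg < self.idle_threshold:
            return IDLE
        if avg >= self.overload_threshold:
            return OVERLOADED
        return NORMAL

    def pick_node(self) -> Node:
        """Switch mechanism: choose the strategy according to partition status."""
        if self.status() == IDLE:
            # Plenty of spare capacity: cheap round robin is good enough.
            node = self.nodes[self._next_rr % len(self.nodes)]
            self._next_rr += 1
            return node
        # Normal load: least-loaded node (stand-in for the game-theoretic strategy).
        return min(self.nodes, key=lambda n: n.load)


def dispatch(partitions):
    """Main controller: prefer an idle partition, else a normal one, never an overloaded one."""
    statuses = [(p, p.status()) for p in partitions]
    for wanted in (IDLE, NORMAL):
        for p, s in statuses:
            if s == wanted:
                return p.pick_node()
    return None  # every partition is overloaded; the job must wait
```

Keeping the status check per partition lets the controller avoid inspecting every node for every job, which is the main appeal of partitioning a large public cloud.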

Aspect-Oriented Programing Techniques to support Distribution, Fault Tolerance, and Load Balancing in the CORBA-LC Component Model

Sixth IEEE International Symposium on Network Computing and Applications (NCA 2007) ◽

10.1109/nca.2007.8 ◽

2007 ◽

Cited By ~ 1

Author(s):

Diego Sevilla ◽

Jose M. Garcia ◽

Antonio Gomez

Keyword(s):

Fault Tolerance ◽

Load Balancing ◽

Component Model

Fault-tolerance and availability awareness in computational grids

Chapman & Hall/CRC Numerical Analysis & Scientific Computing Series - Fundamentals of Grid Computing ◽

10.1201/9781439803684-c6 ◽

2009 ◽

pp. 143-175 ◽

Cited By ~ 2

Author(s):

Xavier Besseron ◽

Mohamed-Slim Bouguerra ◽

Thierry Gautier ◽

Erik Saule ◽

Denis Trystram

Keyword(s):

Fault Tolerance ◽

Computational Grids

Factors Affecting Fault Tolerance during Load Balancing in Cloud Computing

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i11.6079 ◽

2021 ◽

Vol 12 (11) ◽

pp. 1523-1533

Author(s):

Bidush Kumar Sahoo, et al.

Keyword(s):

Cloud Computing ◽

Fault Tolerance ◽

Load Balancing ◽

South India ◽

Analysis Tool ◽

Factors Affecting ◽

Software Firms ◽

Statistical Analysis Tool ◽

Cost Efficient ◽

Support Cost

Cloud computing builds on advances in virtualization and distributed computing to support cost-efficient use of computing resources and to provide on-demand services. A methodical analysis of the various factors affecting fault tolerance during load balancing is performed, and it concludes that, in comparatively more software firms, the factors influencing fault tolerance in load balancing include cloud security and adaptability. In this paper, we create a model for various IT industries to check fault tolerance during load balancing. The study is carried out with the help of several renowned IT firms and industries in South India. The work comprises 20 hypotheses about factors that may affect fault tolerance during load balancing in South India, which are tested using the statistical analysis tool Statistical Package for the Social Sciences (SPSS).

Peer-to-Peer Service Discovery for Grid Computing

Grid and Cloud Computing ◽

10.4018/978-1-4666-0879-5.ch111 ◽

2012 ◽

pp. 232-259

Author(s):

Eddy Caron ◽

Frédéric Desprez ◽

Franck Petit ◽

Cédric Tedeschi

Keyword(s):

Fault Tolerance ◽

Load Balancing ◽

Service Discovery ◽

Large Scale ◽

Dynamic Networks ◽

Peer To Peer ◽

Indexing System ◽

Key Points ◽

Computing Platforms ◽

Traditional Approaches

Within distributed computing platforms, some computing abilities (or services) are offered to clients. To build dynamic applications using such services as basic blocks, a critical prerequisite is to discover those services. Traditional approaches to the service discovery problem have historically relied on centralized solutions, which are unable to scale well on large, unreliable platforms. In this chapter, we first give an overview of the state of the art of service discovery solutions based on peer-to-peer (P2P) technologies, which allow such functionality to remain efficient at large scale. We then focus on one of these approaches: the Distributed Lexicographic Placement Table (DLPT) architecture, which provides particular mechanisms for load balancing and fault tolerance. This solution centers on three key points. First, it relies on an indexing system structured as a prefix tree, allowing multi-attribute range queries. Second, it allows such structures to be mapped onto heterogeneous and dynamic networks and proposes some load balancing heuristics for this mapping. Third, as our target platform is dynamic and unreliable, we describe its powerful fault-tolerance mechanisms, based on self-stabilization. Finally, we present the software prototype of this architecture and its early experiments.
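The prefix-tree indexing idea behind DLPT can be illustrated with a centralized, in-memory trie, with multi-attribute range queries reduced here to a single string attribute. This is not the distributed DLPT itself: there is no mapping onto peers, no load balancing, and no self-stabilization, and the class names and BLAS-style service names are hypothetical. The range query prunes any subtree whose prefix already places all of its keys above `hi` or below `lo`.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # char -> TrieNode
    values: list = field(default_factory=list)    # services registered under exactly this key


class PrefixTree:
    """Hypothetical centralized prefix tree over service names."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, key: str, value: str) -> None:
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.values.append(value)

    def range_query(self, lo: str, hi: str) -> list:
        """Return (key, value) pairs with lo <= key <= hi, in lexicographic order."""
        out = []

        def visit(node: TrieNode, prefix: str) -> None:
            # Every key below this node starts with `prefix`, so it is >= prefix.
            if prefix > hi:
                return
            # If prefix < lo and is not a prefix of lo, every extension is < lo too.
            if prefix < lo and not lo.startswith(prefix):
                return
            if lo <= prefix <= hi:
                out.extend((prefix, v) for v in node.values)
            for ch in sorted(node.children):
                visit(node.children[ch], prefix + ch)

        visit(self.root, "")
        return out


if __name__ == "__main__":
    index = PrefixTree()
    for key, server in [("DGEMM", "s1"), ("DGEMV", "s2"), ("DTRSM", "s3"), ("SGEMM", "s4")]:
        index.insert(key, server)
    print(index.range_query("DGEMM", "DGEMV"))  # [('DGEMM', 's1'), ('DGEMV', 's2')]
```

In DLPT the tree nodes are spread across peers, so the same pruning logic decides which peers a query must visit.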