Supporting Fault-Tolerance for Time-Critical Events in Distributed Environments

2010 ◽  
Vol 18 (1) ◽  
pp. 51-76
Author(s):  
Qian Zhu ◽  
Gagan Agrawal

In this paper, we consider the problem of supporting fault tolerance for adaptive and time-critical applications in heterogeneous and unreliable grid computing environments. Our goal for this class of applications is to optimize a user-specified benefit function while meeting the time deadline. Our first contribution in this paper is a multi-objective optimization algorithm for scheduling the application onto the most efficient and reliable resources. In this way, the processing can achieve the maximum benefit while also maximizing the success-rate, which is the probability of finishing execution without failures. However, for the cases where failures do occur, we have developed a hybrid failure recovery scheme to ensure that the application can complete within the pre-specified time interval. Our experimental results show that our scheduling algorithm can achieve better benefit when compared to several heuristics-based greedy scheduling algorithms, while still having a negligible overhead. Benefit is further improved when we apply the hybrid failure recovery scheme, and the success-rate becomes 100%.
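
The trade-off between benefit and success-rate described above can be illustrated with a small sketch. This is not the authors' algorithm: it swaps in a plain weighted-sum scalarization, and it assumes independent, exponentially distributed resource failures so that the success-rate has a closed form. The `Plan` type, the `failure_rates` field, and the `alpha` weight are hypothetical names introduced only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Plan:
    """A candidate assignment of the application to a set of resources (hypothetical)."""
    name: str
    benefit: float        # estimated value of the user-specified benefit function
    runtime: float        # estimated execution time in seconds
    failure_rates: tuple  # assumed per-resource failure rates (failures per second)


def success_rate(plan):
    """Probability of finishing without any failure.

    Assumption: failures are independent and exponentially distributed,
    so P(no failure) = exp(-sum(rates) * runtime).
    """
    return math.exp(-sum(plan.failure_rates) * plan.runtime)


def pick_plan(plans, deadline, alpha=0.5):
    """Return the plan with the best weighted score among those meeting the deadline.

    alpha weights the normalized benefit; (1 - alpha) weights the success-rate.
    Returns None when no plan can finish before the deadline.
    """
    feasible = [p for p in plans if p.runtime <= deadline]
    if not feasible:
        return None
    max_benefit = max(p.benefit for p in feasible)

    def score(p):
        normalized = p.benefit / max_benefit if max_benefit > 0 else 0.0
        return alpha * normalized + (1 - alpha) * success_rate(p)

    return max(feasible, key=score)


plans = [
    Plan("fast_unreliable", benefit=100.0, runtime=50.0, failure_rates=(0.01, 0.01)),
    Plan("reliable", benefit=80.0, runtime=60.0, failure_rates=(0.001, 0.001)),
    Plan("too_slow", benefit=200.0, runtime=500.0, failure_rates=(0.0001,)),
]
best = pick_plan(plans, deadline=100.0)
```

With equal weights, the slightly less beneficial but far more reliable plan wins; setting `alpha=1.0` ignores reliability and reverts to a purely benefit-driven choice among the feasible plans.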

2018 ◽  
Vol 8 (3) ◽  
pp. 20-31 ◽  
Author(s):  
Sam Goundar ◽  
Akashdeep Bhardwaj

With mission-critical web applications and resources being hosted in cloud environments, and cloud services growing fast, the need for a greater level of service assurance regarding fault tolerance for availability and reliability has increased. The high priority now is ensuring a fault-tolerant environment that can keep systems up and running. To minimize the impact of downtime or accessibility failures caused by systems, network devices, or hardware, such failures are expected to be anticipated and handled proactively in a fast, intelligent way. This article discusses the fault tolerance system for cloud computing environments and analyzes whether it is effective for cloud environments.


2004 ◽  
Vol 50 (4) ◽  
pp. 169-175 ◽  
Author(s):  
Chong Won Park ◽  
Jin-Won Park

2019 ◽  
Vol 5 (1) ◽  
pp. 65-79
Author(s):  
Yunhong Ji ◽  
Yunpeng Chai ◽  
Xuan Zhou ◽  
Lipeng Ren ◽  
Yajie Qin

Intra-query fault tolerance has increasingly been a concern for online analytical processing, as more and more enterprises migrate data analytical systems from mainframes to commodity computers. Most massively parallel processing (MPP) databases do not support intra-query fault tolerance. They may suffer from prolonged query latency when running on unreliable commodity clusters. While SQL-on-Hadoop systems can utilize the fault tolerance support of low-level frameworks, such as MapReduce and Spark, their cost-effectiveness is not always acceptable. In this paper, we propose a smart intra-query fault tolerance (SIFT) mechanism for MPP databases. SIFT achieves fault tolerance by performing checkpointing, i.e., materializing intermediate results of selected operators. Unlike existing approaches, SIFT aims at promoting the query success rate within a given time. To achieve its goal, it needs to: (1) minimize query rerunning time after encountering failures and (2) introduce as little checkpointing overhead as possible. To evaluate SIFT in real-world MPP database systems, we implemented it in Greenplum. The experimental results indicate that it can improve the success rate of query processing effectively, especially when working with unreliable hardware.
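
The two competing goals, short rerun time and low checkpointing overhead, can be made concrete with a toy cost model. The sketch below is not SIFT's operator-selection method, which the abstract does not describe; it assumes a query is a linear pipeline of operators, and the function names and cost lists are hypothetical.

```python
def rerun_time(op_costs, checkpointed, failed_at):
    """Work that must be redone when operator `failed_at` fails.

    Assumption: operators run in order, and execution restarts just after the
    latest checkpointed operator that precedes the failure (or from the start
    if there is none).
    """
    restart = 0
    for i in range(failed_at - 1, -1, -1):
        if i in checkpointed:
            restart = i + 1
            break
    return sum(op_costs[restart:failed_at + 1])


def checkpoint_overhead(ckpt_costs, checkpointed):
    """Total cost of materializing the intermediate results of the chosen operators."""
    return sum(ckpt_costs[i] for i in checkpointed)


op_costs = [10, 40, 5, 30]   # hypothetical run cost per operator
ckpt_costs = [2, 8, 1, 6]    # hypothetical cost of checkpointing each operator's output

without_ckpt = rerun_time(op_costs, set(), failed_at=3)
with_ckpt = rerun_time(op_costs, {1}, failed_at=3)
overhead = checkpoint_overhead(ckpt_costs, {1})
```

In this example, checkpointing the expensive second operator cuts the work redone after a failure in the last operator from 85 to 35 cost units, at a checkpointing overhead of 8 units.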


Sensors ◽  
2019 ◽  
Vol 19 (19) ◽  
pp. 4238
Author(s):  
Yating Qu ◽  
Guoqiang Zheng ◽  
Honghai Wu ◽  
Baofeng Ji ◽  
Huahong Ma

Wireless body area networks will inevitably bring tremendous convenience to human society in future development, and also enable people to benefit from ubiquitous technological services. However, one of the reasons hindering development is the limited energy of the network nodes. Therefore, the energy consumption in the selection of the next hop must be minimized in multi-hop routing. To solve this problem, this paper proposes an energy efficient routing protocol for reliable data transmission in a wireless body area network. The protocol takes multiple parameters of the network node into account, such as residual energy, transmission efficiency, available bandwidth, and the number of hops to the sink. We construct the maximum benefit function to select the next hop node by normalizing the node parameters, and dynamically select the node with the largest function value as the next hop node. Based on the above work, the proposed method can achieve efficient multi-hop routing transmission of data and improve the reliability of network data transmission. Compared with the priority-based energy-efficient routing algorithm (PERA) and modified new-attempt routing protocol (NEW-ATTEMPT), the simulation results show that the proposed routing protocol uses the maximum benefit function to select the next hop node dynamically, which not only improves the reliability of data transmission, but also significantly improves the energy utilization efficiency of the node and prolongs the network lifetime.
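
The next-hop selection described above can be sketched as follows. The abstract does not give the exact form of the maximum benefit function, so this example assumes min-max normalization and an equally weighted sum; the `Node` type, field names, units, and weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A candidate next-hop node (hypothetical fields and units)."""
    name: str
    residual_energy: float          # higher is better
    transmission_efficiency: float  # 0..1, higher is better
    available_bandwidth: float      # higher is better
    hops_to_sink: int               # lower is better


def normalize(values, higher_is_better=True):
    """Min-max normalize to [0, 1]; invert the scale when lower values are better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def select_next_hop(candidates, weights=(0.25, 0.25, 0.25, 0.25)):
    """Return (node, score) for the candidate with the largest benefit value."""
    if not candidates:
        raise ValueError("no candidate nodes")
    columns = [
        normalize([c.residual_energy for c in candidates]),
        normalize([c.transmission_efficiency for c in candidates]),
        normalize([c.available_bandwidth for c in candidates]),
        normalize([c.hops_to_sink for c in candidates], higher_is_better=False),
    ]
    scores = [
        sum(w * col[i] for w, col in zip(weights, columns))
        for i in range(len(candidates))
    ]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


candidates = [
    Node("a", residual_energy=0.9, transmission_efficiency=0.8,
         available_bandwidth=250.0, hops_to_sink=2),
    Node("b", residual_energy=0.4, transmission_efficiency=0.6,
         available_bandwidth=100.0, hops_to_sink=3),
    Node("c", residual_energy=0.7, transmission_efficiency=0.9,
         available_bandwidth=200.0, hops_to_sink=1),
]
next_hop, score = select_next_hop(candidates)
```

Because the parameters are re-normalized over the current candidate set on every call, the selection adapts as residual energy and bandwidth change, which is one way to read the "dynamic" selection the abstract mentions.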


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
P. Keerthika ◽  
N. Kasthuri

Problem Statement. The advances in human civilization lead to more complications in problem solving. Grid computing serves as an efficient technology in solving those complicated problems. In computational grids, the grid scheduler schedules the task and finds the appropriate resource for each task. The scheduler must consider several factors such as user demand, communication time, failure handling mechanisms, and reduced makespan. Most of the existing algorithms do not consider user satisfaction. Thus a scheduling algorithm that handles failure of resources and achieves user satisfaction gains more importance. Approach. A new bicriteria scheduling algorithm (BSA) that considers user satisfaction along with fault tolerance has been introduced. The main contribution of this paper includes achieving user satisfaction along with fault tolerance and minimizing the makespan of jobs. Results. The performance of this proposed algorithm is evaluated using GridSim based on makespan and number of jobs completed successfully within user deadline. Conclusions/Recommendations. The proposed BSA algorithm achieves reduced makespan and better hit rate with higher user satisfaction and fault tolerance.

