failure distribution
Recently Published Documents


Total documents: 154 (five years: 14)
H-index: 18 (five years: 1)

Author(s): Yongning Zhai, Weiwei Li

In a distributed computing system, checkpointing too often or too rarely severely degrades performance. This paper analyzes and derives the optimal equidistant checkpoint interval that minimizes the expected execution time of a long-running application under a general failure distribution. More precisely, it proposes an optimal checkpointing period that determines the proper checkpoint sequence, and it derives the expected effective rate of a defined computation cycle. Maximizing the expected effective rate yields an optimality constraint on the checkpoint sequence. From this constraint, the optimal equidistant checkpoint interval is obtained as the one with the minimal fault-tolerance overhead ratio. Numerical results indicate that the proposal is a practical way to determine a proper equidistant checkpoint interval for fault-tolerant performance optimization.
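The overhead-ratio idea can be illustrated with a minimal sketch. It does not reproduce the paper's derivation for a general failure distribution. Instead it assumes the simplest special case: exponentially distributed failures with rate `failure_rate`, a fixed checkpoint cost, a fixed restart cost, and no failures during restart. Under those assumptions the expected time to complete one interval of useful work `tau` plus its checkpoint is (1/rate + restart) * (exp(rate * (tau + checkpoint)) - 1). All function and parameter names below are hypothetical. The grid search result is compared with Young's first-order approximation sqrt(2 * checkpoint / rate).

```python
import math


def overhead_ratio(tau, ckpt_cost, restart_cost, failure_rate):
    """Expected wall-clock time per unit of useful work, minus 1.

    Assumes exponential failures and a failure-free restart phase.
    """
    expected = (1.0 / failure_rate + restart_cost) * math.expm1(
        failure_rate * (tau + ckpt_cost)
    )
    return expected / tau - 1.0


def best_interval(ckpt_cost, restart_cost, failure_rate, max_tau, step=1.0):
    """Grid search for the interval with the smallest overhead ratio."""
    best_tau, best_ratio = None, math.inf
    for i in range(1, int(max_tau / step) + 1):
        tau = i * step
        ratio = overhead_ratio(tau, ckpt_cost, restart_cost, failure_rate)
        if ratio < best_ratio:
            best_tau, best_ratio = tau, ratio
    return best_tau


def young_interval(ckpt_cost, failure_rate):
    """Young's first-order approximation of the optimal interval."""
    return math.sqrt(2.0 * ckpt_cost / failure_rate)


if __name__ == "__main__":
    rate = 1.0 / 86400.0  # assumed: one failure per day on average
    tau = best_interval(60.0, 120.0, rate, max_tau=20000.0)
    print("grid-search interval (s):", tau)
    print("Young approximation (s):", young_interval(60.0, rate))
```

A non-exponential failure distribution would change the expected-time expression, which is the case the paper addresses.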


Author(s): Bentolhoda Jafary, Lance Fiondella, Ping-Chen Chang

Checkpointing is a technique to back up work at periodic intervals so that, if computation fails, it need not restart from the beginning but can instead restart from the latest checkpoint. Performing checkpointing operations requires time. Therefore, it is necessary to consider the tradeoff between the time to perform checkpointing operations and the time saved when computation restarts at a checkpoint. This article presents a method to model the impact of correlated failures on an application that performs a specified amount of computation and implements checkpointing operations at equidistant periods during this computation. We develop a Markov model and superimpose a correlated life distribution. Two cases are considered. The first assumes that reaching a checkpoint resets the failure distribution. The second allows the probability of failure to progress. We illustrate the approach through a series of examples. The results indicate that correlation can negatively impact checkpointing, necessitating more frequent checkpointing and increasing the total time required, but that the approach can still identify the optimal number of equidistant checkpoints despite this correlation.
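The restart-from-latest-checkpoint tradeoff can be sketched with a small Monte Carlo simulation. This is not the article's Markov model and it does not model correlation. It assumes independent exponential failures, a failure clock that restarts with each attempt at a segment (loosely matching the first case, where a checkpoint resets the failure distribution), and a failure-free restart phase. All names and parameter values are hypothetical.

```python
import random


def simulate_run(work, n_checkpoints, ckpt_cost, restart_cost, failure_rate, rng):
    """Simulate one run and return its total wall-clock time."""
    segment = work / n_checkpoints
    total = 0.0
    for _ in range(n_checkpoints):
        need = segment + ckpt_cost  # useful work plus saving the checkpoint
        while True:
            time_to_failure = rng.expovariate(failure_rate)
            if time_to_failure >= need:
                total += need  # segment and checkpoint completed
                break
            # Failure: the partial segment is lost, restart from last checkpoint.
            total += time_to_failure + restart_cost
    return total


def mean_time(work, n_checkpoints, ckpt_cost, restart_cost, failure_rate,
              trials=2000, seed=0):
    """Average total time over several simulated runs."""
    rng = random.Random(seed)
    runs = (
        simulate_run(work, n_checkpoints, ckpt_cost, restart_cost, failure_rate, rng)
        for _ in range(trials)
    )
    return sum(runs) / trials


if __name__ == "__main__":
    # Assumed values: 10 h of work, 60 s checkpoints, 120 s restarts,
    # one failure every 2 h on average.
    for n in (1, 5, 10, 20, 40, 80):
        print(n, round(mean_time(36000.0, n, 60.0, 120.0, 1.0 / 7200.0)))
```

With few checkpoints, each failure discards a long segment. With many, the checkpoint cost dominates. The printed averages make that tradeoff visible for the assumed parameters.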


2020, Vol 9 (2), pp. 61-66
Author(s): K.V. Jayamol, K. K. Jose

In this paper we study a stochastic ordering, namely alternate probability generating function (a.p.g.f.) ordering, and its properties. The life distribution H(t) of a device subject to shocks governed by a Poisson process is considered as a function of the probabilities Pk of surviving the first k shocks. Various properties of the discrete failure distribution Pk are shown to be reflected in corresponding properties of the continuous life distribution H(t). A certain cumulative damage model and various applications of these models in reliability modeling are also considered.
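The link between the discrete probabilities Pk and the continuous life distribution can be sketched numerically. The sketch assumes the usual Poisson-mixture form of the shock model: shocks arrive at a constant rate, and the probability of being alive at time t is the sum over k of the Poisson probability of exactly k shocks by time t, multiplied by Pk. Whether the paper writes H(t) for this survival probability or for its complement is not stated in the abstract, so the function below is named for what it computes. The names `shock_survival`, `rate`, `p_survive`, and `max_shocks` are hypothetical.

```python
import math


def shock_survival(t, rate, p_survive, max_shocks=200):
    """Probability that the device is alive at time t.

    Computes sum_k Poisson(k; rate * t) * p_survive(k), truncated at
    max_shocks terms. p_survive(k) is the probability of surviving
    the first k shocks.
    """
    mean = rate * t
    term = math.exp(-mean)  # Poisson probability of zero shocks
    total = 0.0
    for k in range(max_shocks + 1):
        total += term * p_survive(k)
        term *= mean / (k + 1)  # next Poisson probability
    return total


if __name__ == "__main__":
    # Assumed example: each shock is survived independently with probability q,
    # so Pk = q**k and the sum has the closed form exp(-rate * t * (1 - q)).
    q, rate, t = 0.8, 1.5, 2.0
    print(shock_survival(t, rate, lambda k: q ** k))
    print(math.exp(-rate * t * (1.0 - q)))
```

In the geometric case the resulting lifetime is exponential, a simple instance of a property of Pk carrying over to the continuous distribution.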

