The Optimal Checkpoint Interval for the Long-Running Application

Author(s): Yongning Zhai, Weiwei Li

In a distributed computing system, checkpointing too often or too rarely can severely degrade performance. To minimize the expected execution time of a long-running application under a general failure distribution, this paper analyzes and derives the optimal equidistant checkpoint interval for fault-tolerant performance optimization. More precisely, an optimal checkpointing period that determines the checkpoint sequence is proposed, and the expected effective rate of a defined computation cycle is derived. Maximizing this expected effective rate yields a constraint that the optimal checkpoint sequence must satisfy. From this optimality constraint, the optimal equidistant checkpoint interval is obtained as the one that minimizes the fault-tolerant overhead ratio. Numerical results indicate that the approach is practical for choosing a proper equidistant checkpoint interval for fault-tolerant performance optimization.
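The abstract does not state the model or its formulas, so the following Python sketch only illustrates the general idea under assumptions that are not taken from the paper. It assumes that the system is restored as good as new after every failure (so failures delimit renewal cycles), that a checkpoint is taken after every T units of useful work at cost C, and that each recovery costs R. With survival function S(t) and mean time to failure mu, the expected effective rate of a cycle is then rho(T) = T * sum_{k>=1} S(k(T+C)) / (mu + R), and the overhead ratio is 1/rho(T) - 1, so maximizing rho is the same as minimizing the overhead ratio. The function names (`effective_rate`, `overhead_ratio`, `optimal_interval`) and all parameter values are hypothetical.

```python
import math
from typing import Callable

# Hypothetical model, NOT the paper's derivation:
#   * failures have survival function S(t) = P(X > t) and mean time to failure mu;
#   * after each failure the system is restored "as good as new" (a renewal),
#     pays a recovery cost R, and resumes from the last completed checkpoint;
#   * a checkpoint is taken after every T units of work and costs C.
Survival = Callable[[float], float]


def expected_checkpoints_before_failure(survival: Survival, period: float,
                                        tol: float = 1e-15,
                                        max_terms: int = 1_000_000) -> float:
    """E[floor(X / period)] = sum_{k>=1} S(k * period), truncated once terms drop below tol."""
    total = 0.0
    for k in range(1, max_terms + 1):
        term = survival(k * period)
        if term < tol:  # S is non-increasing, so all later terms are smaller
            break
        total += term
    return total


def effective_rate(survival: Survival, mu: float, T: float, C: float, R: float) -> float:
    """Hypothetical expected effective rate: saved work per unit time over one renewal cycle."""
    saved_work = T * expected_checkpoints_before_failure(survival, T + C)
    return saved_work / (mu + R)


def overhead_ratio(survival: Survival, mu: float, T: float, C: float, R: float) -> float:
    """Extra wall-clock time per unit of useful work: 1 / rate - 1."""
    rate = effective_rate(survival, mu, T, C, R)
    return math.inf if rate == 0.0 else 1.0 / rate - 1.0


def optimal_interval(survival: Survival, mu: float, C: float, R: float,
                     lo: float, hi: float, grid: int = 200, iters: int = 100) -> float:
    """Maximize the effective rate (equivalently, minimize the overhead ratio) over T in [lo, hi]."""
    rate = lambda T: effective_rate(survival, mu, T, C, R)
    # Coarse log-spaced grid to bracket the best interval.
    step = (hi / lo) ** (1.0 / (grid - 1))
    points = [lo * step ** i for i in range(grid)]
    best = max(range(grid), key=lambda i: rate(points[i]))
    a, b = points[max(best - 1, 0)], points[min(best + 1, grid - 1)]
    # Golden-section refinement inside the bracket (assumes unimodality there).
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = rate(c), rate(d)
    for _ in range(iters):
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = rate(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = rate(d)
    return 0.5 * (a + b)


if __name__ == "__main__":
    C, R = 0.1, 0.5                  # illustrative checkpoint/recovery costs in hours (assumed)
    lam = 1.0 / 100.0                # exponential failures with MTBF = 100 h (assumed)
    expo = lambda t: math.exp(-lam * t)
    T_exp = optimal_interval(expo, 1.0 / lam, C, R, lo=0.1, hi=1000.0)
    print(f"exponential: T* = {T_exp:.3f} h, sqrt(2C/lam) = {math.sqrt(2 * C / lam):.3f} h, "
          f"overhead = {overhead_ratio(expo, 1.0 / lam, T_exp, C, R):.4f}")

    beta, eta = 0.7, 100.0           # Weibull shape and scale (assumed)
    weib = lambda t: math.exp(-((t / eta) ** beta))
    mu_w = eta * math.gamma(1.0 + 1.0 / beta)
    T_w = optimal_interval(weib, mu_w, C, R, lo=0.1, hi=1000.0)
    print(f"Weibull: T* = {T_w:.3f} h, overhead = {overhead_ratio(weib, mu_w, T_w, C, R):.4f}")
```

Because S(t) is passed in as a function, any failure distribution can be plugged in; the Weibull case in the demo is one example. For exponential failures the series sums to lam*T / ((1 + lam*R)(e^(lam(T+C)) - 1)), and the resulting optimum lies close to Young's classical approximation sqrt(2C/lam), which gives a quick sanity check on the numerical search.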
