Symbolic synthesis of masking fault-tolerant distributed programs

2011 ◽  
Vol 25 (1) ◽  
pp. 83-108 ◽  
Author(s):  
Borzoo Bonakdarpour ◽  
Sandeep S. Kulkarni ◽  
Fuad Abujarad

1987 ◽  
Vol 8 (1) ◽  
pp. 43-67 ◽  
Author(s):  
Mathai Joseph ◽  
Abha Moitra ◽  
Neelam Soundararajan

1988 ◽  
Vol 14 (10) ◽  
pp. 1432-1442 ◽  
Author(s):  
F.B. Bastani ◽  
I.-L. Yen ◽  
I.-R. Chen

1993 ◽  
Vol 1 (2) ◽  
pp. 87-103 ◽  
Author(s):  
S Mishra ◽  
L L Peterson ◽  
R D Schlichting

2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Zhengyu Chen ◽  
Jianhua Sun ◽  
Hao Chen

As computer systems grow in size and complexity, hardware and software faults occur more frequently, so fault-tolerance techniques have become an essential component of high-performance computing systems. Checkpoint-restart is a typical and widely used method for tolerating runtime faults. However, the rapidly growing size of the checkpoint files that must be saved to external storage poses a major scalability challenge and calls for efficient ways to reduce the amount of checkpoint data. In this paper, we first motivate the need for redundancy elimination with a detailed analysis of checkpoint data from real scenarios. Based on this analysis, we apply inline data deduplication to reduce checkpoint size. We use DMTCP, an open-source checkpoint-restart package, to validate our method. Our experiments show that the method reduces checkpoint file size by 20% for single-computer programs and by 47% for distributed programs.
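
To make the idea of inline deduplication in the abstract above concrete, the following is a minimal sketch, not the authors' implementation and not part of DMTCP. It assumes fixed-size chunking, SHA-256 fingerprints, and an in-memory dictionary as the chunk store; the abstract specifies none of these, and the names `dedup_write`, `restore`, `stored_bytes`, and `CHUNK_SIZE` are hypothetical.

```python
import hashlib

# Assumed chunk size; the abstract does not state how data is chunked.
CHUNK_SIZE = 4096


def dedup_write(data: bytes, store: dict, chunk_size: int = CHUNK_SIZE) -> list:
    """Split checkpoint data into fixed-size chunks, storing each unique chunk once.

    Returns a "recipe": the ordered list of chunk fingerprints needed to
    rebuild the original data from the store.
    """
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        # Inline deduplication: a chunk already in the store is not written again.
        if digest not in store:
            store[digest] = chunk
        recipe.append(digest)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data by concatenating chunks in recipe order."""
    return b"".join(store[digest] for digest in recipe)


def stored_bytes(store: dict) -> int:
    """Total number of bytes actually kept after deduplication."""
    return sum(len(chunk) for chunk in store.values())


if __name__ == "__main__":
    # Synthetic "checkpoint" with three identical pages and one distinct page.
    checkpoint = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
    store = {}
    recipe = dedup_write(checkpoint, store)
    print(len(checkpoint), stored_bytes(store))
    assert restore(recipe, store) == checkpoint
```

In this sketch the same `store` can be shared across successive checkpoints or across processes, which is one plausible reason a distributed program would see more redundancy than a single process; the abstract reports the measured difference but does not explain its cause.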


2008 ◽  
Author(s):  
Fuad Abujarad ◽  
Borzoo Bonakdarpour ◽  
Sandeep S. Kulkarni
