Comparing Checkpoint and Rollback Recovery Schemes in a Cluster System

Author(s):  
Noriaki Bessho ◽  
Tadashi Dohi
Computer ◽  
1993 ◽  
Vol 26 (2) ◽  
pp. 22-31 ◽  
Author(s):  
N.S. Bowen ◽  
D.K. Pradhan

Author(s):  
A Chien ◽  
P Balaji ◽  
N Dun ◽  
A Fang ◽  
H Fujita ◽  
...  

Exascale studies project reliability challenges for future HPC systems. We present the Global View Resilience (GVR) system, a library for portable resilience. GVR begins with a subset of the Global Arrays interface, and adds new capabilities to create versions, name versions, and compute on version data. Applications can focus versioning where and when it is most productive, and customize for each application structure independently. This control is portable, and its embedding in application source makes it natural to express and easy to maintain. The ability to name multiple versions and “partially materialize” them efficiently makes ambitious forward-recovery based on “data slices” across versions or data structures both easy to express and efficient. Using several large applications (OpenMC, preconditioned conjugate gradient (PCG) solver, ddcMD, and Chombo), we evaluate the programming effort to add resilience. The required changes are small (< 2% lines of code (LOC)), localized and machine-independent, and perhaps most important, require no software architecture changes. We also measure the overhead of adding GVR versioning and show that overheads < 2% are generally achieved. This overhead suggests that GVR can be implemented in large-scale codes and support portable error recovery with modest investment and runtime impact. Our results are drawn from both IBM BG/Q and Cray XC30 experiments, demonstrating portability. We also present two case studies of flexible error recovery, illustrating how GVR can be used for multi-version rollback recovery, and several different forward-recovery schemes. GVR’s multi-version support enables applications to survive latent errors (silent data corruption) with significant detection latency, and forward recovery can make that recovery extremely efficient. Our results suggest that GVR is scalable, portable, and efficient. GVR interfaces are flexible, supporting a variety of recovery schemes, and altogether GVR embodies a gentle-slope path to tolerate growing error rates in future extreme-scale systems.


2018 ◽  
Vol 55 (4) ◽  
pp. 652-657 ◽  
Author(s):  
Gabriel Murariu ◽  
Razvan Adrian Mahu ◽  
Adrian Gabriel Murariu ◽  
Mihai Daniel Dragu ◽  
Lucian P. Georgescu ◽  
...  

This article presents the design of a custom-built unmanned aerial vehicle (UAV) prototype. Our UAV is a flying-wing type and is able to take off with a small boost. The system combines major advantages of airplanes, namely the ability to fly horizontally at a constant altitude and a long flight time. The aerodynamic models presented in this paper are optimized to improve the operational performance of the vehicle, especially in terms of stability and the possibility of a long gliding flight. Both aspects are very important for achieving the vehicle's goals efficiently and for carrying out its work tasks. The presented simulations were obtained using ANSYS 13 installed on our university's cluster system. In a next step, the numerical results will be compared with those obtained during experimental flights. This paper presents the main results of the numerical simulations and the resulting magnitudes of the main flight coefficients.


1999 ◽  
Vol 513 (2) ◽  
pp. 733-751 ◽  
Author(s):  
Arunav Kundu ◽  
Bradley C. Whitmore ◽  
William B. Sparks ◽  
F. Duccio Macchetto ◽  
Stephen E. Zepf ◽  
...  

SIMULATION ◽  
2010 ◽  
Vol 87 (12) ◽  
pp. 1021-1031 ◽  
Author(s):  
Zafeirios C Papazachos ◽  
Helen D Karatza
