rollback recovery
Recently Published Documents


TOTAL DOCUMENTS

137
(FIVE YEARS 8)

H-INDEX

18
(FIVE YEARS 0)

2021 ◽  
Vol 3 (3) ◽  
pp. 135-148
Author(s):  
Nayana Shetty

Several exascale machines are being developed for high-performance computing. These machines can perform at least one exaflop, that is, a billion billion (10^18) floating-point operations per second. They make it possible to tackle challenging computational problems that improve our understanding of nature and the universe. However, exascale machines face several obstacles. Because they contain a huge number of components, failures may be frequent and resilience becomes a challenge. Applications need some form of fault tolerance in the system to maintain a high progress rate. Power management must be addressed alongside resilience: all layers of the system, including the fault tolerance layer, must respect the system's power limits. Because of their high power consumption, exascale installations can expect large energy bills, so the energy profile of each fault tolerance model must be analyzed. This paper evaluates three rollback-recovery models: checkpoint/restart, message logging, and parallel recovery. For executions with failures, parallel recovery is the most energy-efficient solution across programs written with different programming models, and it also executes faster than the other techniques. An analytical model is used to explore the behavior of these models at extreme scale.
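The abstract does not give the paper's analytical model, so the sketch below is not that model. As an illustration of how such models reason about checkpoint/restart, it uses Young's well-known first-order approximation for the checkpoint interval. The function names and the sample values (a 50 s checkpoint cost and a 10,000 s mean time between failures) are assumptions chosen for the example.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order estimate of the checkpoint interval (seconds)
    that minimizes time lost to checkpointing plus rework after failures."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def wasted_fraction(interval_s: float, checkpoint_cost_s: float, mtbf_s: float) -> float:
    """First-order fraction of execution time that is wasted:
    checkpoint overhead per interval plus the expected rework
    (half an interval on average) per failure."""
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)


# Hypothetical values for illustration only.
cost = 50.0        # seconds to write one checkpoint (assumed)
mtbf = 10_000.0    # system mean time between failures in seconds (assumed)

best = young_interval(cost, mtbf)
print(f"checkpoint every {best:.0f} s")
print(f"wasted fraction  {wasted_fraction(best, cost, mtbf):.2f}")
```

Under these assumptions, a shorter mean time between failures (as expected with more components) shrinks the best interval and raises the wasted fraction, which is the pressure that motivates alternatives such as message logging and parallel recovery.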


2021 ◽  
Vol 11 (1) ◽  
pp. 228-230
Author(s):  
K. Sathya Sundari

In industry, the completion time of jobs in the manufacturing unit has risen significantly. In many existing studies, the job's completion time, or makespan, is minimized using straight paths, which is time-consuming. In this paper, we address this problem with an Improved Ant Colony Optimization and Tabu Search (ACOTS) algorithm that pinpoints where a fault occurred so that execution can be rolled back to that point. We also use a short-term-memory-based rollback recovery strategy: a job rolls back to its own short-term memory, which reduces its completion time. In Tabu search, short-term memory is used to revisit recent moves. Compared with the ACO algorithm, the proposed ACOTS-Cmax approach is more efficient and takes less time to complete.
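The abstract does not specify how the short-term memory is stored or how the rollback is triggered. The following is a minimal sketch of the general idea only, assuming a bounded list of recent moves where each entry also keeps a snapshot of the schedule before the move. The class `ShortTermMemory`, its methods, and the move tuples are all hypothetical and are not taken from the ACOTS paper.

```python
from collections import deque


class ShortTermMemory:
    """Bounded memory of recent moves, each stored with the schedule
    as it was just before the move was applied (hypothetical design)."""

    def __init__(self, tenure: int):
        # Oldest entries fall off automatically once `tenure` is exceeded.
        self.recent = deque(maxlen=tenure)

    def record(self, move, schedule):
        # Copy the schedule so later changes do not alter the snapshot.
        self.recent.append((move, list(schedule)))

    def is_tabu(self, move) -> bool:
        return any(m == move for m, _ in self.recent)

    def rollback_to(self, faulty_move):
        """Return the schedule saved before `faulty_move`, discarding that
        move and every later one. Return None (memory unchanged) if the
        move is no longer in short-term memory."""
        if not self.is_tabu(faulty_move):
            return None
        while True:
            move, snapshot = self.recent.pop()
            if move == faulty_move:
                return snapshot


# Usage: two swap moves are applied, then a fault is located at the second.
memory = ShortTermMemory(tenure=3)
schedule = ["J1", "J2", "J3"]

memory.record(("swap", 0, 1), schedule)
schedule = ["J2", "J1", "J3"]

memory.record(("swap", 1, 2), schedule)
schedule = ["J2", "J3", "J1"]

schedule = memory.rollback_to(("swap", 1, 2))
print(schedule)  # ['J2', 'J1', 'J3']
```

Rolling back to the snapshot before the faulty move avoids restarting the whole search from the initial schedule, which is the saving the abstract attributes to short-term-memory rollback.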


2018 ◽  
Vol 15 (2) ◽  
pp. 74
Author(s):  
Junianto Sesa

Abstract: The most popular fault tolerance approach for computing applications relies on uncoordinated checkpointing. The alternative, which combines uncoordinated checkpointing with message logging, requires logging every message, which imposes high memory and storage demands and significant communication overhead. Recent studies have observed that many applications are send-deterministic, a property that makes it possible to design a new fault tolerance protocol. This research therefore uses an uncoordinated checkpointing protocol based on causality strength for send-deterministic applications; it logs only a subset of the messages and does not require systematically restarting all processes when a failure occurs. The work describes the protocol and proves its correctness. The results show that the protocol works while logging only a small portion of the messages and that the domino effect does not occur.
Keywords: Causality Strength, Domino Effect, Rollback Recovery, Uncoordinated Checkpointing


2018 ◽  
Vol 15 (2) ◽  
pp. 71
Author(s):  
Junianto Sesa

The most popular fault tolerance approach for computing applications relies on uncoordinated checkpointing. The alternative, which combines uncoordinated checkpointing with message logging, requires logging every message, which imposes high memory and storage demands and significant communication overhead. It has recently been observed that many applications are send-deterministic, which makes it possible to design a new fault tolerance protocol. This research therefore uses an uncoordinated checkpointing protocol based on causality strength for send-deterministic applications; it logs only a subset of the messages and does not need to systematically restart all processes when a failure occurs. The work describes the protocol and proves its correctness. With this approach, the protocol is shown to work while logging only a small portion of the messages, and the domino effect does not occur.

