Availability Analysis of Software Systems with Rejuvenation and Checkpointing

Mathematics ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. 846
Author(s):  
Junjun Zheng ◽  
Hiroyuki Okamura ◽  
Tadashi Dohi

In software reliability engineering, software-rejuvenation and software-checkpointing techniques are widely used to enhance system reliability and strengthen data protection. This paper presents a stochastic framework, composed of a composite stochastic Petri reward net and its resulting non-Markovian availability model, that captures the dynamic behavior of an operational software system in which time-based software rejuvenation and checkpointing are both conducted aperiodically. In particular, apart from the software-aging problem that may cause the system to fail, human-error factors (i.e., a system operator’s misoperations) during checkpointing are also considered. The non-Markovian availability model is derived from the reachability graph of the stochastic Petri reward net and is not one of the trivial stochastic models such as the semi-Markov process or the Markov regenerative process, so the phase-expansion approach is used to obtain its stationary solution. In numerical experiments, we illustrate steady-state system availability and find optimal software-rejuvenation policies that maximize it. The effects of human-error factors on both steady-state system availability and the optimal software-rejuvenation trigger timing are also evaluated. The numerical results show that human errors during checkpointing both decrease system availability and have a significant effect on the optimal rejuvenation-trigger timing, so they should not be overlooked during system modeling.
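To make the idea of an availability-maximizing rejuvenation trigger time concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the paper's Petri-net model: it ignores checkpointing and human error and uses a textbook renewal-reward argument in which the system either fails before the trigger time `t0` (followed by a recovery of mean length `mean_recovery`) or is rejuvenated at `t0` (mean length `mean_rejuvenation`). The Weibull failure-time distribution, all function names, and all parameter values are hypothetical choices made only for illustration.

```python
import math


def reliability(t, shape, scale):
    """Weibull survival function R(t); shape > 1 models software aging (assumed)."""
    return math.exp(-((t / scale) ** shape))


def availability(t0, shape, scale, mean_recovery, mean_rejuvenation, steps=500):
    """Steady-state availability when rejuvenation is triggered at age t0.

    Expected uptime per cycle is the integral of R(t) over [0, t0], computed
    here with the trapezoidal rule. Expected downtime mixes recovery (if the
    system failed before t0) and rejuvenation (if it survived until t0).
    """
    h = t0 / steps
    area = 0.5 * (reliability(0.0, shape, scale) + reliability(t0, shape, scale))
    area += sum(reliability(k * h, shape, scale) for k in range(1, steps))
    uptime = area * h
    survive = reliability(t0, shape, scale)
    downtime = mean_recovery * (1.0 - survive) + mean_rejuvenation * survive
    return uptime / (uptime + downtime)


def no_rejuvenation_availability(shape, scale, mean_recovery):
    """Availability if the system only ever recovers after failures."""
    mttf = scale * math.gamma(1.0 + 1.0 / shape)
    return mttf / (mttf + mean_recovery)


def best_trigger_time(candidates, **params):
    """Grid search: return the candidate t0 with the highest availability."""
    return max(candidates, key=lambda t0: availability(t0, **params))


if __name__ == "__main__":
    # Hypothetical parameters: aging system, recovery ten times costlier
    # than rejuvenation.
    params = dict(shape=2.0, scale=100.0, mean_recovery=10.0, mean_rejuvenation=1.0)
    t_best = best_trigger_time(range(1, 301), **params)
    print("best trigger time:", t_best)
    print("availability with rejuvenation:", availability(t_best, **params))
    print("availability without rejuvenation:",
          no_rejuvenation_availability(2.0, 100.0, 10.0))
```

A grid search is used only because it is easy to read. With an increasing failure rate and rejuvenation cheaper than recovery, the availability curve in this simplified model has an interior maximum, which is the kind of optimal policy the abstract refers to.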

2021 ◽  
Vol 23 (1) ◽  
pp. 195-208
Author(s):  
Varun Kumar ◽  
Girish Kumar ◽  
Rajesh Kumar Singh ◽  
Umang Soni

This paper deals with the modeling and analysis of complex mechanical systems that deteriorate with age. As systems age, questions about their availability and reliability start to surface. The system is assumed to suffer from an internal stochastic degradation mechanism, described as a gradual and continuous process of performance deterioration, which makes it difficult for a maintenance engineer to model. A semi-Markov approach is proposed to analyze the degradation of complex mechanical systems. It involves constructing states that correspond to the system's functionality status and constructing the kernel matrix between the states. The construction of the transition matrix takes the failure rate and repair rate into account. Once the steady-state probabilities of the embedded Markov chain are computed, one can compute the steady-state solution and, finally, the system availability. System models based on perfect repair, without and with opportunistic maintenance, have been developed, and the benefits of opportunistic maintenance are quantified in terms of increased system availability. The proposed methodology is demonstrated on a two-stage reciprocating air compressor with an intercooler in between, modeled as a system in series configuration.
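The computation chain described above (embedded Markov chain, then steady-state solution, then availability) can be sketched as follows. This is an illustrative sketch, not the authors' compressor model: the three states, the embedded transition matrix `P`, and the mean sojourn times are hypothetical numbers, and the stationary vector is found by a simple damped fixed-point iteration rather than whatever solver the paper uses.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Stationary vector pi of the embedded chain, i.e. pi = pi P.

    Iterates the 'lazy' chain (I + P) / 2, which has the same stationary
    vector as P but cannot oscillate on periodic chains.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        stepped = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        new_pi = [0.5 * pi[j] + 0.5 * stepped[j] for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    raise RuntimeError("stationary distribution did not converge")


def semi_markov_availability(P, mean_sojourn, up_states):
    """Weight embedded-chain probabilities by mean sojourn times.

    The long-run fraction of time in state i is
    pi_i * h_i / sum_j(pi_j * h_j); availability sums this over up states.
    """
    pi = stationary_distribution(P)
    weights = [p * h for p, h in zip(pi, mean_sojourn)]
    total = sum(weights)
    state_probs = [w / total for w in weights]
    return sum(state_probs[i] for i in up_states), state_probs


if __name__ == "__main__":
    # Hypothetical states: 0 = working, 1 = degraded, 2 = failed (in repair).
    P = [
        [0.0, 1.0, 0.0],  # working always moves to degraded
        [0.3, 0.0, 0.7],  # degraded: maintained (0.3) or fails (0.7)
        [1.0, 0.0, 0.0],  # failed returns to working after perfect repair
    ]
    mean_sojourn = [100.0, 50.0, 10.0]  # assumed mean hours spent per visit
    avail, probs = semi_markov_availability(P, mean_sojourn, up_states=[0, 1])
    print("time fractions per state:", probs)
    print("steady-state availability:", avail)
```

In a real study, `P` and the sojourn times would come from the kernel matrix built from the failure and repair rates. Comparing the availability obtained from two different kernels is one way the benefit of a maintenance policy could be quantified.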


2020 ◽  
Vol 37 (6/7) ◽  
pp. 905-923
Author(s):  
Tadashi Dohi ◽  
Hiroyuki Okamura ◽  
Cun Hua Qian

Purpose: In this paper, the authors propose two construction methods to estimate confidence intervals of the time-based optimal software rejuvenation policy and its associated maximum system availability via a parametric bootstrap method. Through simulation experiments the authors investigate their asymptotic behaviors and statistical properties.
Design/methodology/approach: The present paper is the first attempt to derive the confidence intervals of the optimal software rejuvenation schedule, which maximizes the system availability in the long run. In other words, the authors address statistical software fault management by employing an idea from process control in quality engineering together with a parametric bootstrap.
Findings: In a remarkable departure from existing work, the authors carefully take account of a special case where the two-sided confidence interval of the optimal software rejuvenation time does not exist due to the fact that the estimator distribution of the optimal software rejuvenation time is defective. Here the authors propose two useful construction methods for the two-sided confidence interval: the conditional confidence interval and the heuristic confidence interval.
Research limitations/implications: Although the authors applied a simulation-based bootstrap confidence method in this paper, other re-sampling-based approaches can also be applied to the same problem. In addition, the authors focused only on a parametric bootstrap, but a non-parametric bootstrap method can also be applied to the confidence interval estimation of the optimal software rejuvenation time interval when complete knowledge of the distribution form is not available.
Practical implications: The statistical software fault management techniques proposed in this paper are useful for controlling the system availability of operational software systems by means of a control chart.
Social implications: Through online monitoring of operational software systems, it would be possible to estimate the optimal software rejuvenation time and its associated system availability without applying any approximation. By implementing this function in an application programming interface (API), it is possible to realize low-cost fault tolerance for software systems with aging.
Originality/value: In the past literature, almost all authors employed parametric and non-parametric inference techniques to estimate the optimal software rejuvenation time but focused only on point estimation. This may often lead to misjudgment based on over-estimation or under-estimation under uncertainty. The authors overcome the problem by introducing the two-sided confidence interval approach.


Author(s):  
Koichi Tokuno ◽  
Shigeru Yamada

It is important to take into account the trade-off between hardware and software systems when total computer-system reliability and performance are evaluated and assessed. We develop an availability model for a hardware-software system. The system treated here consists of one hardware subsystem and one software subsystem, and it is assumed that the system goes down and is restored whenever a hardware or a software failure occurs. In particular, for the software subsystem, it is supposed that (i) the restoration actions are not always performed perfectly, (ii) the restoration times for later software failures become longer and (iii) reliability growth occurs in the perfect restoration action. The hardware and the software failure-occurrence phenomena are described by constant and geometrically decreasing hazard rates, respectively. The time-dependent behavior of the system, which alternates between the operational state, in which the system is operating without failures, and the restoration state, in which the system is inoperable and being restored, is described by a Markov process. Useful expressions for several quantitative measures of system performance are derived from this model. Finally, numerical examples are presented to illustrate system availability measurement and assessment.


1966 ◽  
Vol 10 (3) ◽  
pp. 387-398 ◽  
Author(s):  
J.N.R. Grainger ◽  
L. Bass

Mathematics ◽  
2022 ◽  
Vol 10 (1) ◽  
pp. 150
Author(s):  
Joanna Akrouche ◽  
Mohamed Sallak ◽  
Eric Châtelet ◽  
Fahed Abdallah ◽  
Hiba Hajj Chehade

Most existing studies of a system’s availability in the presence of epistemic uncertainties assume that the system is binary. In this paper, a new methodology for the estimation of the availability of multi-state systems is developed, taking into consideration epistemic uncertainties. This paper formulates a combined approach, based on continuous Markov chains and interval contraction methods, to address the problem of computing the availability of multi-state systems with imprecise failure and repair rates. The interval constraint propagation method, which we refer to as the forward–backward propagation (FBP) contraction method, allows us to contract the probability intervals, keeping all the values that may be consistent with the set of constraints. This methodology is guaranteed, and several numerical examples of systems with complex architectures are studied.


Author(s):  
Shruthi P. ◽  
Nagaraj G. Cholli

Cloud computing is an environment in which several virtual machines (VMs) run concurrently on physical machines. The cloud computing infrastructure hosts multiple cloud service segments that communicate with each other through interfaces, which creates a distributed computing environment. During operation, software systems accumulate errors or garbage that lead to system failure and other hazardous consequences. This condition is called software aging. Software aging happens because of memory fragmentation, large-scale resource consumption and the accumulation of numerical errors. It degrades performance and may result in system failure through premature resource exhaustion. The issue cannot be detected during the software testing phase because of the dynamic nature of operation. The errors that cause software aging are of a special type: they do not disturb the software's functionality but affect its response time and its environment. The issue can therefore be resolved only at run time. To alleviate the impact of software aging, the software rejuvenation technique is used. The rejuvenation process reboots the system or re-initializes the software, which avoids faults or failures. Software rejuvenation removes accumulated error conditions, frees up deadlocks and defragments operating system resources such as memory. Hence, it avoids future system failures that may happen due to software aging. As service availability is crucial, software rejuvenation is to be carried out at defined schedules without disrupting the service. The presence of software rejuvenation techniques can make software systems more trustworthy, and software designers are using this concept to improve the quality and reliability of software. Software aging and rejuvenation have generated a lot of research interest in recent years. This work reviews some of the research related to the detection of software aging and identifies research gaps.


2007 ◽  
Vol 68 (16-18) ◽  
pp. 2313-2319 ◽  
Author(s):  
C.J. Baxter ◽  
J.L. Liu ◽  
A.R. Fernie ◽  
L.J. Sweetlove

Author(s):  
Khalid Alnowibet ◽  
Lotfi Tadj

The service system considered in this chapter is characterized by an unreliable server. Random breakdowns occur on the server and the repair may not be immediate. The authors assume that the server may take a vacation at the end of a given service completion. The server resumes operation according to a T-policy, checking whether enough customers have arrived during the vacation. The actual service of any arrival takes place in two consecutive phases, which are independent of each other. A Markov chain approach is used to obtain the steady-state system-size probabilities and different performance measures. The optimal value of the threshold level is obtained analytically.

