Reliability Prediction of Computing Network with Software and Hardware Failures

Author(s):  
Chandra Shekhar ◽  
Neeraj Kumar ◽  
Madhu Jain ◽  
Amit Gupta

In this paper, we investigate the reliability and queueing performance indices of a fault-tolerant computing network having a finite number of unreliable operating components, with the provision of warm standby components. Operating and standby components are governed by dedicated software, which is also prone to random failure. On failure of an operating component, an available standby component may switch from the standby state to the operating state with negligible switchover time. The switchover process may itself fail because of some automation hindrance. The computing network is also subject to common cause failure due to external causes. The redundant fault-tolerant computing network is framed as a Markovian machine interference model with exponentially distributed inter-failure times and service times. For the reliability prediction of the computing network, various performance measures, namely mean time to failure (MTTF), reliability/availability, failure frequency, etc., have been formulated in terms of transient-state probabilities, which we obtain using the spectral method. To show the practicability of the developed model, numerical simulations have been carried out. Sensitivity analysis of the reliability and other indices of the computing network with respect to different network parameters is presented, and the results are summarized in tables and graphs. Finally, future scope and concluding remarks are included.
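The abstract obtains MTTF and reliability from transient-state probabilities via the spectral method. The following is a minimal sketch of that idea on a simplified version of the model, assuming M operating units (all required), S warm standbys, a single repairer, a switchover success probability `c`, and software and common-cause failures that take the network down from any up state. The parameter names and state space are illustrative assumptions, not the authors' exact formulation. The transient solution uses the eigendecomposition of the generator restricted to the up states, exp(Qt) = V diag(e^{wt}) V^{-1}, and the MTTF uses the standard absorbing-chain result p0 (-Q)^{-1} 1.

```python
import numpy as np


def up_state_generator(M, S, lam, alpha, mu, c, lam_sw, lam_cc):
    """Generator restricted to the up states n = 0..S (n = number of failed units).

    Hypothetical simplified model: lam = operating-unit failure rate,
    alpha = warm-standby failure rate, mu = repair rate (one repairer),
    c = probability that a switchover succeeds, lam_sw / lam_cc = software
    and common-cause failure rates. The rate of leaving the up states
    (system failure) only enters the diagonal.
    """
    Q = np.zeros((S + 1, S + 1))
    for n in range(S + 1):
        if n < S:
            # covered switchover after an operating failure, or a standby failure
            Q[n, n + 1] = c * M * lam + (S - n) * alpha
            to_down = (1 - c) * M * lam  # switchover fails
        else:
            to_down = M * lam  # no standby left to replace the failed unit
        if n > 0:
            Q[n, n - 1] = mu  # repair completion
        to_down += lam_sw + lam_cc
        Q[n, n] = -(Q[n].sum() + to_down)
    return Q


def reliability(Q, t):
    """R(t) = p0 exp(Qt) 1, with exp(Qt) built from the eigendecomposition of Q."""
    w, V = np.linalg.eig(Q)
    exp_qt = V @ np.diag(np.exp(w * t)) @ np.linalg.inv(V)
    p0 = np.zeros(Q.shape[0])
    p0[0] = 1.0  # start with no failed units
    return float(np.real(p0 @ exp_qt @ np.ones(Q.shape[0])))


def mttf(Q):
    """MTTF = p0 (-Q)^{-1} 1 for the chain absorbed at system failure."""
    p0 = np.zeros(Q.shape[0])
    p0[0] = 1.0
    return float(p0 @ np.linalg.solve(-Q, np.ones(Q.shape[0])))


Q = up_state_generator(M=2, S=1, lam=0.1, alpha=0.05, mu=1.0, c=0.9,
                       lam_sw=0.01, lam_cc=0.005)
print(reliability(Q, 10.0), mttf(Q))
```

For c > 0 the restricted generator is tridiagonal with positive products of paired off-diagonal entries, so its eigenvalues are real and distinct and the eigendecomposition is well defined.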

2020 ◽  
Vol 37 (6/7) ◽  
pp. 983-1005
Author(s):  
Chandra Shekhar ◽  
Amit Gupta ◽  
Madhu Jain ◽  
Neeraj Kumar

Purpose: The purpose of this paper is to present a sensitivity analysis of fault-tolerant redundant repairable computing systems with imperfect coverage, reboot and recovery process.
Design/methodology/approach: In this investigation, the authors consider a computing system having a finite number of identical working units functioning simultaneously, with the provision of standby units. Working and standby units are prone to random failure and are administered by unreliable software, which is itself subject to unpredictable failure. The redundant repairable computing system is modeled as a Markovian machine interference problem with exponentially distributed failure and service times. To isolate a failed unit, the system either opts for a randomized reboot process or incurs a recovery delay.
Findings: Transient-state probabilities have been determined, from which the authors develop various reliability measures, namely reliability/availability, mean time to failure, failure frequency and so on, and queueing characteristics, namely the expected number of failed units, the throughput of the system and so on, for predictive purposes. To demonstrate the practicability of the developed model, numerical simulation and sensitivity analysis for different parameters have also been carried out, and the results are summarized in tables and graphs. The transient results are helpful for analyzing the system before it reaches steady state. The derived measures give direct insights into parametric decision-making.
Social implications: Conclusions are drawn and future scope is outlined. The present research study would help system analysts and system designers make better choices in order to achieve an economical design and strategy based on the desired mean time to failure, reliability/availability and other queueing characteristics of the systems.
Originality/value: Unlike previous investigations, the studied model provides a more accurate assessment of the computing system in uncertain environments, based on sensitivity analysis.
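The queueing characteristics mentioned above (expected number of failed units, throughput) can be illustrated with the classical finite-source machine interference model with spares. The following is a minimal steady-state sketch, assuming M operating units, S warm standbys, R identical repairers, degraded operation once the spares are exhausted, and no reboot or recovery delay; those last features of the paper's model are deliberately left out, and `machine_interference` and its parameters are hypothetical names. It uses the birth-death product form, in which each state probability is proportional to the product of failure rates divided by repair rates along the path from state 0.

```python
import numpy as np


def machine_interference(M, S, lam, alpha, mu, R=1):
    """Steady state of a finite-source repair queue with M operating units and S spares.

    Returns (probabilities, expected failed units, throughput, availability),
    where availability = P(at least M units operating).
    """
    K = M + S

    def fail_rate(n):
        if n <= S:
            return M * lam + (S - n) * alpha  # M operating plus S - n warm spares
        return (K - n) * lam  # degraded: only K - n units left, no spares

    def repair_rate(n):
        return min(n, R) * mu  # at most R units in repair at once

    weights = [1.0]
    for n in range(1, K + 1):
        weights.append(weights[-1] * fail_rate(n - 1) / repair_rate(n))
    p = np.array(weights) / sum(weights)
    counts = np.arange(K + 1)
    expected_failed = float(counts @ p)
    throughput = float(sum(repair_rate(k) * p[k] for k in range(K + 1)))
    availability = float(p[: S + 1].sum())
    return p, expected_failed, throughput, availability


p, en, thr, av = machine_interference(M=3, S=2, lam=0.1, alpha=0.02, mu=0.5, R=2)
print(en, thr, av)
```

In steady state the throughput (repair completions per unit time) equals the mean failure rate, which is a convenient sanity check on any implementation.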


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Amit Kumar ◽  
Pardeep Kumar

Purpose: This paper presents the performance analysis of the automatic ticket vending machine (ATVM), taking into account its various hardware and software failures.
Design/methodology/approach: Frequent failures in the working of ATVMs have been observed; therefore, the authors analyze the performance measures of the machine. The authors develop a mathematical model, based on the Markov process, of the different hardware and software failures/repairs that may occur during operation. The model is solved for two kinds of failure/repair rates, namely variable rates (closely resembling real-world failures) and constant rates. Lagrange's method and the Laplace transformation are used to solve the model.
Findings: The reliability and mean time to failure (MTTF) of the ATVM are determined. A sensitivity analysis of the ATVM is also carried out. The critical components that most affect the performance of the ATVM, in terms of reliability and MTTF, are also identified.
Originality/value: A mathematical model based on the different hardware and software failures/repairs of the ATVM has been developed to analyze its performance, which has not been done in the past.
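The abstract contrasts variable and constant failure rates and identifies critical components through the sensitivity of MTTF. The following is a minimal sketch of that kind of analysis, assuming (hypothetically) that the ATVM behaves as a series system of independent components with Weibull hazards, where shape beta = 1 reduces to a constant failure rate 1/eta. This is not the authors' Markov model solved by Lagrange's method and Laplace transforms; it only shows how an MTTF integral and a finite-difference sensitivity can rank components. All component parameters are invented for illustration.

```python
import numpy as np


def series_reliability(t, components):
    """R(t) = exp(-sum_i (t/eta_i)^beta_i) for independent Weibull components in series."""
    t = np.asarray(t, dtype=float)
    cumulative_hazard = sum((t / eta) ** beta for eta, beta in components)
    return np.exp(-cumulative_hazard)


def mttf(components, t_max, n=200_001):
    """MTTF = integral of R(t) dt, by the trapezoidal rule on [0, t_max].

    t_max must be large enough that R(t_max) is negligible.
    """
    t = np.linspace(0.0, t_max, n)
    r = series_reliability(t, components)
    dt = t[1] - t[0]
    return float(dt * (r.sum() - 0.5 * (r[0] + r[-1])))


def mttf_sensitivity(components, t_max, rel_step=1e-4):
    """Forward-difference dMTTF/d(eta_i) for each component (larger = more critical)."""
    base = mttf(components, t_max)
    sens = []
    for i, (eta, beta) in enumerate(components):
        bumped = list(components)
        bumped[i] = (eta * (1 + rel_step), beta)
        sens.append((mttf(bumped, t_max) - base) / (eta * rel_step))
    return sens


# Hypothetical components: (characteristic life eta in hours, Weibull shape beta)
constant = [(100.0, 1.0), (200.0, 1.0), (400.0, 1.0)]
print(mttf(constant, t_max=2000.0))  # close to 1 / (1/100 + 1/200 + 1/400)
print(mttf_sensitivity(constant, t_max=2000.0))
```

With constant rates the MTTF reduces to 1/(sum of rates), and the component with the shortest life has the largest sensitivity; with beta other than 1 the same code handles time-varying hazards without a closed form.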


Energies ◽  
2020 ◽  
Vol 13 (24) ◽  
pp. 6525
Author(s):  
Hugues Renaudineau ◽  
Pol Paradell-Solà ◽  
Lluís Trilla ◽  
Alber Filba-Martinez ◽  
David Cardoner ◽  
...  

In photovoltaic (PV) systems, the reliability of the system components, especially the power converters, is a major concern in obtaining cost-effective solutions. To guarantee service continuity when elements of the PV converter fail, in particular the semiconductor switching devices, one solution is to design the power converter with fault-tolerance capability. This can be realized by adding hardware redundancy to an existing converter, which makes it possible to replace faulty elements. This paper evaluates the reliability of a fault-tolerant power electronics converter for a PV multistring application. The considered fault-tolerant design includes a single redundant switching leg, which is used to reconfigure the structure in case of a switch failure in either the DC-AC or the DC-DC stage. This paper details the reliability estimation of the considered PV multistring fault-tolerant converter. Furthermore, a comparison with a conventional structure without fault-tolerant capability is provided. The results show that the introduction of a single redundant leg improves the converter mean time to failure by a factor of almost two and halves the power loss due to system-failure shutdowns in PV applications, while increasing the converter cost by only 2–3%.
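The near doubling of the MTTF is consistent with a textbook standby-redundancy argument. The following is a minimal sketch, assuming k identical switching legs that must all work, exponential failures at rate `lam`, one spare leg that may fail at rate `lam_idle` while idle, and perfect fault detection and reconfiguration; `k`, `lam` and `lam_idle` are hypothetical. Whichever first event occurs (an active-leg failure covered by the spare, or the idle spare failing), the converter is left with k working legs and no spare, so the MTTF is the sum of two exponential sojourns.

```python
def mttf_without_redundancy(k, lam):
    """k legs in series, no spare: the first leg failure stops the converter."""
    return 1.0 / (k * lam)


def mttf_with_spare_leg(k, lam, lam_idle=0.0):
    """k active legs plus one spare leg, perfect reconfiguration (assumption).

    The first sojourn ends at rate k*lam + lam_idle; afterwards k working legs
    remain with no spare, and the next failure stops the converter.
    """
    return 1.0 / (k * lam + lam_idle) + 1.0 / (k * lam)


k, lam = 6, 1e-6  # hypothetical: 6 legs, failure rate per hour
print(mttf_with_spare_leg(k, lam) / mttf_without_redundancy(k, lam))  # 2.0 for an ideal cold spare
print(mttf_with_spare_leg(k, lam, lam_idle=0.1 * lam) / mttf_without_redundancy(k, lam))  # slightly below 2
```

The improvement ratio is 1 + k*lam/(k*lam + lam_idle), so any failure exposure of the idle spare keeps it just below two; adding a coverage probability for imperfect reconfiguration would lower it further.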


Author(s):  
Daniel Scheit ◽  
Heinrich Theodor Vierhaus

The reliability of interconnects on ICs has become a major problem in recent years, due to rising complexity, low-k insulating materials with reduced stability, and wear-out effects caused by high current density. The total reliability of a system on a chip depends more and more on the reliability of its interconnects, mainly because of the growing volume of communication between an increasing number of integrated functional units. Published articles predict that static faults due to wear-out effects will occur more often, which will harm reliability and decrease the mean time to failure. Most of the published techniques are aimed at the correction of transient faults; built-in self-repair has received much less attention. In this chapter, the authors provide an overview of the state of the art in fault-tolerant interconnects and discuss the use of built-in self-repair in combination with other established solutions. This combination is a promising way to deal with all kinds of faults.
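As a minimal illustration of why built-in self-repair helps against permanent wear-out faults, consider a link of `n_wires` signal wires plus `n_spares` spare wires, where self-repair reroutes around failed wires (assumed perfect). With independent exponential wire failures, the link is a k-out-of-n system: it survives while at most `n_spares` wires have failed. The function names, wire counts and rate below are hypothetical, and real BISR schemes must also detect and locate the fault.

```python
from math import comb, exp


def link_reliability(t, n_wires, n_spares, lam):
    """P(at most n_spares of the n_wires + n_spares wires have failed by time t)."""
    r = exp(-lam * t)  # survival probability of one wire
    total = n_wires + n_spares
    return sum(comb(total, i) * (1 - r) ** i * r ** (total - i)
               for i in range(n_spares + 1))


def link_mttf(n_wires, n_spares, lam):
    """k-out-of-n MTTF with k = n_wires: sum of 1/(j*lam) for j = k..n."""
    return sum(1.0 / (j * lam) for j in range(n_wires, n_wires + n_spares + 1))


lam = 1e-5  # hypothetical wire failure rate per hour
for spares in (0, 1, 2):
    print(spares, link_mttf(32, spares, lam))
```

Each spare adds one more term to the MTTF sum, so for a wide link the gain per spare is modest (roughly 1/(n*lam)), which is one reason to combine spare-wire repair with other techniques.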


2007 ◽  
Vol 556-557 ◽  
pp. 675-678 ◽  
Author(s):  
Kevin Matocha ◽  
Richard Beaupre

Thermal oxides on 4H-SiC are characterized using time-dependent dielectric breakdown techniques at electric fields between 6 and 10 MV/cm. At 250°C, oxides thermally grown using N2O with NO annealing achieve a mean time to failure (MTTF) of 2300 hours at 6 MV/cm. Oxides grown in steam with NO annealing show an MTTF approximately four times longer than that of N2O-grown oxides. At electric fields greater than 8 MV/cm, Fowler-Nordheim tunneling significantly reduces the expected failure times. For this reason, extrapolation of the mean time to failure to low fields must be based on data points measured at lower electric fields.
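The warning about high-field data can be made concrete with a simple lifetime-versus-field fit. The following is a minimal sketch, assuming the linear E-model ln(MTTF) = ln(A) - gamma*E (one common TDDB extrapolation model; the abstract does not say which model the authors used), fitted only to points at or below a cutoff field so that Fowler-Nordheim-dominated points are excluded. The 8 MV/cm default cutoff mirrors the abstract; `fit_e_model` and `extrapolate` are hypothetical helper names, and no measured data are included.

```python
import numpy as np


def fit_e_model(fields_mv_cm, mttf_hours, max_field=8.0):
    """Least-squares fit of ln(MTTF) = ln_a - gamma * E using only E <= max_field.

    Returns (ln_a, gamma). Points above max_field are ignored because
    Fowler-Nordheim tunneling shortens failure times there.
    """
    e = np.asarray(fields_mv_cm, dtype=float)
    t = np.asarray(mttf_hours, dtype=float)
    keep = e <= max_field
    if keep.sum() < 2:
        raise ValueError("need at least two points at or below max_field")
    slope, intercept = np.polyfit(e[keep], np.log(t[keep]), 1)
    return intercept, -slope


def extrapolate(ln_a, gamma, field_mv_cm):
    """Predicted MTTF (hours) at a lower, use-condition field."""
    return float(np.exp(ln_a - gamma * field_mv_cm))
```

Given stress-test MTTFs at several fields, `extrapolate(*fit_e_model(fields, mttfs), 3.0)` would return the E-model estimate at 3 MV/cm; keeping the high-field points in the fit would distort the fitted slope and hence the extrapolated lifetime.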


Author(s):  
Anas Sani Maihulla ◽  
Ibrahim Yusuf

The primary aim of the present study is to examine how reliability, availability, maintainability, and dependability (RAMD) can be used to describe the criticality of each sub-assembly in grid-connected photovoltaic systems. A transition diagram of all subsystems is produced for this analysis, and Chapman-Kolmogorov differential equations for each subsystem are constructed using the Markov birth-death process. Both the random failure times and the repair times are exponentially distributed and statistically independent. A sufficient repair facility is always available. Numerical results for the reliability, maintainability, dependability, and steady-state availability of the various photovoltaic system components have been obtained. Other metrics that aid in performance prediction, such as mean time to failure (MTTF), mean time to repair (MTTR), and the dependability ratio, have also been computed. The numerical analysis suggests that subsystem S4, i.e. the inverter, is the most critical and most sensitive part, requiring special attention in order to improve the efficiency of the PV plant. The findings of this research are useful for photovoltaic system designers and maintenance engineers.
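For a subsystem with exponential failure rate lambda and repair rate mu, the standard birth-death results are MTTF = 1/lambda, MTTR = 1/mu and steady-state availability mu/(lambda + mu). The following is a minimal sketch, assuming independent subsystems arranged in series (the plant is up only if all are up), each with its own repair facility; the subsystem labels other than the inverter, and all rates, are hypothetical. Ranking subsystems by the derivative of plant availability with respect to their failure rate is one simple way to flag the most critical one.

```python
from dataclasses import dataclass


@dataclass
class Subsystem:
    name: str
    lam: float  # failure rate (per hour)
    mu: float   # repair rate (per hour)

    @property
    def mttf(self) -> float:
        return 1.0 / self.lam

    @property
    def mttr(self) -> float:
        return 1.0 / self.mu

    @property
    def availability(self) -> float:
        return self.mu / (self.lam + self.mu)


def plant_availability(subsystems):
    """Series structure of independent subsystems: product of availabilities."""
    a = 1.0
    for s in subsystems:
        a *= s.availability
    return a


def availability_sensitivity(subsystems):
    """dA_plant/d(lam_i) = -A_plant / (lam_i + mu_i) for each subsystem."""
    a = plant_availability(subsystems)
    return {s.name: -a / (s.lam + s.mu) for s in subsystems}


# Hypothetical rates, for illustration only
plant = [Subsystem("S1", 1e-4, 0.2), Subsystem("S4 inverter", 5e-4, 0.05)]
print(plant_availability(plant))
print(availability_sensitivity(plant))
```

Under these assumptions a subsystem with slow repair (small lambda + mu) dominates the sensitivity, which is the kind of result that points maintenance attention at a single component.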


1971 ◽  
Vol 3 (02) ◽  
pp. 229-248 ◽  
Author(s):  
David S. Reynolds ◽  
I. Richard Savage

Gaver (1963) and Antelman and Savage (1965) have proposed models for the distribution of the time to failure of a simple device exposed to a randomly varying environment. Each model represents cumulative wear as a specified function of a non-negative stochastic process with independent increments, and assumes that the reliability of the device is conditioned upon realizations of this process. From these models, the corresponding unconditional joint distributions are derived for the random failure time vector of n independent, identical devices exposed to the same realization of the wear process. It is shown that the same failure time distribution for a single component can arise from either model. In the Gaver model, simultaneous failure times occur with positive probability, and the probabilities of specific tie configurations are developed. For an interesting class of Gaver models involving a time scale parameter, the maximum likelihood estimates from several devices in one environment are examined; in that case the tie configuration probability does not depend on the parameter. For the corresponding Antelman-Savage models, a consistent sequence of estimators is obtained; the maximum likelihood theory did not appear tractable.


1984 ◽  
Vol 11 (2) ◽  
pp. 185-190 ◽  
Author(s):  
R. B. Pranchov ◽  
D. S. Campbell

A model for time-to-failure prediction based on component parameter drift is described. The model is based on the influence of time-dependent random and non-random factors on the distribution of the random variable. The reliability interpretation of data from thick film resistor ageing tests has been carried out with the developed model. Two different drift functions are described for two ruthenium-based thick film systems with equal sheet resistance (10 Ω/square). The values of the functional parameters depend strongly on the storage test temperature. The mean time to failure decreases by a factor of approximately 4 for a 20 °C increase in storage temperature.
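The drift-based idea can be illustrated with one hypothetical drift law. The following is a minimal sketch, assuming the relative resistance drift follows d(t) = a * t**n with a lognormally distributed coefficient a (ln a ~ N(m, s^2)), and that a resistor fails when its drift reaches a tolerance delta. The abstract does not give the authors' two drift functions, so the power law, the function names and all numbers here are assumptions. The failure time is then T = (delta/a)**(1/n), and the lognormal mean gives MTTF = delta**(1/n) * exp(-m/n + s**2/(2*n**2)); a Monte Carlo estimate is included as a cross-check.

```python
import numpy as np


def mttf_closed_form(delta, n, m, s):
    """E[T] for T = (delta/a)^(1/n) with ln a ~ N(m, s^2)."""
    return delta ** (1.0 / n) * np.exp(-m / n + s ** 2 / (2.0 * n ** 2))


def mttf_monte_carlo(delta, n, m, s, samples=200_000, seed=0):
    """Same quantity by sampling drift coefficients and solving a*T^n = delta."""
    rng = np.random.default_rng(seed)
    a = rng.lognormal(mean=m, sigma=s, size=samples)
    return float(np.mean((delta / a) ** (1.0 / n)))


# Hypothetical: 1 % tolerance, square-root-of-time drift, median coefficient 1e-4
params = dict(delta=0.01, n=0.5, m=np.log(1e-4), s=0.3)
print(mttf_closed_form(**params), mttf_monte_carlo(**params))
```

Under this assumed model the MTTF scales as exp(-m/n), so a four-fold drop in MTTF for a 20 °C rise would correspond to m increasing by n*ln 4; tying m to temperature (for example through an Arrhenius law) would be a further assumption.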

