Incident Management for Explainable and Automated Root Cause Analysis in Cloud Data Centers

2021 ◽  
Vol 27 (11) ◽  
pp. 1152-1173
Author(s):  
Arnak Poghosyan ◽  
Ashot Harutyunyan ◽  
Naira Grigoryan ◽  
Nicholas Kushmerick

Effective root cause analysis (RCA) of performance issues in modern cloud environments remains a hard problem. Traditional RCA tracks complex issues by their signatures, known as problem incidents. Common approaches to incident discovery rely mainly on the expertise of users who define an environment-specific set of alerts and target detection of problems through their occurrence in the monitoring system. Adequately modeling all possible problem patterns for today's extremely sophisticated data center applications is a very complex task. It may result in alert/event storms that include large numbers of non-indicative precautions. Thus, the crucial task for incident-based RCA is reducing redundant recommendations by prioritizing events according to importance/impact criteria or by grouping them meaningfully into separable situations. In this paper, we consider automation of incident discovery based on rule induction algorithms that retrieve conditions directly from monitoring datasets without consuming the system events. Rule-learning algorithms are very flexible and powerful for many regression and classification problems, and offer high-level explainability. Since annotated or labeled datasets are mostly unavailable in this area of technology, we discuss data self-labelling principles that transform originally unsupervised learning tasks into classification problems, to which rule induction methods are then applied for incident detection.
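The self-labelling and rule induction idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: the labelling rule (KPI above mean plus k standard deviations), the single-condition rule search, the metric names `cpu`, `mem`, `latency`, and all sample values are assumptions made for illustration only.

```python
from statistics import mean, pstdev

# Hypothetical monitoring samples; names and values are illustrative only.
rows = [
    {"cpu": 35, "mem": 60, "latency": 110},
    {"cpu": 42, "mem": 71, "latency": 120},
    {"cpu": 50, "mem": 40, "latency": 115},
    {"cpu": 58, "mem": 82, "latency": 125},
    {"cpu": 63, "mem": 55, "latency": 118},
    {"cpu": 70, "mem": 77, "latency": 130},
    {"cpu": 88, "mem": 52, "latency": 400},
    {"cpu": 93, "mem": 68, "latency": 450},
]

def self_label(rows, kpi, k=1.0):
    """Assumed self-labelling rule: incident (1) when KPI > mean + k * std."""
    values = [r[kpi] for r in rows]
    threshold = mean(values) + k * pstdev(values)
    return [1 if v > threshold else 0 for v in values]

def induce_rule(rows, labels, features):
    """Search rules of the form `feature > t` and keep the most accurate one."""
    best = (None, None, 0.0)
    for f in features:
        levels = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            hits = sum((r[f] > t) == bool(y) for r, y in zip(rows, labels))
            acc = hits / len(rows)
            if acc > best[2]:
                best = (f, t, acc)
    return best

labels = self_label(rows, "latency")
rule = induce_rule(rows, labels, ["cpu", "mem"])
print(f"IF {rule[0]} > {rule[1]} THEN incident (accuracy {rule[2]:.2f})")
```

The learned condition stays human-readable, which is the explainability property the abstract attributes to rule learners. A practical rule learner would also consider conjunctions of conditions and guard against overfitting.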

2019 ◽  
Vol 2 (2) ◽  
pp. 133-143
Author(s):  
Mega Astuti DR ◽  
Uwes Anis Chaeruman ◽  
Mulyadi

This study aims to find the root cause of declining human performance and to provide the right interventions to solve the problem in the project payment subdivision at PT. Sedayu Utama. The research method is descriptive analysis with a qualitative approach; data were obtained through interviews, questionnaires and observation. The root cause analysis found an inability to use electronic equipment, an inability to accept constructive criticism due to a high level of superiority, and a lack of initiative. On the equipment side, computers and scanners were found to be underutilized. On the management side, there is a lack of commitment, competition, coordination and understanding of the company's procedures. On the motivation side, there is a lack of supervision and of reward and punishment, and values are inconsistent with the mission. The recommended solutions are coaching, knowledge management, sharing sessions, on-the-job training, and reward and punishment.


2018 ◽  
Vol 18 (4) ◽  
pp. 60-72
Author(s):  
Tobias MUELLER ◽  
Jonathan GREIPEL ◽  
Tobias WEBER ◽  
Robert H. SCHMITT

Detecting root causes of non-conforming parts (parts outside the tolerance limits) in production processes requires a high level of expert knowledge. This results in high costs and low flexibility in the choice of personnel to perform analyses. In modern production, a vast amount of process data is available, and machine learning algorithms exist that model processes empirically. The aim of this paper is to introduce a procedure for automated root cause analysis based on machine learning algorithms, in order to reduce costs and the expert knowledge required. A decision tree algorithm is chosen for this purpose. A procedure for its application in automated root cause analysis is presented, and simulations are conducted to demonstrate its applicability. Influences affecting the success of detection are identified and simulated, e.g. the amount of data required depending on the number of variables, the ratio between categories of non-conformities and OK parts, and the detectable root causes. The simulations are based on a regression model that determines the roughness of drilled holes. They demonstrate the applicability of machine learning algorithms for automated root cause analysis and indicate which influences have to be considered in real scenarios.
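As a rough illustration of the procedure described above, the following sketch simulates parts from a regression-style roughness model and ranks process variables by the impurity reduction of the best single split, which is the criterion a decision tree uses to choose its root node. The regression coefficients, variable names (`feed`, `speed`), value grids and tolerance are all assumptions for illustration and are not taken from the paper.

```python
from itertools import product

def simulate_parts(tolerance=3.0):
    """Simulate drilled holes on a full-factorial grid (all values assumed)."""
    feeds = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]   # hypothetical feed rates
    speeds = [1000, 1500, 2000, 2500, 3000]        # hypothetical spindle speeds
    parts = []
    for feed, speed in product(feeds, speeds):
        # Assumed regression model for surface roughness.
        roughness = 1.0 + 10.0 * feed + 0.0001 * speed
        parts.append({"feed": feed, "speed": speed, "nok": roughness > tolerance})
    return parts

def gini(labels):
    """Gini impurity of a list of booleans (True = non-conforming)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def best_split_gain(parts, feature):
    """Largest impurity reduction achievable by one threshold on `feature`."""
    parent = gini([p["nok"] for p in parts])
    levels = sorted({p[feature] for p in parts})
    best = 0.0
    for lo, hi in zip(levels, levels[1:]):
        t = (lo + hi) / 2
        left = [p["nok"] for p in parts if p[feature] <= t]
        right = [p["nok"] for p in parts if p[feature] > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(parts)
        best = max(best, parent - weighted)
    return best

parts = simulate_parts()
# Rank variables by split gain; the top one is the candidate root cause.
ranking = sorted(((best_split_gain(parts, f), f) for f in ("feed", "speed")),
                 reverse=True)
for gain, feature in ranking:
    print(f"{feature}: gain {gain:.3f}")
```

In this noise-free toy setup the feed rate alone decides whether a part exceeds the tolerance, so it receives the full gain while the spindle speed receives none. With noisy data and more variables, the amount of data and the ratio of non-conforming to OK parts would affect how reliably the ranking recovers the true cause, which is the kind of influence the abstract says the simulations examine.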


2011 ◽  
pp. 78-86
Author(s):  
R. Kilian ◽  
J. Beck ◽  
H. Lang ◽  
V. Schneider ◽  
T. Schönherr ◽  
...  

2012 ◽  
Vol 132 (10) ◽  
pp. 1689-1697
Author(s):  
Yutaka Kudo ◽  
Tomohiro Morimura ◽  
Kiminori Sugauchi ◽  
Tetsuya Masuishi ◽  
Norihisa Komoda

Author(s):  
Dan Bodoh ◽  
Kent Erington ◽  
Kris Dickson ◽  
George Lange ◽  
Carey Wu ◽  
...  

Laser-assisted device alteration (LADA) is an established technique used to identify critical speed paths in integrated circuits. LADA can reveal the physical location of a speed path, but not its timing. This paper describes the root cause analysis benefits of 1064 nm time-resolved LADA (TR-LADA) with a picosecond laser. It shows several examples of how picosecond TR-LADA has complemented the existing fault isolation toolset and has allowed for quicker resolution of design and manufacturing issues. The paper explains how TR-LADA increases the LADA localization resolution by eliminating the well interaction, provides the timing of the event detected by LADA, indicates the propagation direction of the critical signals detected by LADA, allows the analyst to infer the logic values of the critical signals, and separates multiple interactions occurring at the same site for better understanding of the critical signals.

