Iterative Selection of Categorical Variables for Log Data Anomaly Detection

2021 ◽  
pp. 757-777
Author(s):  
Max Landauer ◽  
Georg Höld ◽  
Markus Wurzenberger ◽  
Florian Skopik ◽  
Andreas Rauber
2020 ◽  
Vol 2020 ◽  
pp. 1-17
Author(s):  
Bingming Wang ◽  
Shi Ying ◽  
Zhe Yang

Supervised anomaly detection with the k-nearest neighbor (kNN) algorithm can produce more accurate results. However, finding the k neighbors in large-scale log data is inefficient, and because log data are imbalanced in quantity, it is challenging to select a proper k for different data distributions. In this paper, we propose a log-based anomaly detection method with efficient neighbor search and automatic selection of k. First, we propose a neighbor search method based on minhash and the MVP-tree. The minhash algorithm groups similar logs into the same bucket, and an MVP-tree model is built for the samples in each bucket. This reduces the effort of distance calculation and the number of neighbor samples that need to be compared, which improves the efficiency of finding neighbors. For selecting k, we propose an automatic method based on the Silhouette Coefficient, which selects a proper k to improve the accuracy of anomaly detection. Our method is verified on six different types of log data to demonstrate its universality and feasibility.
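
The bucketing step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenization, the SHA-1 based hash family, the banding scheme, and the function names minhash_signature, estimated_jaccard and bucket_logs are all assumptions made for illustration, and the MVP-tree built inside each bucket is not shown.

```python
import hashlib
from collections import defaultdict


def minhash_signature(tokens, num_hashes=16):
    """Return a minhash signature: one minimum hash value per seeded hash function."""
    tokens = set(tokens) or {""}  # guard against empty log lines
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.sha1(f"{seed}:{t}".encode()).digest()[:8], "big")
            for t in tokens
        ))
    return tuple(signature)


def estimated_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature positions."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def bucket_logs(lines, num_hashes=16, band_size=4):
    """Group log lines into buckets keyed by bands of their minhash signature.

    Lines that agree on at least one band end up in at least one shared bucket
    (hypothetical banding scheme, assumed here for illustration).
    """
    buckets = defaultdict(list)
    for line in lines:
        sig = minhash_signature(line.split(), num_hashes)
        for start in range(0, num_hashes, band_size):
            buckets[(start, sig[start:start + band_size])].append(line)
    return buckets
```

With such buckets, a neighbor search only needs to compare a query log against samples that share a bucket with it, rather than against the whole data set.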


1996 ◽  
Vol 8 (3) ◽  
pp. 133-144 ◽  
Author(s):  
María del Mar del Pozo Andrés ◽  
Jacques F A Braster

In this article we propose two research techniques that can bridge the gap between quantitative and qualitative historical research. These are: (1) a multiple regression approach that gives information about general patterns between numerical variables and the selection of outliers for qualitative analysis; (2) a homogeneity analysis with alternating least squares that results in a two-dimensional picture in which the relationships between categorical variables are graphically presented.


2020 ◽  
Author(s):  
Bo Zhang ◽  
Hongyu Zhang ◽  
Pablo Moscato

Complex software-intensive systems, especially distributed systems, generate logs for troubleshooting. The logs are text messages recording system events, which can help engineers determine the system's runtime status. This paper proposes a novel approach named ADR (Anomaly Detection by workflow Relations) that employs the matrix nullspace to mine numerical relations from log data. The mined relations can be used for both offline and online anomaly detection and facilitate fault diagnosis. We have evaluated ADR on log data collected from two distributed systems, HDFS (Hadoop Distributed File System) and BGL (IBM Blue Gene/L supercomputer system). ADR successfully mined 87 and 669 numerical relations from the logs and used them to detect anomalies with high precision and recall. For online anomaly detection, ADR employs PSO (Particle Swarm Optimization) to find the optimal sliding window size and achieves fast anomaly detection. The experimental results confirm that ADR is effective for both offline and online anomaly detection.
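
The idea of mining numerical relations from a matrix nullspace can be sketched as follows. This is an illustrative assumption, not the ADR implementation: it supposes that each row of the matrix holds per-session event counts, that a relation is a vector v for which every normal row satisfies row · v = 0, and that a session violating any mined relation is flagged. The names nullspace and violates_relations are hypothetical.

```python
from fractions import Fraction


def nullspace(rows):
    """Return a basis of {v : row . v == 0 for every row}, via exact Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    n_cols = len(m[0])
    pivots = []  # pivots[i] is the pivot column of reduced row i
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot_value = m[r][c]
        m[r] = [x / pivot_value for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        v = [Fraction(0)] * n_cols
        v[free] = Fraction(1)
        for row_idx, pivot_col in enumerate(pivots):
            v[pivot_col] = -m[row_idx][free]
        basis.append(v)
    return basis


def violates_relations(counts, relations):
    """Flag a session whose event counts break any mined relation."""
    return any(sum(c * w for c, w in zip(counts, rel)) != 0 for rel in relations)


# Hypothetical event types: [open, write, close]; each row is one normal session.
normal_sessions = [[1, 3, 1], [2, 5, 2], [1, 0, 1], [3, 7, 3]]
relations = nullspace(normal_sessions)  # one relation: close - open = 0
```

In this toy data the only mined relation states that every session closes as many files as it opens, so a session with counts [2, 5, 1] would be flagged while [4, 9, 4] would not.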


2021 ◽  
pp. 1-15
Author(s):  
Savaridassan Pankajashan ◽  
G. Maragatham ◽  
T. Kirthiga Devi

Anomaly-based detection is concerned with recognizing the uncommon, catching unusual activity, and finding the action behind that activity. It has a wide scope of critical applications, from bank application security to regular sciences, medical systems, and marketing apps. Anomaly-based detection built on Machine Learning techniques is a type of artificial intelligence system. With the ever-expanding volume and new kinds of information, for example sensor information from an enormous number of IoT devices and network flow data from cloud computing, there is, unsurprisingly, growing interest in handling more decisions automatically by means of AI and ML applications. With respect to anomaly detection, however, many applications of the scheme are simply the passion for detection. In this paper, Machine Learning (ML) techniques, namely SVM and Isolation Forest classifiers, are evaluated, and, as a Deep Learning (DL) technique, the proposed DA-LSTM (Deep Auto-Encoder LSTM) model is adopted for preprocessing of log data and anomaly-based detection to obtain better detection performance. An enhanced LSTM (long short-term memory) model, with its parameters optimized by a genetic algorithm (GA), is used to better recognize anomalies in log data that has been filtered by a Deep Auto-Encoder (DA). The deep neural network models convert unstructured log information into training-ready features that are suitable for log classification in anomaly detection. These models are assessed using two benchmark datasets, the OpenStack logs and the CIDDS-001 intrusion detection OpenStack server dataset. The results show that the DA-LSTM model performs better than other notable ML techniques.
We further investigated the performance of the ML and DL models using well-known indicators, specifically the F-measure, Accuracy, Recall, and Precision. The experiments show that the Isolation Forest and Support Vector Machine classifiers reach roughly 81% and 79% accuracy on the CIDDS-001 OpenStack server dataset, while the proposed DA-LSTM classifier reaches around 99.1% accuracy, an improvement over the familiar ML algorithms. Further, the DA-LSTM results on the OpenStack log datasets show better anomaly detection compared with other notable machine learning models.
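
For reference, the four indicators named above follow from the confusion-matrix counts of a binary anomaly detector. The sketch below uses the standard definitions; the function name classification_metrics and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F-measure from confusion counts.

    tp: anomalies correctly flagged      fp: normal entries wrongly flagged
    fn: anomalies missed                 tn: normal entries correctly passed
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=8, fp=2, fn=2, tn=88)
```

Because log data are typically imbalanced, accuracy alone can look high even when many anomalies are missed, which is why precision, recall, and the F-measure are usually reported alongside it.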


2019 ◽  
Author(s):  
K. Struminskiy ◽  
A. Klenitskiy ◽  
A. Reshytko ◽  
D. Egorov ◽  
A. Shchepetnov ◽  
...  

2016 ◽  
Vol 5 (6) ◽  
pp. 283-288
Author(s):  
Siwoon Son ◽  
Myeong-Seon Gil ◽  
Yang-Sae Moon ◽  
Hee-Sun Won

2019 ◽  
Vol 2 ◽  
Author(s):  
Dimitar Plachiyski ◽  
Georgi Popgeorgiev ◽  
Stefan Avramov ◽  
Yurii Kornilev

Current habitat management of the peripheral, regionally unique, and isolated Balkan capercaillie Tetrao urogallus rudolfi Dombrowski, 1912 meta-population in Bulgaria is based on obsolete knowledge of the spatial requirements of the species. Thus, we studied habitat availability and the patterns of use by adult male Capercaillie at the home range scale to inform and contribute to the conservation-oriented management of the threatened subspecies and its habitats. The field study was conducted during 2014–2015 in the northeastern part of Rila Mtn., Southwestern Bulgaria. Using GPS tags (“Bird 2A”, e-obs Digital Telemetry, Grünwald, Germany), a total of 38,640 GPS fixes were obtained from 3 displaying males associated with one lek. On this basis, we calculated annual and seasonal Minimum Convex Polygons (MCP), traditionally used as a measure of the maximum area of activity. Capercaillie habitat preference was computed using Manly’s habitat selection ratios (w), design III, combined with 90% Bonferroni simultaneous confidence intervals. To calculate habitat selection, we determined surface (Steepness and Exposure), forest stand succession and vegetation cover categorical variables. The habitat and surface layers were rasterized into 8 m square pixels. At the home range (MCP) scale, tagged roosters used vegetation cover non-randomly (annual: Khi2L=5738.89, df=14, p<0.001; winter: Khi2L=3773.28, df=13, p<0.001; summer: Khi2L=3646.32, df=14, p<0.001), and preferred forests dominated by Scots pine and Macedonian pine, with the annual selection of Scots pine and the summer selection of Macedonian pine being significantly different. In terms of forest stage succession, roosters used forest stages non-randomly (annual: Khi2L=3492.57, df=8, p<0.001; winter: Khi2L=2075.18, df=8, p<0.001; summer: Khi2L=1670.1, df=6, p<0.001), and demonstrated clear avoidance of forest stands in the age classes “0 to 40” and “41 to 80” years within the summer and annual ranges.
The roosters demonstrated significant preference for southeastern exposure during the winter and annually, and significant overall avoidance of northern exposure, as well as avoidance of north-eastern aspect during the winter and south aspect during the summer (annual: Khi2L=4671.87, df=18, p<0.001; winter: Khi2L=3909.04, df=16, p<0.001; summer: Khi2L=3095.84, df=18, p<0.001). The slope class “63.1 to 73°” was not used. In the summer, Capercaillie males significantly preferred slopes within the class “27.1 to 36°” and avoided the classes “0 to 9°”, “9.1 to 18°” and “54.1 to 63°”. The birds also demonstrated significant avoidance of flat terrains within the “0 to 9°” class annually (annual: Khi2L=608.24, df=17, p<0.001; winter: Khi2L=1148.37, df=16, p<0.001; summer: Khi2L=906.54, df=17, p<0.001).
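
As a simplified illustration of a habitat selection ratio, the sketch below computes w for each category as the proportion of use divided by the proportion of availability, so that w > 1 suggests preference and w < 1 suggests avoidance. It covers a single animal only and omits the design III pooling across animals and the Bonferroni confidence intervals used in the study; the function name selection_ratios and the example counts are hypothetical.

```python
def selection_ratios(used, available):
    """Return w = (proportion used) / (proportion available) per habitat category.

    used:      GPS fixes counted in each category
    available: pixels (or area) of each category within the home range
    Categories with zero availability are skipped, since w is undefined there.
    """
    total_used = sum(used.values())
    total_available = sum(available.values())
    ratios = {}
    for category, avail in available.items():
        if avail == 0:
            continue
        prop_used = used.get(category, 0) / total_used
        prop_available = avail / total_available
        ratios[category] = prop_used / prop_available
    return ratios


# Hypothetical counts, for illustration only (not data from the study).
fixes = {"scots_pine": 60, "spruce": 30, "meadow": 10}
pixels = {"scots_pine": 300, "spruce": 300, "meadow": 400}
w = selection_ratios(fixes, pixels)
```

In this made-up example the first category is used twice as often as its availability would suggest, the second is used in proportion to availability, and the third is used less than expected.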


Author(s):  
Harold Ott ◽  
Jasmin Bogatinovski ◽  
Alexander Acker ◽  
Sasho Nedelkoski ◽  
Odej Kao
