COMPARATIVE ANALYSIS OF SYSTEM LOGS AND STREAMING DATA ANOMALY DETECTION ALGORITHMS

Author(s):  
Andriy Lishchytovych ◽  
Volodymyr Pavlenko ◽  
Alexander Shmatok ◽  
Yuriy Finenko

This paper describes and compares several commonly used approaches to analyzing system logs and streaming data generated in large volumes by company IT infrastructure, with a focus on unattended anomaly detection. The importance of anomaly detection is dictated by the growing cost of system downtime caused by events that could have been predicted from log entries reporting abnormal data. Anomaly detection systems are built on a standard workflow of data collection, parsing, information extraction, and detection steps. Most of the paper concerns the detection step and algorithms such as regression, decision trees, SVM, clustering, principal component analysis, invariants mining, and the hierarchical temporal memory (HTM) model. Model-based anomaly detection algorithms and HTM algorithms were used to process the HDFS, BGL, and NAB datasets, with ~16 million log messages and 365k streaming data points. The data was manually labeled to enable model training and accuracy calculation. According to the results, supervised anomaly detection systems achieve high precision but require significant training effort, while the HTM-based algorithm shows the highest detection precision with zero training. Detection of abnormal system behavior plays an important role in large-scale incident management systems. Timely detection allows IT administrators to identify issues quickly and resolve them immediately, which reduces system downtime dramatically. Most IT systems generate logs with detailed information about their operations, so logs are an ideal data source for anomaly detection solutions. The volume of logs makes manual analysis impossible and requires automated approaches.
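The workflow named in the abstract (collection, parsing, information extraction, detection) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `<session> <message>` log layout, the helper names, and the detection rule, which is a simple distance-from-mean baseline and not one of the algorithms evaluated in the paper.

```python
import re
import statistics
from collections import Counter, defaultdict


def make_sample_logs():
    """Collection: hypothetical log lines in a '<session> <message>' layout."""
    lines = []
    for i in range(1, 6):
        lines.append(f"blk_{i} Receiving block from 10.0.0.{i}")
        lines.append(f"blk_{i} Received block of size {1000 + i}")
    lines.append("blk_6 Receiving block from 10.0.0.6")
    for attempt in range(1, 5):
        lines.append(f"blk_6 Exception while writing block, retry {attempt}")
    return lines


def parse(line):
    """Parsing: split off the session id and mask numbers to get an event template."""
    session, message = line.split(" ", 1)
    template = re.sub(r"\d+", "<*>", message)
    return session, template


def extract_features(lines):
    """Information extraction: build one event-count vector per session."""
    counts = defaultdict(Counter)
    for line in lines:
        session, template = parse(line)
        counts[session][template] += 1
    templates = sorted({t for c in counts.values() for t in c})
    vectors = {s: [c[t] for t in templates] for s, c in counts.items()}
    return templates, vectors


def detect(vectors, k=1.5):
    """Detection (toy baseline): flag sessions far from the mean count vector."""
    dims = len(next(iter(vectors.values())))
    mean = [statistics.fmean(v[i] for v in vectors.values()) for i in range(dims)]
    dist = {
        s: sum((x - m) ** 2 for x, m in zip(v, mean)) ** 0.5
        for s, v in vectors.items()
    }
    cutoff = statistics.fmean(dist.values()) + k * statistics.pstdev(dist.values())
    return sorted(s for s, d in dist.items() if d > cutoff)


if __name__ == "__main__":
    _, session_vectors = extract_features(make_sample_logs())
    print(detect(session_vectors))
```

In this toy data only the session that logs repeated exceptions has a count vector far from the others, so it is the one flagged. The threshold multiplier `k` is an arbitrary choice for the sketch and would need tuning on real logs.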

2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Chunbo Liu ◽  
Lanlan Pan ◽  
Zhaojun Gu ◽  
Jialiang Wang ◽  
Yitong Ren ◽  
...  

System logs can record the system status and important events during system operation in detail. Detecting anomalies in the system logs is a common method for modern large-scale distributed systems. Yet threshold-based classification models used for anomaly detection output only two values, normal or abnormal, and give no probability estimate of whether the prediction is correct. In this paper, a statistical learning algorithm, the Venn-Abers predictor, is adopted to evaluate the confidence of prediction results in the field of system log anomaly detection. It is able to calculate the probability distribution of labels for a set of samples and provide a quality assessment of predictive labels to some extent. Two Venn-Abers predictors, LR-VA and SVM-VA, have been implemented based on Logistic Regression and Support Vector Machine, respectively. The differences among the algorithms are then considered so as to build a multimodel fusion algorithm by Stacking, and a Venn-Abers predictor based on the Stacking algorithm, called Stacking-VA, is implemented. The performances of four types of algorithms (unimodel, Venn-Abers predictor based on unimodel, multimodel, and Venn-Abers predictor based on multimodel) are compared in terms of validity and accuracy. Experiments are carried out on a log dataset of the Hadoop Distributed File System (HDFS). For the comparative experiments on unimodels, the results show that the validities of LR-VA and SVM-VA are better than those of the two corresponding underlying models. Compared with the underlying model, the accuracy of the SVM-VA predictor is better than that of the LR-VA predictor, and more significantly, the recall rate increases from 81% to 94%. In the experiments on multiple models, the algorithm based on Stacking multimodel fusion is significantly superior to the underlying classifier. The average accuracy of Stacking-VA is larger than 0.95, which is more stable than the prediction results of LR-VA and SVM-VA. Experimental results show that the Venn-Abers predictor is a flexible tool that can make accurate and valid probability predictions in the field of system log anomaly detection.
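To make the Venn-Abers idea concrete, here is a minimal sketch of an inductive Venn-Abers predictor built on isotonic regression (pool-adjacent-violators). It assumes the underlying classifier (for example Logistic Regression or SVM) has already produced a numeric score for each calibration sample; the scores, labels, and function names below are hypothetical, and this is not the authors' implementation.

```python
def isotonic_fit(points):
    """Pool-adjacent-violators on (score, label) pairs.

    Returns a dict mapping each distinct score to its fitted,
    non-decreasing probability estimate.
    """
    # Pool samples that share the same score.
    grouped = {}
    for score, label in points:
        total, n = grouped.get(score, (0.0, 0))
        grouped[score] = (total + label, n + 1)

    blocks = []  # each block: [sum_of_labels, sample_count, scores_in_block]
    for score in sorted(grouped):
        total, n = grouped[score]
        blocks.append([total, n, [score]])
        # Merge backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total2, n2, scores2 = blocks.pop()
            blocks[-1][0] += total2
            blocks[-1][1] += n2
            blocks[-1][2].extend(scores2)

    return {s: total / n for total, n, scores in blocks for s in scores}


def venn_abers(calibration, score):
    """Return (p0, p1): the probability of label 1 when the test sample is
    tentatively labeled 0 and 1, respectively."""
    p0 = isotonic_fit(calibration + [(score, 0)])[score]
    p1 = isotonic_fit(calibration + [(score, 1)])[score]
    return p0, p1


if __name__ == "__main__":
    # Hypothetical calibration set: (classifier score, true label), 1 = anomaly.
    calibration = [
        (0.1, 0), (0.2, 0), (0.3, 1), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.9, 1),
    ]
    print(venn_abers(calibration, 0.8))
```

The pair `(p0, p1)` brackets the probability that the sample is anomalous; a narrow interval signals a more confident prediction, which is the information a plain threshold classifier does not provide.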


Author(s):  
Mohammad Rasool Fatemi ◽  
Ali A. Ghorbani

System logs are one of the most important sources of information for anomaly and intrusion detection systems. In a general log-based anomaly detection system, network, device, and host logs are all collected and used together for analysis and the detection of anomalies. However, the ever-increasing volume of logs remains one of the main challenges that anomaly detection tools face. Based on Sysmon, this chapter proposes a host-based log analysis system that detects anomalies without using network logs, both to reduce the volume and to show the importance of host-based logs. The authors implement a Sysmon parser to parse and extract features from the logs and use them to run detection methods on the data. The valuable information is successfully retained after two extensive volume reduction steps. An anomaly detection system is proposed and evaluated on five different datasets with up to 55,000 events, detecting the attacks using the preserved logs. The analysis results demonstrate the significance of host-based logs in auditing, security monitoring, and intrusion detection systems.
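As a rough illustration of what parsing a Sysmon record can involve, the sketch below reads one event in the Windows event XML layout and keeps a small set of fields. The sample event, the chosen fields, and the idea of keeping only selected fields as a volume reduction step are assumptions for illustration; the abstract does not describe the authors' parser or their reduction steps.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Windows event log records.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hypothetical, heavily trimmed Sysmon process-creation event (Event ID 1).
SAMPLE_EVENT = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <EventID>1</EventID>
    <Computer>host-01</Computer>
  </System>
  <EventData>
    <Data Name="Image">C:\\Windows\\System32\\cmd.exe</Data>
    <Data Name="CommandLine">cmd.exe /c whoami</Data>
    <Data Name="User">LAB\\alice</Data>
    <Data Name="LogonGuid">{00000000-0000-0000-0000-000000000000}</Data>
  </EventData>
</Event>"""


def parse_sysmon_event(xml_text, keep_fields=("Image", "CommandLine", "User")):
    """Parse one event and keep only the fields listed in keep_fields."""
    root = ET.fromstring(xml_text)
    record = {"EventID": int(root.findtext("e:System/e:EventID", namespaces=NS))}
    for data in root.iterfind("e:EventData/e:Data", NS):
        name = data.get("Name")
        if name in keep_fields:
            record[name] = data.text or ""
    return record


if __name__ == "__main__":
    print(parse_sysmon_event(SAMPLE_EVENT))
```

Dropping fields that carry no signal for detection, such as the GUID in the sample, is one simple way a parser can shrink the data before the detection methods run.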


2017 ◽  
Vol 2017 ◽  
pp. 1-17
Author(s):  
Mikel Iturbe ◽  
Iñaki Garitano ◽  
Urko Zurutuza ◽  
Roberto Uribeetxeberria

Industrial Networks (INs) are widespread environments where heterogeneous devices collaborate to control and monitor physical processes. Some of the controlled processes belong to Critical Infrastructures (CIs), and, as such, IN protection is an active research field. Among different types of security solutions, IN Anomaly Detection Systems (ADSs) have received wide attention from the scientific community. While INs have grown in size and in complexity, requiring the development of novel, Big Data solutions for data processing, IN ADSs have not evolved at the same pace. In parallel, the development of Big Data frameworks such as Hadoop or Spark has led the way for applying Big Data Analytics to the field of cyber-security, mainly focusing on the Information Technology (IT) domain. However, due to the particularities of INs, it is not feasible to directly apply IT security mechanisms in INs, as IN ADSs face unique characteristics. In this work we introduce three main contributions. First, we survey the area of Big Data ADSs that could be applicable to INs and compare the surveyed works. Second, we develop a novel taxonomy to classify existing IN-based ADSs. And, finally, we present a discussion of open problems in the field of Big Data ADSs for INs that can lead to further development.


Energies ◽  
2020 ◽  
Vol 13 (18) ◽  
pp. 4906
Author(s):  
Ryszard K. Miler ◽  
Marcin J. Kisielewski ◽  
Anna Brzozowska ◽  
Antonina Kalinichenko

Implemented in road transport enterprises (RTEs) on a large scale, telematics systems are dedicated both to particular aspects of their operation and to the integrated fields of the total operational functioning of such entities. Hence, a research problem can be defined as the identification of their efficiency levels in the context of operational activities undertaken by RTEs (including more holistic effects, e.g., lowering fuel/energy consumption and negative environmental impacts). Current research studies refer to the efficiency of some particular modules, but there have not been any publications focused on describing the efficiency of telematics systems in a more integrated (holistic) way, due to the lack of a universal tool that could be applied to provide this type of measurement. In this paper, an attempt at filling the identified cognitive gap is presented through empirical research analysing the original matrix developed by the authors that refers to the efficiency rates of organisational activities undertaken by RTEs. The purpose of this paper is to present a tool that has been designed to provide a holistic evaluation of the efficiency of telematics systems in RTE operational management. The results are presented in the form of an individual (ontogenetic) matrix of the analysed companies, for which a determinant was calculated with the use of Sarrus’ rule. The set of values obtained in this way for the determinants of the subsequent ontogenetic matrices formed an arithmetic progression that characterised the scope and the level of the influence exerted by the implemented IT (information technology) systems on the organisational efficiency of operational activities undertaken by the analysed RTEs. We present a hypothesis stating that the originally developed matrix can be viewed as a reliable tool for comparative analysis in the field of efficiency of telematics systems in RTEs, and this hypothesis was positively verified during the research. The obtained results prove the significant potential for the wide application of the discussed matrix, which can be used as a universal tool for the analysis and comparison of the efficiency of integrated IT systems in the operational activities undertaken by RTEs.
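Sarrus’ rule, mentioned above, computes the determinant of a 3×3 matrix as the sum of the three "downward" diagonal products minus the three "upward" ones. The short sketch below shows the rule on a made-up matrix; the function name and the example values are illustrative and are not taken from the paper's data.

```python
def sarrus_determinant(m):
    """Determinant of a 3x3 matrix (list of three rows) by Sarrus' rule."""
    (a, b, c), (d, e, f), (g, h, i) = m
    downward = a * e * i + b * f * g + c * d * h
    upward = c * e * g + b * d * i + a * f * h
    return downward - upward


if __name__ == "__main__":
    # Illustrative matrix, not one of the matrices analysed in the paper.
    example = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 10],
    ]
    print(sarrus_determinant(example))
```

The rule applies only to 3×3 matrices, which suggests the matrices analysed in the paper have that size.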


2016 ◽  
Vol 15 (9) ◽  
pp. 2063-2074
Author(s):  
Pedro Rosas Quiterio ◽  
Florencio Sanchez Silva ◽  
Ignacio Carvajal Mariscal ◽  
Jesus Alberto Meda Campana
