scholarly journals ConAnomaly: Content-Based Anomaly Detection for System Logs

Sensors ◽  
2021 ◽  
Vol 21 (18) ◽  
pp. 6125
Author(s):  
Dan Lv ◽  
Nurbol Luktarhan ◽  
Yiyong Chen

Enterprise systems typically produce a large number of logs to record runtime states and important events. Log anomaly detection is efficient for business management and system maintenance. Most existing log-based anomaly detection methods use log parser to get log event indexes or event templates and then utilize machine learning methods to detect anomalies. However, these methods cannot handle unknown log types and do not take advantage of the log semantic information. In this article, we propose ConAnomaly, a log-based anomaly detection model composed of a log sequence encoder (log2vec) and multi-layer Long Short Term Memory Network (LSTM). We designed log2vec based on the Word2vec model, which first vectorized the words in the log content, then deleted the invalid words through part of speech tagging, and finally obtained the sequence vector by the weighted average method. In this way, ConAnomaly not only captures semantic information in the log but also leverages log sequential relationships. We evaluate our proposed approach on two log datasets. Our experimental results show that ConAnomaly has good stability and can deal with unseen log types to a certain extent, and it provides better performance than most log-based anomaly detection methods.

Author(s):  
Baoquan Wang ◽  
Tonghai Jiang ◽  
Xi Zhou ◽  
Bo Ma ◽  
Fan Zhao ◽  
...  

For abnormal detection of time series data, the supervised anomaly detection methods require labeled data. While the range of outlier factors used by the existing semi-supervised methods varies with data, model and time, the threshold for determining abnormality is difficult to obtain, in addition, the computational cost of the way to calculate outlier factors from other data points in the data set is also very large. These make such methods difficult to practically apply. This paper proposes a framework named LSTM-VE which uses clustering combined with visualization method to roughly label normal data, and then uses the normal data to train long short-term memory (LSTM) neural network for semi-supervised anomaly detection. The variance error (VE) of the normal data category classification probability sequence is used as outlier factor. The framework enables anomaly detection based on deep learning to be practically applied and using VE avoids the shortcomings of existing outlier factors and gains a better performance. In addition, the framework is easy to expand because the LSTM neural network can be replaced with other classification models. Experiments on the labeled and real unlabeled data sets prove that the framework is better than replicator neural networks with reconstruction error (RNN-RS) and has good scalability as well as practicability.


Information ◽  
2019 ◽  
Vol 10 (8) ◽  
pp. 262
Author(s):  
Ying Zhao ◽  
Junjun Chen ◽  
Di Wu ◽  
Jian Teng ◽  
Nabin Sharma ◽  
...  

Anomaly detection of network traffic flows is a non-trivial problem in the field of network security due to the complexity of network traffic. However, most machine learning-based detection methods focus on network anomaly detection but ignore the user anomaly behavior detection. In real scenarios, the anomaly network behavior may harm the user interests. In this paper, we propose an anomaly detection model based on time-decay closed frequent patterns to address this problem. The model mines closed frequent patterns from the network traffic of each user and uses a time-decay factor to distinguish the weight of current and historical network traffic. Because of the dynamic nature of user network behavior, a detection model update strategy is provided in the anomaly detection framework. Additionally, the closed frequent patterns can provide interpretable explanations for anomalies. Experimental results show that the proposed method can detect user behavior anomaly, and the network anomaly detection performance achieved by the proposed method is similar to the state-of-the-art methods and significantly better than the baseline methods.


2013 ◽  
Vol 373-375 ◽  
pp. 1780-1783 ◽  
Author(s):  
Hong Chao Chen ◽  
Jin Jin Wang ◽  
Xin Hua Zhu

This paper constructs domain ontology for Data Structure Course and standard (student) answer sentence framework, then proposes a new approach to automatic marking Chinese subjective questions based on them. This method deals with the standard (student) answer in word segmentation, part-of-speech tagging, pronouns digestion, extracting framework, calculating word similarity. Compared with the traditional ones, this means allows the computer to understand the semantic information as much as possible, keeps the semantic relations between standard answer and the students, improves scoring accuracy.


Sensors ◽  
2020 ◽  
Vol 20 (10) ◽  
pp. 2893 ◽  
Author(s):  
Sunoh Choi ◽  
Jangseong Bae ◽  
Changki Lee ◽  
Youngsoo Kim ◽  
Jonghyun Kim

Every day, hundreds of thousands of malicious files are created to exploit zero-day vulnerabilities. Existing pattern-based antivirus solutions face difficulties in coping with such a large number of new malicious files. To solve this problem, artificial intelligence (AI)-based malicious file detection methods have been proposed. However, even if we can detect malicious files with high accuracy using deep learning, it is difficult to identify why files are malicious. In this study, we propose a malicious file feature extraction method based on attention mechanism. First, by adapting the attention mechanism, we can identify application program interface (API) system calls that are more important than others for determining whether a file is malicious. Second, we confirm that this approach yields an accuracy that is approximately 12% and 5% higher than a conventional AI-based detection model using convolutional neural networks and skip-connected long short-term memory-based detection model, respectively.


2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Hongchao Song ◽  
Zhuqing Jiang ◽  
Aidong Men ◽  
Bo Yang

Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute the neighborhood distance between each observation and suffer from the curse of dimensionality in high-dimensional space; for example, the distances between any pair of samples are similar and each sample may perform like an outlier. In this paper, we propose a hybrid semi-supervised anomaly detection model for high-dimensional data that consists of two parts: a deep autoencoder (DAE) and an ensemble k-nearest neighbor graphs- (K-NNG-) based anomaly detector. Benefiting from the ability of nonlinear mapping, the DAE is first trained to learn the intrinsic features of a high-dimensional dataset to represent the high-dimensional data in a more compact subspace. Several nonparametric KNN-based anomaly detectors are then built from different subsets that are randomly sampled from the whole dataset. The final prediction is made by all the anomaly detectors. The performance of the proposed method is evaluated on several real-life datasets, and the results confirm that the proposed hybrid model improves the detection accuracy and reduces the computational complexity.


2021 ◽  
Vol 21 (3) ◽  
pp. 175-188
Author(s):  
Sumaiya Thaseen Ikram ◽  
Aswani Kumar Cherukuri ◽  
Babu Poorva ◽  
Pamidi Sai Ushasree ◽  
Yishuo Zhang ◽  
...  

Abstract Intrusion Detection Systems (IDSs) utilise deep learning techniques to identify intrusions with maximum accuracy and reduce false alarm rates. The feature extraction is also automated in these techniques. In this paper, an ensemble of different Deep Neural Network (DNN) models like MultiLayer Perceptron (MLP), BackPropagation Network (BPN) and Long Short Term Memory (LSTM) are stacked to build a robust anomaly detection model. The performance of the ensemble model is analysed on different datasets, namely UNSW-NB15 and a campus generated dataset named VIT_SPARC20. Other types of traffic, namely unencrypted normal traffic, normal encrypted traffic, encrypted and unencrypted malicious traffic, are captured in the VIT_SPARC20 dataset. Encrypted normal and malicious traffic of VIT_SPARC20 is categorised by the deep learning models without decrypting its contents, thus preserving the confidentiality and integrity of the data transmitted. XGBoost integrates the results of each deep learning model to achieve higher accuracy. From experimental analysis, it is inferred that UNSW_ NB results in a maximal accuracy of 99.5%. The performance of VIT_SPARC20 in terms of accuracy, precision and recall are 99.4%. 98% and 97%, respectively.


Author(s):  
Fei Li ◽  
Hongzhi Wang ◽  
Guowen Zhou ◽  
Daren Yu ◽  
Jianzhong Li ◽  
...  

Anomaly detection plays a significant role in helping gas turbines run reliably and economically. Considering collective anomalous data and both sensitivity and robustness of the anomaly detection model, a sequential symbolic anomaly detection method is proposed and applied to the gas turbine fuel system. A structural Finite State Machine is to evaluate posterior probabilities of observing symbolic sequences and most probable state sequences they may locate. Hence an estimating based model and a decoding based model are used to identify anomalies in two different ways. Experimental results indicates that these two models have both ideal performance overall, and estimating based model has a strong ability in robustness, while decoding based model has a strong ability in accuracy, particularly in a certain range of length of sequence. Therefore, the proposed method can well facilitate existing symbolic dynamic analysis based anomaly detection methods especially in gas turbine domain.


Author(s):  
Mohammad Rasool Fatemi ◽  
Ali A. Ghorbani

System logs are one of the most important sources of information for anomaly and intrusion detection systems. In a general log-based anomaly detection system, network, devices, and host logs are all collected and used together for analysis and the detection of anomalies. However, the ever-increasing volume of logs remains as one of the main challenges that anomaly detection tools face. Based on Sysmon, this chapter proposes a host-based log analysis system that detects anomalies without using network logs to reduce the volume and to show the importance of host-based logs. The authors implement a Sysmon parser to parse and extract features from the logs and use them to perform detection methods on the data. The valuable information is successfully retained after two extensive volume reduction steps. An anomaly detection system is proposed and performed on five different datasets with up to 55,000 events which detects the attacks using the preserved logs. The analysis results demonstrate the significance of host-based logs in auditing, security monitoring, and intrusion detection systems.


Sign in / Sign up

Export Citation Format

Share Document