ConAnomaly: Content-Based Anomaly Detection for System Logs

Enterprise systems typically produce large volumes of logs that record runtime states and important events. Log anomaly detection is valuable for business management and system maintenance. Most existing log-based anomaly detection methods use a log parser to obtain log event indexes or event templates and then apply machine learning methods to detect anomalies. However, these methods cannot handle unknown log types and do not take advantage of the semantic information in logs. In this article, we propose ConAnomaly, a log-based anomaly detection model composed of a log sequence encoder (log2vec) and a multi-layer Long Short-Term Memory (LSTM) network. We designed log2vec based on the Word2vec model: it first vectorizes the words in the log content, then removes invalid words through part-of-speech tagging, and finally obtains the sequence vector by weighted averaging. In this way, ConAnomaly not only captures semantic information in the log but also leverages sequential relationships between logs. We evaluate the proposed approach on two log datasets. The experimental results show that ConAnomaly is stable, can handle unseen log types to a certain extent, and outperforms most log-based anomaly detection methods.
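The encoding steps described above (vectorize words, drop invalid words, take a weighted average) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three-dimensional `TOY_VECTORS` table stands in for a trained Word2vec model, the `INVALID_WORDS` set stands in for part-of-speech filtering, and the IDF-style weights are one possible weighting scheme, since the abstract does not say which weights log2vec uses.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional word vectors; a real system would train Word2vec on log text.
TOY_VECTORS = {
    "receiving": [0.9, 0.1, 0.0],
    "block": [0.2, 0.8, 0.1],
    "failed": [0.0, 0.2, 0.9],
    "exception": [0.1, 0.1, 0.8],
}
# Stand-in for part-of-speech filtering: words treated as carrying no useful meaning.
INVALID_WORDS = {"the", "of", "to", "from", "a"}


def tokenize(line):
    """Lowercase the line and keep purely alphabetic tokens (drops ids such as blk_1)."""
    return [w for w in line.lower().split() if w.isalpha()]


def idf_weights(corpus):
    """Assumed weighting scheme: smoothed inverse document frequency per word."""
    n = len(corpus)
    df = Counter(w for line in corpus for w in set(tokenize(line)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def log_to_vector(line, weights, dim=3):
    """Weighted average of the vectors of the valid, known words in one log line."""
    total = [0.0] * dim
    weight_sum = 0.0
    for word in tokenize(line):
        if word in INVALID_WORDS or word not in TOY_VECTORS:
            continue
        w = weights.get(word, 1.0)
        for i, x in enumerate(TOY_VECTORS[word]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no usable words: return the zero vector
    return [x / weight_sum for x in total]


corpus = ["Receiving block from the source", "Receiving block failed", "Exception failed"]
weights = idf_weights(corpus)
sequence = [log_to_vector(line, weights) for line in corpus]
```

The resulting `sequence` of fixed-length vectors is the kind of input a multi-layer LSTM would consume. Because the encoder works on words rather than template ids, a log line of an unseen type still maps to a vector as long as some of its words are known.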


Part-of-Speech Tagging Using Long Short Term Memory (LSTM): Amazigh Text Written in Tifinaghe Characters

Business Intelligence - Lecture Notes in Business Information Processing ◽ 10.1007/978-3-030-76508-8_1 ◽ 2021 ◽ pp. 3-17

Author(s): Otman Maarouf, Rachid El Ayachi

Keyword(s): Short Term Memory, Short Term, Term Memory, Part Of Speech Tagging, Part Of Speech, Long Short Term Memory, Speech Tagging


Experimentations with OpenStack System Logs and Support Vector Machine for an Anomaly Detection Model in a Private Cloud Infrastructure

2020 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD) ◽ 10.1109/icabcd49160.2020.9183878 ◽ 2020

Author(s): Matthew Akanle, Emmanuel Adetiba, Victor Akande, Adekunle Akinrinmade, Sunday Ajala, ...

Keyword(s): Support Vector Machine, Anomaly Detection, Support Vector, Cloud Infrastructure, Private Cloud, Detection Model, System Logs


Variance error of multi-classification based anomaly detection for time series data

Journal of Computational Methods in Sciences and Engineering ◽ 10.3233/jcm-204699 ◽ 2020 ◽ pp. 1-16

Author(s): Baoquan Wang, Tonghai Jiang, Xi Zhou, Bo Ma, Fan Zhao, ...

Keyword(s): Neural Network, Time Series, Anomaly Detection, Time Series Data, Short Term Memory, Computational Cost, Reconstruction Error, Detection Methods, Series Data, Data Set

For anomaly detection in time series data, supervised methods require labeled data. The range of the outlier factors used by existing semi-supervised methods varies with data, model, and time, so the threshold for determining abnormality is difficult to obtain; in addition, calculating outlier factors from the other data points in the data set is computationally expensive. These issues make such methods difficult to apply in practice. This paper proposes a framework named LSTM-VE, which uses clustering combined with a visualization method to roughly label normal data and then uses the normal data to train a long short-term memory (LSTM) neural network for semi-supervised anomaly detection. The variance error (VE) of the classification probability sequence over the normal data categories is used as the outlier factor. The framework makes deep learning-based anomaly detection practical to apply, and using VE avoids the shortcomings of existing outlier factors and achieves better performance. In addition, the framework is easy to extend because the LSTM neural network can be replaced with other classification models. Experiments on labeled data sets and real unlabeled data sets show that the framework outperforms replicator neural networks with reconstruction error (RNN-RS) and has good scalability and practicability.
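The abstract does not give the exact formula for the variance error, so the following is only one plausible reading, sketched under stated assumptions: for each window, take the sequence of probabilities that a classifier assigned to the expected normal category, and use the population variance of that sequence as the outlier factor. The function names and the fixed `threshold` are hypothetical.

```python
from statistics import pvariance


def variance_error(prob_sequence):
    """Assumed outlier factor: population variance of the per-step
    probabilities assigned to the expected normal category."""
    return pvariance(prob_sequence)


def flag_windows(prob_windows, threshold):
    """Mark a window as anomalous when its variance error exceeds the threshold."""
    return [variance_error(window) > threshold for window in prob_windows]


# Hypothetical classifier outputs: a steady window and an erratic one.
steady = [0.9, 0.9, 0.9]
erratic = [0.9, 0.1, 0.9, 0.1]
flags = flag_windows([steady, erratic], threshold=0.05)
```

Under this reading the factor is bounded, because a variance of values in [0, 1] cannot exceed 0.25. That would be consistent with the abstract's complaint that the range of other outlier factors varies with data and model.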


Network Anomaly Detection by Using a Time-Decay Closed Frequent Pattern

Information ◽ 10.3390/info10080262 ◽ 2019 ◽ Vol 10 (8) ◽ pp. 262

Author(s): Ying Zhao, Junjun Chen, Di Wu, Jian Teng, Nabin Sharma, ...

Keyword(s): Anomaly Detection, Network Traffic, User Behavior, Frequent Pattern, Detection Methods, Frequent Patterns, Time Decay, Network Behavior, Detection Model, Network Anomaly Detection

Anomaly detection of network traffic flows is a non-trivial problem in the field of network security due to the complexity of network traffic. However, most machine learning-based detection methods focus on network anomaly detection and ignore the detection of anomalous user behavior. In real scenarios, anomalous network behavior may harm users' interests. In this paper, we propose an anomaly detection model based on time-decay closed frequent patterns to address this problem. The model mines closed frequent patterns from the network traffic of each user and uses a time-decay factor to weight current and historical network traffic differently. Because user network behavior is dynamic, the anomaly detection framework includes a strategy for updating the detection model. Additionally, the closed frequent patterns provide interpretable explanations for anomalies. Experimental results show that the proposed method can detect user behavior anomalies, and that its network anomaly detection performance is similar to that of state-of-the-art methods and significantly better than that of the baseline methods.
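To make the idea of time-decay closed frequent patterns concrete, here is a small brute-force sketch. It assumes an exponential decay of the form `decay ** age`, which the abstract does not specify, and it enumerates every itemset of every transaction, so it is only suitable for tiny inputs. The function name and the sample traffic items are hypothetical.

```python
from itertools import combinations


def decayed_closed_patterns(transactions, now, decay=0.9, min_support=1.0):
    """transactions: list of (timestamp, set_of_items) pairs.

    Returns {pattern: decayed_support} for patterns that are frequent
    and closed (no proper superset occurs in exactly the same transactions).
    """
    # Record, for every itemset, the ids of the transactions that contain it.
    tidsets = {}
    for tid, (_, items) in enumerate(transactions):
        for size in range(1, len(items) + 1):
            for combo in combinations(sorted(items), size):
                tidsets.setdefault(frozenset(combo), set()).add(tid)

    result = {}
    for pattern, tids in tidsets.items():
        # Older transactions contribute less: weight = decay ** age.
        support = sum(decay ** (now - transactions[t][0]) for t in tids)
        if support < min_support:
            continue
        is_closed = not any(
            pattern < other and tids == other_tids
            for other, other_tids in tidsets.items()
        )
        if is_closed:
            result[pattern] = support
    return result


# Hypothetical per-user traffic, one transaction per time step.
traffic = [(0, {"dns", "http"}), (1, {"dns", "http"}), (2, {"dns"})]
patterns = decayed_closed_patterns(traffic, now=2, decay=0.5, min_support=0.5)
```

With these inputs `{"http"}` is dropped because `{"dns", "http"}` covers exactly the same transactions, which is what keeps the closed pattern set compact. Traffic that matches none of a user's retained patterns would be the candidate anomaly.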


Automatic Scoring Algorithm of Chinese Subjective Questions Based on Domain Ontology and Sentence Framework

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.373-375.1780 ◽ 2013 ◽ Vol 373-375 ◽ pp. 1780-1783 ◽ Cited By ~ 1

Author(s): Hong Chao Chen, Jin Jin Wang, Xin Hua Zhu

Keyword(s): Semantic Information, Domain Ontology, New Approach, Word Similarity, Part Of Speech Tagging, Part Of Speech, Scoring Algorithm, Scoring Accuracy, Automatic Scoring, Speech Tagging

This paper constructs a domain ontology for a Data Structure course and a sentence framework for standard (student) answers, then proposes a new approach to automatically marking Chinese subjective questions based on them. The method processes the standard (student) answer through word segmentation, part-of-speech tagging, pronoun resolution, framework extraction, and word similarity calculation. Compared with traditional methods, it allows the computer to understand the semantic information as much as possible, preserves the semantic relations between the standard answer and the students' answers, and improves scoring accuracy.


Attention-Based Automated Feature Extraction for Malware Analysis

Sensors ◽ 10.3390/s20102893 ◽ 2020 ◽ Vol 20 (10) ◽ pp. 2893 ◽ Cited By ~ 3

Author(s): Sunoh Choi, Jangseong Bae, Changki Lee, Youngsoo Kim, Jonghyun Kim

Keyword(s): Feature Extraction, Short Term Memory, Attention Mechanism, Detection Methods, Malware Analysis, Feature Extraction Method, Detection Model, System Calls, Long Short Term Memory, Program Interface

Every day, hundreds of thousands of malicious files are created to exploit zero-day vulnerabilities. Existing pattern-based antivirus solutions have difficulty coping with such a large number of new malicious files. To solve this problem, artificial intelligence (AI)-based malicious file detection methods have been proposed. However, even if we can detect malicious files with high accuracy using deep learning, it is difficult to identify why files are malicious. In this study, we propose a malicious file feature extraction method based on an attention mechanism. First, by adapting the attention mechanism, we can identify the application program interface (API) system calls that are more important than others for determining whether a file is malicious. Second, we confirm that this approach yields an accuracy approximately 12% and 5% higher than conventional AI-based detection models using convolutional neural networks and skip-connected long short-term memory, respectively.
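The way attention exposes which API calls matter can be shown with a minimal sketch. It assumes a model has already produced one relevance score per call; the scores and call names below are made up for illustration, and the softmax normalization is the standard attention form rather than the paper's confirmed design.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into attention weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_calls_by_attention(api_calls, scores):
    """Pair each API call with its attention weight, highest weight first."""
    weights = softmax(scores)
    return sorted(zip(api_calls, weights), key=lambda pair: pair[1], reverse=True)


# Hypothetical call trace and model scores.
calls = ["CreateFile", "WriteProcessMemory", "Sleep"]
ranked = rank_calls_by_attention(calls, [0.1, 2.0, -1.0])
```

Reading off the top of `ranked` gives an analyst a short list of calls that drove the decision, which is the interpretability benefit the abstract describes.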


A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data

Computational Intelligence and Neuroscience ◽ 10.1155/2017/8501683 ◽ 2017 ◽ Vol 2017 ◽ pp. 1-9 ◽ Cited By ~ 14

Author(s): Hongchao Song, Zhuqing Jiang, Aidong Men, Bo Yang

Keyword(s): Anomaly Detection, Nearest Neighbor, Dimensional Space, High Dimensional Data, Real Life, Detection Methods, High Dimensional, Detection Accuracy, Detection Model, Anomaly Detector

Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute the neighborhood distance between each observation and suffer from the curse of dimensionality in high-dimensional space; for example, the distances between any pair of samples are similar and each sample may perform like an outlier. In this paper, we propose a hybrid semi-supervised anomaly detection model for high-dimensional data that consists of two parts: a deep autoencoder (DAE) and an ensemble k-nearest neighbor graphs- (K-NNG-) based anomaly detector. Benefiting from the ability of nonlinear mapping, the DAE is first trained to learn the intrinsic features of a high-dimensional dataset to represent the high-dimensional data in a more compact subspace. Several nonparametric KNN-based anomaly detectors are then built from different subsets that are randomly sampled from the whole dataset. The final prediction is made by all the anomaly detectors. The performance of the proposed method is evaluated on several real-life datasets, and the results confirm that the proposed hybrid model improves the detection accuracy and reduces the computational complexity.
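The ensemble half of this design can be sketched in a few lines. The sketch omits the deep autoencoder entirely and works on raw points, so it is not the hybrid model itself. It also assumes that each detector scores a query by its distance to the k-th nearest neighbor in a random subset, and that the ensemble averages those scores; the abstract only says that all detectors contribute to the final prediction.

```python
import math
import random


def knn_score(point, sample, k):
    """Distance from the point to its k-th nearest neighbor in the sample."""
    distances = sorted(math.dist(point, other) for other in sample)
    return distances[k - 1]


def ensemble_knn_scores(train, queries, n_detectors=10, subset_size=8, k=3, seed=0):
    """Average the k-NN distance over detectors built on random subsets of train."""
    rng = random.Random(seed)
    subsets = [rng.sample(train, subset_size) for _ in range(n_detectors)]
    return [
        sum(knn_score(q, subset, k) for subset in subsets) / n_detectors
        for q in queries
    ]


# Hypothetical compact representation: a 5x5 grid of normal points.
train = [(x / 10, y / 10) for x in range(5) for y in range(5)]
scores = ensemble_knn_scores(train, [(0.2, 0.2), (5.0, 5.0)])
```

Each detector only sorts `subset_size` distances instead of scanning the whole dataset, which illustrates where the reduction in computational cost comes from. In the paper's setting the inputs would be the autoencoder's compressed features rather than raw coordinates.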


Anomaly Detection Using XGBoost Ensemble of Deep Neural Network Models

Cybernetics and Information Technologies ◽ 10.2478/cait-2021-0037 ◽ 2021 ◽ Vol 21 (3) ◽ pp. 175-188

Author(s): Sumaiya Thaseen Ikram, Aswani Kumar Cherukuri, Babu Poorva, Pamidi Sai Ushasree, Yishuo Zhang, ...

Keyword(s): Neural Network, Deep Learning, Anomaly Detection, Deep Neural Network, Short Term Memory, Network Models, Neural Network Models, Detection Model, Detection Systems, Learning Techniques

Intrusion Detection Systems (IDSs) utilise deep learning techniques to identify intrusions with maximum accuracy and reduce false alarm rates. Feature extraction is also automated in these techniques. In this paper, an ensemble of different Deep Neural Network (DNN) models, namely MultiLayer Perceptron (MLP), BackPropagation Network (BPN) and Long Short Term Memory (LSTM), is stacked to build a robust anomaly detection model. The performance of the ensemble model is analysed on two datasets, namely UNSW-NB15 and a campus-generated dataset named VIT_SPARC20. Other types of traffic, namely unencrypted normal traffic, normal encrypted traffic, and encrypted and unencrypted malicious traffic, are captured in the VIT_SPARC20 dataset. Encrypted normal and malicious traffic of VIT_SPARC20 is categorised by the deep learning models without decrypting its contents, thus preserving the confidentiality and integrity of the data transmitted. XGBoost integrates the results of each deep learning model to achieve higher accuracy. From experimental analysis, it is inferred that UNSW-NB15 results in a maximal accuracy of 99.5%. On VIT_SPARC20, the accuracy, precision and recall are 99.4%, 98% and 97%, respectively.


Anomaly Detection on Gas Turbine Fuel System Using a Sequential Symbolic Method

10.20944/preprints201704.0071.v1 ◽ 2017

Author(s): Fei Li, Hongzhi Wang, Guowen Zhou, Daren Yu, Jianzhong Li, ...

Keyword(s): Anomaly Detection, Gas Turbine, Gas Turbines, Detection Methods, Fuel System, Detection Model, Anomalous Data, Symbolic Sequences, Finite State, Strong Ability

Anomaly detection plays a significant role in helping gas turbines run reliably and economically. Considering collective anomalous data and both the sensitivity and robustness of the anomaly detection model, a sequential symbolic anomaly detection method is proposed and applied to the gas turbine fuel system. A structural Finite State Machine is used to evaluate the posterior probabilities of observing symbolic sequences and the most probable state sequences they correspond to. Hence an estimating-based model and a decoding-based model are used to identify anomalies in two different ways. Experimental results indicate that both models perform well overall: the estimating-based model is stronger in robustness, while the decoding-based model is stronger in accuracy, particularly within a certain range of sequence lengths. Therefore, the proposed method can complement existing symbolic dynamic analysis-based anomaly detection methods, especially in the gas turbine domain.
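The two ways of using a probabilistic state machine mentioned above, estimating how likely a symbol sequence is and decoding its most probable state sequence, correspond to two classic computations. The sketch below uses a small hidden Markov model as a stand-in for the paper's structural Finite State Machine, which is an assumption. The states, symbols, and all probabilities are invented for illustration.

```python
def forward_likelihood(obs, states, start, trans, emit):
    """Estimating: probability of the whole symbol sequence under the model."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[p] * trans[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())


def viterbi_path(obs, states, start, trans, emit):
    """Decoding: most probable state sequence behind the symbol sequence."""
    best = {s: start[s] * emit[s][obs[0]] for s in states}
    back = []
    for symbol in obs[1:]:
        step, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[p] * trans[p][s])
            step[s] = best[prev] * trans[prev][s] * emit[s][symbol]
            pointers[s] = prev
        best = step
        back.append(pointers)
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical two-state model over symbols "a" (nominal) and "b" (unusual).
STATES = ["ok", "fault"]
START = {"ok": 0.9, "fault": 0.1}
TRANS = {"ok": {"ok": 0.9, "fault": 0.1}, "fault": {"ok": 0.2, "fault": 0.8}}
EMIT = {"ok": {"a": 0.8, "b": 0.2}, "fault": {"a": 0.1, "b": 0.9}}

likelihood = forward_likelihood(["a"], STATES, START, TRANS, EMIT)
decoded = viterbi_path(["b", "b", "b"], STATES, START, TRANS, EMIT)
```

An estimating-style detector would flag sequences whose likelihood is unusually low, while a decoding-style detector would flag sequences whose decoded path enters a fault state. For long sequences the products underflow, so a practical version would work with log probabilities.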


Threat Hunting in Windows Using Big Security Log Data

Security, Privacy, and Forensics Issues in Big Data - Advances in Information Security, Privacy, and Ethics ◽ 10.4018/978-1-5225-9742-1.ch007 ◽ 2020 ◽ pp. 168-188 ◽ Cited By ~ 1

Author(s): Mohammad Rasool Fatemi, Ali A. Ghorbani

Keyword(s): Intrusion Detection, Anomaly Detection, Detection System, Intrusion Detection Systems, Detection Methods, Sources Of Information, Detection Systems, System Logs, Analysis System, Anomaly Detection System

System logs are one of the most important sources of information for anomaly and intrusion detection systems. In a general log-based anomaly detection system, network, device, and host logs are all collected and used together for analysis and the detection of anomalies. However, the ever-increasing volume of logs remains one of the main challenges that anomaly detection tools face. Based on Sysmon, this chapter proposes a host-based log analysis system that detects anomalies without using network logs, both to reduce the volume and to show the importance of host-based logs. The authors implement a Sysmon parser to parse and extract features from the logs and apply detection methods to the resulting data. The valuable information is successfully retained after two extensive volume reduction steps. An anomaly detection system is proposed and evaluated on five different datasets with up to 55,000 events, and it detects the attacks using the preserved logs. The analysis results demonstrate the significance of host-based logs in auditing, security monitoring, and intrusion detection systems.
