Experimentations with OpenStack System Logs and Support Vector Machine for an Anomaly Detection Model in a Private Cloud Infrastructure

Enterprise systems typically produce a large number of logs to record runtime states and important events. Log anomaly detection is useful for business management and system maintenance. Most existing log-based anomaly detection methods use a log parser to obtain log event indexes or event templates and then apply machine learning methods to detect anomalies. However, these methods cannot handle unknown log types and do not take advantage of the semantic information in the logs. In this article, we propose ConAnomaly, a log-based anomaly detection model composed of a log sequence encoder (log2vec) and a multi-layer Long Short-Term Memory (LSTM) network. We designed log2vec based on the Word2vec model: it first vectorizes the words in the log content, then deletes invalid words through part-of-speech tagging, and finally obtains the sequence vector by the weighted average method. In this way, ConAnomaly not only captures semantic information in the log but also leverages the sequential relationships between logs. We evaluate our proposed approach on two log datasets. Our experimental results show that ConAnomaly has good stability, can deal with unseen log types to a certain extent, and performs better than most log-based anomaly detection methods.
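
The weighted-average step described in the abstract can be illustrated with a minimal sketch. It is not the authors' log2vec implementation: the toy word vectors, the per-word weights, and the `invalid_words` set (standing in for the part-of-speech filter) are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Set


def sequence_vector(
    tokens: Sequence[str],
    word_vectors: Dict[str, List[float]],
    weights: Dict[str, float],
    invalid_words: Set[str],
) -> List[float]:
    """Weighted average of the vectors of the words kept after filtering."""
    # Hypothetical stand-in for part-of-speech filtering: drop listed words
    # and any word that has no vector.
    kept = [t for t in tokens if t in word_vectors and t not in invalid_words]
    if not kept:
        raise ValueError("no valid words left in the log line")

    dim = len(word_vectors[kept[0]])
    total_weight = sum(weights.get(t, 1.0) for t in kept)
    if total_weight == 0.0:
        raise ValueError("weights of the kept words sum to zero")

    acc = [0.0] * dim
    for t in kept:
        w = weights.get(t, 1.0)  # words without an explicit weight count as 1.0
        for i, x in enumerate(word_vectors[t]):
            acc[i] += w * x
    return [x / total_weight for x in acc]
```

In practice the word vectors would come from a trained Word2vec model and the weights from whatever weighting scheme is chosen; the abstract does not specify either, so both are left as inputs here.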

MACHINE LEARNING METHODS IN MONITORING OPERATING BEHAVIOUR OF MARINE TWO-STROKE DIESEL ENGINE

Transport ◽

10.3846/transport.2020.14038 ◽

2020 ◽

Vol 35 (5) ◽

pp. 462-473

Author(s):

Aleksandar Vorkapić ◽

Radoslav Radonja ◽

Karlo Babić ◽

Sanda Martinčić-Ipšić

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Fuel Consumption ◽

Performance Monitoring ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Support Vector ◽

Operating Parameters ◽

Detection Model ◽

Modelling Framework

The aim of this article is to enhance performance monitoring of a two-stroke electronically controlled ship propulsion engine across its operating envelope. This is achieved by setting up a machine learning model capable of monitoring influential operating parameters and predicting the fuel consumption. The model is tested with different machine learning algorithms, namely linear regression, multilayer perceptron, Support Vector Machines (SVM) and Random Forests (RF). After verifying the modelling framework and analysing the results in order to improve the prediction accuracy, the best algorithm is selected based on standard evaluation metrics, i.e. Root Mean Square Error (RMSE) and Relative Absolute Error (RAE). Experimental results show that, given an adequate combination and processing of relevant sensory data, SVM exhibits the lowest RMSE (7.1032) and RAE (0.5313%). RF achieves the lowest RMSE (22.6137) and RAE (3.8545%) in the setting where a minimal number of input variables is considered, i.e. cylinder indicated pressures and propulsion engine revolutions. Further, the article deals with the detection of anomalies in operating parameters, which enables the evaluation of the propulsion engine condition and the early identification of failures and deterioration. Such a time-dependent, self-adapting anomaly detection model can be used for comparison with the initial condition recorded during the test and sea run or after survey and docking. Finally, we propose a unified model structure, incorporating the fuel consumption prediction and anomaly detection model with the on-board decision-making process regarding navigation and maintenance.
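
For reference, the two evaluation metrics named in the abstract can be computed as in the sketch below. It uses the standard textbook definitions (RAE as total absolute error relative to the total absolute deviation from the mean); the function names are illustrative, and the article may compute or scale the metrics differently, for example by reporting RAE as a percentage.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Square Error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def rae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Relative Absolute Error: model error relative to a mean-only predictor."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(actual) / len(actual)
    baseline = sum(abs(a - mean) for a in actual)
    if baseline == 0.0:
        raise ValueError("RAE is undefined when all observed values are equal")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / baseline
```

Multiplying the value returned by `rae` by 100 gives a percentage, the form in which the abstract reports it.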

Lightweight Anomaly Detection Scheme Using Incremental Principal Component Analysis and Support Vector Machine

Sensors ◽

10.3390/s21238017 ◽

2021 ◽

Vol 21 (23) ◽

pp. 8017

Author(s):

Nurfazrina M. Zamry ◽

Anazida Zainal ◽

Murad A. Rassam ◽

Eman H. Alkhammash ◽

Fuad A. Ghaleb ◽

...

Keyword(s):

Principal Component Analysis ◽

Support Vector Machine ◽

Sensor Networks ◽

Computational Complexity ◽

Anomaly Detection ◽

Principal Component ◽

Support Vector ◽

Communication Overhead ◽

Detection Scheme ◽

Memory Utilization

Wireless Sensor Networks (WSNs) have attracted significant attention from research and development because they are used to collect data in fields such as smart cities, power grids, transportation systems, medical sectors, military, and rural areas. Accurate and reliable measurements for insightful data analysis and decision-making are the ultimate goals of sensor networks in critical domains. However, the raw data collected by WSNs are usually unreliable and inaccurate due to the imperfect nature of WSNs. Identifying misbehaviours or anomalies in the network is important for the reliable and secure functioning of the network. However, due to resource constraints, designing a lightweight detection scheme is a major challenge in sensor networks. This paper aims to design and develop a lightweight anomaly detection scheme that improves efficiency by reducing computational complexity and communication overhead and improving memory utilization, while maintaining high accuracy. To achieve this aim, one-class learning and dimension reduction concepts were used in the design. The One-Class Support Vector Machine (OCSVM) with hyper-ellipsoid variance was used for anomaly detection due to its advantage in classifying unlabelled and multivariate data. Various OCSVM formulations were investigated, and Centred-Ellipsoid was adopted in this study because it was the most effective kernel among the formulations studied. To decrease the computational complexity and improve memory utilization, the dimensions of the data were reduced using the Candid Covariance-Free Incremental Principal Component Analysis (CCIPCA) algorithm. Extensive experiments were conducted to evaluate the proposed lightweight anomaly detection scheme. Results in terms of detection accuracy, memory utilization, computational complexity, and communication overhead show that the proposed scheme is effective and efficient compared with the few existing schemes evaluated. The proposed anomaly detection scheme achieved an accuracy higher than 98%, with O(nd) memory utilization and no communication overhead.
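
To give a sense of why an incremental, covariance-free update suits memory-constrained sensor nodes, the sketch below estimates only the first principal component from a stream of samples. It follows the commonly stated CCIPCA update with no amnesic parameter and assumes zero-mean input; it is an illustration, not the configuration used in the paper, and the function name is hypothetical.

```python
import math
from typing import Iterable, List, Sequence


def ccipca_first_component(samples: Iterable[Sequence[float]]) -> List[float]:
    """Incrementally estimate the first principal component of zero-mean data.

    The direction of the returned vector estimates the leading eigenvector of
    the covariance matrix and its norm estimates the leading eigenvalue. No
    covariance matrix is ever formed; only one vector of size d is stored.
    """
    it = iter(samples)
    v = list(next(it))  # initialise the estimate with the first sample
    for n, u in enumerate(it, start=2):
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            v = list(u)  # restart if the estimate has collapsed to zero
            continue
        # Projection of the new sample onto the current unit direction.
        projection = sum(a * b for a, b in zip(u, v)) / norm
        # Running average of u * (u . v / ||v||), updated one sample at a time.
        v = [((n - 1) / n) * vi + (projection / n) * ui for vi, ui in zip(v, u)]
    return v
```

Further components are usually obtained by removing each sample's projection onto the components already estimated and repeating the same update on the residual; that step is omitted here to keep the sketch short.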

Fraud Detection Model by Using Support Vector Machine Techniques

International Journal of Digital Content Technology and its Applications ◽

10.4156/jdcta.vol7.issue2.5 ◽

2013 ◽

Vol 7 (2) ◽

pp. 32-42 ◽

Cited By ~ 8

Author(s):

Shaio Yan Huang

Keyword(s):

Support Vector Machine ◽

Fraud Detection ◽

Support Vector ◽

Detection Model

Valid Probabilistic Anomaly Detection Models for System Logs

Wireless Communications and Mobile Computing ◽

10.1155/2020/8827185 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Chunbo Liu ◽

Lanlan Pan ◽

Zhaojun Gu ◽

Jialiang Wang ◽

Yitong Ren ◽

...

Keyword(s):

Anomaly Detection ◽

Large Scale ◽

Learning Algorithm ◽

Recall Rate ◽

Support Vector ◽

Fusion Algorithm ◽

Flexible Tool ◽

System Logs ◽

Output Only ◽

Better Than

System logs record the system status and important events during system operation in detail. Detecting anomalies in system logs is a common practice in modern large-scale distributed systems. Yet the threshold-based classification models used for anomaly detection output only two values, normal or abnormal, and provide no probability estimate of whether a prediction is correct. In this paper, the Venn-Abers predictor, a statistical learning algorithm, is adopted to evaluate the confidence of prediction results in the field of system log anomaly detection. It is able to calculate the probability distribution of labels for a set of samples and, to some extent, provide a quality assessment of the predicted labels. Two Venn-Abers predictors, LR-VA and SVM-VA, have been implemented based on Logistic Regression and Support Vector Machine, respectively. The differences among the algorithms are then exploited to build a multimodel fusion algorithm by Stacking, and a Venn-Abers predictor based on the Stacking algorithm, called Stacking-VA, is implemented. The performances of four types of algorithms (unimodel, Venn-Abers predictor based on unimodel, multimodel, and Venn-Abers predictor based on multimodel) are compared in terms of validity and accuracy. Experiments are carried out on a log dataset of the Hadoop Distributed File System (HDFS). For the comparative experiments on unimodels, the results show that the validities of LR-VA and SVM-VA are better than those of the two corresponding underlying models. Compared with the underlying model, the accuracy of the SVM-VA predictor is better than that of the LR-VA predictor and, more significantly, the recall rate increases from 81% to 94%. In the experiments on multiple models, the algorithm based on Stacking multimodel fusion is significantly superior to the underlying classifier. The average accuracy of Stacking-VA exceeds 0.95, and its predictions are more stable than those of LR-VA and SVM-VA. Experimental results show that the Venn-Abers predictor is a flexible tool that can make accurate and valid probability predictions in the field of system log anomaly detection.
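
As a rough illustration of how a Venn-Abers predictor turns a classifier score into a probability interval rather than a bare label, the sketch below implements the inductive variant as it is commonly described: isotonic regression is fitted on a calibration set twice, once with the test score labelled 0 and once labelled 1. The function names and the tiny calibration set are assumptions for illustration; the paper's LR-VA, SVM-VA and Stacking-VA implementations may differ.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, int]  # (classifier score, label in {0, 1})


def isotonic_value_at(points: List[Point], query: float) -> float:
    """Fit non-decreasing isotonic regression (PAVA) and evaluate it at query.

    The query score must be one of the scores in points.
    """
    # Aggregate points sharing a score so that ties receive one fitted value.
    groups: Dict[float, Tuple[float, int]] = {}
    for score, label in points:
        total, count = groups.get(score, (0.0, 0))
        groups[score] = (total + label, count + 1)

    blocks: List[List[float]] = []  # each block: [label sum, count, max score]
    for score in sorted(groups):
        total, count = groups[score]
        blocks.append([total, count, score])
        # Merge while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            t, c, hi = blocks.pop()
            blocks[-1][0] += t
            blocks[-1][1] += c
            blocks[-1][2] = hi

    for total, count, hi in blocks:
        if query <= hi:
            return total / count
    raise ValueError("query score is not among the fitted points")


def venn_abers_interval(calibration: List[Point], score: float) -> Tuple[float, float]:
    """Return (p0, p1), the lower and upper probability that the label is 1."""
    p0 = isotonic_value_at(calibration + [(score, 0)], score)
    p1 = isotonic_value_at(calibration + [(score, 1)], score)
    return p0, p1
```

The scores could come from any underlying model, such as an SVM decision function or a logistic regression output. A narrow interval suggests a more confident prediction, and a single probability is often derived as `p1 / (1 - p0 + p1)`.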
