Log Message Anomaly Detection with Oversampling

Mapping Intimacies ◽

10.31224/osf.io/d4e6a ◽

2020 ◽

Author(s):

Amir Farzad ◽

T. Aaron Gulliver

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Anomaly Detection ◽

Learning Algorithms ◽

Imbalanced Data ◽

Machine Learning Algorithms ◽

Detection Accuracy ◽

Data Sets ◽

Significant Challenge ◽

Proposed Model

Imbalanced data is a significant challenge in classification with machine learning algorithms. This is particularly important with log message data, as negative logs are sparse and the data is therefore typically imbalanced. In this paper, a model to generate text log messages is proposed which employs a SeqGAN network. An Autoencoder is used for feature extraction, and anomaly detection is done using a GRU network. The proposed model is evaluated with three imbalanced log data sets, namely BGL, OpenStack, and Thunderbird. Results are presented which show that appropriate oversampling and data balancing improve anomaly detection accuracy.
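To make the idea of oversampling concrete, the sketch below balances a toy log data set by duplicating minority-class messages at random. This is plain random oversampling, a much simpler stand-in for the SeqGAN-based generation described in the abstract, and it is not the authors' method. The function name `random_oversample`, the sample messages, and the label convention (1 for anomalous) are assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(messages, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class.

    Hypothetical helper: plain random oversampling, not SeqGAN generation.
    """
    rng = random.Random(seed)

    # Group the messages by their class label.
    by_class = {}
    for msg, lab in zip(messages, labels):
        by_class.setdefault(lab, []).append(msg)

    # Every class is grown to the size of the largest class.
    target = max(len(msgs) for msgs in by_class.values())

    out_msgs, out_labels = [], []
    for lab, msgs in by_class.items():
        extra = [rng.choice(msgs) for _ in range(target - len(msgs))]
        for m in msgs + extra:
            out_msgs.append(m)
            out_labels.append(lab)
    return out_msgs, out_labels


# Toy data (invented): four normal messages, one anomalous message.
logs = ["job started", "job finished", "heartbeat ok", "cache hit", "kernel panic"]
labels = [0, 0, 0, 0, 1]

balanced_logs, balanced_labels = random_oversample(logs, labels)
print(Counter(balanced_labels))
```

Duplicating samples adds no new information, which is presumably one motivation for generating new messages with a sequence model instead; the sketch only shows what a balanced training set looks like.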


Comparison of Anomaly Detection Accuracy of Host-based Intrusion Detection Systems based on Different Machine Learning Algorithms

International Journal of Advanced Computer Science and Applications ◽

10.14569/ijacsa.2020.0110233 ◽

2020 ◽

Vol 11 (2) ◽

Author(s):

Yukyung Shin ◽

Kangseok Kim

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Anomaly Detection ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Intrusion Detection Systems ◽

Detection Accuracy ◽

Detection Systems


A Research on Deep Learning Advance for Landslide Classification using Convolutional Neural Networks

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f1184.0486s419 ◽

2019 ◽

Vol 8 (6S4) ◽

pp. 903-906

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Feature Extraction ◽

Deep Learning ◽

Convolutional Neural Networks ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Data Set ◽

Proposed Model

Landslides can cause severe loss of human life and property. The increasing rate of human settlement in the mountains has raised safety concerns. Landslides have caused economic losses of 1-2% of GDP in many developing countries. In this study, we discuss a deep learning approach to detect landslides. Convolutional Neural Networks are used for feature extraction in our proposed model. As no exact and precise data set was available for feature extraction, a new data set was built for testing the model. We tested our proposed model and compared it with other machine learning algorithms such as Logistic Regression, Random Forest, AdaBoost, K-Nearest Neighbors and Support Vector Machine. Our proposed deep learning model achieves a classification accuracy of 96.90%, outperforming the classical machine learning algorithms.


Anomaly detection and missing data imputation in building energy data for automated data pre-processing

Journal of Physics Conference Series ◽

10.1088/1742-6596/2069/1/012144 ◽

2021 ◽

Vol 2069 (1) ◽

pp. 012144

Author(s):

K Takahashi ◽

R Ooka ◽

S Ikeda

Keyword(s):

Neural Network ◽

Machine Learning ◽

Missing Data ◽

Anomaly Detection ◽

Prediction Accuracy ◽

Energy Savings ◽

Predictive Accuracy ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Detection Accuracy

A new trend in building automation is the implementation of smart energy management systems to measure and control building systems without a need for decision-making by human operators. Artificial intelligence can optimize these systems by predicting future demand to make informed decisions about how to efficiently operate individual equipment. These machine learning algorithms use historical data to learn demand trends and require high-quality datasets in order to make accurate predictions. However, because of issues with data transmission or sensor errors, real-world datasets often contain outliers or missing data. In most research settings, these values can simply be omitted, but in practice, anomalies compromise the automation system’s prediction accuracy, rendering it unable to maximize energy savings. This study explores different machine learning algorithms for anomaly detection, namely cluster-based techniques such as k-means clustering and neural network-based approaches such as the autoencoder, for automatically pre-processing incoming data, using a case study on actual electrical demand in a hospital building in Japan. Once anomalies were identified, the missing data was filled with prediction values from a deep neural network model. The newly composed data was then evaluated based on detection accuracy, prediction accuracy and training time. The proposed method of processing anomalous values allows the prediction model to process collected data without interruption, and achieves predictive accuracy similar to that obtained by manually processing the data. These predictions allow energy systems to optimize HVAC equipment control, increasing energy savings and reducing peak building loads.
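The detect-then-fill pipeline described above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the study's implementation: it runs a plain one-dimensional k-means, flags readings that lie unusually far from their nearest centroid, and fills the flagged readings by linear interpolation as a simple stand-in for the deep neural network predictions mentioned in the abstract. All function names, the threshold factor of 3, and the demand values are invented for illustration.

```python
from statistics import mean, median


def kmeans_1d(values, k=2, iters=20):
    """Plain Lloyd's algorithm on scalars; centroids start at evenly spaced quantiles."""
    s = sorted(values)
    centroids = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return centroids


def flag_anomalies(values, centroids, factor=3.0):
    """Flag points whose distance to the nearest centroid exceeds
    factor times the median distance (hypothetical threshold rule)."""
    dists = [min(abs(v - c) for c in centroids) for v in values]
    threshold = factor * median(dists)
    return [d > threshold for d in dists]


def impute_linear(values, flags):
    """Replace flagged points by linear interpolation between the nearest
    unflagged neighbours (stand-in for a learned prediction model)."""
    good = [i for i, f in enumerate(flags) if not f]
    out = list(values)
    for i, flagged in enumerate(flags):
        if not flagged or not good:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out


# Invented demand readings (kW): a low level, a high level, and one spike.
demand = [50, 52, 51, 49, 120, 122, 121, 119, 500, 50, 51, 121]

centroids = kmeans_1d(demand, k=2)
flags = flag_anomalies(demand, centroids)
cleaned = impute_linear(demand, flags)
print(flags)
print(cleaned)
```

Note that the spike pulls its cluster centroid upward, a known weakness of k-means with outliers; a median-based threshold is used here so that the spike still stands out against the typical distances.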


Anomaly Detection in Market Data Structures Via Machine Learning Algorithms

SSRN Electronic Journal ◽

10.2139/ssrn.3516028 ◽

2020 ◽

Author(s):

Dirk Röder ◽

Henning Mueller

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Data Structures ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Market Data


Anomaly Detection Technique for Intrusion Detection in SDN Environment using Continuous Data Stream Machine Learning Algorithms

2021 IEEE International Systems Conference (SysCon) ◽

10.1109/syscon48628.2021.9447092 ◽

2021 ◽

Author(s):

Admilson de Ribamar Lima Ribeiro ◽

Reneilson Yves Carvalho Santos ◽

Anderson Clayton Alves Nascimento

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Anomaly Detection ◽

Data Stream ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Detection Technique ◽

Continuous Data


Performance Analysis of Machine Learning Algorithms and Feature Extraction Methods for Sentiment Analysis

10.1109/icses52305.2021.9633882 ◽

2021 ◽

Author(s):

Anshumaan Chauhan ◽

Ayushi Agarwal ◽

Razia Sulthana

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Performance Analysis ◽

Sentiment Analysis ◽

Learning Algorithms ◽

Extraction Methods ◽

Machine Learning Algorithms


Detecting TCP Flood DDoS Attack by Anomaly Detection based on Machine Learning Algorithms

10.1109/ubmk52708.2021.9558989 ◽

2021 ◽

Author(s):

Berkay Ozcam ◽

H. Hakan Kilinc ◽

Abdul Halim Zaim

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

DDoS Attack


Comparison of Machine Learning Algorithms to Recognize Human Activities from Images and Videos Using Pose Estimation and Feature Extraction

Proceedings of the Future Technologies Conference (FTC) 2020, Volume 1 - Advances in Intelligent Systems and Computing ◽

10.1007/978-3-030-63128-4_7 ◽

2020 ◽

pp. 78-87

Author(s):

Md Hasibul Huq ◽

Mohammed Alnakli ◽

Zakiya Jafrin ◽

Tanjima Nasreen Jenia

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Pose Estimation ◽

Human Activities ◽

Learning Algorithms ◽

Machine Learning Algorithms


A fault sensitivity analysis for anomaly detection in water distribution systems using Machine Learning algorithms

2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP) ◽

10.1109/iccp.2018.8516643 ◽

2018 ◽

Author(s):

Alexandru Predescu ◽

Mariana Mocanu ◽

Ciprian Lupu

Keyword(s):

Machine Learning ◽

Sensitivity Analysis ◽

Anomaly Detection ◽

Distribution Systems ◽

Water Distribution ◽

Learning Algorithms ◽

Water Distribution Systems ◽

Machine Learning Algorithms ◽

Fault Sensitivity


SeisBench: A toolbox for benchmarking and applying machine learning in seismology.

10.5194/egusphere-egu21-12218 ◽

2021 ◽

Author(s):

Jack Woollam ◽

Jannes Münchmeyer ◽

Carlo Giunchi ◽

Dario Jozinovic ◽

Tobias Diehl ◽

...

Keyword(s):

Machine Learning ◽

Model Comparison ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Quality Data ◽

Data Sets ◽

Waveform Data ◽

Detection Techniques ◽

Benchmark Data

Machine learning methods have seen widespread adoption within the seismological community in recent years due to their ability to effectively process large amounts of data, while equalling or surpassing the performance of human analysts or classic algorithms. In the wider machine learning world, for example in imaging applications, the open availability of extensive high-quality datasets for training, validation, and the benchmarking of competing algorithms is seen as a vital ingredient to the rapid progress observed throughout the last decade. Within seismology, vast catalogues of labelled data are readily available, but collecting the waveform data for millions of records and assessing the quality of training examples is a time-consuming, tedious process. The natural variability in source processes and seismic wave propagation also presents a critical problem during training. The performance of models trained on different regions, distances and magnitude ranges is not easily comparable. The inability to easily compare and contrast state-of-the-art machine learning-based detection techniques on varying seismic data sets is currently a barrier to further progress within this emerging field. We present SeisBench, an extensible open-source framework for training, benchmarking, and applying machine learning algorithms. SeisBench provides access to various benchmark data sets and models from the literature, along with pre-trained model weights, through a unified API. Built to be extensible and modular, SeisBench allows for the simple addition of new models and data sets, which can be easily interchanged with existing pre-trained models and benchmark data. Standardising access to data of varying quality, and to the associated metadata, simplifies comparison workflows, enabling the development of more robust machine learning algorithms.
We initially focus on phase detection, identification and picking, but the framework is designed to be extended for other purposes, for example direct estimation of event parameters. Users will be able to contribute their own benchmarks and (trained) models. In the future, it will thus be much easier both to compare the performance of new algorithms against published machine learning models/architectures and to check the performance of established algorithms against new data sets. We hope that the ease of validation and inter-model comparison enabled by SeisBench will serve as a catalyst for the development of the next generation of machine learning techniques within the seismological community. The SeisBench source code will be published with an open license, and community involvement is explicitly encouraged.
