Anomaly detection and missing data imputation in building energy data for automated data pre-processing

Abstract A new trend in building automation is the implementation of smart energy management systems that measure and control building systems without the need for decision-making by human operators. Artificial intelligence can optimize these systems by predicting future demand so that individual equipment can be operated efficiently. These machine learning algorithms learn demand trends from historical data and require high-quality datasets to make accurate predictions. However, because of data transmission issues or sensor errors, real-world datasets often contain outliers or missing values. In most research settings these values can simply be omitted, but in practice anomalies compromise the automation system’s prediction accuracy, leaving it unable to maximize energy savings. This study explores machine learning algorithms for anomaly detection, namely cluster-based techniques such as k-means clustering and neural network-based approaches such as the autoencoder, for automatically pre-processing incoming data, using a case study of actual electrical demand data from a hospital building in Japan. Once anomalies were identified, the missing data were filled with predicted values from a deep neural network model. The newly composed data were then evaluated on detection accuracy, prediction accuracy and training time. The proposed method of processing anomalous values allows the prediction model to process collected data without interruption and achieves predictive accuracy similar to that obtained with manually processed data. These predictions allow energy systems to optimize HVAC equipment control, increasing energy savings and reducing peak building loads.
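As a rough illustration of the cluster-based idea in this abstract, the sketch below learns two load levels from a clean history with a 1-D k-means, flags incoming readings that are missing or far from every centroid, and fills the flagged positions. It is not the authors' pipeline: the readings, the distance threshold, and all function names are hypothetical, and plain linear interpolation stands in for the deep neural network that the study uses for imputation.

```python
# Minimal sketch, assuming hypothetical hourly load readings in kW.
# Linear interpolation is a stand-in for the paper's neural network imputer.

def kmeans_1d(values, k, iters=50):
    """Cluster scalar values with Lloyd's algorithm and return the centroids."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids over the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(g) / len(g) if g else centroids[j] for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def flag_anomalies(values, centroids, max_distance):
    """Flag readings that are missing (None) or far from every centroid."""
    return [v is None or min(abs(v - c) for c in centroids) > max_distance
            for v in values]


def fill_gaps(values, is_anomaly):
    """Replace flagged readings by interpolating between valid neighbours."""
    filled = list(values)
    valid = [i for i, bad in enumerate(is_anomaly) if not bad]
    for i, bad in enumerate(is_anomaly):
        if not bad:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None and right is None:
            continue  # nothing valid to interpolate from
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


history = [100, 102, 98, 101, 99, 199, 201, 200, 198, 202]  # hypothetical clean data
incoming = [101, 99, None, 198, 500, 202]                   # one gap, one spike

centroids = kmeans_1d(history, k=2)
flags = flag_anomalies(incoming, centroids, max_distance=20)
cleaned = fill_gaps(incoming, flags)
```

Fitting the centroids on a clean reference history, rather than on the incoming stream itself, keeps a single large spike from dragging a centroid toward it and masking the anomaly.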


Log Message Anomaly Detection with Oversampling

10.31224/osf.io/d4e6a ◽

2020 ◽

Author(s):

Amir Farzad ◽

T. Aaron Gulliver

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Anomaly Detection ◽

Learning Algorithms ◽

Imbalanced Data ◽

Machine Learning Algorithms ◽

Detection Accuracy ◽

Data Sets ◽

Significant Challenge ◽

Proposed Model

Imbalanced data is a significant challenge in classification with machine learning algorithms. This is particularly important with log message data: negative logs are sparse, so the data is typically imbalanced. In this paper, a model to generate text log messages is proposed which employs a SeqGAN network. An autoencoder is used for feature extraction, and anomaly detection is done using a GRU network. The proposed model is evaluated with three imbalanced log data sets, namely BGL, OpenStack, and Thunderbird. Results are presented which show that appropriate oversampling and data balancing improve anomaly detection accuracy.
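To make the balancing step concrete, here is a minimal sketch of the simplest form of oversampling: duplicating randomly chosen minority-class samples until every class matches the size of the largest one. This is deliberately not the paper's method, which generates new log messages with a SeqGAN; the function name, the seed parameter, and the toy log lines are assumptions made for illustration.

```python
import random

# Minimal sketch of random oversampling (duplication), not SeqGAN generation.
# The function name and the example log lines are hypothetical.

def random_oversample(samples, labels, seed=0):
    """Duplicate random minority samples until all classes have equal size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw with replacement from the class's own samples to fill the gap.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


logs = ["job finished"] * 8 + ["kernel panic", "disk failure"]
labels = [0] * 8 + [1] * 2  # 0 = normal, 1 = anomalous

balanced_logs, balanced_labels = random_oversample(logs, labels)
```

Oversampling is normally applied to the training split only, so that duplicated samples cannot leak into the data used for evaluation.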


Comparison of Anomaly Detection Accuracy of Host-based Intrusion Detection Systems based on Different Machine Learning Algorithms

International Journal of Advanced Computer Science and Applications ◽

10.14569/ijacsa.2020.0110233 ◽

2020 ◽

Vol 11 (2) ◽

Author(s):

Yukyung Shin ◽

Kangseok Kim

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Anomaly Detection ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Intrusion Detection Systems ◽

Detection Accuracy ◽

Detection Systems


Predicting rural patients' use of eHealth through supervised machine learning algorithms: A study on Portable Health Clinic in Bangladesh (Preprint)

10.2196/preprints.10761 ◽

2018 ◽

Author(s):

Nazmul Hossain ◽

Fumihiko Yokota ◽

Akira Fukuda ◽

Ashir Ahmed

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Logistic Regression ◽

Predictive Accuracy ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

Rural Patients

BACKGROUND Predictive analytics through machine learning has been used extensively across industries, including eHealth and mHealth, for analyzing patients’ health data, predicting diseases, enhancing the productivity of technology or devices used for providing healthcare services, and so on. However, few studies have been conducted to predict the usage of eHealth by rural patients in developing countries. OBJECTIVE The objective of this study is to predict rural patients’ use of eHealth through supervised machine learning algorithms and to propose the best-fitted model after evaluating their performance in terms of predictive accuracy. METHODS Data were collected between June and July 2016 through a field survey with a structured questionnaire from 292 randomly selected rural patients in a remote North-Western sub-district of Bangladesh. Four supervised machine learning algorithms, namely logistic regression, boosted decision tree, support vector machine, and artificial neural network, were chosen for this experiment. A ‘correlation-based feature selection’ technique was applied to include the most relevant but not redundant features in the model. A 10-fold cross-validation technique was applied to reduce bias and over-fitting of the data. RESULTS Logistic regression outperformed the other three algorithms with 85.9% predictive accuracy, 86.4% precision, 90.5% recall, 88.1% F-score, and an AUC of 91.5%, followed by neural network, decision tree and support vector machine with accuracy rates of 84.2%, 82.9%, and 80.4% respectively. CONCLUSIONS The findings of this study are expected to help eHealth practitioners select appropriate areas to serve and deal with both under-capacity and over-capacity by predicting patients’ responses in advance with a certain level of accuracy and precision.
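The 10-fold cross-validation mentioned in METHODS can be sketched as follows for the study's 292 respondents. This is only an illustration of how the folds partition the sample, under the assumption of contiguous, unshuffled folds; the function name is hypothetical and the abstract does not say how the authors built their splits.

```python
# Minimal sketch of k-fold index splitting, assuming contiguous folds.
# In practice the sample order is usually shuffled first.

def k_fold_indices(n_samples, k=10):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


folds = k_fold_indices(292, k=10)
fold_sizes = [len(test_idx) for _, test_idx in folds]
```

With 292 samples, two folds hold 30 test samples and eight hold 29, and every respondent appears in exactly one test fold. A model is trained on each training part, scored on the matching test part, and the ten scores are averaged.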


Predicting Porosity and Water Saturation from Well-Log Data Using Probabilistic Multi-Task Neural Network with Normalizing Flows

10.4043/31085-ms ◽

2021 ◽

Author(s):

Jinwoo Lee ◽

Minsu Kwon ◽

Youngjun Hong

Keyword(s):

Neural Network ◽

Machine Learning ◽

Prediction Accuracy ◽

Water Saturation ◽

Learning Algorithms ◽

Core Sample ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Well Log ◽

Sample Data

Abstract In the oil and gas exploration process, understanding the hydrocarbon distribution of a reservoir is important. Well-log and core sample data such as porosity and water saturation are widely used for this purpose. With porosity and water saturation, hydrocarbon volume can be calculated more accurately than with well-log data alone. However, because obtaining core sample data is expensive and time-consuming, predicting it from well-log data can be a valuable solution for early-stage exploration, since acquiring well-log data is relatively cheap and fast. Recently, numerous studies have applied machine learning algorithms to predict core data from well logs. To the best of our knowledge, most works provide point estimates without modeling a probability distribution. In this paper, we developed a probabilistic deep neural network that quantifies uncertainty via confidence intervals. In addition, we employed normalizing flows and multi-task learning to improve prediction accuracy. With this approach, we present the model's uncertainty, which can serve as reliable information for decision making. Furthermore, we demonstrate that our model outperforms other supervised machine learning algorithms with regard to prediction accuracy.


Anomaly Detection in Market Data Structures Via Machine Learning Algorithms

SSRN Electronic Journal ◽

10.2139/ssrn.3516028 ◽

2020 ◽

Author(s):

Dirk Röder ◽

Henning Mueller

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Data Structures ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Market Data


Anomaly Detection Technique for Intrusion Detection in SDN Environment using Continuous Data Stream Machine Learning Algorithms

2021 IEEE International Systems Conference (SysCon) ◽

10.1109/syscon48628.2021.9447092 ◽

2021 ◽

Author(s):

Admilson de Ribamar Lima Ribeiro ◽

Reneilson Yves Carvalho Santos ◽

Anderson Clayton Alves Nascimento

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Anomaly Detection ◽

Data Stream ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Detection Technique ◽

Continuous Data


Detecting TCP Flood DDoS Attack by Anomaly Detection based on Machine Learning Algorithms

10.1109/ubmk52708.2021.9558989 ◽

2021 ◽

Author(s):

Berkay Ozcam ◽

H. Hakan Kilinc ◽

Abdul Halim Zaim

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

DDoS Attack


EMOTIONS RECOGNITION IN HUMAN SPEECH USING DEEP NEURAL NETWORKS

Vestnik komp'iuternykh i informatsionnykh tekhnologii ◽

10.14489/vkit.2021.01.pp.044-051 ◽

2021 ◽

pp. 44-51

Author(s):

E. Yu. Shchetinin

Keyword(s):

Neural Network ◽

Machine Learning ◽

Neural Networks ◽

Convolutional Neural Network ◽

Recurrent Neural Network ◽

Deep Neural Networks ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Audio Recordings ◽

Computer Studies

The recognition of human emotions is one of the most relevant and dynamically developing areas of modern speech technologies, and the recognition of emotions in speech (RER) is the most in-demand part of it. In this paper, we propose a computer model of emotion recognition based on an ensemble of a bidirectional recurrent neural network with LSTM memory cells and the deep convolutional neural network ResNet18. Computer studies are carried out on the RAVDESS database of emotional human speech. RAVDESS is a data set containing 7356 files. Entries contain the following emotions: 0 – neutral, 1 – calm, 2 – happiness, 3 – sadness, 4 – anger, 5 – fear, 6 – disgust, 7 – surprise. In total, the database contains 16 classes (8 emotions divided into male and female) for a total of 1440 samples (speech only). To train machine learning algorithms and deep neural networks to recognize emotions, the audio recordings must be pre-processed to extract the main characteristic features of particular emotions. This was done using Mel-frequency cepstral coefficients, chroma coefficients, and characteristics of the frequency spectrum of the audio recordings. Various neural network models for emotion recognition are studied on the data described above. In addition, machine learning algorithms were used for comparative analysis. The following models were trained during the experiments: logistic regression (LR), a classifier based on the support vector machine (SVM), decision tree (DT), random forest (RF), gradient boosting over trees (XGBoost), a convolutional neural network CNN (ResNet18), a recurrent neural network RNN, and an ensemble of convolutional and recurrent networks, Stacked CNN-RNN. The results show that the neural networks achieved much higher accuracy in recognizing and classifying emotions than the machine learning algorithms used. Of the three neural network models presented, the CNN + BLSTM ensemble showed the highest accuracy.


A fault sensitivity analysis for anomaly detection in water distribution systems using Machine Learning algorithms

2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP) ◽

10.1109/iccp.2018.8516643 ◽

2018 ◽

Author(s):

Alexandru Predescu ◽

Mariana Mocanu ◽

Ciprian Lupu

Keyword(s):

Machine Learning ◽

Sensitivity Analysis ◽

Anomaly Detection ◽

Distribution Systems ◽

Water Distribution ◽

Learning Algorithms ◽

Water Distribution Systems ◽

Machine Learning Algorithms ◽

Fault Sensitivity


Detection of Spoofing Attack using Machine Learning based on Multi-Layer Neural Network in Single-Frequency GPS Receivers

Journal of Navigation ◽

10.1017/s0373463317000558 ◽

2017 ◽

Vol 71 (1) ◽

pp. 169-188 ◽

Cited By ~ 12

Author(s):

E. Shafiee ◽

M. R. Mosavi ◽

M. Moazedi

Keyword(s):

Neural Network ◽

Machine Learning ◽

Late Phase ◽

Machine Learning Algorithms ◽

Computational Method ◽

Single Frequency ◽

Detection Accuracy ◽

GPS Receiver ◽

Navigation Data ◽

Spoofing Detection

The importance of the Global Positioning System (GPS) and related electronic systems continues to increase in a range of environmental, engineering and navigation applications. However, civilian GPS signals are vulnerable to Radio Frequency (RF) interference. Spoofing is an intentional interference that aims to force a GPS receiver to acquire and track invalid navigation data. Analysis of spoofed and authentic signal patterns reveals differences in the phase, energy and imaginary components of the signal. In this paper, three main features, namely early-late phase, delta, and signal level, are extracted from the correlation output of the tracking loop. Using these features, spoofing detection can be performed with conventional machine learning algorithms such as K-Nearest Neighbours (KNN) and the naive Bayesian classifier. A Neural Network (NN) as a learning machine is a modern computational method for acquiring the required knowledge and predicting the output values in complicated systems. This paper presents a new approach for GPS spoofing detection based on a multi-layer NN whose inputs are indices of features. Simulation results on a software GPS receiver showed that the NN achieved adequate detection accuracy with a short detection time.
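For readers unfamiliar with the KNN baseline named in this abstract, the sketch below classifies a three-feature vector by majority vote among its nearest training points. The feature values are invented toy numbers standing in for (early-late phase, delta, signal level); they are not taken from the paper, and the function name and the choice of Euclidean distance are assumptions.

```python
import math
from collections import Counter

# Minimal sketch of k-nearest-neighbours voting on hypothetical 3-feature data.
# Feature order is assumed to be (early-late phase, delta, signal level).

def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy training set: values are illustrative only.
train_x = [
    (0.10, 0.20, 1.0), (0.20, 0.10, 1.1), (0.15, 0.15, 0.9),  # authentic
    (0.90, 0.80, 2.0), (1.00, 0.90, 2.2), (0.80, 1.00, 1.9),  # spoofed
]
train_y = ["authentic"] * 3 + ["spoofed"] * 3

label_a = knn_predict(train_x, train_y, (0.85, 0.90, 2.1))
label_b = knn_predict(train_x, train_y, (0.12, 0.18, 1.0))
```

On real data the features would usually be scaled to comparable ranges first, since Euclidean distance otherwise lets the feature with the largest numeric range dominate the vote.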
