Anomaly detection and missing data imputation in building energy data for automated data pre-processing

2021 ◽  
Vol 2069 (1) ◽  
pp. 012144
Author(s):  
K Takahashi ◽  
R Ooka ◽  
S Ikeda

Abstract A new trend in building automation is the implementation of smart energy management systems that measure and control building systems without the need for decision-making by human operators. Artificial intelligence can optimize these systems by predicting future demand to make informed decisions about how to operate individual equipment efficiently. These machine learning algorithms use historical data to learn demand trends and require high-quality datasets in order to make accurate predictions. But because of issues with data transmission or sensor errors, real-world datasets often contain outliers or missing data. In most research settings, these values can simply be omitted, but in practice, anomalies compromise the automation system’s prediction accuracy, rendering it unable to maximize energy savings. This study explores machine learning algorithms for anomaly detection, namely cluster-based techniques such as k-means clustering and neural network-based approaches such as the autoencoder, to automatically pre-process incoming data, using a case study of actual electrical demand in a hospital building in Japan. Once anomalies were identified, the missing data was filled with prediction values from a deep neural network model. The newly composed data was then evaluated based on detection accuracy, prediction accuracy and training time. The proposed method of processing anomalous values allows the prediction model to process collected data without interruption, and shows predictive accuracy similar to that obtained with manually processed data. These predictions allow energy systems to optimize HVAC equipment control, increasing energy savings and reducing peak building loads.
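
The detect-then-fill pipeline described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the demand values are invented, the centroids are fitted on a history window assumed to be clean, the threshold factor of 3 is an arbitrary assumption, and linear interpolation stands in for the deep neural network the study used for imputation.

```python
from statistics import mean


def kmeans_1d(values, k=2, iters=20):
    """Plain 1-D k-means (k >= 2); centroids start evenly spaced from min to max."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def distance_to_nearest(value, centroids):
    """Distance from a reading to its closest cluster centre."""
    return min(abs(value - c) for c in centroids)


def interpolate(values, flagged):
    """Fill flagged positions from the nearest unflagged neighbours
    (a stand-in for the DNN-based imputation in the study)."""
    filled = list(values)
    good = [i for i in range(len(values)) if i not in flagged]
    for i in flagged:
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is not None and right is not None:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
        elif left is not None:
            filled[i] = values[left]
        elif right is not None:
            filled[i] = values[right]
    return filled


# Hypothetical demand readings in kW: a night level near 100 and a day level near 200.
history = [100, 102, 98, 101, 200, 205, 198, 202]   # assumed clean
incoming = [101, 99, 0, 103, 201, 950, 199, 204]    # contains a dropout and a spike

centroids = kmeans_1d(history)
# Assumed rule: flag anything more than 3x the worst distance seen in clean history.
threshold = 3 * max(distance_to_nearest(v, centroids) for v in history)
flagged = [i for i, v in enumerate(incoming)
           if distance_to_nearest(v, centroids) > threshold]
filled = interpolate(incoming, flagged)

print(flagged)
print(filled)
```

Fitting on a clean window matters here: if the outliers were included when fitting, a centroid could settle on the outlier itself and the distance test would no longer flag it.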

2020 ◽  
Author(s):  
Amir Farzad ◽  
T. Aaron Gulliver

Imbalanced data is a significant challenge in classification with machine learning algorithms. This is particularly important for log message data: negative logs are sparse, so the data is typically imbalanced. In this paper, a model to generate text log messages is proposed which employs a SeqGAN network. An autoencoder is used for feature extraction, and anomaly detection is done using a GRU network. The proposed model is evaluated with three imbalanced log data sets, namely BGL, OpenStack, and Thunderbird. Results are presented which show that appropriate oversampling and data balancing improve anomaly detection accuracy.
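
To make the balancing step concrete, the sketch below oversamples the minority class until every class has as many samples as the largest one. It is only an illustration under stated assumptions: the paper generates new log messages with SeqGAN, whereas this example simply duplicates existing minority samples at random, and the log messages and labels are invented.

```python
import random
from collections import Counter


def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until all classes are the same size.
    (Random duplication is a stand-in for the SeqGAN generator used in the paper.)"""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical log messages: six normal entries and two anomalous ones.
logs = [
    "instance spawned", "heartbeat ok", "cache flushed",
    "request served", "heartbeat ok", "disk check passed",
    "kernel panic on node 12", "machine check interrupt",
]
labels = ["normal"] * 6 + ["anomaly"] * 2

balanced_logs, balanced_labels = oversample(logs, labels)
print(Counter(labels))
print(Counter(balanced_labels))
```

Oversampling should be applied to the training split only; duplicating samples before splitting would leak copies of the same message into the evaluation data.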


2018 ◽  
Author(s):  
Nazmul Hossain ◽  
Fumihiko Yokota ◽  
Akira Fukuda ◽  
Ashir Ahmed

BACKGROUND Predictive analytics through machine learning has been used extensively across industries, including eHealth and mHealth, for analyzing patients’ health data, predicting diseases, enhancing the productivity of technology or devices used for providing healthcare services, and so on. However, not enough studies have been conducted to predict the usage of eHealth by rural patients in developing countries. OBJECTIVE The objective of this study is to predict rural patients’ use of eHealth through supervised machine learning algorithms and to propose the best-fitted model after evaluating their performance in terms of predictive accuracy. METHODS Data were collected between June and July 2016 through a field survey with a structured questionnaire from 292 randomly selected rural patients in a remote North-Western sub-district of Bangladesh. Four supervised machine learning algorithms, namely logistic regression, boosted decision tree, support vector machine, and artificial neural network, were chosen for this experiment. A ‘correlation-based feature selection’ technique was applied to include the most relevant but not redundant features in the model. A 10-fold cross-validation technique was applied to reduce bias and over-fitting of the data. RESULTS Logistic regression outperformed the other three algorithms with 85.9% predictive accuracy, 86.4% precision, 90.5% recall, 88.1% F-score, and an AUC of 91.5%, followed by neural network, decision tree and support vector machine with accuracy rates of 84.2%, 82.9%, and 80.4% respectively. CONCLUSIONS The findings of this study are expected to be helpful for eHealth practitioners in selecting appropriate areas to serve and in dealing with both under-capacity and over-capacity by predicting the patients’ response in advance with a certain level of accuracy and precision.
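
The evaluation protocol in this abstract (10-fold cross-validation scored by precision, recall and F-score) can be sketched as below. This is an illustrative outline only: the sample size of 292 comes from the abstract, but the shuffling seed, the fold assignment scheme and the toy labels are assumptions, and no classifier is trained here.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists; every sample lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall and F1 computed from raw confusion counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# 292 respondents, as in the survey; each fold holds 29 or 30 of them.
splits = list(k_fold_indices(292, k=10))
print([len(test) for _, test in splits])

# Toy labels (1 = uses eHealth) to exercise the metric function.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

In a full experiment, each model would be fitted on `train` and scored on `test` for every split, and the per-fold metrics would then be averaged.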


2021 ◽  
Author(s):  
Jinwoo Lee ◽  
Minsu Kwon ◽  
Youngjun Hong

Abstract In the oil and gas exploration process, understanding the hydrocarbon distribution of a reservoir is important. Well-log and core sample data such as porosity and water saturation are widely used for this purpose. With porosity and water saturation, we can calculate hydrocarbon volume more accurately than with well-log data alone. However, as obtaining core sample data is expensive and time-consuming, predicting it from well-log data can be a valuable solution for early-stage exploration, since acquiring well-log data is relatively economical and swift. Recently, numerous studies have applied machine learning algorithms to predict core data from well-log data. To the best of our knowledge, most works provide point estimates without probabilistic distribution modeling. In this paper, we developed a probabilistic deep neural network to quantify uncertainty via confidence intervals. In addition, we employed normalizing flows and multi-task learning to improve prediction accuracy. With this approach, we present the model's uncertainty, which can serve as reliable information for decision making. Furthermore, we demonstrate that our model outperforms other supervised machine learning algorithms with regard to prediction accuracy.
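
As a rough illustration of reporting a prediction with a confidence interval rather than a point estimate, the sketch below turns a predicted mean and standard deviation into an interval. It assumes a Gaussian predictive distribution, which is a simplification: the paper uses normalizing flows, which can represent non-Gaussian distributions. The porosity numbers are invented.

```python
from statistics import NormalDist


def confidence_interval(mean, std, level=0.95):
    """Two-sided interval for a Gaussian prediction with the given mean and std."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return mean - z * std, mean + z * std


# Hypothetical network output for one depth sample: porosity mean and std.
low, high = confidence_interval(mean=0.20, std=0.02)
print(round(low, 4), round(high, 4))
```

A wide interval at a given depth signals that the prediction there should be treated with caution, which is the kind of information a point estimate cannot convey.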


Author(s):  
E. Yu. Shchetinin

The recognition of human emotions is one of the most relevant and dynamically developing areas of modern speech technologies, and the recognition of emotions in speech (RER) is the most in-demand part of it. In this paper, we propose a computer model of emotion recognition based on an ensemble of a bidirectional recurrent neural network with LSTM memory cells and the deep convolutional neural network ResNet18. Computer studies are carried out on the RAVDESS database, which contains emotional human speech. RAVDESS is a data set containing 7356 files. Entries contain the following emotions: 0 – neutral, 1 – calm, 2 – happiness, 3 – sadness, 4 – anger, 5 – fear, 6 – disgust, 7 – surprise. In total, the database contains 16 classes (8 emotions divided into male and female) for a total of 1440 samples (speech only). To train machine learning algorithms and deep neural networks to recognize emotions, existing audio recordings must be pre-processed so as to extract the main characteristic features of particular emotions. This was done using Mel-frequency cepstral coefficients, chroma coefficients, and the characteristics of the frequency spectrum of the audio recordings. Various neural network models for emotion recognition are studied on the data described above. In addition, machine learning algorithms were used for comparative analysis. Thus, the following models were trained during the experiments: logistic regression (LR), a classifier based on the support vector machine (SVM), decision tree (DT), random forest (RF), gradient boosting over trees (XGBoost), a convolutional neural network CNN (ResNet18), a recurrent neural network RNN, and an ensemble of convolutional and recurrent networks, Stacked CNN-RNN. The results show that the neural networks achieved much higher accuracy in recognizing and classifying emotions than the classical machine learning algorithms used. Of the three neural network models presented, the CNN + BLSTM ensemble showed the highest accuracy.


2017 ◽  
Vol 71 (1) ◽  
pp. 169-188 ◽  
Author(s):  
E. Shafiee ◽  
M. R. Mosavi ◽  
M. Moazedi

The importance of the Global Positioning System (GPS) and related electronic systems continues to increase in a range of environmental, engineering and navigation applications. However, civilian GPS signals are vulnerable to Radio Frequency (RF) interference. Spoofing is an intentional intervention that aims to force a GPS receiver to acquire and track invalid navigation data. Analysis of spoofed and authentic signal patterns shows that they differ in the phase, energy and imaginary components of the signal. In this paper, three main features, namely early-late phase, delta, and signal level, are extracted from the correlation output of the tracking loop. Using these features, spoofing detection can be performed with conventional machine learning algorithms such as K-Nearest Neighbours (KNN) and the naive Bayesian classifier. A Neural Network (NN) as a learning machine is a modern computational method for collecting the required knowledge and predicting the output values in complicated systems. This paper presents a new approach to GPS spoofing detection based on a multi-layer NN whose inputs are indices of features. Simulation results on a software GPS receiver showed that the NN achieved adequate detection accuracy with a short detection time.
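
The conventional baseline mentioned in this abstract, a K-Nearest Neighbours classifier over the three extracted features, can be sketched as follows. This is a toy illustration rather than the paper's method: the feature values are invented, no real tracking-loop output is used, and the paper's own contribution is a multi-layer neural network, not KNN.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors: (early-late phase, delta, signal level).
train_x = [
    (0.1, 0.0, 1.0), (0.2, 0.1, 1.1), (0.0, 0.1, 0.9),   # authentic signals
    (0.9, 0.8, 2.0), (1.0, 0.9, 2.2), (0.8, 1.0, 1.9),   # spoofed signals
]
train_y = ["authentic"] * 3 + ["spoofed"] * 3

print(knn_predict(train_x, train_y, (0.15, 0.05, 1.0)))
print(knn_predict(train_x, train_y, (0.95, 0.90, 2.1)))
```

Because KNN relies on raw distances, features on different scales should normally be standardized first so that no single feature dominates the vote.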

