Matrix Profile-Based Approach to Industrial Sensor Data Analysis Inside RDBMS

Mathematics ◽  
2021 ◽  
Vol 9 (17) ◽  
pp. 2146
Author(s):  
Mikhail Zymbler ◽  
Elena Ivanova

Currently, big sensor data arise in a wide spectrum of Industry 4.0, Internet of Things, and Smart City applications. In such domains, sensors tend to have a high sampling frequency and produce massive time series within a relatively short time interval. The data collected from the sensors are mined in order to make strategic decisions. In the article, we consider the problem of choosing a Time Series Database Management System (TSDBMS) that provides efficient storage and mining of big sensor data. We overview InfluxDB, OpenTSDB, and TimescaleDB, which are among the most popular state-of-the-art TSDBMSs and represent different categories of such systems, namely native systems, add-ons over NoSQL systems, and add-ons over relational DBMSs (RDBMSs), respectively. Our overview shows that, at present, TSDBMSs offer only a modest built-in toolset for mining big sensor data. This leads to the use of third-party mining systems and unwanted overhead costs due to exporting data outside the TSDBMS, data conversion, and so on. We propose an approach to managing and mining sensor data inside an RDBMS that exploits the Matrix Profile concept. A Matrix Profile is a data structure that annotates a time series with the index of, and the distance to, the nearest neighbor of each subsequence of the time series, and it serves as a basis for discovering motifs, anomalies, and other time-series data mining primitives. The approach is implemented as a PostgreSQL extension that allows an application programmer both to compute matrix profiles and mining primitives and to represent them as relational tables. Experimental case studies show that our approach surpasses the above-mentioned out-of-TSDBMS competitors in terms of performance, since sensor data are mined inside the TSDBMS with no significant overhead costs.
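The following is a minimal Python sketch of the Matrix Profile concept the abstract refers to, using the open-source stumpy library on synthetic data; it is not the authors' PostgreSQL extension, and the subsequence length and injected anomaly are illustrative assumptions.

```python
# Minimal sketch of the Matrix Profile concept using stumpy (illustrative only;
# the paper implements this functionality inside PostgreSQL).
import numpy as np
import stumpy

rng = np.random.default_rng(0)
ts = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.1 * rng.standard_normal(2000)
ts[700:750] += 3.0            # inject an anomalous segment

m = 100                       # subsequence length
mp = stumpy.stump(ts, m)      # per subsequence: nearest-neighbor distance and index
profile = mp[:, 0].astype(float)

motif_idx = int(np.argmin(profile))    # most-repeated pattern (motif)
discord_idx = int(np.argmax(profile))  # most anomalous subsequence (discord)
print(f"motif starts at {motif_idx}, discord starts at {discord_idx}")
```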

AI ◽  
2021 ◽  
Vol 2 (1) ◽  
pp. 48-70
Author(s):  
Wei Ming Tan ◽  
T. Hui Teo

Prognostic techniques attempt to predict the Remaining Useful Life (RUL) of a subsystem or a component. Such techniques often use sensor data that are periodically measured and recorded into a time series data set. These multivariate data sets form complex, non-linear inter-dependencies across recorded time steps and between sensors. Many existing prognostic algorithms have started to explore Deep Neural Networks (DNNs) and their effectiveness in the field. Although Deep Learning (DL) techniques outperform traditional prognostic algorithms, the networks are generally complex to deploy and train. This paper proposes a Multi-variable Time Series (MTS) focused approach to prognostics that implements a lightweight Convolutional Neural Network (CNN) with an attention mechanism. The convolution filters extract abstract temporal patterns from the multiple time series, while the attention mechanism reviews the information across the time axis and selects the relevant information. The results suggest that the proposed method not only produces superior RUL estimation accuracy but also trains many times faster than previously reported works. The advantage of deploying the network is also demonstrated on a lightweight hardware platform: the model is not only more compact but also more efficient in a resource-restricted environment.
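A hedged PyTorch sketch of the general idea (a lightweight 1-D CNN with attention pooling over the time axis for RUL regression) is shown below; the layer sizes, kernel sizes, and the 14-sensor window are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch: lightweight 1-D CNN with attention over time for RUL regression.
import torch
import torch.nn as nn

class CNNAttentionRUL(nn.Module):
    def __init__(self, n_sensors: int, hidden: int = 32):
        super().__init__()
        # convolution filters extract temporal patterns from each window
        self.conv = nn.Sequential(
            nn.Conv1d(n_sensors, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.attn = nn.Linear(hidden, 1)   # attention scores over time steps
        self.head = nn.Linear(hidden, 1)   # RUL regression head

    def forward(self, x):                  # x: (batch, n_sensors, time)
        h = self.conv(x).transpose(1, 2)   # (batch, time, hidden)
        w = torch.softmax(self.attn(h), dim=1)
        context = (w * h).sum(dim=1)       # attention-weighted summary over time
        return self.head(context).squeeze(-1)

model = CNNAttentionRUL(n_sensors=14)
rul = model(torch.randn(8, 14, 30))        # e.g. 30-step windows of 14 sensors
```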


Water ◽  
2021 ◽  
Vol 13 (14) ◽  
pp. 1944
Author(s):  
Haitham H. Mahmoud ◽  
Wenyan Wu ◽  
Yonghao Wang

This work develops a toolbox called WDSchain, built in MATLAB, that can simulate blockchain on water distribution systems (WDS). WDSchain can import data from Excel and the EPANET water modelling software. It extends EPANET to enable blockchain simulation of the hydraulic data at any intended nodes. Using WDSchain strengthens network automation and security in WDS. WDSchain can process time-series data in two simulation modes: (1) static blockchain, which takes a snapshot of one time interval of data from all nodes in the WDS as input and outputs it into chained blocks at a time, and (2) dynamic blockchain, which takes all simulated time-series data of all the nodes as input and establishes chained blocks at the simulated time. Five consensus mechanisms are developed in WDSchain to provide data at different security levels: PoW, PoT, PoV, PoA, and PoAuth. Five different sizes of WDS are simulated in WDSchain for performance evaluation. The results show that a trade-off is needed between system complexity and security level for data validation. WDSchain provides a methodology to further explore data validation using blockchain in WDS. As limitations, WDSchain does not consider selection of blockchain nodes or broadcasting delay, in contrast to commercial blockchain platforms.
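As a loose illustration of the "static blockchain" idea (one time interval of node readings chained into hash-linked blocks), here is a minimal Python sketch; it is not the WDSchain MATLAB toolbox, implements no consensus mechanism, and the node names and pressures are made up.

```python
# Sketch: hash-linking one time interval of hydraulic readings per block.
import hashlib, json, time

def make_block(readings: dict, prev_hash: str) -> dict:
    """readings: e.g. {node_id: pressure} for one simulation time step."""
    body = {"timestamp": time.time(), "readings": readings, "prev_hash": prev_hash}
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return body

chain = [make_block({"genesis": 0.0}, prev_hash="0" * 64)]
for step_readings in [{"N1": 52.3, "N2": 48.7}, {"N1": 51.9, "N2": 49.1}]:
    chain.append(make_block(step_readings, prev_hash=chain[-1]["hash"]))

# tampering with any earlier block breaks every later prev_hash link
print(all(chain[i]["prev_hash"] == chain[i - 1]["hash"] for i in range(1, len(chain))))
```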


2020 ◽  
Vol 2020 (1) ◽  
pp. 98-117
Author(s):  
Jyoti U. Devkota

Abstract Nightfires illuminating the Earth's surface are captured by satellite. They are emitted by various sources such as gas flares, biomass burning, volcanoes, and industrial sites such as steel mills. The amount of nightfire in an area is a proxy indicator of fuel consumption and CO2 emission. In this paper, the behavior of radiant heat (RH) data produced by nightfire is minutely analyzed over a period of 75 hours; the geographical coordinates of the energy sources generating these values are not considered. Visible Infrared Imaging Radiometer Suite Day/Night Band (VIIRS DNB) satellite earth observation nightfire data were used. This time series of 28,252 RH observations (unit: W) spans 75 hours, from 2 September 2018 to 6 September 2018. The dynamics of change in the overall behavior of these data with respect to time, irrespective of geographical occurrence, are studied and presented here. Different statistical methodologies are also used to identify hidden groups and patterns that are not obvious from remote sensing. Underlying groups and clusters are formed using Cluster Analysis and Discriminant Analysis. The behavior of RH over three consecutive days is studied using Analysis of Variance. Cubic spline interpolation and merging have been applied to create a time series sampled at equal one-minute intervals. The time series is decomposed to study the effect of various components. The behavior of these data is also analyzed in the frequency domain by studying the period, amplitude, and spectrum.
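A hedged sketch of the preprocessing and analysis steps named above (cubic spline interpolation onto a regular one-minute grid, classical decomposition, and a periodogram) follows; it uses synthetic data and scipy/statsmodels, since the VIIRS DNB radiant heat data themselves are not reproduced here.

```python
# Sketch: spline-resample an irregular series to a 1-minute grid, decompose it,
# and inspect the spectrum (synthetic stand-in for the RH data).
import numpy as np
import pandas as pd
from scipy.interpolate import CubicSpline
from scipy.signal import periodogram
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(1)
t_irr = np.unique(rng.uniform(0, 75 * 60, 5000))          # minutes, irregular sampling
rh = 100 + 20 * np.sin(2 * np.pi * t_irr / (24 * 60)) + rng.normal(0, 5, t_irr.size)

spline = CubicSpline(t_irr, rh)
t_regular = np.arange(0, 75 * 60)                          # equal one-minute intervals
rh_regular = spline(t_regular)

series = pd.Series(rh_regular,
                   index=pd.date_range("2018-09-02", periods=t_regular.size, freq="min"))
decomp = seasonal_decompose(series, period=24 * 60)        # daily seasonal component
freqs, power = periodogram(rh_regular, fs=1 / 60)          # frequency-domain view (Hz)
print(decomp.trend.dropna().head(), freqs[np.argmax(power[1:]) + 1])
```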


Author(s):  
Meenakshi Narayan ◽  
Ann Majewicz Fey

Abstract Sensor data predictions could significantly improve the accuracy and effectiveness of modern control systems; however, existing machine learning and advanced statistical techniques for forecasting time series data require significant computational resources, which is not ideal for real-time applications. In this paper, we propose a novel forecasting technique called Compact Form Dynamic Linearization Model-Free Prediction (CFDL-MFP), which is derived from the existing model-free adaptive control framework. This approach enables near real-time forecasts of seconds' worth of time-series data because it is formulated as an optimal control problem. The performance of the CFDL-MFP algorithm was evaluated using four real datasets: force sensor readings from a surgical needle, ECG measurements for heart rate, atmospheric temperature, and Nile water level recordings. On average, the forecast accuracy of CFDL-MFP was 28% better than the benchmark Autoregressive Integrated Moving Average (ARIMA) algorithm. The maximum computation time of CFDL-MFP was 49.1 ms, which was 170 times faster than ARIMA. Forecasts were best for deterministic data patterns, such as the ECG data, with a minimum average root mean squared error of 0.2 ± 0.2.
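The CFDL-MFP algorithm itself is specific to the paper; as a point of reference, here is a minimal sketch of the ARIMA benchmark it is compared against, using statsmodels on a synthetic periodic signal (the ARIMA order is chosen only for illustration).

```python
# Sketch: ARIMA benchmark forecast and RMSE on a held-out window.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
t = np.arange(1000)
signal = np.sin(2 * np.pi * t / 50) + 0.05 * rng.standard_normal(1000)

train, test = signal[:900], signal[900:]
model = ARIMA(train, order=(5, 0, 2)).fit()     # order chosen for illustration
forecast = model.forecast(steps=len(test))

rmse = float(np.sqrt(np.mean((forecast - test) ** 2)))
print(f"ARIMA RMSE on held-out window: {rmse:.3f}")
```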


2022 ◽  
Vol 3 (1) ◽  
pp. 1-26
Author(s):  
Omid Hajihassani ◽  
Omid Ardakanian ◽  
Hamzeh Khazaei

The abundance of data collected by sensors in Internet of Things devices and the success of deep neural networks in uncovering hidden patterns in time series data have led to mounting privacy concerns. This is because private and sensitive information can be potentially learned from sensor data by applications that have access to this data. In this article, we aim to examine the tradeoff between utility and privacy loss by learning low-dimensional representations that are useful for data obfuscation. We propose deterministic and probabilistic transformations in the latent space of a variational autoencoder to synthesize time series data such that intrusive inferences are prevented while desired inferences can still be made with sufficient accuracy. In the deterministic case, we use a linear transformation to move the representation of input data in the latent space such that the reconstructed data is likely to have the same public attribute but a different private attribute than the original input data. In the probabilistic case, we apply the linear transformation to the latent representation of input data with some probability. We compare our technique with autoencoder-based anonymization techniques and additionally show that it can anonymize data in real time on resource-constrained edge devices.
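As a loose illustration of the deterministic case described above (a linear move in the latent space of an autoencoder to suppress a private attribute before decoding), here is a Python sketch; the encoder/decoder are untrained placeholders standing in for a trained variational autoencoder, and the "private direction" is an assumed, precomputed vector rather than anything from the paper.

```python
# Sketch: shift a latent representation along an assumed private-attribute
# direction, then decode the obfuscated window.
import torch
import torch.nn as nn

latent_dim, window = 16, 128
encoder = nn.Sequential(nn.Flatten(), nn.Linear(window, 64), nn.ReLU(),
                        nn.Linear(64, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, window))

# hypothetical direction separating the private attribute in latent space
# (e.g. a difference of class means from a labelled calibration set)
private_direction = torch.randn(latent_dim)
private_direction /= private_direction.norm()

x = torch.randn(1, window)                       # one window of sensor data
z = encoder(x)
z_obf = z - (z @ private_direction).unsqueeze(-1) * private_direction
x_obf = decoder(z_obf)                           # reconstruction with the private
print(x_obf.shape)                               # component projected out
```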


2019 ◽  
Author(s):  
Girish L

Network and cloud data centers generate a large volume of data every second, and these data can be collected as time series data. A time series is a sequence of values taken at successive, equally spaced points in time; here, it is collected from system metrics such as CPU, memory, and disk utilization. The TICK Stack (Telegraf, InfluxDB, Chronograf, Kapacitor) is a platform of open source tools built to make the collection, storage, graphing, and alerting on time series data easy. As data collectors, the authors use both Telegraf and Collectd; for storing and analyzing the data, they use the time series database InfluxDB. For plotting and visualizing, they use Chronograf along with Grafana. Kapacitor is used for alert refinement: once system metric usage exceeds the specified threshold, an alert is generated and sent to the system administrator.
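For a feel of the kind of metric samples the collectors feed into InfluxDB, here is a small Python sketch that reads CPU, memory, and disk utilization with psutil and formats them in InfluxDB line protocol; it is a stand-in illustration, not part of the TICK Stack, and the measurement and tag names are invented.

```python
# Sketch: sample system metrics and emit InfluxDB line protocol.
import time
import psutil

def metrics_line(host: str = "node01") -> str:
    ts = time.time_ns()                       # line protocol uses ns timestamps
    cpu = psutil.cpu_percent(interval=1)
    mem = psutil.virtual_memory().percent
    disk = psutil.disk_usage("/").percent
    return f"system_metrics,host={host} cpu={cpu},mem={mem},disk={disk} {ts}"

print(metrics_line())   # e.g. system_metrics,host=node01 cpu=3.2,mem=41.0,disk=62.5 16620...
```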


When analyzing IoT projects, it is very expensive to buy a large number of sensors, corresponding processor boards, power supplies, etc. Moreover, the entire setup has to be replicated to cater to large topologies, and the whole experiment has to be planned at a large scale before analytics can actually be seen working. At a smaller scale, this can be implemented as a simulation program on Linux, where the sensor data are created using a random number generator and scaled appropriately for each type of sensor to mimic representative data. These data are then encrypted before being sent over the network to the edge nodes. At the server, a socket stream continuously awaits sensor data; there, the required sensor data are retrieved and decrypted to recover the true time series. This time series is then given to an analytics engine, which calculates trends and cyclicity and is used to train a neural network. The anomalies thus found are properly deciphered. The multiplicity of nodes can be represented by running several client programs in separate terminals. A simple client-server architecture is thus able to simulate a large IoT infrastructure and to perform analytics on a scaled model.
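A minimal Python sketch of such a simulation is given below: a client generates random, scaled "sensor" readings, encrypts them with a shared key, and streams them over a socket to a server that decrypts them back into a time series. Host, port, sensor name, and scaling are arbitrary assumptions; the symmetric Fernet cipher stands in for whatever encryption the original setup uses, and the analytics engine is only indicated by a comment.

```python
# Sketch: simulated IoT sensor client and edge-node server over a local socket.
import json, random, socket, time
from cryptography.fernet import Fernet

KEY = Fernet.generate_key()          # pre-shared key (shared out-of-band in practice)
HOST, PORT = "127.0.0.1", 9009

def run_server():
    cipher = Fernet(KEY)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT)); srv.listen()
        conn, _ = srv.accept()
        with conn, conn.makefile("rb") as stream:
            for line in stream:                          # one encrypted token per line
                reading = json.loads(cipher.decrypt(line.strip()))
                print("decrypted reading:", reading)     # hand off to analytics here

def run_client(sensor_id="temp01", scale=40.0, n=10):
    cipher = Fernet(KEY)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        for _ in range(n):
            reading = {"sensor": sensor_id, "t": time.time(),
                       "value": round(random.random() * scale, 2)}
            cli.sendall(cipher.encrypt(json.dumps(reading).encode()) + b"\n")
            time.sleep(0.2)

if __name__ == "__main__":
    import threading
    threading.Thread(target=run_server, daemon=True).start()
    time.sleep(0.5)
    run_client()      # run several clients in separate terminals to mimic many nodes
```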


Author(s):  
Roy Assaf ◽  
Anika Schumann

We demonstrate that CNN deep neural networks can be used not only for making predictions based on multivariate time series data, but also for explaining these predictions. This is important for a number of applications where predictions are the basis for decisions and actions; hence, confidence in the prediction result is crucial. We design a two-stage convolutional neural network architecture which uses particular kernel sizes. This allows us to utilise gradient-based techniques for generating saliency maps for both the time dimension and the features. These are then used to explain which features during which time intervals are responsible for a given prediction, as well as during which time intervals the joint contribution of all features was most important for that prediction. We demonstrate our approach for predicting the average energy production of photovoltaic power plants and for explaining these predictions.
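The following is a generic PyTorch sketch of gradient-based saliency for a multivariate time-series CNN; the tiny network, the 6 features, and the 96 time steps are illustrative assumptions and not the authors' two-stage architecture.

```python
# Sketch: input-gradient saliency over features and time for a 1-D CNN.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(6, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, 1),
)

x = torch.randn(1, 6, 96, requires_grad=True)    # 6 features, 96 time steps
pred = model(x)
pred.sum().backward()                            # gradient of the prediction w.r.t. the input

saliency = x.grad.abs().squeeze(0)               # (features, time)
per_time = saliency.sum(dim=0)                   # which time intervals mattered most
per_feature = saliency.sum(dim=1)                # which features mattered most
print(per_time.argmax().item(), per_feature.argmax().item())
```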


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Mahbubul Alam ◽  
Laleh Jalali ◽  
Ahmed Farahat ◽  
Chetan Gupta

Abstract Prognostics aims to predict the degradation of equipment by estimating its remaining useful life (RUL) and/or the failure probability within a specific time horizon. The high demand for equipment prognostics in industry has propelled researchers to develop robust and efficient prognostics techniques. Among data-driven techniques for prognostics, machine learning and deep learning (DL) based techniques, particularly Recurrent Neural Networks (RNNs), have gained significant attention due to their ability to effectively represent degradation progress by modeling dynamic temporal behavior. RNNs are well known for handling sequential data, especially continuous time series data that follow certain patterns. Such data are usually obtained from sensors attached to the equipment. However, in many scenarios sensor data are not readily available and are often very tedious to acquire. Conversely, event data are more common and can easily be obtained from the error logs saved by the equipment and transmitted to a backend for further processing. Nevertheless, performing prognostics using event data is substantially more difficult than with sensor data due to the unique nature of event data. Although event data are sequential, they differ from other common sequential data such as time series and natural language in the following ways: (i) unlike time series data, events may appear at any time, i.e., the appearance of events lacks periodicity; (ii) unlike natural language, event data do not follow any specific linguistic rule. Additionally, there may be significant variability in the event types appearing within the same sequence. Therefore, this paper proposes an RUL estimation framework to effectively handle such intricate and novel event data. The proposed framework takes discrete events generated by a piece of equipment (e.g., type, time, etc.) as input and generates, for each new event, an estimate of the remaining operating cycles in the life of a given component. To evaluate the efficacy of the proposed method, we conduct extensive experiments using benchmark datasets such as the CMAPSS data after converting the time-series data in these datasets to sequential event data. The event data conversion is carried out by careful exploration and application of appropriate transformation techniques to the time series. To the best of our knowledge, this is the first time such an event-based RUL estimation problem has been introduced to the community. Furthermore, we propose several deep learning and machine learning based solutions for the event-based RUL estimation problem. Our results suggest that the deep learning models 1D-CNN, LSTM, and multi-head attention show similar RMSE, MAE, and Score performance. Foreseeably, the XGBoost model achieves lower performance compared to the deep learning models, since it fails to capture ordering information from the sequence of events.
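A hedged PyTorch sketch of one of the deep learning baselines named above follows: an LSTM over discrete event sequences, combining an event-type embedding with the inter-event time gap to regress RUL. The vocabulary size, embedding width, and sequence length are illustrative assumptions, not the paper's settings.

```python
# Sketch: LSTM over (event type, time delta) sequences for RUL regression.
import torch
import torch.nn as nn

class EventRUL(nn.Module):
    def __init__(self, n_event_types: int = 50, emb: int = 16, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(n_event_types, emb)
        self.lstm = nn.LSTM(emb + 1, hidden, batch_first=True)  # +1 for the time delta
        self.head = nn.Linear(hidden, 1)

    def forward(self, event_ids, time_deltas):
        # event_ids: (batch, seq) long, time_deltas: (batch, seq) float
        x = torch.cat([self.embed(event_ids), time_deltas.unsqueeze(-1)], dim=-1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1]).squeeze(-1)   # RUL estimate after the latest event

model = EventRUL()
rul = model(torch.randint(0, 50, (4, 20)), torch.rand(4, 20))
```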


In this paper, we analyze, model, predict, and cluster Global Active Power, i.e., time series data obtained at one-minute intervals from electricity sensors of a household. We analyze changes in seasonality and trends to model the data. We then compare forecasting methods such as SARIMA and LSTM to forecast sensor data for the household and combine them to achieve a hybrid model that captures nonlinear variations better than either SARIMA or LSTM used in isolation. Finally, we cluster slices of time series data effectively using a novel clustering algorithm that combines density-based and centroid-based approaches, to discover relevant subtle clusters from sensor data. Our experiments have yielded meaningful insights from the data at both a micro (day-to-day) granularity and a macro (weekly to monthly) granularity.
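Below is a hedged Python sketch of the hybrid idea in general terms: a SARIMA model captures the linear and seasonal structure, and a small nonlinear regressor fitted on its lagged residuals (an MLP here, standing in for the LSTM) corrects what SARIMA misses. The synthetic series, model orders, and lag count are assumptions for illustration only.

```python
# Sketch: SARIMA forecast plus a residual-correction model, summed into a hybrid.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = np.arange(600)
power = 1.5 + np.sin(2 * np.pi * t / 24) + 0.1 * np.sin(t) + 0.05 * rng.standard_normal(600)
train, test = power[:480], power[480:]

sarima = SARIMAX(train, order=(1, 0, 1), seasonal_order=(1, 0, 1, 24)).fit(disp=False)
linear_forecast = sarima.forecast(steps=len(test))

# nonlinear model on lagged in-sample residuals
resid, lags = sarima.resid, 24
X = np.column_stack([resid[i:len(resid) - lags + i] for i in range(lags)])
y = resid[lags:]
mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X, y)

# roll the residual corrections forward from the latest residuals
window, corrections = list(resid[-lags:]), []
for _ in range(len(test)):
    corr = float(mlp.predict(np.array(window[-lags:]).reshape(1, -1))[0])
    corrections.append(corr)
    window.append(corr)
hybrid_forecast = linear_forecast + np.array(corrections)

print("SARIMA RMSE:", np.sqrt(np.mean((linear_forecast - test) ** 2)),
      "hybrid RMSE:", np.sqrt(np.mean((hybrid_forecast - test) ** 2)))
```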

