Improving discretization based pattern discovery for multivariate time series by additional preprocessing

2021 ◽  
Vol 25 (5) ◽  
pp. 1051-1072
Author(s):  
Fabian Kai-Dietrich Noering ◽  
Konstantin Jonas ◽  
Frank Klawonn

In technical systems, the analysis of similar load situations is a promising technique for gaining information about the system’s state, health, or wear. Load situations are often difficult to define by hand. Hence, they need to be discovered as recurrent patterns within multivariate time series data of the system under consideration. Unsupervised algorithms for finding such recurrent patterns in multivariate time series must be able to cope with very large data sets because the system might be observed over a very long time. In our previous work we identified discretization-based approaches as particularly promising for variable-length pattern discovery because of their low computing time, which results from the simplification (symbolization) of the time series. In this paper we propose additional preprocessing steps for the symbolic representation of time series, aiming for enhanced multivariate pattern discovery. Beyond that, we show the performance (quality and computing time) of our algorithms on a synthetic test data set as well as on a real-life example with 100 million time points. We also test our approach with increasing dimensionality of the time series.
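To illustrate what symbolization of a time series can look like, the following is a minimal SAX-style sketch. It is not the authors' method: the four-letter alphabet, the breakpoints, the fixed segment length, and the function names `symbolize` and `symbolize_multivariate` are all assumptions chosen for illustration.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Assumed breakpoints: they split a standard normal distribution into
# four roughly equiprobable bins, as is common in SAX-style discretization.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def symbolize(series, segment_len):
    """Z-normalize a series, average it over fixed-length segments,
    and map each segment mean to a letter."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        normed = [0.0 for _ in series]  # constant series: no variation to encode
    else:
        normed = [(x - mu) / sigma for x in series]
    symbols = []
    for start in range(0, len(normed) - segment_len + 1, segment_len):
        seg_mean = mean(normed[start:start + segment_len])
        symbols.append(ALPHABET[bisect_right(BREAKPOINTS, seg_mean)])
    return "".join(symbols)


def symbolize_multivariate(channels, segment_len):
    """Symbolize each channel separately, then combine the letters of all
    channels per segment into one multivariate symbol."""
    words = [symbolize(ch, segment_len) for ch in channels]
    return ["".join(letters) for letters in zip(*words)]
```

Once the series is reduced to a short symbol sequence, recurrent patterns can be searched for with string-based methods, which is where the low computing time of such approaches comes from.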

2021 ◽  
pp. 1-20
Author(s):  
Fabian Kai-Dietrich Noering ◽  
Yannik Schroeder ◽  
Konstantin Jonas ◽  
Frank Klawonn

In technical systems, the analysis of similar situations is a promising technique for gaining information about the system’s state, health, or wear. Very often, situations cannot be defined in advance but need to be discovered as recurrent patterns within time series data of the system under consideration. This paper assesses different approaches to discovering frequent variable-length patterns in time series. Because of the success of artificial neural networks (NNs) in various research fields, a particular focus of this work is the applicability of NNs to the problem of pattern discovery in time series. We therefore applied and adapted a Convolutional Autoencoder and compared it to classical non-learning approaches based on Dynamic Time Warping, on time series discretization, and on the Matrix Profile. These non-learning approaches were also adapted to fulfill our requirements, such as the discovery of potentially time-scaled patterns in noisy time series. We evaluated the performance (quality, computing time, parametrization effort) of these approaches in an extensive test with synthetic data sets. Additionally, the transferability to other data sets was tested using real-life vehicle data. We demonstrated the ability of Convolutional Autoencoders to discover patterns in an unsupervised way. Furthermore, the tests showed that the Autoencoder is able to discover patterns with a quality similar to that of classical non-learning approaches.
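As a point of reference for the Dynamic Time Warping baseline mentioned above, here is a minimal sketch of the textbook DTW distance between two univariate sequences. It is the unconstrained classical formulation with an absolute-difference cost, not the adapted variant used in the paper; the function name `dtw_distance` is an assumption.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance, O(len(a) * len(b)) time,
    using the absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because one point may be matched to several points of the other sequence, a pattern and a time-stretched copy of it can still have a distance of zero, which is why DTW is a natural candidate for time-scaled patterns.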


2020 ◽  
Vol 6 (2) ◽  
pp. 195
Author(s):  
Hasrun Afandi Umpusinga ◽  
Atika Riasari ◽  
Fajrin Satria Dwi Kesumah

Indonesia has recently become one of the largest users of sharia-compliant financial products, which raises the question of how the stocks listed in the country's most valuable sharia stock index perform and correlate with other variables, particularly exchange rates. The study aims to analyse the causal relationship between sharia-based stocks, their Islamic index in Indonesia, and the volatility of the exchange rate, and to forecast their performance. A Vector Autoregressive (VAR) model is applied to analyse the multivariate time series, as it is considered a suitable model for predicting such multivariate time-series data. The findings suggest that a VAR(1) model is the best-fitting model both for analysing the dynamic relationship and for forecasting the data set for the next 24 weeks. While the prediction shows an increasing trend for the JII, both ANTM and EXR are predicted to have stable volatility. In addition, Granger causality identifies which variables affect one another, and the impulse response function (IRF) shows that a shock in one variable makes it relatively difficult for another variable to return to zero in the short term.
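A VAR(1) model predicts each variable from the previous values of all variables, y_t = c + A y_(t-1). The sketch below shows how a multi-step forecast follows from that recursion once c and A are known. The coefficient values in the usage note are invented for illustration and are not the estimates from the study; the function name `var1_forecast` is likewise an assumption.

```python
def var1_forecast(c, A, y_last, steps):
    """Iterate y_t = c + A @ y_(t-1) for the given number of steps.

    c      : intercept vector (list of floats)
    A      : coefficient matrix (list of rows)
    y_last : last observed value of every variable
    """
    forecasts = []
    y = list(y_last)
    for _ in range(steps):
        y = [
            c[i] + sum(A[i][j] * y[j] for j in range(len(y)))
            for i in range(len(c))
        ]
        forecasts.append(y)
    return forecasts
```

For example, with the hypothetical values c = [1.0, 0.0], A = [[0.5, 0.0], [0.0, 1.0]] and y_last = [4.0, 3.0], calling var1_forecast(c, A, y_last, 24) yields 24 weekly forecasts for both variables.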


2019 ◽  
Vol 14 ◽  
pp. 155892501988346 ◽  
Author(s):  
Mine Seçkin ◽  
Ahmet Çağdaş Seçkin ◽  
Aysun Coşkun

Although textile production is heavily automated, it is viewed as largely untouched territory with regard to Industry 4.0. Integrating these developments into the textile sector is expected to increase efficiency. A review of data mining and machine learning studies in the textile sector shows a lack of shared production-process data, because enterprises withhold it for commercial and confidentiality reasons. This study presents a method for simulating a production process and performing regression on the resulting time series data with machine learning. The simulation was prepared for an annual production plan, and the corresponding faults were obtained based on information and production data received from a textile glove enterprise. The errors that occur in the production process were generated in the simulation using random parameters. To test the hypothesis that the errors can be forecast, various supervised machine learning algorithms were trained on the data set in time series form and their learning performance was compared. The variable giving the number of faulty products could be forecast very successfully, with the random forest algorithm achieving the highest success. Since these error values were forecast with high accuracy even in a simulation driven by uniformly distributed random parameters, highly accurate forecasts can be made in real-life applications as well.


Author(s):  
Jason Chen

Clustering analysis is a tool used widely in the Data Mining community and beyond (Everitt et al. 2001). In essence, the method allows us to “summarise” the information in a large data set X by creating a very much smaller set C of representative points (called centroids) and a membership map relating each point in X to its representative in C. An obvious but special type of data set that one might want to cluster is a time series data set. Such data has a temporal ordering on its elements, in contrast to non-time series data sets. In this article we explore the area of time series clustering, focusing mainly on a surprising recent result showing that the traditional method for time series clustering is meaningless. We then survey the literature of recent papers and go on to argue how time series clustering can be made meaningful.
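The idea of summarising a data set X by a small set C of centroids plus a membership map can be sketched with a plain k-means loop. This is a generic illustration, not the specific time series clustering procedure the article examines; the deterministic initialisation from the first k points and the function name `kmeans` are assumptions made to keep the example short and reproducible.

```python
def kmeans(points, k, iterations=10):
    """Return (centroids, membership) for a list of equal-length points.

    membership[i] is the index of the centroid that represents points[i].
    """
    centroids = [list(p) for p in points[:k]]  # assumed: first k points as seeds
    membership = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for idx, p in enumerate(points):
            membership[idx] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, m in zip(points, membership) if m == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, membership
```

For time series, each "point" would typically be a whole series or a fixed-length window of one, which is the setting where the result discussed in the article becomes relevant.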


Author(s):  
Hua Ling Deng ◽  
Yǔ Qiàn Sūn

The high volatility of world soybean prices has caused uncertainty and vulnerability, particularly in developing countries. The clustering of time series is a useful tool for discovering soybean price patterns in temporal data. However, traditional clustering methods cannot represent the continuity of price data well, nor do they account for the correlation between factors. In this work, the authors apply Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data (TICC) to soybean price pattern discovery. This is a new method for multivariate time series clustering, which can simultaneously segment and cluster the time series data. Each pattern in the TICC method is defined by a Markov random field (MRF), characterizing the interdependencies between different factors of that pattern. Based on this representation, the characteristics of each pattern and the importance of each factor can be portrayed. The work provides a new way of thinking about market price prediction for agricultural products.


2020 ◽  
Vol 39 (5) ◽  
pp. 6419-6430
Author(s):  
Dusan Marcek

To forecast time series data, two methodological frameworks are considered: statistical modelling and computational intelligence modelling. The statistical approach is based on the theory of invertible ARIMA (Auto-Regressive Integrated Moving Average) models with the Maximum Likelihood (ML) estimation method. As a competitive alternative to statistical forecasting models, we use the popular classic neural network (NN) of perceptron type. To train the NN, the Back-Propagation (BP) algorithm and heuristics such as genetic and micro-genetic algorithms (GA and MGA) are implemented on a large data set. A comparative analysis of the selected learning methods is performed and evaluated. From the experiments we find that the optimal population size is likely to be 20, which gives the lowest training time of all NNs trained by the evolutionary algorithms, while the prediction accuracy is lower but still acceptable to managers.


AI ◽  
2021 ◽  
Vol 2 (1) ◽  
pp. 48-70
Author(s):  
Wei Ming Tan ◽  
T. Hui Teo

Prognostic techniques attempt to predict the Remaining Useful Life (RUL) of a subsystem or a component. Such techniques often use sensor data that are periodically measured and recorded into a time series data set. Such multivariate data sets form complex and non-linear inter-dependencies across recorded time steps and between sensors. Many existing prognostic algorithms have started to explore Deep Neural Networks (DNNs) and their effectiveness in the field. Although Deep Learning (DL) techniques outperform the traditional prognostic algorithms, the networks are generally complex to deploy or train. This paper proposes a Multi-variable Time Series (MTS) focused approach to prognostics that implements a lightweight Convolutional Neural Network (CNN) with an attention mechanism. The convolution filters extract the abstract temporal patterns from the multiple time series, while the attention mechanisms review the information across the time axis and select the relevant information. The results suggest that the proposed method not only achieves superior RUL estimation accuracy but also trains many times faster than the reported works. The advantage of deploying the network is also demonstrated on a lightweight hardware platform, where it is not only much more compact but also more efficient for resource-restricted environments.
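The way an attention mechanism selects information across the time axis can be sketched as a softmax-weighted sum over per-time-step feature vectors. This is a generic illustration, not the architecture from the paper: in a real network the relevance scores are produced by learned layers, whereas here they are simply passed in, and the function name `temporal_attention` is an assumption.

```python
import math


def temporal_attention(features, scores):
    """Combine T feature vectors into one context vector.

    features : list of T feature vectors (e.g. convolution outputs per time step)
    scores   : list of T relevance scores (assumed given; learned in practice)
    """
    # Softmax over the time axis, shifted by the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the feature vectors: time steps with high scores dominate.
    dim = len(features[0])
    context = [
        sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
    ]
    return weights, context
```

The weights always sum to one, so the context vector stays on the same scale as the inputs regardless of how many time steps are attended over.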


Water ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 1633
Author(s):  
Elena-Simona Apostol ◽  
Ciprian-Octavian Truică ◽  
Florin Pop ◽  
Christian Esposito

Due to the exponential growth of Internet of Things networks and the massive amount of time series data collected from these networks, it is essential to apply efficient methods for Big Data analysis in order to extract meaningful information and statistics. Anomaly detection is an important part of time series analysis, improving the quality of further analysis, such as prediction and forecasting. Thus, detecting sudden change points that reflect normal behavior and distinguishing them from abnormal behavior, i.e., outliers, is a crucial step in minimizing the false positive rate and building accurate machine learning models for prediction and forecasting. In this paper, we propose a rule-based decision system that enhances anomaly detection in multivariate time series using change point detection. Our architecture uses a pipeline that automatically detects real anomalies and removes the false positives introduced by change points. We employ both traditional and deep learning unsupervised algorithms: in total, five anomaly detection and five change point detection algorithms. Additionally, we propose a new confidence metric based on the support for a time series point to be an anomaly and the support for the same point to be a change point. In our experiments, we use a large real-world dataset containing multivariate time series about water consumption collected from smart meters. As an evaluation metric, we use Mean Absolute Error (MAE). The low MAE values show that the algorithms accurately determine anomalies and change points. The experimental results strengthen our assumption that anomaly detection can be improved by determining and removing change points, and they validate the correctness of our proposed rules in real-world scenarios. Furthermore, the proposed rule-based decision support systems enable users to make informed decisions regarding the status of the water distribution network and to perform predictive and proactive maintenance effectively.
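The core idea of removing false positives introduced by change points can be sketched as a simple filtering rule, together with the MAE metric named in the abstract. The rule below is a hypothetical simplification: it is not the authors' rule set or their confidence metric, and the `tolerance` parameter and both function names are assumptions.

```python
def filter_anomalies(anomalies, change_points, tolerance=0):
    """Keep only anomaly indices that do not coincide with a change point.

    An anomaly within `tolerance` time steps of any detected change point
    is treated as a false positive and dropped (assumed rule).
    """
    return [
        a for a in anomalies
        if all(abs(a - cp) > tolerance for cp in change_points)
    ]


def mean_absolute_error(y_true, y_pred):
    """Mean Absolute Error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

With anomalies flagged at indices 3, 10, and 25 and change points detected at 10 and 40, the rule keeps 3 and 25 and discards 10 as a change point rather than an outlier.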

