Subsequence Time Series Clustering

Author(s):  
Jason Chen

Clustering analysis is a tool used widely in the Data Mining community and beyond (Everitt et al. 2001). In essence, the method allows us to “summarise” the information in a large data set X by creating a much smaller set C of representative points (called centroids) and a membership map relating each point in X to its representative in C. An obvious but special type of data set that one might want to cluster is a time series data set. Such data have a temporal ordering on their elements, in contrast to non-time-series data sets. In this article we explore the area of time series clustering, focusing mainly on a surprising recent result showing that the traditional method for time series clustering is meaningless. We then survey the recent literature and go on to argue how time series clustering can be made meaningful.
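The "traditional method" the article refers to is sliding-window subsequence extraction followed by a standard clustering algorithm such as k-means. A minimal sketch of that pipeline (the window width, cluster count, and toy signal are illustrative choices, not taken from the article):

```python
import math
import random

def subsequences(series, w):
    """Sliding-window extraction: every length-w subsequence of the series."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on equal-length vectors, squared Euclidean distance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Recompute each centroid as the mean of its cluster (keep old if empty).
        centroids = [[sum(vals) / len(c) for vals in zip(*c)] if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids

series = [math.sin(0.3 * t) + 0.1 * math.sin(2.1 * t) for t in range(200)]
windows = subsequences(series, w=16)
centroids = kmeans(windows, k=3)
```

The surprising result the article discusses is that centroids produced this way tend toward smooth sinusoid-like shapes regardless of the input series, which is why the output is argued to be meaningless.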

2019 ◽  
Author(s):  
Srishti Mishra ◽  
Zohair Shafi ◽  
Santanu Pathak

Data-driven decision making is becoming an increasingly important aspect of successful business execution. More and more organizations are moving towards taking informed decisions based on the data that they generate. Most of this data is temporal in nature, i.e. time series data. Effective and efficient analysis across time series data sets is a challenge. The most interesting and valuable part of such analysis is generating insights on correlation and causation across multiple time series data sets. This paper looks at methods that can be used to analyze such data sets and gain useful insights from them, primarily in the form of correlation and causation analysis. It focuses on two such methods, a Two Sample Test with Dynamic Time Warping and Hierarchical Clustering, and looks at how the results returned by both can be used to gain a better understanding of the data. Moreover, the methods are meant to work with any data set, regardless of the subject domain and idiosyncrasies of the data — in short, a data-agnostic approach.
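Dynamic Time Warping is the distance measure underlying the first method. The abstract does not spell out the two-sample test itself, but the DTW core is a standard dynamic program, sketched below on illustrative sequences:

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences.

    Standard O(len(a) * len(b)) dynamic program: cost[i][j] is the minimal
    cumulative cost of aligning a[:i] with b[:j].
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

x = [0, 0, 1, 2, 1, 0, 0, 0]
y = [0, 0, 0, 1, 2, 1, 0, 0]   # same bump, shifted by one step
```

DTW aligns the shifted bump perfectly (distance 0), whereas a pointwise comparison of `x` and `y` would report a substantial difference — exactly the time-shift tolerance that makes DTW attractive for comparing time series.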


2020 ◽  
Vol 30 (5) ◽  
pp. 374-381 ◽  
Author(s):  
Benjamin J. Narang ◽  
Greg Atkinson ◽  
Javier T. Gonzalez ◽  
James A. Betts

The analysis of time series data is common in nutrition and metabolism research for quantifying physiological responses to various stimuli. Reducing the many data points of a time series to one or more summary statistics can help quantify and communicate the overall response in a more straightforward way and in line with a specific hypothesis. Nevertheless, different researchers have selected many different summary statistics, and some approaches remain complex. The time-intensive nature of such calculations can be a burden, especially for large data sets, and may therefore introduce computational errors that are difficult to recognize and correct. In this short commentary, the authors introduce a newly developed tool that automates many of the processes commonly used by researchers for discrete time series analysis, with particular emphasis on how the tool may be implemented within nutrition and exercise science research.
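The commentary does not name specific statistics, but a common example in nutrition research is the (incremental) area under the curve of a postprandial response, computed with the trapezoidal rule. A minimal sketch with hypothetical sample times and glucose values (note that some iAUC conventions also truncate negative areas, which this sketch does not):

```python
def trapezoid_auc(times, values):
    """Total area under a sampled response curve via the trapezoidal rule."""
    return sum((times[i + 1] - times[i]) * (values[i] + values[i + 1]) / 2.0
               for i in range(len(times) - 1))

def incremental_auc(times, values):
    """Area above the baseline (first measurement), a common nutrition summary."""
    baseline = values[0]
    return trapezoid_auc(times, [v - baseline for v in values])

times = [0, 30, 60, 90, 120]          # minutes after a meal (hypothetical)
glucose = [5.0, 7.5, 6.5, 5.5, 5.0]   # mmol/L (hypothetical)
iauc = incremental_auc(times, glucose)
```

Automating even this simple calculation across hundreds of participants is exactly the kind of repetitive, error-prone work the described tool aims to remove.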


Time series are a very common class of data set. Time series data are easy to obtain from a variety of science and finance applications, and anomaly detection for time series has become a prominent research topic. Anomaly detection covers intrusion detection, theft detection, fault detection, machine health monitoring, network sensor event detection, and habitat disturbance detection. It is also used for removing suspicious data from a data set before production use. This review aims to provide a detailed and organized overview of anomaly detection research. In this article we first define what an anomaly in a time series is, and then briefly describe some of the methods suggested in the past two or three years for anomaly detection in time series.
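As a concrete illustration of one of the simplest point-anomaly detectors (a textbook baseline, not a method from the surveyed papers), a rolling z-score flags values that deviate strongly from their recent history:

```python
import math
import statistics

def rolling_zscore_anomalies(series, window=20, threshold=3.0):
    """Flag indices whose value deviates from the mean of the preceding
    window by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = statistics.fmean(history)
        sigma = statistics.pstdev(history)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# A smooth signal with one injected point anomaly.
data = [math.sin(0.5 * t) for t in range(100)]
data[60] += 10.0
```

Most of the recent methods the review covers are far more sophisticated (handling seasonality, multivariate inputs, or collective anomalies), but they share this basic shape: model the expected behaviour, then score deviations from it.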


2019 ◽  
Vol 2 (341) ◽  
pp. 43-50
Author(s):  
Jerzy Korzeniewski

In recent years a number of methods aimed at symbolic representation of time series have been introduced or developed. This activity is mainly justified by practical considerations such as memory savings or fast database searching. However, some results suggest that in time series clustering, symbolic representation can even improve the results of clustering. The article proposes a new algorithm for abridged symbolic representation of time series, with an emphasis on efficient time series clustering. The proposal is based on the PAA (piecewise aggregate approximation) technique followed by segmentwise correlation analysis. The primary goal of the article is to improve the quality of the PAA technique with respect to possible time series clustering (its speed and quality). We also tried to answer the following questions. Is the task of clustering time series in their original form reasonable? How much memory can we save using the new algorithm? The efficiency of the new algorithm was investigated on empirical time series data sets. The results show that the new proposal is quite effective, with very little parametric input required from the user.
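The PAA step the article builds on can be sketched in a few lines: the series is cut into equal-width segments and each segment is replaced by its mean (the segmentwise correlation analysis of the proposed algorithm is not reproduced here):

```python
def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean value per equal-width segment.

    Segment boundaries use integer arithmetic so the series length need not
    divide evenly by n_segments.
    """
    n = len(series)
    result = []
    for k in range(n_segments):
        lo = n * k // n_segments
        hi = n * (k + 1) // n_segments
        result.append(sum(series[lo:hi]) / (hi - lo))
    return result
```

Reducing, say, a 1000-point series to 20 segment means gives the 50-fold memory saving that motivates such representations, while preserving the coarse shape used for clustering.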


2021 ◽  
Vol 25 (5) ◽  
pp. 1051-1072
Author(s):  
Fabian Kai-Dietrich Noering ◽  
Konstantin Jonas ◽  
Frank Klawonn

In technical systems, the analysis of similar load situations is a promising technique to gain information about the system's state, health, or wear. Very often, load situations are difficult to define by hand. Hence, these situations need to be discovered as recurrent patterns within multivariate time series data of the system under consideration. Unsupervised algorithms for finding such recurrent patterns in multivariate time series must be able to cope with very large data sets, because the system might be observed over a very long time. In our previous work we identified discretization-based approaches as very interesting for variable-length pattern discovery because of their low computing time, due to the simplification (symbolization) of the time series. In this paper we propose additional preprocessing steps for symbolic representation of time series, aiming for enhanced multivariate pattern discovery. Beyond that, we show the performance (quality and computing time) of our algorithms on a synthetic test data set as well as on a real-life example with 100 million time points. We also test our approach with increasing dimensionality of the time series.
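A common discretization-based symbolization in the spirit the authors describe (the breakpoints shown are the standard Gaussian breakpoints for a three-letter alphabet, as used in SAX; the authors' specific preprocessing steps are not reproduced) turns a series into a symbol string in which recurrent patterns become repeated words:

```python
from collections import Counter

def symbolize(series, breakpoints=(-0.43, 0.43)):
    """Z-normalize the series, then map each value to a symbol from a
    3-letter alphabet via fixed breakpoints."""
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    std = var ** 0.5 or 1.0   # guard against a constant series
    symbols = []
    for x in series:
        z = (x - mean) / std
        symbols.append("abc"[sum(z > b for b in breakpoints)])
    return "".join(symbols)

def recurrent_words(symbols, word_len=4):
    """Count every sliding word; words occurring more than once mark
    candidate recurrent patterns."""
    counts = Counter(symbols[i:i + word_len]
                     for i in range(len(symbols) - word_len + 1))
    return {w: c for w, c in counts.items() if c > 1}
```

Because string matching is cheap, candidate patterns can be found in the symbolic domain first and only then verified against the raw time series — the source of the low computing time the authors highlight.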


2016 ◽  
Vol 29 (2) ◽  
pp. 93-110
Author(s):  
Johannes Ledolter

Modelling issues in multi-unit longitudinal models with random coefficients and patterned correlation structure are illustrated in the context of three data sets. The first data set deals with short time series data on annual death rates and alcohol consumption of twenty-five European countries. The second data set deals with glaciologic time series data on snow temperature at 14 different locations within a small glacier in the Austrian Alps. The third data set consists of annual economic time series on factor productivity, and domestic and foreign research and development (R&D) capital stocks. A practical model building approach, consisting of model specification, estimation, and diagnostic checking, is outlined in the context of these three data sets.


2020 ◽  
Vol 12 (4) ◽  
pp. 3057-3066
Author(s):  
Maria Staudinger ◽  
Stefan Seeger ◽  
Barbara Herbstritt ◽  
Michael Stoelzle ◽  
Jan Seibert ◽  
...  

Abstract. The stable isotopes of oxygen and hydrogen, 18O and 2H, provide information on water flow pathways and hydrologic catchment functioning. Here a data set of precipitation and streamflow isotope time series in medium-sized Swiss catchments, CH-IRP, is presented that is unique in terms of its long-term multi-catchment coverage along an alpine to pre-alpine gradient. The data set comprises fortnightly time series of both δ2H and δ18O as well as deuterium excess from streamflow for 23 sites in Switzerland, together with summary statistics of the sampling at each station. Furthermore, time series of δ18O and δ2H in precipitation are provided for each catchment, derived from interpolated data sets from the ISOT, GNIP and ANIP networks. For each station we compiled relevant metadata describing the sampling conditions, catchment characteristics, and climate. Lab standards and errors are provided, and potentially problematic measurements are indicated to help the user decide on applicability for individual study purposes. The measurements are planned to be continued at 14 stations as a long-term isotopic measurement network, so the CH-IRP data set will continuously be extended. The data set can be downloaded from the Zenodo data repository at https://doi.org/10.5281/zenodo.4057967 (Staudinger et al., 2020).




2020 ◽  
Vol 39 (5) ◽  
pp. 6419-6430
Author(s):  
Dusan Marcek

To forecast time series data, two methodological frameworks of statistical and computational intelligence modelling are considered. The statistical approach is based on the theory of invertible ARIMA (Auto-Regressive Integrated Moving Average) models with Maximum Likelihood (ML) estimation. As a competitive tool to statistical forecasting models, we use the popular classic neural network (NN) of perceptron type. To train the NN, the Back-Propagation (BP) algorithm and heuristics such as the genetic and micro-genetic algorithms (GA and MGA) are implemented on a large data set. A comparative analysis of the selected learning methods is performed and evaluated. Our experiments suggest that the optimal population size is likely 20, yielding the lowest training time of all NNs trained by the evolutionary algorithms, while the prediction accuracy is lower but still acceptable to managers.
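A minimal sketch of the GA idea applied to the smallest possible "perceptron" (a single linear neuron) shows the training loop: a population of weight vectors is evaluated, the fitter half survives, and children are produced by crossover plus Gaussian mutation. The population size, mutation scale, and toy data are illustrative, not the paper's settings:

```python
import random

def perceptron(weights, x):
    """Single linear neuron: bias + weight * input."""
    return weights[0] + weights[1] * x

def fitness(weights, data):
    """Negative mean squared error over the training set (higher is better)."""
    return -sum((perceptron(weights, x) - y) ** 2 for x, y in data) / len(data)

def genetic_train(data, pop_size=20, generations=200, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: the fitter half survives unchanged.
        pop.sort(key=lambda w: fitness(w, data), reverse=True)
        parents = pop[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Averaging crossover plus Gaussian mutation.
            children.append([(ai + bi) / 2 + rng.gauss(0, 0.1)
                             for ai, bi in zip(a, b)])
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, data))

# Hypothetical training data drawn from the line y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(-5, 6)]
best = genetic_train(data)
```

Unlike back-propagation, this loop needs no gradients, which is what makes GA/MGA heuristics usable even for non-differentiable fitness criteria — at the cost of many fitness evaluations per generation.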


AI ◽  
2021 ◽  
Vol 2 (1) ◽  
pp. 48-70
Author(s):  
Wei Ming Tan ◽  
T. Hui Teo

Prognostic techniques attempt to predict the Remaining Useful Life (RUL) of a subsystem or a component. Such techniques often use sensor data which are periodically measured and recorded into a time series data set. Such multivariate data sets form complex and non-linear inter-dependencies across recorded time steps and between sensors. Many existing prognostic algorithms have started to explore Deep Neural Networks (DNNs) and their effectiveness in the field. Although Deep Learning (DL) techniques outperform traditional prognostic algorithms, the networks are generally complex to deploy or train. This paper proposes a Multi-variable Time Series (MTS) focused approach to prognostics that implements a lightweight Convolutional Neural Network (CNN) with an attention mechanism. The convolution filters extract abstract temporal patterns from the multiple time series, while the attention mechanism reviews the information across the time axis and selects the relevant information. The results suggest that the proposed method not only produces superior RUL estimation accuracy but also trains many times faster than the reported works. Deployment on a lightweight hardware platform further demonstrates that the network is not just more compact but also more efficient in resource-restricted environments.
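The two building blocks, temporal convolution and attention over the time axis, can be sketched with scalar arithmetic. Real attention layers learn query/key projections; here each feature's own value serves as its attention score, a deliberate simplification, and the kernel and signal are hypothetical:

```python
import math

def conv1d(series, kernel):
    """Valid 1-D convolution (cross-correlation): one feature per time step."""
    k = len(kernel)
    return [sum(series[i + j] * kernel[j] for j in range(k))
            for i in range(len(series) - k + 1)]

def attention_pool(features):
    """Softmax attention over the time axis: each step is weighted by its
    own score, and the weighted sum forms the pooled representation."""
    m = max(features)                           # subtract max for stability
    exps = [math.exp(f - m) for f in features]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = sum(w * f for w, f in zip(weights, features))
    return pooled, weights

signal = [0, 0, 1, 3, 1, 0, 0, 0]
features = conv1d(signal, kernel=[0.25, 0.5, 0.25])   # smoothing filter (hypothetical)
pooled, weights = attention_pool(features)
```

The attention weights concentrate on the time steps where the convolution response is strongest, which is the "select the relevant information across the time axis" behaviour the paper describes, here in a drastically reduced form.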

