Linear Time Complexity Time Series Clustering with Symbolic Pattern Forest

With the increasing power of data storage and advances in data generation and collection technologies, large volumes of time series data have become available, and their content changes rapidly. Data mining methods therefore need low time complexity to handle such large and fast-changing data. This paper presents a novel time series clustering algorithm with linear time complexity. The proposed algorithm partitions the data by checking for randomly selected symbolic patterns in the time series. A theoretical analysis is provided to show that group structures in the data can be revealed by this process. We evaluate the proposed algorithm extensively on all 85 datasets from the well-known UCR time series archive and compare it with state-of-the-art approaches using statistical analysis. The results show that the proposed method is faster and achieves better accuracy than rival methods.
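
The partitioning idea can be illustrated with a small sketch. This is not the authors' Symbolic Pattern Forest: it is a single randomized tree built under our own assumptions (each series is reduced to one SAX word over a four-letter alphabet, a pattern is a random substring of length `pattern_len` taken from a randomly chosen member's word, and a split separates series that contain the pattern from those that do not). The helpers `sax_word` and `random_pattern_tree` are hypothetical. Each tree level scans every series once, so for a fixed depth and word length the cost grows linearly with the number of series.

```python
import random
from statistics import mean, pstdev

# Assumption: alphabet of size 4 with the standard N(0, 1) quartile breakpoints.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def sax_word(series, n_segments):
    """Z-normalize, reduce with PAA, then map each segment mean to a symbol."""
    mu, sd = mean(series), pstdev(series)
    z = [(x - mu) / sd if sd > 0 else 0.0 for x in series]
    n = len(z)  # assumes n >= n_segments so no segment is empty
    word = []
    for i in range(n_segments):
        lo, hi = i * n // n_segments, (i + 1) * n // n_segments
        seg_mean = mean(z[lo:hi])
        word.append(ALPHABET[sum(seg_mean > b for b in BREAKPOINTS)])
    return "".join(word)


def random_pattern_tree(words, ids, depth, pattern_len, rng):
    """Hypothetical helper: recursively split ids on a random symbolic pattern.

    Returns a list of leaves; each leaf (a list of ids) is one cluster.
    """
    if depth == 0 or len(ids) < 2:
        return [ids]
    source = words[rng.choice(ids)]
    start = rng.randrange(len(source) - pattern_len + 1)
    pattern = source[start:start + pattern_len]
    has = [i for i in ids if pattern in words[i]]  # never empty: includes the source
    lacks = [i for i in ids if pattern not in words[i]]
    if not lacks:  # the pattern occurs everywhere, so it cannot split this node
        return [ids]
    return (random_pattern_tree(words, has, depth - 1, pattern_len, rng)
            + random_pattern_tree(words, lacks, depth - 1, pattern_len, rng))


# Usage: three rising and three falling ramps (toy data).
data = [
    [0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 8], [1, 2, 3, 4, 5, 6, 7, 9],
    [7, 6, 5, 4, 3, 2, 1, 0], [8, 6, 5, 4, 3, 2, 1, 0], [9, 7, 6, 5, 4, 3, 2, 1],
]
words = [sax_word(s, n_segments=4) for s in data]
clusters = random_pattern_tree(words, list(range(len(data))), depth=3,
                               pattern_len=2, rng=random.Random(0))
print(words)  # ['abcd', 'abcd', 'abcd', 'dcba', 'dcba', 'dcba']
print(clusters)
```

A forest, as the title suggests, would presumably build many such trees and aggregate their partitions; that aggregation step is not described in the abstract and is left out here.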

Adaptive Multiresolution and Dedicated Elastic Matching in Linear Time Complexity for Time Series Data Mining

Sixth International Conference on Intelligent Systems Design and Applications

10.1109/isda.2006.84

2006

Cited By ~ 3

Author(s):

Pierre-François Marteau

Gildas Menier

Keyword(s):

Data Mining

Time Series

Time Complexity

Time Series Data

Linear Time

Series Data

Time Series Data Mining

Clustering Methodology for Time Series Mining

Scientific Journal of Riga Technical University Computer Sciences

10.2478/v10143-010-0011-0

2009

Vol 40 (1)

pp. 81-86

Author(s):

Pēteris Grabusts

Arkady Borisov

Keyword(s):

Time Series

Time Series Analysis

Clustering Algorithm

Time Series Data

Similarity Measures

Longest Common Subsequence

Series Data

Time Series Clustering

Series Analysis

Time Series Mining

A time series is a sequence of real values representing measurements of a real variable at successive time intervals. Time series analysis is a well-established task; in recent years, however, research has explored the use of clustering for time series analysis. The main motivation for representing a time series as clusters is to better capture the main characteristics of the data. The central goal of this paper was to investigate clustering methodology for time series data mining, to explore time series similarity measures, and to use them to analyze time series clustering results. More complex similarity measures include the Longest Common Subsequence (LCSS) method. Two tasks were completed. The first was to define time series similarity measures; it was established that the LCSS method detects time series similarity better than the Euclidean distance. The second was to explore the capabilities of the classical k-means clustering algorithm for time series clustering. The experiment showed that the results of time series clustering with the k-means algorithm correspond to those obtained with the LCSS method, so the clustering results for the specific time series are adequate.
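
The contrast between LCSS and the Euclidean distance can be made concrete with a short sketch. The abstract does not say which LCSS variant was used; this assumes the common real-valued form in which two points match when they differ by at most a threshold `epsilon` (with an optional index window `delta`), and the LCSS length is normalized by the shorter series. The function names are illustrative only.

```python
import math


def lcss_similarity(a, b, epsilon, delta=None):
    """LCSS similarity for real-valued series (epsilon-threshold variant).

    Two points match when they differ by at most `epsilon`; if `delta` is set,
    only points whose indices differ by at most `delta` may match.
    Returns LCSS length divided by the shorter series length (1.0 = identical).
    """
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            in_window = delta is None or abs(i - j) <= delta
            if in_window and abs(a[i - 1] - b[j - 1]) <= epsilon:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m] / min(n, m)


def euclidean(a, b):
    """Lock-step Euclidean distance for equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Usage: b is a copy of a with a single outlier spike.
a = [0, 1, 2, 3, 4, 5]
b = [0, 1, 2, 50, 4, 5]
print(euclidean(a, b))                     # 47.0
print(lcss_similarity(a, b, epsilon=0.5))  # 0.8333333333333334
```

A single corrupted point dominates the Euclidean distance, while LCSS simply skips it; this robustness to outliers is one common reason LCSS is preferred for noisy series.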

A Time-Series Data Generation Method to Predict Remaining Useful Life

Processes

10.3390/pr9071115

2021

Vol 9 (7)

pp. 1115

Author(s):

Gilseung Ahn

Hyungseok Yun

Sun Hur

Si-Yeong Lim

Keyword(s):

Time Series

Time Series Data

Remaining Useful Life

Series Data

Generation Model

Data Generation

Training Time

Symbolic Aggregate Approximation

Useful Life

Occurrence Patterns

Accurate prediction of the remaining useful life (RUL) of equipment, using machine learning (ML) or deep learning (DL) models trained on data collected until the equipment fails, is crucial for maintenance scheduling. Because these data are unavailable until the equipment fails, collecting enough data to train a model without overfitting can be challenging. Here, we propose a method of generating time-series data for RUL models to address the problem of insufficient data. The proposed method converts every training time series into a sequence of alphabetical strings by symbolic aggregate approximation (SAX) and identifies occurrence patterns in the converted sequences. The method then generates a new sequence and inversely transforms it into a new time series. Experiments with various RUL prediction datasets and ML/DL models verified that the proposed data-generation model can help avoid overfitting in RUL prediction models.
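
A minimal sketch of the convert, learn-patterns, generate, invert loop follows. It rests on several assumptions that the abstract does not confirm: each point is mapped to one of four SAX symbols without PAA compression (so the string can be inverted point by point), "occurrence patterns" are modeled as first-order symbol transition counts, a new word is sampled from those counts as a Markov chain, and the inverse transform maps each symbol to the approximate mean of N(0, 1) within its region before de-normalizing. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, pstdev

BREAKPOINTS = [-0.6745, 0.0, 0.6745]  # equiprobable regions of N(0, 1)
# Assumption: approximate mean of N(0, 1) inside each region, used to invert symbols.
LEVELS = {"a": -1.27, "b": -0.32, "c": 0.32, "d": 1.27}


def to_sax(series):
    """Map every point of a z-normalized series to a symbol; keep mu/sd for inversion."""
    mu, sd = mean(series), pstdev(series)
    word = []
    for x in series:
        z = (x - mu) / sd if sd > 0 else 0.0
        word.append("abcd"[sum(z > b for b in BREAKPOINTS)])
    return "".join(word), mu, sd


def transition_counts(words):
    """Count how often each symbol is followed by each other symbol."""
    counts = defaultdict(Counter)
    for w in words:
        for prev, nxt in zip(w, w[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, start, length, rng):
    """Sample a new word as a first-order Markov chain over the counted transitions."""
    word = [start]
    while len(word) < length:
        options = counts.get(word[-1])
        if not options:  # dead end: no observed successor for this symbol
            break
        symbols, weights = zip(*options.items())
        word.append(rng.choices(symbols, weights=weights)[0])
    return "".join(word)


def inverse_sax(word, mu, sd):
    """Turn a symbol string back into a numeric series on the original scale."""
    return [mu + sd * LEVELS[s] for s in word]


# Usage with two toy "run-to-failure" series.
train = [[1, 2, 3, 4, 5, 6, 7, 8], [2, 3, 4, 5, 6, 7, 8, 9]]
converted = [to_sax(s) for s in train]
words = [w for w, _, _ in converted]
print(words)  # ['aabbccdd', 'aabbccdd']
counts = transition_counts(words)
new_word = generate(counts, start=words[0][0], length=len(words[0]), rng=random.Random(1))
_, mu, sd = converted[0]
new_series = inverse_sax(new_word, mu, sd)
```

Because the toy training words only ever step forward through the alphabet, every generated word is non-decreasing; with real degradation data the transition table would be richer.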

Bayesian Biclustering by dynamics: A clustering algorithm for SAGD time series data

Computers & Geosciences

10.1016/j.cageo.2019.07.008

2019

Vol 133

pp. 104304

Cited By ~ 1

Author(s):

Helen Pinto

Ian Gates

Xin Wang

Keyword(s):

Time Series

Clustering Algorithm

Time Series Data

Series Data

A Clustering Algorithm for Time Series Data

2006 Seventh International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'06)

10.1109/pdcat.2006.1

2006

Cited By ~ 4

Author(s):

Jian Yin

Duanning Zhou

Qiong-qiong Xie

Keyword(s):

Time Series

Clustering Algorithm

Time Series Data

Series Data

A MPAA-Based Iterative Clustering Algorithm Augmented by Nearest Neighbors Search for Time-Series Data Streams

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science

10.1007/11430919_40

2005

pp. 333-342

Cited By ~ 9

Author(s):

Jessica Lin

Michail Vlachos

Eamonn Keogh

Dimitrios Gunopulos

Jianwei Liu

...

Keyword(s):

Time Series

Data Streams

Clustering Algorithm

Time Series Data

Nearest Neighbors

Series Data

A Review of Subsequence Time Series Clustering

The Scientific World Journal

10.1155/2014/312521

2014

Vol 2014

pp. 1-19

Cited By ~ 20

Author(s):

Seyedjamal Zolhavarieh

Saeed Aghabozorgi

Ying Wah Teh

Keyword(s):

Time Series

Pattern Recognition

Time Series Data

State Of The Art

DNA Recognition

Series Data

Time Series Clustering

Future Studies

Literature Reviews

Open Issue

Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in many fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition, which is improved by using sequences of time series data. This paper reviews definitions and background related to subsequence time series clustering. The reviewed literature is divided into three groups: the preproof, interproof, and postproof periods. Various state-of-the-art approaches to subsequence time series clustering are discussed under each category. The strengths and weaknesses of the employed methods are evaluated to identify potential issues for future studies.

Time-series analysis of the risk factors for haemorrhagic fever with renal syndrome: comparison of statistical models

Epidemiology and Infection

10.1017/s0950268806006649

2006

Vol 135 (2)

pp. 245-252

Cited By ~ 11

Author(s):

W. Hu

K. Mengersen

P. Bi

S. Tong

Keyword(s):

Time Series

Water Level

Crop Production

Time Series Data

Linear Time

Additive Model

Series Data

Haemorrhagic Fever

Independent Variables

China Model

Three conventional regression models were compared using time-series data on the occurrence of haemorrhagic fever with renal syndrome (HFRS) and several key climatic and occupational variables collected in a low-lying area of Anhui Province, China. Model I was a linear time-series model with normally distributed residuals; model II was a generalized linear model with Poisson-distributed residuals and a log link; and model III was a generalized additive model with the same distributional features as model II. Model I was fitted by least squares, whereas models II and III were fitted by maximum likelihood. The correlations between HFRS incidence and the measured independent variables (difference in water level, autumn crop production and density of Apodemus agrarius) ranged from −0.40 to 0.89. HFRS incidence was positively associated with the density of A. agrarius and with crop production, but inversely associated with the difference in water level. Residual analyses and an examination of model accuracy indicate that model III may be the most suitable for assessing the relationship between HFRS incidence and the independent variables.
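
The three model structures can be written in generic textbook form. This is a sketch under stated assumptions, not the paper's exact specification: y_t denotes HFRS incidence in period t, x_{jt} are the p independent variables (water-level difference, autumn crop production, A. agrarius density), and f_j are unspecified smooth functions. Any lags, offsets or smoother choices used by the authors are not given in the abstract.

```latex
\begin{aligned}
\text{Model I:}\quad   & y_t = \beta_0 + \sum_{j=1}^{p} \beta_j x_{jt} + \varepsilon_t,
                         \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2) \\
\text{Model II:}\quad  & y_t \sim \operatorname{Poisson}(\mu_t),
                         \qquad \log \mu_t = \beta_0 + \sum_{j=1}^{p} \beta_j x_{jt} \\
\text{Model III:}\quad & y_t \sim \operatorname{Poisson}(\mu_t),
                         \qquad \log \mu_t = \beta_0 + \sum_{j=1}^{p} f_j(x_{jt})
\end{aligned}
```

Model I is estimated by minimizing the sum of squared residuals, while models II and III maximize the Poisson log-likelihood; model III differs from model II only in replacing each linear term with a smooth function, which is what lets it capture non-linear associations.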

A new model for learning-based forecasting procedure by combining k-means clustering and time series forecasting algorithms

PeerJ Computer Science

10.7717/peerj-cs.534

2021

Vol 7

pp. e534

Author(s):

Kristoko Dwi Hartomo

Yessica Nataliani

Keyword(s):

Time Series

Clustering Algorithm

Time Series Data

Mean Squared Error

Time Series Forecasting

Series Data

Improvement Rate

New Model

Average Improvement

Proposed Model

This paper proposes a new time series forecasting model that combines forecasting with a clustering algorithm. It introduces a scheme that improves forecasting results by grouping the time series data with the k-means clustering algorithm and then uses the clustering result to obtain the forecast. Because user-defined parameters usually affect forecasting results, a learning-based procedure is proposed to estimate the parameters used for forecasting; these parameter values are computed simultaneously within the algorithm. In experiments against other forecasting algorithms, the proposed model performed well, achieving the smallest mean squared error (13,007.91) and an average improvement rate of 19.83%.
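
The clustering step can be sketched with Lloyd's k-means on equal-length series treated as vectors under squared Euclidean distance. This is an assumption-laden illustration: the abstract does not say how the series are windowed, how centroids are initialized, or how the clusters feed the forecaster, and the learning-based parameter estimation is omitted entirely. `kmeans` and `sq_dist` are hypothetical helpers.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # random initial centroids
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Usage: two well-separated groups of short series.
series = [[1.0, 1.0, 1.0], [1.2, 1.0, 0.9], [10.0, 10.0, 10.0], [9.8, 10.1, 10.0]]
labels, centroids = kmeans(series, k=2)
print(labels)
```

One plausible way to use the result, again an assumption rather than the paper's method, is to fit or tune a separate forecaster per cluster so that each model only sees series with similar shape and level.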

Hybrid Models for Adaptive Allocation of Electricity for Households

International Journal of Innovative Technology and Exploring Engineering - Special Issue

10.35940/ijitee.b1029.1292s19

2019

Vol 9 (2S)

pp. 369-376

Keyword(s):

Time Series

Hybrid Model

Clustering Algorithm

Time Series Data

Active Power

Sensor Data

Series Data

Adaptive Allocation

Forecasting Methods

Better Than

In this paper, we analyze, model, predict and cluster Global Active Power, a time series obtained at one-minute intervals from a household's electricity sensors. We analyze changes in seasonality and trend to model the data. We then compare forecasting methods such as SARIMA and LSTM on the household's sensor data and combine them into a hybrid model that captures nonlinear variations better than either SARIMA or LSTM alone. Finally, we cluster slices of the time series using a novel clustering algorithm that combines density-based and centroid-based approaches, to discover subtle but relevant clusters in the sensor data. Our experiments yielded meaningful insights at both a micro (day-to-day) granularity and a macro (weekly to monthly) granularity.