Transitional SAX Representation for Knowledge Discovery for Time Series

Numerous dimensionality-reducing representations of time series have been proposed in data mining and have proved to be useful, especially in handling a high volume of time series data. Among them, widely used symbolic representations such as symbolic aggregate approximation and piecewise aggregate approximation focus on information of local averages of time series. To compensate for such methods, several attempts were made to include trend information. However, the included trend information is quite simple, leading to great information loss. Such information is hardly extendable, so adjusting the level of simplicity to a higher complexity is difficult. In this paper, we propose a new symbolic representation method called transitional symbolic aggregate approximation that incorporates transitional information into symbolic aggregate approximations. We show that the proposed method, satisfying a lower bound of the Euclidean distance, is able to preserve meaningful information, including dynamic trend transitions in segmented time series, while still reducing dimensionality. We also show that this method is advantageous from theoretical aspects of interpretability, and practical and superior in terms of time-series classification tasks when compared with existing symbolic representation methods.

Download Full-text

A Boundary Distance-Based Symbolic Aggregate Approximation Method for Time Series Data

Algorithms ◽

10.3390/a13110284 ◽

2020 ◽

Vol 13 (11) ◽

pp. 284

Author(s):

Zhenwen He ◽

Shirong Long ◽

Xiaogang Ma ◽

Hong Zhao

Keyword(s):

Time Series ◽

Time Series Data ◽

Series Representation ◽

Series Data ◽

Financial Investment ◽

Symbolic Aggregate Approximation ◽

Wide Range ◽

Weighted Boundary ◽

Representation Method ◽

Boundary Distance

A large amount of time series data is being generated every day in a wide range of sensor application domains. The symbolic aggregate approximation (SAX) is a well-known time series representation method, which has a lower bound to Euclidean distance and may discretize continuous time series. SAX has been widely used for applications in various domains, such as mobile data management, financial investment, and shape discovery. However, the SAX representation has a limitation: Symbols are mapped from the average values of segments, but SAX does not consider the boundary distance in the segments. Different segments with similar average values may be mapped to the same symbols, and the SAX distance between them is 0. In this paper, we propose a novel representation named SAX-BD (boundary distance) by integrating the SAX distance with a weighted boundary distance. The experimental results show that SAX-BD significantly outperforms the SAX representation, ESAX representation, and SAX-TD representation.

Download Full-text

A Time-Series Data Generation Method to Predict Remaining Useful Life

Processes ◽

10.3390/pr9071115 ◽

2021 ◽

Vol 9 (7) ◽

pp. 1115

Author(s):

Gilseung Ahn ◽

Hyungseok Yun ◽

Sun Hur ◽

Si-Yeong Lim

Keyword(s):

Time Series ◽

Time Series Data ◽

Remaining Useful Life ◽

Series Data ◽

Generation Model ◽

Data Generation ◽

Training Time ◽

Symbolic Aggregate Approximation ◽

Useful Life ◽

Occurrence Patterns

Accurate predictions of remaining useful life (RUL) of equipment using machine learning (ML) or deep learning (DL) models that collect data until the equipment fails are crucial for maintenance scheduling. Because the data are unavailable until the equipment fails, collecting sufficient data to train a model without overfitting can be challenging. Here, we propose a method of generating time-series data for RUL models to resolve the problems posed by insufficient data. The proposed method converts every training time series into a sequence of alphabetical strings by symbolic aggregate approximation and identifies occurrence patterns in the converted sequences. The method then generates a new sequence and inversely transforms it to a new time series. Experiments with various RUL prediction datasets and ML/DL models verified that the proposed data-generation model can help avoid overfitting in RUL prediction model.

Download Full-text

Co-eye: a multi-resolution ensemble classifier for symbolically approximated time series

Machine Learning ◽

10.1007/s10994-020-05887-3 ◽

2020 ◽

Vol 109 (11) ◽

pp. 2029-2061

Author(s):

Zahraa S. Abdallah ◽

Mohamed Medhat Gaber

Keyword(s):

Time Series ◽

Time Series Data ◽

Ensemble Classifier ◽

Series Data ◽

New Classification ◽

Symbolic Representations ◽

Classification Technique ◽

Field Of Vision ◽

Main Challenge ◽

Wide Range

Abstract Time series classification (TSC) is a challenging task that attracted many researchers in the last few years. One main challenge in TSC is the diversity of domains where time series data come from. Thus, there is no “one model that fits all” in TSC. Some algorithms are very accurate in classifying a specific type of time series when the whole series is considered, while some only target the existence/non-existence of specific patterns/shapelets. Yet other techniques focus on the frequency of occurrences of discriminating patterns/features. This paper presents a new classification technique that addresses the inherent diversity problem in TSC using a nature-inspired method. The technique is stimulated by how flies look at the world through “compound eyes” that are made up of thousands of lenses, called ommatidia. Each ommatidium is an eye with its own lens, and thousands of them together create a broad field of vision. The developed technique similarly uses different lenses and representations to look at the time series, and then combines them for broader visibility. These lenses have been created through hyper-parameterisation of symbolic representations (Piecewise Aggregate and Fourier approximations). The algorithm builds a random forest for each lens, then performs soft dynamic voting for classifying new instances using the most confident eyes, i.e., forests. We evaluate the new technique, coined Co-eye, using the recently released extended version of UCR archive, containing more than 100 datasets across a wide range of domains. The results show the benefits of bringing together different perspectives reflecting on the accuracy and robustness of Co-eye in comparison to other state-of-the-art techniques.

Download Full-text

Automatic Crop Classification in Northeastern China by Improved Nonlinear Dimensionality Reduction for Satellite Image Time Series

Remote Sensing ◽

10.3390/rs12172726 ◽

2020 ◽

Vol 12 (17) ◽

pp. 2726 ◽

Cited By ~ 1

Author(s):

Yongguang Zhai ◽

Nan Wang ◽

Lifu Zhang ◽

Lei Hao ◽

Caihong Hao

Keyword(s):

Time Series ◽

Dimensionality Reduction ◽

Time Series Data ◽

Satellite Image ◽

Series Data ◽

Nonlinear Dimensionality Reduction ◽

Distribution Map ◽

Dimensionality Reduction Technique ◽

Classification Tasks ◽

Crop Classification

Accurate and timely information on the spatial distribution of crops is of great significance to precision agriculture and food security. Many cropland mapping methods using satellite image time series are based on expert knowledge to extract phenological features to identify crops. It is still a challenge to automatically obtain meaningful features from time-series data for crop classification. In this study, we developed an automated method based on satellite image time series to map the spatial distribution of three major crops including maize, rice, and soybean in northeastern China. The core method used is the nonlinear dimensionality reduction technique. However, the existing nonlinear dimensionality reduction technique cannot handle missing data, and it is not designed for subsequent classification tasks. Therefore, the nonlinear dimensionality reduction algorithm Landmark–Isometric feature mapping (L–ISOMAP) is improved. The advantage of the improved L–ISOMAP is that it does not need to reconstruct time series for missing data, and it can automatically obtain meaningful featured metrics for classification. The improved L–ISOMAP was applied to Landsat 8 full-band time-series data during the crop-growing season in the three northeastern provinces of China; then, the dimensionality reduction bands were inputted into a random forest classifier to complete a crop distribution map. The results show that the area of crops mapped is consistent with official statistics. The 2015 crop distribution map was evaluated through the collected reference dataset, and the overall classification accuracy and Kappa index were 83.68% and 0.7519, respectively. The geographical characteristics of major crops in three provinces in northeast China were analyzed. This study demonstrated that the improved L–ISOMAP method can be used to automatically extract features for crop classification. For future work, there is great potential for applying automatic mapping algorithms to other data or classification tasks.

Download Full-text

Hexadecimal Aggregate Approximation Representation and Classification of Time Series Data

Algorithms ◽

10.3390/a14120353 ◽

2021 ◽

Vol 14 (12) ◽

pp. 353

Author(s):

Zhenwen He ◽

Chunfeng Zhang ◽

Xiaogang Ma ◽

Gang Liu

Keyword(s):

Time Series ◽

Classification Accuracy ◽

Euclidean Distance ◽

Time Series Data ◽

Series Representation ◽

Series Data ◽

General Representation ◽

Symbolic Aggregate Approximation ◽

Space Cost

Time series data are widely found in finance, health, environmental, social, mobile and other fields. A large amount of time series data has been produced due to the general use of smartphones, various sensors, RFID and other internet devices. How a time series is represented is key to the efficient and effective storage and management of time series data, as well as being very important to time series classification. Two new time series representation methods, Hexadecimal Aggregate approXimation (HAX) and Point Aggregate approXimation (PAX), are proposed in this paper. The two methods represent each segment of a time series as a transformable interval object (TIO). Then, each TIO is mapped to a spatial point located on a two-dimensional plane. Finally, the HAX maps each point to a hexadecimal digit so that a time series is converted into a hex string. The experimental results show that HAX has higher classification accuracy than Symbolic Aggregate approXimation (SAX) but a lower one than some SAX variants (SAX-TD, SAX-BD). The HAX has the same space cost as SAX but is lower than these variants. The PAX has higher classification accuracy than HAX and is extremely close to the Euclidean distance (ED) measurement; however, the space cost of PAX is generally much lower than the space cost of ED. HAX and PAX are general representation methods that can also support geoscience time series clustering, indexing and query except for classification.

Download Full-text

An enhanced binary symbolic representation for time series data mining based similarity

2008 7th World Congress on Intelligent Control and Automation ◽

10.1109/wcica.2008.4594024 ◽

2008 ◽

Author(s):

Meiyu Sun ◽

Jianan Fang

Keyword(s):

Data Mining ◽

Time Series ◽

Time Series Data ◽

Symbolic Representation ◽

Series Data ◽

Time Series Data Mining

Download Full-text

A piecewise linear representation method based on importance data points for time series data

2016 IEEE 20th International Conference on Computer Supported Cooperative Work in Design (CSCWD) ◽

10.1109/cscwd.2016.7565973 ◽

2016 ◽

Cited By ~ 8

Author(s):

Cun Ji ◽

Shijun Liu ◽

Chenglei Yang ◽

Lei Wu ◽

Li Pan ◽

...

Keyword(s):

Time Series ◽

Time Series Data ◽

Piecewise Linear ◽

Linear Representation ◽

Series Data ◽

Piecewise Linear Representation ◽

Data Points ◽

Representation Method

Download Full-text

A Haar Wavelet-based Multi-resolution Representation Method of Time Series Data

Proceedings of the International Conference on Agents and Artificial Intelligence ◽

10.5220/0005307006200626 ◽

2015 ◽

Cited By ~ 1

Author(s):

Muhammad Marwan Muhammad Fuad

Keyword(s):

Time Series ◽

Time Series Data ◽

Haar Wavelet ◽

Series Data ◽

Representation Method

Download Full-text

Abridged Symbolic Representation of Time Series for Clustering

Acta Universitatis Lodziensis Folia oeconomica ◽

10.18778/0208-6018.341.03 ◽

2019 ◽

Vol 2 (341) ◽

pp. 43-50

Author(s):

Jerzy Korzeniewski

Keyword(s):

Time Series ◽

Time Series Data ◽

Symbolic Representation ◽

Series Data ◽

Approximation Technique ◽

Original Form ◽

Data Sets ◽

Time Series Clustering ◽

User Interference

In recent years a couple of methods aimed at time series symbolic representation have been introduced or developed. This activity is mainly justified by practical considerations such memory savings or fast data base searching. However, some results suggest that in the subject of time series clustering symbolic representation can even upgrade the results of clustering. The article contains a proposal of a new algorithm directed at the task of time series abridged symbolic representation with the emphasis on efficient time series clustering. The idea of the proposal is based on the PAA (piecewise aggregate approximation) technique followed by segmentwise correlation analysis. The primary goal of the article is to upgrade the quality of the PAA technique with respect to possible time series clustering (its speed and quality). We also tried to answer the following questions. Is the task of time series clustering in their original form reasonable? How much memory can we save using the new algorithm? The efficiency of the new algorithm was investigated on empirical time series data sets. The results prove that the new proposal is quite effective with a very limited amount of parametric user interference needed.

Download Full-text

An Investigation of the Structural Characteristics of the Indian IT Sector and the Capital Goods Sector – An Application of the R Programming in Time Series Decomposition and Forecasting

10.36227/techrxiv.16640227.v1 ◽

2021 ◽

Author(s):

Jaydip Sen ◽

Tamal Datta Chaudhuri

Keyword(s):

Time Series ◽

Exchange Rate ◽

Stock Market ◽

Time Series Data ◽

Structural Characteristics ◽

Statistical Tests ◽

Strong Association ◽

High Volume ◽

Series Data ◽

Capital Goods

<p>Time series analysis and forecasting of stock market prices has been a very active area of research over the last two decades. Availability of extremely fast and parallel architecture of computing and sophisticated algorithms has made it possible to extract, store, process and analyze high volume stock market time series data very efficiently. In this paper, we have used time series data of the two sectors of the Indian economy – Information Technology (IT) and Capital Goods (CG) for the period January 2009 – April 2016 and have studied the relationships of these two time series with the time series of DJIA indices, NIFTY indices and the US Dollar to Indian Rupees exchange rate. We established by graphical and statistical tests that while the IT sector of India has a strong association with DJIA indices and the Dollar to Rupee exchange rate, the Indian CG sector exhibits a strong association with the NIFTY indices. We contend that these observations corroborate our hypotheses that the Indian IT sector is strongly coupled with the world economy whereas the CG sector of India is the reflection of India’s internal economic growth. We also present several models of regression between the time series which exhibit strong association among them. The effectiveness of these models have been demonstrated by very low values of their forecasting errors. </p>

Download Full-text