Time Series Classification by Shapelet Dictionary Learning with SVM-Based Ensemble Classifier

Time series classification is a basic and important task in time series data mining. Increasingly, researchers are turning to shape-similarity methods, including Shapelet-based algorithms, because they can extract discriminative subsequences from time series. However, most Shapelet-based algorithms discover Shapelets by searching the candidate subsequences in the training data, which has two drawbacks: a high computational burden and poor generalization. To overcome these drawbacks, this paper proposes a novel algorithm named Shapelet Dictionary Learning with SVM-based Ensemble Classifier (SDL-SEC), which modifies the Shapelet approach in two respects: the Shapelet discovery method and the classifier. First, Shapelet Dictionary Learning (SDL) is proposed as a Shapelet discovery method that generates Shapelets instead of searching for them, which lowers the computational cost and improves generalization. Second, an SVM-based Ensemble Classifier (SEC) is developed and adapted to SDL. Unlike a classic SVM, which requires precise parameter tuning and careful feature selection, SEC avoids the overfitting caused by large numbers of features and parameters. Compared with the baselines on 45 datasets, SDL-SEC achieves competitive classification accuracy at a lower computational cost.
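
The shapelet distance that Shapelet-based methods rely on can be sketched as follows: a shapelet is compared with every same-length subsequence of a series, the minimum Euclidean distance becomes one feature, and a set of shapelets turns each series into a feature vector for a classifier such as an SVM. This is a minimal sketch of the generic shapelet transform, assuming plain (non-normalized) Euclidean distance; it does not implement SDL's dictionary learning or the SEC ensemble, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def shapelet_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Minimum Euclidean distance between `shapelet` and every
    same-length subsequence (sliding window) of `series`."""
    m = len(shapelet)
    if m > len(series):
        raise ValueError("shapelet is longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        # Squared distance between the shapelet and the window at `start`.
        d = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best = min(best, d)
    return math.sqrt(best)


def shapelet_transform(dataset: Sequence[Sequence[float]],
                       shapelets: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map each series to a vector of distances, one per shapelet.
    The vectors can then be fed to any vector classifier (e.g. an SVM)."""
    return [[shapelet_distance(s, sh) for sh in shapelets] for s in dataset]
```

Because every evaluation scans all windows of a series, exhaustively scoring every candidate subsequence of a training set multiplies this cost, which is the burden that generating Shapelets, rather than searching for them, aims to avoid.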

Fuzzy Prediction Intervals Using Credibility Distributions

Engineering Proceedings | DOI: 10.3390/engproc2021005051 | 2021 | Vol 5 (1) | pp. 51

Author(s): Enriqueta Vercher, Abel Rubio, José D. Bermúdez

Keyword(s): Time Series; Time Series Data; Computational Cost; Expected Value; Prediction Intervals; Series Data; Comparative Results; Automatic Forecasting

We present a new forecasting scheme based on the credibility distribution of fuzzy events. This approach allows us to build prediction intervals from the first differences of the time series data, and the credibility expected value lets us estimate k-step-ahead point forecasts. We analyze the coverage of the prediction intervals and the accuracy of the point forecasts for different credibility approaches based on the upper differences. The comparative results were obtained on the yearly time series of the M4 Competition. We also report the performance and computational cost of our proposal compared with automatic forecasting procedures.
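
As a rough illustration of the ingredients named above, the sketch below builds a triangular fuzzy variable from the first differences of a series and uses its credibility expected value, (a + 2b + c)/4 for a triangular variable (a, b, c), as the forecast increment. Taking the minimum, median, and maximum of the differences as (a, b, c) is a hypothetical choice made for this example, not the authors' construction, and the interval shown is simply the support of the k-step fuzzy change.

```python
import statistics
from typing import Sequence, Tuple

Triangular = Tuple[float, float, float]


def triangular_from_differences(series: Sequence[float]) -> Triangular:
    """Hypothetical construction: a triangular fuzzy variable (a, b, c)
    taken as the min, median and max of the first differences."""
    diffs = [y1 - y0 for y0, y1 in zip(series, series[1:])]
    if not diffs:
        raise ValueError("need at least two observations")
    return min(diffs), statistics.median(diffs), max(diffs)


def credibility_expected_value(a: float, b: float, c: float) -> float:
    """Credibility expected value of a triangular fuzzy variable (a, b, c)."""
    return (a + 2 * b + c) / 4


def forecast(series: Sequence[float], k: int):
    """k-step-ahead point forecast and interval.

    Assumes the k-step change is the fuzzy sum of k copies of the same
    triangular variable, which is again triangular: (k*a, k*b, k*c)."""
    a, b, c = triangular_from_differences(series)
    last = series[-1]
    point = last + credibility_expected_value(k * a, k * b, k * c)
    interval = (last + k * a, last + k * c)  # support of the k-step change
    return point, interval
```

For the series [10, 12, 13, 17] the differences are [2, 1, 4], giving (a, b, c) = (1, 2, 4) and an expected one-step increment of 2.25.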

Discriminate Supervised Weighted Scheme for the Classification of Time Series Signals

International Journal of Sociotechnology and Knowledge Development | DOI: 10.4018/ijskd.2021070101 | 2021 | Vol 13 (3) | pp. 1-16

Author(s): Elangovan Ramanujam, S. Padmavathi

Keyword(s): Time Series; Time Series Data; State Of The Art; Statistical Significance; Series Data; Bag Of Words; Time Series Classification; Problem Of Time; Weighted Matrix

Innovations in, and the wide applicability of, time series data mining techniques have significantly increased researchers' interest in time series classification. Many algorithms have been proposed for this task, and they can be grouped into shapelet-, interval-, motif-, and whole-series-based techniques. Among these, the bag-of-words technique, an extension of text mining approaches, performs well because of its simplicity and effectiveness. To improve the efficiency of the bag-of-words technique, this paper proposes a discriminate supervised weighted scheme that identifies the characteristic, representative patterns of each class for efficient classification. It uses a modified weighted matrix that separates representative from non-representative patterns, which makes the classification interpretable. Experiments compare the proposed technique with state-of-the-art techniques in terms of accuracy and statistical significance.
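
To make the idea of a supervised, class-discriminating weight matrix concrete, the sketch below uses a TF-IDF-style weighting: a word's weight for a class grows with its share of that class's words and shrinks with the number of classes it appears in, so words shared by every class get weight zero. This is a stand-in chosen for illustration, not the paper's modified weighted matrix, and it assumes each series has already been converted into a bag of words (for example, with a sliding-window symbolic approximation).

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def class_weight_matrix(bags: Sequence[List[str]],
                        labels: Sequence[Hashable]) -> Dict[Hashable, Dict[str, float]]:
    """weight[y][w] = (share of class y's words that are w) * log(C / df(w)),
    where C is the number of classes and df(w) the number of classes
    containing w. Words found in every class (non-representative) get 0."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for bag, y in zip(bags, labels):
        counts[y].update(bag)
    n_classes = len(counts)
    # Iterating a Counter yields each distinct word once, so this counts classes.
    df = Counter(w for c in counts.values() for w in c)
    weights = {}
    for y, c in counts.items():
        total = sum(c.values())
        weights[y] = {w: (n / total) * math.log(n_classes / df[w]) for w, n in c.items()}
    return weights


def predict(bag: List[str], weights: Dict[Hashable, Dict[str, float]]) -> Hashable:
    """Score each class by summing the weights of the words in `bag`."""
    return max(weights, key=lambda y: sum(weights[y].get(w, 0.0) for w in bag))
```

Inspecting the highest-weighted words of each class is what gives this kind of scheme its interpretability.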

Co-eye: a multi-resolution ensemble classifier for symbolically approximated time series

Machine Learning | DOI: 10.1007/s10994-020-05887-3 | 2020 | Vol 109 (11) | pp. 2029-2061

Author(s): Zahraa S. Abdallah, Mohamed Medhat Gaber

Keyword(s): Time Series; Time Series Data; Ensemble Classifier; Series Data; New Classification; Symbolic Representations; Classification Technique; Field Of Vision; Main Challenge; Wide Range

Time series classification (TSC) is a challenging task that has attracted many researchers in recent years. One main challenge in TSC is the diversity of domains that time series data come from. Thus, there is no "one model that fits all" in TSC. Some algorithms are very accurate at classifying a specific type of time series when the whole series is considered, while others only target the existence or non-existence of specific patterns/shapelets. Yet other techniques focus on the frequency of occurrence of discriminating patterns/features. This paper presents a new classification technique that addresses the inherent diversity problem in TSC using a nature-inspired method. The technique is inspired by how flies look at the world through "compound eyes" made up of thousands of lenses, called ommatidia. Each ommatidium is an eye with its own lens, and thousands of them together create a broad field of vision. Similarly, the developed technique uses different lenses and representations to look at the time series, and then combines them for broader visibility. The lenses are created by hyper-parameterisation of symbolic representations (Piecewise Aggregate and Fourier approximations). The algorithm builds a random forest for each lens and then performs soft dynamic voting to classify new instances using the most confident eyes, i.e., forests. We evaluate the new technique, coined Co-eye, on the recently released extended version of the UCR archive, which contains more than 100 datasets across a wide range of domains. The results show the benefit of bringing together different perspectives, reflected in the accuracy and robustness of Co-eye compared with other state-of-the-art techniques.
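
Each "lens" can be thought of as one hyper-parameter setting of a symbolic representation. The sketch below shows PAA-based SAX discretization under two example settings of word length and alphabet size. The Fourier-based (SFA) lenses, the per-lens random forests, and the dynamic voting are not shown, and the particular lens values are arbitrary choices for illustration.

```python
import statistics
from typing import List, Sequence


def znormalize(x: Sequence[float]) -> List[float]:
    """Zero mean, unit (population) standard deviation; flat series map to 0."""
    mu, sd = statistics.fmean(x), statistics.pstdev(x)
    return [0.0] * len(x) if sd == 0 else [(v - mu) / sd for v in x]


def paa(x: Sequence[float], segments: int) -> List[float]:
    """Piecewise Aggregate Approximation: the mean of each of `segments`
    equal frames. This sketch requires len(x) to be divisible by `segments`."""
    n = len(x)
    if n % segments:
        raise ValueError("len(x) must be divisible by segments in this sketch")
    w = n // segments
    return [statistics.fmean(x[i * w:(i + 1) * w]) for i in range(segments)]


def sax_word(x: Sequence[float], segments: int, alphabet: int) -> str:
    """SAX: z-normalize, apply PAA, then map each frame mean to a letter
    using equiprobable breakpoints of the standard normal distribution."""
    nd = statistics.NormalDist()
    breakpoints = [nd.inv_cdf(i / alphabet) for i in range(1, alphabet)]
    letters = []
    for v in paa(znormalize(x), segments):
        idx = sum(v > b for b in breakpoints)  # number of breakpoints below v
        letters.append(chr(ord("a") + idx))
    return "".join(letters)


# Each (segments, alphabet) pair is one "lens" on the same series.
series = [1, 1, 2, 2, 3, 3, 4, 4]
lens_words = {(s, a): sax_word(series, s, a) for s, a in [(4, 2), (2, 4)]}
```

The same series reads as "aabb" through the (4, 2) lens and as "ad" through the (2, 4) lens; each lens gives a coarser or finer view, and Co-eye combines such views.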

Edge4TSC: Binary Distribution Tree-Enabled Time Series Classification in Edge Environment

Sensors | DOI: 10.3390/s20071908 | 2020 | Vol 20 (7) | pp. 1908

Author(s): Chao Ma, Xiaochuan Shi, Wei Li, Weiping Zhu

Keyword(s): Time Series; Deep Learning; Classification Accuracy; Time Series Data; Series Representation; Series Data; Feature Engineering; Time Series Classification; Binary Distribution; New Time

In the past decade, time series data have been generated rapidly in many fields, offering a huge opportunity for mining valuable knowledge. As a typical time series mining task, Time Series Classification (TSC) has attracted considerable attention from both researchers and domain experts because of its broad applications, ranging from human activity recognition to smart city governance. In particular, there is a growing need to classify diverse types of time series data in a timely manner without costly hand-crafted feature engineering. In this paper, we therefore propose a framework named Edge4TSC that processes time series in the edge environment so that classification results can be returned to end users instantly. To avoid costly hand-crafted feature engineering, deep learning techniques are applied for automatic feature extraction; these show competitive or even superior performance compared with state-of-the-art TSC solutions. However, because time series exhibit complex patterns, even deep learning models do not always achieve satisfactory classification accuracy, which motivated us to explore new time series representation methods that help classifiers improve accuracy further. In Edge4TSC, a new time series representation method based on a binary distribution tree is designed to address this accuracy concern in TSC tasks. Comprehensive experiments on six challenging time series datasets in the edge environment validate the framework's generalization ability and its improvement in classification accuracy, and yield a number of helpful insights.

Time-Series Classification Based on Fusion Features of Sequence and Visualization

Applied Sciences | DOI: 10.3390/app10124124 | 2020 | Vol 10 (12) | pp. 4124

Author(s): Baoquan Wang, Tonghai Jiang, Xi Zhou, Bo Ma, Fan Zhao, ...

Keyword(s): Time Series; Human Brain; Time Series Data; Short Term Memory; Computational Cost; Open Data; Attention Mechanism; Series Data; Sequence Features; Fusion Features

For time-series classification (TSC), some methods classify raw time-series (TS) data directly. However, certain sequence features are not evident in the time domain, whereas the human brain can extract visual features from a visualization to classify data. Therefore, some researchers have converted TS data into images and used image processing methods for TSC. While human perception combines senses from different aspects, existing methods use only sequence features or only visualization features. This paper therefore proposes a TSC framework based on fusion features (TSC-FF) that combines sequence features extracted from the raw TS with visualization features extracted from Area Graphs converted from the TS. Because deep learning methods have proven useful for automatically learning features from data, we use long short-term memory with an attention mechanism (LSTM-A) to learn sequence features and a convolutional neural network with an attention mechanism (CNN-A) to learn visualization features, imitating the human brain. In addition, we use the Area Graph, the simplest visualization method, for visualization feature extraction, which avoids information loss and additional computational cost. This article aims to show that using deep neural networks to learn features from different aspects and fusing them can replace complex, hand-constructed features and remove the bias of manually designed features, thereby avoiding the limitations of domain knowledge. Experiments on several open datasets show that the framework achieves promising results compared with other methods.
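
An Area Graph fills the region under the curve. A minimal rasterization, assuming min-max scaling to a fixed image height (the paper's resolution and rendering details are not given here), might look like the following; the resulting binary matrix is the kind of image input a CNN branch such as CNN-A would consume. The function name is illustrative.

```python
from typing import List, Sequence


def area_graph(series: Sequence[float], height: int) -> List[List[int]]:
    """Rasterize `series` as a filled area graph of size height x len(series).

    Values are min-max scaled to levels 1..height; column t is filled from the
    bottom row up to its level. Row 0 is the top of the image."""
    lo, hi = min(series), max(series)
    span = hi - lo
    # A flat series is drawn as fully filled columns.
    levels = [height if span == 0 else 1 + round((v - lo) / span * (height - 1))
              for v in series]
    image = [[0] * len(series) for _ in range(height)]
    for t, level in enumerate(levels):
        for r in range(height - level, height):
            image[r][t] = 1
    return image
```

Because every column keeps one filled pixel per level, the image preserves the (quantized) value at each time step, which matches the motivation of avoiding information loss.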

A Metric Learning-Based Univariate Time Series Classification Method

Information | DOI: 10.3390/info11060288 | 2020 | Vol 11 (6) | pp. 288

Author(s): Kuiyong Song, Nianbin Wang, Hongbin Wang

Keyword(s): Time Series; Time Series Data; Multivariate Time Series; Metric Learning; Classification Method; Series Data; Classification Error; Time Series Classification; Classification Error Rate; Univariate Time Series

High-dimensional time series classification is a challenging problem, and distance-based similarity measures are one way to address it. This paper proposes a metric learning-based univariate time series classification method (ML-UTSC), which uses a Mahalanobis matrix obtained by metric learning to calculate the local distance between multivariate time series, and combines Dynamic Time Warping (DTW) with nearest-neighbor classification to produce the final classification. In this method, each univariate time series is represented as a multivariate time series of mean, variance, and slope features. A three-dimensional (3 × 3) Mahalanobis matrix is then learned from these data by metric learning. The time series is divided into segments of equal length so that the Mahalanobis matrix describes the features of the time series data more accurately. Compared with the most effective existing distance measure, the proposed algorithm achieves a lower classification error rate on most of the test datasets.
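
The pipeline described above can be sketched as follows: each equal-length segment is summarized by (mean, variance, slope), DTW aligns the resulting multivariate sequences using a Mahalanobis local cost, and a 1-nearest-neighbor rule assigns the label. The matrix `m` is assumed to be given (for example, learned by a metric learning method); the identity matrix used in the example reduces the local cost to the Euclidean distance. Function names are illustrative, not the authors' code.

```python
import math
from typing import List, Sequence

Vec = List[float]
Mat = List[List[float]]


def segment_features(x: Sequence[float], seg_len: int) -> List[Vec]:
    """Univariate -> multivariate: (mean, variance, least-squares slope) per
    equal-length segment. A trailing partial segment is dropped."""
    feats = []
    for s in range(0, len(x) - seg_len + 1, seg_len):
        seg = x[s:s + seg_len]
        n = len(seg)
        mean = sum(seg) / n
        var = sum((v - mean) ** 2 for v in seg) / n
        t_mean = (n - 1) / 2
        denom = sum((t - t_mean) ** 2 for t in range(n))
        slope = (sum((t - t_mean) * (v - mean) for t, v in enumerate(seg)) / denom
                 if denom else 0.0)
        feats.append([mean, var, slope])
    return feats


def mahalanobis(u: Vec, v: Vec, m: Mat) -> float:
    """sqrt((u - v)^T M (u - v)) for a positive semidefinite M."""
    d = [a - b for a, b in zip(u, v)]
    q = sum(d[i] * m[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(max(0.0, q))  # guard against tiny negative rounding


def dtw(a: List[Vec], b: List[Vec], m: Mat) -> float:
    """Classic DTW whose local cost is the Mahalanobis distance under m."""
    D = [[math.inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = mahalanobis(a[i - 1], b[j - 1], m)
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[len(a)][len(b)]


def nn_classify(query, train, labels, m: Mat, seg_len: int):
    """1-nearest-neighbor label under the DTW distance above."""
    q = segment_features(query, seg_len)
    dists = [dtw(q, segment_features(x, seg_len), m) for x in train]
    return labels[dists.index(min(dists))]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
label = nn_classify([0, 1, 2, 4], [[0, 1, 2, 3], [3, 2, 1, 0]], ["up", "down"], IDENTITY, 2)
```

With a learned `m`, features whose variation matters for the labels are weighted more heavily in the local cost than the identity matrix would weight them.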

Time Series Classification with Discrete Wavelet Transformed Data

International Journal of Software Engineering and Knowledge Engineering | DOI: 10.1142/s0218194016400088 | 2016 | Vol 26 (09n10) | pp. 1361-1377

Author(s): Daoyuan Li, Tegawende F. Bissyande, Jacques Klein, Yves Le Traon

Keyword(s): Time Series; Classification Accuracy; Wavelet Transforms; Time Series Data; Knowledge Engineering; Series Data; Discrete Wavelet; Time Series Classification; Time Series Mining; Compressed Data

Time series mining has become essential for extracting knowledge from the abundant data produced by many application domains. To overcome storage and processing challenges in time series mining, compression techniques are being used. In this paper, we investigate how much performance time series classification approaches lose (or gain) when fed lossy-compressed data. This extended empirical study is essential for reassuring practitioners, and it also provides insight into how compression techniques can even be effective at smoothing and reducing noise in time series data. From a knowledge engineering perspective, we show that time series may be compressed by 90% using discrete wavelet transforms and still achieve remarkable classification accuracy, and that the residual details left by popular wavelet compression techniques can sometimes even yield higher classification accuracy than the raw time series data, as they better capture essential local features.
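
The kind of lossy compression studied here can be illustrated with the Haar wavelet, the simplest discrete wavelet: transform the series, keep only the largest coefficients by magnitude, and invert. Keeping 10% of the coefficients corresponds to the 90% compression mentioned above. The Haar wavelet and magnitude-based thresholding are assumptions made for this sketch; the study may use other wavelets and coefficient-selection rules.

```python
import math
from typing import List, Sequence

SQRT2 = math.sqrt(2.0)


def haar_forward(x: Sequence[float]) -> List[float]:
    """Full orthonormal Haar DWT; len(x) must be a power of two.
    Output layout: [overall approximation, coarsest details, ..., finest details]."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(x)
    length = n
    while length > 1:
        half = length // 2
        approx = [(out[2 * i] + out[2 * i + 1]) / SQRT2 for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) / SQRT2 for i in range(half)]
        out[:length] = approx + detail
        length = half
    return out


def haar_inverse(c: Sequence[float]) -> List[float]:
    """Invert haar_forward level by level, from coarsest to finest."""
    out = list(c)
    length = 1
    while length < len(out):
        approx, detail = out[:length], out[length:2 * length]
        merged: List[float] = []
        for a, d in zip(approx, detail):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * length] = merged
        length *= 2
    return out


def compress(x: Sequence[float], keep: float) -> List[float]:
    """Lossy compression: keep the largest `keep` fraction of Haar coefficients
    by magnitude (at least one), zero the rest, and reconstruct."""
    c = haar_forward(x)
    k = max(1, round(keep * len(c)))
    top = set(sorted(range(len(c)), key=lambda i: abs(c[i]), reverse=True)[:k])
    return haar_inverse([v if i in top else 0.0 for i, v in enumerate(c)])
```

Because the transform is orthonormal, the discarded coefficients' squared magnitudes equal the squared reconstruction error, so dropping small (often noisy) detail coefficients acts as smoothing.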

Variance error of multi-classification based anomaly detection for time series data

Journal of Computational Methods in Sciences and Engineering | DOI: 10.3233/jcm-204699 | 2020 | pp. 1-16

Author(s): Baoquan Wang, Tonghai Jiang, Xi Zhou, Bo Ma, Fan Zhao, ...

Keyword(s): Neural Network; Time Series; Anomaly Detection; Time Series Data; Short Term Memory; Computational Cost; Reconstruction Error; Detection Methods; Series Data; Data Set

For anomaly detection in time series data, supervised methods require labeled data. For existing semi-supervised methods, the range of the outlier factor varies with the data, the model, and time, so the threshold for declaring an anomaly is difficult to determine; in addition, computing outlier factors from the other data points in the data set is computationally expensive. These issues make such methods hard to apply in practice. This paper proposes a framework named LSTM-VE that uses clustering combined with a visualization method to roughly label normal data, and then uses the normal data to train a long short-term memory (LSTM) neural network for semi-supervised anomaly detection. The variance error (VE) of the classification probability sequence over the normal data categories is used as the outlier factor. The framework makes deep learning-based anomaly detection practical to apply, and using VE avoids the shortcomings of existing outlier factors and improves performance. In addition, the framework is easy to extend because the LSTM neural network can be replaced with other classification models. Experiments on labeled and real unlabeled data sets show that the framework outperforms replicator neural networks with reconstruction error (RNN-RS) and has good scalability and practicability.

Chebyshev Similarity Match between Uncertain Time Series

Mathematical Problems in Engineering | DOI: 10.1155/2015/105128 | 2015 | Vol 2015 | pp. 1-13

Author(s): Wei Wang, Guohua Liu, Dingjia Liu

Keyword(s): Time Series; Time Series Data; Computational Cost; Random Variable; Repeated Measurements; Series Data; Chebyshev Inequality; Central Tendency; Uncertain Time; Estimation Interval

In real application scenarios, the inherent imprecision of sensor readings, the intentional perturbation introduced by privacy-preserving transformations, and error-prone mining algorithms introduce considerable uncertainty into time series data. This uncertainty poses serious challenges for measuring the similarity of time series. In this paper, we first propose a model of uncertain time series inspired by the Chebyshev inequality. At each time slot, it estimates the possible range of sample values and the range of the central tendency, as a sample estimation interval and a central tendency estimation interval, respectively. Compared with traditional models that rely on repeated measurements and random variables, the Chebyshev model reduces the overall computational cost and requires no prior knowledge. We convert a Chebyshev uncertain time series into a certain time series matrix, so that noise reduction and dimensionality reduction can be applied to uncertain time series. Second, we propose a new similarity matching method based on the Chebyshev model. It relies on the overlap between the sample estimation intervals, and between the central tendency estimation intervals, of different uncertain time series. Finally, we conduct extensive experiments and analyze the results by comparing them with prior work.
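
The two intervals follow from the Chebyshev inequality, which holds for any distribution: P(|X − μ| < kσ) ≥ 1 − 1/k², so choosing k = 1/√(1 − p) guarantees coverage of at least p. The sketch below computes both intervals from a small set of observations associated with one time slot (how the paper gathers these observations is not specified here) and scores two uncertain series by interval overlap. The overlap score, intersection length over the length of the smallest covering interval averaged over slots, is an illustrative choice rather than the paper's formula, and the sample mean and standard deviation stand in for the unknown μ and σ.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def chebyshev_intervals(obs: Sequence[float], p: float) -> Tuple[Interval, Interval]:
    """(sample estimation interval, central tendency estimation interval)
    for one time slot; needs at least two observations.

    k = 1/sqrt(1 - p) gives at least p coverage by Chebyshev's inequality.
    The central tendency interval uses the standard error s/sqrt(n)."""
    k = 1.0 / math.sqrt(1.0 - p)
    m = statistics.fmean(obs)
    s = statistics.stdev(obs)
    se = s / math.sqrt(len(obs))
    return (m - k * s, m + k * s), (m - k * se, m + k * se)


def overlap(a: Interval, b: Interval) -> float:
    """Intersection length divided by the length of the covering interval."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    cover = max(a[1], b[1]) - min(a[0], b[0])
    return 1.0 if cover == 0 else inter / cover


def similarity(x: List[Sequence[float]], y: List[Sequence[float]], p: float = 0.75) -> float:
    """Average over time slots of the mean overlap of the sample intervals
    and of the central tendency intervals of two uncertain series."""
    scores = []
    for ox, oy in zip(x, y):
        sx, cx = chebyshev_intervals(ox, p)
        sy, cy = chebyshev_intervals(oy, p)
        scores.append((overlap(sx, sy) + overlap(cx, cy)) / 2)
    return statistics.fmean(scores)
```

With p = 0.75, k = 2, so the sample interval for the observations [1, 2, 3] (mean 2, sample standard deviation 1) is [0, 4].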

SRPM–CNN: a combined model based on slide relative position matrix and CNN for time series classification

Complex & Intelligent Systems | DOI: 10.1007/s40747-021-00296-y | 2021

Author(s): Taoying Li, Yuqi Zhang, Ting Wang

Keyword(s): Time Series; Relative Position; Time Series Data; Series Data; Time Series Classification; Combined Model; Daily Work; Mining Areas; Model Based; Proposed Model

Research on time series classification is gaining increasing attention in machine learning and data mining because time series data exist almost everywhere, especially in our daily work and life. Recent studies have shown that convolutional neural networks (CNNs) can extract good features from images and texts, but they often achieve low accuracy when applied directly to time series classification. This study therefore proposes a novel combined model based on the slide relative position matrix and CNN for time series classification. During preprocessing, the proposed model first uses the slide relative position matrix to convert time series data into 2D images, and then employs a CNN to classify these images. This exploits the temporal ordering of time series data while taking advantage of the strengths of CNNs in image recognition. Finally, 14 UCR time series datasets were used to evaluate the proposed model; the results indicate that its accuracy was higher than that of the other models compared.
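
A relative position matrix turns a series of length n into an n × n image whose entry (i, j) records how far point j lies above or below point i. The sketch below computes a basic, non-sliding version on a z-normalized series and scales it to [0, 1] for use as a grayscale image. The sliding-window variant and any length reduction used by SRPM–CNN are not reproduced here, and the function name is illustrative.

```python
import statistics
from typing import List, Sequence


def relative_position_matrix(x: Sequence[float]) -> List[List[float]]:
    """M[i][j] = z[j] - z[i] on the z-normalized series, then min-max scaled
    to [0, 1]. The raw matrix is antisymmetric, so the diagonal maps to 0.5."""
    mu, sd = statistics.fmean(x), statistics.pstdev(x)
    z = [0.0] * len(x) if sd == 0 else [(v - mu) / sd for v in x]
    m = [[zj - zi for zj in z] for zi in z]
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat series: every relative position is zero
        return [[0.0] * len(z) for _ in z]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]
```

Unlike a 1D signal, this matrix exposes every pairwise relationship between time steps as local 2D structure, which is what makes it a natural input for a CNN.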