Multi-Scale Shapelets Discovery for Time-Series Classification

2020 ◽  
Vol 19 (03) ◽  
pp. 721-739
Author(s):  
Borui Cai ◽  
Guangyan Huang ◽  
Yong Xiang ◽  
Maia Angelova ◽  
Limin Guo ◽  
...  

Shapelets are subsequences of time series that represent local patterns and can improve both the accuracy and the interpretability of time-series classification. The major task of shapelet-based time-series classification is to discover high-quality shapelets. However, this is challenging because local patterns may have various scales (lengths) rather than a unified scale. In this paper, we resolve this problem by discovering shapelets with multiple scales. We propose a novel Multi-Scale Shapelet Discovery (MSSD) algorithm to discover expressive multi-scale shapelets by extending initial single-scale shapelets (i.e., shapelets with a unified scale). MSSD adopts a bi-directional extension process and robustly extends single-scale shapelets obtained by different methods. A supervised shapelet quality measurement is further developed to assess the extension of shapelets. Comprehensive experiments conducted on 25 UCR time-series datasets show that multi-scale shapelets discovered by MSSD improve classification accuracy by around 10% on average, compared with single-scale shapelets discovered by counterpart methods.
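
To make the notion of a shapelet concrete, the sketch below computes the commonly used shapelet-to-series distance: the minimum Euclidean distance between a shapelet and every same-length window of a series. It is not the MSSD algorithm from the abstract; the function name `shapelet_distance` and the toy data are assumptions made purely for illustration.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between `shapelet` and any
    same-length sliding window of `series` (illustrative helper)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        # Euclidean distance between the shapelet and the current window.
        d = math.sqrt(sum((series[start + i] - shapelet[i]) ** 2 for i in range(m)))
        best = min(best, d)
    return best


# Toy series containing one "bump"; the two shapelets describe the same
# local pattern at two different scales (lengths 3 and 5).
series = [0.0, 0.1, 1.0, 2.0, 1.0, 0.1, 0.0]
short_shapelet = [1.0, 2.0, 1.0]
long_shapelet = [0.1, 1.0, 2.0, 1.0, 0.1]

dist_short = shapelet_distance(series, short_shapelet)
dist_long = shapelet_distance(series, long_shapelet)
```

In a shapelet-based classifier, distances like these are typically used as features, so shapelets of different lengths can contribute features at different scales.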

Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 1908
Author(s):  
Chao Ma ◽  
Xiaochuan Shi ◽  
Wei Li ◽  
Weiping Zhu

In the past decade, time series data have been generated from various fields at a rapid speed, which offers a huge opportunity for mining valuable knowledge. As a typical task of time series mining, Time Series Classification (TSC) has attracted considerable attention from both researchers and domain experts due to its broad applications, ranging from human activity recognition to smart city governance. Specifically, there is an increasing requirement for performing classification tasks on diverse types of time series data in a timely manner without costly hand-crafted feature engineering. Therefore, in this paper, we propose a framework named Edge4TSC that allows time series to be processed in the edge environment, so that the classification results can be instantly returned to the end-users. Meanwhile, to get rid of the costly hand-crafted feature engineering process, deep learning techniques are applied for automatic feature extraction, which show competitive or even superior performance compared to state-of-the-art TSC solutions. However, because time series present complex patterns, even deep learning models are not capable of achieving satisfactory classification accuracy, which motivated us to explore new time series representation methods to help classifiers further improve the classification accuracy. In the proposed framework Edge4TSC, a new time series representation method based on building a binary distribution tree was designed to address the classification accuracy concern in TSC tasks. Comprehensive experiments on six challenging time series datasets in the edge environment validate the potential of the proposed framework in terms of generalization ability and classification accuracy improvement, and yield a number of helpful insights.


Entropy ◽  
2019 ◽  
Vol 21 (10) ◽  
pp. 1013 ◽  
Author(s):  
David Cuesta-Frau ◽  
Antonio Molina-Picó ◽  
Borja Vargas ◽  
Paula González

Many measures to quantify the nonlinear dynamics of a time series are based on estimating the probability of certain features from their relative frequencies. Once a normalised histogram of events is computed, a single result is usually derived. This process can be broadly viewed as a nonlinear mapping from ℝ^n into ℝ, where n is the number of bins in the histogram. However, this mapping might entail a loss of information that could be critical for time series classification purposes. In this respect, the present study assessed such impact using permutation entropy (PE) and a diverse set of time series. We first devised a method of generating synthetic sequences of ordinal patterns using hidden Markov models. This way, it was possible to control the histogram distribution and quantify its influence on classification results. Next, real body temperature records were also used to illustrate the same phenomenon. The experimental results confirmed the improved classification accuracy achieved using raw histogram data instead of the final PE values. Thus, this study can provide valuable guidance for improving the discriminating capability not only of PE, but of many similar histogram-based measures.
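
The information loss described above can be illustrated with a minimal sketch of permutation entropy. The helper names `ordinal_histogram` and `permutation_entropy`, the embedding order of 2, and the toy series are assumptions chosen for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def ordinal_histogram(series, order=3):
    """Relative frequency of each ordinal pattern of length `order`."""
    if len(series) < order:
        raise ValueError("series is shorter than the embedding order")
    counts = Counter()
    for start in range(len(series) - order + 1):
        window = series[start:start + order]
        # The ordinal pattern is the index order that sorts the window
        # (ties are broken by position because sorted() is stable).
        pattern = tuple(sorted(range(order), key=lambda i: window[i]))
        counts[pattern] += 1
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}


def permutation_entropy(hist):
    """Shannon entropy (natural log) of an ordinal-pattern histogram."""
    return sum(-p * math.log(p) for p in hist.values() if p > 0)


# A strictly increasing and a strictly decreasing series.
rising = [1, 2, 3, 4, 5, 6]
falling = [6, 5, 4, 3, 2, 1]

hist_rising = ordinal_histogram(rising, order=2)
hist_falling = ordinal_histogram(falling, order=2)
pe_rising = permutation_entropy(hist_rising)
pe_falling = permutation_entropy(hist_falling)
```

Both series collapse to the same entropy value because each contains a single ordinal pattern, yet their histograms differ: the scalar cannot tell the two apart, while the raw histogram can.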


2017 ◽  
Vol 14 (2) ◽  
pp. 67-80 ◽  
Author(s):  
Cun Ji ◽  
Chao Zhao ◽  
Li Pan ◽  
Shijun Liu ◽  
Chenglei Yang ◽  
...  

Time series classification (TSC) has attracted significant interest over the past decade. A shapelet is one fragment of a time series that can represent class characteristics of the time series. A classifier based on shapelets is interpretable, more accurate, and faster. However, the time it takes to find shapelets is enormous. This article will propose a fast shapelet (FS) discovery algorithm based on important data points (IDPs). First, the algorithm will identify IDPs. Next, the subsequence containing one or more IDPs will be selected as a candidate shapelet. Finally, the best shapelets will be selected. Results will show that the proposed algorithm reduces the shapelet discovery time by approximately 14.0% while maintaining the same level of classification accuracy rates.


2016 ◽  
Vol 26 (09n10) ◽  
pp. 1361-1377 ◽  
Author(s):  
Daoyuan Li ◽  
Tegawende F. Bissyande ◽  
Jacques Klein ◽  
Yves Le Traon

Time series mining has become essential for extracting knowledge from the abundant data that flows out from many application domains. To overcome storage and processing challenges in time series mining, compression techniques are being used. In this paper, we investigate the loss/gain of performance of time series classification approaches when fed with lossy-compressed data. This extended empirical study is essential for reassuring practitioners, but also for providing more insights on how compression techniques can even be effective in smoothing and reducing noise in time series data. From a knowledge engineering perspective, we show that time series may be compressed by 90% using discrete wavelet transforms and still achieve remarkable classification accuracy, and that residual details left by popular wavelet compression techniques can sometimes even help to achieve higher classification accuracy than the raw time series data, as they better capture essential local features.
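
As a rough sketch of wavelet-based lossy compression, the code below applies an orthonormal Haar transform, keeps only the largest-magnitude coefficients, and reconstructs the series. The choice of the Haar wavelet, the function names, and the `keep_ratio` parameter are assumptions for illustration; the abstract does not specify which wavelets or thresholding scheme the authors used.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """Full orthonormal Haar decomposition; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(v) for v in x]
    length = n
    while length > 1:
        half = length // 2
        approx = [(out[2 * i] + out[2 * i + 1]) / SQRT2 for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) / SQRT2 for i in range(half)]
        out[:length] = approx + detail  # approximations first, details after
        length = half
    return out


def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    out = list(coeffs)
    length = 1
    while length < len(out):
        merged = []
        for a, d in zip(out[:length], out[length:2 * length]):
            merged.append((a + d) / SQRT2)
            merged.append((a - d) / SQRT2)
        out[:2 * length] = merged
        length *= 2
    return out


def compress(x, keep_ratio=0.1):
    """Zero all but the largest-magnitude coefficients, then reconstruct."""
    coeffs = haar_forward(x)
    keep = max(1, int(len(coeffs) * keep_ratio))
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:keep])
    sparse = [c if i in kept else 0.0 for i, c in enumerate(coeffs)]
    return haar_inverse(sparse)


# A step signal is captured by just two Haar coefficients,
# so keeping 2 of 16 coefficients reconstructs it up to rounding error.
step = [0.0] * 8 + [4.0] * 8
restored = compress(step, keep_ratio=0.125)
```

Real series are rarely this sparse, so discarding coefficients normally introduces some reconstruction error; the abstract's claim concerns how classifiers behave under that error.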


Author(s):  
Zipeng Chen ◽  
Qianli Ma ◽  
Zhenxi Lin

Multi-scale information is crucial for modeling time series. Although most existing methods consider multiple scales in the time-series data, they assume all kinds of scales are equally important for each sample, making them unable to capture the dynamic temporal patterns of time series. To this end, we propose Time-Aware Multi-Scale Recurrent Neural Networks (TAMS-RNNs), which disentangle representations of different scales and adaptively select the most important scale for each sample at each time step. First, the hidden state of the RNN is disentangled into multiple independently updated small hidden states, which use different update frequencies to model time-series multi-scale information. Then, at each time step, the temporal context information is used to modulate the features of different scales, selecting the most important time-series scale. Therefore, the proposed model can capture the multi-scale information for each time series at each time step adaptively. Extensive experiments demonstrate that the model outperforms state-of-the-art methods on multivariate time series classification and human motion prediction tasks. Furthermore, visualized analysis on music genre recognition verifies the effectiveness of the model.


2014 ◽  
Vol 580-583 ◽  
pp. 2853-2859
Author(s):  
Peng Li Li ◽  
Wei Ping Ti ◽  
Jia Chun Li

Due to the broad application of remote sensing imagery, there is an urgent need for the classification of objects in the images. Multi-scale classification based on object-oriented analysis is not a common approach to image classification, because its users do not know how to use the information from multiple scales. Many users rely on easily accessible tools, such as the nearest neighbour classifier, to do multi-scale classification. Multi-scale classification classifies the images at different scales. The feature values of an object vary across scales and may show trends with respect to scale; this is the scale dependency of features. These trends may help us to understand multi-scale classification better. The difference between multi-scale classification and single-scale classification lies not only in the use of multiple scales, but also in the use of information from different scales. In order to explore the connection between different scales, research into new features is necessary.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Chiranjibi Sitaula ◽  
Tej Bahadur Shahi ◽  
Sunil Aryal ◽  
Faezeh Marzbanrad

Chest X-ray (CXR) images have been one of the important diagnostic tools used in COVID-19 diagnosis. Deep learning (DL)-based methods have been used heavily to analyze these images. Compared to other DL-based methods, the recently proposed bag of deep visual words (BoDVW) method has been shown to be a prominent representation of CXR images owing to its better discriminability. However, single-scale BoDVW features are insufficient to capture the detailed semantic information of the infected regions in the lungs, as the resolution of such images varies in real applications. In this paper, we propose new multi-scale bag of deep visual words (MBoDVW) features, which exploit three different scales of the 4th pooling layer’s output feature map obtained from the VGG-16 model. For MBoDVW-based features, we perform the convolution with max pooling operation over the 4th pooling layer using three different kernels: 1 × 1, 2 × 2, and 3 × 3. We evaluate our proposed features with the Support Vector Machine (SVM) classification algorithm on four public CXR datasets (CD1, CD2, CD3, and CD4) with over 5000 CXR images. Experimental results show that our method produces stable and prominent classification accuracy (84.37%, 88.88%, 90.29%, and 83.65% on CD1, CD2, CD3, and CD4, respectively).


2021 ◽  
Author(s):  
Junlu Wang ◽  
Su Li ◽  
Wanting Ji ◽  
Tian Jiang ◽  
Baoyan Song

Time series classification is a basic task in the field of streaming data event analysis and data mining. The existing time series classification methods have the problems of low classification accuracy and low efficiency. To solve these problems, this paper proposes a T-CNN time series classification method based on a Gram matrix. Specifically, we perform wavelet threshold denoising on time series to filter normal curve noise, and propose a lossless transformation method based on the Gram matrix, which converts the time series to the time domain image and retains all the information of events. Then, we propose an improved CNN time series classification method, which introduces the Toeplitz convolution kernel matrix into convolution layer calculation. Finally, we introduce a Triplet network to calculate the similarity between similar events and different classes of events, and optimize the squared loss function of CNN. The proposed T-CNN model can accelerate the convergence rate of gradient descent and improve classification accuracy. Experimental results show that, compared with the existing methods, our T-CNN time series classification method has great advantages in efficiency and accuracy.

