symbolic aggregate approximation
Recently Published Documents


TOTAL DOCUMENTS: 67 (FIVE YEARS: 34)

H-INDEX: 7 (FIVE YEARS: 2)

Algorithms ◽  
2021 ◽  
Vol 14 (12) ◽  
pp. 353
Author(s):  
Zhenwen He ◽  
Chunfeng Zhang ◽  
Xiaogang Ma ◽  
Gang Liu

Time series data are widely found in finance, health, environmental, social, mobile and other fields. A large amount of time series data has been produced due to the general use of smartphones, various sensors, RFID and other internet devices. How a time series is represented is key to the efficient and effective storage and management of time series data, and is also very important to time series classification. Two new time series representation methods, Hexadecimal Aggregate approXimation (HAX) and Point Aggregate approXimation (PAX), are proposed in this paper. Both methods represent each segment of a time series as a transformable interval object (TIO). Each TIO is then mapped to a spatial point on a two-dimensional plane. Finally, HAX maps each point to a hexadecimal digit, so that a time series is converted into a hex string. The experimental results show that HAX has higher classification accuracy than Symbolic Aggregate approXimation (SAX) but lower accuracy than some SAX variants (SAX-TD, SAX-BD). HAX has the same space cost as SAX, which is lower than that of these variants. PAX has higher classification accuracy than HAX, extremely close to that of the Euclidean distance (ED) measurement; however, the space cost of PAX is generally much lower than that of ED. HAX and PAX are general representation methods that, in addition to classification, can also support geoscience time series clustering, indexing and querying.


2021 ◽  
Vol 11 (23) ◽  
pp. 11294
Author(s):  
Zuo-Cheng Wen ◽  
Zhi-Heng Zhang ◽  
Xiang-Bing Zhou ◽  
Jian-Gang Gu ◽  
Shao-Peng Shen ◽  
...  

Recently, predicting multivariate time series (MTS) has attracted much attention as a way to obtain richer semantics with similar or better performance. In this paper, we propose a tri-partition alphabet-based state (tri-state) prediction method for symbolic MTSs. First, for each variable, the set of all symbols, i.e., the alphabet, is divided into strong, medium, and weak symbols using two user-specified thresholds. With the tri-partitioned alphabet, the tri-state takes the form of a matrix. One dimension contains all the variables. The other is a feature vector that includes the most likely occurring strong, medium, and weak symbols. Second, a tri-partition strategy based on the deviation degree is proposed. We introduce the piecewise and symbolic aggregate approximation techniques to aggregate and discretize the original MTS. This way, a stronger symbol has a bigger deviation. Moreover, most popular numerical or symbolic similarity or distance metrics can be combined. Third, we propose an along–across similarity model to obtain the k-nearest matrix neighbors. This model considers the associations among the time stamps and variables simultaneously. Fourth, we design two post-filling strategies to obtain a completed tri-state. The experimental results on datasets from four domains show that (1) the tri-state has greater recall but lower precision; (2) the two post-filling strategies can slightly improve the recall; and (3) the along–across similarity model composed of the Triangle and Jaccard metrics is recommended first for new datasets.


Author(s):  
Hainan Huang ◽  
Rongjie Zhang ◽  
Chengguang Xie ◽  
Xiaofeng Li

Various social events, such as holidays, important sporting events, and major celebrations, may result in sudden large-scale passenger flows in certain sections and stations of urban rail transit systems. The sudden inbound passenger flows caused by these events can easily lead to continuous congestion of the subway network, which has a profound impact on the safety, reliability, and stability of a subway system. Because of the large magnitude of swipe data and the high dimensionality of time series, it is difficult to identify the emergence of such large passenger flows. Additionally, the recognition accuracy of the existing identification methods cannot meet the operational monitoring requirements. To address the above-mentioned issues, this paper proposes an optimized symbolic aggregate approximation (SAX) algorithm to identify historical sudden passenger flows caused by large-scale events around subways. Specifically, pre-set cluster types and dynamic time warping (DTW) are proposed to enhance the matching rate. Compared with the K-means method, the proposed method exhibits an average increase of 30% in mining accuracy, and the calculation time is shortened to one-sixteenth of the original value.
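
The abstract above names dynamic time warping (DTW) as one ingredient of its matching step but does not describe how it is applied. As a rough illustration of the general technique only, the following Python sketch computes the textbook DTW distance between two numeric sequences with an absolute-difference local cost. The function name dtw_distance and the choice of local cost are assumptions for illustration, not details taken from the paper.

```python
def dtw_distance(a, b):
    """Textbook DTW distance between two numeric sequences (illustrative sketch)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost: absolute difference
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched to an already used b[j-1]
                cost[i][j - 1],      # b[j-1] is matched to an already used a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# A sequence and a time-stretched copy of it align at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Unlike a point-by-point Euclidean comparison, this alignment tolerates peaks that arrive slightly earlier or later, which is the usual reason for pairing DTW with a coarse symbolic representation.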


Author(s):  
Lars Kegel ◽  
Claudio Hartmann ◽  
Maik Thiele ◽  
Wolfgang Lehner

Processing and analyzing time series datasets have become a central issue in many domains, requiring data management systems to support time series as a native data type. A core access primitive of time series is matching, which requires efficient algorithms on top of appropriate representations such as the symbolic aggregate approximation (SAX), which represents the current state of the art. This technique reduces a time series to a low-dimensional space by segmenting it and discretizing each segment into a small symbolic alphabet. Unfortunately, SAX ignores the deterministic behavior of time series, such as cyclical repeating patterns or a trend component affecting all segments, which may lead to sub-optimal representation accuracy. We therefore introduce novel season-aware and trend-aware symbolic approximations and demonstrate improved representation accuracy without increasing the memory footprint. Most importantly, our techniques also enable more efficient time series matching by providing a match up to three orders of magnitude faster than SAX.
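
The segment-then-discretize idea described in this abstract can be sketched in a few lines of Python. This is a minimal illustration of classic SAX (z-normalization, piecewise aggregate approximation, then equiprobable Gaussian breakpoints), not the season- or trend-aware variants the authors propose. The function names and the parameters n_segments and alphabet_size are illustrative assumptions, and the sketch assumes n_segments does not exceed the series length.

```python
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale a series to zero mean and unit variance (constant series map to zeros)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each of n_segments chunks."""
    n = len(series)
    return [
        mean(series[i * n // n_segments:(i + 1) * n // n_segments])
        for i in range(n_segments)
    ]


def sax(series, n_segments, alphabet_size):
    """Classic SAX word for a numeric series (illustrative sketch)."""
    # Breakpoints cut the standard normal distribution into equiprobable regions.
    breakpoints = [
        NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)
    ]
    word = []
    for value in paa(znormalize(series), n_segments):
        index = sum(value > b for b in breakpoints)  # number of breakpoints below value
        word.append(chr(ord("a") + index))
    return "".join(word)


print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4))  # prints "abcd"
```

Because every segment is reduced to one symbol from a small alphabet, a linear trend like the one above consumes the whole alphabet just to encode the slope, which is the kind of deterministic behavior the abstract says plain SAX ignores.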


Author(s):  
Yedukondala Rao Veeranki ◽  
Nagarajan Ganapathy ◽  
Ramakrishnan Swaminathan

Analysis of fluctuations in electrodermal activity (EDA) signals is widely preferred for emotion recognition. In this work, an attempt has been made to determine the patterns of fluctuations in EDA signals for various emotional states using improved symbolic aggregate approximation. For this, the EDA is obtained from a publicly available online database. The EDA is decomposed into phasic components and divided into equal segments. Each segment is transformed into a piecewise aggregate approximation (PAA). These approximations are discretized using 11 time-domain features to obtain symbolic sequences. Shannon entropy is extracted from each PAA-based symbolic sequence using varied symbol size [Formula: see text] and window length [Formula: see text]. Three machine-learning algorithms, namely Naive Bayes, support vector machine and rotation forest, are used for the classification. The results show that the proposed approach is able to determine the patterns of fluctuations for various emotional states in EDA signals. PAA features, namely maximum amplitude and chaos, significantly identify the subtle fluctuations in EDA and transform them into symbolic sequences. The optimal values of [Formula: see text] and [Formula: see text] yield the highest performance. The rotation forest is accurate (F-[Formula: see text] and 60.02% for arousal and valence dimensions) in classifying various emotional states. The proposed approach can capture the patterns of fluctuations for varied-length signals. In particular, the support vector machine yields the highest performance for shorter signals. Thus, it appears that the proposed method might be utilized to analyze various emotional states in both normal and clinical settings.
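
The entropy feature mentioned in this abstract can be illustrated with a short Python sketch. It computes the Shannon entropy, in bits, of the symbol distribution in an already discretized sequence. How the authors discretize the PAA segments and how they window the sequence is not specified here, so the input string and the function name shannon_entropy are assumptions for illustration.

```python
from collections import Counter
from math import log2


def shannon_entropy(symbols):
    """Shannon entropy (bits) of the empirical symbol distribution in a sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Two equally frequent symbols carry one bit per symbol.
print(shannon_entropy("abab"))  # prints 1.0
```

A sequence that stays on one symbol has zero entropy, while a sequence that uses every symbol equally often reaches the maximum of log2 of the alphabet size, so the feature summarizes how varied the fluctuations within a window are.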


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Gongliang Li ◽  
Mingyong Yin ◽  
Siyuan Jing ◽  
Bing Guo

Detection of abnormal network traffic is an important issue when building intrusion detection systems. An effective way to address this issue is time series mining, in which the network traffic is naturally represented as a set of time series. In this paper, we propose a novel efficient algorithm, called RSFID (Random Shapelet Forest for Intrusion Detection), to detect abnormal traffic flow patterns in periodic network packets. First, the Fast Correlation-based Filter (FCBF) algorithm is employed to remove irrelevant features, reducing both overfitting and time complexity. Then, a random forest built upon a set of shapelet candidates is used to classify the normal and abnormal traffic flow patterns. Specifically, the Symbolic Aggregate approXimation (SAX) and a random sampling technique are adopted to mitigate the high time complexity caused by enumerating shapelet candidates. Experimental results show the effectiveness and efficiency of the proposed algorithm.


Processes ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 1115
Author(s):  
Gilseung Ahn ◽  
Hyungseok Yun ◽  
Sun Hur ◽  
Si-Yeong Lim

Accurate predictions of the remaining useful life (RUL) of equipment, made by machine learning (ML) or deep learning (DL) models trained on data collected until the equipment fails, are crucial for maintenance scheduling. Because the data are unavailable until the equipment fails, collecting sufficient data to train a model without overfitting can be challenging. Here, we propose a method of generating time-series data for RUL models to resolve the problems posed by insufficient data. The proposed method converts every training time series into a sequence of alphabetical strings by symbolic aggregate approximation and identifies occurrence patterns in the converted sequences. The method then generates a new sequence and inversely transforms it into a new time series. Experiments with various RUL prediction datasets and ML/DL models verified that the proposed data-generation model can help avoid overfitting in RUL prediction models.
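
The convert, learn patterns, generate, and invert pipeline outlined in this abstract can be sketched in Python under strong assumptions. The abstract does not say how occurrence patterns are identified, so this sketch substitutes a first-order symbol-transition model, and it inverts each symbol to the median of its Gaussian region on the z-normalized scale. All function names, the transition model, and the inversion rule are illustrative assumptions rather than the authors' method.

```python
import random
from collections import defaultdict
from statistics import NormalDist


def fit_transitions(words):
    """Collect, for each symbol, the symbols observed to follow it (assumed pattern model)."""
    transitions = defaultdict(list)
    for word in words:
        for current, following in zip(word, word[1:]):
            transitions[current].append(following)
    return transitions


def generate_word(transitions, start, length, rng):
    """Sample a new symbolic word by walking the observed transitions."""
    word = [start]
    while len(word) < length:
        candidates = transitions.get(word[-1])
        if not candidates:  # no observed successor: stop early
            break
        word.append(rng.choice(candidates))
    return "".join(word)


def inverse_sax(word, alphabet_size, segment_length):
    """Map each symbol to the median of its Gaussian region, repeated per segment."""
    series = []
    for symbol in word:
        k = ord(symbol) - ord("a")
        level = NormalDist().inv_cdf((k + 0.5) / alphabet_size)
        series.extend([level] * segment_length)
    return series


training_words = ["abcabc", "abcab"]  # hypothetical SAX words from training series
model = fit_transitions(training_words)
new_word = generate_word(model, start="a", length=6, rng=random.Random(0))
new_series = inverse_sax(new_word, alphabet_size=3, segment_length=4)
print(new_word, len(new_series))
```

The inverted series lives on the z-normalized scale and is piecewise constant, so a practical use would still need to rescale it with the mean and standard deviation of a real series and possibly add noise within each segment.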

