A Boundary Distance-Based Symbolic Aggregate Approximation Method for Time Series Data

Algorithms ◽  
2020 ◽  
Vol 13 (11) ◽  
pp. 284
Author(s):  
Zhenwen He ◽  
Shirong Long ◽  
Xiaogang Ma ◽  
Hong Zhao

A large amount of time series data is generated every day in a wide range of sensor application domains. Symbolic aggregate approximation (SAX) is a well-known time series representation method that discretizes continuous time series and whose distance measure lower-bounds the Euclidean distance. SAX has been widely used for applications in various domains, such as mobile data management, financial investment, and shape discovery. However, the SAX representation has a limitation: symbols are mapped from the average values of segments, but SAX does not consider the boundary distance in the segments. Different segments with similar average values may be mapped to the same symbols, in which case the SAX distance between them is 0. In this paper, we propose a novel representation named SAX-BD (boundary distance) that integrates the SAX distance with a weighted boundary distance. The experimental results show that SAX-BD significantly outperforms the SAX, ESAX, and SAX-TD representations.
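The limitation described in this abstract can be illustrated with a minimal sketch of plain SAX (not SAX-BD, whose weighting scheme the abstract does not specify). The function name `sax`, its parameters, and the example series are illustrative assumptions, and the sketch assumes the series length is divisible by the number of segments.

```python
from bisect import bisect_left
from statistics import NormalDist, fmean, pstdev


def sax(series, n_segments, alphabet_size):
    """Convert a numeric series into a SAX word (simplified, hypothetical helper)."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be divisible by n_segments")

    # z-normalize so the values are comparable to a standard normal distribution.
    mu, sigma = fmean(series), pstdev(series)
    if sigma == 0:
        raise ValueError("a constant series cannot be z-normalized")
    z = [(x - mu) / sigma for x in series]

    # Piecewise aggregate approximation: one mean per equal-length segment.
    width = len(z) // n_segments
    means = [fmean(z[i:i + width]) for i in range(0, len(z), width)]

    # Breakpoints split the standard normal into equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]

    # Each segment mean becomes the letter of the region it falls into.
    return "".join(chr(ord("a") + bisect_left(breakpoints, m)) for m in means)


# A rising segment followed by a falling segment: different shapes, equal means.
rising_then_falling = [-1, 0, 1, 1, 0, -1]
word = sax(rising_then_falling, n_segments=2, alphabet_size=3)
```

Here both segments receive the same symbol even though one rises and the other falls, which is the case in which the SAX distance between them is 0 and boundary information would be needed to tell them apart.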

Algorithms ◽  
2021 ◽  
Vol 14 (12) ◽  
pp. 353
Author(s):  
Zhenwen He ◽  
Chunfeng Zhang ◽  
Xiaogang Ma ◽  
Gang Liu

Time series data are widely found in finance, health, environmental, social, mobile and other fields. A large amount of time series data has been produced due to the general use of smartphones, various sensors, RFID and other internet devices. How a time series is represented is key to the efficient and effective storage and management of time series data, and is also very important to time series classification. Two new time series representation methods, Hexadecimal Aggregate approXimation (HAX) and Point Aggregate approXimation (PAX), are proposed in this paper. Both methods represent each segment of a time series as a transformable interval object (TIO). Each TIO is then mapped to a spatial point located on a two-dimensional plane. Finally, HAX maps each point to a hexadecimal digit so that a time series is converted into a hex string. The experimental results show that HAX has higher classification accuracy than Symbolic Aggregate approXimation (SAX) but lower accuracy than some SAX variants (SAX-TD, SAX-BD). HAX has the same space cost as SAX, which is lower than that of these variants. PAX has higher classification accuracy than HAX, and its accuracy is extremely close to that of the Euclidean distance (ED) measurement; however, the space cost of PAX is generally much lower than that of ED. HAX and PAX are general representation methods that, in addition to classification, can also support clustering, indexing and querying of geoscience time series.


2020 ◽  
Vol 10 (19) ◽  
pp. 6980
Author(s):  
Kiburm Song ◽  
Minho Ryu ◽  
Kichun Lee

Numerous dimensionality-reducing representations of time series have been proposed in data mining and have proved to be useful, especially in handling a high volume of time series data. Among them, widely used symbolic representations such as symbolic aggregate approximation and piecewise aggregate approximation focus on information of local averages of time series. To compensate for the limitations of such methods, several attempts were made to include trend information. However, the included trend information is quite simple, leading to great information loss. Such information is also hard to extend, so raising the representation to a higher level of complexity is difficult. In this paper, we propose a new symbolic representation method called transitional symbolic aggregate approximation that incorporates transitional information into symbolic aggregate approximations. We show that the proposed method, which satisfies a lower bound of the Euclidean distance, is able to preserve meaningful information, including dynamic trend transitions in segmented time series, while still reducing dimensionality. We also show that this method is advantageous in terms of interpretability from a theoretical perspective, and that it is practical and superior to existing symbolic representation methods in time-series classification tasks.


Processes ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 1115
Author(s):  
Gilseung Ahn ◽  
Hyungseok Yun ◽  
Sun Hur ◽  
Si-Yeong Lim

Accurate predictions of remaining useful life (RUL) of equipment using machine learning (ML) or deep learning (DL) models that collect data until the equipment fails are crucial for maintenance scheduling. Because the data are unavailable until the equipment fails, collecting sufficient data to train a model without overfitting can be challenging. Here, we propose a method of generating time-series data for RUL models to resolve the problems posed by insufficient data. The proposed method converts every training time series into a sequence of alphabetical strings by symbolic aggregate approximation and identifies occurrence patterns in the converted sequences. The method then generates a new sequence and inversely transforms it into a new time series. Experiments with various RUL prediction datasets and ML/DL models verified that the proposed data-generation model can help avoid overfitting in RUL prediction models.


2020 ◽  
Vol 109 (11) ◽  
pp. 2029-2061
Author(s):  
Zahraa S. Abdallah ◽  
Mohamed Medhat Gaber

Abstract. Time series classification (TSC) is a challenging task that has attracted many researchers in the last few years. One main challenge in TSC is the diversity of domains where time series data come from. Thus, there is no “one model that fits all” in TSC. Some algorithms are very accurate in classifying a specific type of time series when the whole series is considered, while some only target the existence/non-existence of specific patterns/shapelets. Yet other techniques focus on the frequency of occurrences of discriminating patterns/features. This paper presents a new classification technique that addresses the inherent diversity problem in TSC using a nature-inspired method. The technique is inspired by how flies look at the world through “compound eyes” that are made up of thousands of lenses, called ommatidia. Each ommatidium is an eye with its own lens, and thousands of them together create a broad field of vision. The developed technique similarly uses different lenses and representations to look at the time series, and then combines them for broader visibility. These lenses are created through hyper-parameterisation of symbolic representations (Piecewise Aggregate and Fourier approximations). The algorithm builds a random forest for each lens, then performs soft dynamic voting to classify new instances using the most confident eyes, i.e., forests. We evaluate the new technique, coined Co-eye, using the recently released extended version of the UCR archive, which contains more than 100 datasets across a wide range of domains. The results show the benefits of bringing together different perspectives, which are reflected in the accuracy and robustness of Co-eye in comparison to other state-of-the-art techniques.


Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 1908
Author(s):  
Chao Ma ◽  
Xiaochuan Shi ◽  
Wei Li ◽  
Weiping Zhu

In the past decade, time series data have been generated in various fields at a rapid pace, which offers a huge opportunity for mining valuable knowledge. As a typical task of time series mining, Time Series Classification (TSC) has attracted considerable attention from both researchers and domain experts due to its broad applications, ranging from human activity recognition to smart city governance. Specifically, there is an increasing need to perform classification tasks on diverse types of time series data in a timely manner without costly hand-crafted feature engineering. Therefore, in this paper, we propose a framework named Edge4TSC that allows time series to be processed in the edge environment, so that the classification results can be instantly returned to the end-users. Meanwhile, to avoid the costly hand-crafted feature engineering process, deep learning techniques are applied for automatic feature extraction, which shows competitive or even superior performance compared to state-of-the-art TSC solutions. However, because time series present complex patterns, even deep learning models are not capable of achieving satisfactory classification accuracy, which motivated us to explore new time series representation methods to help classifiers further improve the classification accuracy. In the proposed framework Edge4TSC, a new time series representation method based on building a binary distribution tree was designed to address the classification accuracy concern in TSC tasks. Comprehensive experiments on six challenging time series datasets in the edge environment validate the potential of the proposed framework in terms of generalization ability and classification accuracy improvement, and yield a number of helpful insights.


2007 ◽  
Vol 23 (4) ◽  
pp. 227-237 ◽  
Author(s):  
Thomas Kubiak ◽  
Cornelia Jonas

Abstract. Patterns of psychological variables in time have been of interest to research from the beginning. This is particularly true for ambulatory monitoring research, where large (cross-sectional) time-series datasets are often the matter of investigation. Common methods for identifying cyclic variations include spectral analyses of time-series data or time-domain based strategies, which also allow for modeling cyclic components. Though the prerequisites of these sophisticated procedures, such as interval-scaled time-series variables, are seldom met, their usage is common. In contrast to the time-series approach, methods from a different field of statistics, directional or circular statistics, offer another opportunity for the detection of patterns in time, where fewer prerequisites have to be met. These approaches are commonly used in biology or geostatistics. They offer a wide range of analytical strategies to examine “circular data,” i.e., data where the period of measurement is rotationally invariant (e.g., directions on the compass or daily hours ranging from 0 to 24, 24 being the same as 0). In psychology, however, circular statistics are hardly known at all. In the present paper, we intend to give a succinct introduction to the rationale of circular statistics and describe how this approach can be used for the detection of patterns in time, contrasting it with time-series analysis. To illustrate the use of circular statistics, we report data from a monitoring study in which mood and social interactions were assessed for 4 weeks. Both the results of periodogram analyses and circular statistics-based results are reported. Advantages and possible pitfalls of the circular statistics approach are highlighted, and we conclude that ambulatory assessment research can benefit from strategies borrowed from circular statistics.
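As a rough illustration of why clock times call for circular rather than linear statistics, the sketch below computes the standard circular mean and mean resultant length for times of day. The function name `circular_summary` and the example data are hypothetical and are not taken from the paper.

```python
import math


def circular_summary(hours, period=24.0):
    """Return (circular mean in hours, mean resultant length) for clock times."""
    if not hours:
        raise ValueError("need at least one observation")

    # Map each time of day to an angle on the unit circle.
    angles = [2.0 * math.pi * (h % period) / period for h in hours]

    # Average the unit vectors rather than the raw hour values.
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)

    # Direction of the mean vector, wrapped into one full turn.
    mean_angle = math.atan2(s, c) % (2.0 * math.pi)

    # Length of the mean vector: near 1 for concentrated data, near 0 for spread data.
    resultant_length = math.hypot(c, s)
    return mean_angle * period / (2.0 * math.pi), resultant_length


late_evening_and_early_morning = [23.0, 1.0]
mean_hour, strength = circular_summary(late_evening_and_early_morning)
arithmetic_mean = sum(late_evening_and_early_morning) / 2  # noon, which is misleading
```

For observations at 23:00 and 01:00, the arithmetic mean lands at noon, whereas the circular mean lands at midnight (reported as a value at or very near 0, or equivalently 24, depending on floating-point rounding).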


Author(s):  
Trung Duy Pham ◽  
Dat Tran ◽  
Wanli Ma

In the biomedical and healthcare fields, protecting the ownership of outsourced data is becoming a challenging issue when data are shared between data owners and data mining experts to extract hidden knowledge and patterns. Watermarking has proved to be a right-protection mechanism that provides detectable evidence of the legal ownership of a shared dataset without compromising its usability under a wide range of data mining tasks, for digital data in different formats such as audio, video, image, relational database, text and software. Time series biomedical data such as electroencephalography (EEG) or electrocardiography (ECG) are valuable and costly in healthcare, and need ownership protection when they are shared or transmitted in data mining applications. However, this issue has been investigated in little previous research for this kind of data, given its characteristics and requirements. This paper proposes an optimized watermarking scheme to protect ownership for biomedical and healthcare systems in data mining. To achieve the highest possible robustness without losing watermark transparency, the Particle Swarm Optimization (PSO) technique is used to search for suitable quantization steps. Experimental results on EEG data show that the proposed scheme provides good imperceptibility and is more robust against various signal processing techniques and common attacks such as noise addition, low-pass filtering, and re-sampling.


2013 ◽  
Vol 2013 ◽  
pp. 1-7 ◽  
Author(s):  
Jingpei Dan ◽  
Weiren Shi ◽  
Fangyan Dong ◽  
Kaoru Hirota

A time series representation, piecewise trend approximation (PTA), is proposed to improve the efficiency of time series data mining in high-dimensional large databases. PTA represents a time series in concise form while retaining the main trends of the original time series; the dimensionality of the original data is therefore reduced while the key features are maintained. Unlike representations that are based on the original data space, PTA transforms the original data space into a feature space of ratios between any two consecutive data points in the original time series, whose sign and magnitude indicate the direction and degree of the local trend, respectively. In this ratio-based feature space, segmentation is performed such that any two adjacent segments have different trends, and each segment is then approximated by the ratio between its first and last points. To validate the proposed PTA, it is compared with the classical time series representations PAA and APCA on two classical datasets using the commonly used K-NN classification algorithm. On the ControlChart dataset, PTA achieves 3.55% and 2.33% higher classification accuracy than PAA and APCA, respectively; on the Mixed-BagShapes dataset, the improvements are 8.94% and 7.07%. These results indicate that the proposed PTA is effective for high-dimensional time series data mining.
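The segmentation idea in this abstract can be sketched roughly as follows. This is one possible reading, not the authors' algorithm: the abstract does not define the ratio precisely, so the sketch assumes a relative change `(last - first) / first` on strictly positive values, and the names `trend_sign` and `pta_sketch` are hypothetical.

```python
def trend_sign(a, b):
    """Return +1 for a rise, -1 for a fall, 0 for no change between two points."""
    return (b > a) - (b < a)


def pta_sketch(series):
    """Split a series where the local trend changes and summarize each segment.

    Returns a list of (end_index, relative_change) pairs. The relative-change
    formula is an assumption made for this sketch.
    """
    if len(series) < 2:
        raise ValueError("need at least two points")
    if any(x <= 0 for x in series):
        raise ValueError("this sketch assumes strictly positive values")

    segments = []
    start = 0
    current = trend_sign(series[0], series[1])
    for i in range(1, len(series) - 1):
        upcoming = trend_sign(series[i], series[i + 1])
        if upcoming != current:
            # The trend changes at point i, so close the segment that ends here.
            segments.append((i, (series[i] - series[start]) / series[start]))
            start = i
            current = upcoming

    # Close the final segment at the last point of the series.
    end = len(series) - 1
    segments.append((end, (series[end] - series[start]) / series[start]))
    return segments


rise_then_fall = [1.0, 2.0, 4.0, 2.0, 1.0]
approximation = pta_sketch(rise_then_fall)
```

In this example, five points are reduced to two pairs: a rising segment ending at index 2 and a falling segment ending at index 4, with the sign of each value giving the trend direction and its magnitude giving the degree of change.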


2017 ◽  
Author(s):  
Solveig H. Winsvold ◽  
Andreas Kääb ◽  
Christopher Nuth ◽  
Liss M. Andreassen ◽  
Ward van Pelt ◽  
...  

Abstract. With dense SAR satellite data time-series it is possible to map surface and subsurface glacier properties that vary in time. On Sentinel-1A and Radarsat-2 backscatter images over mainland Norway and Svalbard, we have used descriptive methods for outlining the possibilities of using SAR time-series for mapping glaciers. We present five application scenarios. The first shows potential for tracking transient snow lines with SAR backscatter time-series, and correlates with both optical satellite images (Sentinel-2A and Landsat 8) and equilibrium line altitudes derived from in situ surface mass balance data. In the second application scenario, time-series representation of glacier facies corresponding to SAR glacier zones shows potential for a more accurate delineation of the zones and how they change in time. The third application scenario investigates the firn evolution using dense SAR backscatter time-series together with a coupled energy balance and multi-layer firn model. We find strong correlation of backscatter signals with both the modeled firn air-content and modeled wetness in the firn. In the fourth application scenario, we highlight how winter rain events can be detected in SAR time-series, revealing important information about the area extent of internal accumulation. Finally, in the last application scenario, averaged summer SAR images were found to have potential in assisting the process of mapping glacier outlines, especially in the presence of seasonal snow. Altogether we present examples of how to map glaciers and to further understand glaciological processes using the existing and future massive amount of multi-sensor time-series data. Our results reveal the potential of satellite imagery for automatically derived products as important input in modeling assessments and glacier change analysis.


2018 ◽  
Vol 12 (3) ◽  
pp. 867-890 ◽  
Author(s):  
Solveig H. Winsvold ◽  
Andreas Kääb ◽  
Christopher Nuth ◽  
Liss M. Andreassen ◽  
Ward J. J. van Pelt ◽  
...  

Abstract. With dense SAR satellite data time series it is possible to map surface and subsurface glacier properties that vary in time. On Sentinel-1A and RADARSAT-2 backscatter time series images over mainland Norway and Svalbard, we outline how to map glaciers using descriptive methods. We present five application scenarios. The first shows potential for tracking transient snow lines with SAR backscatter time series and correlates with both optical satellite images (Sentinel-2A and Landsat 8) and equilibrium line altitudes derived from in situ surface mass balance data. In the second application scenario, time series representation of glacier facies corresponding to SAR glacier zones shows potential for a more accurate delineation of the zones and how they change in time. The third application scenario investigates the firn evolution using dense SAR backscatter time series together with a coupled energy balance and multilayer firn model. We find strong correlation of backscatter signals with both the modeled firn air content and modeled wetness in the firn. In the fourth application scenario, we highlight how winter rain events can be detected in SAR time series, revealing important information about the area extent of internal accumulation. In the last application scenario, averaged summer SAR images were found to have potential in assisting the process of mapping glacier outlines, especially in the presence of seasonal snow. Altogether we present examples of how to map glaciers and to further understand glaciological processes using the existing and future massive amount of multi-sensor time series data.

