Detecting Pattern Anomalies in Hydrological Time Series with Weighted Probabilistic Suffix Trees

Water ◽  
2020 ◽  
Vol 12 (5) ◽  
pp. 1464
Author(s):  
Yufeng Yu ◽  
Dingsheng Wan ◽  
Qun Zhao ◽  
Huan Liu

Anomalous patterns are common phenomena in time series datasets. The presence of anomalous patterns in hydrological data may represent anomalous hydrometeorological events that differ significantly from others and induce a bias in the decision-making process related to the design, operation and management of water resources. Hence, it is necessary to extract from hydrological data this “anomalous” knowledge, which can provide valuable and useful information for future hydrological analysis and forecasting. This paper focuses on the problem of detecting anomalous patterns in hydrological time series data and proposes an effective and accurate anomalous pattern detection approach, TFSAX_wPST, which combines the advantages of the Trend Feature Symbolic Aggregate approximation (TFSAX) and the weighted Probabilistic Suffix Tree (wPST). Experiments with different real-world hydrological time series are reported, and the results indicate that the proposed methods are fast and can correctly detect anomalous patterns for hydrological time series analysis, and thus promote the deep analysis and continuous utilization of hydrological time series data.

Processes ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 1115
Author(s):  
Gilseung Ahn ◽  
Hyungseok Yun ◽  
Sun Hur ◽  
Si-Yeong Lim

Accurate predictions of the remaining useful life (RUL) of equipment, made with machine learning (ML) or deep learning (DL) models trained on data collected until the equipment fails, are crucial for maintenance scheduling. Because the data are unavailable until the equipment fails, collecting sufficient data to train a model without overfitting can be challenging. Here, we propose a method of generating time-series data for RUL models to resolve the problems posed by insufficient data. The proposed method converts every training time series into a sequence of alphabetical strings by symbolic aggregate approximation and identifies occurrence patterns in the converted sequences. The method then generates a new sequence and inversely transforms it to a new time series. Experiments with various RUL prediction datasets and ML/DL models verified that the proposed data-generation model can help avoid overfitting in RUL prediction models.
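
The generate-then-invert idea in this abstract can be illustrated with a small sketch. The abstract does not say how occurrence patterns are identified, so the code below substitutes a first-order Markov chain over symbols, which is an assumption and not the authors' method. The function names and the `symbol_values` lookup (one representative value per symbol) are hypothetical.

```python
import random
from collections import defaultdict

def transition_table(words):
    """Collect, for each symbol, the symbols observed to follow it (assumed pattern model)."""
    table = defaultdict(list)
    for w in words:
        for cur, nxt in zip(w, w[1:]):
            table[cur].append(nxt)
    return table

def generate_word(words, length, rng):
    """Sample a new symbolic word by walking the observed transitions."""
    table = transition_table(words)
    symbol = rng.choice([w[0] for w in words])  # start like one of the training words
    out = [symbol]
    while len(out) < length:
        followers = table.get(symbol)
        if not followers:  # symbol was never followed by anything: stop early
            break
        symbol = rng.choice(followers)
        out.append(symbol)
    return "".join(out)

def inverse_transform(word, symbol_values, seg_len):
    """Expand each symbol into seg_len copies of its representative value (hypothetical mapping)."""
    return [symbol_values[s] for s in word for _ in range(seg_len)]

words = ["abcd", "abdd", "bbcd"]
new_word = generate_word(words, 6, random.Random(0))
series = inverse_transform(new_word, {"a": -1.0, "b": -0.3, "c": 0.3, "d": 1.0}, 2)
```

Every adjacent pair in a generated word was seen in the training words, so the synthetic series stays close to observed behaviour while still differing from any single training series.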


2021 ◽  
Vol 17 (4) ◽  
pp. 306-320
Author(s):  
Rahmah Mohd Lokoman ◽  
Fadhilah Yusof ◽  
Nor Eliza Alias ◽  
Zulkifli Yusop

Copula models have been applied in various hydrologic studies; however, most analyses do not consider the non-stationary conditions that may exist in the time series. To investigate the dependence structure between two rainfall stations at Johor Bahru, two methods have been applied. The first method considers the non-stationary condition that exists in the data, while the second method assumes stationarity in the time series data. Through goodness-of-fit (GOF) and simulation tests, the performance of both methods is compared in this study. The results obtained in this study highlight the importance of considering non-stationary conditions in hydrological data.


2014 ◽  
Vol 2014 ◽  
pp. 1-14 ◽  
Author(s):  
Yufeng Yu ◽  
Yuelong Zhu ◽  
Shijin Li ◽  
Dingsheng Wan

To detect outliers in hydrological time series data and thereby improve data quality and the quality of decisions related to the design, operation, and management of water resources, this research develops a time series outlier detection method for hydrologic data that can be used to identify data that deviate from historical patterns. The method first builds a forecasting model on the historical data and then uses it to predict future values. Anomalies are assumed to take place if the observed values fall outside a given prediction confidence interval (PCI), which can be calculated from the predicted value and the confidence coefficient. The use of the PCI as the threshold rests mainly on the fact that it considers the uncertainty in the data series parameters in the forecasting model, which addresses the problem of selecting a suitable threshold. The method performs fast, incremental evaluation of data as it becomes available, scales to large quantities of data, and requires no preclassification of anomalies. Experiments with different real-world hydrologic time series showed that the proposed methods are fast, correctly identify abnormal data, and can be used for hydrologic time series analysis.
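
A minimal sketch of interval-based outlier flagging follows. The abstract does not specify the forecasting model, so this example assumes a moving-average forecast over the preceding window and a normal-quantile interval whose width comes from the window's sample standard deviation. The function name `pci_anomalies` and its parameters are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def pci_anomalies(series, window=5, confidence=0.95):
    """Return indices whose value lies outside the prediction interval of the preceding window."""
    # Two-sided standard normal quantile for the chosen confidence coefficient
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    flagged = []
    for t in range(window, len(series)):
        history = series[t - window:t]
        predicted = mean(history)        # stand-in forecasting model (assumption)
        half_width = z * stdev(history)  # interval half-width from recent variability
        if abs(series[t] - predicted) > half_width:
            flagged.append(t)
    return flagged

levels = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0, 9.8]
print(pci_anomalies(levels))  # [6]
```

Because each point is judged only against earlier data, the check can run incrementally as new observations arrive. Note that in this simple version a flagged value still enters later windows and widens their intervals; a practical variant might replace it with the predicted value first.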


Algorithms ◽  
2021 ◽  
Vol 14 (12) ◽  
pp. 353
Author(s):  
Zhenwen He ◽  
Chunfeng Zhang ◽  
Xiaogang Ma ◽  
Gang Liu

Time series data are widely found in finance, health, environmental, social, mobile and other fields. A large amount of time series data has been produced due to the general use of smartphones, various sensors, RFID and other internet devices. How a time series is represented is key to the efficient and effective storage and management of time series data, and is also very important to time series classification. Two new time series representation methods, Hexadecimal Aggregate approXimation (HAX) and Point Aggregate approXimation (PAX), are proposed in this paper. The two methods represent each segment of a time series as a transformable interval object (TIO). Then, each TIO is mapped to a spatial point located on a two-dimensional plane. Finally, HAX maps each point to a hexadecimal digit so that a time series is converted into a hex string. The experimental results show that HAX has higher classification accuracy than Symbolic Aggregate approXimation (SAX) but lower accuracy than some SAX variants (SAX-TD, SAX-BD). HAX has the same space cost as SAX, which is lower than that of these variants. PAX has higher classification accuracy than HAX and is extremely close to the Euclidean distance (ED) measurement; however, the space cost of PAX is generally much lower than the space cost of ED. HAX and PAX are general representation methods that can support geoscience time series clustering, indexing and querying in addition to classification.


Algorithms ◽  
2020 ◽  
Vol 13 (11) ◽  
pp. 284
Author(s):  
Zhenwen He ◽  
Shirong Long ◽  
Xiaogang Ma ◽  
Hong Zhao

A large amount of time series data is being generated every day in a wide range of sensor application domains. The symbolic aggregate approximation (SAX) is a well-known time series representation method, which has a lower bound to Euclidean distance and may discretize continuous time series. SAX has been widely used for applications in various domains, such as mobile data management, financial investment, and shape discovery. However, the SAX representation has a limitation: Symbols are mapped from the average values of segments, but SAX does not consider the boundary distance in the segments. Different segments with similar average values may be mapped to the same symbols, and the SAX distance between them is 0. In this paper, we propose a novel representation named SAX-BD (boundary distance) by integrating the SAX distance with a weighted boundary distance. The experimental results show that SAX-BD significantly outperforms the SAX representation, ESAX representation, and SAX-TD representation.
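
The limitation described in this abstract can be seen in a few lines of plain SAX. The sketch below is a minimal illustration of standard SAX (z-normalisation, piecewise aggregate approximation, equiprobable normal breakpoints), not the SAX-BD method; the function name `sax` is hypothetical, and the code assumes the series length is divisible by the number of segments.

```python
from statistics import NormalDist, mean, pstdev

def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series to a SAX word (assumes len(series) % n_segments == 0)."""
    mu, sigma = mean(series), pstdev(series)
    # z-normalise; a constant series maps to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    seg_len = len(z) // n_segments
    # Piecewise aggregate approximation: one mean per segment
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]
    # Breakpoints that cut the standard normal into equiprobable regions
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    # Symbol index = number of breakpoints lying below the segment mean
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)

print(sax([1, 1, 5, 5], 2))  # ad  (flat within each segment)
print(sax([0, 2, 4, 6], 2))  # ad  (rising within each segment)
```

The two inputs behave differently inside each segment, yet both map to the same word, so their SAX distance is 0. This is the information that boundary-aware variants aim to retain.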


2005 ◽  
Vol 4 (2) ◽  
pp. 61-82 ◽  
Author(s):  
Jessica Lin ◽  
Eamonn Keogh ◽  
Stefano Lonardi

Data visualization techniques are very important for data analysis, since the human eye has been frequently advocated as the ultimate data-mining tool. However, there has been surprisingly little work on visualizing massive time series data sets. To this end, we developed VizTree, a time series pattern discovery and visualization system based on augmenting suffix trees. VizTree visually summarizes both the global and local structures of time series data at the same time. In addition, it provides novel interactive solutions to many pattern discovery problems, including the discovery of frequently occurring patterns (motif discovery), surprising patterns (anomaly detection), and query by content. VizTree works by transforming the time series into a symbolic representation, and encoding the data in a modified suffix tree in which the frequency and other properties of patterns are mapped onto colors and other visual properties. We demonstrate the utility of our system by comparing it with state-of-the-art batch algorithms on several real and synthetic data sets. Based on the tree structure, we further devise a coefficient which measures the dissimilarity between any two time series. This coefficient is shown to be competitive with the well-known Euclidean distance.
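
The link between pattern frequency and motif or anomaly discovery can be sketched without a tree at all. The example below counts fixed-length subwords of a symbolic string with a plain `Counter`, which is a simplification and not VizTree's modified suffix tree; the function names and the `max_count` threshold are hypothetical.

```python
from collections import Counter

def subword_counts(word, length):
    """Count every contiguous subword of the given length in a symbolic string."""
    return Counter(word[i:i + length] for i in range(len(word) - length + 1))

def rare_patterns(word, length, max_count=1):
    """Return subwords seen at most max_count times, as candidate surprising patterns."""
    counts = subword_counts(word, length)
    return sorted(p for p, c in counts.items() if c <= max_count)

word = "abababcabab"
print(subword_counts(word, 2).most_common(1))  # [('ab', 5)]
print(rare_patterns(word, 2))                  # ['bc', 'ca']
```

Frequent subwords are motif candidates and rare ones are anomaly candidates. A suffix tree stores the same counts for all lengths at once, which is what makes interactive exploration practical on long series.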


2017 ◽  
Vol 4 (1) ◽  
pp. 27 ◽  
Author(s):  
Bhola NS Ghimire

Time series data often arise when monitoring hydrological processes. Most hydrological data are time related, and their analysis relates directly or indirectly to the time component. Time series analysis accounts for the fact that data points taken over time may have an internal structure (such as autocorrelation, trend or seasonal variation) that should be accounted for. Many methods and approaches for formulating time series forecasting models are available in the literature. This study gives a brief overview of the auto-regressive integrated moving average (ARIMA) process and its application to forecasting river discharges. The developed ARIMA model is tested successfully for two hydrological stations on a river in the US.

Journal of Nepal Physical Society, Volume 4, Issue 1, February 2017, Page: 27-32


Water ◽  
2018 ◽  
Vol 10 (12) ◽  
pp. 1712 ◽  
Author(s):  
Qun Zhao ◽  
Yuelong Zhu ◽  
Dingsheng Wan ◽  
Yufeng Yu ◽  
Xifeng Cheng

Ensuring the quality of hydrological data has become a key issue in the field of hydrology. Based on the characteristics of hydrological data, this paper proposes a data-driven quality control method for hydrological data. For continuous hydrological time series data, two combined forecasting models and one statistical control model are constructed from horizontal, vertical, and statistical perspectives, and the three models provide three confidence intervals. The suspicious level of each data point is set according to the number of confidence intervals it violates; the data are controlled accordingly, and suggested values are provided for suspicious and missing data. For discrete hydrological data with large differences in time and space, a similarity-weighted topological map of neighboring stations is established, centered on the hydrological station under test, and is adjusted continuously with seasonal changes. Lastly, a spatial interpolation model is established to check the data. The experimental results show that the quality control method proposed in this paper can effectively detect and control the data, find suspicious and erroneous data, and provide suggested values.
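
The suspicious-level rule for continuous data amounts to counting interval violations. The sketch below shows only that counting step; the intervals are made-up numbers standing in for the outputs of the three models, and the function name `suspicious_level` is hypothetical.

```python
def suspicious_level(value, intervals):
    """Number of (low, high) confidence intervals that the observed value falls outside."""
    return sum(not (low <= value <= high) for low, high in intervals)

# Illustrative intervals, one per model (values are assumptions, not from the paper)
intervals = [(9.0, 11.0), (9.5, 10.5), (8.0, 12.0)]
print(suspicious_level(10.0, intervals))  # 0
print(suspicious_level(10.8, intervals))  # 1
print(suspicious_level(20.0, intervals))  # 3
```

A level of 0 means all three models accept the observation, and higher levels mean more of them reject it.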


2020 ◽  
Vol 10 (19) ◽  
pp. 6980
Author(s):  
Kiburm Song ◽  
Minho Ryu ◽  
Kichun Lee

Numerous dimensionality-reducing representations of time series have been proposed in data mining and have proved to be useful, especially in handling a high volume of time series data. Among them, widely used symbolic representations such as symbolic aggregate approximation and piecewise aggregate approximation focus on information of local averages of time series. To compensate for such methods, several attempts were made to include trend information. However, the included trend information is quite simple, leading to great information loss. Such information is hardly extendable, so adjusting the level of simplicity to a higher complexity is difficult. In this paper, we propose a new symbolic representation method called transitional symbolic aggregate approximation that incorporates transitional information into symbolic aggregate approximations. We show that the proposed method, satisfying a lower bound of the Euclidean distance, is able to preserve meaningful information, including dynamic trend transitions in segmented time series, while still reducing dimensionality. We also show that this method is advantageous from theoretical aspects of interpretability, and practical and superior in terms of time-series classification tasks when compared with existing symbolic representation methods.

