RobustSTL: A Robust Seasonal-Trend Decomposition Algorithm for Long Time Series

Author(s): Qingsong Wen, Jingkun Gao, Xiaomin Song, Liang Sun, Huan Xu, ...

Decomposing complex time series into trend, seasonality, and remainder components is an important task that facilitates time series anomaly detection and forecasting. Although numerous methods have been proposed, many characteristics exhibited by real-world data are still not addressed properly, including 1) the ability to handle seasonality fluctuation and shift, and abrupt changes in trend and remainder; 2) robustness to data with anomalies; 3) applicability to time series with long seasonality periods. In this paper, we propose a novel and generic time series decomposition algorithm to address these challenges. Specifically, we extract the trend component robustly by solving a regression problem using the least absolute deviations loss with sparse regularization. Based on the extracted trend, we apply a non-local seasonal filtering to extract the seasonality component. This process is repeated until an accurate decomposition is obtained. Experiments on different synthetic and real-world time series datasets demonstrate that our method outperforms existing solutions.
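
The trend step described in the abstract, a least absolute deviations fit with sparse regularization, can be sketched generically as a convex program. The library choice (cvxpy), the penalty structure, and the weights below are illustrative assumptions, not the paper's exact formulation:

```python
# Illustrative sketch of robust trend extraction via an L1 (LAD) fit with
# sparse regularization on trend differences. Library choice (cvxpy) and the
# regularization weights are assumptions, not the paper's exact formulation.
import numpy as np
import cvxpy as cp

def lad_sparse_trend(y, lam1=10.0, lam2=50.0):
    """Estimate a trend t minimizing ||y - t||_1 + lam1*||D1 t||_1 + lam2*||D2 t||_1,
    where D1 and D2 are first- and second-order difference operators."""
    n = len(y)
    t = cp.Variable(n)
    d1 = cp.norm1(cp.diff(t, 1))   # encourages piecewise-constant level changes
    d2 = cp.norm1(cp.diff(t, 2))   # encourages piecewise-linear smoothness
    cp.Problem(cp.Minimize(cp.norm1(y - t) + lam1 * d1 + lam2 * d2)).solve()
    return t.value

# Toy usage: a noisy series with an abrupt level shift and a few injected anomalies.
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(100), 5 * np.ones(100)]) + rng.normal(0, 0.3, 200)
y[[30, 120]] += 8.0
trend = lad_sparse_trend(y)
```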

Entropy, 2021, Vol 23 (8), pp. 969
Author(s): Miguel C. Soriano, Luciano Zunino

Time-delayed interactions naturally appear in a multitude of real-world systems due to the finite propagation speed of physical quantities. Often, the time scales of the interactions are unknown to an external observer and need to be inferred from time series of observed data. We explore, in this work, the properties of several ordinal-based quantifiers for the identification of time-delays from time series. To that end, we generate artificial time series of stochastic and deterministic time-delay models. We find that the presence of a nonlinearity in the generating model has consequences for the distribution of ordinal patterns and, consequently, on the delay-identification qualities of the quantifiers. Here, we put forward a novel ordinal-based quantifier that is particularly sensitive to nonlinearities in the generating model and compare it with previously-defined quantifiers. We conclude from our analysis on artificially generated data that the proper identification of the presence of a time-delay and its precise value from time series benefits from the complementary use of ordinal-based quantifiers and the standard autocorrelation function. We further validate these tools with a practical example on real-world data originating from the North Atlantic Oscillation weather phenomenon.
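
As an illustration of the ordinal-pattern machinery the abstract refers to, the sketch below computes a permutation-entropy profile over candidate delays for an artificial time-delay model. The choice of permutation entropy as the quantifier and the model parameters are assumptions for illustration; the paper's own nonlinearity-sensitive quantifier is not reproduced here.

```python
# Generic sketch: ordinal-pattern (permutation entropy) profile over candidate
# embedding delays, read alongside autocorrelation in practice.
from itertools import permutations
import numpy as np

def permutation_entropy(x, order=3, delay=1):
    """Shannon entropy of the ordinal-pattern distribution of x."""
    patterns = {p: 0 for p in permutations(range(order))}
    n = len(x) - (order - 1) * delay
    for i in range(n):
        window = x[i:i + order * delay:delay]
        patterns[tuple(np.argsort(window))] += 1
    probs = np.array([c for c in patterns.values() if c > 0], dtype=float) / n
    return -np.sum(probs * np.log(probs))

# Artificial stochastic time-delay model: x_t depends on x_{t-tau} plus noise.
rng = np.random.default_rng(1)
tau, n = 15, 5000
x = rng.normal(size=n)
for t in range(tau, n):
    x[t] = 0.8 * x[t - tau] + rng.normal()

# Structure in the entropy profile near delay == tau hints at the embedded time scale.
profile = [permutation_entropy(x, order=3, delay=d) for d in range(1, 31)]
```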


2018, Vol 24 (3), pp. 984-1003
Author(s): Aistis RAUDYS, Židrina PABARŠKAITĖ

Smoothing a time series removes noise. Moving averages are used in finance to smooth stock price series and forecast trend direction. We propose an optimised custom moving average that is the most suitable for smoothing stock time series. Suitability criteria are defined by smoothness and accuracy. Previous research focused on only one of the two criteria in isolation. We define this as a multi-criteria Pareto optimisation problem and compare the proposed method to the five most popular moving average methods on synthetic and real-world stock data. The comparison was performed using unseen data. The new method outperforms the other methods in 99.5% of cases on synthetic data and in 91% of cases on real-world data. The method allows better time series smoothing with the same level of accuracy as traditional methods, or better accuracy with the same smoothness. Weights optimised on one stock are very similar to weights optimised for any other stock and can be used interchangeably. Traders can use the new method to detect trends earlier and increase the profitability of their strategies. The concept is also applicable to sensors, weather forecasting, and traffic prediction, where both the smoothness and accuracy of the filtered signal are important.
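
A minimal sketch of the underlying idea, optimising moving-average weights against a combined smoothness and accuracy objective, is given below. Scalarizing the two criteria into one weighted objective (rather than a full Pareto search) and the specific lag and penalty choices are assumptions for illustration, not the paper's formulation.

```python
# Illustrative sketch: optimise moving-average weights for a scalarized
# smoothness-plus-accuracy objective on a synthetic price series.
import numpy as np
from scipy.optimize import minimize

def filtered(weights, prices):
    """Causal weighted moving average; weights[0] applies to the oldest price in each window."""
    return np.convolve(prices, weights[::-1], mode="valid")

def objective(weights, prices, alpha=0.5):
    weights = weights / weights.sum()                                      # keep the filter unbiased
    smooth_series = filtered(weights, prices)
    accuracy = np.mean((smooth_series - prices[len(weights) - 1:]) ** 2)   # fit/lag error
    smoothness = np.mean(np.diff(smooth_series, 2) ** 2)                   # curvature of the output
    return alpha * accuracy + (1 - alpha) * smoothness

rng = np.random.default_rng(2)
prices = np.cumsum(rng.normal(0, 1, 500)) + 100        # synthetic random-walk "stock"
k = 10
result = minimize(objective, x0=np.ones(k) / k, args=(prices,), method="Nelder-Mead")
optimised_weights = result.x / result.x.sum()
```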


2015, Vol 26, pp. vii99
Author(s): Yu Uneno, Kei Taneishi, Masashi Kanai, Akiko Tamon, Kazuya Okamoto, ...

2010, Vol 12 (3), pp. 318-328
Author(s): Abdullah Gedikli, Hafzullah Aksoy, N. Erdem Unal

In this study, three algorithms are presented for time series segmentation. The first algorithm is based on the branch-and-bound approach, the second on dynamic programming, while the third is a modified version of the latter into which the remaining-cost concept of the former is introduced. A user-friendly computer program called AUG-Segmenter is developed. Segmentation-by-constant and segmentation-by-linear-regression can be performed by the program. The program is tested on real-world time series of thousands of terms and is found to perform segmentation satisfactorily and quickly.
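
The segmentation-by-constant idea can be sketched with a generic dynamic program that splits a series into k segments minimizing total within-segment squared error. This illustrates the concept only; it is not the AUG-Segmenter implementation.

```python
# Generic DP sketch of segmentation-by-constant (piecewise-constant fit).
import numpy as np

def segment_by_constant(x, k):
    n = len(x)
    prefix = np.concatenate([[0.0], np.cumsum(x)])
    prefix2 = np.concatenate([[0.0], np.cumsum(np.asarray(x, float) ** 2)])
    def cost(i, j):
        # Squared error of fitting one constant (the mean) to x[i..j].
        s, s2, m = prefix[j + 1] - prefix[i], prefix2[j + 1] - prefix2[i], j - i + 1
        return s2 - s * s / m
    INF = float("inf")
    dp = np.full((k + 1, n), INF)           # dp[s][j]: best cost of covering x[0..j] with s segments
    back = np.zeros((k + 1, n), dtype=int)  # back[s][j]: start index of the last segment
    for j in range(n):
        dp[1][j] = cost(0, j)
    for s in range(2, k + 1):
        for j in range(s - 1, n):
            for i in range(s - 1, j + 1):   # last segment is x[i..j]
                c = dp[s - 1][i - 1] + cost(i, j)
                if c < dp[s][j]:
                    dp[s][j], back[s][j] = c, i
    # Recover breakpoints by walking back from the final position.
    bounds, j = [], n - 1
    for s in range(k, 1, -1):
        i = back[s][j]
        bounds.append(i)
        j = i - 1
    return sorted(bounds)                   # indices where new segments begin

# Toy usage: a piecewise-constant signal with noise.
rng = np.random.default_rng(3)
x = np.concatenate([np.full(50, 0.0), np.full(50, 3.0), np.full(50, 1.0)]) + rng.normal(0, 0.2, 150)
print(segment_by_constant(x, 3))           # expected near [50, 100]
```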


2021
Author(s): Prasanta Pal, Shataneek Banerjee, Amardip Ghosh, David R. Vago, Judson Brewer

Knowingly or unknowingly, digital data is an integral part of our day-to-day lives; there is probably not a single day when we do not encounter some form of it. Data originates from diverse sources in various formats, of which time series is a special kind that captures information about the time evolution of a system under observation. However, capturing this temporal information in the context of data analysis is a highly non-trivial challenge. The Discrete Fourier Transform (DFT) is one of the most widely used methods for capturing the essence of time series data. While this nearly 200-year-old mathematical transform has survived the test of time, the nature of real-world data sources violates some of the intrinsic properties presumed to hold for data processed by the DFT. Ad hoc noise and outliers fundamentally alter the true signature of the signal's frequency-domain behavior, so the frequency-domain representation becomes corrupted as well. We demonstrate that applying traditional digital filters as-is may not reveal an accurate description of the pristine time series characteristics of the system under study. In this work, we analyze the issues of the DFT with real-world data and propose a method to address them by taking advantage of insights from modern data-science techniques, particularly our previous work SOCKS. Our results reveal that a dramatic, never-before-seen improvement is possible by re-imagining the DFT in the context of real-world data with appropriate curation protocols. We argue that our proposed transformation, DFT21, would revolutionize the digital world in terms of accuracy, reliability, and information retrievability from raw data.
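
The general point, that a few outliers corrupt a spectrum and some curation should precede the transform, can be illustrated with a plain NumPy/SciPy sketch. The median-filter curation step here is a generic stand-in, not the SOCKS/DFT21 procedure.

```python
# Illustrative only: compare the FFT of a clean sinusoid, the same signal with
# a few outliers, and the outlier signal after a simple median-filter curation.
import numpy as np
from scipy.signal import medfilt

fs, n = 100.0, 1024
t = np.arange(n) / fs
clean = np.sin(2 * np.pi * 5.0 * t)

corrupted = clean.copy()
corrupted[[100, 400, 700]] += 25.0            # a handful of large outliers

curated = medfilt(corrupted, kernel_size=5)   # crude outlier suppression

freqs = np.fft.rfftfreq(n, d=1 / fs)
spec_clean = np.abs(np.fft.rfft(clean))
spec_corrupted = np.abs(np.fft.rfft(corrupted))   # raised broadband noise floor
spec_curated = np.abs(np.fft.rfft(curated))       # much closer to the clean spectrum
```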


2005, Vol 2005 (2), pp. 111-117
Author(s): Juan R. Sánchez

The multiscale behavior of a recently reported model for stock markets is presented. It has been shown that indexes of real-world markets display absolute returns with memory properties over long time ranges, a phenomenon known as volatility clustering. The multiscale characteristics of an index are studied by analyzing the power-law scaling of the volatility correlations, which displays non-unique scaling exponents. Here, such an analysis is done on an artificial time series produced by a simple model for stock markets. After comparison, excellent agreement with the multiscale behavior of real time series is found.
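
The kind of analysis described, estimating the power-law decay of absolute-return (volatility) correlations, can be sketched as follows. The GARCH-style toy generator and the plain log-log fit are illustrative assumptions, not the paper's market model or analysis.

```python
# Generic sketch: power-law fit to the autocorrelation of absolute returns.
import numpy as np

def autocorrelation(x, lag):
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# Toy series with volatility clustering (GARCH(1,1)-like recursion).
rng = np.random.default_rng(4)
n, omega, a, b = 20000, 0.05, 0.1, 0.85
returns, var = np.zeros(n), np.zeros(n)
var[0] = omega / (1 - a - b)
for t in range(1, n):
    var[t] = omega + a * returns[t - 1] ** 2 + b * var[t - 1]
    returns[t] = np.sqrt(var[t]) * rng.normal()

lags = np.arange(1, 200)
acf = np.array([autocorrelation(np.abs(returns), k) for k in lags])
mask = acf > 0                                       # keep lags with positive correlation
slope, _ = np.polyfit(np.log(lags[mask]), np.log(acf[mask]), 1)
print(f"estimated scaling exponent: {-slope:.2f}")   # decay exponent of |returns| correlations
```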


Author(s): Amir Hossein Adineh, Zahra Narimani, Suresh Chandra Satapathy

Over the last decades, time series data analysis has been of particular importance. Different domains such as financial data analysis, biological data analysis, and speech recognition inherently deal with time-dependent signals. Monitoring the past behavior of signals is key to precisely predicting the behavior of a system in the near future. In scenarios such as financial data prediction, the predominant signal has a periodic behavior (starting from the beginning of the month, week, etc.), and a general trend and seasonal behavior can also be assumed. The Autoregressive Integrated Moving Average (ARIMA) model and its seasonal extension, SARIMA, have been widely used in forecasting time series data and are also capable of dealing with the seasonal behavior/trend in the data. Although the behavior of the data may be autoregressive and trends and seasonality can be detected and handled by SARIMA, the data is not always exactly compatible with SARIMA (or, more generally, ARIMA) assumptions. In addition, the existence of missing data is not pre-assumed in SARIMA, while in the real world there can always be missing data for different reasons, such as holidays for which no data is recorded. Different working hours on different weekdays may also cause irregular patterns compared to what is expected under SARIMA assumptions. In this paper, we investigate the effectiveness of applying SARIMA to such real-world data and demonstrate preprocessing methods that can be applied to make the data more suitable for modeling by SARIMA. The data in this research is derived from transactions of a mutual fund investment company; it contains missing values (single points and intervals) as well as irregularities caused by the number of working hours differing between weekdays, which makes the data inconsistent and leads to poor results without preprocessing. In addition, the number of data points was not adequate at the time of analysis to fit a SARIMA model. Preprocessing steps, such as filling missing values and techniques to make the data consistent, are proposed to deal with these problems. Results show that the prediction performance of SARIMA on this set of real-world data is significantly improved by applying the preprocessing steps introduced to deal with the mentioned circumstances. The proposed preprocessing steps can be used in other real-world time series data analyses.
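
A minimal sketch of the general workflow, filling missing values and then fitting a seasonal ARIMA, is shown below. The library choices (pandas, statsmodels), the time-based interpolation, the weekly seasonal period, and the model orders are assumptions for illustration, not the paper's chosen configuration.

```python
# Illustrative sketch: fill missing values, then fit a seasonal ARIMA model.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Toy daily series with a weekly pattern and some missing stretches.
idx = pd.date_range("2021-01-01", periods=200, freq="D")
rng = np.random.default_rng(5)
values = 100 + 10 * np.sin(2 * np.pi * np.arange(200) / 7) + rng.normal(0, 2, 200)
series = pd.Series(values, index=idx)
series.iloc[30:33] = np.nan                 # an interval of missing days
series.iloc[[80, 150]] = np.nan             # single missing points

# Preprocessing: interpolate gaps so the series is regular and complete.
series = series.interpolate(method="time")

# Fit SARIMA with a weekly (s=7) seasonal component and forecast two weeks ahead.
model = SARIMAX(series, order=(1, 0, 1), seasonal_order=(1, 1, 1, 7))
fit = model.fit(disp=False)
forecast = fit.forecast(steps=14)
```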


Author(s): Akane Iseki, Yusuke Mukuta, Yoshitaka Ushiku, Tatsuya Harada

Many real-world systems involve interacting time series. The ability to detect causal dependencies between system components from observed time series of their outputs is essential for understanding system behavior. The quantification of causal influences between time series is based on the definition of some causality measure. Partial Canonical Correlation Analysis (Partial CCA) and its extensions are examples of methods used to robustly estimate the causal relationships between two multidimensional time series, even when the time series are short. These methods assume that the input data are complete and have no missing values. However, real-world data often contain missing values, so it is crucial to estimate the causality measure robustly even when the input time series are incomplete. Treating this as a semi-supervised learning problem, we propose a novel semi-supervised extension of probabilistic Partial CCA called semi-Bayesian Partial CCA. Our method exploits the information in samples with missing values to prevent overfitting in parameter estimation even when there are few complete samples. Experiments based on synthesized and real data demonstrate the ability of the proposed method to estimate causal relationships more accurately than existing methods when the data contain missing values, the dimensionality is large, and the number of samples is small.
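
For orientation, a plain (non-Bayesian) Partial CCA on complete data can be sketched by regressing the conditioning series out of both inputs and running CCA on the residuals. This baseline, the sklearn library choice, and the toy lag structure are assumptions for illustration; the paper's semi-Bayesian Partial CCA is not reproduced here.

```python
# Generic sketch of Partial CCA on complete data: residualize against Z, then CCA.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LinearRegression

def partial_cca_corr(X, Y, Z, n_components=1):
    """Leading canonical correlation between X and Y after removing the effect of Z."""
    rx = X - LinearRegression().fit(Z, X).predict(Z)   # residual of X given Z
    ry = Y - LinearRegression().fit(Z, Y).predict(Z)   # residual of Y given Z
    u, v = CCA(n_components=n_components).fit_transform(rx, ry)
    return np.corrcoef(u[:, 0], v[:, 0])[0, 1]

# Toy data: x drives y with a one-step lag; condition on y's own past.
rng = np.random.default_rng(6)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * x[t - 1] + 0.5 * y[t - 1] + 0.3 * rng.normal()

Y_now, X_past, Y_past = y[1:, None], x[:-1, None], y[:-1, None]
print(partial_cca_corr(Y_now, X_past, Y_past))   # reflects the direct x -> y influence
```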

