Parameterless Semi-supervised Anomaly Detection in Univariate Time Series

Author(s):  
Oleg Iegorov ◽  
Sebastian Fischmeister
2020 ◽  
Author(s):  
İsmail Sezen ◽  
Alper Unal ◽  
Ali Deniz

Atmospheric pollution is a major problem, and high concentration levels are critical for human health and the environment. This makes it necessary to study the causes of unusually high concentration levels that do not conform to the expected behavior of the pollutant, but it is not always easy to decide which levels are unusual, especially when the data set is large and has a complex structure. Visual inspection is subjective in most cases, so a proper anomaly detection method should be used. Anomaly detection has been widely used in diverse research areas, but most methods have been developed for specific application domains. It also might not always be a good idea to identify anomalies using data from nearby measurement sites, because of the spatio-temporal complexity of the pollutant. It is therefore necessary to use a method that estimates anomalies from univariate time series data.

This work suggests a framework based on STL decomposition and extended isolation forest (EIF), a machine learning algorithm, to identify anomalies in univariate time series that have trend, multi-seasonality and seasonal variation. The main advantage of the EIF method is that it defines anomalies by a score value.

In this study, a multi-seasonal STL decomposition has been applied to a univariate PM10 time series to remove the trend and seasonal parts, but STL cannot remove seasonal variation from the data. The remainder still has 24-hour and yearly variation. To remove this variation, hourly and annual inter-quartile ranges (IQR) are calculated, and the data are standardized by dividing each value by the corresponding IQR value. This process removes seasonality in the variation, and the resulting data are processed by EIF to decide, by an objective criterion, which values are anomalies.
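
The IQR standardization step described in this abstract can be illustrated with a minimal sketch. It assumes the STL remainder is already available as a plain list and that each value carries a group key such as its hour of day. The function names `iqr` and `standardize_by_group_iqr` are hypothetical, and the quantile method is an assumption, since the abstract does not specify one. The decomposition and EIF scoring steps are not shown.

```python
from collections import defaultdict
from statistics import quantiles


def iqr(values):
    """Inter-quartile range (Q3 - Q1); the 'inclusive' method is an assumption."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def standardize_by_group_iqr(remainder, keys):
    """Divide each remainder value by the IQR of its group.

    `keys` holds one group label per value (for example the hour of day,
    or the day of year for the annual pass). Each group needs at least
    two values and a non-zero IQR.
    """
    groups = defaultdict(list)
    for value, key in zip(remainder, keys):
        groups[key].append(value)
    group_iqr = {key: iqr(vals) for key, vals in groups.items()}
    return [value / group_iqr[key] for value, key in zip(remainder, keys)]


# Toy remainder: group 1 varies ten times more strongly than group 0.
remainder = [-2, -20, -1, -10, 1, 10, 2, 20]
keys = [0, 1, 0, 1, 0, 1, 0, 1]
scaled = standardize_by_group_iqr(remainder, keys)
```

After scaling, both groups have the same spread, which is the property the abstract relies on before the data are passed to the anomaly scorer.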


2021 ◽  
Vol 11 (15) ◽  
pp. 6698
Author(s):  
Jehn-Ruey Jiang ◽  
Jian-Bin Kao ◽  
Yu-Lin Li

Thanks to advances in novel technologies, such as sensors and Internet of Things (IoT) technologies, large amounts of data are continuously gathered over time, resulting in a variety of time series. A semi-supervised anomaly detection framework, called Tri-CAD, for univariate time series is proposed in this paper. Based on the Pearson product-moment correlation coefficient and the Dickey–Fuller test, time series are first categorized into three classes: (i) periodic, (ii) stationary, and (iii) non-periodic and non-stationary time series. Afterwards, different mechanisms using statistics, wavelet transform, and deep learning autoencoder concepts are applied to the different classes of time series to detect anomalies. The performance of the proposed Tri-CAD framework is evaluated by experiments using three Numenta Anomaly Benchmark (NAB) datasets and compared with that of related methods, such as STL, SARIMA, LSTM, LSTM with STL, and ADSaS. The comparison results show that Tri-CAD outperforms the others in terms of precision, recall, and F1-score.
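
The abstract does not say how the Pearson coefficient is applied during classification. One plausible reading, offered here only as an assumption, is to correlate the series with a copy of itself shifted by a candidate period. The names `pearson`, `lagged_pearson`, and `looks_periodic`, as well as the 0.9 threshold, are hypothetical. The Dickey–Fuller stationarity test is not sketched.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_pearson(series, lag):
    """Correlate the series with itself shifted by `lag` samples."""
    return pearson(series[:-lag], series[lag:])


def looks_periodic(series, lag, threshold=0.9):
    """Hypothetical rule: call the series periodic at `lag` if correlation is high."""
    return lagged_pearson(series, lag) >= threshold


# Toy hourly signal with a 24-sample period.
series = [math.sin(2 * math.pi * t / 24) for t in range(240)]
is_daily = looks_periodic(series, lag=24)
```

A series with a strong trend also correlates highly with its lagged copy, so a check of this kind is not sufficient on its own to establish periodicity.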


Author(s):  
Cynthia Freeman ◽  
Ian Beaver ◽  
Abdullah Mueen

The existence of a time series anomaly detection method that performs well for all domains is a myth. Given a massive library of available methods, how can one select the best method for a given application? An extensive evaluation of every anomaly detection method is not feasible. Many existing anomaly detection systems do not include an avenue for human feedback, which is essential given the subjective nature of what counts as anomalous. We present a technique for improving univariate time series anomaly detection through automatic algorithm selection and human-in-the-loop false-positive removal. These determinations were made by extensively experimenting with over 30 pre-annotated time series from the open-source Numenta Anomaly Benchmark repository. Once the highest performing anomaly detection methods are selected via these characteristics, humans can annotate the predicted outliers, and the annotations are used to tune anomaly scores via subsequence similarity search and improve the selected methods for their data, increasing evaluation scores and reducing the need for annotation by 70% on predicted anomalies where annotation is used to improve F-scores.
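
How annotations tune the anomaly scores is not detailed in this abstract. The following sketch shows one way a subsequence similarity search could be used for that purpose, under stated assumptions: the similarity measure is z-normalized Euclidean distance, and scores inside windows that resemble an annotated false positive are scaled down by a fixed factor. All function names, the distance threshold, and the factor are hypothetical.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in window]


def znorm_distance(a, b):
    """Euclidean distance between the z-normalized forms of two windows."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(znorm(a), znorm(b))))


def find_similar(series, query, max_distance):
    """Start indices of all windows within `max_distance` of `query`."""
    m = len(query)
    return [
        start
        for start in range(len(series) - m + 1)
        if znorm_distance(series[start:start + m], query) <= max_distance
    ]


def suppress_scores(scores, starts, length, factor=0.5):
    """Scale down scores once for every point covered by a matched window."""
    covered = set()
    for start in starts:
        covered.update(range(start, start + length))
    return [s * factor if i in covered else s for i, s in enumerate(scores)]


# A spike at index 2..4 was annotated as a false positive; look for lookalikes.
series = [0, 0, 1, 3, 1, 0, 0, 0, 2, 6, 2, 0, 5, 5, 5]
query = series[2:5]
matches = find_similar(series, query, max_distance=0.1)
tuned = suppress_scores([1.0] * len(series), matches, len(query))
```

Because the windows are z-normalized, the larger spike at index 8 matches the annotated one by shape even though its amplitude differs.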


2016 ◽  
Vol 136 (3) ◽  
pp. 363-372
Author(s):  
Takaaki Nakamura ◽  
Makoto Imamura ◽  
Masashi Tatedoko ◽  
Norio Hirai

2020 ◽  
Vol 5 (1) ◽  
pp. 374
Author(s):  
Pauline Jin Wee Mah ◽  
Nur Nadhirah Nanyan

The main purpose of this study is to compare the performances of univariate and bivariate models on four time series variables of the crude palm oil industry in Peninsular Malaysia. The monthly data for the four variables, which are the crude palm oil production, price, import and export, were obtained from the Malaysian Palm Oil Board (MPOB) and the Malaysian Palm Oil Council (MPOC). In the first part of this study, univariate time series models, namely the autoregressive integrated moving average (ARIMA), fractionally integrated autoregressive moving average (ARFIMA) and autoregressive autoregressive (ARAR) algorithm, were used for modelling and forecasting purposes. Subsequently, the dependence between any two of the four variables was checked using the residuals' sample cross-correlation functions before modelling the bivariate time series. To model the bivariate time series and make predictions, transfer function models were used. The forecast accuracy criteria used to evaluate the performances of the models were the mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE). The results of the univariate time series showed that the best model for predicting the production was ARIMA, while the ARAR algorithm was the best forecast model for predicting both the import and export of crude palm oil. However, ARIMA appeared to be the best forecast model for price based on the MAE and MAPE values, while ARFIMA emerged as the best model based on the RMSE value. When considering bivariate time series models, the production was dependent on import, while the export was dependent on either price or import. The results showed that the bivariate models performed better than the univariate models for production and export of crude palm oil based on the forecast accuracy criteria used.
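
The three forecast accuracy criteria named in this abstract have standard definitions, sketched below for reference. The function names and the toy numbers are illustrative only and are not taken from the study.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Illustrative values only.
actual = [100, 200, 400]
forecast = [110, 190, 400]
errors = (mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```

RMSE penalizes large individual errors more heavily than MAE, which is one reason different criteria can favor different models, as the price results above illustrate.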

