Development of Time Series Disorder Detection Algorithms Based on Nonparametric Hypothesis Testing Criteria

The solution of the problem of detecting, in the online mode, a spontaneous change in the probabilistic characteristics (“disorder” or “breakdown”) of a time series is given. It is pointed out that there is a growing interest in the development of so-called nonparametric disorder detection methods, i.e., methods the application of which does not require the knowledge of the probability distribution function of the controlled process values. It is stated that the majority of the known versions of such methods are based on using a number of standard nonparametric criteria transformed for solving disorder detection problems. It is proposed to use the signs criterion, the series criterion, and the Ramachandran–Ranganathan criterion as a basis for construction of disorder detection algorithms. The methodical aspects of studying the statistical properties and efficiency of the disorder detection algorithms built on their basis are considered. The simulation method was used as a study tool. The plan of carrying out simulation experiments was developed separately for each of the proposed algorithms, taking into account their individual characteristics, but based on the general requirement of fully reproducing the monitoring algorithm performance dynamics under real conditions, when a disorder can appear at any time and there is a transient in the values of the decisive function. By using a simulation experiment for each of the algorithms under consideration, data on their statistical characteristics were obtained and systematized in a scope sufficient for synthesizing a monitoring procedure with the specified properties.

Download Full-text

Evaluation of break detection methods for snow data series

10.5194/ems2021-47 ◽

2021 ◽

Author(s):

Moritz Buchmann ◽

Michael Begert ◽

Stefan Brönnimann ◽

Gernot Resch ◽

Christoph Marty

Keyword(s):

Time Series ◽

Snow Depth ◽

Swiss Alps ◽

Detection Methods ◽

Data Series ◽

Detection Algorithms ◽

Climate Indicators ◽

Seasonal Analysis ◽

Almost All ◽

New Applications

Measurements of snow depth and snowfall can vary dramatically over small distances. However, it is not clear if this applies to all derived variables and is the same for all seasons. Almost all meteorological time series incorporate some sort of inhomogeneities. Complete metadata and existing &#8220;parallel&#8221; stations in close proximity are not always available. First, we analyse the impacts of local-scale variations based on a unique set of parallel manual snow measurements for the Swiss Alps consisting of 30 station pairs with up to 70 years of parallel data. Station pairs are mostly located in the same villages (or within 3km horizontal and 150m vertical distances).&#160; Seasonal analysis of derived snow climate indicators such as maximum seasonal snow depth, sum of new snow, or days with snow on the ground shows that largest differences occur in spring and the smallest ones are found in DJF and NDJFMA. Relative inter-pair differences (uncertainties) for days with snow on the ground (average snow depth) are below 15% for 90% (30%) . Second, in view of any homogenization efforts of snow data series, it is paramount to understand the impacts of inhomogeneities. Using state-of-the-art break detection algorithms, we strive to investigate which method works best for detecting breaks in snow data series. The results can then be used on time series with insufficient metadata or no neighbouring stations in order to include them in future homogenization processes. Furthermore, the knowledge about inhomogeneities and breakpoints paves the way for new applications such as the reliable combination of two parallel series into one single series.

Download Full-text

Experimental Comparison and Survey of Twelve Time Series Anomaly Detection Algorithms

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12698 ◽

2021 ◽

Vol 72 ◽

pp. 849-899

Author(s):

Cynthia Freeman ◽

Jonathan Merriman ◽

Ian Beaver ◽

Abdullah Mueen

Keyword(s):

Time Series ◽

Anomaly Detection ◽

Detection Method ◽

Concept Drift ◽

Area Under The Curve ◽

Detection Methods ◽

Experimental Comparison ◽

Detection Algorithms ◽

Extensive Evaluation ◽

Online Setting

The existence of an anomaly detection method that is optimal for all domains is a myth. Thus, there exists a plethora of anomaly detection methods which increases every year for a wide variety of domains. But a strength can also be a weakness; given this massive library of methods, how can one select the best method for their application? Current literature is focused on creating new anomaly detection methods or large frameworks for experimenting with multiple methods at the same time. However, and especially as the literature continues to expand, an extensive evaluation of every anomaly detection method is simply not feasible. To reduce this evaluation burden, we present guidelines to intelligently choose the optimal anomaly detection methods based on the characteristics the time series displays such as seasonality, trend, level change concept drift, and missing time steps. We provide a comprehensive experimental validation and survey of twelve anomaly detection methods over different time series characteristics to form guidelines based on several metrics: the AUC (Area Under the Curve), windowed F-score, and Numenta Anomaly Benchmark (NAB) scoring model. Applying our methodologies can save time and effort by surfacing the most promising anomaly detection methods instead of experimenting extensively with a rapidly expanding library of anomaly detection methods, especially in an online setting.

Download Full-text

Comparison of Novelty Detection Methods for Detection of Various Rotary Machinery Faults

Sensors ◽

10.3390/s21103536 ◽

2021 ◽

Vol 21 (10) ◽

pp. 3536

Author(s):

Jakub Górski ◽

Adam Jabłoński ◽

Mateusz Heesch ◽

Michał Dziendzikowski ◽

Ziemowit Dworakowski

Keyword(s):

Novelty Detection ◽

Feature Space ◽

Detection Methods ◽

Data Sets ◽

Similar Distribution ◽

Detection Algorithms ◽

Rotary Machinery ◽

Detection Approach ◽

Input Reconstruction ◽

Quantitative Indicators

Condition monitoring is an indispensable element related to the operation of rotating machinery. In this article, the monitoring system for the parallel gearbox was proposed. The novelty detection approach is used to develop the condition assessment support system, which requires data collection for a healthy structure. The measured signals were processed to extract quantitative indicators sensitive to the type of damage occurring in this type of structure. The indicator’s values were used for the development of four different novelty detection algorithms. Presented novelty detection models operate on three principles: feature space distance, probability distribution, and input reconstruction. One of the distance-based models is adaptive, adjusting to new data flowing in the form of a stream. The authors test the developed algorithms on experimental and simulation data with a similar distribution, using the training set consisting mainly of samples generated by the simulator. Presented in the article results demonstrate the effectiveness of the trained models on both data sets.

Download Full-text

A Systematic Review of Landsat Data for Change Detection Applications: 50 Years of Monitoring the Earth

Remote Sensing ◽

10.3390/rs13152869 ◽

2021 ◽

Vol 13 (15) ◽

pp. 2869

Author(s):

MohammadAli Hemati ◽

Mahdi Hasanlou ◽

Masoud Mahdianpari ◽

Fariba Mohammadimanesh

Keyword(s):

Time Series ◽

Change Detection ◽

Policy Development ◽

Atmospheric Correction ◽

Open Data ◽

Detection Methods ◽

Landsat 8 ◽

Landsat Images ◽

Detection Analysis ◽

The Impact

With uninterrupted space-based data collection since 1972, Landsat plays a key role in systematic monitoring of the Earth’s surface, enabled by an extensive and free, radiometrically consistent, global archive of imagery. Governments and international organizations rely on Landsat time series for monitoring and deriving a systematic understanding of the dynamics of the Earth’s surface at a spatial scale relevant to management, scientific inquiry, and policy development. In this study, we identify trends in Landsat-informed change detection studies by surveying 50 years of published applications, processing, and change detection methods. Specifically, a representative database was created resulting in 490 relevant journal articles derived from the Web of Science and Scopus. From these articles, we provide a review of recent developments, opportunities, and trends in Landsat change detection studies. The impact of the Landsat free and open data policy in 2008 is evident in the literature as a turning point in the number and nature of change detection studies. Based upon the search terms used and articles included, average number of Landsat images used in studies increased from 10 images before 2008 to 100,000 images in 2020. The 2008 opening of the Landsat archive resulted in a marked increase in the number of images used per study, typically providing the basis for the other trends in evidence. These key trends include an increase in automated processing, use of analysis-ready data (especially those with atmospheric correction), and use of cloud computing platforms, all over increasing large areas. The nature of change methods has evolved from representative bi-temporal pairs to time series of images capturing dynamics and trends, capable of revealing both gradual and abrupt changes. The result also revealed a greater use of nonparametric classifiers for Landsat change detection analysis. Landsat-9, to be launched in September 2021, in combination with the continued operation of Landsat-8 and integration with Sentinel-2, enhances opportunities for improved monitoring of change over increasingly larger areas with greater intra- and interannual frequency.

Download Full-text

SARIMA Approach to Generating Synthetic Monthly Rainfall in the Sinú River Watershed in Colombia

Atmosphere ◽

10.3390/atmos11060602 ◽

2020 ◽

Vol 11 (6) ◽

pp. 602

Author(s):

Luisa Martínez-Acosta ◽

Juan Pablo Medrano-Barboza ◽

Álvaro López-Ramos ◽

John Freddy Remolina López ◽

Álvaro Alberto López-Lambraño

Keyword(s):

Time Series ◽

Water Resources ◽

Autocorrelation Function ◽

Information Criterion ◽

Monthly Rainfall ◽

Statistical Characteristics ◽

Rainfall Time Series ◽

River Watershed ◽

Statistical Measures ◽

Sarima Models

Seasonal Auto Regressive Integrative Moving Average models (SARIMA) were developed for monthly rainfall time series. Normality of the rainfall time series was achieved by using the Box Cox transformation. The best SARIMA models were selected based on their autocorrelation function (ACF), partial autocorrelation function (PACF), and the minimum values of the Akaike Information Criterion (AIC). The result of the Ljung–Box statistical test shows the randomness and homogeneity of each model residuals. The performance and validation of the SARIMA models were evaluated based on various statistical measures, among these, the Student’s t-test. It is possible to obtain synthetic records that preserve the statistical characteristics of the historical record through the SARIMA models. Finally, the results obtained can be applied to various hydrological and water resources management studies. This will certainly assist policy and decision-makers to establish strategies, priorities, and the proper use of water resources in the Sinú river watershed.

Download Full-text

On the Connection between the GEP Performances and the Time Series Properties

Mathematics ◽

10.3390/math9161853 ◽

2021 ◽

Vol 9 (16) ◽

pp. 1853

Author(s):

Alina Bărbulescu ◽

Cristian Ștefan Dumitriu

Keyword(s):

Time Series ◽

Goodness Of Fit ◽

Functional Form ◽

Financial Time Series ◽

Gene Expression Programming ◽

Parametric Models ◽

Statistical Characteristics ◽

Data Series ◽

Financial Time ◽

Data Generating Process

Artificial intelligence (AI) methods are interesting alternatives to classical approaches for modeling financial time series since they relax the assumptions imposed on the data generating process by the parametric models and do not impose any constraint on the model’s functional form. Even if many studies employed these techniques for modeling financial time series, the connection of the models’ performances with the statistical characteristics of the data series has not yet been investigated. Therefore, this research aims to study the performances of Gene Expression Programming (GEP) for modeling monthly and weekly financial series that present trend and/or seasonality and after the removal of each component. It is shown that series normality and homoskedasticity do not influence the models’ quality. The trend removal increases the models’ performance, whereas the seasonality elimination results in diminishing the goodness of fit. Comparisons with ARIMA models built are also provided.

Download Full-text

Jonckheere–Terpstra–Kendall-based non-parametric analysis of temporal differential gene expression

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab021 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Hitoshi Iuchi ◽

Michiaki Hamada

Keyword(s):

Gene Expression ◽

Time Series ◽

Time Course ◽

Time Series Data ◽

Expression Patterns ◽

Detection Methods ◽

Series Data ◽

Expression Levels ◽

Over Time ◽

Non Parametric

Abstract Time-course experiments using parallel sequencers have the potential to uncover gradual changes in cells over time that cannot be observed in a two-point comparison. An essential step in time-series data analysis is the identification of temporal differentially expressed genes (TEGs) under two conditions (e.g. control versus case). Model-based approaches, which are typical TEG detection methods, often set one parameter (e.g. degree or degree of freedom) for one dataset. This approach risks modeling of linearly increasing genes with higher-order functions, or fitting of cyclic gene expression with linear functions, thereby leading to false positives/negatives. Here, we present a Jonckheere–Terpstra–Kendall (JTK)-based non-parametric algorithm for TEG detection. Benchmarks, using simulation data, show that the JTK-based approach outperforms existing methods, especially in long time-series experiments. Additionally, application of JTK in the analysis of time-series RNA-seq data from seven tissue types, across developmental stages in mouse and rat, suggested that the wave pattern contributes to the TEG identification of JTK, not the difference in expression levels. This result suggests that JTK is a suitable algorithm when focusing on expression patterns over time rather than expression levels, such as comparisons between different species. These results show that JTK is an excellent candidate for TEG detection.

Download Full-text

Multivariate Anomaly Detection for Earth Observations: A Comparison of Algorithms and Feature Extraction Techniques

10.5194/esd-2016-51 ◽

2016 ◽

Cited By ~ 1

Author(s):

Milan Flach ◽

Fabian Gans ◽

Alexander Brenning ◽

Joachim Denzler ◽

Markus Reichstein ◽

...

Keyword(s):

Feature Extraction ◽

Anomaly Detection ◽

Data Streams ◽

Multivariate Data ◽

Detection Methods ◽

Earth System ◽

Earth System Science ◽

System Science ◽

Detection Algorithms ◽

Earth Observations

Abstract. Today, many processes at the Earth's surface are constantly monitored by multiple data streams. These observations have become central to advance our understanding of e.g. vegetation dynamics in response to climate or land use change. Another set of important applications is monitoring effects of climatic extreme events, other disturbances such as fires, or abrupt land transitions. One important methodological question is how to reliably detect anomalies in an automated and generic way within multivariate data streams, which typically vary seasonally and are interconnected across variables. Although many algorithms have been proposed for detecting anomalies in multivariate data, only few have been investigated in the context of Earth system science applications. In this study, we systematically combine and compare feature extraction and anomaly detection algorithms for detecting anomalous events. Our aim is to identify suitable workflows for automatically detecting anomalous patterns in multivariate Earth system data streams. We rely on artificial data that mimic typical properties and anomalies in multivariate spatiotemporal Earth observations. This artificial experiment is needed as there is no 'gold standard' for the identification of anomalies in real Earth observations. Our results show that a well chosen feature extraction step (e.g. subtracting seasonal cycles, or dimensionality reduction) is more important than the choice of a particular anomaly detection algorithm. Nevertheless, we identify 3 detection algorithms (k-nearest neighbours mean distance, kernel density estimation, a recurrence approach) and their combinations (ensembles) that outperform other multivariate approaches as well as univariate extreme event detection methods. Our results therefore provide an effective workflow to automatically detect anomalies in Earth system science data.

Download Full-text

ON THE EQUIVALENCE BETWEEN BAYESIAN AND FREQUENTIST NONPARAMETRIC HYPOTHESIS TESTING

10.37099/mtu.dc.etdr/507 ◽

2017 ◽

Author(s):

Qiuchen Hai

Keyword(s):

Hypothesis Testing ◽

Nonparametric Hypothesis ◽

Nonparametric Hypothesis Testing

Download Full-text

Explaining Bad Forecasts in Global Time Series Models

Applied Sciences ◽

10.3390/app11199243 ◽

2021 ◽

Vol 11 (19) ◽

pp. 9243

Author(s):

Jože Rožanec ◽

Elena Trajkova ◽

Klemen Kenda ◽

Blaž Fortuna ◽

Dunja Mladenić

Keyword(s):

Time Series ◽

Model Performance ◽

Time Series Forecasting ◽

Modular Architecture ◽

Detection Algorithms ◽

Training Samples ◽

Forecasting Performance ◽

Global Time ◽

Real World Datasets ◽

Value Changes

While increasing empirical evidence suggests that global time series forecasting models can achieve better forecasting performance than local ones, there is a research void regarding when and why the global models fail to provide a good forecast. This paper uses anomaly detection algorithms and explainable artificial intelligence (XAI) to answer when and why a forecast should not be trusted. To address this issue, a dashboard was built to inform the user regarding (i) the relevance of the features for that particular forecast, (ii) which training samples most likely influenced the forecast outcome, (iii) why the forecast is considered an outlier, and (iv) provide a range of counterfactual examples to understand how value changes in the feature vector can lead to a different outcome. Moreover, a modular architecture and a methodology were developed to iteratively remove noisy data instances from the train set, to enhance the overall global time series forecasting model performance. Finally, to test the effectiveness of the proposed approach, it was validated on two publicly available real-world datasets.

Download Full-text