Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection—rejoinder

2020 ◽  
Vol 49 (4) ◽  
pp. 1099-1105 ◽  
Author(s):  
Piotr Fryzlewicz

Abstract. Many existing procedures for detecting multiple change-points in data sequences fail in frequent-change-point scenarios. This article proposes a new change-point detection methodology designed to work well in both infrequent and frequent change-point settings. It is made up of two ingredients: one is “Wild Binary Segmentation 2” (WBS2), a recursive algorithm for producing what we call a ‘complete’ solution path to the change-point detection problem, i.e. a sequence of estimated nested models containing $0, \ldots, T-1$ change-points, where $T$ is the data length. The other ingredient is a new model selection procedure, referred to as “Steepest Drop to Low Levels” (SDLL). The SDLL criterion acts on the WBS2 solution path, and, unlike many existing model selection procedures for change-point problems, it is not penalty-based, and only uses thresholding as a certain discrete secondary check. The resulting WBS2.SDLL procedure, combining both ingredients, is shown to be consistent, and to significantly outperform the competition in the frequent change-point scenarios tested. WBS2.SDLL is fast, easy to code and does not require the choice of a window or span parameter.
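To make the idea of a 'complete' solution path concrete, the following is a minimal sketch, not the author's implementation. It assumes a piecewise-constant mean model with a CUSUM contrast, and the names `cusum`, `best_split`, `solution_path` and the parameter `n_intervals` are hypothetical. On each segment it searches the whole segment plus a number of random sub-intervals, records the best split, recurses on both sides, and finally sorts all $T-1$ candidates by contrast size. The SDLL model selection step is not shown.

```python
import math
import random


def cusum(prefix, s, e, b):
    """Absolute CUSUM contrast for a mean change right after index b in x[s..e]."""
    n = e - s + 1
    n_left = b - s + 1
    n_right = e - b
    left = prefix[b + 1] - prefix[s]        # sum of x[s..b]
    right = prefix[e + 1] - prefix[b + 1]   # sum of x[b+1..e]
    return abs(math.sqrt(n_right / (n * n_left)) * left
               - math.sqrt(n_left / (n * n_right)) * right)


def best_split(prefix, s, e):
    """Return (b, contrast) maximising the CUSUM contrast over b in s..e-1."""
    best_b, best_v = s, -1.0
    for b in range(s, e):
        v = cusum(prefix, s, e, b)
        if v > best_v:
            best_b, best_v = b, v
    return best_b, best_v


def solution_path(x, n_intervals=50, seed=0):
    """Simplified WBS2-style path: T-1 (location, contrast) pairs, largest first."""
    rng = random.Random(seed)
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)
    path = []

    def recurse(s, e):
        if e - s < 1:                        # a single point cannot be split
            return
        candidates = [(s, e)]                # assumption: always include the full segment
        for _ in range(n_intervals):
            a = rng.randint(s, e - 1)
            c = rng.randint(a + 1, e)
            candidates.append((a, c))
        b, v = max((best_split(prefix, a, c) for a, c in candidates),
                   key=lambda item: item[1])
        path.append((b, v))
        recurse(s, b)
        recurse(b + 1, e)

    recurse(0, len(x) - 1)
    path.sort(key=lambda item: item[1], reverse=True)
    return path
```

For a noiseless step such as `[0.0] * 50 + [5.0] * 50`, the first entry of the path is the true change location (index 49) and the remaining contrasts are essentially zero; a model selection rule would then decide how many leading entries to keep.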

Author(s):  
Karolos K. Korkas

Abstract. We propose a new technique for consistent estimation of the number and locations of the change-points in the structure of an irregularly spaced time series. The core of the segmentation procedure is the ensemble binary segmentation method (EBS), a technique in which a large number of multiple change-point detection tasks using the binary segmentation method are applied on sub-samples of the data of differing lengths, and then the results are combined to create an overall answer. We do not restrict the total number of change-points a time series can have; therefore, our proposed method works well when the spacings between change-points are short. Our main change-point detection statistic is the time-varying autoregressive conditional duration model on which we apply a transformation process in order to decorrelate it. To examine the performance of EBS we provide a simulation study for various types of scenarios. A proof of consistency is also provided. Our methodology is implemented in the R package , available to download from CRAN.


Water ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 1633
Author(s):  
Elena-Simona Apostol ◽  
Ciprian-Octavian Truică ◽  
Florin Pop ◽  
Christian Esposito

Due to the exponential growth of the Internet of Things networks and the massive amount of time series data collected from these networks, it is essential to apply efficient methods for Big Data analysis in order to extract meaningful information and statistics. Anomaly detection is an important part of time series analysis, improving the quality of further analysis, such as prediction and forecasting. Thus, detecting sudden change points that reflect normal behavior and distinguishing them from abnormal behavior, i.e., outliers, is a crucial step used to minimize the false positive rate and to build accurate machine learning models for prediction and forecasting. In this paper, we propose a rule-based decision system that enhances anomaly detection in multivariate time series using change point detection. Our architecture uses a pipeline that automatically detects real anomalies and removes the false positives introduced by change points. We employ both traditional and deep learning unsupervised algorithms: in total, five anomaly detection and five change point detection algorithms. Additionally, we propose a new confidence metric based on the support for a time series point to be an anomaly and the support for the same point to be a change point. In our experiments, we use a large real-world dataset containing multivariate time series about water consumption collected from smart meters. As an evaluation metric, we use Mean Absolute Error (MAE). The low MAE values show that the algorithms accurately determine anomalies and change points. The experimental results strengthen our assumption that anomaly detection can be improved by determining and removing change points, and validate the correctness of our proposed rules in real-world scenarios. Furthermore, the proposed rule-based decision support systems enable users to make informed decisions regarding the status of the water distribution network and to effectively perform predictive and proactive maintenance.


Author(s):  
Mehdi Moradi ◽  
Manuel Montesino-SanMartin ◽  
M. Dolores Ugarte ◽  
Ana F. Militino

Abstract. We propose an adaptive-sliding-window approach (LACPD) for the problem of change-point detection in a set of time-ordered observations. The proposed method is combined with sub-sampling techniques to compensate for the lack of enough data near the time series’ tails. Through a simulation study, we analyse its behaviour in the presence of an early/middle/late change-point in the mean, and compare its performance with some of the frequently used and recently developed change-point detection methods in terms of power, type I error probability, area under the ROC curves (AUC), absolute bias, variance, and root-mean-square error (RMSE). We conclude that LACPD outperforms other methods by maintaining a low type I error probability. Unlike some other methods, the performance of LACPD does not depend on the time index of change-points, and it generally has lower bias than the alternative methods. Moreover, in terms of variance and RMSE, it outperforms other methods when change-points are close to the time series’ tails, whereas its performance is similar to (sometimes slightly poorer than) that of other methods when change-points are close to the middle of the time series. Finally, we apply our proposal to two sets of real data: the well-known example of annual flow of the Nile river in Aswan, Egypt, from 1871 to 1970, and a novel remote sensing data application consisting of a 34-year time-series of satellite images of the Normalised Difference Vegetation Index in Wadi As-Sirham valley, Saudi Arabia, from 1986 to 2019. We conclude that LACPD shows a good performance in detecting the presence of a change as well as the time and magnitude of change in real conditions.
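As a rough illustration of the sliding-window idea only, the sketch below scans a fixed-width window over the series and scores each time index by the absolute difference between the means just before and just after it. This is a plain fixed-window scan, not LACPD: it has no adaptive window, no sub-sampling and no significance test, and the names `window_scan`, `most_likely_change` and `half_width` are hypothetical.

```python
from statistics import fmean


def window_scan(x, half_width):
    """Score each index t by |mean(x[t:t+w]) - mean(x[t-w:t])| with w = half_width."""
    scores = {}
    for t in range(half_width, len(x) - half_width + 1):
        left = fmean(x[t - half_width:t])
        right = fmean(x[t:t + half_width])
        scores[t] = abs(right - left)
    return scores


def most_likely_change(x, half_width):
    """Return the index with the largest mean-shift score (first index of the new regime)."""
    scores = window_scan(x, half_width)
    return max(scores, key=scores.get)
```

Note that indices closer than `half_width` to either end are never scored in this sketch, which is exactly the tail problem the abstract says the sub-sampling step is meant to compensate for.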


2020 ◽  
Author(s):  
Simon Letzgus

Abstract. Analysis of data from wind turbine supervisory control and data acquisition (SCADA) systems has attracted considerable research interest in recent years. The data is predominantly used to gain insights into turbine condition without the need for additional sensing equipment. Most successful approaches apply semi-supervised anomaly detection methods, also called normal behaviour models, that use clean training data sets to establish healthy component baseline models. However, one of the major challenges when working with wind turbine SCADA data in practice is the presence of systematic changes in signal behaviour induced by malfunctions or maintenance actions. Even though this problem is well described in the literature, it has not been systematically addressed so far. This contribution is the first to comprehensively analyse the presence of change-points in wind turbine SCADA signals and introduce an algorithm for their automated detection. 600 signals from 33 turbines are analysed over an operational period of more than two years. During this time one third of the signals are affected by change-points. Kernel change-point detection methods have shown promising results in similar settings but their performance strongly depends on the choice of several hyperparameters. This contribution presents a comprehensive comparison between different kernels as well as kernel-bandwidth and regularisation-penalty selection heuristics. Moreover, an appropriate data pre-processing procedure is introduced. The results show that the combination of Laplace kernels with a newly introduced bandwidth and penalty selection heuristic robustly outperforms existing methods. In a signal validation setting more than 90 % of the signals were classified correctly regarding the presence or absence of change-points, resulting in an F1-score of 0.86. For a change-point-free sequence selection the most severe 60 % of all CPs could be automatically removed with a precision of more than 0.96 and therefore without a significant loss of training data. These results indicate that the algorithm can be a meaningful step towards automated SCADA data pre-processing, which is key for data-driven methods to reach their full potential. The algorithm is open source and its implementation in Python is publicly available.
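The kernel approach mentioned above can be illustrated with a minimal sketch for a single change-point in a univariate signal. It is not the paper's algorithm: the bandwidth and penalty selection heuristics are not reproduced, the bandwidth is simply a user-supplied assumption, and the names `laplace_kernel`, `segment_cost` and `best_single_change` are hypothetical. The cost of a segment is the usual kernel within-segment scatter, and the split minimising the summed cost of both sides is returned.

```python
import math


def laplace_kernel(a, b, bandwidth):
    """Laplace kernel k(a, b) = exp(-|a - b| / bandwidth)."""
    return math.exp(-abs(a - b) / bandwidth)


def segment_cost(x, start, end, bandwidth):
    """Kernel scatter of x[start:end]: sum_i k(x_i, x_i) - (1/n) sum_{i,j} k(x_i, x_j)."""
    n = end - start
    gram_sum = sum(laplace_kernel(x[i], x[j], bandwidth)
                   for i in range(start, end)
                   for j in range(start, end))
    return n - gram_sum / n   # k(x_i, x_i) = 1 for the Laplace kernel


def best_single_change(x, bandwidth=1.0, min_size=2):
    """Return (t, cost) where splitting into x[:t] and x[t:] has the lowest total cost."""
    best_t, best_cost = None, math.inf
    for t in range(min_size, len(x) - min_size + 1):
        cost = (segment_cost(x, 0, t, bandwidth)
                + segment_cost(x, t, len(x), bandwidth))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```

Extending this to several change-points requires a search over segmentations plus a penalty on their number, which is where the hyperparameter sensitivity discussed in the abstract comes in. The sketch recomputes the Gram sums for every split, so it is only suitable for short signals.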


2020 ◽  
Vol 49 (4) ◽  
pp. 1076-1080
Author(s):  
Haeran Cho ◽  
Claudia Kirch

Abstract. We congratulate the author on this interesting paper, which introduces a novel method for the data segmentation problem that works well in a classical change point setting as well as in a frequent jump situation. Most notably, the paper introduces a new model selection step based on finding the ‘steepest drop to low levels’ (SDLL). Since the new model selection requires a complete (or at least relatively deep) solution path ordering the change point candidates according to some measure of importance, a new recursive variant of the Wild Binary Segmentation (Fryzlewicz in Ann Stat 42:2243–2281, 2014, WBS), named WBS2, has been proposed for candidate generation.


2021 ◽  
Vol 13 (2) ◽  
pp. 247
Author(s):  
Youssef Wehbe ◽  
Marouane Temimi

A better understanding of the spatiotemporal distribution of water resources is crucial for the sustainable development of hyper-arid regions. Here, we focus on the Arabian Peninsula (AP) and use remotely sensed data to (i) analyze the local climatology of total water storage (TWS), precipitation, and soil moisture; (ii) characterize their temporal variability and spatial distribution; and (iii) infer recent trends and change points within their time series. Remote sensing data for TWS, precipitation, and soil moisture are obtained from the Gravity Recovery and Climate Experiment (GRACE), the Tropical Rainfall Measuring Mission (TRMM), and the Advanced Microwave Scanning Radiometer for Earth Observing System (AMSR-E), respectively. The study relies on trend analysis, the modified Mann–Kendall test, and change point detection statistics. We first derive 10-year (2002–2011) seasonal averages from each of the datasets and intercompare their spatial organization. In the absence of large-scale in situ data, we then compare trends from GRACE TWS retrievals to in situ groundwater observations locally over the subdomain of the United Arab Emirates (UAE). TWS anomalies range from −6.2 to 3.2 cm/month during the winter period and from −6.8 to −0.3 cm/month during the summer period. Trend analysis shows decreasing precipitation trends (−2.3 × 10⁻⁴ mm/day) spatially aligned with decreasing soil moisture trends (−1.5 × 10⁻⁴ g/cm³/month) over the southern part of the AP, whereas the highest decreasing TWS trends (−8.6 × 10⁻² cm/month) are recorded over areas of excessive groundwater extraction in the northern AP. Interestingly, change point detection reveals increasing precipitation trends pre- and post-change point breaks over the entire AP region. Significant spatial dependencies are observed between TRMM and GRACE change points, particularly over Yemen during 2010, revealing the dominant impact of climatic changes on TWS depletion.
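For readers unfamiliar with the trend test named above, here is a minimal sketch of the classic Mann–Kendall statistic. It is the textbook version, assuming no tied values and no serial correlation; the modified variant used in the study adjusts the variance for autocorrelation and is not reproduced here. The function name `mann_kendall` is hypothetical.

```python
import math


def mann_kendall(x):
    """Return (S, Z) for the classic Mann-Kendall trend test (no tie correction)."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)      # sign of each later-minus-earlier pair
    var_s = n * (n - 1) * (2 * n + 5) / 18    # variance of S when there are no ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)        # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z
```

A positive `S` indicates a tendency for later values to exceed earlier ones and a negative `S` the opposite; `Z` is compared against standard normal quantiles to judge significance.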


2021 ◽  
Author(s):  
Miriam Sieg ◽  
Lina Katrin Sciesielski ◽  
Karin Kirschner ◽  
Jochen Kruppa

Abstract. Background: In longitudinal studies, observations are made over time. Hence, the single observations at each time point are dependent, making them a repeated measurement. In this work, we explore a different, counterintuitive setting: At each developmental time point, a lethal observation is performed on the pregnant or nursing mother. Therefore, the single time points are independent. Furthermore, the observations in the offspring at each time point are correlated with each other because each litter consists of several (genetically linked) littermates. In addition, the observed time series is short from a statistical perspective as animal ethics prevent killing more mother mice than absolutely necessary, and murine development is short anyway. We solve these challenges by using multiple contrast tests and visualizing the change point by the use of confidence intervals. Results: We used linear mixed models to model the variability of the mother. The estimates from the linear mixed model are then used in multiple contrast tests. There are a variety of contrasts and, intuitively, we would use the Changepoint method. However, it does not deliver satisfying results. Interestingly, we found two other contrasts, both capable of answering different research questions in change point detection: i) Should a single point with change direction be found, or ii) Should the overall progression be determined? The Sequen contrast answers the first, the McDermott the second. Confidence intervals deliver effect estimates for the strength of the potential change point. Therefore, the scientist can define a biologically relevant limit of change depending on the research question. Conclusion: We present a solution with effect estimates for short independent time series with observations nested at a given time point. Multiple contrast tests produce confidence intervals, which allow determining the position of change points or visualizing the expression course over time. We suggest using McDermott’s method to determine if there is an overall significant change within the time frame, while Sequen is better at determining specific change points. In addition, we offer a short formula for the estimation of the maximal length of the time series.

