Anomaly Detection in Animal-Related Failures in Overhead Distribution Systems

Automatic anomaly detection monitoring plays a vital role in water utilities’ distribution systems, reducing the risk that unclean water poses to consumers. A major problem in anomaly detection is imbalanced datasets. Dynamic selection techniques combined with ensemble models have proven effective for classification tasks on imbalanced datasets. In this paper, water quality anomaly detection is formulated as a classification problem in the presence of class imbalance. To tackle this problem, and considering the asymmetric distribution of the dataset between the majority and minority classes, the performance of sixteen previously proposed single and static ensemble classification methods embedded with resampling strategies is first optimised and compared. After that, six dynamic selection techniques, namely Modified Class Rank (Rank), Local Class Accuracy (LCA), Overall-Local Accuracy (OLA), K-Nearest Oracles Eliminate (KNORA-E), K-Nearest Oracles Union (KNORA-U) and Meta-Learning for Dynamic Ensemble Selection (META-DES), in combination with homogeneous and heterogeneous ensemble models, three SMOTE-based resampling algorithms (SMOTE, SMOTE+ENN and SMOTE+Tomek Links) and one missing-data method (missForest), are proposed and evaluated. A binary real-world drinking-water quality anomaly detection dataset is utilised to evaluate the models. The experimental results reveal that all the models benefit from the combined optimisation of both the classifiers and the resampling methods. Considering the three performance measures (balanced accuracy, F-score and G-mean), the results also show that the dynamic classifier selection (DCS) techniques, in particular the missForest+SMOTE+RANK and missForest+SMOTE+OLA models based on homogeneous ensemble-bagging with a decision tree as the base classifier, exhibited better balanced accuracy and G-mean, while the Bg+mF+SMENN+LCA model based on homogeneous ensemble-bagging with random forest has a better overall F1-measure than the other models.
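The three performance measures named in the abstract can be illustrated with a minimal sketch. It assumes the common binary definitions (balanced accuracy as the mean of sensitivity and specificity, F-score as the F1 harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity); the abstract does not spell out its exact formulas, and the function names and toy labels below are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return balanced accuracy, F1 and G-mean (assumed standard definitions)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": f1,
        "g_mean": math.sqrt(recall * specificity),
    }


# Toy data (made up): 95 normal samples (0) and 5 anomalies (1).
y_true = [0] * 95 + [1] * 5

# A model that always predicts "normal" is 95% accurate but useless.
always_normal = [0] * 100
print(imbalance_metrics(y_true, always_normal))

# A detector that raises 5 false alarms and catches 4 of the 5 anomalies.
detector = [0] * 90 + [1] * 5 + [1] * 4 + [0]
print(imbalance_metrics(y_true, detector))
```

On the toy data, the always-normal model scores 0.5 balanced accuracy and zero F1 and G-mean despite its high plain accuracy, which is why such measures are preferred over accuracy when classes are imbalanced.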

Multi-objective Logistic Regression for Anomaly Detection in Water Distribution Systems

10.1007/978-981-16-4126-8_13 · 2021 · pp. 129-138

Author(s): Gilberto Reynoso-Meza, Elizabeth Pauline Carreño-Alvarado

Keyword(s): Logistic Regression, Anomaly Detection, Distribution Systems, Water Distribution, Water Distribution Systems, Multi Objective

Novelty detection for time series data analysis in water distribution systems using support vector machines

Journal of Hydroinformatics · 10.2166/hydro.2010.144 · 2010 · Vol 13 (4) · pp. 672-686 · Cited By ~ 66

Author(s): Stephen R. Mounce, Richard B. Mounce, Joby B. Boxall

Keyword(s): Time Series, Anomaly Detection, Support Vector Regression, Distribution Systems, Water Distribution, Time Series Data, Water Distribution Systems, Kernel Functions, Series Data, Support Vector

The sampling frequency and quantity of time series data collected from water distribution systems have been increasing in recent years, giving rise to the potential for improving system knowledge if suitable automated techniques, in particular machine learning, can be applied. Novelty (or anomaly) detection refers to the automatic identification of novel or abnormal patterns embedded in large amounts of “normal” data. When dealing with time series data (transformed into vectors), this means abnormal events embedded amongst many normal time series points. The support vector machine is a data-driven statistical technique that has been developed as a tool for classification and regression. The key features include statistical robustness with respect to non-Gaussian errors and outliers, the selection of the decision boundary in a principled way, and the introduction of nonlinearity in the feature space without explicitly requiring a nonlinear algorithm by means of kernel functions. In this research, support vector regression is used as a learning method for anomaly detection from water flow and pressure time series data. No use is made of past event histories collected through other information sources. The support vector regression methodology, whose robustness derives from the training error function, is applied to a case study.
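Two ideas from this abstract can be sketched briefly: turning a time series into vectors with a sliding window, and the epsilon-insensitive loss that support vector regression uses as its training error function. The sketch is not the authors' method. A window-median predictor stands in for a trained support vector regression model, and the rule of flagging points that fall outside the epsilon tube, the function names, the window width, the epsilon value and the flow readings are all assumptions made for illustration.

```python
import statistics


def sliding_windows(series, width):
    """Turn a series into (window vector, next value) pairs."""
    return [(series[i:i + width], series[i + width])
            for i in range(len(series) - width)]


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear in the residual outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def median_predictor(window):
    """Placeholder for a trained SVR model (assumption, not the paper's)."""
    return statistics.median(window)


def flag_anomalies(series, width, epsilon, predict):
    """Return indices whose value falls outside the epsilon tube."""
    flagged = []
    for i, (window, target) in enumerate(sliding_windows(series, width)):
        if epsilon_insensitive_loss(target, predict(window), epsilon) > 0:
            flagged.append(i + width)  # index of the predicted point
    return flagged


# Hypothetical flow readings with one abnormal spike at index 5.
flow = [10.0, 10.2, 9.9, 10.1, 10.0, 14.5, 10.1, 9.8, 10.0]
print(flag_anomalies(flow, width=3, epsilon=1.0, predict=median_predictor))
```

Because the loss grows only linearly outside the tube and ignores small residuals entirely, a single outlier influences the fit less than it would under a squared-error loss, which is one way to read the robustness the abstract mentions.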
