Data-Driven Leak Localization in Urban Water Distribution Networks Using Big Data for Random Forest Classifier

In this paper, a Random Forest classifier is used to localize leaks in two water distribution networks of different sizes with sparse sensor placement. A large number of leak scenarios were simulated, with the leak parameters (leak location and emitter coefficient) drawn by Monte Carlo sampling. To account for daily demand variations and to obtain a larger dataset, each scenario was simulated with a random increase or reduction in the base demand of every network node. Classifier accuracy was assessed for different sensor layouts and numbers of sensors. Separate prediction models were built for different leak sizes and base demand variation ranges in order to investigate model accuracy under various conditions. The results indicate that the prediction model is most accurate for the largest leaks with the smallest variation in base demand (62% accuracy for the larger network and 82% for the smaller network, for the largest considered leak size and a base demand variation of ±2.5%). However, even for small leaks and the greatest base demand variations, the model provided considerable accuracy, especially when localization considered the true leak node together with its neighboring nodes (for the smaller network with a base demand variation of ±20%, accuracy increased from 44% to 89% when the five nodes with the greatest predicted probability were considered, and for the larger network with a base demand variation of ±10%, accuracy increased from 36% to 77%).
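The scenario sampling and the "top five nodes" accuracy measure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the emitter coefficient range, and the dictionary-based data layout are assumptions made for illustration, and the hydraulic simulation that would turn each scenario into sensor readings is omitted.

```python
import random

def sample_scenario(node_ids, base_demand, demand_var, ec_range, rng):
    """Draw one hypothetical leak scenario.

    demand_var is the relative base demand variation (0.025 for +/-2.5%).
    ec_range is an assumed (min, max) range for the emitter coefficient.
    """
    leak_node = rng.choice(node_ids)        # Monte Carlo leak location
    emitter = rng.uniform(*ec_range)        # Monte Carlo leak size
    # Random increase or reduction of the base demand at every node.
    demands = {
        n: base_demand[n] * (1.0 + rng.uniform(-demand_var, demand_var))
        for n in node_ids
    }
    return leak_node, emitter, demands

def top_k_accuracy(probabilities, true_nodes, k):
    """Fraction of scenarios whose true leak node is among the k nodes
    with the greatest predicted probability.

    probabilities: one {node_id: probability} dict per scenario.
    """
    hits = 0
    for probs, truth in zip(probabilities, true_nodes):
        ranked = sorted(probs, key=probs.get, reverse=True)[:k]
        hits += truth in ranked
    return hits / len(true_nodes)
```

With `k=1` the measure reduces to ordinary classification accuracy; larger `k` corresponds to accepting a short list of candidate nodes for field inspection.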


An iterative method for leakage zone identification in water distribution networks based on machine learning

Structural Health Monitoring ◽ 10.1177/1475921720950470 ◽ 2020 ◽ pp. 147592172095047

Author(s): Jingyu Chen ◽ Xin Feng ◽ Shiyun Xiao

Keyword(s): Random Forest ◽ Iterative Method ◽ Water Distribution ◽ Pressure Sensors ◽ Distribution Networks ◽ Random Forest Classifier ◽ Identification Accuracy ◽ Water Distribution Networks ◽ Minimum Number ◽ Leakage Characteristics

For leakage identification in water distribution networks, if each node is used as a category label of the classifier model, the accuracy of the classifier model will be low because of similar leakage characteristics. By clustering the nodes with similar leakage characteristics and using all the possible combinations of leakages as the category labels of the classifier model, the accuracy of the classifier model for leakage location can be improved. An iterative method combining k-means clustering with the random forest classifier is proposed to identify the leakage zones. In each iteration, k-means clustering is used to divide the leakage zone identified in the previous iterations into two zones, and then, the random forest classifier is used to identify the leakage zones and the number of leakages in each leakage zone. As the number of iterations increases, the number of candidate leakage zones and sensors that conduct leakage zone identification decreases. Thus, feature selection can be used in each iteration to select the minimum number of sensors for model training without affecting identification accuracy. Three leakage scenarios are considered: a single leakage, two simultaneous leakages, and four simultaneous leakages. A benchmark case is presented in this study to demonstrate the effectiveness of the proposed method. The influences of the number of pressure sensors and Gaussian noise level on the identification results are also discussed. Results indicate that the proposed method is effective for identifying simultaneous leakages.
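The iteration described above (bisect the current zone with k-means, then let a random forest pick the half that contains the leak) might look roughly like the following sketch. It is a simplified illustration, not the authors' method: it handles a single leakage only, omits the per-iteration feature selection of sensors, and the names `identify_leak_zone` and `make_toy_data`, along with the toy pressure signatures, are hypothetical. It assumes scikit-learn and NumPy are available.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def identify_leak_zone(signatures, X_train, y_train, x_obs,
                       min_zone_size=2, seed=0):
    """Iteratively halve the candidate leakage zone (single-leak sketch).

    signatures: (n_nodes, n_features) leakage characteristics per node.
    X_train, y_train: simulated sensor readings and leaking node index.
    x_obs: observed sensor readings for the leak to be located.
    """
    candidates = np.arange(len(signatures))
    while len(candidates) > min_zone_size:
        # Split the current zone into two sub-zones of similar nodes.
        km = KMeans(n_clusters=2, n_init=10, random_state=seed)
        km.fit(signatures[candidates])
        zone_of_node = np.full(len(signatures), -1)
        zone_of_node[candidates] = km.labels_
        # Train only on scenarios whose leak lies in the current zone.
        mask = np.isin(y_train, candidates)
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(X_train[mask], zone_of_node[y_train[mask]])
        zone = clf.predict(x_obs.reshape(1, -1))[0]
        candidates = candidates[km.labels_ == zone]
    return candidates

def make_toy_data(n_nodes=8, samples_per_node=30, noise=0.002, seed=0):
    """Hypothetical toy data: nodes on a line, one sensor at each end."""
    rng = np.random.default_rng(seed)
    pos = np.arange(n_nodes)
    signatures = np.column_stack([
        np.exp(-pos / 3.0),                  # response at the first sensor
        np.exp(-(n_nodes - 1 - pos) / 3.0),  # response at the last sensor
    ])
    y = np.repeat(pos, samples_per_node)
    X = signatures[y] + rng.normal(0.0, noise, size=(len(y), 2))
    return signatures, X, y
```

Because each iteration classifies between only two zones, the classifier avoids the many-similar-labels problem that arises when every node is its own category. Extending the sketch to simultaneous leakages would require labels for combinations of zones, as the abstract describes.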
