Ensemble-based machine learning approach for improved leak detection in water mains

Author(s):  
Thambirajah Ravichandran ◽  
Keyhan Gavahi ◽  
Kumaraswamy Ponnambalam ◽  
Valentin Burtea ◽  
S. Jamshid Mousavi

This paper presents an acoustic leak detection system for water distribution mains using machine learning methods. The problem is formulated as binary classification, distinguishing leak from no-leak cases using acoustic signals. A supervised learning methodology is employed using several detection features extracted from the acoustic signals, such as power spectral density and time-series data. The training and validation data sets were collected over several months from multiple cities across North America. The proposed solution includes a multi-strategy ensemble learning (MEL) approach using a gradient boosting tree (GBT) classification model, which performs better at maximizing detection rate and minimizing false positives than other classification models such as KNN, ANN, and rule-based techniques. Further improvements are achieved by combining multiple GBT classifiers in a parallel ensemble method (bagging). The proposed MEL approach demonstrates a significant improvement in performance, reducing false-positive reports by an order of magnitude.
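The bagging-of-GBT-classifiers idea can be illustrated with a minimal scikit-learn sketch; the synthetic features below stand in for the acoustic detection features (e.g., power-spectral-density bands) and are not the authors' data or pipeline:

```python
# Minimal sketch: bagging an ensemble of gradient-boosting-tree (GBT)
# classifiers for a binary leak / no-leak decision. Features are placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, precision_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 16))            # 16 hypothetical PSD-band features
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=2000) > 1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Parallel ensemble: several GBT base learners trained on bootstrap samples.
model = BaggingClassifier(
    GradientBoostingClassifier(n_estimators=200, max_depth=3),
    n_estimators=10,
    random_state=0,
)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("detection rate (recall):", recall_score(y_te, pred))
print("precision (higher means fewer false positives):", precision_score(y_te, pred))
```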

Author(s):  
Gudipally Chandrashakar

In this article, we use historical time-series data of daily gold prices up to the present. To predict the gold price, we consider several correlated factors: the silver price, copper price, Standard & Poor's 500 index, dollar-rupee exchange rate, and Dow Jones Industrial Average, with data for each factor and for the gold price ranging from January 2008 to February 2021. Several machine learning algorithms are used to analyze the time-series data: Random Forest Regression, Support Vector Regression, Linear Regression, Extra Trees Regression, and Gradient Boosting Regression. Among these, the Extra Trees Regressor predicts gold prices most accurately.
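A minimal sketch of such a regressor comparison, using synthetic stand-ins for the correlated price series rather than the actual historical data:

```python
# Illustrative comparison of the regressors named above on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
# Columns stand in for silver, copper, S&P 500, USD/INR rate and DJIA values.
X = rng.normal(size=(3000, 5))
y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=3000)  # proxy gold price

X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False)  # keep time order

models = {
    "RandomForest": RandomForestRegressor(random_state=1),
    "SVR": SVR(),
    "Linear": LinearRegression(),
    "ExtraTrees": ExtraTreesRegressor(random_state=1),
    "GradientBoosting": GradientBoostingRegressor(random_state=1),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```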


Author(s):  
Dimitris M. Chatzigeorgiou ◽  
Atia E. Khalifa ◽  
Kamal Youcef-Toumi ◽  
Rached Ben-Mansour

In most cases the deleterious effects associated with the occurrence of a leak present serious problems, and therefore leaks must be quickly detected, located, and repaired. The problem of leakage becomes even more serious when it concerns the vital supply of fresh water to the community. In addition to wasting resources, contaminants may infiltrate the water supply. The possibility of environmental health disasters due to delayed detection of water pipeline leaks has spurred research into methods for pipeline leak and contamination detection. Leaks in water pipes create acoustic emissions, which can be sensed to identify and localize leaks. Leak noise correlators and listening devices have been reported in the literature as successful approaches to leak detection, but they have practical limitations in terms of cost, sensitivity, reliability, and scalability. To overcome those limitations, the development of an in-pipe traveling leak detection system is proposed. Developing such a system requires a clear understanding of the acoustic signals generated by leaks and of how those signals vary with pipe loading conditions, leak sizes, and surrounding media. This paper discusses those signals and evaluates the merits of an in-pipe floating sensor.
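For background, the leak noise correlator approach mentioned above localizes a leak from the arrival-time difference of its noise at two sensors; the sketch below is a generic illustration with assumed wave speed, sensor spacing, and sampling rate, not the in-pipe system discussed in the paper:

```python
# Cross-correlation time-delay estimation, the principle behind leak noise
# correlators. All physical values are illustrative assumptions.
import numpy as np

fs = 10_000            # sampling rate [Hz]
c = 1200.0             # acoustic wave speed in the pipe [m/s] (assumed)
sensor_spacing = 60.0  # distance between the two sensors [m]
true_delay = 0.01      # leak noise arrives 10 ms later at sensor B

rng = np.random.default_rng(2)
n = 2000  # 0.2 s of data
leak_noise = rng.normal(size=n)
sig_a = leak_noise + 0.1 * rng.normal(size=n)
sig_b = np.roll(leak_noise, int(true_delay * fs)) + 0.1 * rng.normal(size=n)

# Full cross-correlation; the lag of its peak estimates the arrival-time delay.
corr = np.correlate(sig_b, sig_a, mode="full")
lag = (np.argmax(corr) - (n - 1)) / fs

# Leak position measured from sensor A, from the usual correlator formula.
distance_from_a = 0.5 * (sensor_spacing - c * lag)
print(f"estimated delay: {lag*1e3:.1f} ms, leak at ~{distance_from_a:.1f} m from sensor A")
```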


2019 ◽  
Vol 9 (6) ◽  
pp. 1154 ◽  
Author(s):  
Ganjar Alfian ◽  
Muhammad Syafrudin ◽  
Bohan Yoon ◽  
Jongtae Rhee

Radio frequency identification (RFID) is an automated identification technology that can be utilized to monitor product movements within a supply chain in real time. However, one problem that occurs during RFID data capture is false positives (i.e., tags that are accidentally detected by the reader but are not of interest to the business process). This paper investigates the use of machine learning algorithms to filter such false positives. Raw RFID data were collected for various tagged product movements, and statistical features were extracted from the received signal strength derived from the raw RFID data. Because abnormal RFID data or outliers may arise in real cases, we utilized outlier detection models to remove them. The experimental results showed that machine learning based models successfully classified RFID readings with high accuracy, and that integrating outlier detection with the machine learning models improved classification accuracy further. We demonstrated that the proposed classification model can be applied to real-time monitoring, ensuring that false positives are filtered out and hence not stored in the database. The proposed model is expected to improve warehouse management systems by monitoring products delivered to other supply chain partners.
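The two-stage idea (outlier removal followed by supervised classification of RSSI-derived statistics) can be sketched as follows; features, labels, and model choices are illustrative assumptions, not the paper's dataset or tuned models:

```python
# Stage 1: drop outlier readings; Stage 2: classify true vs. false-positive reads.
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)
# Hypothetical per-tag features: mean, std, min, max of received signal strength.
X = rng.normal(size=(1500, 4))
y = (X[:, 0] - 0.8 * X[:, 1] > 0).astype(int)   # 1 = tag of interest, 0 = stray read

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=3)

# Stage 1: remove abnormal training readings flagged as outliers.
keep = IsolationForest(contamination=0.05, random_state=3).fit_predict(X_tr) == 1
X_tr, y_tr = X_tr[keep], y_tr[keep]

# Stage 2: supervised classifier separates business-relevant reads from false positives.
clf = RandomForestClassifier(random_state=3).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```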


2021 ◽  
Vol 8 ◽  
Author(s):  
Ruixia Cui ◽  
Wenbo Hua ◽  
Kai Qu ◽  
Heran Yang ◽  
Yingmu Tong ◽  
...  

Sepsis-associated coagulation dysfunction greatly increases the mortality of sepsis. Irregular clinical time-series data remain a major challenge for AI applications in medicine. To enable early detection and management of sepsis-induced coagulopathy (SIC) and sepsis-associated disseminated intravascular coagulation (DIC), we developed an interpretable, real-time sequential warning model for real-world irregular data. Eight machine learning models, including novel algorithms, were devised to detect SIC and sepsis-associated DIC 8n hours (1 ≤ n ≤ 6) prior to onset. Models were developed on data from Xi'an Jiaotong University Medical College (XJTUMC) and verified on the Beth Israel Deaconess Medical Center (BIDMC) cohort. A total of 12,154 SIC and 7,878 International Society on Thrombosis and Haemostasis (ISTH) overt-DIC labels were annotated in the training set according to the SIC and ISTH overt-DIC scoring systems. The area under the receiver operating characteristic curve (AUROC) was used as the model evaluation metric. The eXtreme Gradient Boosting (XGBoost) model predicts SIC and sepsis-associated DIC events up to 48 h before onset with AUROCs of 0.929 and 0.910, respectively, rising to 0.973 and 0.955 at 8 h before onset, the highest performance reported to date. The novel ODE-RNN model achieves continuous prediction at arbitrary time points, with AUROCs of 0.962 and 0.936 for SIC and DIC predicted 8 h before onset, respectively. In conclusion, our model can predict SIC and sepsis-associated DIC onset up to 48 h in advance, which helps maximize the time window for early management by physicians.
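A toy sketch of the XGBoost part of this setup, scored by AUROC, is shown below; the features and labels are synthetic placeholders, whereas the real model is trained on irregular clinical time-series variables:

```python
# Gradient-boosted classifier evaluated by AUROC for an early-warning label.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
X = rng.normal(size=(5000, 20))                  # stand-ins for labs, vitals, coagulation markers
y = (X[:, 0] + X[:, 5] + rng.normal(scale=1.0, size=5000) > 1.5).astype(int)  # toy SIC label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=4)

model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
model.fit(X_tr, y_tr)
auroc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print("AUROC:", auroc)
```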


2021 ◽  
Vol 12 (7) ◽  
pp. 358-372
Author(s):  
E. V. Orlova

The article considers the problem of reducing banks' credit risks associated with the insolvency of individual borrowers, using financial and socio-economic factors together with additional data on the borrower's digital footprint. A critical analysis of existing approaches, methods, and models in this area has been carried out, and a number of significant shortcomings that limit their application have been identified. There is no comprehensive approach to assessing a borrower's creditworthiness based on information that includes data from social networks and search engines. A new methodological approach for assessing a borrower's risk profile, based on staged processing of quantitative and qualitative data and on modeling with statistical analysis and machine learning methods, is proposed. The machine learning methods solve clustering and classification problems; they automatically determine the data structure and support decisions through flexible, local training on the data. Hierarchical clustering and the k-means method are used to identify similar social, anthropometric, and financial indicators, as well as indicators characterizing the borrowers' digital footprint, and to determine the risk profile of each group of borrowers. The resulting homogeneous groups of borrowers, each with a distinct risk profile, are then used for detailed data analysis in a predictive classification model. The classification model is based on stochastic gradient boosting and predicts the risk profile of a potential borrower. The suggested approach to assessing individual creditworthiness will reduce a bank's credit risks and increase its stability and profitability. The implementation results are of practical importance. A comparative analysis of the effectiveness of the existing and the proposed methodologies for assessing credit risk showed that the new methodology provides predictive analytics over heterogeneous information about a potential borrower with higher accuracy. The proposed techniques form the core of a decision support system for justifying individual credit conditions while minimizing aggregate credit risk.
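A rough sketch of the staged pipeline (k-means risk groups feeding a stochastic gradient boosting classifier) might look as follows; all data and parameter choices are synthetic stand-ins, not the article's methodology in detail:

```python
# Stage 1: unsupervised risk-profile groups; Stage 2: stochastic gradient boosting.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
X = rng.normal(size=(4000, 10))                       # borrower attributes (placeholder)
y = (X[:, 0] + 0.7 * X[:, 4] + rng.normal(scale=0.8, size=4000) > 1).astype(int)  # default flag

# Stage 1: cluster borrowers into homogeneous groups and keep the group label.
groups = KMeans(n_clusters=4, n_init=10, random_state=5).fit_predict(X)
X_aug = np.column_stack([X, groups])

X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, stratify=y, random_state=5)

# Stage 2: stochastic gradient boosting (subsample < 1.0 makes it "stochastic").
clf = GradientBoostingClassifier(subsample=0.8, random_state=5).fit(X_tr, y_tr)
print("AUROC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```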


Author(s):  
Martin Di Blasi ◽  
Zhan Li

Pipeline ruptures have the potential to cause significant economic and environmental impact in a short period of time, so it is critical for pipeline operators to be able to promptly detect and respond to them. Public stakeholder expectations are high, and an evolving expectation is that the response to such events be automated by initiating an automatic pipeline shutdown upon receipt of a rupture alarm. These performance expectations are challenging to achieve with conventional, model-based leak-detection systems (i.e., CPM-RTTMs), as their reliability, measured in terms of the false alarm rate, is typically too low. The company has actively participated in a pipeline-industry task force chaired by the API Cybernetics committee, focused on developing best practices in the area of rupture recognition and response. After API released the first version of a Rupture Recognition and Response guidance document in 2014, the company initiated development of its own internal Rupture Recognition Program (RRP). The RRP considers several rupture recognition approaches simultaneously, ranging from improvements to existing CPM leak detection to the development of a new SCADA-based rupture detection system (RDS). This paper provides an overview of a specific approach to rupture detection based on machine learning and pattern recognition techniques applied to SCADA data.


Author(s):  
Hanan A. R. Akkar ◽  
Wael A. H. Hadi ◽  
Ibraheem H. Al-Dosari ◽  
Saadi M. Saadi ◽  
Aseel Ismael Ali

The problem of leak detection in water pipeline networks can be solved by utilizing a wireless sensor network together with an intelligent algorithm. A novel denoising process is proposed in this work, and a comparative study evaluates it using several performance indices. Hard-rectified thresholding with the universal threshold selection rule gives the best results among the thresholding methods considered, with an enhanced signal-to-noise ratio (SNR) of 10.38 and a normalized mean squared error (NMSE) of 0.1344. Machine learning methods are used to create models that simulate a pipeline leak detection system. A combined feature vector of wavelet and statistical features is used to improve the proposed system's performance.
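A generic wavelet-denoising sketch with the universal threshold rule and hard thresholding, reporting SNR and NMSE indices like those quoted above; the test signal, wavelet choice ('db4'), and noise level are assumptions for the demo, not the paper's data:

```python
# Wavelet denoising with the universal threshold and hard thresholding (PyWavelets).
import numpy as np
import pywt

rng = np.random.default_rng(6)
n = 4096
t = np.linspace(0, 1, n)
clean = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)  # stand-in signal
noisy = clean + 0.4 * rng.normal(size=n)

coeffs = pywt.wavedec(noisy, "db4", level=5)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745          # noise estimate from finest detail level
thr = sigma * np.sqrt(2 * np.log(n))                    # universal threshold
coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="hard") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "db4")[:n]

snr = 10 * np.log10(np.sum(clean**2) / np.sum((denoised - clean) ** 2))
nmse = np.sum((denoised - clean) ** 2) / np.sum(clean**2)
print(f"enhanced SNR: {snr:.2f} dB, NMSE: {nmse:.4f}")
```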


Author(s):  
Nadia Burkart ◽  
Maximilian Franz ◽  
Marco F. Huber

Machine learning and deep learning are widely used in various applications to assist or even replace human reasoning. For instance, a machine learning based intrusion detection system (IDS) monitors a network for malicious activity or specific policy violations. We propose that IDSs should attach a sufficiently understandable report to each alert to allow the operator to review alerts more efficiently. This work aims at complementing an IDS by means of a framework to create explanations. The explanations support the human operator in understanding alerts and reveal potential false positives. The focus lies on counterfactual instances and explanations based on locally faithful decision boundaries.
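A very simple counterfactual search of the kind referred to above can be sketched as follows; the classifier, features, and greedy search are placeholders, not the framework proposed in the paper:

```python
# Greedy counterfactual: nudge one feature at a time until the alert flips to benign.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(2000, 6))                         # hypothetical network-flow features
y = (X[:, 0] + X[:, 1] > 1).astype(int)                # 1 = alert (malicious)
clf = RandomForestClassifier(random_state=7).fit(X, y)

def counterfactual(x, model, step=0.25, max_iter=200):
    """Return a minimally perturbed copy of x that the model labels benign."""
    cf = x.copy()
    for _ in range(max_iter):
        if model.predict(cf.reshape(1, -1))[0] == 0:
            return cf
        # Try each single-feature nudge and keep the one lowering the alert score most.
        best, best_score = None, model.predict_proba(cf.reshape(1, -1))[0, 1]
        for j in range(len(cf)):
            for delta in (-step, step):
                cand = cf.copy()
                cand[j] += delta
                score = model.predict_proba(cand.reshape(1, -1))[0, 1]
                if score < best_score:
                    best, best_score = cand, score
        if best is None:
            break
        cf = best
    return cf

alert = X[y == 1][0]
cf = counterfactual(alert, clf)
print("changed features:", np.flatnonzero(~np.isclose(alert, cf)))
```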


2020 ◽  
Vol 493 (3) ◽  
pp. 3429-3441
Author(s):  
Paulo A A Lopes ◽  
André L B Ribeiro

We introduce a new method to determine galaxy cluster membership based solely on photometric properties. We adopt a machine learning approach to recover a cluster membership probability from galaxy photometric parameters and finally derive a membership classification. After testing several machine learning techniques (such as stochastic gradient boosting, model-averaged neural networks and k-nearest neighbours), we found the support vector machine algorithm to perform best when applied to our data. Our training and validation data are from the Sloan Digital Sky Survey main sample. Hence, to be complete to $M_r^* + 3$, we limit our work to 30 clusters with $z_{\rm phot-cl} \le 0.045$. Masses ($M_{200}$) are larger than $\sim 0.6\times 10^{14} \, \mathrm{M}_{\odot}$ (most above $3\times 10^{14} \, \mathrm{M}_{\odot}$). Our results are derived taking into account all galaxies in the line of sight of each cluster, with no photometric redshift cuts or background corrections. Our method is non-parametric, making no assumptions about the number density or luminosity profiles of galaxies in clusters. Our approach delivers extremely accurate results (completeness $C \sim 92$ per cent and purity $P \sim 87$ per cent) within $R_{200}$, so we named our code reliable photometric membership. We discuss possible dependencies on magnitude, colour, and cluster mass. Finally, we present some applications of our method, stressing its impact on galaxy evolution and cosmological studies based on future large-scale surveys, such as eROSITA, EUCLID, and LSST.
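A minimal sketch of the membership step (an SVM trained on photometric parameters returning a membership probability, with completeness and purity computed from the resulting classification); the features below are placeholders, not SDSS photometry:

```python
# SVM membership probability with completeness / purity evaluation.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
X = rng.normal(size=(3000, 5))                    # e.g. r magnitude, g-r, r-i, local density, radius
y = (X[:, 1] - 0.5 * X[:, 3] + rng.normal(scale=0.7, size=3000) > 0).astype(int)  # member flag

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=8)

svm = make_pipeline(StandardScaler(), SVC(probability=True, random_state=8))
svm.fit(X_tr, y_tr)
p_member = svm.predict_proba(X_te)[:, 1]
members = p_member > 0.5

# Completeness: fraction of true members recovered; purity: fraction of selected
# objects that are true members.
completeness = (members & (y_te == 1)).sum() / (y_te == 1).sum()
purity = (members & (y_te == 1)).sum() / max(members.sum(), 1)
print(f"completeness: {completeness:.2f}, purity: {purity:.2f}")
```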


Electronics ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 219 ◽  
Author(s):  
Sweta Bhattacharya ◽  
Siva Rama Krishnan S ◽  
Praveen Kumar Reddy Maddikunta ◽  
Rajesh Kaluri ◽  
Saurabh Singh ◽  
...  

The enormous popularity of the internet across all spheres of human life has introduced various risks of malicious attacks on networks. Malicious activities can proliferate over the network effortlessly, which has led to the emergence of intrusion detection systems. The patterns of attacks are also dynamic, which necessitates efficient classification and prediction of cyber attacks. In this paper we propose a hybrid principal component analysis (PCA)-firefly based machine learning model to classify intrusion detection system (IDS) datasets. The dataset used in the study was collected from Kaggle. The model first performs one-hot encoding to transform the IDS dataset. The hybrid PCA-firefly algorithm is then used for dimensionality reduction, and the XGBoost algorithm is applied to the reduced dataset for classification. A comprehensive evaluation against state-of-the-art machine learning approaches demonstrates the advantage of the proposed model. The experimental results confirm that the proposed model performs better than the existing machine learning models.
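The shape of this pipeline (one-hot encoding, dimensionality reduction, XGBoost classification) can be sketched as follows; plain PCA stands in here for the hybrid PCA-firefly step, and the data are synthetic rather than the Kaggle IDS dataset:

```python
# One-hot encoding -> dimensionality reduction -> XGBoost classification.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

rng = np.random.default_rng(9)
df = pd.DataFrame({
    "duration": rng.exponential(1.0, 5000),
    "src_bytes": rng.exponential(100.0, 5000),
    "protocol": rng.choice(["tcp", "udp", "icmp"], 5000),   # categorical, to be one-hot encoded
})
y = ((df["duration"] > 1.5) & (df["protocol"] != "icmp")).astype(int)  # toy attack label

pre = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["protocol"])],
    remainder="passthrough",
    sparse_threshold=0.0,        # keep output dense so PCA can consume it
)

pipe = Pipeline([
    ("encode", pre),
    ("reduce", PCA(n_components=3)),          # stand-in for the PCA-firefly reduction
    ("classify", XGBClassifier(n_estimators=200, max_depth=4)),
])

X_tr, X_te, y_tr, y_te = train_test_split(df, y, stratify=y, random_state=9)
pipe.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, pipe.predict(X_te)))
```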

