Ensemble-based machine learning approach for improved leak detection in water mains

Author(s):  
Thambirajah Ravichandran ◽  
Keyhan Gavahi ◽  
Kumaraswamy Ponnambalam ◽  
Valentin Burtea ◽  
S. Jamshid Mousavi

This paper presents an acoustic leak detection system for water distribution mains using machine learning methods. The problem is formulated as binary classification, distinguishing leak from no-leak cases using acoustic signals. A supervised learning methodology is employed using several detection features extracted from the acoustic signals, such as power spectral density and time-series data. The training and validation data sets were collected over several months from multiple cities across North America. The proposed solution includes a multi-strategy ensemble learning (MEL) approach using a gradient boosting tree (GBT) classification model, which performs better at maximizing detection rate and minimizing false positives than other classification models such as KNN, ANN, and rule-based techniques. Further improvements are achieved by combining multiple GBT classifiers in a parallel ensemble method (bagging). The proposed MEL approach demonstrates a significant improvement in performance, reducing false-positive reports by an order of magnitude.
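The bagging-of-GBT-classifiers idea can be illustrated with a minimal scikit-learn sketch; the synthetic features below stand in for the acoustic detection features (e.g., power-spectral-density bands) and are not the authors' data or pipeline:

```python
# Minimal sketch: bagging an ensemble of gradient-boosting-tree (GBT)
# classifiers for a binary leak / no-leak decision. Features are placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, precision_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 16))            # 16 hypothetical PSD-band features
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=2000) > 1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Parallel ensemble: several GBT base learners trained on bootstrap samples.
model = BaggingClassifier(
    GradientBoostingClassifier(n_estimators=200, max_depth=3),
    n_estimators=10,
    random_state=0,
)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("detection rate (recall):", recall_score(y_te, pred))
print("precision (higher means fewer false positives):", precision_score(y_te, pred))
```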

Author(s):  
Gudipally Chandrashakar

In this article, we use historical time-series data of daily gold prices up to the present. To predict the gold price, we consider several correlated factors: the silver price, copper price, Standard & Poor's 500 index, dollar-rupee exchange rate, and Dow Jones Industrial Average, with data for each factor and for the gold price ranging from January 2008 to February 2021. Several machine learning algorithms are used to analyze the time-series data: Random Forest Regression, Support Vector Regression, Linear Regression, Extra Trees Regression, and Gradient Boosting Regression. Among these, the Extra Trees Regressor predicts gold prices most accurately.
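A minimal sketch of such a regressor comparison, using synthetic stand-ins for the correlated price series rather than the actual historical data:

```python
# Illustrative comparison of the regressors named above on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
# Columns stand in for silver, copper, S&P 500, USD/INR rate and DJIA values.
X = rng.normal(size=(3000, 5))
y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=3000)  # proxy gold price

X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False)  # keep time order

models = {
    "RandomForest": RandomForestRegressor(random_state=1),
    "SVR": SVR(),
    "Linear": LinearRegression(),
    "ExtraTrees": ExtraTreesRegressor(random_state=1),
    "GradientBoosting": GradientBoostingRegressor(random_state=1),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```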


Author(s):  
Dimitris M. Chatzigeorgiou ◽  
Atia E. Khalifa ◽  
Kamal Youcef-Toumi ◽  
Rached Ben-Mansour

In most cases the deleterious effects associated with the occurrence of a leak present serious problems, and therefore leaks must be quickly detected, located, and repaired. The problem of leakage becomes even more serious when it concerns the vital supply of fresh water to the community. In addition to wasting resources, contaminants may infiltrate the water supply. The possibility of environmental health disasters due to delayed detection of water pipeline leaks has spurred research into methods for pipeline leak and contamination detection. Leaks in water pipes create acoustic emissions, which can be sensed to identify and localize leaks. Leak noise correlators and listening devices have been reported in the literature as successful approaches to leak detection, but they have practical limitations in terms of cost, sensitivity, reliability, and scalability. To overcome those limitations, the development of an in-pipe traveling leak detection system is proposed. Developing such a system requires a clear understanding of the acoustic signals generated by leaks and of how those signals vary with pipe loading conditions, leak sizes, and surrounding media. This paper discusses those signals and evaluates the merits of an in-pipe floating sensor.
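For background, the leak noise correlator approach mentioned above localizes a leak from the arrival-time difference of its noise at two sensors; the sketch below is a generic illustration with assumed wave speed, sensor spacing, and sampling rate, not the in-pipe system discussed in the paper:

```python
# Cross-correlation time-delay estimation, the principle behind leak noise
# correlators. All physical values are illustrative assumptions.
import numpy as np

fs = 10_000            # sampling rate [Hz]
c = 1200.0             # acoustic wave speed in the pipe [m/s] (assumed)
sensor_spacing = 60.0  # distance between the two sensors [m]
true_delay = 0.01      # leak noise arrives 10 ms later at sensor B

rng = np.random.default_rng(2)
n = 2000  # 0.2 s of data
leak_noise = rng.normal(size=n)
sig_a = leak_noise + 0.1 * rng.normal(size=n)
sig_b = np.roll(leak_noise, int(true_delay * fs)) + 0.1 * rng.normal(size=n)

# Full cross-correlation; the lag of its peak estimates the arrival-time delay.
corr = np.correlate(sig_b, sig_a, mode="full")
lag = (np.argmax(corr) - (n - 1)) / fs

# Leak position measured from sensor A, from the usual correlator formula.
distance_from_a = 0.5 * (sensor_spacing - c * lag)
print(f"estimated delay: {lag*1e3:.1f} ms, leak at ~{distance_from_a:.1f} m from sensor A")
```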


2019 ◽  
Vol 9 (6) ◽  
pp. 1154 ◽  
Author(s):  
Ganjar Alfian ◽  
Muhammad Syafrudin ◽  
Bohan Yoon ◽  
Jongtae Rhee

Radio frequency identification (RFID) is an automated identification technology that can be utilized to monitor product movements within a supply chain in real time. However, one problem that occurs during RFID data capture is false positives (i.e., tags that are accidentally detected by the reader but are not of interest to the business process). This paper investigates the use of machine learning algorithms to filter such false positives. Raw RFID data were collected for various tagged product movements, and statistical features were extracted from the received signal strength derived from the raw RFID data. Because abnormal RFID data or outliers may arise in real cases, we utilized outlier detection models to remove them. The experimental results showed that machine learning based models successfully classified RFID readings with high accuracy, and that integrating outlier detection with the machine learning models improved classification accuracy further. We demonstrated that the proposed classification model can be applied to real-time monitoring, ensuring that false positives are filtered out and hence not stored in the database. The proposed model is expected to improve warehouse management systems by monitoring products delivered to other supply chain partners.
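The two-stage idea (outlier removal followed by supervised classification of RSSI-derived statistics) can be sketched as follows; features, labels, and model choices are illustrative assumptions, not the paper's dataset or tuned models:

```python
# Stage 1: drop outlier readings; Stage 2: classify true vs. false-positive reads.
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)
# Hypothetical per-tag features: mean, std, min, max of received signal strength.
X = rng.normal(size=(1500, 4))
y = (X[:, 0] - 0.8 * X[:, 1] > 0).astype(int)   # 1 = tag of interest, 0 = stray read

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=3)

# Stage 1: remove abnormal training readings flagged as outliers.
keep = IsolationForest(contamination=0.05, random_state=3).fit_predict(X_tr) == 1
X_tr, y_tr = X_tr[keep], y_tr[keep]

# Stage 2: supervised classifier separates business-relevant reads from false positives.
clf = RandomForestClassifier(random_state=3).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```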


2021 ◽  
Vol 8 ◽  
Author(s):  
Ruixia Cui ◽  
Wenbo Hua ◽  
Kai Qu ◽  
Heran Yang ◽  
Yingmu Tong ◽  
...  

Sepsis-associated coagulation dysfunction greatly increases the mortality of sepsis. Irregular clinical time-series data remain a major challenge for AI applications in medicine. To enable early detection and management of sepsis-induced coagulopathy (SIC) and sepsis-associated disseminated intravascular coagulation (DIC), we developed an interpretable, real-time sequential warning model for real-world irregular data. Eight machine learning models, including novel algorithms, were devised to detect SIC and sepsis-associated DIC 8n hours (1 ≤ n ≤ 6) prior to onset. Models were developed on data from Xi'an Jiaotong University Medical College (XJTUMC) and verified on the Beth Israel Deaconess Medical Center (BIDMC) cohort. A total of 12,154 SIC and 7,878 International Society on Thrombosis and Haemostasis (ISTH) overt-DIC labels were annotated in the training set according to the SIC and ISTH overt-DIC scoring systems. The area under the receiver operating characteristic curve (AUROC) was used as the model evaluation metric. The eXtreme Gradient Boosting (XGBoost) model predicts SIC and sepsis-associated DIC events up to 48 h before onset with AUROCs of 0.929 and 0.910, respectively, rising to 0.973 and 0.955 at 8 h before onset, the highest performance reported to date. The novel ODE-RNN model achieves continuous prediction at arbitrary time points, with AUROCs of 0.962 and 0.936 for SIC and DIC predicted 8 h before onset, respectively. In conclusion, our model can predict SIC and sepsis-associated DIC onset up to 48 h in advance, which helps maximize the time window for early management by physicians.
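A toy sketch of the XGBoost part of this setup, scored by AUROC, is shown below; the features and labels are synthetic placeholders, whereas the real model is trained on irregular clinical time-series variables:

```python
# Gradient-boosted classifier evaluated by AUROC for an early-warning label.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
X = rng.normal(size=(5000, 20))                  # stand-ins for labs, vitals, coagulation markers
y = (X[:, 0] + X[:, 5] + rng.normal(scale=1.0, size=5000) > 1.5).astype(int)  # toy SIC label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=4)

model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
model.fit(X_tr, y_tr)
auroc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print("AUROC:", auroc)
```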


2021 ◽  
Vol 12 (7) ◽  
pp. 358-372
Author(s):  
E. V. Orlova

The article considers the problem of reducing banks' credit risks associated with the insolvency of individual borrowers, using financial and socio-economic factors together with additional data on the borrower's digital footprint. A critical analysis of existing approaches, methods, and models in this area has been carried out, and a number of significant shortcomings that limit their application have been identified. There is no comprehensive approach to assessing a borrower's creditworthiness based on information that includes data from social networks and search engines. A new methodological approach for assessing a borrower's risk profile, based on staged processing of quantitative and qualitative data and on modeling with statistical analysis and machine learning methods, is proposed. The machine learning methods solve clustering and classification problems; they automatically determine the data structure and support decisions through flexible, local training on the data. Hierarchical clustering and the k-means method are used to identify similar social, anthropometric, and financial indicators, as well as indicators characterizing the borrowers' digital footprint, and to determine the risk profile of each group of borrowers. The resulting homogeneous groups of borrowers, each with a distinct risk profile, are then used for detailed data analysis in a predictive classification model. The classification model is based on stochastic gradient boosting and predicts the risk profile of a potential borrower. The suggested approach to assessing individual creditworthiness will reduce a bank's credit risks and increase its stability and profitability. The implementation results are of practical importance. A comparative analysis of the effectiveness of the existing and the proposed methodologies for assessing credit risk showed that the new methodology provides predictive analytics over heterogeneous information about a potential borrower with higher accuracy. The proposed techniques form the core of a decision support system for justifying individual credit conditions while minimizing aggregate credit risk.
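A rough sketch of the staged pipeline (k-means risk groups feeding a stochastic gradient boosting classifier) might look as follows; all data and parameter choices are synthetic stand-ins, not the article's methodology in detail:

```python
# Stage 1: unsupervised risk-profile groups; Stage 2: stochastic gradient boosting.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
X = rng.normal(size=(4000, 10))                       # borrower attributes (placeholder)
y = (X[:, 0] + 0.7 * X[:, 4] + rng.normal(scale=0.8, size=4000) > 1).astype(int)  # default flag

# Stage 1: cluster borrowers into homogeneous groups and keep the group label.
groups = KMeans(n_clusters=4, n_init=10, random_state=5).fit_predict(X)
X_aug = np.column_stack([X, groups])

X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, stratify=y, random_state=5)

# Stage 2: stochastic gradient boosting (subsample < 1.0 makes it "stochastic").
clf = GradientBoostingClassifier(subsample=0.8, random_state=5).fit(X_tr, y_tr)
print("AUROC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```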


Author(s):  
Martin Di Blasi ◽  
Zhan Li

Pipeline ruptures have the potential to cause significant economic and environmental impact in a short period of time, so it is critical for pipeline operators to be able to promptly detect and respond to them. Public stakeholder expectations are high, and an evolving expectation is that the response to such events be automated by initiating an automatic pipeline shutdown upon receipt of a rupture alarm. These performance expectations are challenging to achieve with conventional, model-based leak-detection systems (i.e., CPM-RTTMs), as their reliability, measured in terms of the false alarm rate, is typically too low. The company has actively participated in a pipeline-industry task force chaired by the API Cybernetics committee, focused on developing best practices in the area of rupture recognition and response. After API released the first version of a Rupture Recognition and Response guidance document in 2014, the company initiated development of its own internal Rupture Recognition Program (RRP). The RRP considers several rupture recognition approaches simultaneously, ranging from improvements to existing CPM leak detection to the development of a new SCADA-based rupture detection system (RDS). This paper provides an overview of a specific approach to rupture detection based on machine learning and pattern recognition techniques applied to SCADA data.


Author(s):  
Hanan A. R. Akkar ◽  
Wael A. H. Hadi ◽  
Ibraheem H. Al-Dosari ◽  
Saadi M. Saadi ◽  
Aseel Ismael Ali

The problem of leak detection in water pipeline networks can be solved by utilizing a wireless sensor network together with an intelligent algorithm. A novel denoising process is proposed in this work, and a comparative study evaluates it using several performance indices. Hard-rectified thresholding with the universal threshold selection rule gives the best results among the thresholding methods considered, with an enhanced signal-to-noise ratio (SNR) of 10.38 and a normalized mean squared error (NMSE) of 0.1344. Machine learning methods are used to create models that simulate a pipeline leak detection system. A combined feature vector of wavelet and statistical features is used to improve the proposed system's performance.
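A generic wavelet-denoising sketch with the universal threshold rule and hard thresholding, reporting SNR and NMSE indices like those quoted above; the test signal, wavelet choice ('db4'), and noise level are assumptions for the demo, not the paper's data:

```python
# Wavelet denoising with the universal threshold and hard thresholding (PyWavelets).
import numpy as np
import pywt

rng = np.random.default_rng(6)
n = 4096
t = np.linspace(0, 1, n)
clean = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)  # stand-in signal
noisy = clean + 0.4 * rng.normal(size=n)

coeffs = pywt.wavedec(noisy, "db4", level=5)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745          # noise estimate from finest detail level
thr = sigma * np.sqrt(2 * np.log(n))                    # universal threshold
coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="hard") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "db4")[:n]

snr = 10 * np.log10(np.sum(clean**2) / np.sum((denoised - clean) ** 2))
nmse = np.sum((denoised - clean) ** 2) / np.sum(clean**2)
print(f"enhanced SNR: {snr:.2f} dB, NMSE: {nmse:.4f}")
```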


Author(s):  
Nadia Burkart ◽  
Maximilian Franz ◽  
Marco F. Huber

Machine learning and deep learning are widely used in various applications to assist or even replace human reasoning. For instance, a machine learning based intrusion detection system (IDS) monitors a network for malicious activity or specific policy violations. We propose that IDSs should attach a sufficiently understandable report to each alert to allow the operator to review alerts more efficiently. This work aims at complementing an IDS by means of a framework to create explanations. The explanations support the human operator in understanding alerts and reveal potential false positives. The focus lies on counterfactual instances and explanations based on locally faithful decision boundaries.
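A very simple counterfactual search of the kind referred to above can be sketched as follows; the classifier, features, and greedy search are placeholders, not the framework proposed in the paper:

```python
# Greedy counterfactual: nudge one feature at a time until the alert flips to benign.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(2000, 6))                         # hypothetical network-flow features
y = (X[:, 0] + X[:, 1] > 1).astype(int)                # 1 = alert (malicious)
clf = RandomForestClassifier(random_state=7).fit(X, y)

def counterfactual(x, model, step=0.25, max_iter=200):
    """Return a minimally perturbed copy of x that the model labels benign."""
    cf = x.copy()
    for _ in range(max_iter):
        if model.predict(cf.reshape(1, -1))[0] == 0:
            return cf
        # Try each single-feature nudge and keep the one lowering the alert score most.
        best, best_score = None, model.predict_proba(cf.reshape(1, -1))[0, 1]
        for j in range(len(cf)):
            for delta in (-step, step):
                cand = cf.copy()
                cand[j] += delta
                score = model.predict_proba(cand.reshape(1, -1))[0, 1]
                if score < best_score:
                    best, best_score = cand, score
        if best is None:
            break
        cf = best
    return cf

alert = X[y == 1][0]
cf = counterfactual(alert, clf)
print("changed features:", np.flatnonzero(~np.isclose(alert, cf)))
```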


2020 ◽  
Vol 493 (3) ◽  
pp. 3429-3441
Author(s):  
Paulo A A Lopes ◽  
André L B Ribeiro

We introduce a new method to determine galaxy cluster membership based solely on photometric properties. We adopt a machine learning approach to recover a cluster membership probability from galaxy photometric parameters and finally derive a membership classification. After testing several machine learning techniques (such as stochastic gradient boosting, model-averaged neural networks and k-nearest neighbours), we found the support vector machine algorithm to perform best when applied to our data. Our training and validation data are from the Sloan Digital Sky Survey main sample. Hence, to be complete to $M_r^* + 3$, we limit our work to 30 clusters with $z_{\rm phot-cl} \le 0.045$. Masses ($M_{200}$) are larger than $\sim 0.6\times 10^{14} \, \mathrm{M}_{\odot}$ (most above $3\times 10^{14} \, \mathrm{M}_{\odot}$). Our results are derived taking into account all galaxies in the line of sight of each cluster, with no photometric redshift cuts or background corrections. Our method is non-parametric, making no assumptions about the number density or luminosity profiles of galaxies in clusters. Our approach delivers extremely accurate results (completeness $C \sim 92$ per cent and purity $P \sim 87$ per cent) within $R_{200}$, so we named our code reliable photometric membership. We discuss possible dependencies on magnitude, colour, and cluster mass. Finally, we present some applications of our method, stressing its impact on galaxy evolution and cosmological studies based on future large-scale surveys, such as eROSITA, EUCLID, and LSST.
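A minimal sketch of the membership step (an SVM trained on photometric parameters returning a membership probability, with completeness and purity computed from the resulting classification); the features below are placeholders, not SDSS photometry:

```python
# SVM membership probability with completeness / purity evaluation.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
X = rng.normal(size=(3000, 5))                    # e.g. r magnitude, g-r, r-i, local density, radius
y = (X[:, 1] - 0.5 * X[:, 3] + rng.normal(scale=0.7, size=3000) > 0).astype(int)  # member flag

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=8)

svm = make_pipeline(StandardScaler(), SVC(probability=True, random_state=8))
svm.fit(X_tr, y_tr)
p_member = svm.predict_proba(X_te)[:, 1]
members = p_member > 0.5

# Completeness: fraction of true members recovered; purity: fraction of selected
# objects that are true members.
completeness = (members & (y_te == 1)).sum() / (y_te == 1).sum()
purity = (members & (y_te == 1)).sum() / max(members.sum(), 1)
print(f"completeness: {completeness:.2f}, purity: {purity:.2f}")
```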


Electronics ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 219 ◽  
Author(s):  
Sweta Bhattacharya ◽  
Siva Rama Krishnan S ◽  
Praveen Kumar Reddy Maddikunta ◽  
Rajesh Kaluri ◽  
Saurabh Singh ◽  
...  

The enormous popularity of the internet across all spheres of human life has introduced various risks of malicious attacks on networks. Malicious activities can proliferate over the network effortlessly, which has led to the emergence of intrusion detection systems. The patterns of attacks are also dynamic, which necessitates efficient classification and prediction of cyber attacks. In this paper we propose a hybrid principal component analysis (PCA)-firefly based machine learning model to classify intrusion detection system (IDS) datasets. The dataset used in the study was collected from Kaggle. The model first performs one-hot encoding to transform the IDS dataset. The hybrid PCA-firefly algorithm is then used for dimensionality reduction, and the XGBoost algorithm is applied to the reduced dataset for classification. A comprehensive evaluation against state-of-the-art machine learning approaches demonstrates the advantage of the proposed model. The experimental results confirm that the proposed model performs better than the existing machine learning models.
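The shape of this pipeline (one-hot encoding, dimensionality reduction, XGBoost classification) can be sketched as follows; plain PCA stands in here for the hybrid PCA-firefly step, and the data are synthetic rather than the Kaggle IDS dataset:

```python
# One-hot encoding -> dimensionality reduction -> XGBoost classification.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

rng = np.random.default_rng(9)
df = pd.DataFrame({
    "duration": rng.exponential(1.0, 5000),
    "src_bytes": rng.exponential(100.0, 5000),
    "protocol": rng.choice(["tcp", "udp", "icmp"], 5000),   # categorical, to be one-hot encoded
})
y = ((df["duration"] > 1.5) & (df["protocol"] != "icmp")).astype(int)  # toy attack label

pre = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["protocol"])],
    remainder="passthrough",
    sparse_threshold=0.0,        # keep output dense so PCA can consume it
)

pipe = Pipeline([
    ("encode", pre),
    ("reduce", PCA(n_components=3)),          # stand-in for the PCA-firefly reduction
    ("classify", XGBClassifier(n_estimators=200, max_depth=4)),
])

X_tr, X_te, y_tr, y_te = train_test_split(df, y, stratify=y, random_state=9)
pipe.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, pipe.predict(X_te)))
```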

