K-Nearest Neighbours Method as a Tool for Failure Rate Prediction

Author(s):  
Małgorzata Kutyłowska

The paper presents the results of failure rate prediction using K-nearest neighbours (KNN), a non-parametric regression algorithm. The whole data set for the years 1999-2013 was divided randomly into two groups (learning: 75%, testing: 25%), and data from 2014 were used to verify the model. The dependent variable (failure rate) was forecast from the independent variables (number of installed house connections, and total length and number of failures of water mains, distribution pipes and house connections). Four distance metrics were checked (Euclidean, squared Euclidean, Manhattan and Chebyshev), giving four KNN models. Taking all constraints and assumptions into consideration, the models using the Euclidean and squared Euclidean metrics gave the best prediction results. The optimal number of nearest neighbours K equalled 2 or 3 across the models KNN-E, KNN-E2, KNN-C and KNN-M. The validation error was smallest for models KNN-E and KNN-E2, at 0.0130; it was 0.0152 for KNN-M and 0.0150 for KNN-C.
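
As a rough illustration of how the four distance metrics enter a KNN regression, the sketch below predicts a target as the unweighted mean of the K nearest training points. The toy data, the function names and the mapping of the labels KNN-M and KNN-C to Manhattan and Chebyshev are assumptions made for illustration; the paper's own data and software settings are not reproduced. Because squaring preserves the ordering of distances, Euclidean and squared Euclidean select the same neighbours here, which would be consistent with the identical validation error reported for KNN-E and KNN-E2.

```python
import math

# The four distance metrics named in the abstract.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

# Label-to-metric mapping is an assumption inferred from the model names.
METRICS = {
    "KNN-E": euclidean,
    "KNN-E2": squared_euclidean,
    "KNN-M": manhattan,
    "KNN-C": chebyshev,
}

def knn_predict(train_x, train_y, query, k, metric):
    """Unweighted KNN regression: mean target of the k nearest training points."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: metric(pair[0], query))
    return sum(y for _, y in ranked[:k]) / k

# Made-up toy data (assumption): two already-scaled predictors and a failure rate.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
train_y = [0.10, 0.20, 0.40, 0.90]
query = (0.9, 0.1)

predictions = {
    name: knn_predict(train_x, train_y, query, k=2, metric=metric)
    for name, metric in METRICS.items()
}
print(predictions)
```

In practice the predictors would normally be standardised first, since all four metrics are sensitive to the scale of each variable.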

Author(s):  
Małgorzata Kutyłowska

In this paper the MARSplines method is used to model the failure rate of water pipes in the years 2015-2016 in a selected Polish city. Three dependent variables were chosen as outputs: the failure rates of water mains, distribution pipes and house connections. Diameter, season, material and kind of conduit were selected as independent variables. At the start of modelling, 21 basis (spline) functions were assumed; after negligible functions were removed, two remained. The model therefore consists of three coefficients: β0, β1 and β2. The penalty for adding a basis function was set to 2. The correlation equalled 0.44. Relatively large discrepancies between real and predicted failure rates were observed for water mains and house connections. Future investigations of this problem should build three separate models, one for each kind of conduit. The calculations were carried out with the MARSplines method in Statistica 13.1.
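
To make the form of such a model concrete, the following minimal sketch evaluates an intercept plus two hinge (truncated linear) basis functions, the building blocks of a MARS model. It assumes a single numeric predictor (diameter), and the knots and coefficient values are hypothetical; the abstract gives neither the fitted values nor the treatment of the categorical predictors.

```python
def hinge(x, knot, direction):
    """Hinge basis function: max(0, x - knot) for direction=+1,
    max(0, knot - x) for direction=-1."""
    return max(0.0, direction * (x - knot))

def mars_predict(x, beta0, terms):
    """Evaluate beta0 + sum(beta_i * hinge_i(x)) for one numeric predictor."""
    return beta0 + sum(beta * hinge(x, knot, direction)
                       for beta, knot, direction in terms)

# Hypothetical coefficients and knots (assumptions, not values from the paper).
beta0 = 0.2
terms = [
    (0.010, 100.0, +1),   # beta1 * max(0, x - 100)
    (0.005, 100.0, -1),   # beta2 * max(0, 100 - x)
]

for diameter in (80.0, 100.0, 150.0):
    print(diameter, mars_predict(diameter, beta0, terms))
```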


2018 ◽  
Vol 19 (1) ◽  
pp. 264-273 ◽  
Author(s):  
M. Kutyłowska

This paper presents the results of failure rate prediction by means of support vector machines (SVM), a non-parametric regression method. A hyperplane is used to divide the whole area in such a way that objects of different affiliation are separated from one another. The number of support vectors determines the complexity of the relations between dependent and independent variables. The calculations were performed using Statistica 12.0. Operational data for one selected zone of the water supply system for the period 2008–2014 were used for forecasting. The whole data set for 2008–2014 (in which data on distribution pipes were distinguished from those on house connections) was randomly divided into two subsets: a training subset of 75% (5 years) and a testing subset of 25% (2 years). The dependent variables (λr for the distribution pipes and λp for the house connections) were forecast from the independent variables (total length, Lr and Lp, and number of failures, Nr and Np, of the distribution pipes and the house connections, respectively). Four kinds of kernel function (linear, polynomial, sigmoidal and radial basis function) were applied. The SVM model based on the linear kernel was found to be optimal for predicting the failure rate of each kind of water conduit. Its maximum relative error in predicting λr and λp during the testing stage amounted to about 4% and 14%, respectively. The average experimental failure rates over the whole analysed period amounted to 0.18, 0.44, 0.17 and 0.24 fail./(km·year) for the distribution pipes, the house connections, the PVC distribution pipes and the cast-iron distribution pipes, respectively.
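
The four kernel families mentioned above can be written in a few lines. This is only a sketch of their commonly used textbook forms; the parameter names `gamma`, `coef0` and `degree` and their default values are assumptions following a widespread convention, since the abstract does not state the settings used.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    # K(x, y) = <x, y>
    return dot(x, y)

def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=2):
    # K(x, y) = (gamma * <x, y> + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # K(x, y) = tanh(gamma * <x, y> + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)

# Made-up feature vectors (assumption), e.g. scaled pipe length and failure count.
u, v = [1.0, 2.0], [3.0, 4.0]
for kernel in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(u, v))
```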


Sensors ◽  
2021 ◽  
Vol 21 (24) ◽  
pp. 8398
Author(s):  
Bijan G. Mobasseri ◽  
Amro Lulu

Radiometric identification is the problem of attributing a signal to a specific source. In this work, a radiometric identification algorithm is developed using the whitening transformation. The approach stands out from the more established methods in that it works directly on the raw IQ data and hence is featureless. As such, the commonly used dimensionality reduction algorithms do not apply. The premise of the idea is that a data set is “most white” when projected on its own whitening matrix rather than on any other. In practice, transformed data are never strictly white since the training and the test data differ. The Förstner-Moonen measure, which quantifies the similarity of covariance matrices, is used to establish the degree of whiteness. The source signal is identified by the whitening transform that produces a data set with the minimum Förstner-Moonen distance to a white noise process. The final decision is the mode of the Majority Vote Classifier decisions. Using the Förstner-Moonen measure presents a different perspective compared to maximum likelihood and Euclidean distance metrics. The whitening transform is also contrasted with the more recent deep learning approaches, which still depend on feature vectors with large dimensions and lengthy training phases. It is shown that the proposed method is simpler to implement, requires no feature vectors, needs minimal training and, because of its non-iterative structure, is faster than existing approaches.
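
The decision rule described here, choosing the candidate whose whitened data lie closest to white noise, can be sketched with the Förstner-Moonen distance, which is the square root of the sum of squared logarithms of the generalized eigenvalues of two covariance matrices. The example below is restricted to 2x2 matrices so that the eigenvalues have a closed form; the covariance values and source names are made up, and the paper's full pipeline (whitening of raw IQ data, majority voting) is not reproduced.

```python
import math

def forstner_moonen_2x2(a, b):
    """Förstner-Moonen distance between two 2x2 symmetric positive-definite
    matrices: sqrt(sum_i ln^2(lambda_i)), where lambda_i are the eigenvalues
    of inv(a) @ b (the generalized eigenvalues of the pair)."""
    det_a = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    det_b = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    # Trace and determinant of inv(a) @ b, written out for the 2x2 case.
    trace = (a[1][1] * b[0][0] - a[0][1] * b[1][0]
             - a[1][0] * b[0][1] + a[0][0] * b[1][1]) / det_a
    det = det_b / det_a
    # Eigenvalues solve lambda^2 - trace * lambda + det = 0.
    root = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    lam1 = (trace + root) / 2.0
    lam2 = det / lam1  # product of the eigenvalues equals the determinant
    return math.sqrt(math.log(lam1) ** 2 + math.log(lam2) ** 2)

identity = [[1.0, 0.0], [0.0, 1.0]]  # covariance of unit-variance white noise

# Hypothetical covariances of the same test data after applying each
# candidate source's whitening matrix (made-up numbers).
whitened_cov = {
    "source_A": [[1.1, 0.05], [0.05, 0.9]],
    "source_B": [[2.5, 0.8], [0.8, 0.6]],
}

scores = {name: forstner_moonen_2x2(identity, cov)
          for name, cov in whitened_cov.items()}
best = min(scores, key=scores.get)
print(scores, best)
```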


2018 ◽  
Vol 59 ◽  
pp. 00021
Author(s):  
Małgorzata Kutyłowska

The paper describes the results of failure rate modelling using the K-nearest neighbours (KNN) method. This algorithm is one of the regression methods known as machine learning methods. The aim of the paper was to check the applicability of this kind of modelling and to compare the current results with earlier investigations of failure rate prediction in another Polish city. Operational data from 12 years of operation, received from the water utility, were used to predict the dependent variable (failure rate). Data from the period 2001–2012 (249 and 294 cases for distribution pipes and house connections, respectively) were used to create the KNN models. The optimal model, based on the Euclidean distance metric with K = 2 nearest neighbours, was validated on other data (one case for each year). The modelling was performed in Statistica 12.0.


2008 ◽  
Vol 06 (02) ◽  
pp. 261-282 ◽  
Author(s):  
AO YUAN ◽  
WENQING HE

Clustering is a major tool for microarray gene expression data analysis. The existing clustering methods fall mainly into two categories: parametric and nonparametric. The parametric methods generally assume a mixture of parametric subdistributions. When the mixture distribution approximately fits the true data generating mechanism, the parametric methods perform well, but not so when there is nonnegligible deviation between them. On the other hand, the nonparametric methods, which usually do not make distributional assumptions, are robust but pay for it with a loss of efficiency. In an attempt to utilize the known mixture form to increase efficiency, and to avoid assumptions about the unknown subdistributions to enhance robustness, we propose a semiparametric method for clustering. The proposed approach possesses the form of a parametric mixture, with no assumptions about the subdistributions. The subdistributions are estimated nonparametrically, with constraints being imposed only on the modes. An expectation-maximization (EM) algorithm along with a classification step is invoked to cluster the data, and a modified Bayesian information criterion (BIC) is employed to guide the determination of the optimal number of clusters. Simulation studies are conducted to assess the performance and the robustness of the proposed method. The results show that the proposed method yields a reasonable partition of the data. As an illustration, the proposed method is applied to a real microarray data set to cluster genes.
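
For orientation, the sketch below shows EM followed by a classification step for the fully parametric case that the abstract contrasts against: a one-dimensional mixture of two Gaussians. It is not the proposed semiparametric method, whose subdistributions are estimated nonparametrically; the data, the initialisation and the function names are assumptions chosen for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_gaussians(data, iterations=50, min_var=1e-6):
    """EM for a 1-D two-component Gaussian mixture, then hard classification."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Simple deterministic initialisation (an assumption of this sketch).
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(2)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means and variances.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)  # floor avoids a collapsing component
            weights[j] = nj / n
    # Classification step: assign each point to its most probable component.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, variances, weights, labels

# Made-up, well-separated expression values (assumption).
data = [-0.2, -0.1, 0.0, 0.1, 0.2, 4.8, 4.9, 5.0, 5.1, 5.2]
means, variances, weights, labels = em_two_gaussians(data)
print(means, labels)
```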


1977 ◽  
Vol R-26 (3) ◽  
pp. 214-219 ◽  
Author(s):  
W.W. Gaertner ◽  
D.S. Elders ◽  
D.B. Ellingham ◽  
J.A. Kastning ◽  
W.M. Schreyer

Author(s):  
Rupam Mukherjee

For prognostics in industrial applications, the degree of anomaly of a test point from a baseline cluster is estimated using a statistical distance metric. Among different statistical distance metrics, energy distance is an interesting concept based on Newton’s Law of Gravitation, promising simpler computation than classical distance metrics. In this paper, we review the state-of-the-art formulations of energy distance and point out several reasons why they are not directly applicable to the anomaly-detection problem. We therefore propose a new energy-based metric called the P-statistic, which addresses these issues, is applicable to anomaly detection and retains the computational simplicity of the energy distance. We also demonstrate its effectiveness on a real-life data set.
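
As background, the classical energy distance between two samples can be estimated from pairwise absolute differences as 2·E|X−Y| − E|X−X'| − E|Y−Y'|. The sketch below implements that classical one-dimensional estimate only; it is not the P-statistic proposed in the paper, whose definition the abstract does not give, and the sensor readings are made up.

```python
def mean_abs_diff(xs, ys):
    """Average |x - y| over all pairs drawn from the two samples."""
    return sum(abs(x - y) for x in xs for y in ys) / (len(xs) * len(ys))

def energy_distance(xs, ys):
    """Sample estimate of 2 E|X-Y| - E|X-X'| - E|Y-Y'| for 1-D data."""
    return (2.0 * mean_abs_diff(xs, ys)
            - mean_abs_diff(xs, xs)
            - mean_abs_diff(ys, ys))

# Made-up sensor readings (assumption): a baseline cluster and two test batches.
baseline = [9.8, 10.1, 10.0, 9.9, 10.2]
normal_batch = [10.0, 9.9, 10.1]
anomalous_batch = [12.5, 12.7, 12.6]

print(energy_distance(baseline, normal_batch))
print(energy_distance(baseline, anomalous_batch))
```

A larger value indicates that the test batch is further from the baseline distribution; a single test point can be passed as a one-element list.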


2021 ◽  
Vol 16 (1) ◽  
pp. 19
Author(s):  
Suhendro Yusuf Irianto ◽  
Ribut Yulianto ◽  
Sri Karnila ◽  
Dona Yuliawati

This research produces a biometric security system that uses the retina as an accurate identifier and is also effective for improving retina-based identification in the future (future identification). This is important for determining which biometric trait is most accurate for future identification, and for building an application or tool that can be used to characterise distance metrics for measuring the accuracy of the retina as a future identifier. The retina can serve as an alternative means of human identification, for example as a replacement for bank ATM PINs, in passports, and in other fields that require a high level of security or must be impossible to forge. The result of this research takes the form of tests that establish the accuracy of content-based image retrieval (CBIR) using query images against a purpose-built database of 5,000 retina images. Similarity and identification are determined using colour features. The colour histogram for image retrieval is built by counting the DCT coefficients of each colour. The results show that the algorithm's accuracy approaches 90%, which is quite good for image retrieval. Retrieval is also fairly fast: the average processing time with 2,000 digital images is less than 10 seconds.
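
A minimal sketch of colour-histogram retrieval is given below, assuming images are available as lists of RGB pixel tuples. It bins raw pixel colours and ranks database images by histogram intersection; the paper instead derives its histogram from DCT coefficients, which is not reproduced here, and the image names and pixel values are made up.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Normalised joint RGB histogram; pixels are (r, g, b) tuples in 0..255.
    Assumes bins_per_channel divides 256 evenly."""
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[index] += 1.0
    total = len(pixels)
    return [count / total for count in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalised histograms; 1 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def retrieve(query_pixels, database, top_k=1):
    """Return the names of the top_k database images most similar to the query."""
    query_hist = colour_histogram(query_pixels)
    ranked = sorted(
        database,
        key=lambda name: histogram_intersection(query_hist, colour_histogram(database[name])),
        reverse=True,
    )
    return ranked[:top_k]

# Made-up miniature "images" (assumption): four pixels each.
database = {
    "retina_a": [(200, 30, 30)] * 3 + [(90, 20, 10)],
    "retina_b": [(30, 30, 200)] * 4,
}
query = [(210, 40, 20)] * 3 + [(100, 10, 5)]

print(retrieve(query, database))
```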

