Multivariate Outlier Detection Approach Based on k-Nearest Neighbors and Its Application for Chemical Process Data

2014
Vol 47 (12)
pp. 876-886
Author(s):
Yaming Dong
Xuefeng Yan


Respati
2018
Vol 13 (2)
Author(s):
Eri Sasmita Susanto
Kusrini Kusrini
Hanif Al Fatta

This research focuses on testing the feasibility of predicting student graduation at Universitas AMIKOM Yogyakarta. The authors chose the K-Nearest Neighbors (K-NN) algorithm because K-NN can handle numerical data and does not require a complicated iterative parameter-estimation scheme, which means it can be applied to large datasets. The input of the system is sample data consisting of student records from 2014-2015. Testing in this research uses two sets of data: testing data and training data. The criteria used in this study are the semester 1-4 grade point average (IP), credit (SKS) achievement, and graduation status. The output of the system is the predicted graduation outcome for each student, divided into two classes: on-time and not on-time graduation. The test results show that k = 14 with 5-fold cross-validation gives the best performance in predicting student graduation with the K-Nearest Neighbor method using the four-semester grade point average, with accuracy = 98.46%, precision = 99.53%, and recall = 97.64%.
Keywords: K-Nearest Neighbors Algorithm, Graduation Prediction, Testing Data, Training Data
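A minimal sketch of the evaluation protocol described in this abstract, using scikit-learn: k = 14 neighbors scored with 5-fold cross-validation on accuracy, precision, and recall. The CSV file name, column names, and 0/1 encoding of the target are hypothetical placeholders, not the paper's actual data layout.

```python
# Minimal sketch of the K-NN graduation-prediction setup described above.
# File name and column names are hypothetical; the target is assumed to be
# encoded as 1 = graduated on time, 0 = not on time.
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("students_2014_2015.csv")                 # hypothetical dataset
features = ["ip_sem1", "ip_sem2", "ip_sem3", "ip_sem4", "sks"]
X, y = df[features], df["on_time"]

# k = 14 neighbors, evaluated with 5-fold cross-validation as reported above.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=14))
scores = cross_validate(model, X, y, cv=5,
                        scoring=["accuracy", "precision", "recall"])

for metric in ("accuracy", "precision", "recall"):
    print(metric, round(scores[f"test_{metric}"].mean(), 4))
```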


Author(s):  
Santosh Kumar Sahu
Sanjay Kumar Jena
Manish Verma

Outliers in a database are objects that deviate from the rest of the dataset by some measure. The Nearest Neighbor Outlier Factor is considered here to measure the degree of outlier-ness of each object in the dataset. Unlike other methods such as the Local Outlier Factor, this approach assesses the interestingness of a point with respect to both its neighbors and its reverse neighbors before the object is taken into consideration. We observe that the GBBK algorithm, which is based on K-NN, uses quicksort to find the k nearest neighbors, which takes O(N log N) time. In the proposed method, the search is instead performed in K linear passes, completing in O(KN) time to find the k nearest neighbors (k << log N). As a result, the proposed method improves the time complexity. The NSL-KDD and Fisher iris datasets are used, and the experimental results are compared with the GBBK method. Both methods produce the same result, but the proposed method takes less time for computation.
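The complexity argument above can be illustrated with a small sketch, written only from the description in the abstract (not the GBBK authors' code): the k nearest neighbors of a point can be found either by sorting all N distances, which costs O(N log N), or by k linear minimum-selection passes, which costs O(kN) and wins when k << log N.

```python
# Two ways to pick the k nearest neighbors from N precomputed distances:
# a full sort (O(N log N)) versus k linear scans (O(kN)), as compared above.
# Illustrative only; not the authors' implementation.

def knn_by_sort(distances, k):
    """Sort all N distances, then keep the first k indices: O(N log N)."""
    return sorted(range(len(distances)), key=distances.__getitem__)[:k]

def knn_by_k_scans(distances, k):
    """k passes, each extracting the current minimum: O(kN)."""
    remaining = dict(enumerate(distances))
    neighbors = []
    for _ in range(k):
        idx = min(remaining, key=remaining.get)   # one O(N) scan
        neighbors.append(idx)
        del remaining[idx]
    return neighbors

dists = [4.2, 0.5, 3.1, 0.9, 2.7, 1.8]
assert knn_by_sort(dists, 3) == knn_by_k_scans(dists, 3) == [1, 3, 5]
```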


2021
Vol 287
pp. 03011
Author(s):
Muhammad Nawaz
Abdulhalim Shah Maulud
Haslinda Zabiri

Process monitoring techniques in chemical process systems help to improve product quality and plant safety. Multiscale classification plays a crucial role in the monitoring of chemical processes. However, such techniques struggle to cope with the high-dimensional, correlated data produced by complex nonlinear processes. Therefore, an improved multiscale fault classification framework has been proposed to enhance the fault classification ability in nonlinear chemical process systems. The framework combines the wavelet transform (WT), kernel principal component analysis (KPCA), and a K-nearest neighbors (KNN) classifier. First, a moving-window WT is used to extract multiscale information from the process data in time and frequency simultaneously at different scales. The resulting wavelet coefficients are reconstructed and fed into KPCA to produce feature vectors. In the final step, these vectors are used as inputs to the KNN classifier. The performance of the proposed multiscale KPCA-KNN framework is analyzed and compared using a continuous stirred tank reactor (CSTR) system as a case study. The results show that the proposed multiscale KPCA-KNN framework achieves a higher success rate than the PCA-KNN and KPCA-KNN methods.
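A condensed sketch of the WT → KPCA → KNN pipeline described above, assuming PyWavelets and scikit-learn. The wavelet, decomposition level, window length, kernel, and number of components are placeholders rather than the settings used in the paper, and the toy data stands in for windowed process measurements.

```python
# Condensed sketch of a multiscale KPCA-KNN pipeline: per-variable wavelet
# decomposition of each window, per-scale reconstruction, KPCA feature
# extraction, and KNN fault classification. Settings are placeholders.
import numpy as np
import pywt
from sklearn.decomposition import KernelPCA
from sklearn.neighbors import KNeighborsClassifier

def multiscale_features(window, wavelet="db4", level=3):
    """Decompose each variable of a (time x variables) window with a DWT,
    reconstruct the per-scale signals, and concatenate them into one row."""
    feats = []
    for j in range(window.shape[1]):                       # per process variable
        coeffs = pywt.wavedec(window[:, j], wavelet, level=level)
        for i in range(len(coeffs)):
            # reconstruct the contribution of a single scale only
            keep = [c if k == i else np.zeros_like(c) for k, c in enumerate(coeffs)]
            feats.append(pywt.waverec(keep, wavelet)[: len(window)])
    return np.concatenate(feats)

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 64, 5))      # 200 windows, 64 samples, 5 variables (toy data)
y = rng.integers(0, 3, size=200)           # toy fault-class labels

X = np.array([multiscale_features(w) for w in X_raw])

kpca = KernelPCA(n_components=20, kernel="rbf")            # nonlinear feature extraction
knn = KNeighborsClassifier(n_neighbors=5)                  # fault classifier
knn.fit(kpca.fit_transform(X), y)
print(knn.score(kpca.transform(X), y))                     # resubstitution check only
```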


2011
Vol 225-226
pp. 1032-1035
Author(s):
Zhong Ping Zhang
Yong Xin Liang

This paper proposes a new data stream outlier detection algorithm, SODRNN, based on reverse nearest neighbors. We deal with the sliding window model, where outlier queries are performed in order to detect anomalies in the current window. An insertion or deletion update requires only one scan of the current window, which improves efficiency. The capability to query the whole current window at an arbitrary time is provided by the Query Manager procedure, which can capture concept drift in the data stream in a timely manner. Results of experiments conducted on both synthetic and real data sets show that the SODRNN algorithm is both effective and efficient.
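To make the reverse-nearest-neighbor idea concrete, here is a simplified sketch: a point in the current sliding window becomes an outlier candidate when few other window points count it among their own k nearest neighbors. This naive version recomputes the counts for a whole window, whereas SODRNN itself maintains them incrementally with a single scan per insertion or deletion; the window size, k, and threshold below are arbitrary illustration values, not those of the paper.

```python
# Naive illustration of reverse-nearest-neighbor outlier candidates in a
# sliding window (recomputed per window; SODRNN updates this incrementally).
from collections import deque
import numpy as np

def rnn_outlier_candidates(window, k=3, min_rnn=1):
    pts = np.asarray(window)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                    # ignore self-distance
    knn = np.argsort(d, axis=1)[:, :k]             # k nearest neighbors of each point
    rnn_count = np.zeros(len(pts), dtype=int)
    for i in range(len(pts)):
        for j in knn[i]:
            rnn_count[j] += 1                      # point i votes for its neighbors
    # a point that (almost) nobody counts as a neighbor is a candidate outlier
    return [i for i, c in enumerate(rnn_count) if c < min_rnn]

window = deque(maxlen=50)                          # sliding window over the stream
rng = np.random.default_rng(1)
stream = np.vstack([rng.normal(size=(60, 2)), [[8.0, 8.0]]])   # last point is anomalous
for x in stream:
    window.append(x)                               # insertion; oldest point expires
print("outlier candidates in final window:", rnn_outlier_candidates(list(window)))
```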

