Multivariate Outlier Detection Approach Based on k-Nearest Neighbors and Its Application for Chemical Process Data

2014
Vol 47 (12)
pp. 876-886
Author(s):
Yaming Dong
Xuefeng Yan


Respati
2018
Vol 13 (2)
Author(s):
Eri Sasmita Susanto
Kusrini Kusrini
Hanif Al Fatta

This research focuses on testing the feasibility of predicting student graduation at Universitas AMIKOM Yogyakarta. The authors chose the K-Nearest Neighbors (K-NN) algorithm because K-NN can handle numerical data and does not require a complicated iterative parameter-estimation scheme, which means it can be applied to large datasets. The input of the system is sample data consisting of student records from 2014-2015. Testing in this research uses two sets of data: testing data and training data. The criteria used in this study are the semester 1-4 grade point average (IP), credit (SKS) achievement, and graduation status. The output of the system is the predicted graduation outcome for each student, divided into two classes: on-time and not on-time graduation. The test results show that k = 14 with 5-fold cross-validation gives the best performance in predicting student graduation with the K-Nearest Neighbor method using the four-semester grade point average, with accuracy = 98.46%, precision = 99.53%, and recall = 97.64%.
Keywords: K-Nearest Neighbors Algorithm, Graduation Prediction, Testing Data, Training Data
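A minimal sketch of the evaluation protocol described in this abstract, using scikit-learn: k = 14 neighbors scored with 5-fold cross-validation on accuracy, precision, and recall. The CSV file name, column names, and 0/1 encoding of the target are hypothetical placeholders, not the paper's actual data layout.

```python
# Minimal sketch of the K-NN graduation-prediction setup described above.
# File name and column names are hypothetical; the target is assumed to be
# encoded as 1 = graduated on time, 0 = not on time.
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("students_2014_2015.csv")                 # hypothetical dataset
features = ["ip_sem1", "ip_sem2", "ip_sem3", "ip_sem4", "sks"]
X, y = df[features], df["on_time"]

# k = 14 neighbors, evaluated with 5-fold cross-validation as reported above.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=14))
scores = cross_validate(model, X, y, cv=5,
                        scoring=["accuracy", "precision", "recall"])

for metric in ("accuracy", "precision", "recall"):
    print(metric, round(scores[f"test_{metric}"].mean(), 4))
```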


Author(s):  
Santosh Kumar Sahu
Sanjay Kumar Jena
Manish Verma

Outliers in a database are objects that deviate from the rest of the dataset by some measure. The Nearest Neighbor Outlier Factor is considered here to measure the degree of outlier-ness of each object in the dataset. Unlike other methods such as the Local Outlier Factor, this approach assesses the interestingness of a point with respect to both its neighbors and its reverse neighbors before the object is taken into consideration. We observe that the GBBK algorithm, which is based on K-NN, uses quicksort to find the k nearest neighbors, which takes O(N log N) time. In the proposed method, the search is instead performed in K linear passes, completing in O(KN) time to find the k nearest neighbors (k << log N). As a result, the proposed method improves the time complexity. The NSL-KDD and Fisher iris datasets are used, and the experimental results are compared with the GBBK method. Both methods produce the same result, but the proposed method takes less time for computation.
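The complexity argument above can be illustrated with a small sketch, written only from the description in the abstract (not the GBBK authors' code): the k nearest neighbors of a point can be found either by sorting all N distances, which costs O(N log N), or by k linear minimum-selection passes, which costs O(kN) and wins when k << log N.

```python
# Two ways to pick the k nearest neighbors from N precomputed distances:
# a full sort (O(N log N)) versus k linear scans (O(kN)), as compared above.
# Illustrative only; not the authors' implementation.

def knn_by_sort(distances, k):
    """Sort all N distances, then keep the first k indices: O(N log N)."""
    return sorted(range(len(distances)), key=distances.__getitem__)[:k]

def knn_by_k_scans(distances, k):
    """k passes, each extracting the current minimum: O(kN)."""
    remaining = dict(enumerate(distances))
    neighbors = []
    for _ in range(k):
        idx = min(remaining, key=remaining.get)   # one O(N) scan
        neighbors.append(idx)
        del remaining[idx]
    return neighbors

dists = [4.2, 0.5, 3.1, 0.9, 2.7, 1.8]
assert knn_by_sort(dists, 3) == knn_by_k_scans(dists, 3) == [1, 3, 5]
```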


2021
Vol 287
pp. 03011
Author(s):
Muhammad Nawaz
Abdulhalim Shah Maulud
Haslinda Zabiri

Process monitoring techniques in chemical process systems help to improve product quality and plant safety. Multiscale classification plays a crucial role in the monitoring of chemical processes. However, such techniques struggle to cope with the high-dimensional, correlated data produced by complex nonlinear processes. Therefore, an improved multiscale fault classification framework has been proposed to enhance the fault classification ability in nonlinear chemical process systems. The framework combines the wavelet transform (WT), kernel principal component analysis (KPCA), and a K-nearest neighbors (KNN) classifier. First, a moving-window WT is used to extract multiscale information from the process data in time and frequency simultaneously at different scales. The resulting wavelet coefficients are reconstructed and fed into KPCA to produce feature vectors. In the final step, these vectors are used as inputs to the KNN classifier. The performance of the proposed multiscale KPCA-KNN framework is analyzed and compared using a continuous stirred tank reactor (CSTR) system as a case study. The results show that the proposed multiscale KPCA-KNN framework achieves a higher success rate than the PCA-KNN and KPCA-KNN methods.
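A condensed sketch of the WT → KPCA → KNN pipeline described above, assuming PyWavelets and scikit-learn. The wavelet, decomposition level, window length, kernel, and number of components are placeholders rather than the settings used in the paper, and the toy data stands in for windowed process measurements.

```python
# Condensed sketch of a multiscale KPCA-KNN pipeline: per-variable wavelet
# decomposition of each window, per-scale reconstruction, KPCA feature
# extraction, and KNN fault classification. Settings are placeholders.
import numpy as np
import pywt
from sklearn.decomposition import KernelPCA
from sklearn.neighbors import KNeighborsClassifier

def multiscale_features(window, wavelet="db4", level=3):
    """Decompose each variable of a (time x variables) window with a DWT,
    reconstruct the per-scale signals, and concatenate them into one row."""
    feats = []
    for j in range(window.shape[1]):                       # per process variable
        coeffs = pywt.wavedec(window[:, j], wavelet, level=level)
        for i in range(len(coeffs)):
            # reconstruct the contribution of a single scale only
            keep = [c if k == i else np.zeros_like(c) for k, c in enumerate(coeffs)]
            feats.append(pywt.waverec(keep, wavelet)[: len(window)])
    return np.concatenate(feats)

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 64, 5))      # 200 windows, 64 samples, 5 variables (toy data)
y = rng.integers(0, 3, size=200)           # toy fault-class labels

X = np.array([multiscale_features(w) for w in X_raw])

kpca = KernelPCA(n_components=20, kernel="rbf")            # nonlinear feature extraction
knn = KNeighborsClassifier(n_neighbors=5)                  # fault classifier
knn.fit(kpca.fit_transform(X), y)
print(knn.score(kpca.transform(X), y))                     # resubstitution check only
```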


2011
Vol 225-226
pp. 1032-1035
Author(s):
Zhong Ping Zhang
Yong Xin Liang

This paper proposes a new data stream outlier detection algorithm, SODRNN, based on reverse nearest neighbors. We deal with the sliding window model, where outlier queries are performed in order to detect anomalies in the current window. An insertion or deletion update requires only one scan of the current window, which improves efficiency. The capability to query the whole current window at an arbitrary time is provided by the Query Manager procedure, which can capture concept drift in the data stream in a timely manner. Results of experiments conducted on both synthetic and real data sets show that the SODRNN algorithm is both effective and efficient.
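To make the reverse-nearest-neighbor idea concrete, here is a simplified sketch: a point in the current sliding window becomes an outlier candidate when few other window points count it among their own k nearest neighbors. This naive version recomputes the counts for a whole window, whereas SODRNN itself maintains them incrementally with a single scan per insertion or deletion; the window size, k, and threshold below are arbitrary illustration values, not those of the paper.

```python
# Naive illustration of reverse-nearest-neighbor outlier candidates in a
# sliding window (recomputed per window; SODRNN updates this incrementally).
from collections import deque
import numpy as np

def rnn_outlier_candidates(window, k=3, min_rnn=1):
    pts = np.asarray(window)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                    # ignore self-distance
    knn = np.argsort(d, axis=1)[:, :k]             # k nearest neighbors of each point
    rnn_count = np.zeros(len(pts), dtype=int)
    for i in range(len(pts)):
        for j in knn[i]:
            rnn_count[j] += 1                      # point i votes for its neighbors
    # a point that (almost) nobody counts as a neighbor is a candidate outlier
    return [i for i, c in enumerate(rnn_count) if c < min_rnn]

window = deque(maxlen=50)                          # sliding window over the stream
rng = np.random.default_rng(1)
stream = np.vstack([rng.normal(size=(60, 2)), [[8.0, 8.0]]])   # last point is anomalous
for x in stream:
    window.append(x)                               # insertion; oldest point expires
print("outlier candidates in final window:", rnn_outlier_candidates(list(window)))
```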

