K-NN Based Outlier Detection Technique on Intrusion Dataset

Outliers are the objects in a database that deviate from the rest of the dataset by some measure. The Nearest Neighbor Outlier Factor is used to measure the degree of outlierness of each object in the dataset. Unlike other methods such as the Local Outlier Factor, this approach takes into account both the neighbors and the reverse neighbors of a point before the object is evaluated. We observed that the K-NN-based GBBK algorithm uses quicksort to find the k nearest neighbors, which takes O(N log N) time. In the proposed method, the search is instead repeated k times, which finds the k nearest neighbors in O(kN) time (k << log N). As a result, the proposed method improves the time complexity. The NSL-KDD and Fisher iris datasets are used, and the experimental results are compared with the GBBK method. Both methods give the same results, but the proposed method takes less computation time.
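The complexity argument in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of Euclidean distance, and the tie-breaking by lowest index are assumptions made for illustration. The first function finds the k nearest neighbors with k linear scans over the N distances, giving O(kN) comparisons; the second sorts all candidates, giving O(N log N).

```python
import math


def knn_by_repeated_scan(points, query_idx, k):
    """Hypothetical helper: k linear scans, O(k*N) comparisons."""
    q = points[query_idx]
    dists = [math.dist(q, p) for p in points]  # one O(N) pass
    taken = {query_idx}                        # never return the query itself
    chosen = []
    for _ in range(min(k, len(points) - 1)):
        best = None
        for i, d in enumerate(dists):          # one scan per neighbor
            if i in taken:
                continue
            if best is None or d < dists[best]:
                best = i
        chosen.append(best)
        taken.add(best)
    return chosen


def knn_by_sort(points, query_idx, k):
    """Hypothetical baseline: sort all candidates, O(N log N)."""
    q = points[query_idx]
    order = sorted(
        (i for i in range(len(points)) if i != query_idx),
        key=lambda i: math.dist(q, points[i]),
    )
    return order[:k]


points = [(0, 0), (1, 0), (5, 5), (0, 2), (10, 10)]
print(knn_by_repeated_scan(points, 0, 2))
print(knn_by_sort(points, 0, 2))
```

Both functions return the same neighbors; the repeated scan only pays off when k is small relative to log N, which is the regime the abstract assumes.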

An Incremental Local Outlier Detection Method in the Data Stream

Applied Sciences ◽

10.3390/app8081248 ◽

2018 ◽

Vol 8 (8) ◽

pp. 1248 ◽

Cited By ~ 4

Author(s):

Haiqing Yao ◽

Xiuwen Fu ◽

Yongsheng Yang ◽

Octavian Postolache

Keyword(s):

Outlier Detection ◽

Data Streams ◽

Data Stream ◽

Nearest Neighbor ◽

Nearest Neighbors ◽

Detection Accuracy ◽

K Nearest Neighbor ◽

Major Work ◽

Wide Range ◽

Local Outlier

Outlier detection has attracted wide attention for its broad applications, such as fault diagnosis and intrusion detection; among these, outlier analysis in data streams, which are highly uncertain and unbounded, is especially challenging. Recent major work on outlier detection has focused on the principles of the local outlier factor, and there are few studies on incremental updating strategies, which are vital to outlier detection in data streams. In this paper, a novel incremental local outlier detection approach is introduced to dynamically evaluate local outliers in the data stream. An extended local neighborhood consisting of k nearest neighbors, reverse nearest neighbors and shared nearest neighbors is estimated for each data point. A theoretical analysis of the algorithm's complexity for the insertion of new data and the deletion of old data in the composite neighborhood shows that the amount of affected data in the incremental calculation is finite. Finally, experiments performed on both synthetic and real datasets verify its scalability and outlier detection accuracy. All results show that the proposed approach has comparable performance with state-of-the-art k nearest neighbor-based methods.
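The extended neighborhood mentioned in the abstract above can be sketched in a few lines. This is a batch illustration only, not the paper's incremental algorithm. The definition of a shared nearest neighbor used here (two points whose k-NN sets overlap) is an assumption, since the abstract does not define it, and the function names are hypothetical.

```python
import math


def knn_sets(points, k):
    """k nearest neighbors of every point (Euclidean distance, brute force)."""
    n = len(points)
    sets = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        sets.append(set(others[:k]))
    return sets


def extended_neighborhood(points, k):
    """Union of kNN, reverse kNN, and shared NN for every point."""
    knn = knn_sets(points, k)
    n = len(points)
    extended = []
    for i in range(n):
        # Reverse neighbors: points that list i among their own k-NN.
        rnn = {j for j in range(n) if j != i and i in knn[j]}
        # Shared neighbors (assumed definition): points whose k-NN set
        # has at least one element in common with that of i.
        snn = {j for j in range(n) if j != i and knn[i] & knn[j]}
        extended.append(knn[i] | rnn | snn)
    return extended


points = [(0.0,), (1.0,), (2.5,), (10.0,)]
print(extended_neighborhood(points, k=1))
```

An isolated point such as the last one tends to have few reverse and shared neighbors, which is the kind of asymmetry that neighborhood-based outlier scores exploit.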

The Implementation of Subspace Outlier Detection in K-Nearest Neighbors to Improve Accuracy in Bank Marketing Data

International Journal of Emerging Trends in Engineering Research ◽

10.30534/ijeter/2020/44822020 ◽

2020 ◽

Vol 8 (2) ◽

pp. 545-550

Author(s):

Dimas Aryo Anggoro

Keyword(s):

Outlier Detection ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Improve Accuracy ◽

Marketing Data ◽

Bank Marketing

Tropical Balls and Its Applications to K Nearest Neighbor over the Space of Phylogenetic Trees

Mathematics ◽

10.3390/math9070779 ◽

2021 ◽

Vol 9 (7) ◽

pp. 779

Author(s):

Ruriko Yoshida

Keyword(s):

Supervised Learning ◽

Phylogenetic Trees ◽

Nearest Neighbor ◽

Nearest Neighbors ◽

High Dimensional ◽

Learning Method ◽

Dimensional Vector ◽

K Nearest Neighbor ◽

K Nearest Neighbors

A tropical ball is a ball defined by the tropical metric over the tropical projective torus. In this paper we show several properties of tropical balls over the tropical projective torus and also over the space of phylogenetic trees with a given set of leaf labels. Then we discuss its application to the K nearest neighbors (KNN) algorithm, a supervised learning method used to classify a high-dimensional vector into given categories by looking at a ball centered at the vector, which contains K vectors in the space.
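As a minimal sketch of how KNN could use the tropical metric, the code below uses the standard definition of the tropical metric on the tropical projective torus, d(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i). The classifier wrapper, its name, and the toy data are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def tropical_distance(u, v):
    """Tropical metric: max coordinate difference minus min coordinate difference."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


def tropical_knn_classify(train, labels, x, k):
    """Hypothetical KNN classifier: majority vote among the k tropically nearest points."""
    order = sorted(range(len(train)), key=lambda i: tropical_distance(train[i], x))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Adding the same constant to every coordinate does not change a point
# of the tropical projective torus, so this distance is zero.
print(tropical_distance((0, 0, 0), (1, 1, 1)))

train = [(0, 0, 0), (0, 1, 0), (0, 5, 9), (0, 6, 9)]
labels = ["a", "a", "b", "b"]
print(tropical_knn_classify(train, labels, (1, 1, 2), k=3))
```

The tropical ball of radius r around x is then the set of points whose tropical distance to x is at most r, and the k nearest neighbors are the points inside the smallest such ball that contains k training vectors.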

DS-kNN

International Journal of Information Security and Privacy ◽

10.4018/ijisp.2021040107 ◽

2021 ◽

Vol 15 (2) ◽

pp. 131-144

Author(s):

Redha Taguelmimt ◽

Rachid Beghdad

Keyword(s):

Intrusion Detection ◽

False Positive ◽

Detection Rate ◽

Nearest Neighbors ◽

The Other ◽

Intrusion Detection Systems ◽

K Nearest Neighbors ◽

Detection Systems ◽

Knn Classifier ◽

Better Than

On the one hand, many intrusion detection systems (IDSs) have been proposed in the literature. On the other hand, many studies try to deduce the important features that can best detect attacks. This paper presents a new, easy-to-implement approach to intrusion detection, named distance sum-based k-nearest neighbors (DS-kNN), which is an improved version of the k-NN classifier. Given a data sample to classify, DS-kNN computes the distance sum of the k nearest neighbors of the data sample in each of the possible classes of the dataset. Then, the data sample is assigned to the class having the smallest sum. The experimental results show that the DS-kNN classifier performs better than the original k-NN algorithm in terms of accuracy, detection rate, false positives, and attack classification. The authors mainly compare DS-kNN to CANN, but also to SVM, S-NDAE, and DBN. The obtained results also show that the approach is very competitive.
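The decision rule described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: Euclidean distance, the function name, and the toy data are hypothetical, and the sketch assumes every class has at least k training samples (a class with fewer would otherwise get an unfairly small sum).

```python
import math


def ds_knn_classify(train, labels, x, k):
    """Assign x to the class whose k nearest members have the smallest distance sum."""
    sums = {}
    for cls in sorted(set(labels)):
        # Distances from x to the training samples of this class only.
        dists = sorted(
            math.dist(x, p) for p, y in zip(train, labels) if y == cls
        )
        sums[cls] = sum(dists[:k])  # assumes the class has >= k samples
    predicted = min(sums, key=sums.get)
    return predicted, sums


train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
predicted, sums = ds_knn_classify(train, labels, (0.5, 0.5), k=2)
print(predicted, sums)
```

Unlike majority-vote k-NN, this rule looks at k neighbors per class rather than k neighbors overall, so a class that is rare near the query can still compete on distance.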

Identifying buzz in social media: a hybrid approach using artificial bee colony and k-nearest neighbors for outlier detection

Social Network Analysis and Mining ◽

10.1007/s13278-017-0461-2 ◽

2017 ◽

Vol 7 (1) ◽

Cited By ~ 16

Author(s):

Reema Aswani ◽

S. P. Ghrera ◽

Arpan Kumar Kar ◽

Satish Chandra

Keyword(s):

Social Media ◽

Outlier Detection ◽

Artificial Bee Colony ◽

Hybrid Approach ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Bee Colony

NDoT: Nearest Neighbor Distance Based Outlier Detection Technique

Lecture Notes in Computer Science - Pattern Recognition and Machine Intelligence ◽

10.1007/978-3-642-21786-9_8 ◽

2011 ◽

pp. 36-42 ◽

Cited By ~ 4

Author(s):

Neminath Hubballi ◽

Bidyut Kr. Patra ◽

Sukumar Nandi

Keyword(s):

Outlier Detection ◽

Nearest Neighbor ◽

Detection Technique ◽

Neighbor Distance ◽

Nearest Neighbor Distance

Analisis Perbandingan Algoritma Klasifikasi Citra Chest X-ray Untuk Deteksi Covid-19 [Comparative Analysis of Chest X-ray Image Classification Algorithms for Covid-19 Detection]

Teknika ◽

10.34148/teknika.v10i2.331 ◽

2021 ◽

Vol 10 (2) ◽

pp. 96-103

Author(s):

Mohammad Farid Naufal ◽

Selvia Ferdiana Kusuma ◽

Kevin Christian Tanus ◽

Raynaldy Valentino Sukiwun ◽

Joseph Kristiano ◽

...

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Cross Validation ◽

Nearest Neighbor ◽

Nearest Neighbors ◽

Support Vector ◽

K Nearest Neighbor ◽

K Nearest Neighbors ◽

X Ray ◽

Chest X Ray

The global Covid-19 pandemic that emerged at the end of 2019 has become a major problem for every country in the world. Covid-19 is a virus that attacks the lungs and can cause death. Many Covid-19 patients have been treated in hospitals, so chest X-ray image data from infected patients are available. Many studies have already classified chest X-ray images using a Convolutional Neural Network (CNN) to distinguish healthy lungs, lungs infected with Covid-19, and other lung diseases, but no study has yet compared the performance of CNN with classical machine learning algorithms such as Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) to determine the gap in performance and required execution time. This study aims to compare the performance and execution time of the K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and CNN classification algorithms for detecting Covid-19 from chest X-ray images. Based on testing with 5-fold cross-validation, CNN had the best average performance, with an accuracy of 0.9591, precision of 0.9592, recall of 0.9591, and F1 score of 0.959, with an average execution time of 3102.562 seconds.

Methods based on k-nearest neighbor regression in the prediction of basal area diameter distribution

Canadian Journal of Forest Research ◽

10.1139/x98-085 ◽

1998 ◽

Vol 28 (8) ◽

pp. 1107-1115 ◽

Cited By ~ 61

Author(s):

Matti Maltamo ◽

Annika Kangas

Keyword(s):

Nearest Neighbor ◽

Basal Area ◽

Nearest Neighbors ◽

Volume Estimation ◽

Diameter Distribution ◽

K Nearest Neighbor ◽

K Nearest Neighbors ◽

Stand Growth ◽

Weighted Averages ◽

Growing Stock

In the Finnish compartmentwise inventory systems, growing stock is described with means and sums of tree characteristics, such as mean height and basal area, by tree species. In the calculations, growing stock is described in a treewise manner using a diameter distribution predicted from stand variables. The treewise description is needed for several reasons, e.g., for predicting log volumes or stand growth and for analyzing the forest structure. In this study, methods for predicting the basal area diameter distribution based on the k-nearest neighbor (k-nn) regression are compared with methods based on parametric distributions. In the k-nn method, the predicted values for the variables of interest are obtained as weighted averages of the values of neighboring observations. Using k-nn based methods, the basal area diameter distribution of a stand is predicted with a weighted average of the distributions of its k nearest neighbors. The methods tested in this study include weighted averages of (i) Weibull distributions of the k nearest neighbors, (ii) distributions of the k nearest neighbors smoothed with the kernel method, and (iii) empirical distributions of the k nearest neighbors. These methods are compared for the accuracy of stand volume estimation, stand structure description, and stand growth prediction. Methods based on the k-nn regression proved to give a more accurate description of the stand than the parametric methods.
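The weighted-average idea in the abstract above can be sketched as below. The abstract does not state the weighting scheme or the distance measure, so the inverse-distance weights, the Euclidean distance on stand variables, the function name, and the toy numbers are all assumptions for illustration. Each target is a discretized distribution (shares per diameter class).

```python
import math


def knn_weighted_average(features, targets, x, k, eps=1e-9):
    """Predict a vector as the inverse-distance weighted mean of k neighbors' vectors."""
    # Indices of the k stands closest to x in feature space.
    order = sorted(
        range(len(features)), key=lambda i: math.dist(features[i], x)
    )[:k]
    # Assumed weighting: closer neighbors count more; eps avoids division by zero.
    weights = [1.0 / (math.dist(features[i], x) + eps) for i in order]
    total = sum(weights)
    dim = len(targets[0])
    return [
        sum(w * targets[i][d] for w, i in zip(weights, order)) / total
        for d in range(dim)
    ]


# Toy data: one stand variable, two diameter classes per distribution.
features = [(0.0,), (2.0,), (100.0,)]
targets = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(knn_weighted_average(features, targets, (1.0,), k=2))
```

Because the weights are normalized, a weighted average of distributions that each sum to one also sums to one, so the prediction is itself a valid distribution.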

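Klasifikasi Sekolah Menengah Pertama/Sederajat Wilayah Bireuen Menggunakan Algoritma K-Nearest Neighbors Berbasis Web [Web-Based Classification of Junior High Schools and Equivalents in the Bireuen Region Using the K-Nearest Neighbors Algorithm]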

Computer Engineering Science and System Journal ◽

10.24114/cess.v5i1.14962 ◽

2020 ◽

Vol 5 (1) ◽

pp. 33

Author(s):

Rozzi Kesuma Dinata ◽

Fajriana Fajriana ◽

Zulfa Zulfa ◽

Novia Hasdyna

Keyword(s):

Euclidean Distance ◽

Nearest Neighbor ◽

Nearest Neighbors ◽

K Nearest Neighbor ◽

K Nearest Neighbors

In this study, the K-Nearest Neighbor algorithm is implemented to classify junior high schools (Sekolah Menengah Pertama, SMP) and equivalent schools based on the preferences of prospective students. The aim is to make it easier for users to find a junior high school or equivalent based on 8 school criteria: accreditation, room facilities, sports facilities, laboratories, extracurricular activities, cost, grade levels, and study hours. The data used in this study were obtained from the Dinas Pendidikan Pemuda dan Olahraga Kabupaten Bireuen (the Bireuen Regency Education, Youth and Sports Office). Using K-NN with Euclidean distance and k=3, the study obtained a precision of 63.67%, a recall of 68.95%, and an accuracy of 79.33%.
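A minimal sketch of K-NN with Euclidean distance and k=3, together with the three reported metric types, is given below. The data, labels, and function names are hypothetical, and the metrics are computed for a single chosen positive class, which is an assumption because the abstract does not say how precision and recall were aggregated.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, and recall with respect to one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (0.2, 0.2), k=3))
print(binary_metrics(["A", "A", "B", "B"], ["A", "B", "B", "B"], positive="A"))
```

In practice the 8 school criteria would first need to be encoded as numbers on comparable scales, since Euclidean distance is sensitive to the range of each feature.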

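PREDIKSI KELULUSAN MAHASISWA MAGISTER TEKNIK INFORMATIKA UNIVERSITAS AMIKOM YOGYAKARTA MENGGUNAKAN METODE K-NEAREST NEIGHBOR [Predicting Graduation of Master of Informatics Engineering Students at Universitas AMIKOM Yogyakarta Using the K-Nearest Neighbor Method]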

Respati ◽

10.35842/jtir.v13i2.260 ◽

2018 ◽

Vol 13 (2) ◽

Author(s):

Eri Sasmita Susanto ◽

Kusrini Kusrini ◽

Hanif Al Fatta

Keyword(s):

Nearest Neighbor ◽

Nearest Neighbors ◽

Training Data ◽

K Nearest Neighbor ◽

Process Data ◽

K Nearest Neighbors ◽

Testing Data ◽

Estimation Scheme ◽

Student Graduation ◽

Feasibility Test

This research focuses on testing the feasibility of predicting student graduation at Universitas AMIKOM Yogyakarta. The authors chose the K-Nearest Neighbors (K-NN) algorithm because it can process numerical data and does not require a complicated iterative parameter estimation scheme, which means it can be applied to large datasets. The input of the system is sample data consisting of student records from 2014-2015. Testing in this research uses two sets, testing data and training data. The criteria used are the semester grade point index (IP) for semesters 1-4, credits (SKS) earned, and graduation status. The output of the system is the predicted graduation outcome, divided into on-time graduation and late graduation. The test results show that k = 14 with k-fold = 5 gives the best performance in predicting student graduation with the K-Nearest Neighbor method using the grade point index of 4 semesters, with accuracy = 98.46%, precision = 99.53%, and recall = 97.64%.

Keywords: K-Nearest Neighbors Algorithm, Graduation Prediction, Testing Data, Training Data
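The evaluation setup named in the abstract (a fixed number of neighbors scored with k-fold cross-validation) can be sketched as follows. The fold assignment by index, the function names, and the toy data are assumptions for illustration; the abstract does not describe how the folds were built. The sketch assumes there are at least as many samples as folds.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def k_fold_accuracy(X, y, k_neighbors, n_folds):
    """Mean accuracy over n_folds folds; sample i is assigned to fold i % n_folds."""
    n = len(X)
    scores = []
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        tr_X = [X[i] for i in train_idx]
        tr_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(tr_X, tr_y, X[i], k_neighbors) == y[i] for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return sum(scores) / n_folds


# Toy data: two well-separated groups standing in for the two graduation outcomes.
X = [(i * 0.1, 0.0) if i % 2 == 0 else (10 + i * 0.1, 0.0) for i in range(20)]
y = ["on_time" if i % 2 == 0 else "late" for i in range(20)]
print(k_fold_accuracy(X, y, k_neighbors=3, n_folds=5))
```

Repeating this loop for several values of k_neighbors and keeping the best mean score is one plausible way a value such as k = 14 could be selected.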
