Comparison of Distance Models on K-Nearest Neighbor Algorithm in Stroke Disease Detection

Stroke is a cardiovascular disease (CVD) that occurs when brain cells are deprived of their oxygen supply, which carries a risk of ischemic damage and can result in death. The disease can be detected from the similarity of symptoms experienced by sufferers, so that early steps can be taken with appropriate counseling and treatment. Stroke detection can be supported by a machine learning method. In this research, the author used one of the supervised learning classification methods, namely K-Nearest Neighbor (K-NN). K-NN is a classification method based on calculating the distance to training data. This research compares the Euclidean, Minkowski, Manhattan, and Chebyshev distance models to obtain optimal results. The distance models were tested using a stroke dataset sourced from the Kaggle repository. Based on the test results, the Chebyshev model has the highest average accuracy of the four distance models, 95.49%, with a highest accuracy of 96.03% at K = 10. The Euclidean and Minkowski distance models have the same accuracy at each K value, with an average accuracy of 95.45% and a highest accuracy of 95.93% at K = 10. Manhattan has the lowest average accuracy, 95.42%, but reaches a highest accuracy of 96.03% at K = 6.
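The four distance models named in the abstract above can be sketched in a few lines of Python. This is only an illustration, not the paper's implementation: the sample vectors are made up, and the abstract does not state which order p was used for Minkowski. One plausible reading of the identical Euclidean and Minkowski results is that Minkowski was run with p = 2, where the two distances coincide.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean(a, b):
    # Minkowski with p = 2
    return minkowski(a, b, 2)


def manhattan(a, b):
    # Minkowski with p = 1
    return minkowski(a, b, 1)


def chebyshev(a, b):
    # Largest absolute difference over all coordinates
    return max(abs(x - y) for x, y in zip(a, b))


# Made-up feature vectors, for illustration only
a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]

print(euclidean(a, b))   # 5.0
print(manhattan(a, b))   # 7.0
print(chebyshev(a, b))   # 4.0
```

With any p other than 2, the Minkowski ranking of neighbors can differ from the Euclidean one, so the accuracies would not necessarily match.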

Optimization of k value and lag parameter of k-nearest neighbor algorithm on the prediction of hotel occupancy rates

Jurnal Teknologi dan Sistem Komputer ◽

10.14710/jtsiskom.2020.13648 ◽

2020 ◽

Vol 8 (3) ◽

pp. 246-254

Author(s):

Agus Subhan Akbar ◽

R. Hadapiningradja Kusumodestoni

Keyword(s):

Nearest Neighbor ◽

Business Management ◽

Training Data ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Value ◽

Sample Data ◽

K Nearest Neighbor Algorithm ◽

Occupancy Rates ◽

Fold Cross Validation

Hotel occupancy rates are the most important factor in hotel business management. Prediction of the rates for the next few months determines the manager's decisions on arranging and providing all the needed facilities. This study optimizes the lag parameter and the k value of the k-Nearest Neighbor (kNN) algorithm on historical hotel occupancy data. The historical data were arranged as supervised training data, with the number of columns per row determined by the lag parameter and the number of prediction targets. The kNN algorithm was applied using 10-fold cross-validation with k values varied from 1 to 30. The optimal lag was found in the range 14-17 and the optimal k in the range 5-13 for predicting occupancy rates 1, 3, 6, 9, and 12 months ahead. The optimal k values obtained do not follow the rule of thumb of setting k to the square root of the number of samples.
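As a rough sketch of how a history series can be arranged into supervised rows by a lag parameter, consider the following. The function name `make_supervised`, the `horizon` parameter, and the occupancy numbers are all hypothetical; the abstract does not describe the exact layout the authors used.

```python
import math


def make_supervised(series, lag, horizon):
    """Turn a 1-D series into (features, target) rows.

    Each row holds `lag` consecutive past values; the target is the value
    `horizon` steps after the last feature.
    """
    rows = []
    for start in range(len(series) - lag - horizon + 1):
        features = series[start:start + lag]
        target = series[start + lag + horizon - 1]
        rows.append((features, target))
    return rows


# Made-up monthly occupancy rates (%), for illustration only
occupancy = [52, 55, 61, 58, 63, 70, 68, 72]
rows = make_supervised(occupancy, lag=3, horizon=1)

for features, target in rows:
    print(features, "->", target)

# The square-root rule of thumb mentioned in the abstract, for comparison
k_rule_of_thumb = round(math.sqrt(len(rows)))
```

A larger lag gives each row more history but leaves fewer rows to train on, which is one reason the lag and k are worth tuning together.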

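PREDIKSI KELANCARAN PEMBAYARAN CICILAN CALON DEBITUR DENGAN METODE K-NEAREST NEIGHBOR [English: Predicting the Installment Payment Performance of Prospective Debtors with the K-Nearest Neighbor Method]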

JURTEKSI ◽

10.33330/jurteksi.v7i2.1078 ◽

2021 ◽

Vol 7 (2) ◽

pp. 195-202

Author(s):

Sri Ayu Rizky ◽

Rolly Yesputra ◽

Santoso Santoso

Keyword(s):

Data Mining ◽

Nearest Neighbor ◽

Training Data ◽

Mining Method ◽

K Nearest Neighbor ◽

Application System ◽

K Value ◽

Testing Data ◽

Calculation Process ◽

K Nearest Neighbor Algorithm

Abstract: In this research, a prediction system was developed to predict whether a prospective borrower's repayments will run smoothly or not. For each prospective borrower, the data that meet the criteria are entered by an office clerk into the interface of a prediction application, where they are processed using a data mining method, namely the K-Nearest Neighbor algorithm, implemented with CodeIgniter 3. The Euclidean distances between the training data and the testing data, calculated on predetermined criteria, are displayed in a table sorted from smallest to largest that contains the 9 nearest neighbors, in accordance with the chosen K value of 9. The dominant category among these nine neighbors is taken as the prediction. This dominant category can serve as a guideline that makes it easier for the leader to make a decision about the next prospective borrower.
Keywords: Data Mining; Euclidean; K-Nearest Neighbor; Prospective Borrowers
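The procedure described in the abstract above (compute Euclidean distances, sort them from smallest to largest, keep the 9 nearest neighbors, take the dominant category) can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' CodeIgniter application: the function `knn_predict`, the two numeric features, and the labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(training, query, k=9):
    """Classify `query` by majority vote among its k nearest training rows.

    `training` is a list of (feature_vector, label) pairs.
    """
    # Euclidean distance from the query to every training row, smallest first
    ranked = sorted(
        (math.dist(features, query), label) for features, label in training
    )
    neighbors = ranked[:k]
    # The dominant category among the k neighbors is the prediction
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0], neighbors


# Hypothetical, already-scaled borrower features with made-up labels
training = [
    ([1.0, 1.0], "smooth"), ([1.2, 0.9], "smooth"), ([0.8, 1.1], "smooth"),
    ([1.1, 1.3], "smooth"), ([0.9, 0.7], "smooth"), ([1.3, 1.1], "smooth"),
    ([3.0, 3.0], "not smooth"), ([3.2, 2.9], "not smooth"),
    ([2.8, 3.1], "not smooth"), ([3.1, 3.3], "not smooth"),
    ([2.9, 2.7], "not smooth"), ([3.3, 3.1], "not smooth"),
]

label, neighbors = knn_predict(training, [1.0, 1.0], k=9)
print(label)
for distance, neighbor_label in neighbors:
    print(f"{distance:.3f}  {neighbor_label}")
```

An odd K such as 9 avoids tied votes when there are only two categories.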

IDENTIFICATION OF HOAX BASED ON TEXT MINING USING K-NEAREST NEIGHBOR METHOD

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2021.v10.i02.p04 ◽

2022 ◽

Vol 10 (2) ◽

pp. 217

Author(s):

I Wayan Santiyasa ◽

Gede Putra Aditya Brahmantha ◽

I Wayan Supriana ◽

I GA Gede Arya Kadyanan ◽

I Ketut Gede Suhartana ◽

...

Keyword(s):

Nearest Neighbor ◽

The Internet ◽

Test Results ◽

K Nearest Neighbor ◽

K Value ◽

The Public ◽

A Value ◽

K Nearest Neighbor Algorithm ◽

Time Information ◽

Fold Cross Validation

Information is now very easy to obtain and can spread quickly to all corners of society. However, not all of the information that spreads is true: false information, commonly called a hoax, is also easily spread by the public, who tend to assume that everything circulating on the internet is true. For any given news item published on the internet, it cannot be known directly whether it is a hoax or valid. The test uses 740 randomly selected content items/issues that have been verified by an institution, of which 370 are hoaxes and 370 are valid. Before classification, a preprocessing stage is performed and the TF-IDF equation is used to obtain the weight of each feature; the data are then classified using the K-Nearest Neighbor algorithm, and the results are evaluated using 10-fold cross-validation. The test uses k values from 2 to 10. The optimal k in the implementation is k = 4, with precision, recall, and F-measure of 0.764856, 0.757583, and 0.751944, respectively, and an accuracy of 75.4%.
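To illustrate the TF-IDF weighting step mentioned in the abstract above, here is a minimal sketch. It assumes one common variant (term frequency normalized by document length, inverse document frequency as log(N / df)); the abstract does not say which variant the authors used, and the three tiny documents are made up.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up, already-preprocessed documents, for illustration only
docs = [
    "vaccine causes magnet effect".split(),
    "vaccine approved by health agency".split(),
    "agency denies magnet claim".split(),
]

w = tfidf(docs)
for term, weight in sorted(w[0].items()):
    print(f"{term}: {weight:.4f}")
```

Terms that appear in fewer documents receive larger weights, so they dominate the distances K-NN computes between the resulting vectors.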

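Perancangan Aplikasi Prediksi Kelulusan Tepat Waktu Bagi Mahasiswa Baru Dengan Teknik Data Mining (Studi Kasus: Data Akademik Mahasiswa STMIK Dipanegara Makassar) [English: Design of an Application for Predicting On-Time Graduation of New Students Using Data Mining Techniques (Case Study: Academic Data of STMIK Dipanegara Makassar Students)]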

Creative Information Technology Journal ◽

10.24076/citec.2014v1i4.27 ◽

2015 ◽

Vol 1 (4) ◽

pp. 270

Author(s):

Muhammad Syukri Mustafa ◽

I. Wayan Simpen

Keyword(s):

Data Mining ◽

Nearest Neighbor ◽

Test Results ◽

K Nearest Neighbor ◽

Accuracy Rate ◽

Sample Data ◽

New Students ◽

K Nearest Neighbor Algorithm ◽

Using Data ◽

Existing Data

This research is intended to predict whether new students are likely to complete their studies on time, using data mining analysis to explore accumulated historical data with the K-Nearest Neighbor (KNN) algorithm. The application produced in this research uses various attributes classified in data mining, including national examination (Ujian Nasional, UN) scores, school/region of origin, gender, parents' occupation and income, number of siblings, and others, so that KNN analysis can make a prediction, based on the proximity of the existing historical data to the new data, of whether a student is likely to complete their studies on time or not. Testing the KNN algorithm with sample data from alumni who graduated in 2004 to 2010 as the old cases and alumni who graduated in 2011 as the new cases yielded an accuracy rate of 83.36%.

Business Intelligence using the K-Nearest Neighbor Algorithm to Analyze Customer Behavior in Online Crowdfunding Systems

E3S Web of Conferences ◽

10.1051/e3sconf/202020216005 ◽

2020 ◽

Vol 202 ◽

pp. 16005

Author(s):

Chashif Syadzali ◽

Suryono Suryono ◽

Jatmiko Endro Suseno

Keyword(s):

Business Intelligence ◽

Nearest Neighbor ◽

Customer Behavior ◽

Training Data ◽

Business Strategies ◽

Intelligence Analysis ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm

Customer behavior classification can help companies conduct business intelligence analysis. Data mining techniques can classify customer behavior using the K-Nearest Neighbor algorithm based on the customer life cycle, which consists of prospect, responder, active, and former. The data used for classification include age, gender, number of donations, donation retention, and number of user visits. Classifying 2,114 records yields the following shares per customer category: active 1.18%, prospect 8.99%, responder 4.26%, and former 85.57%. Testing system accuracy over the range K = 1 to K = 20 gives a highest accuracy of 94.3731% at K = 4. The resulting classification of user behavior can be used for business intelligence analysis that helps companies determine business strategies by identifying the optimal target market.

Tone Classification Matches Kodàly Handsign with the K-Nearest Neighbor Method at Leap Motion Controller

International Journal on Information and Communication Technology (IJoICT) ◽

10.21108/ijoict.2019.52.283 ◽

2020 ◽

Vol 5 (2) ◽

pp. 40

Author(s):

Muhammad Croassacipto ◽

Muhammad Ichwan ◽

Dina Budhi Utami

Keyword(s):

Music Education ◽

Nearest Neighbor ◽

Human Interaction ◽

Training Data ◽

Leap Motion ◽

K Nearest Neighbor ◽

Motion Controller ◽

K Value ◽

Leap Motion Controller ◽

Natural Function

Hands can produce a variety of poses, each of which can carry a meaning or purpose and serve as a form of communication, as determined by general agreement or by those who communicate. Hand poses, called hand signs, can make human interaction with the computer faster, more intuitive, and in line with the natural function of the human body. One of them is the Kodály hand sign, made by the Hungarian composer Zoltán Kodály, which is a concept in music education in Hungary. This hand sign is used in interactive angklung performances to determine the tone to be played, through a K-Nearest Neighbor (KNN) classification process based on hand poses. The classification is performed on data extracted from the Leap Motion Controller, which provides pitch, roll, and yaw values based on the basic principle of aircraft rotation axes. The experiment was run five times, with k = 1, 3, 5, 7, and 9, using test data consisting of 874 Do', 702 Si, 913 La, 612 Sol, 661 Fa, 526 Mi, 891 Re, and 1004 Do poses against 21,099 training data. The tests recognized hand poses best at k = 1, with an accuracy of 94.87%.

Application of K-Nearest Neighbor Algorithm on Classification of Disk Hernia and Spondylolisthesis in Vertebral Column

Indonesian Journal of Information Systems ◽

10.24002/ijis.v2i1.2352 ◽

2019 ◽

Vol 2 (1) ◽

pp. 57 ◽

Cited By ~ 1

Author(s):

Irma Handayani

Keyword(s):

Vertebral Column ◽

Nearest Neighbor ◽

Average Length ◽

Data Classification ◽

The Body ◽

Training Data ◽

K Nearest Neighbor ◽

Sample Data ◽

K Nearest Neighbor Algorithm

The vertebral column, as part of the backbone, has an important role in the human body. Trauma to the vertebral column can affect the ability of the spinal cord to send and receive messages between the brain and the body systems that control sensory and motor function. Disk hernia and spondylolisthesis are examples of pathologies of the vertebral column. Research on classifying pathologies or damage to the bones and joints of the skeletal system is rare, even though such a classification system can be used by radiologists as a second opinion and can thereby improve their productivity and diagnostic consistency. This research used the Vertebral Column dataset from the UCI Machine Learning repository, whose instances fall into three classes (Disk Hernia, Spondylolisthesis, and Normal). This research applied the K-NN algorithm to classify disk hernia and spondylolisthesis in the vertebral column. The data were then classified into two classes, “normal” and “abnormal”, a different but related classification task. The K-NN algorithm classifies data by using optimized sample data as a reference for the training data, producing a classification of vertebral column data based on the learning process. The results showed that the accuracy of the K-NN classifier was 83%. The average time the K-NN classifier needed to classify was 0.000212303 seconds.

Analysis K-Nearest Neighbor Algorithm for Improving Prediction Student Graduation Time

SinkrOn ◽

10.33395/sinkron.v4i2.10480 ◽

2020 ◽

Vol 4 (2) ◽

pp. 42

Author(s):

Rizki Muliono ◽

Juanda Hakim Lubis ◽

Nurul Khairina

Keyword(s):

Higher Education ◽

Nearest Neighbor ◽

Training Data ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

Study Program ◽

Sample Data ◽

Student Graduation ◽

K Nearest Neighbor Algorithm

Higher education plays a major role in improving the quality of education in Indonesia. The BAN-PT institution established by the government sets the standards for higher education accreditation and study program accreditation. The 4.0-based accreditation instrument encourages university leaders to improve the quality of their education. One indicator that determines the accreditation of study programs is the timely graduation of students. This study uses the K-Nearest Neighbor algorithm to predict student graduation times. Students' GPA in the seventh semester is used as training data, and data on students who have graduated are used as sample data. K-Nearest Neighbor works in accordance with the given sample data. Prediction testing on 60 records for students of 2015-2016 achieved a highest accuracy of 98.5% at k = 3. Prediction results depend on the pattern of the data entered: the more samples and training data are used, the more accurate the K-Nearest Neighbor calculation becomes.

Feature Selection and K-nearest Neighbor for Diagnosis Cow Disease

International journal of science, engineering, and information technology ◽

10.21107/ijseit.v5i02.10218 ◽

2021 ◽

Vol 5 (02) ◽

pp. 249-253

Author(s):

Yeni Kustiyahningsih

Keyword(s):

Feature Selection ◽

Nearest Neighbor ◽

Disease Classification ◽

Training Data ◽

Test Results ◽

K Nearest Neighbor ◽

Data Set ◽

Cattle Disease ◽

Cattle Diseases ◽

Cattle Breeders

A large cattle population increases the potential for cattle disease to develop. Lack of knowledge about the various cattle diseases and how to handle them is one of the causes of decreasing cow productivity. The aim of this research is to classify cattle disease quickly and accurately, to assist cattle breeders in accelerating the detection and handling of cattle disease. This study uses the K-Nearest Neighbour (KNN) classification method with F-Score feature selection. The KNN method is used for disease classification based on the distance between training data and test data, while F-Score feature selection is used to reduce the attribute dimensions in order to obtain the relevant attributes. The data set consisted of 350 records on cattle disease in Madura, with 21 features and 7 classes. The data were split using K-fold cross-validation with k = 5. Based on the test results, the best accuracy was obtained with 18 features and KNN with k = 3, giving an accuracy of 94.28571%, a recall of 0.942857, and a precision of 0.942857.
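The 5-fold split mentioned in the abstract above can be sketched as below. This is a minimal illustration that assumes contiguous, unshuffled folds; the abstract does not say whether the authors shuffled or stratified the 350 records, and the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // n_folds] * n_folds
    for i in range(n_samples % n_folds):
        fold_sizes[i] += 1

    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# 350 records, as in the abstract; each fold tests on 70 and trains on 280
folds = list(k_fold_indices(350, n_folds=5))
for fold_number, (train_idx, test_idx) in enumerate(folds, start=1):
    print(fold_number, len(train_idx), len(test_idx))
```

Each record is used for testing exactly once, and the reported metrics would typically be averaged over the five folds.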

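Penyelesaian Masalah Pengelolaan Lumbung Pangan Desa Menggunakan Case-Based Reasoning dengan Algoritma K-Nearest Neighbor [English: Solving Village Food Barn Management Problems Using Case-Based Reasoning with the K-Nearest Neighbor Algorithm]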

JSI: Jurnal Sistem Informasi (E-Journal) ◽

10.36706/jsi.v11i1.7699 ◽

2019 ◽

Vol 11 (1) ◽

Author(s):

Mgs. Afriyan Firdaus ◽

Dwi Rosa Indah ◽

Putri Eka Sevtiyuni ◽

Choirunnisa Qonitah

Keyword(s):

Problem Solving ◽

Nearest Neighbor ◽

Technical Problem ◽

Case Based Reasoning ◽

Test Results ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

Existing Problems ◽

K Nearest Neighbor Algorithm ◽

Case Based

In this paper, we discuss problem solving in village food barn management using Case-Based Reasoning (CBR) with the K-Nearest Neighbor algorithm. This research was carried out by adopting the stages of the CBR cycle and the nearest neighbor algorithm. The results of the study show that applying CBR and the K-Nearest Neighbor algorithm can support the resolution of knowledge problems in village food barn management, using technical problem solving based on the symptoms of and solutions to existing problems. Based on the test results, the problem-solving accuracy was 92%.
Keywords: case-based reasoning, K-nearest neighbor, food barn, problem-solving
