IDENTIFICATION OF HOAX BASED ON TEXT MINING USING K-NEAREST NEIGHBOR METHOD

Information is now very easy to obtain and spreads quickly to all corners of society. However, not all of the information that spreads is true: false information, commonly called a hoax, also spreads easily because the public tends to assume that everything circulating on the internet is true. For any news item published on the internet, it cannot be known directly whether it is a hoax or valid. The test uses 740 randomly selected content/issue items that have been verified by an institution, of which 370 are hoaxes and 370 are valid. Before classification, a preprocessing stage is performed and the TF-IDF equation is used to obtain the weight of each feature; the data are then classified using the K-Nearest Neighbor algorithm, and the results are evaluated using 10-fold cross-validation. The test uses k values from 2 to 10. The optimal k value in the implementation is k = 4, with precision, recall, and F-measure of 0.764856, 0.757583, and 0.751944 respectively and an accuracy of 75.4%.
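
The pipeline described above (TF-IDF weighting followed by a K-Nearest Neighbor vote) can be illustrated with a minimal sketch. It assumes already-tokenized documents, the plain tf * log(N/df) weighting, and cosine similarity as the closeness measure; the abstract does not state which TF-IDF variant or distance the authors used, and the function names here are hypothetical.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Inverse document frequency log(N/df) for every term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf(doc, idf):
    """Sparse TF-IDF vector; terms unseen during training are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors (assumed closeness measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train_docs, train_labels, query_doc, k):
    """Majority label among the k training documents most similar to the query."""
    idf = fit_idf(train_docs)
    query = tfidf(query_doc, idf)
    scored = [(cosine(tfidf(d, idf), query), label)
              for d, label in zip(train_docs, train_labels)]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    return Counter(label for _, label in top).most_common(1)[0][0]
```

In an evaluation like the one reported, a function such as `knn_predict` would be called once per held-out item inside each of the 10 folds, for each k from 2 to 10.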

Optimization of k value and lag parameter of k-nearest neighbor algorithm on the prediction of hotel occupancy rates

Jurnal Teknologi dan Sistem Komputer, DOI 10.14710/jtsiskom.2020.13648, 2020, Vol 8 (3), pp. 246-254

Author(s): Agus Subhan Akbar, R. Hadapiningradja Kusumodestoni

Keyword(s): Nearest Neighbor, Business Management, Training Data, K Nearest Neighbor, Nearest Neighbor Algorithm, K Value, Sample Data, K Nearest Neighbor Algorithm, Occupancy Rates, Fold Cross Validation

Hotel occupancy rates are the most important factor in hotel business management. Predictions of the rates for the next few months inform the manager's decisions on arranging and providing the needed facilities. This study optimizes the lag parameter and the k value of the k-Nearest Neighbor (kNN) algorithm on historical hotel occupancy data. The historical data were arranged as supervised training data, with the number of columns per row determined by the lag parameter and the number of prediction targets. The kNN algorithm was applied using 10-fold cross-validation with k values from 1 to 30. The optimal lag was found in the range 14-17 and the optimal k in the range 5-13 for predicting occupancy rates 1, 3, 6, 9, and 12 months ahead. The obtained k values do not follow the rule of thumb that sets k to the square root of the number of samples.
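
The arrangement of a history series into supervised rows can be sketched as follows. This is an illustrative framing only: it assumes a single prediction target `horizon` steps after the last lagged value, whereas the study may place several targets in one row, and the function names are hypothetical.

```python
import math

def make_supervised(series, lag, horizon):
    """Pair each window of `lag` consecutive values with the value
    `horizon` steps after the end of the window."""
    rows = []
    for start in range(len(series) - lag - horizon + 1):
        features = series[start:start + lag]
        target = series[start + lag + horizon - 1]
        rows.append((features, target))
    return rows

def sqrt_rule_k(n_samples):
    """Rule-of-thumb k: the rounded square root of the sample count."""
    return round(math.sqrt(n_samples))

# Toy monthly occupancy values (made up for illustration).
history = [10, 20, 30, 40, 50, 60]
for features, target in make_supervised(history, lag=3, horizon=1):
    print(features, "->", target)
```

With a lag of 14-17 and k of 5-13, as reported, each row would hold 14 to 17 past values, and the chosen k can be compared against `sqrt_rule_k` for the number of rows produced.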

Aplikasi Prediksi Kelulusan Mahasiswa Berbasis K-Nearest Neighbor (K-NN)

JTIM : Jurnal Teknologi Informasi dan Multimedia, DOI 10.35746/jtim.v1i1.11, 2019, Vol 1 (1), pp. 30-36, Cited By ~ 1

Author(s): Lalu Abd Rahman Hakim, Ahmad Ashril Rizal, Dwi Ratnasari

Keyword(s): Nearest Neighbor, Educational Institution, Confusion Matrix, K Nearest Neighbor, Study Program, K Value, Student Graduation, K Nearest Neighbor Algorithm, Communication Planning, Fold Cross Validation

Students are important assets for an educational institution, so it is necessary to pay attention to the rate at which students graduate on time. The rise and fall in the percentage of students who complete their studies on time is one element of campus accreditation assessment. Based on data from the Study Program Section, over the last 3 years only 25% of all students completed their studies on time. This study uses the K-Nearest Neighbor algorithm to predict student graduation in new cases by adapting solutions from previous cases that are close to the new case. The algorithm computes the closeness of the new case to the old cases, and the majority class among the K closest cases determines whether the student is predicted to graduate on time or not. The application was developed using Roger S. Pressman's waterfall method, namely Communication, Planning, Modeling, and Construction. In testing with K-Fold Cross Validation, the highest accuracy in the third model was 80% at the 4th fold and 61% at K = 1. Testing with the Confusion Matrix gave a highest accuracy of 98% at K = 1 for the "Timely" class and 98% at K = 2 for the "Not Timely" class.
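
Evaluation with a confusion matrix can be sketched as below. The abstract does not say how the per-class figures were computed, so this sketch assumes per-class recall (correct predictions for a class divided by the actual members of that class); the labels and helper names are illustrative.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

def accuracy(cm):
    """Share of all cases whose predicted label equals the actual label."""
    correct = sum(n for (a, p), n in cm.items() if a == p)
    return correct / sum(cm.values())

def recall(cm, label):
    """Share of the cases that truly belong to `label` that were predicted as `label`."""
    total = sum(n for (a, _), n in cm.items() if a == label)
    return cm[(label, label)] / total if total else 0.0

# Toy labels (made up for illustration).
actual = ["Timely", "Timely", "Not Timely", "Not Timely", "Timely"]
predicted = ["Timely", "Not Timely", "Not Timely", "Not Timely", "Timely"]
cm = confusion_matrix(actual, predicted)
print(accuracy(cm), recall(cm, "Timely"), recall(cm, "Not Timely"))
```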

Comparison of Distance Models on K-Nearest Neighbor Algorithm in Stroke Disease Detection

Applied Technology and Computing Science Journal, DOI 10.33086/atcsj.v4i1.2097, 2021, Vol 4 (1), pp. 63-68

Author(s): Iswanto Iswanto, Tulus Tulus, Poltak Sihombing

Keyword(s): Nearest Neighbor, The Other, Training Data, Machine Learning Method, Test Results, K Nearest Neighbor, Minkowski Distance, K Value, Average Accuracy, K Nearest Neighbor Algorithm

Stroke is a cardiovascular disease (CVD) caused by the failure of brain cells to receive an oxygen supply, which poses a risk of ischemic damage and can result in death. The disease can be detected based on the similarity of symptoms experienced by the sufferer, so that early steps can be taken with appropriate counseling and treatment. Stroke detection requires a machine learning method. In this research, the authors used one of the supervised learning classification methods, namely K-Nearest Neighbor (K-NN), which classifies a sample based on its distance to the training data. This research compares the Euclidean, Minkowski, Manhattan, and Chebyshev distance models to obtain optimal results. The distance models were tested using the stroke dataset sourced from the Kaggle repository. Based on the test results, the Chebyshev model has the highest average accuracy of the four distance models, at 95.49%, with a highest accuracy of 96.03% at K = 10. The Euclidean and Minkowski distance models have the same accuracy at each K value, with an average accuracy of 95.45% and a highest accuracy of 95.93% at K = 10. Manhattan has the lowest average accuracy, at 95.42%, but reaches a highest accuracy of 96.03% at K = 6.
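
The four distance models compared above can be written out in a few lines. This is a generic sketch of the standard definitions, not the authors' code. Minkowski distance with order p = 2 is identical to Euclidean distance, which is one possible explanation for the matching Euclidean and Minkowski accuracies, although the abstract does not state the order p that was used.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def euclidean(a, b):
    """Straight-line distance: Minkowski with p = 2."""
    return minkowski(a, b, 2)

def manhattan(a, b):
    """Sum of absolute coordinate differences: Minkowski with p = 1."""
    return minkowski(a, b, 1)

def chebyshev(a, b):
    """Largest absolute coordinate difference (the limit of Minkowski as p grows)."""
    return max(abs(x - y) for x, y in zip(a, b))

a, b = (0, 0), (3, 4)
print(euclidean(a, b), manhattan(a, b), chebyshev(a, b))
```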

Perancangan Aplikasi Prediksi Kelulusan Tepat Waktu Bagi Mahasiswa Baru Dengan Teknik Data Mining (Studi Kasus: Data Akademik Mahasiswa STMIK Dipanegara Makassar)

Creative Information Technology Journal, DOI 10.24076/citec.2014v1i4.27, 2015, Vol 1 (4), pp. 270

Author(s): Muhammad Syukri Mustafa, I. Wayan Simpen

Keyword(s): Data Mining, Nearest Neighbor, Test Results, K Nearest Neighbor, Accuracy Rate, Sample Data, New Students, K Nearest Neighbor Algorithm, Using Data, Existing Data

This research aims to predict whether new students are likely to complete their studies on time, using data mining analysis to explore accumulated historical data with the K-Nearest Neighbor (KNN) algorithm. The resulting application uses a variety of attributes, including Ujian Nasional (UN, national exam) scores, school/region of origin, gender, parents' occupation and income, number of siblings, and others. By applying KNN analysis, a prediction can be made based on the closeness of existing historical data to new data, indicating whether a student is likely to complete their studies on time or not. In testing with the KNN algorithm, using sample data from alumni who graduated from 2004 to 2010 as the old cases and alumni who graduated in 2011 as the new cases, an accuracy rate of 83.36% was obtained.

Automatic classification of insulator by combining k-nearest neighbor algorithm with multi-type feature for the Internet of Things

EURASIP Journal on Wireless Communications and Networking, DOI 10.1186/s13638-018-1195-1, 2018, Vol 2018 (1), Cited By ~ 1

Author(s): Guoxiong Hu, Zhong Yang, Maohu Zhu, Li Huang, Naixue Xiong

Keyword(s): Internet Of Things, Nearest Neighbor, Automatic Classification, The Internet, K Nearest Neighbor, Nearest Neighbor Algorithm, K Nearest Neighbor Algorithm, The Internet Of Things

PUBLIC SENTIMENT ANALYSIS OF PASAR LAMA TANGERANG USING K-NEAREST NEIGHBOR METHOD AND PROGRAMMING LANGUAGE R

Jurnal Ilmiah Informatika Komputer, DOI 10.35760/ik.2019.v24i2.2367, 2019, Vol 24 (2), pp. 129-133

Author(s): Hustinawaty, Rama Al Azis Dwiputra, Tavipia Rumambi

Keyword(s): Sentiment Analysis, Programming Languages, Nearest Neighbor, Tourist Attraction, K Nearest Neighbor, The Public, Public Sentiment, A Value, Negative Comments, The City

Pasar Lama Tangerang is a tourist attraction in the city of Tangerang. With the development of current technology, the public can describe the facilities and services provided by expressing opinions on the internet. However, it is difficult to distinguish positive opinions from negative ones, and sentiment analysis is needed to overcome this problem. The sentiment analysis starts with collecting the data, which is then processed. The processed data is then given a sentiment classification using the K-Nearest Neighbor (KNN) algorithm. The classification achieved an accuracy of 83% at k = 1 on 120 data items, consisting of 92 positive and 28 negative comments. The sentiment analysis was built using the R programming language with RStudio as supporting software.

Penyelesaian Masalah Pengelolaan Lumbung Pangan Desa Menggunakan Case-Based Reasoning dengan Algoritma K-Nearest Neighbor

JSI: Jurnal Sistem Informasi (E-Journal), DOI 10.36706/jsi.v11i1.7699, 2019, Vol 11 (1)

Author(s): Mgs. Afriyan Firdaus, Dwi Rosa Indah, Putri Eka Sevtiyuni, Choirunnisa Qonitah

Keyword(s): Problem Solving, Nearest Neighbor, Technical Problem, Case Based Reasoning, Test Results, K Nearest Neighbor, Nearest Neighbor Algorithm, Existing Problems, K Nearest Neighbor Algorithm, Case Based

In this paper, we discuss problem solving in village food barn management using Case-Based Reasoning (CBR) with the K-Nearest Neighbor algorithm. The research was carried out by adopting the stages of the CBR cycle and the nearest neighbor algorithm. The results show that applying CBR with the K-Nearest Neighbor algorithm can support the resolution of knowledge problems in village food barn management through technical problem solving based on the symptoms of, and solutions to, existing problems. Based on the test results, the problem-solving accuracy was 92%.

Keywords: case-based reasoning, K-nearest neighbor, food barn, problem-solving
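
The retrieval step of a CBR cycle, where stored cases are ranked by closeness to a new problem, can be sketched as follows. The abstract does not give the similarity measure or case structure, so this sketch assumes each case is a set of symptoms with one solution and uses Jaccard overlap as the similarity; the case contents and function names are hypothetical placeholders.

```python
def similarity(case_symptoms, query_symptoms):
    """Jaccard overlap between two symptom sets (assumed similarity measure)."""
    union = case_symptoms | query_symptoms
    return len(case_symptoms & query_symptoms) / len(union) if union else 0.0

def retrieve(case_base, query_symptoms, k):
    """Return the k stored cases most similar to the query as (score, solution) pairs."""
    scored = [(similarity(symptoms, query_symptoms), solution)
              for symptoms, solution in case_base]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Hypothetical case base: (symptom set, solution) pairs.
case_base = [
    ({"s1", "s2", "s3"}, "solution_1"),
    ({"s4"}, "solution_2"),
    ({"s1"}, "solution_3"),
]
print(retrieve(case_base, {"s1", "s2"}, k=2))
```

In a full CBR cycle the retrieved solution would then be reused, revised if needed, and retained as a new case.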

PREDIKSI HASIL PEMILU LEGISLATIF MENGGUNAKAN ALGORITMA K-NEAREST NEIGHBOR BERBASIS BACKWARD ELIMINATION

Jurnal RESISTOR (Rekayasa Sistem Komputer), DOI 10.31598/jurnalresistor.v3i1.517, 2020, Vol 3 (1), pp. 27-41

Author(s): Achmad Saiful Rizal, Moch. Lutfi

Keyword(s): Data Mining, Nearest Neighbor, Political Elite, Data Mining Algorithm, K Nearest Neighbor, Nearest Neighbor Algorithm, Backward Elimination, K Nearest Neighbor Algorithm, Fold Cross Validation, Selection Of

Elections in Indonesia have undergone changes from period to period. Legislative candidates are not determined by voters; instead, their selection has become the authority of the political elite, in accordance with the order of the list of legislative candidates and their sequence numbers. One way to make predictions is with data mining, which can be applied in the political sphere, for example to predict the results of legislative elections. The k-nearest neighbor algorithm is a data mining algorithm that classifies an object based on the training objects closest to it. Election-related research has been done with the k-nearest neighbor algorithm, but the accuracy obtained is still too low, so an additional algorithm is needed to improve it. This study proposes the k-nearest neighbor method combined with backward elimination for feature selection. The dataset comes from the KPU Sidoarjo and has 1 special attribute and 13 regular attributes. The analysis and computation lead to two conclusions about the combined method. First, of the 14 attributes in the dataset, the 8 most influential attributes were selected. Second, the best accuracy is 96.03% at k = 2, tested with 10-fold cross-validation.
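
Backward elimination as a wrapper around a classifier can be sketched as below. This is a generic greedy version under stated assumptions: it starts from all features and repeatedly drops the one whose removal gives the best score, stopping when every removal would lower the score. The `score` callback is hypothetical; in a setting like the one described it would return the cross-validated accuracy of kNN on the given feature subset. The authors' exact stopping rule is not given in the abstract.

```python
def backward_elimination(features, score):
    """Greedy backward feature elimination.

    `score(subset)` must return a number where higher is better.
    Returns the selected features and their score.
    """
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        # Score every subset obtained by dropping exactly one feature.
        candidates = [(score([f for f in selected if f != drop]), drop)
                      for drop in selected]
        cand_score, drop = max(candidates, key=lambda c: c[0])
        if cand_score < best:
            break  # every removal hurts, so stop
        best = cand_score
        selected.remove(drop)
    return selected, best

# Toy scoring function (made up): two useful features, a small penalty per noise feature.
def toy_score(subset):
    useful = {"a", "b"}
    hits = sum(1 for f in subset if f in useful)
    noise = sum(1 for f in subset if f not in useful)
    return hits - 0.1 * noise

print(backward_elimination(["a", "b", "noise1", "noise2"], toy_score))
```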

Indonesian Online News Topics Classification using Word2Vec and K-Nearest Neighbor

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), DOI 10.29207/resti.v5i6.3547, 2021, Vol 5 (6), pp. 1083-1089

Author(s): Nur Ghaniaviyanto Ramadhan

Keyword(s): Nearest Neighbor, Online News, Classification Model, Support Vector, The Internet, K Nearest Neighbor, K Value, Random Forest Classification, Forest Classification, Survey Results

News is information disseminated by newspapers, radio, television, the internet, and other media. According to survey results, many news titles on various topics are spread across the internet, which makes it difficult for readers to find the news topic they want to read. This problem can be solved by grouping, or classification, carried out by a computerized process. This study aims to classify several news topics in the Indonesian language using the KNN classification model, with word2vec, a word embedding-based model, converting words into vectors to facilitate classification. The study also determines the optimal K value to use. The word2vec and KNN models achieve an accuracy of 89.2% at K = 7 and outperform the support vector machine, logistic regression, and random forest classification models.
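
One common way to feed word vectors into KNN is to average the vectors of a document's words into a single document vector. The sketch below assumes that averaging step and Euclidean distance, neither of which is stated in the abstract, and uses a tiny hand-made embedding table in place of a trained word2vec model; all names and values are illustrative.

```python
import math
from collections import Counter

def doc_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def knn_classify(train, query, k):
    """Majority label among the k training vectors closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hand-made 2-dimensional "embeddings" standing in for a trained word2vec model.
emb = {"bola": [1.0, 0.0], "gol": [0.9, 0.1], "saham": [0.0, 1.0], "rupiah": [0.1, 0.9]}
train = [
    (doc_vector(["bola", "gol"], emb, 2), "sport"),
    (doc_vector(["gol"], emb, 2), "sport"),
    (doc_vector(["saham", "rupiah"], emb, 2), "economy"),
    (doc_vector(["rupiah"], emb, 2), "economy"),
]
print(knn_classify(train, doc_vector(["bola"], emb, 2), k=3))
```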

Classification of Radical Web Content in Indonesia using Web Content Mining and k-Nearest Neighbor Algorithm

EMITTER International Journal of Engineering Technology, DOI 10.24003/emitter.v5i2.214, 2018, Vol 5 (2), pp. 328-348

Author(s): Muh Subhan, Amang Sudarsono, Ali Ridho Barakbah

Keyword(s): Text Mining, Nearest Neighbor, Experimental Result, Web Content, K Nearest Neighbor, K Value, Content Mining, Result Show, K Nearest Neighbor Algorithm, Radical Content

Radical content, in a procedural sense, is content that provokes violence and spreads hatred and anti-nationalism. The definition of radical differs from country to country. In Indonesia, radical content is mostly identified with provocation and with ethnic and religious hatred, known in Indonesian as SARA. SARA content is very difficult to detect because of its large volume, its unstructured form, and the amount of noise, which can lead to multiple interpretations. This problem can threaten unity and religious harmony. Under these conditions, a system is required that can distinguish radical from non-radical content. We propose a text mining approach using DF threshold and Human Brain as the feature extraction. The system is divided into several steps: data collection, which is included in the preprocessing part; text mining; feature selection; classification to group the data by class label; similarity calculation on the training data; and visualization of radical and non-radical content. The experimental results show that combining 10-fold cross-validation with k-Nearest Neighbor (kNN) as the classification method achieves 66.37% accuracy at k = 7.
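
Document frequency (DF) thresholding, one of the feature steps named above, can be sketched as follows. The abstract does not give the threshold value or whether an upper bound was also applied, so this sketch assumes a single lower bound `min_df` on already-tokenized documents; the function name is hypothetical.

```python
from collections import Counter

def df_threshold(docs, min_df):
    """Keep the terms that occur in at least `min_df` documents."""
    df = Counter(term for doc in docs for term in set(doc))
    return sorted(term for term, count in df.items() if count >= min_df)

# Toy tokenized documents (made up for illustration).
docs = [["a", "b", "b"], ["a", "c"], ["a", "b"]]
print(df_threshold(docs, min_df=2))
```

Repeated occurrences inside one document count once, because DF measures how many documents contain a term rather than how often it appears.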
