IDENTIFICATION OF HOAX BASED ON TEXT MINING USING K-NEAREST NEIGHBOR METHOD

2022 ◽  
Vol 10 (2) ◽  
pp. 217
Author(s):  
I Wayan Santiyasa ◽  
Gede Putra Aditya Brahmantha ◽  
I Wayan Supriana ◽  
I GA Gede Arya Kadyanan ◽  
I Ketut Gede Suhartana ◽  
...  

At this time, information is very easy to obtain, information can spread quickly to all corners of society. However, the information that spreaded are not all true, there is false information or what is commonly called hoax which of course is also easily spread by the public, the public only thinks that all the information circulating on the internet is true. From every news published on the internet, it cannot be known directly that the news is a hoax or valid one. The test uses 740 random contents / issue data that has been verified by an institution, where 370 contents are hoaxes and 370 contents are valid. The test uses the K-Nearest Neighbor algorithm, before the classification process is performed, the preprocessing stage is performed first and uses the TF-IDF equation to get the weight of each feature, then classified using K-Nearest Neighbor and the test results is evaluated using 10-Fold Cross Validation. The test uses the k value with a value of 2 to 10. The optimal use of the k value in the implementation is obtained at a value of k = 4 with precision, recall, and F-Measure results of 0.764856, 0.757583, and 0.751944 respectively and an accuracy of 75.4%

2020 ◽  
Vol 8 (3) ◽  
pp. 246-254
Author(s):  
Agus Subhan Akbar ◽  
R. Hadapiningradja Kusumodestoni

Hotel occupancy rates are the most important factor in hotel business management. Prediction of the rates for the next few months determines the manager's decision to arrange and provide all the needed facilities. This study performs the optimization of lag parameters and k values of the k-Nearest Neighbor algorithm on hotel occupancy history data. Historical data were arranged in the form of supervised training data, with the number of columns per row according to the lag parameter and the number of prediction targets. The kNN algorithm was applied using 10-fold cross-validation and k-value variations from 1-30. The optimal lag was obtained at intervals of 14-17 and the optimal k at intervals of 5-13 to predict occupancy rates of 1, 3, 6, 9, and 12 months later. The obtained k-value does not follow the rule at the square root of the number of sample data.


2019 ◽  
Vol 1 (1) ◽  
pp. 30-36 ◽  
Author(s):  
Lalu Abd Rahman Hakim ◽  
Ahmad Ashril Rizal ◽  
Dwi Ratnasari

Students are important assets for an educational institution and for this reason, it is necessary to pay attention to the student's graduation rate on time. Presentation of the ups and downs of students' ability to complete their studies on time is one of the elements of campus accreditation assessment. Based on data from the Study Program Section in the last 3 years the student graduation presentation is only 25% of the total students who can complete their studies on time. In this study using the K-Nearest Neighbor algorithm which aims to be able to identify student graduation in new cases by adapting solutions from previous cases that have closeness to new cases. This algorithm has the role to get the value of the closeness of the new case to the old case, which in turn the most population in area K with the closest value obtained by the student is predicted whether to pass on time or not on time. This study uses Roger S. Pressman's waterfalll method, namely Communication, Planning, Modeling, and Construction. Based on the tests carried out using K-Fold Cross Validation, the highest accuracy in the third model was 80% when folded 4th and 61% when the K value = 1. While testing using the Confusion Matrix obtained the highest accuracy of 98% at K = 1 for classification "Timely", and 98% at K = 2 for classification "Not Timely"


2021 ◽  
Vol 4 (1) ◽  
pp. 63-68
Author(s):  
Iswanto Iswanto ◽  
Tulus Tulus ◽  
Poltak Sihombing

Stroke is a cardiovascular (CVD) disease caused by the failure of brain cells to get oxygen supply to pose a risk of ischemic damage and result in death. This Disease can detect based on the similarity of symptoms experienced by the sufferer so that early steps can be taking with appropriate counseling and treatment. Stroke detecting requires a machine learning method. In this research, the author used one of the supervised learning classification methods, namely K-Nearest Neighbor (K-NN). K-NN is a classification method based on calculating the distance to training data. This research compares the Euclidean, Minkowski, Manhattan, Chebyshev distance models to obtain optimal results. The distance models have been tested using the stroke dataset sourced from the Kaggle repository. Based on the test results, the Chebyshev model has the highest levels of accuracy compared to the other three distance models with an average accuracy value of 95.49%, the highest accuracy of 96.03%, at K = 10. The Euclidean and Minkowski distance models have the same level of accuracy at each K value with an average accuracy value of 95.45%, the highest accuracy of 95.93% at K = 10. Meanwhile, Manhattan has the lowest average compared to the other distance models, which is 95.42% but has the highest accuracy of 96.03% at the value of K = 6


2015 ◽  
Vol 1 (4) ◽  
pp. 270
Author(s):  
Muhammad Syukri Mustafa ◽  
I. Wayan Simpen

Penelitian ini dimaksudkan untuk melakukan prediksi terhadap kemungkian mahasiswa baru dapat menyelesaikan studi tepat waktu dengan menggunakan analisis data mining untuk menggali tumpukan histori data dengan menggunakan algoritma K-Nearest Neighbor (KNN). Aplikasi yang dihasilkan pada penelitian ini akan menggunakan berbagai atribut yang klasifikasikan dalam suatu data mining antara lain nilai ujian nasional (UN), asal sekolah/ daerah, jenis kelamin, pekerjaan dan penghasilan orang tua, jumlah bersaudara, dan lain-lain sehingga dengan menerapkan analysis KNN dapat dilakukan suatu prediksi berdasarkan kedekatan histori data yang ada dengan data yang baru, apakah mahasiswa tersebut berpeluang untuk menyelesaikan studi tepat waktu atau tidak. Dari hasil pengujian dengan menerapkan algoritma KNN dan menggunakan data sampel alumni tahun wisuda 2004 s.d. 2010 untuk kasus lama dan data alumni tahun wisuda 2011 untuk kasus baru diperoleh tingkat akurasi sebesar 83,36%.This research is intended to predict the possibility of new students time to complete studies using data mining analysis to explore the history stack data using K-Nearest Neighbor algorithm (KNN). Applications generated in this study will use a variety of attributes in a data mining classified among other Ujian Nasional scores (UN), the origin of the school / area, gender, occupation and income of parents, number of siblings, and others that by applying the analysis KNN can do a prediction based on historical proximity of existing data with new data, whether the student is likely to complete the study on time or not. From the test results by applying the KNN algorithm and uses sample data alumnus graduation year 2004 s.d 2010 for the case of a long and alumni data graduation year 2011 for new cases obtained accuracy rate of 83.36%.


2019 ◽  
Vol 24 (2) ◽  
pp. 129-133
Author(s):  
Hustinawaty ◽  
Rama Al Azis Dwiputra ◽  
Tavipia Rumambi

Pasar Lama Tangerang is a tourist attraction in the city of Tangerang. With the development of current technology, the public can provide an overview of how the facilities and services are provided by expressing opinions on the internet. However, it is difficult to distinguish which opinions belong to positive or negative opinions. Sentiment analysis is needed to overcome this problem. The stage in sentiment analysis starts with collecting data first, then the data is processed. Furthermore, the data that has been propagated is given a sentiment classification using the K-Nearest Neighbor (KNN) algorithm. Then the classification results obtained an accuracy of 83% with a value of k = 1 of 120 data divided by 92 positive and 28 negative comments. Sentiment analysis is made using the R and Rstudio programming languages as supporting software.


2019 ◽  
Vol 11 (1) ◽  
Author(s):  
Mgs. Afriyan Firdaus ◽  
Dwi Rosa Indah ◽  
Putri Eka Sevtiyuni ◽  
Choirunnisa Qonitah

In this paper, we discuss the problem solving of village food barn management using Case-Based Reasoning (CBR) with the K-Nearest Neighbor algorithm. This research was carried out by adopting the stages of the CBR cycle and the nearest neighbor algorithm. The results of the study show that the application of CBR and K-nearest neighbor algorithms can support the resolution of knowledge problems in village food barn management using technical problem solving based on the symptoms and solutions to existing problems. Based on the test results, the problem-solving accuracy was 92%.Keywords - case-based reasoning, K-nearest neighbor, food barn, problem-solving


2020 ◽  
Vol 3 (1) ◽  
pp. 27-41
Author(s):  
Achmad Saiful Rizal ◽  
Moch. Lutfi

Elections in Indonesia from period to period have undergone some changes. Elections legislative candidates not determined voters, but instead became a political elite authority in accordance with the order of the list of legislative candidates and their number sequence. To perform a prediction one of them with data mining. Data mining can be applied in the political sphere for example to predict the results of the legislative election and others. K-nearest neighbor algorithm is one of the data mining algorithm that performs classification based on learning object against which are closest to the object. Election-related research has been done with the k-nearest neighbor algorithm, but accuracy is obtained that method is still too low, so it takes an additional algorithm to improve accuracy. In this study, the proposed method, namely the method of k-nearest neighbor method combined with backward elimination as a selection of features. The dataset that will be used in the study comes from the KPU Sidoarjo that has special attributes 1 and 13 regular attributes. From the results of the analysis and computation of some methods, it can be concluded that the method of k-nearest neighbor method combined with backward elimination produced some conclusions. First, of the 14 attributes in the dataset, retrieved 8 most influential attribute. Second, the best accuracy are of 96.03% when k = 2 and tested by 10 fold cross validation.


2021 ◽  
Vol 5 (6) ◽  
pp. 1083-1089
Author(s):  
Nur Ghaniaviyanto Ramadhan

News is information disseminated by newspapers, radio, television, the internet, and other media. According to the survey results, there are many news titles from various topics spread on the internet. This of course makes newsreaders have difficulty when they want to find the desired news topic to read. These problems can be solved by grouping or so-called classification. The classification process is carried out of course by using a computerized process. This study aims to classify several news topics in Indonesian language using the KNN classification model and word2vec to convert words into vectors which aim to facilitate the classification process. The use of KNN in this study also determines the optimal K value to be used. In addition to using the classification model, this study also uses a word embedding-based model, namely word2vec. The results obtained using the word2vec and KNN models have an accuracy of 89.2% with a value of K=7. The word2vec and KNN models are also superior to the support vector machine, logistic regression, and random forest classification models.  


2018 ◽  
Vol 5 (2) ◽  
pp. 328-348
Author(s):  
Muh Subhan ◽  
Amang Sudarsono ◽  
Ali Ridho Barakbah

Radical content in procedural meaning is content which have provoke the violence, spread the hatred and anti nationalism. Radical definition for each country is different, especially in Indonesia. Radical content is more identical with provocation issue, ethnic and religious hatred that is called SARA in Indonesian languange. SARA content is very difficult to detect due to the large number, unstructure system and many noise can be caused multiple interpretations. This problem can threat the unity and harmony of the religion. According to this condition, it is required a system that can distinguish the radical content or not. In this system, we propose text mining approach using DF threshold and Human Brain as the feature extraction. The system is divided into several steps, those are collecting data which is including at preprocessing part, text mining, selection features, classification for grouping the data with class label, simillarity calculation of data training, and visualization to the radical content or non radical content. The experimental result show that using combination from 10-cross validation and k-Nearest Neighbor (kNN) as the classification methods achieve 66.37% accuracy performance with 7 k value of kNN method[1].


Sign in / Sign up

Export Citation Format

Share Document