Classification of Radical Web Content in Indonesia using Web Content Mining and k-Nearest Neighbor Algorithm

Radical content, in a procedural sense, is content that provokes violence or spreads hatred and anti-nationalism. The definition of radical content differs from country to country. In Indonesia, it is most closely associated with provocation and with ethnic and religious hatred, referred to in Indonesian as SARA. SARA content is very difficult to detect because of its large volume, its unstructured form, and the noise it contains, which can lead to multiple interpretations. This problem can threaten national unity and religious harmony. A system is therefore needed that can distinguish radical from non-radical content. We propose a text mining approach that uses a DF threshold and Human Brain for feature extraction. The system consists of several steps: data collection as part of preprocessing; text mining; feature selection; classification, which groups the data by class label; similarity calculation on the training data; and visualization of radical and non-radical content. The experimental results show that k-Nearest Neighbor (kNN) classification, evaluated with 10-fold cross-validation, achieves 66.37% accuracy at k = 7 [1].
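
The DF-threshold step mentioned in the abstract can be illustrated with a minimal sketch. It assumes documents are already tokenized and uses a hypothetical threshold `min_df=2`; the toy corpus and function names are illustrative only and do not come from the paper.

```python
from collections import Counter


def document_frequency(docs):
    """Count in how many documents each term appears."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # a term counts once per document
    return df


def select_features(docs, min_df=2):
    """Keep terms whose document frequency reaches the (assumed) threshold."""
    df = document_frequency(docs)
    return sorted(term for term, count in df.items() if count >= min_df)


def vectorize(tokens, vocabulary):
    """Map a tokenized document to term counts over the selected vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


# Hypothetical toy corpus, already tokenized.
docs = [
    "attack them now".split(),
    "attack the group".split(),
    "peaceful community event".split(),
    "community event today".split(),
]

vocab = select_features(docs, min_df=2)
vectors = [vectorize(tokens, vocab) for tokens in docs]
print(vocab)
print(vectors)
```

Terms that occur in only one document are dropped, so the vectors passed to a classifier such as kNN are shorter and less noisy.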

Twitter text mining for sentiment analysis on government’s response to forest fires with vader lexicon polarity detection and k-nearest neighbor algorithm

Journal of Physics Conference Series ◽

10.1088/1742-6596/1567/3/032024 ◽

2020 ◽

Vol 1567 ◽

pp. 032024

Author(s):

T Mustaqim ◽

K Umam ◽

M A Muslim

Keyword(s):

Text Mining ◽

Sentiment Analysis ◽

Forest Fires ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm

Collaborative filtering recommendation algorithm based on class correlation distance

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666191116144822 ◽

2019 ◽

Vol 13 ◽

Author(s):

Hanfei Zhang ◽

Yumei Jian ◽

Ping Zhou

Keyword(s):

Collaborative Filtering ◽

Nearest Neighbor ◽

Experimental Result ◽

K Nearest Neighbor ◽

Recommendation Algorithm ◽

K Value ◽

Correlation Distance ◽

Training Samples ◽

The Difference ◽

Better Than

A class correlation distance collaborative filtering recommendation algorithm is proposed to address the problems of category judgment and distance metric in the traditional collaborative filtering recommendation algorithm. It draws on both the distance between like samples and the class correlation distance. First, the class correlation distance between the training samples is calculated and stored. Second, the K nearest neighbor samples are selected, and the class correlation distance of the training samples and the difference ratio between the test samples and the training samples are calculated. Finally, the samples are classified into types according to the difference ratio. The experimental results show that the algorithm, combined with user rating preference, achieves a lower MAE value and a better recommendation effect. As the K value changes, the CCDKNN algorithm clearly outperforms the KNN and DWKNN algorithms, and its accuracy is more stable. The algorithm improves the accuracy of similarity and prediction and performs better than the traditional algorithm.

K-Nearest Neighbor Based URL Identification Model for Phishing Attack Detection

Indian Journal of Artificial Intelligence and Neural Networking ◽

10.35940/ijainn.b1019.041221 ◽

2021 ◽

Vol 1 (2) ◽

pp. 18-21

Author(s):

Tsehay Admassu Assegie

Keyword(s):

Nearest Neighbor ◽

Attack Detection ◽

Experimental Result ◽

Data Repository ◽

K Nearest Neighbor ◽

K Nearest Neighbors ◽

K Value ◽

Proposed Model ◽

Public Data ◽

Public Data Repository

Phishing causes many problems in the business sector. Electronic commerce and electronic banking, such as mobile banking, involve a large number of online transactions. To secure such transactions, the features that discriminate between legitimate and phishing websites must be identified. In this study, we collected data from the PhishTank public data repository and propose a K-Nearest Neighbors (KNN) based model for phishing attack detection. The proposed model detects phishing attacks through URL classification. The performance of the model is tested empirically and the results are analyzed. Experimental results on the test set show that the model is effective at phishing attack detection. Furthermore, the K value that gives the best accuracy is determined in order to improve detection performance. Overall, the average accuracy of the proposed model is 85.08%.

K-Nearest Neighbor Based URL Identification Model for Phishing Attack Detection

Indian Journal of Artificial Intelligence and Neural Networking ◽

10.54105/ijainn.b1019.041221 ◽

2021 ◽

pp. 18-21

Author(s):

Tsehay Admassu Assegie ◽

Keyword(s):

Nearest Neighbor ◽

Attack Detection ◽

Experimental Result ◽

Data Repository ◽

K Nearest Neighbor ◽

K Nearest Neighbors ◽

K Value ◽

Proposed Model ◽

Public Data ◽

Public Data Repository

Optimization of k value and lag parameter of k-nearest neighbor algorithm on the prediction of hotel occupancy rates

Jurnal Teknologi dan Sistem Komputer ◽

10.14710/jtsiskom.2020.13648 ◽

2020 ◽

Vol 8 (3) ◽

pp. 246-254

Author(s):

Agus Subhan Akbar ◽

R. Hadapiningradja Kusumodestoni

Keyword(s):

Nearest Neighbor ◽

Business Management ◽

Training Data ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Value ◽

Sample Data ◽

K Nearest Neighbor Algorithm ◽

Occupancy Rates ◽

Fold Cross Validation

Hotel occupancy rates are the most important factor in hotel business management. Predictions of the rates for the next few months inform the manager's decisions on arranging and providing the needed facilities. This study optimizes the lag parameter and the k value of the k-Nearest Neighbor algorithm on historical hotel occupancy data. The historical data were arranged as supervised training data, with the number of columns per row determined by the lag parameter and the number of prediction targets. The kNN algorithm was applied using 10-fold cross-validation with k values from 1 to 30. The optimal lag fell in the range 14-17 and the optimal k in the range 5-13 for predicting occupancy rates 1, 3, 6, 9, and 12 months ahead. The optimal k does not follow the rule of thumb of setting k to the square root of the number of samples.
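
As a rough illustration of how a series can be arranged into supervised rows by a lag parameter, the sketch below builds (features, target) pairs and predicts with a simple kNN regressor. Averaging the neighbours' targets, the squared-Euclidean distance, and the toy `occupancy` values are assumptions made for this example, not details taken from the paper.

```python
def make_supervised(series, lag, horizon=1):
    """Build rows where `lag` past values predict the value `horizon` steps ahead."""
    rows = []
    for start in range(len(series) - lag - horizon + 1):
        features = series[start:start + lag]
        target = series[start + lag + horizon - 1]
        rows.append((features, target))
    return rows


def knn_predict(rows, query, k):
    """Average the targets of the k rows whose features are closest to `query`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(rows, key=lambda row: squared_distance(row[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical monthly occupancy percentages.
occupancy = [50, 55, 60, 58, 52, 57, 62, 60, 54, 59]

rows = make_supervised(occupancy, lag=3, horizon=1)
prediction = knn_predict(rows, occupancy[-3:], k=2)
print(len(rows), prediction)
```

Sweeping `lag` and `k` over a grid and scoring each pair with cross-validation would mirror the kind of search the abstract describes.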

PREDIKSI KELANCARAN PEMBAYARAN CICILAN CALON DEBITUR DENGAN METODE K-NEAREST NEIGHBOR

JURTEKSI ◽

10.33330/jurteksi.v7i2.1078 ◽

2021 ◽

Vol 7 (2) ◽

pp. 195-202

Author(s):

Sri Ayu Rizky ◽

Rolly Yesputra ◽

Santoso Santoso

Keyword(s):

Data Mining ◽

Nearest Neighbor ◽

Training Data ◽

Mining Method ◽

K Nearest Neighbor ◽

Application System ◽

K Value ◽

Testing Data ◽

Calculation Process ◽

K Nearest Neighbor Algorithm

Abstract: This research developed a prediction system that predicts whether a prospective borrower will repay installments smoothly or not. For each prospective borrower, the data that meet the criteria are entered by an office clerk into the interface of the prediction application, which processes them with a Data Mining method, namely the K-Nearest Neighbor algorithm, implemented with the CodeIgniter 3 framework. The Euclidean distances between the training data and the testing data, computed on predetermined criteria, are displayed in a table sorted from smallest to largest that contains the 9 closest neighbors, in line with the chosen K value of 9. The dominant category among these nine neighbors is taken as the prediction. This dominant category can serve as a guideline that makes it easier for the leader to decide on the next prospective borrower.
Keywords: Data Mining; Euclidean; K-Nearest Neighbor; Prospective Borrowers
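
The procedure described above (Euclidean distances, sorting from smallest to largest, then taking the dominant category among 9 neighbors) can be sketched as follows. The two numeric features, the labels "smooth" and "late", and the toy records are hypothetical; the paper's actual criteria are not given in the abstract.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_neighbors(training, query):
    """Return (distance, label) pairs sorted from smallest to largest distance."""
    return sorted((euclidean(features, query), label) for features, label in training)


def classify(training, query, k=9):
    """Take the dominant label among the k closest training records."""
    nearest = rank_neighbors(training, query)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training records: (feature tuple, repayment label).
training = (
    [((1.0 + 0.1 * i, 1.0), "smooth") for i in range(6)]
    + [((5.0 + 0.1 * i, 5.0), "late") for i in range(6)]
)

print(classify(training, (1.5, 1.5), k=9))
print(classify(training, (5.0, 5.0), k=9))
```

An odd K such as 9 avoids tied votes when there are only two categories.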

Aplikasi Prediksi Kelulusan Mahasiswa Berbasis K-Nearest Neighbor (K-NN)

JTIM : Jurnal Teknologi Informasi dan Multimedia ◽

10.35746/jtim.v1i1.11 ◽

2019 ◽

Vol 1 (1) ◽

pp. 30-36 ◽

Cited By ~ 1

Author(s):

Lalu Abd Rahman Hakim ◽

Ahmad Ashril Rizal ◽

Dwi Ratnasari

Keyword(s):

Nearest Neighbor ◽

Educational Institution ◽

Confusion Matrix ◽

K Nearest Neighbor ◽

Study Program ◽

K Value ◽

Student Graduation ◽

K Nearest Neighbor Algorithm ◽

Communication Planning ◽

Fold Cross Validation

Students are important assets for an educational institution, so the rate at which students graduate on time deserves attention. The rise and fall in the percentage of students who complete their studies on time is one element of campus accreditation assessment. According to data from the Study Program Section, over the last 3 years only 25% of students completed their studies on time. This study uses the K-Nearest Neighbor algorithm to predict graduation for new cases by adapting solutions from previous cases that are close to them. The algorithm computes the closeness of a new case to old cases, and the majority class among the K closest cases determines whether the student is predicted to graduate on time or not. The application was developed with Roger S. Pressman's waterfall method, namely Communication, Planning, Modeling, and Construction. In tests using K-Fold Cross Validation, the highest accuracy for the third model was 80% at the 4th fold and 61% at K = 1. In tests using the Confusion Matrix, the highest accuracy was 98% at K = 1 for the "Timely" class and 98% at K = 2 for the "Not Timely" class.
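
The two evaluation tools named in the abstract, K-Fold Cross Validation and the Confusion Matrix, can be sketched in a few lines. The contiguous, unshuffled fold assignment and the toy labels below are assumptions for illustration; the paper does not state how its folds were formed.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) for each fold, using contiguous blocks."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder over early folds
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def confusion_matrix(actual, predicted, labels):
    """Count outcomes as matrix[actual_label][predicted_label]."""
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


folds = list(k_fold_indices(10, 4))

# Hypothetical labels for one test fold.
actual = ["Timely", "Timely", "Not Timely", "Not Timely", "Timely"]
predicted = ["Timely", "Not Timely", "Not Timely", "Not Timely", "Timely"]
matrix = confusion_matrix(actual, predicted, ["Timely", "Not Timely"])
print(matrix, accuracy(matrix))
```

Reporting accuracy per fold, as the abstract does for the 4th fold, amounts to building one such matrix for each test split.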

PENERAPAN TEXT MINING DALAM MENGANALISIS KEPRIBADIAN PENGGUNA MEDIA SOSIAL

Jurnal Teknik Informatika Musirawas (JUTIM) ◽

10.32767/jutim.v5i1.791 ◽

2020 ◽

Vol 5 (1) ◽

pp. 63-71

Author(s):

M. Pramadani Riyanis Putra ◽

Kiki Rizky Nova Wardani

Keyword(s):

Text Mining ◽

Big Five ◽

Nearest Neighbor ◽

Quality Information ◽

Big Five Personality ◽

Classification Methods ◽

K Nearest Neighbor ◽

Accuracy Rate ◽

High Quality Information ◽

K Nearest Neighbor Algorithm

Facebook is a social networking application where users reveal a lot about themselves through their posts. The authors therefore want to know what information can be extracted about a user's personality. Data mining plays an important role here; it aims to transform raw data into an understandable structure for further use. Text mining refers to the process of retrieving high-quality information from text, and one classification method that can be used is the K-Nearest Neighbor algorithm. Based on the Big Five personality theory, the study obtained an accuracy rate of 92.92% on 550 data items: Openness was the most frequent trait with 239 items, followed by Extraversion with 173, Agreeableness with 50, Neuroticism with 33, and Conscientiousness with 16, while 39 items could not be classified.

Klasifikasi Jenis Laporan Masyarakat Dengan K-Nearest Neighbor Algorithm

JOINS (Journal of Information System) ◽

10.33633/joins.v5i1.3355 ◽

2020 ◽

Vol 5 (1) ◽

pp. 77-85

Author(s):

Heru Pramono Hadi ◽

Titien S. Sukamto

Keyword(s):

Text Mining ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm

Public feedback on government services is an important element in evaluating and improving performance. The government therefore needs an effective, efficient, and systematic reporting method. Public feedback can take the form of complaints, requests for information, and aspirations. One channel for submitting public feedback is social media. Classifying the type of public report or feedback is important for speeding up the response to reports. The K-Nearest Neighbor algorithm, applied within a text mining method, is one solution that can help classify report types. With 930 training data and 100 test data from public reports submitted through social media in 2017, the highest accuracy, 82%, was obtained at k=11.

IDENTIFICATION OF HOAX BASED ON TEXT MINING USING K-NEAREST NEIGHBOR METHOD

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2021.v10.i02.p04 ◽

2022 ◽

Vol 10 (2) ◽

pp. 217

Author(s):

I Wayan Santiyasa ◽

Gede Putra Aditya Brahmantha ◽

I Wayan Supriana ◽

I GA Gede Arya Kadyanan ◽

I Ketut Gede Suhartana ◽

...

Keyword(s):

Nearest Neighbor ◽

The Internet ◽

Test Results ◽

K Nearest Neighbor ◽

K Value ◽

The Public ◽

A Value ◽

K Nearest Neighbor Algorithm ◽

Time Information ◽

Fold Cross Validation

Information is now very easy to obtain and can spread quickly to all corners of society. However, not all information that spreads is true: false information, commonly called a hoax, is also easily spread by the public, who tend to assume that everything circulating on the internet is true. It is not possible to tell directly whether a news item published on the internet is a hoax or valid. The test uses 740 random content/issue items verified by an institution, of which 370 are hoaxes and 370 are valid. Before classification, a preprocessing stage is performed and the TF-IDF equation is used to weight each feature; the data are then classified with the K-Nearest Neighbor algorithm, and the results are evaluated using 10-Fold Cross Validation. The test uses k values from 2 to 10. The optimal value is k = 4, with precision, recall, and F-Measure of 0.764856, 0.757583, and 0.751944 respectively and an accuracy of 75.4%.
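
To make the weighting and the reported metrics concrete, here is a minimal sketch of TF-IDF and of precision, recall, and F-Measure. The abstract does not say which TF-IDF variant was used, so the common form tf × log(N / df) is assumed; the toy documents and labels are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: once per document
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights


def precision_recall_f(actual, predicted, positive):
    """Compute precision, recall, and F-Measure for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical tokenized documents.
docs = [["free", "prize", "now"], ["prize", "news"], ["news", "report"]]
weights = tf_idf(docs)

# Hypothetical true and predicted labels.
actual = ["hoax", "hoax", "valid", "valid"]
predicted = ["hoax", "valid", "hoax", "valid"]
metrics = precision_recall_f(actual, predicted, positive="hoax")
print(weights[0], metrics)
```

A term that appears in fewer documents receives a larger weight, which is why "free" outweighs "prize" in the first toy document.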
