Classification of Radical Web Content in Indonesia using Web Content Mining and k-Nearest Neighbor Algorithm

2018 ◽  
Vol 5 (2) ◽  
pp. 328-348
Author(s):  
Muh Subhan ◽  
Amang Sudarsono ◽  
Ali Ridho Barakbah

Radical content, in the procedural sense, is content that provokes violence and spreads hatred and anti-nationalism. The definition of radical content differs from country to country; in Indonesia it is most closely associated with provocation and with ethnic and religious hatred, known in Indonesian as SARA. SARA content is very difficult to detect because of its large volume, its unstructured form, and noise that can lead to multiple interpretations. This problem can threaten religious unity and harmony. A system is therefore needed that can distinguish radical from non-radical content. We propose a text mining approach that uses a DF threshold and Human Brain for feature extraction. The system consists of several steps: data collection (which is part of preprocessing), text mining, feature selection, classification to group the data by class label, similarity calculation against the training data, and visualization of radical and non-radical content. The experimental results show that combining 10-fold cross-validation with k-Nearest Neighbor (kNN) as the classification method achieves 66.37% accuracy at k = 7 [1].
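The DF-threshold step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common definition of document frequency (the number of documents that contain a term) and keeps terms at or above a cutoff. The function name `select_terms_by_df`, the `min_df` parameter, and the toy documents are hypothetical and are not taken from the paper.

```python
from collections import Counter

def select_terms_by_df(documents, min_df):
    """Keep terms that occur in at least `min_df` documents (assumed DF rule)."""
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))  # count each term once per document
    return sorted(term for term, count in df.items() if count >= min_df)

# Hypothetical, already tokenized documents.
docs = [
    ["unity", "harmony", "news"],
    ["news", "hatred", "provocation"],
    ["news", "harmony", "sports"],
]
vocabulary = select_terms_by_df(docs, min_df=2)
print(vocabulary)
```

Raising `min_df` shrinks the vocabulary, which in turn reduces the dimensionality of the vectors that a kNN classifier has to compare.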

Author(s):  
Hanfei Zhang ◽  
Yumei Jian ◽  
Ping Zhou

A class correlation distance collaborative filtering recommendation algorithm is proposed to solve the problems of category judgment and distance metric in the traditional collaborative filtering recommendation algorithm. The algorithm takes advantage of the distance between samples of the same class and the class correlation distance. First, the class correlation distance between the training samples is calculated and stored. Second, the K nearest neighbor samples are selected, and the class correlation distance of the training samples and the difference ratio between the test samples and the training samples are calculated. Finally, the samples are classified into their types according to the difference ratio. The experimental results show that the algorithm, combined with user rating preference, achieves a lower MAE value and a better recommendation effect. As the K value changes, the CCDKNN algorithm clearly outperforms the KNN and DWKNN algorithms, and its accuracy is more stable. The algorithm improves the accuracy of similarity and prediction and performs better than the traditional algorithm.


Author(s):  
Tsehay Admassu Assegie

Phishing causes many problems in the business industry. Electronic commerce and electronic banking, such as mobile banking, involve a large number of online transactions. To secure such transactions, the features of legitimate websites must be distinguished from those of phishing websites. In this study, we collected data from the PhishTank public data repository and propose a K-Nearest Neighbors (KNN) based model for phishing attack detection. The proposed model detects phishing attacks through URL classification. The performance of the model is tested empirically and the results are analyzed. Experimental results on the test set show that the model is effective at phishing attack detection. Furthermore, the K value that gives the best accuracy is determined in order to improve detection performance. Overall, the average accuracy of the proposed model is 85.08%.


2020 ◽  
Vol 8 (3) ◽  
pp. 246-254
Author(s):  
Agus Subhan Akbar ◽  
R. Hadapiningradja Kusumodestoni

Hotel occupancy rates are the most important factor in hotel business management. Predictions of the rates for the next few months inform the manager's decisions on arranging and providing the necessary facilities. This study optimizes the lag parameter and the k value of the k-Nearest Neighbor algorithm on historical hotel occupancy data. The historical data were arranged as supervised training data, with the number of columns per row determined by the lag parameter and the number of prediction targets. The kNN algorithm was applied using 10-fold cross-validation with k values from 1 to 30. The optimal lag fell in the range 14-17 and the optimal k in the range 5-13 for predicting occupancy rates 1, 3, 6, 9, and 12 months ahead. The k values obtained do not follow the rule of thumb that sets k to the square root of the number of samples.
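The arrangement of a history series into supervised rows can be sketched as follows. This is an illustration only: it assumes one target per row, taken `horizon` steps after the end of the lag window, whereas the study may lay out several targets per row. The function name `make_lagged_rows` and the occupancy figures are hypothetical.

```python
def make_lagged_rows(series, lag, horizon):
    """Turn a 1-D series into (features, target) rows.

    features: `lag` consecutive past values
    target:   the value `horizon` steps after the last feature (assumed layout)
    """
    rows = []
    last_start = len(series) - lag - horizon
    for start in range(last_start + 1):
        features = series[start:start + lag]
        target = series[start + lag + horizon - 1]
        rows.append((features, target))
    return rows

# Hypothetical monthly occupancy percentages.
occupancy = [61, 64, 70, 68, 72, 75, 71]
rows = make_lagged_rows(occupancy, lag=3, horizon=2)
for features, target in rows:
    print(features, "->", target)
```

A larger `lag` gives each row more history but leaves fewer rows to train on, which is one reason the lag and k are tuned together.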


JURTEKSI ◽  
2021 ◽  
Vol 7 (2) ◽  
pp. 195-202
Author(s):  
Sri Ayu Rizky ◽  
Rolly Yesputra ◽  
Santoso Santoso

Abstract: In this research, a system was developed to predict whether a prospective borrower's loan repayments will run smoothly or not. An office clerk enters the prospective borrower's data that meet the criteria into the interface of a prediction application, where they are processed with a data mining method, namely the K-Nearest Neighbor algorithm, implemented with CodeIgniter 3. The Euclidean distances between the training data and the testing data, calculated on predetermined criteria, are displayed in a table sorted from smallest to largest that contains the 9 nearest neighbors, in line with the predetermined K value of 9. The dominant category among these nine neighbors is taken as the prediction. This dominant category can serve as a guideline that makes it easier for the leader to decide on the next prospective borrower.
Keywords: Data Mining; Euclidean; K-Nearest Neighbor; Prospective Borrowers
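The procedure described above (compute Euclidean distances, sort them in ascending order, keep the K nearest neighbors, take the dominant category) can be sketched in a few lines. The abstract fixes K at 9; the sketch uses K = 3 only because the toy data set is small. The feature values, the labels, and the function name `knn_predict` are hypothetical.

```python
import math
from collections import Counter

def knn_predict(training, query, k):
    """Return (dominant label, k nearest rows) using Euclidean distance."""
    # Sort training rows from smallest to largest distance to the query.
    ranked = sorted(training, key=lambda row: math.dist(row[0], query))
    nearest = ranked[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0], nearest

# Hypothetical two-feature borrower records: (features, category).
training = [
    ((1.0, 1.0), "smooth"),
    ((1.0, 2.0), "smooth"),
    ((2.0, 1.0), "smooth"),
    ((8.0, 8.0), "not smooth"),
    ((9.0, 8.0), "not smooth"),
]
label, neighbors = knn_predict(training, (2.0, 2.0), k=3)
print(label)
```

An odd K is a common choice for two-class problems because it avoids tied votes.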


2019 ◽  
Vol 1 (1) ◽  
pp. 30-36 ◽  
Author(s):  
Lalu Abd Rahman Hakim ◽  
Ahmad Ashril Rizal ◽  
Dwi Ratnasari

Students are important assets for an educational institution, so the rate at which students graduate on time deserves attention. The rise and fall in the percentage of students who complete their studies on time is one element of campus accreditation assessment. According to data from the Study Program Section, over the last 3 years only 25% of all students completed their studies on time. This study uses the K-Nearest Neighbor algorithm to identify student graduation in new cases by adapting solutions from previous cases that are close to the new case. The algorithm computes the closeness of the new case to the old cases; the majority class among the K closest cases then determines whether the student is predicted to graduate on time or not. This study uses Roger S. Pressman's waterfall method, namely Communication, Planning, Modeling, and Construction. In tests using K-Fold Cross Validation, the highest accuracy of the third model was 80% at the 4th fold and 61% at K = 1. Testing with the Confusion Matrix gave the highest accuracy of 98% at K = 1 for the "Timely" class and 98% at K = 2 for the "Not Timely" class.
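K-fold cross-validation, as used in the tests above, splits the data into folds and uses each fold once as the test set. The sketch below shows one simple way to produce the splits, assuming contiguous folds without shuffling; the study does not state how its folds were formed, and the name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, n_folds):
    """Split indices 0..n_samples-1 into contiguous (train, test) folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // n_folds] * n_folds
    for i in range(n_samples % n_folds):
        fold_sizes[i] += 1

    folds, start = [], 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds

folds = k_fold_indices(10, 4)
for number, (train, test) in enumerate(folds, start=1):
    print("fold", number, "test indices:", test)
```

Accuracy is then computed on each test fold and either reported per fold, as in the abstract, or averaged.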


2020 ◽  
Vol 5 (1) ◽  
pp. 63-71
Author(s):  
M. Pramadani Riyanis Putra ◽  
Kiki Rizky Nova Wardani

Facebook is a social networking application where users reveal a lot about themselves through their posts. The authors therefore want to know what information can be obtained about a user's personality. Data mining plays an important role here; it aims to transform raw data into a structure that can be understood for further use. Text mining refers to the process of retrieving high-quality information from text, and one of the classification methods that can be used is the K-Nearest Neighbor algorithm. Based on the Big Five personality theory, the study concluded that the accuracy obtained was 92.92% on 550 data items: Openness had the highest count with 239, followed by Extraversion with 173, Agreeableness with 50, Neuroticism with 33, and Conscientiousness with 16, while 39 could not be classified.


2020 ◽  
Vol 5 (1) ◽  
pp. 77-85
Author(s):  
Heru Pramono Hadi ◽  
Titien S. Sukamto

Public feedback on government services is an important element in evaluating and improving performance. The government therefore needs an effective, efficient, and systematic reporting method. Public feedback can take the form of complaints, requests for information, and aspirations. One channel for submitting public feedback is social media. Classifying the type of public report or feedback is important for speeding up the response process. The K-Nearest Neighbor algorithm, applied as a text mining method, is one solution that can help classify report types. With 930 training data and 100 test data drawn from public reports submitted through social media in 2017, the highest accuracy, 82%, was obtained at k = 11.


2022 ◽  
Vol 10 (2) ◽  
pp. 217
Author(s):  
I Wayan Santiyasa ◽  
Gede Putra Aditya Brahmantha ◽  
I Wayan Supriana ◽  
I GA Gede Arya Kadyanan ◽  
I Ketut Gede Suhartana ◽  
...  

Information is now very easy to obtain and can spread quickly to all corners of society. However, not all of the information that spreads is true. False information, commonly called a hoax, is also easily spread by members of the public, who tend to assume that everything circulating on the internet is true. It is not immediately apparent whether a news item published on the internet is a hoax or valid. The test uses 740 randomly selected content or issue items that have been verified by an institution, of which 370 are hoaxes and 370 are valid. Before classification, the data are preprocessed and the TF-IDF equation is used to obtain the weight of each feature. The data are then classified using the K-Nearest Neighbor algorithm, and the results are evaluated using 10-fold cross-validation. The test uses k values from 2 to 10. The optimal value is k = 4, with precision, recall, and F-measure of 0.764856, 0.757583, and 0.751944 respectively, and an accuracy of 75.4%.
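The TF-IDF weighting step can be illustrated with a short sketch. It assumes one common variant, raw term frequency multiplied by log(N / df); the abstract does not say which TF-IDF variant the authors used. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Weight each term as tf * log(N / df) (one common variant, assumed here)."""
    n_docs = len(documents)
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))  # document frequency: once per document

    weights = []
    for tokens in documents:
        tf = Counter(tokens)
        weights.append({term: tf[term] * math.log(n_docs / df[term]) for term in tf})
    return weights

# Hypothetical, already preprocessed documents.
docs = [["vaccine", "hoax", "hoax"], ["vaccine", "schedule"]]
weights = tf_idf(docs)
print(weights[0])
```

A term that appears in every document gets a weight of zero under this variant, so it contributes nothing to the distances that kNN compares.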

