Implementasi Distance Weighted K-Nearest Neighbor Untuk Klasifikasi Spam & Non-Spam Pada Komentar Instagram (Implementation of Distance Weighted K-Nearest Neighbor for Spam and Non-Spam Classification of Instagram Comments)

2020 ◽  
Vol 6 (2) ◽  
pp. 236
Author(s):  
Antonius Rachmat Chrismanto ◽  
Yuan Lukito ◽  
Anton Susilo

Instagram (IG) is one of the social media platforms most often used to share moments from its users' lives. Many public figures, including celebrities, also use it as their medium for sharing. However, the popularity of these celebrities leads some groups to post spam comments, which makes the comment threads confusing to read. The aim of this research is to implement the DWKNN algorithm for detecting spam comments on IG and to determine its accuracy. The DWKNN method is used as an improvement on the KNN method, with the system trained on randomly selected training data. After training, testing was carried out on test and training data, with the value of k and the percentage of features used as parameters, to evaluate and compare the KNN and DWKNN methods based on their classification results. The results show that DWKNN is more accurate than KNN, that differences in the value of k do not have a very significant impact on spam comment classification, and that feature selection (FS) gives good success rates when between 80% and 100% of the features are used. The best accuracy of KNN was 82.36%, whereas DWKNN reached 91.08% at FS 80%.
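
The difference between plain KNN voting and distance-weighted voting can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state which weighting function was used, so the sketch assumes Dudani's linear weights, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(train, query, k):
    """Return the k closest (distance, label) pairs, nearest first."""
    scored = ((euclidean(vec, query), label) for vec, label in train)
    return sorted(scored, key=lambda pair: pair[0])[:k]

def knn_predict(train, query, k=3):
    """Plain KNN: every neighbor casts one equal vote."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def dwknn_predict(train, query, k=3):
    """Distance-weighted KNN with linear weights (assumed scheme):
    the nearest neighbor gets weight 1, the k-th neighbor weight 0."""
    neighbors = k_nearest(train, query, k)
    d_near, d_far = neighbors[0][0], neighbors[-1][0]
    votes = defaultdict(float)
    for dist, label in neighbors:
        weight = 1.0 if d_far == d_near else (d_far - dist) / (d_far - d_near)
        votes[label] += weight
    return max(votes, key=votes.get)

# Hypothetical 2-D feature vectors: one spam comment close to the query,
# two non-spam comments farther away.
train = [
    ([0.0, 0.0], "spam"),
    ([3.0, 0.0], "non-spam"),
    ([3.1, 0.0], "non-spam"),
]
query = [0.5, 0.0]
print(knn_predict(train, query, k=3))    # majority vote
print(dwknn_predict(train, query, k=3))  # weighted vote
```

On this toy data the majority vote follows the two distant non-spam points, while the weighted vote follows the single close spam point, which is the kind of case where weighting by distance can change the outcome.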

2018 ◽  
Author(s):  
I Wayan Agus Surya Darma

Balinese character recognition is a technique for recognizing the features or patterns of Balinese characters. This research uses handwritten Balinese characters. Feature extraction is the process that obtains the features of a character; in this research it generates semantic and direction features of the handwritten characters. Recognition uses the K-Nearest Neighbor algorithm to recognize 81 handwritten Balinese characters. The features of the test character images are compared with reference features. With K=3 and reference=10, the recognition system achieved a success rate of 97.53%.


10.29007/5gzr ◽  
2018 ◽  
Author(s):  
Cezary Kaliszyk ◽  
Josef Urban

Two complementary AI methods are used to improve the strength of the AI/ATP service for proving conjectures over the HOL Light and Flyspeck corpora. First, several schemes for frequency-based feature weighting are explored in combination with a distance-weighted k-nearest-neighbor classifier. This results in a 16% improvement (39.0% to 45.5% of Flyspeck problems solved) in the overall strength of the service when using 14 CPUs and 30 seconds. The best premise-selection/ATP combination is improved from 24.2% to 31.4%, i.e. by 30%. A smaller improvement is obtained by evolving targeted E prover strategies on two particular premise selections, using the Blind Strategymaker (BliStr) system. This raises the performance of the best AI/ATP method from 31.4% to 34.9%, i.e. by 11%, and raises the current 14-CPU power of the service to 46.9%.
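
The percentages quoted in this abstract are relative gains over the earlier success rate, not differences in percentage points. A short check of the arithmetic, using only the figures given above (the helper name is hypothetical):

```python
def relative_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0

# (before %, after %) pairs taken from the abstract.
reported = {
    "overall service strength": (39.0, 45.5),
    "best premise-selection/ATP combination": (24.2, 31.4),
    "best AI/ATP method after BliStr": (31.4, 34.9),
}

for name, (before, after) in reported.items():
    print(f"{name}: {relative_gain(before, after):.1f}% relative gain")
```

The three gains come out at about 16.7%, 29.8%, and 11.1%, consistent with the rounded 16%, 30%, and 11% stated in the abstract.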


2021 ◽  
Author(s):  
Gothai E ◽  
Usha Moorthy ◽  
Sathishkumar V E ◽  
Abeer Ali Alnuaim ◽  
Wesam Atef Hatamleh ◽  
...  

With the evolution of Internet standards and advances in Internet and mobile technologies, especially since web 4.0, more and more web and mobile applications are emerging, such as e-commerce, social networks, online gaming, and Internet of Things based applications. Because these applications are deployed and accessed concurrently on the Internet and on mobile devices, the amount and variety of data generated increase exponentially, and the new era of Big Data has come into existence. Presently available data structures and data analysis algorithms are not capable of handling such Big Data. Hence, there is a need for scalable, flexible, parallel, and intelligent data analysis algorithms to handle and analyze complex, massive data. In this article, we propose a novel distributed supervised machine learning algorithm, called MR-DWkNN, based on the MapReduce programming model and the Distance Weighted k-Nearest Neighbor algorithm, to process and analyze Big Data in a Hadoop cluster environment. The proposed distributed algorithm performs both regression and classification tasks on large-volume Big Data applications. Three performance metrics are used to evaluate the proposed MR-DWkNN algorithm: Root Mean Squared Error (RMSE) and the coefficient of determination (R²) for regression tasks, and accuracy for classification tasks. Extensive experimental results show an average increase of 3–4.5% in prediction and classification performance compared with the standard distributed k-NN algorithm, and a considerable decrease in RMSE, with good scalability and speedup, which demonstrates its effectiveness in Big Data predictive and classification applications.
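
The three metrics named in this abstract have standard definitions, sketched below in plain Python. This shows only how the scores are computed on a single machine; it is not the MR-DWkNN algorithm or its MapReduce implementation, and the sample values are made up for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def accuracy(labels_true, labels_pred):
    """Fraction of predictions that match the true label."""
    hits = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return hits / len(labels_true)

# Illustrative values only.
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
print(r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
print(accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"]))
```

Lower RMSE and higher R² indicate better regression fits, while accuracy is the share of correctly classified instances.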


2022 ◽  
Vol 8 (1) ◽  
pp. 50
Author(s):  
Rifki Indra Perwira ◽  
Bambang Yuwono ◽  
Risya Ines Putri Siswoyo ◽  
Febri Liantoni ◽  
Hidayatulah Himawan

State universities provide a library as a facility to support students' education and research, holding books, journals, and final assignments. An intelligent document classification system is needed to make things easier for library visitors in higher education, as a service to students. The documents in the library are generally the results of research. Complaints related to the imbalance of text data and categories, to irrelevant document titles, and to words with ambiguous meanings when searching for documents are the main reasons a classification system is needed. This research uses k-Nearest Neighbor (k-NN) to categorize documents by study interest, with information gain feature selection to handle unbalanced data and cosine similarity to measure the distance between test and training data. In tests conducted with 276 training data, the best result with information gain feature selection, using 80% training data and 20% test data, was an accuracy of 87.5% at k=5. The highest accuracy, 92.9%, was achieved without information gain feature selection, with 90% training data and 10% test data and k=5, 7, and 9. The paper concludes that the system is more accurate without information gain feature selection than with it, because every word in a document title is considered to play an essential role in forming the classification.
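
Classifying a document title with k-NN and cosine similarity can be sketched as below. The sketch assumes raw term counts over whitespace-separated tokens; the paper's actual preprocessing, term weighting, and information gain step are not described in the abstract and are not reproduced here. The titles and category names are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def vectorize(title):
    """Bag-of-words term counts for a lower-cased title."""
    return Counter(title.lower().split())

def knn_classify(train, title, k=5):
    """Majority label among the k training titles most similar to `title`."""
    query = vectorize(title)
    scored = [(cosine_similarity(vectorize(t), query), label) for t, label in train]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    return Counter(label for _, label in top).most_common(1)[0][0]

# Hypothetical training titles and study-interest categories.
train = [
    ("neural network image classification", "AI"),
    ("deep neural network training", "AI"),
    ("database query optimization", "Databases"),
]
print(knn_classify(train, "query optimization for database systems", k=1))
```

Because every title word contributes to the vectors here, the sketch corresponds to the setting without feature selection, which is the configuration the abstract reports as more accurate.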


2017 ◽  
Vol 25 (4) ◽  
pp. 103-124 ◽  
Author(s):  
Le Nguyen Bao ◽  
Dac-Nhuong Le ◽  
Gia Nhu Nguyen ◽  
Le Van Chung ◽  
Nilanjan Dey

Face recognition is an important step that can affect the performance of the system. In this paper, the authors propose a novel Max-Min Ant System algorithm for optimal feature selection based on Discrete Wavelet Transform features for video-based face recognition. The length of the culled feature vector is adopted as heuristic information for the ants' pheromone in their algorithm. They select the optimal feature subset in terms of the shortest feature length and the best performance of a k-nearest neighbor classifier. Experiments on face recognition show that the authors' algorithm can be easily implemented and requires no a priori information about the features. The evaluated performance of their algorithm is better than that of previous approaches to feature selection.


Author(s):  
Fei-Long Chen ◽  
Feng-Chia Li

Credit scoring is an important topic for businesses and socio-economic establishments that collect huge amounts of data with the intention of avoiding wrong decisions. In this paper, the authors propose four approaches that combine four well-known classifiers: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Back-Propagation Network (BPN) and Extreme Learning Machine (ELM). These classifiers are used to find a suitable hybrid classifier combined with feature selection that retains sufficient information for classification purposes. In this regard, different credit scoring combinations are constructed by selecting features with the four approaches and classifiers. Two credit data sets from the University of California, Irvine (UCI), are chosen to evaluate the accuracy of the various hybrid feature selection models. The procedures that are part of the proposed approaches are described and then evaluated for their performance.


2014 ◽  
Vol 701-702 ◽  
pp. 8-12 ◽  
Author(s):  
Gang Tao ◽  
Yong Gang Yan ◽  
Jiao Zou ◽  
Jun Liu

As a nonparametric classification algorithm, K-Nearest Neighbor (KNN) is very efficient and easy to implement. However, for large datasets, the computational cost of classifying instances with KNN can be high. One way to solve this problem is the condensing approach, which removes instances that add computational burden without contributing to better classification accuracy. This paper proposes a novel weighted-distance KNN algorithm based on an instance condensing algorithm. The idea is to extract representative instances and use the result as a new training sample set, and to use the distance-weighted WDKNN algorithm to improve prediction accuracy. The experiments show that the proposed strategy can dramatically shorten running time compared with traditional KNN. On average, the speedup ratio improves by 90%, while classification accuracy decreases by only 2%.
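
The condensing idea can be illustrated with a small sketch. The abstract does not name the condensing procedure the authors use, so this example substitutes Hart's classic condensed nearest neighbor rule as a stand-in: an instance is kept only if the instances kept so far would misclassify it under 1-NN. Function names and data are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_label(store, vec):
    """Label of the stored instance closest to `vec` (1-NN)."""
    return min(store, key=lambda item: dist(item[0], vec))[1]

def condense(train):
    """Hart-style condensing (assumed stand-in, not the paper's method).
    Repeatedly scan the training set and keep every instance that the
    current store misclassifies, until a full pass adds nothing."""
    store = [train[0]]
    changed = True
    while changed:
        changed = False
        for vec, label in train:
            if nearest_label(store, vec) != label and (vec, label) not in store:
                store.append((vec, label))
                changed = True
    return store

# Two well-separated hypothetical clusters.
train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((10, 10), "b"), ((10, 11), "b"), ((11, 10), "b"),
]
reduced = condense(train)
print(len(train), "->", len(reduced), "instances")
```

A distance-weighted vote would then be run against `reduced` instead of the full training set, which is where the time saving comes from; the price is that some boundary information may be lost, in line with the small accuracy drop the abstract reports.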


Author(s):  
FALGUNI N. PATEL ◽  
NEHA R. SONI

The k-Nearest Neighbor rule is a well-known technique for text classification because it is simple, effective, and easily modified. In this paper, we briefly discuss text classification and the k-NN algorithm, and analyse the sensitivity of the method to the value of k. To overcome this problem, we introduce an inverse cosine distance weighted voting function for text classification. As a result, the accuracy of text classification increases compared with the simple k-Nearest Neighbor classifier, even when a large value of k is chosen. Experimental results show that the proposed weighting function is more effective when an application has a large text dataset with some dominating categories.
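
One way to read "inverse cosine distance weighted voting" is that each neighbor's vote is weighted by the reciprocal of its cosine distance to the query. The abstract does not give the exact formula, so the sketch below is an assumption: the weight 1 / (distance + eps), the `eps` constant, and the toy vectors are all illustrative.

```python
import math
from collections import defaultdict

def cosine_distance(a, b):
    """1 - cosine similarity, clamped at 0 to absorb rounding error."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return max(0.0, 1.0 - dot / (norm_a * norm_b))

def weighted_vote(train, query, k, eps=1e-9):
    """k-NN where each neighbor votes with weight 1 / (distance + eps)."""
    scored = ((cosine_distance(vec, query), label) for vec, label in train)
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    votes = defaultdict(float)
    for distance, label in nearest:
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)

# Hypothetical term-weight vectors: one "sports" document near the query
# and four documents from a dominating "politics" category.
train = [
    ((1.0, 0.0), "sports"),
    ((0.0, 1.0), "politics"),
    ((0.1, 1.0), "politics"),
    ((0.2, 1.0), "politics"),
    ((0.0, 2.0), "politics"),
]
print(weighted_vote(train, (1.0, 0.05), k=5))
```

With k equal to the whole training set, an unweighted majority vote would always return the dominating category; the inverse-distance weights let the single close neighbor outweigh the four distant ones, which matches the robustness to large k that the abstract describes.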

