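Implementation of Distance-Weighted K-Nearest Neighbor for Spam and Non-Spam Classification of Instagram Comments (Implementasi Distance Weighted K-Nearest Neighbor Untuk Klasifikasi Spam & Non-Spam Pada Komentar Instagram)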

Instagram (IG) is one of the social media platforms most often used by its users to share moments. Many public figures, including celebrities, also use it as their sharing medium. However, the popularity of these celebrities leads some people to post spam comments, which makes the comments confusing to read. The aim of this study is to implement the DWKNN algorithm and determine its accuracy for detecting spam comments on IG. DWKNN is used as an improvement on KNN, with the system trained on randomly selected training data. After training, tests are run on the test and training data, using the value of k and the percentage of features as parameters, to test and compare KNN and DWKNN on their classification results. The study shows that DWKNN is more accurate than KNN, that varying k has no substantial effect on spam-comment classification, and that feature selection (FS) gives a good success rate when 80% to 100% of the features are used. The best accuracy of KNN is 82.36%, whereas DWKNN reaches 91.08% at FS 80%.
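
The abstract does not state which weighting function its DWKNN uses. As an illustration only, the sketch below assumes Dudani-style linear weights (1 for the nearest neighbor, 0 for the k-th) and Euclidean distance; the function name `dwknn_predict` and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def dwknn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs; query: a feature_vector."""
    # Sort training points by Euclidean distance to the query and keep the k nearest.
    neighbors = sorted(
        ((math.dist(x, query), label) for x, label in train),
        key=lambda pair: pair[0],
    )[:k]
    d_near, d_far = neighbors[0][0], neighbors[-1][0]
    votes = defaultdict(float)
    for d, label in neighbors:
        # Assumed Dudani-style weight: 1 for the nearest neighbor, 0 for the k-th.
        w = 1.0 if d_far == d_near else (d_far - d) / (d_far - d_near)
        votes[label] += w
    return max(votes, key=votes.get)

# Toy one-dimensional data: one "spam" point close to the query, two "ham" points far away.
train = [((0.0,), "spam"), ((1.0,), "ham"), ((1.1,), "ham")]
print(dwknn_predict(train, (0.1,), k=3))
```

In this toy case an unweighted majority vote with k=3 would pick "ham" (two of three neighbors), while the weighted vote favors the single much closer "spam" point.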

Handwritten Balinesse Character Recognition using K-Nearest Neighbor

10.31227/osf.io/z6m8u ◽ 2018

Author(s): I Wayan Agus Surya Darma

Keyword(s): Feature Extraction ◽ Success Rate ◽ Character Recognition ◽ Nearest Neighbor ◽ Recognition System ◽ Extraction Process ◽ K Nearest Neighbor ◽ Nearest Neighbor Algorithm ◽ K Nearest Neighbor Algorithm ◽ Character Feature

Balinese character recognition is a technique for recognizing the features or patterns of Balinese characters. These features are obtained through a feature extraction process. This research uses handwritten Balinese characters. The feature extraction process generates semantic and direction features of the handwritten characters. Recognition uses the K-Nearest Neighbor algorithm to recognize 81 handwritten Balinese characters. The features of the test character images are compared with reference features. With K=3 and reference=10, the recognition system achieves a success rate of 97.53%.

Stronger Automation for Flyspeck by Feature Weighting and Strategy Evolution

10.29007/5gzr ◽ 2018

Author(s): Cezary Kaliszyk ◽ Josef Urban

Keyword(s): Nearest Neighbor ◽ Feature Weighting ◽ K Nearest Neighbor ◽ Nearest Neighbor Classifier ◽ Hol Light ◽ Distance Weighted ◽ Neighbor Classifier

Two complementary AI methods are used to improve the strength of the AI/ATP service for proving conjectures over the HOL Light and Flyspeck corpora. First, several schemes for frequency-based feature weighting are explored in combination with a distance-weighted k-nearest-neighbor classifier. This results in a 16% improvement (from 39.0% to 45.5% of Flyspeck problems solved) in the overall strength of the service when using 14 CPUs and 30 seconds. The best premise-selection/ATP combination is improved from 24.2% to 31.4%, i.e. by 30%. A smaller improvement is obtained by evolving targeted E prover strategies on two particular premise selections, using the Blind Strategymaker (BliStr) system. This raises the performance of the best AI/ATP method from 31.4% to 34.9%, i.e. by 11%, and raises the current 14-CPU power of the service to 46.9%.

Map-Reduce based Distance Weighted k-Nearest Neighbor Machine Learning Algorithm for Big Data Applications

10.21203/rs.3.rs-684319/v1 ◽ 2021

Author(s): Gothai E ◽ Usha Moorthy ◽ Sathishkumar V E ◽ Abeer Ali Alnuaim ◽ Wesam Atef Hatamleh ◽ ...

Keyword(s): Big Data ◽ Nearest Neighbor ◽ Mean Squared Error ◽ Learning Algorithm ◽ Machine Learning Algorithm ◽ K Nearest Neighbor ◽ Squared Error ◽ Big Data Applications ◽ Distance Weighted ◽ Classification Tasks

With the evolution of Internet standards and advancements in various Internet and mobile technologies, especially since web 4.0, more and more web and mobile applications are emerging, such as e-commerce, social networks, online gaming applications and Internet of Things based applications. Due to the deployment of these applications and concurrent access to them on the Internet and mobile devices, the amount and variety of data generated increase exponentially, and the new era of Big Data has come into existence. Presently available data structures and data analysis algorithms are not capable of handling such Big Data. Hence, there is a need for scalable, flexible, parallel and intelligent data analysis algorithms to handle and analyze complex, massive data. In this article, we propose a novel distributed supervised machine learning algorithm, called MR-DWkNN, based on the MapReduce programming model and the Distance Weighted k-Nearest Neighbor algorithm, to process and analyze Big Data in the Hadoop cluster environment. The proposed distributed algorithm is based on supervised learning and performs both regression and classification tasks on large-volume Big Data applications. Three performance metrics are used to measure the performance of the proposed MR-DWkNN algorithm: Root Mean Squared Error (RMSE) and the determination coefficient (R2) for regression tasks, and accuracy for classification tasks. The extensive experimental results show an average increase of 3–4.5% in prediction and classification performance compared with the standard distributed k-NN algorithm, and a considerable decrease in RMSE, with good scalability and speedup, which demonstrates its effectiveness in Big Data predictive and classification applications.
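
The abstract gives no implementation details for MR-DWkNN. The following is only a single-process sketch of how a map/reduce split of distance-weighted k-NN regression could look, assuming Euclidean distance and inverse-distance weights; it does not use Hadoop, and all function names are hypothetical.

```python
import heapq
import math

def map_partition(partition, query, k):
    """Map step: the k nearest (distance, target) pairs within one data partition."""
    return heapq.nsmallest(k, ((math.dist(x, query), y) for x, y in partition))

def reduce_neighbors(local_results, k):
    """Reduce step: merge per-partition candidates into the global k nearest."""
    return heapq.nsmallest(k, (pair for result in local_results for pair in result))

def weighted_regression(neighbors, eps=1e-9):
    """Assumed inverse-distance weighted mean of the neighbor targets."""
    weights = [1.0 / (d + eps) for d, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

# Two toy partitions standing in for data blocks on different cluster nodes.
partitions = [
    [((0.0,), 1.0), ((10.0,), 5.0)],
    [((1.0,), 2.0), ((12.0,), 9.0)],
]
query, k = (0.5,), 2
local = [map_partition(p, query, k) for p in partitions]
prediction = weighted_regression(reduce_neighbors(local, k))
print(prediction)
```

Keeping only the local top k per partition is enough because any point among the global k nearest must also be among the k nearest of its own partition.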

Effect of information gain on document classification using k-nearest neighbor

10.26594/register.v8i1.2397 ◽ 2022 ◽ Vol 8 (1) ◽ pp. 50

Author(s): Rifki Indra Perwira ◽ Bambang Yuwono ◽ Risya Ines Putri Siswoyo ◽ Febri Liantoni ◽ Hidayatulah Himawan

Keyword(s): Feature Selection ◽ Test Data ◽ Nearest Neighbor ◽ Intelligent System ◽ Information Gain ◽ Training Data ◽ State Universities ◽ Features Selection ◽ K Nearest Neighbor ◽ Support Students

State universities have libraries as facilities to support students' education and science; these hold various books, journals, and final assignments. An intelligent system for classifying documents is needed to make things easier for library visitors in higher education, as a form of service to students. The documents in the library are generally the results of research. Various complaints about imbalanced text data and categories, based on irrelevant document titles and words with ambiguous meanings when searching for documents, are the main reasons a classification system is needed. This research uses k-Nearest Neighbor (k-NN) to categorize documents by study interest, with information gain feature selection to handle unbalanced data and cosine similarity to measure the distance between test and training data. In tests conducted with 276 training data, the best result with information gain feature selection is an accuracy of 87.5%, obtained with 80% training data, 20% test data, and k=5. The highest accuracy, 92.9%, is achieved without information gain feature selection, with 90% training data, 10% test data, and k=5, 7, and 9. The paper concludes that the system is more accurate without information gain feature selection than with it, because every word in a document title is considered to play an essential role in forming the classification.
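
To make the feature-selection step concrete, here is a minimal sketch of the textbook information gain of a term with respect to document categories, assuming binary term presence. It is not the authors' code; the function names and the four toy titles are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of category labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """docs: list of token sets; labels: one category per document."""
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]
    # Expected entropy of the categories after splitting on presence of the term.
    conditional = sum(
        (len(subset) / len(labels)) * entropy(subset)
        for subset in (with_term, without_term)
        if subset
    )
    return entropy(labels) - conditional

# Toy document titles as token sets, with hypothetical study-interest categories.
docs = [{"neural", "network"}, {"neural", "image"}, {"database", "query"}, {"database", "index"}]
labels = ["AI", "AI", "DB", "DB"]
print(information_gain(docs, labels, "neural"), information_gain(docs, labels, "image"))
```

A selection step would rank terms by this score and keep the top ones. With titles as short as these, discarding low-scoring words can remove most of a document's content, which is consistent with the abstract's conclusion.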

Flood Prediction Using Inverse Distance Weighted Interpolation of K-Nearest Neighbor Points

10.1109/igarss47720.2021.9553774 ◽ 2021

Author(s): Satria Nusa Paradilaga ◽ Margaretha Sulistyoningsih ◽ Rosbintarti Kartika Lestari ◽ Agatha Padma Laksitaningtyas

Keyword(s): Nearest Neighbor ◽ K Nearest Neighbor ◽ Weighted Interpolation ◽ Flood Prediction ◽ Inverse Distance Weighted ◽ Distance Weighted ◽ Inverse Distance Weighted Interpolation ◽ Inverse Distance

MMAS Algorithm for Features Selection Using 1D-DWT for Video-Based Face Recognition in the Online Video Contextual Advertisement User-Oriented System

Journal of Global Information Management ◽ 10.4018/jgim.2017100107 ◽ 2017 ◽ Vol 25 (4) ◽ pp. 103-124 ◽ Cited By ~ 1

Author(s): Le Nguyen Bao ◽ Dac-Nhuong Le ◽ Gia Nhu Nguyen ◽ Le Van Chung ◽ Nilanjan Dey

Keyword(s): Feature Selection ◽ Face Recognition ◽ Nearest Neighbor ◽ Discrete Wavelet ◽ Feature Subset ◽ Features Selection ◽ Ant System ◽ K Nearest Neighbor ◽ Video Based Face Recognition ◽ Optimal Feature

Face recognition is an important step that can affect the performance of the system. In this paper, the authors propose a novel Max-Min Ant System algorithm for optimal feature selection based on Discrete Wavelet Transform features for video-based face recognition. The length of the culled feature vector is adopted as heuristic information for the ants' pheromone in their algorithm. They select the optimal feature subset in terms of the shortest feature length and the best performance of a k-nearest neighbor classifier. Experiments on face recognition show that the authors' algorithm can be easily implemented without any a priori information about the features. The evaluated performance of their algorithm is better than that of previous approaches to feature selection.

Comparison of the Hybrid Credit Scoring Models Based on Various Classifiers

International Journal of Intelligent Information Technologies ◽ 10.4018/jiit.2010070104 ◽ 2010 ◽ Vol 6 (3) ◽ pp. 56-74 ◽ Cited By ~ 15

Author(s): Fei-Long Chen ◽ Feng-Chia Li

Keyword(s): Nearest Neighbor ◽ Credit Scoring ◽ Back Propagation ◽ Support Vector ◽ Sufficient Information ◽ Features Selection ◽ K Nearest Neighbor ◽ Hybrid Features ◽ Wrong Decision ◽ Propagation Network

Credit scoring is an important topic for businesses and socio-economic establishments that collect huge amounts of data with the intention of avoiding wrong decisions. In this paper, the authors propose four approaches that combine four well-known classifiers: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Back-Propagation Network (BPN) and Extreme Learning Machine (ELM). These classifiers are used to find a suitable hybrid classifier combination with feature selection that retains sufficient information for classification purposes. In this regard, different credit scoring combinations are constructed by selecting features with the four approaches and classifiers. Two credit data sets from the University of California, Irvine (UCI), are chosen to evaluate the accuracy of the various hybrid feature selection models. The procedures that are part of the proposed approaches are described and then evaluated for their performance.

A Novel Weighted Distance KNN Algorithm Based on Instances Condensing

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.701-702.8 ◽ 2014 ◽ Vol 701-702 ◽ pp. 8-12 ◽ Cited By ~ 1

Author(s): Gang Tao ◽ Yong Gang Yan ◽ Jiao Zou ◽ Jun Liu

Keyword(s): Classification Accuracy ◽ Nearest Neighbor ◽ Training Sample ◽ K Nearest Neighbor ◽ Large Dataset ◽ Weighted Distance ◽ Time Consumption ◽ Distance Weighted ◽ Nonparametric Classification ◽ Sample Set

As a nonparametric classification algorithm, K-Nearest Neighbor (KNN) is very efficient and can be easily implemented. However, for large datasets, the computational demands of classifying instances using KNN can be expensive. One way to solve this problem is the condensing approach, which removes instances that add computational burden but do not contribute to better classification accuracy. This paper proposes a novel weighted distance KNN algorithm based on an instance condensing algorithm. The proposed idea is to extract some representative instances and take the processed result as a new training sample set. The distance-weighted WDKNN algorithm is then used to improve the prediction accuracy. Experiments show that the proposed strategy can dramatically shorten the time consumption compared with traditional KNN. On average, the speedup ratio improves by 90% while classification accuracy decreases by only 2%.
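
The abstract does not describe its condensing procedure. To illustrate the general idea, the sketch below uses Hart's classic condensed nearest neighbor rule as a stand-in, which may differ from the paper's method; the function names and toy data are hypothetical.

```python
import math

def nearest_label(store, x):
    """Label of the stored instance closest to x (1-NN, Euclidean distance)."""
    return min(store, key=lambda item: math.dist(item[0], x))[1]

def condense(train):
    """Hart-style condensing: keep only instances the current subset misclassifies."""
    store = [train[0]]
    changed = True
    while changed:
        changed = False
        for x, label in train:
            if (x, label) in store:
                continue  # already kept; also guards against re-adding duplicates
            if nearest_label(store, x) != label:
                store.append((x, label))
                changed = True
    return store

# Toy data: two well-separated clusters, so one representative per cluster suffices.
train = [((0.0,), "a"), ((0.1,), "a"), ((0.2,), "a"), ((5.0,), "b"), ((5.1,), "b")]
condensed = condense(train)
print(condensed)
```

A distance-weighted k-NN classifier would then be run against `condensed` instead of the full training set, trading a small loss in accuracy for fewer distance computations.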

INCREASING ACCURACY OF K-NEAREST NEIGHBOR CLASSIFIER FOR TEXT CLASSIFICATION

International Journal of Computer Science and Informatics ◽ 10.47893/ijcsi.2014.1183 ◽ 2014 ◽ pp. 114-119

Author(s): FALGUNI N. PATEL ◽ NEHA R. SONI

Keyword(s): Text Classification ◽ Nearest Neighbor ◽ K Nearest Neighbor ◽ K Value ◽ Nearest Neighbor Classifier ◽ Distance Weighted ◽ Nearest Neighbor Rule ◽ Sensitivity Problem ◽ Cosine Distance ◽ Neighbor Classifier

The k-Nearest Neighbor rule is a well-known technique for text classification, owing to its simplicity, effectiveness, and ease of modification. In this paper, we briefly discuss text classification and the k-NN algorithm, and analyse the sensitivity of the results to the value of k. To overcome this problem, we introduce an inverse cosine distance weighted voting function for text classification. As a result, the accuracy of text classification increases compared with the simple k-Nearest Neighbor classifier, even if a large value of k is chosen. Experimental results show that the proposed weighting function is more effective when an application has a large text dataset with some dominating categories.
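
The exact form of the voting function is not given in the abstract. One plausible reading, sketched here as an assumption, weights each neighbor's vote by 1 / (cosine distance) over raw term-frequency vectors; `cosine_distance`, `classify`, and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cosine_distance(a, b):
    """1 - cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / norm if norm else 0.0)

def classify(train, text, k=3, eps=1e-9):
    """train: list of (document_text, label) pairs; text: the document to label."""
    query = Counter(text.lower().split())
    nearest = sorted(
        ((cosine_distance(Counter(doc.lower().split()), query), label) for doc, label in train),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for dist, label in nearest:
        # Assumed weighting: closer documents cast larger votes; eps avoids division by zero.
        votes[label] += 1.0 / (dist + eps)
    return max(votes, key=votes.get)

# Toy corpus in which "ham" is the dominating category.
train = [
    ("cheap pills buy now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch plans for friday", "ham"),
]
print(classify(train, "buy cheap pills", k=3))
```

With k equal to the whole toy corpus, an unweighted vote would return the dominating category "ham", while the weighted vote still returns "spam". This is the insensitivity to large k that the abstract describes.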

A Reappraisal of Distance-Weighted K-Nearest Neighbor Classification for Pattern Recognition with Missing Data

IEEE Transactions on Systems Man and Cybernetics ◽ 10.1109/tsmc.1981.4308660 ◽ 1981 ◽ Vol 11 (3) ◽ pp. 241-243 ◽ Cited By ~ 23

Keyword(s): Pattern Recognition ◽ Missing Data ◽ Nearest Neighbor ◽ K Nearest Neighbor ◽ Nearest Neighbor Classification ◽ Distance Weighted ◽ Neighbor Classification
