A novel multi-label classification algorithm based on K-nearest neighbor and random walk

2020 ◽  
Vol 16 (3) ◽  
pp. 155014772091189 ◽  
Author(s):  
Zhen-Wu Wang ◽  
Si-Kai Wang ◽  
Ben-Ting Wan ◽  
William Wei Song

The multi-label classification problem occurs in many real-world tasks where an object is naturally associated with multiple labels, that is, concepts. The integration of the random walk approach into multi-label classification methods has attracted many researchers' attention. One challenge of random walk-based multi-label classification algorithms is constructing the random walk graph, which may lead to poor classification quality and high algorithmic complexity. In this article, we propose a novel multi-label classification algorithm based on the random walk graph and the K-nearest neighbor algorithm (named MLRWKNN). This method constructs the vertex set of the random walk graph from the K-nearest neighbor training samples of a given test instance and the edge set from the correlations among the labels of those training samples, thus considerably reducing time and space overhead. The proposed method improves the similarity measurement by differentiating and integrating discrete and continuous features, which reflects the relationships between instances more accurately. A label prediction method is devised to reduce the subjectivity of the traditional threshold method. Experimental results on four metrics demonstrate that the proposed method outperforms seven state-of-the-art multi-label classification algorithms used for comparison and achieves a significant improvement in multi-label classification.
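The construction described above can be illustrated with a short, hedged sketch (not the authors' MLRWKNN code): for one test instance, the vertices are its K nearest training samples, edge weights come from label-set overlap between those samples, and a random walk with restart scores the candidate labels. The function name, the Jaccard edge weighting, and the restart parameter are our own illustrative choices.

```python
# Illustrative sketch (not the authors' MLRWKNN): score labels for one test
# instance by random-walking over a graph built from its K nearest neighbors.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_random_walk_scores(X_train, Y_train, x_test, k=10, restart=0.15, iters=50):
    """X_train: (n, d) features; Y_train: (n, q) binary label matrix."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    idx = nn.kneighbors(x_test.reshape(1, -1), return_distance=False)[0]
    Yk = Y_train[idx]                      # labels of the K neighbors (k x q)

    # Edge weights: label-set overlap (Jaccard) between neighbor pairs.
    inter = Yk @ Yk.T
    union = Yk.sum(1, keepdims=True) + Yk.sum(1) - inter
    W = np.where(union > 0, inter / np.maximum(union, 1), 0.0)
    np.fill_diagonal(W, 0.0)

    # Row-normalized transition matrix; uniform restart vector.
    row = W.sum(1, keepdims=True)
    P = np.where(row > 0, W / np.maximum(row, 1e-12), 1.0 / k)
    p = np.full(k, 1.0 / k)
    for _ in range(iters):                 # random walk with restart
        p = (1 - restart) * p @ P + restart / k

    # Label scores: walk probability mass reaching neighbors carrying each label.
    return p @ Yk                          # shape (q,)
```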

Author(s):  
Jiahua Jin ◽  
Lu Lu

Hotel social media provides access to dissatisfied customers and their experiences with services. However, owing to the massive number of topics and posts in social media and the sparse distribution of complaint-related posts, manually identifying complaints is inefficient and time-consuming. In this study, we propose a supervised learning method comprising training-sample enlargement and classifier construction. We first identified reliable complaint and non-complaint samples from the unlabeled dataset by using a small set of labeled samples as training data. Combining the labeled samples and the enlarged samples, the support vector machine and k-nearest neighbor classification algorithms were then adopted to build binary classifiers during the classifier construction process. Experimental results indicate that the proposed method can identify complaints from social media efficiently, especially when the number of labeled training samples is small. This study provides an efficient approach for hotel companies to distinguish a particular kind of consumer complaint from the large amount of unrelated information in hotel social media.
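A rough sketch of this two-stage pipeline is given below, assuming TF-IDF features, a probabilistic SVM as the seed model, and a 0.9 confidence threshold for pseudo-labeling; these details are our assumptions, not the paper's exact settings.

```python
# Rough sketch of the two-stage idea: enlarge a small labeled set with
# confidently predicted unlabeled posts, then train SVM / k-NN classifiers.
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

def enlarge_and_train(labeled_texts, labels, unlabeled_texts, threshold=0.9):
    vec = TfidfVectorizer(min_df=2)
    X_lab = vec.fit_transform(labeled_texts)
    X_unl = vec.transform(unlabeled_texts)
    labels = np.asarray(labels)

    # Step 1: training-sample enlargement -- keep unlabeled posts that the seed
    # classifier labels with high confidence (pseudo-labels).
    seed = SVC(kernel="linear", probability=True).fit(X_lab, labels)
    proba = seed.predict_proba(X_unl)
    keep = proba.max(axis=1) >= threshold
    X_all = vstack([X_lab, X_unl[keep]])
    y_all = np.concatenate([labels, seed.classes_[proba[keep].argmax(axis=1)]])

    # Step 2: classifier construction on the enlarged training set.
    svm = SVC(kernel="linear").fit(X_all, y_all)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_all, y_all)
    return vec, svm, knn
```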


2021 ◽  
Vol 15 (4) ◽  
pp. 735-744
Author(s):  
Putri Sri Astuti ◽  
Memi Nor Hayati ◽  
Rito Goejantoro

Classification is the process of grouping objects that have the same characteristics into several categories. This study applies a combination of classification algorithms, namely Bootstrap Aggregating K-Nearest Neighbor, to credit scoring analysis. The aim is to classify the credit payment status of electronic goods and furniture at PT KB Finansia Multi Finance in 2020 and to determine the resulting level of accuracy. Credit payment status is grouped into two categories, namely smooth and not smooth. Seven independent variables are used to describe the characteristics of the debtor: age, number of dependents, length of stay, years of service, income, amount of payment, and payment period. The application of the classification algorithm to credit scoring analysis is expected to assist creditors in deciding whether to accept or reject credit applications from prospective debtors. The results show that the accuracy obtained by the Bootstrap Aggregating K-Nearest Neighbor algorithm with a 90:10 proportion, m=80%, C=73, and K=5 was the best, at 92.308%.
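As a hedged illustration of bagged k-NN with the reported settings (90:10 split, m=80%, K=5, and C=73 interpreted here as the number of bootstrap replicates), the sketch below uses synthetic data with seven features in place of the proprietary credit dataset.

```python
# Illustrative Bootstrap Aggregating k-NN (bagged k-NN); synthetic data stands
# in for the credit-scoring dataset, which is not publicly available.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=7, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.10, random_state=0)

bag_knn = BaggingClassifier(
    estimator=KNeighborsClassifier(n_neighbors=5),  # base_estimator= on older scikit-learn
    n_estimators=73,          # C: interpreted as the number of bootstrap replicates
    max_samples=0.8,          # m: fraction of training data drawn per replicate
    random_state=0,
).fit(X_tr, y_tr)

print("accuracy:", accuracy_score(y_te, bag_knn.predict(X_te)))
```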


2020 ◽  
Vol 10 (2) ◽  
pp. 152-158
Author(s):  
Iswanto ◽  
Yuliana Melita Pranoto ◽  
Reddy Alexandro Harianto

Even with a sophisticated application, traders often have trouble deciding whether to BUY or SELL in forex trading. This is because time series predictions often swing between high and low values, so a recommendation system is needed to overcome this problem. Applying a classification algorithm to a recommendation system that supports BUY-SELL decisions is one suitable way to address it. The K-Nearest Neighbor (K-NN) algorithm was chosen because K-NN can be used to build a recommendation system that classifies data based on the closest distance. This system is designed to assist traders in making BUY-SELL decisions based on predicted data. In ten trials using ARIMA predictions, the system's recommendations, compared against actual market prices, achieved an average profit that exceeded the target of 7% per week.
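A hedged sketch of how such a recommendation step might look: a K-NN classifier maps forecast-derived features to a BUY or SELL label. The features, labels, and sample values below are purely illustrative and not the paper's design.

```python
# Hedged sketch of the recommendation idea: a k-NN classifier maps forecast
# features (e.g., predicted high/low/close changes) to a BUY or SELL label.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Each row: [predicted_change_high, predicted_change_low, predicted_change_close]
X_hist = np.array([[ 0.004,  0.001,  0.003],
                   [ 0.002, -0.001,  0.001],
                   [-0.003, -0.005, -0.004],
                   [-0.001, -0.004, -0.002]])
y_hist = np.array(["BUY", "BUY", "SELL", "SELL"])   # outcomes of past trades

recommender = KNeighborsClassifier(n_neighbors=3).fit(X_hist, y_hist)

# Recommend an action for the latest ARIMA forecast (illustrative values).
latest_forecast = np.array([[0.003, 0.000, 0.002]])
print(recommender.predict(latest_forecast)[0])       # e.g. "BUY"
```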


2020 ◽  
Vol 19 ◽  

In the paper, some fuzzy classification algorithms based upon a nearest neighbor decision rule are considered in terms of the pattern recognition algorithms which are based on the computation of estimates (the so-called AEC model). It is shown that the fuzzy K nearest neighbor algorithm can be assigned to the AEC class. In turn, it is found that some standard AEC algorithms, which depend on a number of numerical parameters, can be used as fuzzy classification algorithms. Yet among them there exist algorithms extremal with respect to these parameters. Such algorithms provide maximum values of the associated performance measures.


2013 ◽  
Vol 706-708 ◽  
pp. 1928-1931
Author(s):  
Yu Ma ◽  
Yu Ling Gao ◽  
Shao Yun Song

The traditional k-Nearest Neighbor algorithm (KNN for short) is commonly used in spatial classification; however, this method suffers from slow searching. To avoid this disadvantage, this paper puts forward a new K-nearest neighbor spatial classification algorithm based on spatial predicates. The method searches for the set of objects that are similar to the test object in the spatial sense and uses spatial predicates to guide the search, which narrows the search range and reduces the running time of the KNN algorithm.
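The following sketch illustrates the general idea only (not the paper's algorithm): a cheap spatial predicate prunes the candidate set before the exact k-NN vote. The bounding-box predicate and radius are our own illustrative choices.

```python
# Sketch of the general idea: apply a spatial predicate first (here, a simple
# axis-aligned "within distance r" test) to shrink the candidate set, then run
# exact k-NN only on the survivors.
import numpy as np

def knn_with_spatial_predicate(points, labels, query, k=3, r=5.0):
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Spatial predicate: keep only objects inside the query's r-box
    # (a cheap coordinate test before any distance computation).
    mask = np.all(np.abs(points - query) <= r, axis=1)
    cand, cand_labels = points[mask], labels[mask]
    if len(cand) < k:                       # fall back to the full set if needed
        cand, cand_labels = points, labels
    d = np.linalg.norm(cand - query, axis=1)
    nearest = np.argsort(d)[:k]
    # Majority vote among the k nearest surviving objects.
    vals, counts = np.unique(cand_labels[nearest], return_counts=True)
    return vals[np.argmax(counts)]
```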


Author(s):  
Chetna Kaushal ◽  
Deepika Koundal

Big data refers to the huge sets of data that are very common these days due to the increase in internet utilities. Data generated from social media is a common example. This paper summarizes big data and the ways in which it has been utilized. Data mining is essentially a means of deriving indispensable knowledge from extensively vast fractions of data, which is quite challenging to interpret with conventional methods. The paper mainly focuses on issues related to clustering techniques in big data. For the classification of big data, the existing classification algorithms are concisely reviewed, and the k-nearest neighbor algorithm is then chosen among them and described along with an example.


Author(s):  
Veronica Ong ◽  
Derwin Suhartono

The growth of computer vision technology has aided society with various kinds of tasks. One of these tasks is recognizing text contained in an image, usually referred to as Optical Character Recognition (OCR). Many kinds of algorithms can be implemented in an OCR system; K-Nearest Neighbor is one such algorithm. This research aims to explain the process behind the OCR mechanism using the K-Nearest Neighbor algorithm, one of the most influential machine learning algorithms, and to find out how precise the algorithm is in an OCR program. To that end, a simple OCR program that classifies capital letters of the alphabet was built to produce and compare real results. The research yielded a maximum accuracy of 76.9% with 200 training samples per letter. Reasons are also given as to why the program reaches this level of accuracy.
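A minimal k-NN character classifier of the kind described can be sketched as follows; since the original capital-letter dataset is not available, scikit-learn's bundled digit images stand in for the training samples.

```python
# Minimal k-NN character classifier. The paper trains on capital letters with up
# to 200 samples per class; here the bundled 8x8 digit images are a stand-in.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

digits = load_digits()                               # 8x8 grayscale character images
X = digits.images.reshape(len(digits.images), -1)    # flatten pixels into feature vectors
X_tr, X_te, y_tr, y_te = train_test_split(X, digits.target, random_state=0)

ocr = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, ocr.predict(X_te)))
```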


The aim of this study is to predict a person's stress using machine learning classifiers. The system classifies a person's stress as either High or Low. Of the various classification algorithms available, nine were chosen for this study: the K-Nearest Neighbor classifier, Support Vector Machine with an RBF kernel, Decision Tree, Random Forest, Bagging classifier, AdaBoost, Voting classifier, Logistic Regression, and MLP classifier. The algorithms were applied to the same dataset, obtained from a GitHub repository labelled Stress classifier with AutoML. The accuracy of each algorithm was measured, and the classification algorithm with the best accuracy was determined. On comparison, the K-Nearest Neighbor algorithm had the best accuracy, with an accuracy rate of 79.3% for physiological stress prediction. While the other algorithms had varying accuracies, the K-Nearest Neighbor algorithm was the most consistent.
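A hedged sketch of this kind of comparison is shown below, with synthetic data standing in for the GitHub dataset; hyperparameters are scikit-learn defaults rather than the study's settings.

```python
# Compare the listed classifiers on one dataset, as in the study; synthetic
# data stands in for the "Stress classifier with AutoML" GitHub dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, BaggingClassifier,
                              AdaBoostClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

models = {
    "k-NN": KNeighborsClassifier(),
    "SVM (RBF)": SVC(kernel="rbf"),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
}
models["Voting"] = VotingClassifier(
    estimators=[("knn", models["k-NN"]), ("lr", models["Logistic Regression"]),
                ("rf", models["Random Forest"])])

for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()   # 5-fold mean accuracy
    print(f"{name:20s} {score:.3f}")
```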


2021 ◽  
Vol 13 (2) ◽  
pp. 76-83
Author(s):  
Ridho Ananda ◽  
Agi Prasetiadi

Classification is one of the data mining topics concerned with predicting the group to which an object belongs. The prediction can be performed using similarity measures, classification trees, or regression. Procrustes, on the other hand, refers to a technique for matching two configurations that has been applied to outlier detection. The results suggest that Procrustes has the potential to tackle the misclassification problem when outliers are treated as misclassified objects. Therefore, the Procrustes classification algorithm (PrCA) and the Procrustes nearest neighbor classification algorithm (PNNCA) are proposed in this paper. The results of these algorithms were compared with classical classification algorithms, namely k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), AdaBoost (AB), Random Forest (RF), Logistic Regression (LR), and Ridge Regression (RR). The data used were the iris, cancer, liver, seeds, and wine datasets. The minimum and maximum accuracy values obtained by PrCA were 0.610 and 0.925, while those of PNNCA were 0.610 and 0.963. PrCA was generally better than k-NN, SVM, and AB, while PNNCA was generally better than k-NN, SVM, AB, and RF. Based on these results, PrCA and PNNCA deserve to be proposed as a new approach to the classification process.
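As a hedged illustration (not the authors' PrCA/PNNCA implementation), the sketch below uses SciPy's Procrustes disparity as a matching criterion: a test configuration is assigned to the class whose reference configuration it matches best. The per-class reference configurations and the toy shapes are our own assumptions.

```python
# Illustrative use of Procrustes matching as a classification criterion: assign
# the test configuration to the class with the smallest Procrustes disparity.
import numpy as np
from scipy.spatial import procrustes

def procrustes_classify(test_config, class_configs):
    """test_config: (n, d) array; class_configs: dict label -> (n, d) array."""
    disparities = {label: procrustes(ref, test_config)[2]
                   for label, ref in class_configs.items()}
    return min(disparities, key=disparities.get)

# Toy example: two reference "shapes" and a noisy copy of the first one.
rng = np.random.default_rng(0)
square = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
line = np.array([[0., 0.], [1., 1.], [2., 2.], [3., 3.]])
test = square + rng.normal(scale=0.05, size=square.shape)
print(procrustes_classify(test, {"square": square, "line": line}))  # "square"
```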

