A NEW C × K-NEAREST NEIGHBOR LINKAGE APPROACH TO THE CLASSIFICATION PROBLEM

Author(s):  
GÖZDE ULUTAGAY ◽  
EFENDI NASIBOV
Author(s):  
Norsyela Muhammad Noor Mathivanan ◽  
Nor Azura Md.Ghani ◽  
Roziah Mohd Janor

Online business development through e-commerce platforms is a phenomenon that has changed how products are promoted and sold in the 21st century. Product title classification is an important task that helps retailers and sellers list a product in a suitable category. It is a special case of the text classification problem, but the properties of product titles differ from those of general documents. This study aims to evaluate the performance of five supervised learning models on data sets of e-commerce product titles, which are very short descriptions and incomplete sentences. The supervised learning models involved in the study are Naïve Bayes, K-Nearest Neighbor (KNN), Decision Tree, Support Vector Machine (SVM), and Random Forest. The results show that the KNN model is the best model, with the highest accuracy and the fastest computation time on the data used in the study. Hence, KNN is a good approach for classifying e-commerce products.
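As an illustration of the kind of KNN pipeline such a study compares, here is a minimal sketch: bag-of-words vectors over short product titles, cosine similarity, and a majority vote among the k most similar titles. The titles, categories, and value of k below are invented for the example and are not taken from the study.

```python
from collections import Counter
import math

def bow(text):
    # bag-of-words vector for a short product title
    return Counter(text.lower().split())

def cosine(a, b):
    # cosine similarity between two sparse term-count vectors
    num = sum(a[t] * b[t] for t in a if t in b)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def knn_predict(train, title, k=3):
    # train: list of (title, category); majority vote among the k most similar titles
    ranked = sorted(train, key=lambda tc: cosine(bow(tc[0]), bow(title)), reverse=True)
    votes = Counter(cat for _, cat in ranked[:k])
    return votes.most_common(1)[0][0]

train = [
    ("apple iphone 11 64gb black", "Phones"),
    ("samsung galaxy s10 case", "Accessories"),
    ("iphone 12 charger cable", "Accessories"),
    ("nokia 3310 classic phone", "Phones"),
    ("usb c fast charging cable", "Accessories"),
    ("xiaomi redmi note 9 smartphone", "Phones"),
]
print(knn_predict(train, "iphone 13 128gb smartphone"))  # → Phones
```

Even on incomplete title fragments like these, token overlap is often enough signal for a nearest-neighbor vote, which is consistent with the study's finding that KNN handles short product titles well.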


Author(s):  
Amparo Baillo ◽  
Antonio Cuevas ◽  
Ricardo Fraiman

This article reviews the literature concerning supervised and unsupervised classification of functional data. It first explains the meaning of unsupervised classification vs. supervised classification before discussing the supervised classification problem in the infinite-dimensional case, showing that its formal statement generally coincides with that of discriminant analysis in the classical multivariate case. It then considers the optimal classifier and plug-in rules, empirical risk and empirical minimization rules, linear discrimination rules, the k nearest neighbor (k-NN) method, and kernel rules. It also describes classification based on partial least squares, classification based on reproducing kernels, and depth-based classification. Finally, it examines unsupervised classification methods, focusing on K-means for functional data, K-means for data in a Hilbert space, and impartial trimmed K-means for functional data. Some practical issues, in particular real-data examples and simulations, are reviewed and some selected proofs are given.
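The k-NN rule discussed in the review carries over to functional data almost verbatim once a distance between curves is fixed. Below is a minimal sketch assuming curves sampled on a common grid and the discretized L2 distance; the example curves and labels are made up for illustration.

```python
import math

def l2_dist(f, g, h):
    # approximate L2 distance between two curves sampled on a grid of step h
    return math.sqrt(h * sum((fi - gi) ** 2 for fi, gi in zip(f, g)))

def fknn(train, f, k, h):
    # train: list of (sampled_curve, label); classic k-NN majority vote
    neighbors = sorted(train, key=lambda cl: l2_dist(cl[0], f, h))[:k]
    labels = [lab for _, lab in neighbors]
    return max(set(labels), key=labels.count)

grid = [i / 50 for i in range(51)]  # 51 points on [0, 1]
h = grid[1] - grid[0]
sines = [([math.sin(2 * math.pi * t) + 0.1 * j for t in grid], "sine") for j in range(3)]
lines = [([t + 0.1 * j for t in grid], "line") for j in range(3)]
query = [math.sin(2 * math.pi * t) + 0.05 for t in grid]
print(fknn(sines + lines, query, k=3, h=h))  # → sine
```

The infinite-dimensional subtleties the review treats (e.g., which distances make the k-NN rule consistent in a Hilbert space) do not show up in such a discretized sketch, but the algorithmic skeleton is the same.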


2013 ◽  
Vol 61 (2) ◽  
pp. 179-183
Author(s):  
Anamul H Sajib ◽  
Jafar A Khan

In a classification problem with binary outcome attribute, if the input attributes are both continuous and categorical, the


Symmetry ◽  
2019 ◽  
Vol 11 (5) ◽  
pp. 630 ◽  
Author(s):  
Lingjun Zhao ◽  
Chunhua Su ◽  
Huakun Huang ◽  
Zhaoyang Han ◽  
Shuxue Ding ◽  
...  

Device-free localization (DFL) locates targets without requiring them to carry any attached device, which is of great significance for intrusion detection and monitoring in the era of the Internet of Things (IoT). Aiming to solve the problems of low accuracy and low robustness in DFL approaches, in this paper we first treat the RSS signal as an RSS-image matrix and eliminate the background to extract the variation component with distinctive features. We then make use of these feature-rich images by formulating DFL as an image classification problem, and design a deep convolutional neural network (CNN) to extract features automatically for classification. The localization performance of the proposed background elimination-based CNN (BE-CNN) scheme is validated on a real-world outdoor DFL dataset. In addition, we validate the robustness of the proposal through numerical experiments with different levels of noise. Experimental results demonstrate that the proposed scheme has an obvious advantage in improving localization accuracy and robustness for DFL. In particular, BE-CNN maintains the highest localization accuracy of 100% even in noisy conditions when the SNR is above −5 dB. The BE-based methods outperform all the corresponding raw-data-based methods in localization accuracy, and the proposed method also outperforms comparison methods such as a deep neural network with autoencoder, K-nearest neighbor (KNN), and support vector machines (SVM) in both localization accuracy and robustness.
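The background-elimination step can be sketched in a few lines: subtract a target-free baseline from the measured RSS-image so that only the variation component caused by the target remains. The matrix size, signal levels, and noise below are invented toy values, not the paper's dataset.

```python
import random
random.seed(0)

SIZE = 10
# hypothetical RSS-image: rows = wireless links, cols = time samples (toy values in dBm)
background = [[-45.0 + random.gauss(0, 0.5) for _ in range(SIZE)] for _ in range(SIZE)]

# a new measurement: baseline plus noise, with a target attenuating a block of links
rss = [[background[i][j] + random.gauss(0, 0.3) for j in range(SIZE)] for i in range(SIZE)]
for i in range(4, 7):
    for j in range(4, 7):
        rss[i][j] -= 6.0

# background elimination: subtract the target-free baseline so that only the
# variation component caused by the target stands out
variation = [[rss[i][j] - background[i][j] for j in range(SIZE)] for i in range(SIZE)]

peak = max((abs(variation[i][j]), i, j) for i in range(SIZE) for j in range(SIZE))
print(peak[1], peak[2])  # strongest variation lies inside the perturbed block
```

In the paper these variation images are then fed to the CNN classifier; the point of the subtraction is that the target-induced pattern is no longer buried under the static link gains.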


2020 ◽  
Vol 16 (3) ◽  
pp. 155014772091189 ◽  
Author(s):  
Zhen-Wu Wang ◽  
Si-Kai Wang ◽  
Ben-Ting Wan ◽  
William Wei Song

The multi-label classification problem occurs in many real-world tasks where an object is naturally associated with multiple labels, that is, concepts. The integration of the random walk approach into multi-label classification methods has attracted many researchers' attention. One challenge of the random walk-based multi-label classification algorithms is constructing the random walk graph, which may lead to poor classification quality and high algorithm complexity. In this article, we propose a novel multi-label classification algorithm based on the random walk graph and the K-nearest neighbor algorithm (named MLRWKNN). This method constructs the vertex set of the random walk graph from the K-nearest neighbor training samples of a given test instance, and the edge set from the correlations among the labels of those training samples, thus considerably reducing the overhead in time and space. The proposed method improves the similarity measurement by differentiating and integrating discrete and continuous features, which reflects the relationships between instances more accurately. A label prediction method is devised to reduce the subjectivity of the traditional threshold method. The experimental results with four metrics demonstrate that the proposed method outperforms seven state-of-the-art multi-label classification algorithms and achieves a significant improvement in multi-label classification.
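The construction can be sketched roughly as follows; the details here (edge weights, walk length, scoring) are assumptions for illustration, not the paper's exact algorithm. The idea is to restrict the random-walk graph to the test point's k nearest training samples, weight edges by label co-occurrence, and score labels by the walk's limiting weight on the samples that carry them.

```python
import math

def mlrwknn_score(train, x, k=3, steps=20):
    # train: list of (feature_tuple, set_of_labels)
    near = sorted(train, key=lambda fl: math.dist(fl[0], x))[:k]
    # edge weight = number of shared labels, plus a self-loop to keep the chain aperiodic
    w = [[len(a[1] & b[1]) + (1 if i == j else 0)
          for j, b in enumerate(near)] for i, a in enumerate(near)]
    p = [1.0 / k] * k
    for _ in range(steps):  # power iteration on the row-normalized walk
        q = [0.0] * k
        for i in range(k):
            row = sum(w[i])
            for j in range(k):
                q[j] += p[i] * w[i][j] / row
        p = q
    scores = {}
    for weight, (_, labels) in zip(p, near):
        for lab in labels:
            scores[lab] = scores.get(lab, 0.0) + weight
    return scores

train = [((0.0, 0.0), {"a", "b"}), ((0.1, 0.0), {"a"}),
         ((0.0, 0.2), {"a", "c"}), ((5.0, 5.0), {"d"})]
scores = mlrwknn_score(train, (0.05, 0.05))
print(max(scores, key=scores.get))  # → a (carried by all three nearest samples)
```

Because the graph only ever contains k vertices per query, the walk is cheap, which mirrors the paper's claim of reduced time and space overhead relative to walking a graph over the whole training set.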


2019 ◽  
Vol 29 (1) ◽  
pp. 1453-1467 ◽  
Author(s):  
Ritam Guha ◽  
Manosij Ghosh ◽  
Pawan Kumar Singh ◽  
Ram Sarkar ◽  
Mita Nasipuri

Abstract The feature selection process is very important in the field of pattern recognition: it selects the informative features so as to reduce the curse of dimensionality, thus improving the overall classification accuracy. In this paper, a new feature selection approach named Memory-Based Histogram-Oriented Multi-objective Genetic Algorithm (M-HMOGA) is introduced to identify the informative feature subset to be used for a pattern classification problem. The proposed M-HMOGA approach is applied to two recently used feature sets, namely Mojette transform and Regional Weighted Run Length features. The experiments are carried out on Bangla, Devanagari, and Roman numeral datasets, which are the three most popular scripts used in the Indian subcontinent. In-house Bangla and Devanagari script datasets and the Competition on Handwritten Digit Recognition (HDRC) 2013 Roman numeral dataset are used for evaluating our model. Moreover, as proof of robustness, we have applied an innovative approach of using different datasets for training and testing: the model is trained on the in-house Bangla and Devanagari script datasets and tested on the Indian Statistical Institute numeral datasets, and for Roman numerals it is trained on the HDRC 2013 dataset and tested on the Modified National Institute of Standards and Technology dataset. Comparison of the results obtained by the proposed model with existing HMOGA and MOGA techniques clearly indicates the superiority of M-HMOGA over both of its ancestors. Moreover, the use of K-nearest neighbor as well as multi-layer perceptron as classifiers speaks for the classifier-independent nature of M-HMOGA. The proposed M-HMOGA model uses only about 45–50% of the total feature set to achieve around a 1% increase in classification ability when the same datasets are partitioned for training and testing, and a 2–3% increase while using only 35–45% of the features when different datasets are used for training and testing, compared with using all the features for classification.
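The memory, histogram, and multi-objective components are specific to M-HMOGA; the sketch below shows only the generic genetic-algorithm skeleton for feature selection that such methods build on. Binary chromosomes mark selected features, and a toy fitness function rewards accuracy-like quality while penalizing subset size; the feature count, "useful" set, and GA parameters are all invented.

```python
import random
random.seed(1)

N_FEATURES = 10
USEFUL = {0, 3, 7}  # toy ground truth: only these features help classification

def fitness(mask):
    hits = sum(1 for i in USEFUL if mask[i])  # stand-in for wrapper accuracy
    return hits - 0.05 * sum(mask)            # small penalty per selected feature

def evolve(pop_size=20, gens=40):
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]        # elitism: keep the best half
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_FEATURES)
            child = a[:cut] + b[cut:]         # one-point crossover
            i = random.randrange(N_FEATURES)
            child[i] ^= 1                     # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(sorted(i for i, bit in enumerate(best) if bit))
```

In the actual paper the fitness is multi-objective (accuracy and feature count traded off via the histogram and memory mechanisms), and the selected subsets are evaluated with real classifiers (KNN, MLP) rather than a synthetic scoring function.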


2013 ◽  
Vol 61 (1) ◽  
pp. 81-85
Author(s):  
AH Sajib ◽  
AZM Shafiullah ◽  
AH Sumon

This study considers the classification problem for a binary output attribute when the input attributes are drawn from a multivariate normal distribution, in both the clean and the contaminated case. Classical metrics are affected by outliers, while robust metrics are computationally inefficient. To achieve robustness and computational efficiency at the same time, we propose a new robust distance metric for the K-Nearest Neighbor (KNN) method, which we call the Alternative Robust Mahalanobis Distance (ARMD) metric; KNN using ARMD is the alternative KNN method. Classical metrics use a non-robust estimate, the mean, as their building block. To construct the proposed ARMD metric, we replace the mean by its robust counterpart, the median. Our simulation studies show that the proposed alternative KNN method gives better results on contaminated data than classical KNN, and performs similarly to classical KNN using the existing robust metric. The major advantage of the proposed method is that it requires less computing time than classical KNN using the existing robust metric. Dhaka Univ. J. Sci. 61(1): 81-85, 2013 (January) DOI: http://dx.doi.org/10.3329/dujs.v61i1.15101
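The key substitution the abstract describes, replacing the mean by the median as the location estimate inside the distance metric, can be illustrated in a few lines (a sketch of the idea only; the paper's full ARMD construction is not reproduced here). One gross outlier drags the mean far from the bulk of the data, while the coordinate-wise median barely moves.

```python
import statistics

def centre(data, robust=True):
    # column-wise location estimate: median (robust) or mean (classical)
    est = statistics.median if robust else statistics.fmean
    return [est(col) for col in zip(*data)]

data = [[1.0, 2.0], [1.5, 2.5], [0.5, 1.5], [50.0, -40.0]]  # one gross outlier
print(centre(data, robust=False))  # mean is dragged toward the outlier
print(centre(data, robust=True))   # median stays near the bulk of the data
```

A distance built around the robust centre therefore ranks neighbors sensibly even under contamination, and the median costs only a sort per coordinate, which is consistent with the paper's claim of lower computing time than existing robust metrics.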


2013 ◽  
Vol 718-720 ◽  
pp. 293-298 ◽  
Author(s):  
Pu Wang ◽  
Xuan Xiao

It has special meaning for drug design as well as basic research to study antimicrobial peptides (AMPs), because they have been demonstrated to kill Gram-negative and Gram-positive bacteria, mycobacteria, enveloped viruses, fungi, and even transformed or cancerous cells. In view of this, it is highly desirable to develop an effective computational method for accurately predicting the functional types of AMPs, because it can provide us with more candidates and useful insights for drug design. AMP functional recognition is in fact a multi-label classification problem. In this study, up to six kinds of physicochemical property values are selected to encode an AMP sequence as a physical-chemical property matrix (PCM), and then an auto- and cross-covariance transformation is performed to extract features from the PCM for AMP sequence representation. Finally, a fuzzy K-nearest neighbor rule helps identify the multiple functions of a query AMP. As a result, an overall classification accuracy of about 65% has been achieved through the rigorous jackknife test on a newly constructed benchmark AMP dataset.
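The auto-covariance step, which turns a variable-length property matrix into fixed-length features, can be sketched as follows. The formula used here (mean-centred products at sequence lag g, averaged over positions) is a common form of this transformation, assumed rather than copied from the paper, and the PCM values are toy numbers.

```python
import statistics

def auto_cov(row, max_lag):
    # auto-covariance of one property row at lags 1..max_lag
    m = statistics.fmean(row)
    feats = []
    for g in range(1, max_lag + 1):
        n = len(row) - g
        feats.append(sum((row[j] - m) * (row[j + g] - m) for j in range(n)) / n)
    return feats

# PCM: one row per physicochemical property, one column per residue (toy values)
pcm = [[0.62, -0.18, 0.29, 1.06, -0.90, 0.74],   # e.g. hydrophobicity
       [8.10, 10.50, 5.70, 9.00, 11.30, 6.50]]   # e.g. polarity
features = [v for row in pcm for v in auto_cov(row, max_lag=2)]
print(len(features))  # 2 properties x 2 lags = 4 fixed-length features
```

Because the feature count depends only on the number of properties and lags, peptides of any length map to vectors of the same dimension, which is what makes the subsequent fuzzy K-nearest neighbor step applicable.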


2019 ◽  
Vol 5 (2) ◽  
pp. 83-89
Author(s):  
Muhammad Athoillah

Handwritten text recognition is the ability of a system to recognize human handwriting and convert it into digital text. It is a form of classification problem, so a classification algorithm such as Nearest Neighbor (NN) is needed to solve it. The NN algorithm is simple yet provides good results. In contrast with other algorithms, which are usually determined by some hypothesis class, the NN algorithm finds a label for any test point without searching for a predictor within some predefined class of functions. Arabic is one of the most important languages in the world. Recognizing Arabic characters is a very interesting research topic, not only because Arabic is the primary language used in Islam, but also because such research still lags far behind work on recognizing handwritten Latin or Chinese text. Against this background, this work built a system to recognize handwritten Arabic characters from an image dataset using the NN algorithm. The results showed that the proposed method could recognize the characters very well, as confirmed by its average precision, recall, and accuracy.
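A minimal 1-NN sketch on tiny binary "images" shows the idea; the 3x3 glyphs below are toy stand-ins for handwritten Arabic characters (real data would be larger bitmaps), and the glyph shapes and names are invented for the example.

```python
def flatten(img):
    # row-major pixel list from a 2-D binary image
    return [p for row in img for p in row]

def hamming(a, b):
    # number of differing pixels between two flattened images
    return sum(x != y for x, y in zip(a, b))

def nn_classify(train, img):
    # 1-NN: label of the single closest training image
    return min(train, key=lambda il: hamming(flatten(il[0]), flatten(img)))[1]

alif = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]       # a vertical stroke
ba   = [[0, 0, 0], [1, 1, 1], [0, 1, 0]]       # a horizontal stroke with a dot
train = [(alif, "alif"), (ba, "ba")]
noisy_alif = [[0, 1, 0], [1, 1, 0], [0, 1, 0]]  # one flipped pixel
print(nn_classify(train, noisy_alif))  # → alif
```

This captures the property the abstract highlights: no predictor is fitted in advance; the training images themselves are the model, and each query is labeled by its nearest stored example.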

