A weighting approach for KNN classifier

Author(s): Halil Yigit
2020, Vol 10 (10), pp. 3356
Author(s): Jose J. Valero-Mas, Francisco J. Castellanos

Within the Pattern Recognition field, two representations are generally considered for encoding the data: statistical codifications, which describe elements as feature vectors, and structural representations, which encode elements as high-level symbolic data structures such as strings, trees or graphs. While the vast majority of classifiers can handle statistical spaces, only a few methods are suitable for structural representations. The kNN classifier is one of the scarce examples of algorithms capable of tackling both statistical and structural spaces. This method is based on computing the dissimilarity between the query and every sample of the set, which is the main reason for its high versatility but, in turn, also for its low efficiency. Prototype Generation is one way to palliate this issue. These mechanisms generate a reduced version of the initial dataset by applying data transformation and aggregation processes to the initial collection. Nevertheless, these generation processes depend heavily on the data representation considered and are generally not well defined for structural data. In this work we present the adaptation of the generation-based reduction algorithm Reduction through Homogeneous Clusters (RHC) to the case of string data. This algorithm performs the reduction by partitioning the space into class-homogeneous clusters and then generating a representative prototype as the median value of each group. Thus, the main issue to tackle is the retrieval of the median element of a set of strings. Our comprehensive experimentation comparatively assesses the performance of this algorithm in both the statistical and the string-based spaces. Results prove the relevance of our approach by showing a competitive compromise between classification rate and data reduction.
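To make the reduction idea concrete, here is a minimal sketch of Reduction through Homogeneous Clusters (RHC) on strings. It rests on several assumptions that the abstract does not fix: the Levenshtein edit distance is the dissimilarity, the "median" of a group is approximated by the set median (the member with the smallest sum of distances to the others), and a k-medoids-style loop seeded with one class median per class replaces the k-means step of the vector version. All function names are hypothetical, and the authors' actual median-string retrieval may differ.

```python
from collections import deque


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit insertion, deletion and substitution costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def set_median(strings):
    """Member of the set with the smallest sum of edit distances to all members."""
    return min(strings, key=lambda s: sum(levenshtein(s, t) for t in strings))


def split_cluster(cluster, centers, max_iter=10):
    """k-medoids-style partition (assumed stand-in for k-means): assign each
    (string, label) pair to its nearest center, then move each center to the
    set median of its group."""
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for s, y in cluster:
            nearest = min(range(len(centers)), key=lambda i: levenshtein(s, centers[i]))
            groups[nearest].append((s, y))
        new_centers = [set_median([s for s, _ in g]) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return [g for g in groups if g]


def rhc_strings(samples):
    """samples: list of (string, label) pairs. Returns (prototype, label) pairs."""
    prototypes = []
    queue = deque([samples])
    while queue:
        cluster = queue.popleft()
        by_class = {}
        for s, y in cluster:
            by_class.setdefault(y, []).append(s)
        if len(by_class) == 1:
            # Homogeneous cluster: its median string becomes a prototype
            (label, strings), = by_class.items()
            prototypes.append((set_median(strings), label))
            continue
        # Non-homogeneous: seed one center per class with that class's set median
        parts = split_cluster(cluster, [set_median(v) for v in by_class.values()])
        if len(parts) < 2:
            # No split possible (e.g. identical strings with different labels):
            # keep one prototype per class instead of looping forever
            prototypes.extend((set_median(v), y) for y, v in by_class.items())
        else:
            queue.extend(parts)
    return prototypes


def classify_1nn(query, prototypes):
    """1-NN over the reduced set using the same edit distance."""
    return min(prototypes, key=lambda p: levenshtein(query, p[0]))[1]


train = [("abc", "A"), ("abd", "A"), ("abcd", "A"),
         ("xyz", "B"), ("xzz", "B"), ("xyy", "B")]
protos = rhc_strings(train)
print(protos)                       # [('abc', 'A'), ('xyz', 'B')]
print(classify_1nn("abx", protos))  # A
```

In this toy case six training strings shrink to two prototypes, which is exactly the trade-off the abstract describes: fewer dissimilarity computations at query time in exchange for a possible drop in classification rate on harder data.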


2017, Vol 10 (3), pp. 310-331
Author(s): Sudeep Thepade, Rik Das, Saurav Ghosh

Purpose: Current practices in data classification and retrieval have seen a surge in the use of multimedia content. Identifying the desired information in huge image databases has made the design of an efficient feature extraction process increasingly complex. Conventional image classification approaches based on text-based image annotation face assorted limitations due to erroneous interpretation of vocabulary and the huge time consumption of manual annotation. Content-based image recognition has emerged as an alternative that overcomes these limitations. However, exploring the rich feature content of an image with a single technique is less likely to extract meaningful signatures than multi-technique feature extraction. Therefore, the purpose of this paper is to explore enhanced content-based image recognition by fusing the classification decisions obtained with diverse feature extraction techniques.

Design/methodology/approach: Three novel feature extraction techniques are introduced in this paper and tested individually with four different classifiers: the K nearest neighbor (KNN) classifier, the RIDOR classifier, an artificial neural network classifier and a support vector machine classifier. Thereafter, the classification decisions obtained with the KNN classifier for the different feature extraction techniques are integrated by Z-score normalization and feature scaling to create a fusion-based framework for image recognition. This is followed by a fusion-based retrieval model that validates retrieval performance with a classified query. Earlier works on content-based image identification have adopted fusion-based approaches. However, to the best of the authors’ knowledge, fusion-based query classification is addressed for the first time in this work as a precursor of retrieval.

Findings: The proposed fusion techniques outperform state-of-the-art techniques in classification and retrieval performance. Four public data sets, namely the Wang data set, the Oliva and Torralba (OT-scene) data set, the Corel data set and the Caltech data set, comprising 22,615 images in total, are used for evaluation.

Originality/value: To the best of the authors’ knowledge, fusion-based query classification is addressed for the first time in this work as a precursor of retrieval. The novel idea of exploring rich image features by fusing multiple feature extraction techniques also encourages further research on dimensionality reduction of feature vectors for enhanced classification results.
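The fusion step can be pictured with a small sketch. The abstract only says that KNN decisions from different feature extraction techniques are combined through Z-score normalization; here it is assumed, purely for illustration, that each technique yields one score per class (the distance from the query to the nearest training vector of that class), that these scores are z-scored per technique, and that the fused decision is the class with the lowest summed score. The function names and toy descriptors are hypothetical, and the separate feature-scaling step is not shown.

```python
import math


def zscore(values):
    """Z-score normalization (population standard deviation)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std > 0 else 0.0 for v in values]


def class_distances(query, train, labels, classes):
    """Distance from the query to the nearest training vector of each class."""
    return [min(math.dist(query, x) for x, y in zip(train, labels) if y == c)
            for c in classes]


def fused_knn(queries, trains, labels):
    """queries[t] / trains[t]: query and training vectors from feature technique t.
    Per-technique class distances are z-scored and summed; lowest total wins."""
    classes = sorted(set(labels))
    fused = [0.0] * len(classes)
    for query, train in zip(queries, trains):
        scores = zscore(class_distances(query, train, labels, classes))
        fused = [f + s for f, s in zip(fused, scores)]
    return classes[min(range(len(classes)), key=lambda i: fused[i])]


labels = ["beach", "forest", "mountain"]
trains = [
    [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]],  # technique 1 (toy 2-D descriptors)
    [[5.0], [1.0], [3.0]],                 # technique 2 (toy 1-D descriptors)
]
queries = [[0.5, 0.4], [4.0]]
print(fused_knn(queries, trains, labels))  # beach
```

In this toy case the second technique alone is tied between "beach" and "mountain" (both at distance 1.0), and normalizing before summing lets the first technique break the tie without its larger raw distances dominating.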


2021, Vol 15 (2), pp. 131-144
Author(s): Redha Taguelmimt, Rachid Beghdad

On the one hand, many intrusion detection systems (IDSs) have been proposed in the literature. On the other hand, many studies try to identify the features that best detect attacks. This paper presents a new, easy-to-implement approach to intrusion detection named distance sum-based k-nearest neighbors (DS-kNN), an improved version of the k-NN classifier. Given a data sample to classify, DS-kNN computes, for each class of the dataset, the sum of the distances between the sample and its k nearest neighbors in that class. The sample is then assigned to the class with the smallest sum. The experimental results show that the DS-kNN classifier outperforms the original k-NN algorithm in terms of accuracy, detection rate, false positives and attack classification. The authors mainly compare DS-kNN to CANN, but also to SVM, S-NDAE and DBN. The results also show that the approach is very competitive.
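The DS-kNN decision rule described above can be sketched in a few lines. This is an illustration under assumptions: Euclidean distance via `math.dist`, `k=3`, and toy one-dimensional data; the function names are hypothetical. A plain majority-vote kNN is included only to show how the two rules can disagree.

```python
import heapq
import math
from collections import Counter


def ds_knn(query, train, labels, k=3):
    """Distance sum-based kNN: for every class, add up the distances from the query
    to its k nearest training samples of that class; return the class with the
    smallest sum."""
    per_class = {}
    for x, y in zip(train, labels):
        per_class.setdefault(y, []).append(math.dist(query, x))
    sums = {c: sum(heapq.nsmallest(k, d)) for c, d in per_class.items()}
    return min(sums, key=sums.get)


def majority_knn(query, train, labels, k=3):
    """Standard kNN for comparison: majority vote among the k overall nearest."""
    nearest = sorted(zip(train, labels), key=lambda p: math.dist(query, p[0]))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


train = [(0.1,), (0.2,), (9.0,), (0.5,), (0.6,), (0.7,)]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
query = (0.0,)
print(majority_knn(query, train, labels))  # normal (two of the three nearest)
print(ds_knn(query, train, labels))        # attack (sum about 1.8 vs 9.3)
```

One design detail the abstract leaves open: if a class has fewer than `k` samples, `heapq.nsmallest` returns fewer distances and that class's sum is smaller by construction; a real implementation would need a policy for this case.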


Author(s): Shivali Parkhedkar, Shaveri Vairagade, Vishakha Sakharkar, Bharti Khurpe, Arpita Pikalmunde, ...

In this work we take on the challenge of recognizing handwritten words. The handwritten document is scanned using a scanner, and the image of the scanned document is processed by the program. Each character in the word is isolated. Each isolated character is then subjected to feature extraction using Gabor features. The extracted features are passed through a KNN classifier, which finally yields the recognized word. Character recognition is the process by which a computer recognizes handwritten characters and converts them into a format that a user can understand. Computer-based pattern recognition is a process that involves many sub-processes. Character recognition has gained a lot of attention within the field of pattern recognition. Handwritten character recognition is beneficial in cheque processing in banks, form processing systems and many other applications. Character recognition is one of the popular and challenging research areas, and in the future it can help create a paperless environment. The novelty of this approach lies in achieving better accuracy and reduced computation time for the recognition of handwritten characters. The proposed method extracts the geometric features of the character contour. These features are based on the basic line types that form the character skeleton. The system outputs a feature vector. The feature vectors generated from a training set were then used to train a pattern recognition engine based on neural networks so that the system could be benchmarked. The proposed algorithm concentrates on this: it extracts the different line types that form a specific character, and it also concentrates on the point features of the same character. The feature extraction technique was tested using a neural network trained with the feature vectors obtained from the proposed method.
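The Gabor-features-plus-KNN stage can be illustrated with a small NumPy sketch. Everything specific here is an assumption, not the authors' configuration: four orientations, a 7x7 kernel with sigma 2 and wavelength 4, mean and standard deviation of the absolute response as the pooled features, and synthetic 16x16 strokes standing in for isolated, segmented characters. The filtering uses sliding windows, so it computes a correlation (identical to convolution for the symmetric kernels used here). Function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter: Gaussian envelope times a cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / wavelength + psi)


def gabor_features(image, orientations=4, size=7, sigma=2.0, wavelength=4.0):
    """Mean and std of |filter response| for each orientation (valid region only)."""
    feats = []
    for i in range(orientations):
        kernel = gabor_kernel(size, sigma, np.pi * i / orientations, wavelength)
        windows = sliding_window_view(image, kernel.shape)
        response = np.abs(np.einsum("ijkl,kl->ij", windows, kernel))
        feats.extend([response.mean(), response.std()])
    return np.array(feats)


def knn_predict(query, train_feats, train_labels, k=1):
    """Majority vote among the k training feature vectors closest in Euclidean distance."""
    dists = np.linalg.norm(train_feats - query, axis=1)
    nearest = np.argsort(dists)[:k]
    values, counts = np.unique(np.asarray(train_labels)[nearest], return_counts=True)
    return str(values[np.argmax(counts)])


def bar_image(vertical, start=7):
    """Toy 16x16 'character': a two-pixel-wide vertical or horizontal stroke."""
    img = np.zeros((16, 16))
    if vertical:
        img[:, start:start + 2] = 1.0
    else:
        img[start:start + 2, :] = 1.0
    return img


train_feats = np.stack([gabor_features(bar_image(True)), gabor_features(bar_image(False))])
train_labels = ["I", "-"]
query = gabor_features(bar_image(True, start=5))  # shifted vertical stroke
print(knn_predict(query, train_feats, train_labels))  # I
```

Pooling the filter responses into per-orientation statistics makes the feature vector tolerant to small shifts of the stroke, while strokes of different orientation land far apart in feature space, which is what lets a simple nearest-neighbour rule work on top of it.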


2022, Vol 12 (1), pp. 55
Author(s): Fatih Demir, Kamran Siddique, Mohammed Alswaitti, Kursat Demir, Abdulkadir Sengur

Parkinson’s disease (PD), a slowly progressing neurodegenerative disorder, negatively affects people’s daily lives. Early diagnosis is of great importance to minimize the effects of PD. One of the most important symptoms for the early diagnosis of PD is monotony and distortion of speech. Artificial intelligence-based approaches can help specialists and physicians detect these disorders automatically. In this study, a new and powerful approach based on multi-level feature selection was proposed to detect PD from features extracted from voice recordings of already-diagnosed cases. At the first level, feature selection was performed with the Chi-square and L1-Norm SVM algorithms (CLS). Then, the features selected by these algorithms were combined to increase the representation power of the samples. At the last level, the most distinctive features of the combined feature set were selected according to feature importance weights computed with the ReliefF algorithm. In the classification stage, popular classifiers such as KNN, SVM and DT were used, and the best performance was achieved with the KNN classifier. Moreover, the hyperparameters of the KNN classifier were selected with the Bayesian optimization algorithm, which further improved the performance of the proposed approach. The proposed approach was evaluated using 10-fold cross-validation on a dataset containing PD and normal classes, and a classification accuracy of 95.4% was achieved.
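The last selection level relies on ReliefF importance weights. Since the dataset has two classes (PD and normal), a two-class ReliefF sketch is enough to show the idea: a feature gains weight when it differs between a sample and its nearest neighbours of the other class ("misses") and loses weight when it differs from its nearest neighbours of the same class ("hits"). The choice of `k`, the Manhattan distance and the min-max scaling of feature differences are assumptions common in ReliefF descriptions, not details taken from the study; the Chi-square, L1-Norm SVM and Bayesian optimization stages are not shown, and the toy data are invented.

```python
import numpy as np


def relieff_binary(X, y, k=3):
    """ReliefF feature weights for a two-class problem.

    For each sample, features that differ across its k nearest misses gain weight
    and features that differ across its k nearest hits lose weight."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0                    # constant features: avoid division by zero
    Xn = (X - lo) / span                     # per-feature differences now lie in [0, 1]
    idx = np.arange(n)
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(Xn - Xn[i]).sum(axis=1)           # Manhattan distance
        same = idx[(y == y[i]) & (idx != i)]            # same class, excluding i
        other = idx[y != y[i]]
        hits = same[np.argsort(dist[same])[:k]]
        misses = other[np.argsort(dist[other])[:k]]
        w -= np.abs(Xn[hits] - Xn[i]).mean(axis=0) / n
        w += np.abs(Xn[misses] - Xn[i]).mean(axis=0) / n
    return w


# Toy data: feature 0 separates the classes, feature 1 is noise
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.9, 0.4], [1.0, 0.8], [0.8, 0.2]]
y = [0, 0, 0, 1, 1, 1]
w = relieff_binary(X, y, k=2)
print(w)                      # roughly [0.65, -0.31]
print(np.argsort(w)[::-1])    # [0 1]: features ranked by weight
```

Selecting "the most distinctive features" then amounts to keeping the top-ranked columns of this ordering before training the KNN classifier.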


Author(s): Amal A. Moustafa, Ahmed Elnakib, Nihal F. F. Areed

This paper presents a methodology for Age-Invariant Face Recognition (AIFR) based on the optimization of deep learning features. The proposed method extracts deep features from the unprocessed face images using transfer learning. To optimize the extracted features, a Genetic Algorithm (GA) procedure is designed to select the features most relevant to identifying a person from his/her facial images across different ages. For classification, K-Nearest Neighbor (KNN) classifiers with different distance metrics are investigated, namely the Correlation, Euclidean, Cosine and Manhattan distance metrics. In the experiments, a KNN classifier with the Manhattan distance achieves the best Rank-1 recognition rates of 86.2% and 96% on the standard FGNET and MORPH datasets, respectively. Compared with state-of-the-art methods, the proposed method needs no preprocessing stages. In addition, the experiments show its advantage over other related methods.
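The four distance metrics and the Rank-1 recognition rate can be made concrete with a short sketch. It assumes the feature vectors have already been extracted and GA-selected (neither step is shown), uses the usual definitions of cosine distance and correlation distance (one minus the cosine similarity or Pearson correlation), and treats Rank-1 as the fraction of probe images whose single nearest gallery vector has the correct identity. The gallery/probe vectors and function names are hypothetical toy values.

```python
import numpy as np


def distance(a, b, metric):
    """The four KNN distance metrics named in the abstract."""
    if metric == "euclidean":
        return float(np.linalg.norm(a - b))
    if metric == "manhattan":
        return float(np.abs(a - b).sum())
    if metric == "cosine":       # 1 - cosine similarity
        return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    if metric == "correlation":  # 1 - Pearson correlation of the two vectors
        ac, bc = a - a.mean(), b - b.mean()
        return 1.0 - float(ac @ bc / (np.linalg.norm(ac) * np.linalg.norm(bc)))
    raise ValueError(f"unknown metric: {metric}")


def rank1_rate(gallery, gallery_ids, probes, probe_ids, metric):
    """Fraction of probes whose nearest gallery vector (1-NN) has the same identity."""
    hits = 0
    for p, pid in zip(probes, probe_ids):
        dists = [distance(p, g, metric) for g in gallery]
        hits += gallery_ids[int(np.argmin(dists))] == pid
    return hits / len(probes)


gallery = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
probes = np.array([[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.2, 0.9]])
ids = ["p1", "p2", "p3"]
for m in ["correlation", "euclidean", "cosine", "manhattan"]:
    print(m, rank1_rate(gallery, ids, probes, ids, m))  # 1.0 for every metric here
```

Note the design difference between the metrics: cosine and correlation distances ignore the overall scale of a feature vector, whereas Euclidean and Manhattan distances do not, which is one reason their Rank-1 rates can diverge on real deep features.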

