Classification Accuracy and Model Selection in k-Nearest Neighbors Classifiers for Data Driven Learning

Author(s):  
Guest Editor Manhui Su
Author(s):  
Ahmed T. Sahlol ◽  
Aboul Ella Hassanien

Many obstacles still stand in the way of high recognition accuracy in Arabic handwritten optical character recognition systems: each character can take different shapes, and many characters resemble one another. In this chapter, several bio-inspired optimization algorithms for feature selection, including the Bat Algorithm, Grey Wolf Optimization, the Whale Optimization Algorithm, Particle Swarm Optimization, and the Genetic Algorithm, are presented and applied to Arabic handwritten character recognition to assess how accurately they recognize Arabic characters. The experiments were performed on a benchmark dataset, CENPARMI, using k-nearest neighbors, Linear Discriminant Analysis, and random forests as classifiers. The features selected by the optimization algorithms yield better results than the whole feature set in terms of both classification accuracy and processing time.
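
The wrapper idea behind this kind of feature selection can be sketched as follows: a candidate feature subset is scored by the accuracy of a classifier trained on it, and a search procedure looks for the best-scoring subset. This is only a minimal illustration under stated assumptions. The toy data, the function name `loo_1nn_accuracy`, and the use of leave-one-out 1-nearest-neighbor accuracy as the fitness are all hypothetical, and an exhaustive search stands in for the bio-inspired optimizers named in the abstract, which are not implemented here.

```python
import math
from itertools import combinations

def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier on the chosen feature indices."""
    def project(row):
        return [row[f] for f in features]
    correct = 0
    for i, row in enumerate(X):
        # Nearest other sample, measured only on the selected features.
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: math.dist(project(row), project(X[j])))
        correct += y[nearest] == y[i]
    return correct / len(X)

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is large-scale noise.
X = [(0.0, 5.0), (0.1, -5.0), (0.2, 4.0), (1.0, 4.9), (1.1, -4.8), (1.2, 4.1)]
y = [0, 0, 0, 1, 1, 1]

# Exhaustive search stands in for the metaheuristic: score every non-empty subset.
n_features = len(X[0])
candidates = [c for r in range(1, n_features + 1)
              for c in combinations(range(n_features), r)]
best_subset = max(candidates, key=lambda c: loo_1nn_accuracy(X, y, c))
best_accuracy = loo_1nn_accuracy(X, y, best_subset)
full_accuracy = loo_1nn_accuracy(X, y, tuple(range(n_features)))
```

On this toy data the noisy feature hurts the nearest-neighbor search, so the subset containing only feature 0 scores higher than the full feature set. A metaheuristic would replace the exhaustive loop when the number of features makes enumeration infeasible.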


2019 ◽  
Vol 12 (4) ◽  
pp. 72
Author(s):  
Sara Alomari ◽  
Salha Abdullah

Concept maps are an effective learning method that helps learners identify relationships between pieces of information, especially when teaching materials cover many topics or concepts. However, building a concept map manually is a long and tedious task: it is time-consuming and demands intensive effort in reading the full content and reasoning about the relationships among concepts. Because of this inefficiency, many studies have been carried out to develop intelligent algorithms using several data mining techniques. In this research, the authors aim to improve the Text Analysis-Association Rules Mining (TA-ARM) algorithm by using the weighted K-nearest neighbors (KNN) algorithm instead of the traditional KNN. The weighted KNN is expected to improve the classification accuracy, which will eventually enhance the quality of the generated concept map.
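
The difference between traditional and weighted KNN can be illustrated with a small sketch. It assumes inverse-distance weighting, which is one common weighting scheme and not necessarily the one used by the authors. The function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def knn_predict(train, query, k=3, weighted=False):
    """train is a list of (features, label) pairs; returns the predicted label."""
    # Keep the k training points closest to the query (Euclidean distance).
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbors:
        d = math.dist(features, query)
        # Traditional KNN: every neighbor counts as one vote.
        # Weighted KNN (assumed inverse-distance): closer neighbors count more.
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)

# Toy data (hypothetical): one close "a" neighbor and two farther "b" neighbors.
train = [((0.0, 0.0), "a"), ((3.0, 0.0), "b"), ((3.2, 0.0), "b")]
query = (1.0, 0.0)
plain = knn_predict(train, query, k=3, weighted=False)
weighted = knn_predict(train, query, k=3, weighted=True)
```

With majority voting the two distant "b" points outvote the single nearby "a" point, while inverse-distance weighting lets the nearby point win.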


Mathematics ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 286 ◽  
Author(s):  
Hamid Saadatfar ◽  
Samiyeh Khosravi ◽  
Javad Hassannataj Joloudari ◽  
Amir Mosavi ◽  
Shahaboddin Shamshirband

The K-nearest neighbors (KNN) machine learning algorithm is a well-known non-parametric classification method. However, like other traditional data mining methods, applying it to big data comes with computational challenges. KNN determines the class of a new sample from the classes of its nearest neighbors, but identifying those neighbors in a large amount of data imposes such a large computational cost that the method is no longer practical on a single computing machine. One of the techniques proposed to make classification methods applicable to large datasets is pruning. LC-KNN is an improved KNN method that first clusters the data into smaller partitions using the K-means clustering method and then, for each new sample, applies KNN on the partition whose center is nearest. However, because the clusters have different shapes and densities, selecting the appropriate cluster is a challenge. In this paper, an approach is proposed to improve the pruning phase of the LC-KNN method by taking these factors into account. The proposed approach helps to choose a more appropriate cluster of data in which to look for the neighbors, thus increasing the classification accuracy. The performance of the proposed approach is evaluated on different real datasets. The experimental results show the effectiveness of the proposed approach, with higher classification accuracy and lower time cost than other recent relevant methods.
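
The baseline LC-KNN idea described above (cluster first, then search only the partition with the nearest center) can be sketched as follows. This is a minimal illustration, not the authors' improved pruning method: it ignores cluster shape and density, uses a deterministic K-means initialization chosen only for reproducibility, and assumes the selected partition is non-empty. All function names and data are hypothetical.

```python
import math
from collections import Counter

def nearest_center(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))

def kmeans(points, k, iters=20):
    # Assumption: the first k points serve as initial centers (deterministic).
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        assign = [nearest_center(p, centers) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the final centers.
    return centers, [nearest_center(p, centers) for p in points]

def lc_knn_predict(points, labels, centers, assign, query, k=3):
    # Pruning step: search only the partition whose center is nearest to the query.
    chosen = nearest_center(query, centers)
    part = [(p, y) for p, y, a in zip(points, labels, assign) if a == chosen]
    part.sort(key=lambda item: math.dist(item[0], query))
    return Counter(y for _, y in part[:k]).most_common(1)[0][0]

# Toy data (hypothetical): two well-separated groups.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
labels = ["a", "b", "a", "a", "b", "b"]
centers, assign = kmeans(points, k=2)
pred_low = lc_knn_predict(points, labels, centers, assign, (0.5, 0.5))
pred_high = lc_knn_predict(points, labels, centers, assign, (9.0, 9.5))
```

Each query is compared against the points of one partition rather than the whole training set, which is where the time saving comes from. The cost is that a poorly chosen partition can hide the true nearest neighbors, which is the problem the abstract sets out to address.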


A real-time crash prediction system estimates both the frequency and the severity of crashes. Machine learning based methods are now commonly used to predict the total number of crashes. In this project, the prediction accuracy of machine learning algorithms such as Decision Tree (DT), K-nearest neighbors (KNN), Random Forest (RF), and Logistic Regression (LR) is evaluated. The performance of these classification methods is compared in terms of accuracy. The dataset used in this project was obtained from 49 states of the US and 27 states of India and contains 2.25 million US crash records and 1.16 million Indian crash records. The results show that Random Forest (RF) achieves a classification accuracy of 96%, higher than the other classification methods.


2020 ◽  
Vol 8 (5) ◽  
pp. 4900-4904

Agriculture is one of the most significant sectors of the Indian economy, providing employment to almost 50% of the nation's labor force. India is recognized as the world's largest producer of pulses, rice, wheat, spices, and spice products. Farmers' economic growth depends on the quality of the produce they grow, which in turn depends on plant growth and the yield they obtain. Consequently, disease recognition in plants plays an important role in agriculture. Plants are highly prone to infections that disturb their growth, which in turn upsets the natural balance on which the farmer depends. To detect a plant disease at an early stage, an automatic disease detection procedure is beneficial. The symptoms of plant diseases are visible on various parts of a plant, such as the leaves. Manual recognition of plant disease from leaf images is a tedious job. The k-means clustering procedure is used to segment the input images. The gray-level co-occurrence matrix (GLCM) procedure is used to extract textural features from the input image, and the k-nearest neighbors (KNN) algorithm is used for image classification, producing classification accuracies of 70 to 75% for different inputs. Hence, machine learning based computational methods are needed to automate disease detection and classification from leaf images. To improve the performance of existing methods, machine learning and deep learning algorithms will be used for more accurate classification.
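
The GLCM texture step mentioned above can be sketched as follows. The sketch builds a normalized, non-symmetric co-occurrence matrix for a single pixel offset and derives the contrast feature from it. The offset, the number of gray levels, the choice of contrast as the feature, and the tiny image are assumptions made for illustration, since the abstract does not state which GLCM settings or features were used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized, non-symmetric GLCM for pixel pairs offset by (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count how often gray level image[r][c] is followed by image[r2][c2].
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]

def contrast(p):
    """Contrast feature: sum over (i, j) of p[i][j] * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

# Tiny two-level image (hypothetical); horizontal neighbor pairs are counted.
image = [[0, 0, 1],
         [1, 1, 0]]
p = glcm(image, levels=2)
texture_contrast = contrast(p)
```

In a pipeline like the one described, features such as this contrast value would be computed per segmented leaf region and passed to the KNN classifier as its input vector.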


2020 ◽  
Vol 10 (11) ◽  
pp. 3933 ◽  
Author(s):  
Marcin Blachnik ◽  
Mirosław Kordos

Instance selection and construction methods were originally designed to improve the performance of the k-nearest neighbors classifier by increasing its speed and improving its classification accuracy. These goals were achieved by eliminating redundant and noisy samples, thus reducing the size of the training set. In this paper, the performance of instance selection methods is investigated in terms of classification accuracy and reduction of training set size. The classification accuracy of the following classifiers is evaluated: decision trees, random forest, Naive Bayes, linear model, support vector machine, and k-nearest neighbors. The results indicate that for most of the classifiers, compressing the training set affects prediction performance, and only a small group of instance selection methods can be recommended as a general-purpose preprocessing step. These are the learning vector quantization based algorithms, along with Drop2 and Drop3. Other methods are less efficient or provide a low compression ratio.
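
To make the idea of removing noisy samples concrete, the sketch below implements Wilson's Edited Nearest Neighbor (ENN) rule, a classic instance selection filter. ENN is used here only as a simple illustration of the general technique. It is not one of the methods the abstract recommends (the LVQ-based algorithms, Drop2, and Drop3), and the function name and toy data are hypothetical.

```python
import math
from collections import Counter

def edited_nearest_neighbors(points, labels, k=3):
    """Return indices of samples whose label agrees with the majority of their k nearest neighbors."""
    keep = []
    for i, (p, y) in enumerate(zip(points, labels)):
        # Distances from sample i to every other sample, with that sample's label.
        others = [(math.dist(p, q), labels[j]) for j, q in enumerate(points) if j != i]
        others.sort(key=lambda t: t[0])
        majority = Counter(lbl for _, lbl in others[:k]).most_common(1)[0][0]
        if majority == y:
            keep.append(i)  # consistent with its neighborhood: keep it
    return keep

# Toy data (hypothetical): two clean groups plus one mislabeled point inside group "a".
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0),
          (0.5, 0.5)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]
kept = edited_nearest_neighbors(points, labels, k=3)
```

The mislabeled point at index 8 is surrounded by "a" samples and is dropped, while all clean samples are kept. ENN mainly removes noise and compresses the training set only slightly, whereas condensation-style methods aim for larger reductions.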


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 221669-221688
Author(s):  
Jose Ortiz-Bejar ◽  
Eric S. Tellez ◽  
Mario Graff ◽  
Daniela Moctezuma ◽  
Sabino Miranda-Jimenez

Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 93 ◽  
Author(s):  
Tjahjadi ◽  
Ramli

Blood pressure (BP) is an important parameter for the early detection of heart disease because it is associated with symptoms of hypertension or hypotension. A single photoplethysmography (PPG) method for the classification of BP can automatically analyze BP symptoms. Users can immediately know the condition of their BP to ensure early detection. In recent years, deep learning methods have presented outstanding performance in classification applications. However, there are two main problems in deep learning classification methods: classification accuracy and time consumption during training. We attempt to address these limitations and propose a method for the classification of BP using the K-nearest neighbors (KNN) algorithm based on PPG. We collected data for 121 subjects from the PPG–BP figshare database. We divided the subjects into three classification levels, namely normotension, prehypertension, and hypertension, according to the BP levels of the Joint National Committee report. The F1 scores of these three classification trials were 100%, 100%, and 90.80%, respectively. Hence, it is validated that the proposed method can achieve improved classification accuracy without additional manual pre-processing of PPG. Our proposed method achieves higher accuracy than convolutional neural networks (deep learning), bagged tree, logistic regression, and AdaBoost tree.
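
For readers unfamiliar with the metric reported above, the following sketch shows how an F1 score is computed for each class label from true and predicted labels. The label names and predictions are hypothetical toy values, not data from the study, and the abstract does not specify how its F1 scores were aggregated.

```python
def f1_per_class(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each class label."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores

# Toy labels (hypothetical): one hypertension sample is misclassified as prehypertension.
y_true = ["normo", "normo", "pre", "hyper", "hyper"]
y_pred = ["normo", "normo", "pre", "hyper", "pre"]
scores = f1_per_class(y_true, y_pred)
```

A single misclassification lowers the F1 score of both classes involved: the true class loses recall and the predicted class loses precision.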

