K Nearest Neighbor Edition to Guide Classification Tree Learning: Motivation and Experimental Results

Author(s):  
J. M. Martínez-Otzeta ◽  
B. Sierra ◽  
E. Lazkano ◽  
A. Astigarraga

2021 ◽  
Vol 25 (6) ◽  
pp. 1453-1471
Author(s):  
Chunhua Tang ◽  
Han Wang ◽  
Zhiwen Wang ◽  
Xiangkun Zeng ◽  
Huaran Yan ◽  
...  

Most density-based clustering algorithms suffer from difficult parameter setting, high time complexity, poor noise recognition, and weak clustering on datasets with uneven density. To address these problems, this paper proposes the FOP-OPTICS algorithm (Finding of the Ordering Peaks Based on OPTICS), a substantial improvement on OPTICS (Ordering Points To Identify the Clustering Structure). The proposed algorithm finds the demarcation point (DP) in the augmented cluster-ordering generated by OPTICS and uses the reachability-distance of the DP as the neighborhood radius eps of its corresponding cluster. This overcomes the weakness of most algorithms when clustering datasets with uneven densities. By computing the k-nearest-neighbor distance of each point, it reduces the time complexity of OPTICS; by calculating density-mutation points within the clusters, it can efficiently recognize noise. The experimental results show that FOP-OPTICS has the lowest time complexity and outperforms other algorithms in parameter setting and noise recognition.
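The abstract above mentions computing the distance to the k-nearest neighbor of each point. As a rough illustration of that single ingredient only (not the FOP-OPTICS algorithm itself, whose details are not given here), the following minimal sketch computes a brute-force k-distance for a small point set. The function name `k_distance` and the sample points are assumptions made for this example.

```python
import math

def k_distance(points, k):
    """Return, for each point, the Euclidean distance to its k-th nearest neighbor.

    Brute-force O(n^2 log n) illustration; assumes 1 <= k < len(points).
    """
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

# Hypothetical data: three points close together and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
kd = k_distance(points, k=2)
```

In this toy data the isolated point has a much larger k-distance than the three clustered points, which is the kind of signal a density-based method can use to separate sparse regions from dense ones.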



2015 ◽  
Vol 13 (2) ◽  
pp. 50-58
Author(s):  
R. Khadim ◽  
R. El Ayachi ◽  
Mohamed Fakir

This paper focuses on the recognition of 3D objects using 2D attributes. To increase the recognition rate, we present a hybridization of three approaches for computing the attributes of a color image, combining Zernike moments, Gist descriptors, and a color descriptor (statistical moments). In the classification phase, three methods are adopted: Neural Network (NN), Support Vector Machine (SVM), and k-nearest neighbor (KNN). The COIL-100 database is used in the experiments.



Author(s):  
Khairul Anam ◽  
Adel Al-Jumaily

Myoelectric pattern recognition (MPR) is used to detect the user’s intention in order to achieve smooth interaction between human and machine. The performance of MPR is influenced by the features extracted and the classifier employed. A kernel extreme learning machine, especially the radial basis function extreme learning machine (RBF-ELM), has emerged as a promising classifier for MPR. However, RBF-ELM must be optimized to work efficiently. This paper proposes optimizing the RBF-ELM parameters using a hybridization of particle swarm optimization (PSO) and a wavelet function. The proposed system is used to classify finger movements of amputees and able-bodied subjects using electromyography signals. The experimental results show that the accuracy of the optimized RBF-ELM is 95.71% and 94.27% for the healthy subjects and the amputees, respectively. By comparison, optimization using PSO alone attained average accuracies of 95.53% and 92.55% for the healthy subjects and the amputees, respectively. The experimental results also show that SW-RBF-ELM achieved better accuracy than other well-known classifiers such as support vector machine (SVM), linear discriminant analysis (LDA), and k-nearest neighbor (kNN).



Author(s):  
Reza Safdari ◽  
Peyman Rezaei-Hachesu ◽  
Marjan GhaziSaeedi ◽  
Taha Samad-Soltani ◽  
Maryam Zolnoori

Medical data mining aims to solve real-world problems in the diagnosis and treatment of diseases. This process applies various techniques and algorithms, which have different levels of accuracy and precision. The purpose of this article is to apply data mining techniques to the diagnosis of asthma. The sensitivity, specificity, and accuracy of K-nearest neighbor, Support Vector Machine, naive Bayes, Artificial Neural Network, classification tree, and CN2 algorithms, as well as of related similar studies, were evaluated. ROC curves were plotted to show the performance of the authors' approach. Support vector machine (SVM) algorithms achieved the highest accuracy at 98.59%, with a sensitivity of 98.59% and a specificity of 98.61% for class 1. The other algorithms all achieved accuracy above 87%. The results show that the authors can accurately diagnose asthma approximately 98% of the time based on demographics and clinical data. The study also has a higher sensitivity when compared to expert and knowledge-based systems.
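The sensitivity, specificity, and accuracy figures quoted above follow the standard confusion-matrix definitions. As a minimal sketch of how such figures are derived, the function below computes them from true/false positive and negative counts. The function name `diagnostic_metrics` and the counts used are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # share of actual positives detected
    specificity = tn / (tn + fp)                 # share of actual negatives detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # share of all cases classified correctly
    return sensitivity, specificity, accuracy

# Hypothetical counts, for illustration only.
sens, spec, acc = diagnostic_metrics(tp=90, fn=10, tn=80, fp=20)
```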



2007 ◽  
Vol 16 (04) ◽  
pp. 627-646  
Author(s):  
YAN ZHOU ◽  
MADHURI S. MULEKAR ◽  
PRAVEEN NERELLAPALLI

Unsolicited bulk e-mail, also known as spam, has been an increasing problem for the e-mail community. This paper presents a new spam filtering strategy that 1) uses a practical entropy coding technique, Huffman coding, to dynamically encode the feature space of the e-mail collected over time and 2) applies an online algorithm to adaptively enhance the learned spam concept as new e-mail data becomes available. The contributions of this work include a highly efficient spam filtering algorithm in which the input space is radically reduced to a single-dimension input vector, and an adaptive learning technique that is robust to vocabulary change, concept drift, and skewed class distributions. We compare our technique with several existing off-line learning techniques including support vector machine, logistic regression, naïve Bayes, k-nearest neighbor, C4.5 decision tree, RBFNetwork, boosted decision tree, and stacking. We demonstrate the effectiveness of our technique by presenting experimental results on publicly available e-mail data. A more in-depth statistical analysis of the experimental results is also presented and discussed.



Location-based services (LBS) require users to continuously report their location to a potentially untrusted server to obtain services based on their location, which can expose them to privacy risks. Unfortunately, existing privacy-preserving techniques for LBS have several limitations, such as requiring a fully trusted third party, offering limited privacy guarantees, and incurring high communication overhead. In this work, a user-defined privacy grid system called the dynamic grid system (DGS) is proposed: the first holistic system that fulfills four essential requirements for privacy-preserving snapshot and continuous LBS. The system requires only a semi-trusted third party, responsible for carrying out simple matching operations correctly. This semi-trusted third party does not have any information about a user's location. Secure snapshot and continuous location privacy is guaranteed under the defined adversary models. The communication cost for the user does not depend on the user's desired privacy level; it depends only on the number of relevant points of interest in the vicinity of the user. Although the focus of this work is on range and k-nearest neighbor queries, the system can easily be extended to support other spatial queries without changing the algorithms run by the semi-trusted third party and the database server, provided the required search area of a spatial query can be abstracted into spatial regions. Experimental results show that the DGS is more efficient than the state-of-the-art privacy-preserving technique for LBS.



2020 ◽  
Vol 39 (4) ◽  
pp. 5359-5368
Author(s):  
B Ratna Raju ◽  
G.N Swamy ◽  
K. Padma Raju

Colorectal cancer has caused an increasing number of deaths in recent years. Diagnosing colorectal cancer early makes it safer to treat the patient. Colonoscopy is commonly used to identify and treat this type of cancer. Feature-selection-based methods are proposed to choose a subset of variables and attain better prediction. An Imperialist Competitive Algorithm (ICA) is proposed to select features for the identification and treatment of colon cancer. A K-Nearest Neighbor (KNN) classifier is also used, which finds the minimal Euclidean distance between the query feature vector and all the prototype training data. Experimental results show that the proposed method is superior to the other methods on the performance metrics considered and achieves better accuracy.
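The KNN step described above, comparing a query feature vector with all training prototypes by Euclidean distance, can be sketched as follows. This is a generic, minimal k-nearest-neighbor classifier under assumed names (`knn_predict`) and made-up toy data; it does not reproduce the paper's feature selection or its dataset.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query vector.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    # Count the labels of the k closest training points.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-class toy data, for illustration only.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

label_near_a = knn_predict(train_x, train_y, (1.1, 1.0), k=3)
label_near_b = knn_predict(train_x, train_y, (5.0, 5.0), k=3)
```

With k=1 the classifier reduces to picking the single training prototype at minimal Euclidean distance from the query.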



2021 ◽  
Vol 2021 ◽  
pp. 1-18
Author(s):  
Yuxing Li ◽  
Shangbin Jiao ◽  
Bo Geng ◽  
Xinru Jiang

Dispersion entropy (DE), as a newly proposed entropy, has achieved remarkable results in its application. In this paper, on the basis of DE, combined with coarse-grained processing, we introduce the fluctuation and distance information of the signal and propose the refined composite multiscale fluctuation-based reverse dispersion entropy (RCMFRDE). As an emerging complexity analysis mode, RCMFRDE has been used for the first time for the feature extraction of ship-radiated noise signals to mitigate the loss caused by the misclassification of ships on the ocean. Meanwhile, a classification and recognition method combined with K-nearest neighbor (KNN) is proposed, namely RCMFRDE-KNN. The experimental results indicate that RCMFRDE has the highest recognition rate in the single-feature case and reaches up to 100% in the double-feature case, far better than multiscale DE (MDE), multiscale fluctuation-based DE (MFDE), multiscale permutation entropy (MPE), and multiscale reverse dispersion entropy (MRDE). All the experimental results show that the RCMFRDE proposed in this paper improves on the separability of the entropies commonly used in the hydroacoustic domain.



2021 ◽  
Vol 38 (1) ◽  
pp. 51-60
Author(s):  
Semih Ergin ◽  
Sahin Isik ◽  
Mehmet Bilginer Gulmezoglu

In this paper, several classifiers are implemented and compared, together with 2D subspace projection approaches, for the face recognition problem. For this purpose, well-known classifiers such as K-Nearest Neighbor (K-NN), Common Matrix Approach (CMA), Support Vector Machine (SVM), and Convolutional Neural Network (CNN) are applied to low-dimensional face representations obtained from 2DPCA-, 2DSVD-, and 2DFDA-based approaches. CMA, which is a 2D version of the Common Vector Approach (CVA), finds a common matrix for each face class. From the experimental results, we have observed that the SVM presents a dominant performance in general. When the overall results across all datasets are considered, CMA is slightly superior to the others for the 2DPCA- and 2DSVD-based feature matrices of the AR dataset. On the other hand, CNN is better than the other classifiers when developing a face recognition system based on original face samples and 2DPCA-based feature matrices of the Yale dataset. The experimental results indicate that using these feature matrices with CMA, SVM, and CNN in classification problems is more advantageous than using the original pixel matrices in terms of both processing time and memory requirement.



2013 ◽  
Vol 22 (02) ◽  
pp. 1250083
Author(s):  
PEJMAN MOWLAEE ◽  
ABOLGHASEM SAYADIYAN

A preprocessing stage is inevitable in every speech/music application, including audio/speech separation, speech/speaker recognition, and audio/genre transcription tasks. The importance of this preprocessing stage stems from the need to determine which class each frame of the given signal belongs to: speech only, music only, or a speech/music mixture. Such classification can significantly decrease the computational burden of the exhaustive search that commonly arises in model-based speech recognition or separation as well as in music transcription scenarios. In this paper, we present a new method to separate mixed-type audio frames based on support vector machine (SVM) and neural network. We present a feature type selection algorithm which seeks the most appropriate features to discriminate the possible classes (hypotheses) on the mixed signal. We also propose features based on eigen-decomposition of the mixed frame. Experimental results demonstrate that the proposed features together with the selected audio classifiers achieve acceptable classification results. The experimental results also show that the proposed system outperforms other classification systems including k-nearest neighbor (k-NN) and multi-layer perceptron (MLP).


