HYPERSPHERICAL PROTOTYPES FOR PATTERN CLASSIFICATION

Author(s):  
HATEM A. FAYED ◽  
AMIR F. ATIYA ◽  
SHERIF M. R. HASHEM

The nearest neighbor method is one of the most widely used pattern classification methods. However, its major drawback in practice is the curse of dimensionality. In this paper, we propose a new method to alleviate this problem significantly. In this method, we attempt to cover the training patterns of each class with a number of hyperspheres. The method attempts to design hyperspheres that are as compact as possible, and we pose this as a quadratic optimization problem. We performed several simulation experiments and found that the proposed approach results in considerable speed-up over the k-nearest-neighbor method while maintaining the same level of accuracy. It also significantly beats other prototype classification methods (like LVQ, RCE, and CCCD) in most performance aspects.
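A minimal sketch of the classification side of such a scheme, assuming the hypersphere prototypes (centers, radii, class labels) have already been fitted; the decision rule used here (distance to the nearest sphere surface, clipped to zero inside a sphere) is an illustrative assumption, not the paper's exact rule.

```python
import numpy as np

def classify_with_hyperspheres(X, centers, radii, labels):
    """Assign each row of X to the class of the nearest hypersphere prototype.

    centers : (m, d) prototype centers
    radii   : (m,)   prototype radii
    labels  : (m,)   class label of each prototype
    """
    labels = np.asarray(labels)
    # Euclidean distance from every test point to every prototype center.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # Distance to the sphere surface; points inside a sphere get distance 0.
    surface = np.maximum(d - radii[None, :], 0.0)
    return labels[np.argmin(surface, axis=1)]
```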

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jing Tian ◽  
Jianping Zhao ◽  
Chunhou Zheng

Abstract Background: In recent years, various sequencing techniques have been used to collect biomedical omics datasets, and it is usually possible to obtain multiple types of omics data from a single patient sample. Clustering of omics data plays an indispensable role in biological and medical research and helps reveal the structure shared across multiple collections. Nevertheless, clustering omics data poses many challenges; the primary ones are the high dimensionality of the data and the small sample size. It is therefore difficult to find a suitable integration method for the structural analysis of multiple datasets. Results: In this paper, a multi-view clustering method based on the Stiefel manifold (MCSM) is proposed. The MCSM method comprises three core steps. First, we establish a binary optimization model for the simultaneous clustering problem. Second, we solve the optimization problem with a linear search algorithm on the Stiefel manifold. Finally, we integrate the clustering results obtained from the three omics types using the k-nearest neighbor method. We applied this approach to four cancer datasets from TCGA. The results show that our method is superior to several state-of-the-art methods that depend on the hypothesis that the underlying omics cluster classes are the same. Conclusion: In particular, our approach performs better than the compared approaches when the underlying clusters are inconsistent. For patients with different subtypes, both consistent and differential clusters can be identified at the same time.
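A hypothetical reading of the final integration step, assuming per-view cluster labels have already been aligned so they are comparable across omics types; this kNN consensus vote is a sketch, not the paper's exact algorithm.

```python
import numpy as np
from scipy.stats import mode
from sklearn.neighbors import NearestNeighbors

def knn_consensus(views, labels_per_view, k=5):
    """Combine per-view cluster labels into one consensus label per sample.

    views           : list of (n, d_v) feature matrices, one per omics type
    labels_per_view : list of (n,) cluster assignments, one per omics type
    For every sample, the cluster labels of its k nearest neighbours in each
    view are collected and the most frequent label wins.
    """
    votes = []
    for X, y in zip(views, labels_per_view):
        idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X, return_distance=False)
        votes.append(np.asarray(y)[idx])      # (n, k) neighbour labels in this view
    votes = np.concatenate(votes, axis=1)     # (n, k * n_views)
    return mode(votes, axis=1, keepdims=False).mode
```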


2019 ◽  
Vol 29 (2) ◽  
pp. 393-405 ◽  
Author(s):  
Magdalena Piotrowska ◽  
Gražina Korvel ◽  
Bożena Kostek ◽  
Tomasz Ciszewski ◽  
Andrzej Czyżewski

Abstract Automatic classification methods, such as artificial neural networks (ANNs), the k-nearest neighbor (kNN) and self-organizing maps (SOMs), are applied to allophone analysis based on recorded speech. A list of 650 words was created for that purpose, containing positionally and/or contextually conditioned allophones. For each word, a group of 16 native and non-native speakers were audio-video recorded, from which seven native speakers’ and phonology experts’ speech was selected for analyses. For the purpose of the present study, a sub-list of 103 words containing the English alveolar lateral phoneme /l/ was compiled. The list includes ‘dark’ (velarized) allophonic realizations (which occur before a consonant or at the end of the word before silence) and 52 ‘clear’ allophonic realizations (which occur before a vowel), as well as voicing variants. The recorded signals were segmented into allophones and parametrized using a set of descriptors, originating from the MPEG 7 standard, plus dedicated time-based parameters as well as modified MFCC features proposed by the authors. Classification methods such as ANNs, the kNN and the SOM were employed to automatically detect the two types of allophones. Various sets of features were tested to achieve the best performance of the automatic methods. In the final experiment, a selected set of features was used for automatic evaluation of the pronunciation of dark /l/ by non-native speakers.
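A minimal sketch of the classification setup, assuming the allophones have already been segmented into separate audio files; plain librosa MFCC statistics stand in here for the paper's MPEG-7 descriptors, dedicated time-based parameters, and modified MFCC features.

```python
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def mfcc_vector(path, n_mfcc=13):
    """Mean and standard deviation of MFCCs over one allophone segment."""
    y, sr = librosa.load(path, sr=None)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])

def train_dark_clear_knn(dark_files, clear_files, k=5):
    """Fit a kNN classifier separating dark and clear /l/ allophones."""
    X = np.array([mfcc_vector(f) for f in dark_files + clear_files])
    y = np.array([0] * len(dark_files) + [1] * len(clear_files))  # 0 = dark, 1 = clear
    return KNeighborsClassifier(n_neighbors=k).fit(X, y)
```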


Teknik ◽  
2021 ◽  
Vol 42 (2) ◽  
pp. 137-148
Author(s):  
Vincentius Abdi Gunawan ◽  
Leonardus Sandy Ade Putra

Communication is essential in conveying information from one individual to another. However, not all individuals in the world can communicate verbally. According to the WHO, hearing loss affects 466 million people globally, 34 million of whom are children. A non-verbal language learning method is therefore needed for people with hearing problems. The purpose of this study is to build a system that can identify non-verbal language in real time so that it can be easily understood. Achieving a high success rate requires a proper method, such as machine learning supported by wavelet feature extraction and different classification methods in image processing. Machine learning was applied because of its ability to recognize and compare the classification results of four different methods. The four classifiers used to compare hand gesture recognition for American Sign Language are the multi-class SVM, the backpropagation neural network, k-nearest neighbor (K-NN), and Naïve Bayes. Simulation tests of the four classification methods obtained success rates of 99.3%, 98.28%, 97.7%, and 95.98%, respectively. It can therefore be concluded that the multi-class SVM classifier has the highest success rate in the recognition of American Sign Language, reaching 99.3%. The whole system is designed and tested using MATLAB as supporting software for data processing.
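A rough Python sketch of the feature-extraction idea, assuming pre-cropped grayscale gesture images; the exact wavelet family, decomposition level, and band statistics are not given in the abstract, so these choices are illustrative, and sklearn's SVC stands in for the paper's MATLAB implementation.

```python
import numpy as np
import pywt
import cv2
from sklearn.svm import SVC

def wavelet_features(image_path, wavelet="db4", level=3):
    """Mean absolute energy of each 2-D wavelet sub-band of a gesture image."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE).astype(float)
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    bands = [coeffs[0]] + [b for detail in coeffs[1:] for b in detail]
    return np.array([np.mean(np.abs(b)) for b in bands])

# Multi-class SVM, the best-performing classifier in the study (99.3%):
# X = np.array([wavelet_features(p) for p in image_paths]); y = gesture labels
# clf = SVC(kernel="rbf", decision_function_shape="ovr").fit(X, y)
```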


Author(s):  
Alia Karim Abdul Hassan ◽  
Bashar Saadoon Mahdi ◽  
Asmaa Abdullah Mohammed

In a writer recognition system, the system performs a "one-to-many" search in a large database of handwriting samples from known authors and returns a list of possible candidates. This paper proposes a method for writer identification from handwritten Arabic words, without segmentation into sub-letters, based on speeded-up robust features (SURF) extraction and k-nearest neighbor (KNN) classification, in order to enhance writer identification accuracy. After feature extraction, the features are clustered with the K-means algorithm to standardize their number. Feature extraction followed by feature clustering is together called the Bag of Words (BOW) approach; it converts an arbitrary number of image features into a uniform-length feature vector. The proposed method was evaluated on the IFN/ENIT database, and the recognition rate obtained in the experiments is 96.666%.
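A sketch of a SURF + BOW + KNN pipeline under these assumptions: opencv-contrib is installed with the non-free modules enabled (SURF is patented and lives outside the core build), and the vocabulary size and k are illustrative values, not the paper's settings.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # needs opencv-contrib non-free

def surf_descriptors(image_path):
    """Return the SURF descriptors of a handwritten-word image (possibly empty)."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = surf.detectAndCompute(img, None)
    return desc if desc is not None else np.empty((0, 64))

def bow_histogram(desc, kmeans):
    """Quantise descriptors against the K-means vocabulary into a fixed-length histogram."""
    hist = np.zeros(kmeans.n_clusters)
    if len(desc):
        words, counts = np.unique(kmeans.predict(desc), return_counts=True)
        hist[words] = counts
    return hist / max(hist.sum(), 1.0)

def train_writer_knn(image_paths, writer_ids, vocab_size=200, k=3):
    """BOW-of-SURF feature vectors classified by KNN; parameter values are illustrative."""
    all_desc = [surf_descriptors(p) for p in image_paths]
    kmeans = KMeans(n_clusters=vocab_size, n_init=10).fit(np.vstack(all_desc))
    X = np.array([bow_histogram(d, kmeans) for d in all_desc])
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, writer_ids)
    return kmeans, knn
```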


2019 ◽  
Vol 16 (2) ◽  
pp. 187
Author(s):  
Mega Luna Suliztia ◽  
Achmad Fauzan

Classification is the process of grouping data based on observed variables in order to predict new data whose class is unknown. There are several classification methods, such as Naïve Bayes, K-Nearest Neighbor, and Neural Network. Naïve Bayes classifies based on the probability values of the existing attributes. K-Nearest Neighbor classifies based on the character of the nearest neighbors, where the number of neighbors is k, while Neural Network classifies based on a model of human neural networks. This study compares the three classification methods for the Seat Load Factor, which is the percentage of aircraft load and also a measure used in determining airline profit. The affecting factors are the number of passengers, ticket prices, flight routes, and flight times. Based on the analysis of 47 data points, the Naïve Bayes method misclassified 14 data points, giving an accuracy rate of 70%; the K-Nearest Neighbor method with k=5 misclassified 5 data points, giving an accuracy rate of 89%; and the Neural Network misclassified 10 data points, giving an accuracy rate of 78%. The method with the highest accuracy rate is the best method to use, which in this case is K-Nearest Neighbor, whose classification system correctly classified 42 data points: 14 low, 10 medium, and 18 high values. Based on the best method, predictions can be made for new data; for example, a new observation consisting of the Bali flight route (2), an afternoon flight time (2), an estimated 140 passengers, and a ticket price of Rp 700,000. Using the K-Nearest Neighbor method, the predicted Seat Load Factor is high, i.e., in the interval 80%-100%.
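A minimal sketch of the kNN prediction step, assuming the categorical factors (route, flight time) are already encoded as integer codes as in the abstract; the column order, the scaling step, and the placeholder training arrays are assumptions added here because the features have very different ranges.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

def fit_slf_knn(X_train, y_train, k=5):
    """Fit a k=5 nearest-neighbor classifier on the Seat Load Factor data.

    Columns of X_train: route code, flight-time code, passenger count, ticket price (Rp).
    y_train holds the class labels "low" / "medium" / "high".
    """
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    return model.fit(X_train, y_train)

# Prediction for the new flight described in the abstract:
# Bali route (2), afternoon (2), 140 passengers, ticket price Rp 700,000.
# model = fit_slf_knn(X_train, y_train)
# model.predict(np.array([[2, 2, 140, 700_000]]))   # expected: "high" (80%-100%)
```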


2020 ◽  
Vol 8 (6) ◽  
Author(s):  
Pushpam Sinha ◽  
Ankita Sinha

Entropy-based k-Nearest Neighbor pattern classification (EbkNN) is a variation of the conventional k-Nearest Neighbor rule that optimizes the value of k for each test datum based on entropy calculations. The entropy formula used in EbkNN is the one popularly defined in information theory for a set of n different types of information (classes) attached to a total of m objects (data points), each object described by f features. In EbkNN, the value of k chosen for discriminating a given test datum is the one for which the entropy is the smallest non-zero value. The other rules of conventional kNN are retained in EbkNN. It is concluded that EbkNN works best for binary classification; it is computationally prohibitive to use EbkNN for discriminating test data into more than two classes. The biggest advantage of EbkNN over conventional kNN is that a single run of the EbkNN algorithm yields the optimum classification of the test data, whereas the conventional kNN algorithm has to be run separately for each value of k in a selected range and the optimum k then chosen from among them. We also tested our EbkNN method on the WDBC (Wisconsin Diagnostic Breast Cancer) dataset. There are 569 instances in this dataset; we made a random choice of 290 instances as the training dataset and used the remaining 279 instances as the test dataset. We obtained a remarkable result with the EbkNN method: accuracy close to 100%, better than the results reported by most other researchers who have worked on the WDBC dataset.
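A sketch reconstructed only from the abstract: for each test point, scan k = 1..k_max, keep the k whose neighbour set has the smallest non-zero entropy, and return that set's majority class. The fallback when every neighbour set is pure, and the value of k_max, are assumptions, not part of the published rule.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy of the class distribution in a neighbour set."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def ebknn_predict(X_train, y_train, x, k_max=15):
    """Entropy-based kNN prediction for a single test point x."""
    y_train = np.asarray(y_train)
    order = np.argsort(np.linalg.norm(X_train - x, axis=1))  # neighbours by distance
    best_k, best_h = None, np.inf
    for k in range(1, k_max + 1):
        h = entropy(y_train[order[:k]])
        if 0.0 < h < best_h:                  # smallest non-zero entropy wins
            best_k, best_h = k, h
    if best_k is None:                        # all neighbour sets pure (assumed fallback)
        return y_train[order[0]]
    return Counter(y_train[order[:best_k]]).most_common(1)[0][0]
```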


Author(s):  
Triando Hamonangan Saragih ◽  
Diny Melsye Nurul Fajri ◽  
Alfita Rakhmandasari

Jatropha curcas is a very useful plant that can be used as a biofuel for diesel engines, replacing coal. In Indonesia, only a few plantations grow Jatropha curcas, and very few farmers understand its diseases in detail, which can cause large losses at harvest when a disease occurs and no further action is taken. An expert system can help farmers identify the plant diseases of Jatropha curcas. The objective of this research is to compare several identification and classification methods, namely Decision Tree, K-Nearest Neighbor, and a modified K-Nearest Neighbor, with the comparison based on accuracy. The modified K-Nearest Neighbor method gave the best accuracy, 67.74%.
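A minimal sketch of such a comparison on symptom feature vectors; the paper's kNN modification is not spelled out in the abstract, so distance-weighted voting is used here only as one plausible stand-in.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def evaluate_jatropha_classifiers(X, y):
    """Cross-validated accuracy of the compared classifiers.

    X encodes observed symptoms per plant, y holds the diagnosed diseases.
    """
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "k-NN": KNeighborsClassifier(n_neighbors=5),
        "Weighted k-NN (stand-in for the modified kNN)": KNeighborsClassifier(
            n_neighbors=5, weights="distance"
        ),
    }
    return {name: cross_val_score(m, X, y, cv=3).mean() for name, m in models.items()}
```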


Pharmaceutics ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 122
Author(s):  
Phasit Charoenkwan ◽  
Wararat Chiangjong ◽  
Chanin Nantasenamat ◽  
Mohammad Ali Moni ◽  
Pietro Liò ◽  
...  

Tumor-homing peptides (THPs) are small peptides that can recognize and bind cancer cells specifically. To gain a better understanding of THPs' functional mechanisms, the accurate identification and characterization of THPs is required. Although some computational methods for in silico THP identification have been proposed, a major drawback is their lack of model interpretability. In this study, we propose a new, simple and easily interpretable computational approach (called SCMTHP) for identifying and analyzing the tumor-homing activity of peptides via the use of a scoring card method (SCM). To improve the predictability and interpretability of our predictor, we generated propensity scores of the 20 amino acids for THPs. Finally, informative physicochemical properties were used to provide insights into the characteristics giving rise to the bioactivity of THPs via the SCMTHP-derived propensity scores. Benchmarking experiments on independent tests indicated that SCMTHP achieves performance comparable to the state-of-the-art method, with accuracies of 0.827 and 0.798 on the Main and Small benchmark datasets, respectively. Furthermore, SCMTHP was found to outperform several well-known machine learning-based classifiers (e.g., decision tree, k-nearest neighbor, multi-layer perceptron, naive Bayes and partial least squares regression) in both 10-fold cross-validation and independent tests. Finally, the SCMTHP web server was established and made freely available online. SCMTHP is expected to be a useful tool for rapid and accurate identification of THPs and for providing a better understanding of THP biophysical and biochemical properties.
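A minimal sketch of the scoring-card idea, assuming a trained propensity table for the 20 amino acids and a trained decision threshold; the actual values are published with SCMTHP and are not reproduced here.

```python
def scm_score(peptide, propensity):
    """Average propensity over the residues of a peptide.

    propensity : dict mapping each of the 20 amino-acid letters to its
    trained SCMTHP-style propensity score (placeholder values would go here).
    """
    return sum(propensity[aa] for aa in peptide) / len(peptide)

def is_thp(peptide, propensity, threshold):
    """Label a peptide as tumor-homing when its score reaches the trained threshold."""
    return scm_score(peptide, propensity) >= threshold
```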

