HYPERSPHERICAL PROTOTYPES FOR PATTERN CLASSIFICATION

Author(s):  
HATEM A. FAYED ◽  
AMIR F. ATIYA ◽  
SHERIF M. R. HASHEM

The nearest neighbor method is one of the most widely used pattern classification methods. However, its major drawback in practice is the curse of dimensionality. In this paper, we propose a new method to alleviate this problem significantly. In this method, we attempt to cover the training patterns of each class with a number of hyperspheres. The method attempts to design hyperspheres that are as compact as possible, and we pose this as a quadratic optimization problem. We performed several simulation experiments and found that the proposed approach results in considerable speed-up over the k-nearest-neighbor method while maintaining the same level of accuracy. It also significantly beats other prototype classification methods (like LVQ, RCE, and CCCD) in most performance aspects.
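A minimal sketch of the classification side of such a scheme, assuming the hypersphere prototypes (centers, radii, class labels) have already been fitted; the decision rule used here (distance to the nearest sphere surface, clipped to zero inside a sphere) is an illustrative assumption, not the paper's exact rule.

```python
import numpy as np

def classify_with_hyperspheres(X, centers, radii, labels):
    """Assign each row of X to the class of the nearest hypersphere prototype.

    centers : (m, d) prototype centers
    radii   : (m,)   prototype radii
    labels  : (m,)   class label of each prototype
    """
    labels = np.asarray(labels)
    # Euclidean distance from every test point to every prototype center.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # Distance to the sphere surface; points inside a sphere get distance 0.
    surface = np.maximum(d - radii[None, :], 0.0)
    return labels[np.argmin(surface, axis=1)]
```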

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jing Tian ◽  
Jianping Zhao ◽  
Chunhou Zheng

Abstract Background: In recent years, various sequencing techniques have been used to collect biomedical omics datasets, and it is usually possible to obtain multiple types of omics data from a single patient sample. Clustering of omics data plays an indispensable role in biological and medical research and helps reveal the structure shared across multiple collections. Nevertheless, clustering omics data poses many challenges; the primary ones are the high dimensionality of the data and the small sample size. It is therefore difficult to find a suitable integration method for the structural analysis of multiple datasets. Results: In this paper, a multi-view clustering method based on the Stiefel manifold (MCSM) is proposed. The MCSM method comprises three core steps. First, we establish a binary optimization model for the simultaneous clustering problem. Second, we solve the optimization problem with a linear search algorithm on the Stiefel manifold. Finally, we integrate the clustering results obtained from the three omics types using the k-nearest neighbor method. We applied this approach to four cancer datasets from TCGA. The results show that our method is superior to several state-of-the-art methods that depend on the hypothesis that the underlying omics cluster classes are the same. Conclusion: In particular, our approach performs better than the compared approaches when the underlying clusters are inconsistent. For patients with different subtypes, both consistent and differential clusters can be identified at the same time.
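A hypothetical reading of the final integration step, assuming per-view cluster labels have already been aligned so they are comparable across omics types; this kNN consensus vote is a sketch, not the paper's exact algorithm.

```python
import numpy as np
from scipy.stats import mode
from sklearn.neighbors import NearestNeighbors

def knn_consensus(views, labels_per_view, k=5):
    """Combine per-view cluster labels into one consensus label per sample.

    views           : list of (n, d_v) feature matrices, one per omics type
    labels_per_view : list of (n,) cluster assignments, one per omics type
    For every sample, the cluster labels of its k nearest neighbours in each
    view are collected and the most frequent label wins.
    """
    votes = []
    for X, y in zip(views, labels_per_view):
        idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X, return_distance=False)
        votes.append(np.asarray(y)[idx])      # (n, k) neighbour labels in this view
    votes = np.concatenate(votes, axis=1)     # (n, k * n_views)
    return mode(votes, axis=1, keepdims=False).mode
```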


2019 ◽  
Vol 29 (2) ◽  
pp. 393-405 ◽  
Author(s):  
Magdalena Piotrowska ◽  
Gražina Korvel ◽  
Bożena Kostek ◽  
Tomasz Ciszewski ◽  
Andrzej Czyżewski

Abstract Automatic classification methods, such as artificial neural networks (ANNs), the k-nearest neighbor (kNN) and self-organizing maps (SOMs), are applied to allophone analysis based on recorded speech. A list of 650 words was created for that purpose, containing positionally and/or contextually conditioned allophones. For each word, a group of 16 native and non-native speakers were audio-video recorded, from which seven native speakers’ and phonology experts’ speech was selected for analyses. For the purpose of the present study, a sub-list of 103 words containing the English alveolar lateral phoneme /l/ was compiled. The list includes ‘dark’ (velarized) allophonic realizations (which occur before a consonant or at the end of the word before silence) and 52 ‘clear’ allophonic realizations (which occur before a vowel), as well as voicing variants. The recorded signals were segmented into allophones and parametrized using a set of descriptors, originating from the MPEG 7 standard, plus dedicated time-based parameters as well as modified MFCC features proposed by the authors. Classification methods such as ANNs, the kNN and the SOM were employed to automatically detect the two types of allophones. Various sets of features were tested to achieve the best performance of the automatic methods. In the final experiment, a selected set of features was used for automatic evaluation of the pronunciation of dark /l/ by non-native speakers.
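A minimal sketch of the classification setup, assuming the allophones have already been segmented into separate audio files; plain librosa MFCC statistics stand in here for the paper's MPEG-7 descriptors, dedicated time-based parameters, and modified MFCC features.

```python
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def mfcc_vector(path, n_mfcc=13):
    """Mean and standard deviation of MFCCs over one allophone segment."""
    y, sr = librosa.load(path, sr=None)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])

def train_dark_clear_knn(dark_files, clear_files, k=5):
    """Fit a kNN classifier separating dark and clear /l/ allophones."""
    X = np.array([mfcc_vector(f) for f in dark_files + clear_files])
    y = np.array([0] * len(dark_files) + [1] * len(clear_files))  # 0 = dark, 1 = clear
    return KNeighborsClassifier(n_neighbors=k).fit(X, y)
```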


Teknik ◽  
2021 ◽  
Vol 42 (2) ◽  
pp. 137-148
Author(s):  
Vincentius Abdi Gunawan ◽  
Leonardus Sandy Ade Putra

Communication is essential in conveying information from one individual to another. However, not all individuals in the world can communicate verbally. According to the WHO, hearing loss affects 466 million people globally, 34 million of whom are children. A non-verbal language learning method is therefore needed for people with hearing problems. The purpose of this study is to build a system that can identify non-verbal language in real time so that it can be easily understood. Achieving a high success rate requires a proper method, such as machine learning supported by wavelet feature extraction and different classification methods in image processing. Machine learning was applied because of its ability to recognize and compare the classification results of four different methods. The four classifiers used to compare hand gesture recognition for American Sign Language are the multi-class SVM, the backpropagation neural network, k-nearest neighbor (K-NN), and Naïve Bayes. Simulation tests of the four classification methods obtained success rates of 99.3%, 98.28%, 97.7%, and 95.98%, respectively. It can therefore be concluded that the multi-class SVM classifier has the highest success rate in the recognition of American Sign Language, reaching 99.3%. The whole system is designed and tested using MATLAB as supporting software for data processing.
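A rough Python sketch of the feature-extraction idea, assuming pre-cropped grayscale gesture images; the exact wavelet family, decomposition level, and band statistics are not given in the abstract, so these choices are illustrative, and sklearn's SVC stands in for the paper's MATLAB implementation.

```python
import numpy as np
import pywt
import cv2
from sklearn.svm import SVC

def wavelet_features(image_path, wavelet="db4", level=3):
    """Mean absolute energy of each 2-D wavelet sub-band of a gesture image."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE).astype(float)
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    bands = [coeffs[0]] + [b for detail in coeffs[1:] for b in detail]
    return np.array([np.mean(np.abs(b)) for b in bands])

# Multi-class SVM, the best-performing classifier in the study (99.3%):
# X = np.array([wavelet_features(p) for p in image_paths]); y = gesture labels
# clf = SVC(kernel="rbf", decision_function_shape="ovr").fit(X, y)
```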


Author(s):  
Alia Karim Abdul Hassan ◽  
Bashar Saadoon Mahdi ◽  
Asmaa Abdullah Mohammed

In a writer recognition system, the system performs a "one-to-many" search in a large database of handwriting samples from known authors and returns a list of possible candidates. This paper proposes a method for writer identification from handwritten Arabic words, without segmentation into sub-letters, based on speeded-up robust features (SURF) extraction and k-nearest neighbor (KNN) classification, in order to enhance writer identification accuracy. After feature extraction, the features are clustered with the K-means algorithm to standardize their number. Feature extraction followed by feature clustering is together called the Bag of Words (BOW) approach; it converts an arbitrary number of image features into a uniform-length feature vector. The proposed method was evaluated on the IFN/ENIT database, and the recognition rate obtained in the experiments is 96.666%.
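A sketch of a SURF + BOW + KNN pipeline under these assumptions: opencv-contrib is installed with the non-free modules enabled (SURF is patented and lives outside the core build), and the vocabulary size and k are illustrative values, not the paper's settings.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # needs opencv-contrib non-free

def surf_descriptors(image_path):
    """Return the SURF descriptors of a handwritten-word image (possibly empty)."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = surf.detectAndCompute(img, None)
    return desc if desc is not None else np.empty((0, 64))

def bow_histogram(desc, kmeans):
    """Quantise descriptors against the K-means vocabulary into a fixed-length histogram."""
    hist = np.zeros(kmeans.n_clusters)
    if len(desc):
        words, counts = np.unique(kmeans.predict(desc), return_counts=True)
        hist[words] = counts
    return hist / max(hist.sum(), 1.0)

def train_writer_knn(image_paths, writer_ids, vocab_size=200, k=3):
    """BOW-of-SURF feature vectors classified by KNN; parameter values are illustrative."""
    all_desc = [surf_descriptors(p) for p in image_paths]
    kmeans = KMeans(n_clusters=vocab_size, n_init=10).fit(np.vstack(all_desc))
    X = np.array([bow_histogram(d, kmeans) for d in all_desc])
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, writer_ids)
    return kmeans, knn
```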


2019 ◽  
Vol 16 (2) ◽  
pp. 187
Author(s):  
Mega Luna Suliztia ◽  
Achmad Fauzan

Classification is the process of grouping data based on observed variables in order to predict new data whose class is unknown. There are several classification methods, such as Naïve Bayes, K-Nearest Neighbor, and Neural Network. Naïve Bayes classifies based on the probability values of the existing attributes. K-Nearest Neighbor classifies based on the character of the nearest neighbors, where the number of neighbors is k, while Neural Network classifies based on a model of human neural networks. This study compares the three classification methods for the Seat Load Factor, which is the percentage of aircraft load and also a measure used in determining airline profit. The affecting factors are the number of passengers, ticket prices, flight routes, and flight times. Based on the analysis of 47 data points, the Naïve Bayes method misclassified 14 data points, giving an accuracy rate of 70%; the K-Nearest Neighbor method with k=5 misclassified 5 data points, giving an accuracy rate of 89%; and the Neural Network misclassified 10 data points, giving an accuracy rate of 78%. The method with the highest accuracy rate is the best method to use, which in this case is K-Nearest Neighbor, whose classification system correctly classified 42 data points: 14 low, 10 medium, and 18 high values. Based on the best method, predictions can be made for new data; for example, a new observation consisting of the Bali flight route (2), an afternoon flight time (2), an estimated 140 passengers, and a ticket price of Rp 700,000. Using the K-Nearest Neighbor method, the predicted Seat Load Factor is high, i.e., in the interval 80%-100%.
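A minimal sketch of the kNN prediction step, assuming the categorical factors (route, flight time) are already encoded as integer codes as in the abstract; the column order, the scaling step, and the placeholder training arrays are assumptions added here because the features have very different ranges.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

def fit_slf_knn(X_train, y_train, k=5):
    """Fit a k=5 nearest-neighbor classifier on the Seat Load Factor data.

    Columns of X_train: route code, flight-time code, passenger count, ticket price (Rp).
    y_train holds the class labels "low" / "medium" / "high".
    """
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    return model.fit(X_train, y_train)

# Prediction for the new flight described in the abstract:
# Bali route (2), afternoon (2), 140 passengers, ticket price Rp 700,000.
# model = fit_slf_knn(X_train, y_train)
# model.predict(np.array([[2, 2, 140, 700_000]]))   # expected: "high" (80%-100%)
```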


2020 ◽  
Vol 8 (6) ◽  
Author(s):  
Pushpam Sinha ◽  
Ankita Sinha

Entropy-based k-Nearest Neighbor pattern classification (EbkNN) is a variation of the conventional k-Nearest Neighbor rule that optimizes the value of k for each test datum based on entropy calculations. The entropy formula used in EbkNN is the one popularly defined in information theory for a set of n different types of information (classes) attached to a total of m objects (data points), each object described by f features. In EbkNN, the value of k chosen for discriminating a given test datum is the one for which the entropy is the smallest non-zero value. The other rules of conventional kNN are retained in EbkNN. It is concluded that EbkNN works best for binary classification; it is computationally prohibitive to use EbkNN for discriminating test data into more than two classes. The biggest advantage of EbkNN over conventional kNN is that a single run of the EbkNN algorithm yields the optimum classification of the test data, whereas the conventional kNN algorithm has to be run separately for each value of k in a selected range and the optimum k then chosen from among them. We also tested our EbkNN method on the WDBC (Wisconsin Diagnostic Breast Cancer) dataset. There are 569 instances in this dataset; we made a random choice of 290 instances as the training dataset and used the remaining 279 instances as the test dataset. We obtained a remarkable result with the EbkNN method: accuracy close to 100%, better than the results reported by most other researchers who have worked on the WDBC dataset.
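A sketch reconstructed only from the abstract: for each test point, scan k = 1..k_max, keep the k whose neighbour set has the smallest non-zero entropy, and return that set's majority class. The fallback when every neighbour set is pure, and the value of k_max, are assumptions, not part of the published rule.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy of the class distribution in a neighbour set."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def ebknn_predict(X_train, y_train, x, k_max=15):
    """Entropy-based kNN prediction for a single test point x."""
    y_train = np.asarray(y_train)
    order = np.argsort(np.linalg.norm(X_train - x, axis=1))  # neighbours by distance
    best_k, best_h = None, np.inf
    for k in range(1, k_max + 1):
        h = entropy(y_train[order[:k]])
        if 0.0 < h < best_h:                  # smallest non-zero entropy wins
            best_k, best_h = k, h
    if best_k is None:                        # all neighbour sets pure (assumed fallback)
        return y_train[order[0]]
    return Counter(y_train[order[:best_k]]).most_common(1)[0][0]
```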


Author(s):  
Triando Hamonangan Saragih ◽  
Diny Melsye Nurul Fajri ◽  
Alfita Rakhmandasari

Jatropha curcas is a very useful plant that can be used as a biofuel for diesel engines, replacing coal. In Indonesia, only a few plantations grow Jatropha curcas, and very few farmers understand its diseases in detail, which can cause large losses at harvest when a disease occurs and no further action is taken. An expert system can help farmers identify the plant diseases of Jatropha curcas. The objective of this research is to compare several identification and classification methods, namely Decision Tree, K-Nearest Neighbor, and a modified K-Nearest Neighbor, with the comparison based on accuracy. The modified K-Nearest Neighbor method gave the best accuracy, 67.74%.
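A minimal sketch of such a comparison on symptom feature vectors; the paper's kNN modification is not spelled out in the abstract, so distance-weighted voting is used here only as one plausible stand-in.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def evaluate_jatropha_classifiers(X, y):
    """Cross-validated accuracy of the compared classifiers.

    X encodes observed symptoms per plant, y holds the diagnosed diseases.
    """
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "k-NN": KNeighborsClassifier(n_neighbors=5),
        "Weighted k-NN (stand-in for the modified kNN)": KNeighborsClassifier(
            n_neighbors=5, weights="distance"
        ),
    }
    return {name: cross_val_score(m, X, y, cv=3).mean() for name, m in models.items()}
```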


Pharmaceutics ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 122
Author(s):  
Phasit Charoenkwan ◽  
Wararat Chiangjong ◽  
Chanin Nantasenamat ◽  
Mohammad Ali Moni ◽  
Pietro Liò ◽  
...  

Tumor-homing peptides (THPs) are small peptides that can recognize and bind cancer cells specifically. To gain a better understanding of THPs' functional mechanisms, the accurate identification and characterization of THPs is required. Although some computational methods for in silico THP identification have been proposed, a major drawback is their lack of model interpretability. In this study, we propose a new, simple and easily interpretable computational approach (called SCMTHP) for identifying and analyzing the tumor-homing activity of peptides via the use of a scoring card method (SCM). To improve the predictability and interpretability of our predictor, we generated propensity scores of the 20 amino acids for THPs. Finally, informative physicochemical properties were used to provide insights into the characteristics giving rise to the bioactivity of THPs via the SCMTHP-derived propensity scores. Benchmarking experiments on independent tests indicated that SCMTHP achieves performance comparable to the state-of-the-art method, with accuracies of 0.827 and 0.798 on the Main and Small benchmark datasets, respectively. Furthermore, SCMTHP was found to outperform several well-known machine learning-based classifiers (e.g., decision tree, k-nearest neighbor, multi-layer perceptron, naive Bayes and partial least squares regression) in both 10-fold cross-validation and independent tests. Finally, the SCMTHP web server was established and made freely available online. SCMTHP is expected to be a useful tool for rapid and accurate identification of THPs and for providing a better understanding of THP biophysical and biochemical properties.
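A minimal sketch of the scoring-card idea, assuming a trained propensity table for the 20 amino acids and a trained decision threshold; the actual values are published with SCMTHP and are not reproduced here.

```python
def scm_score(peptide, propensity):
    """Average propensity over the residues of a peptide.

    propensity : dict mapping each of the 20 amino-acid letters to its
    trained SCMTHP-style propensity score (placeholder values would go here).
    """
    return sum(propensity[aa] for aa in peptide) / len(peptide)

def is_thp(peptide, propensity, threshold):
    """Label a peptide as tumor-homing when its score reaches the trained threshold."""
    return scm_score(peptide, propensity) >= threshold
```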

