Resampling-Based Similarity Measures for High-Dimensional Data

2015 ◽  
Vol 22 (1) ◽  
pp. 54-62 ◽  
Author(s):  
Dhammika Amaratunga ◽  
Javier Cabrera ◽  
Yung-Seop Lee
2014 ◽  
Vol 13 (03) ◽  
pp. 1450026 ◽  
Author(s):  
Hanan M. Alghamdi ◽  
Ali Selamat ◽  
Nor Shahriza Abdul Karim

In the literature, high-dimensional data is reported to reduce the efficiency of clustering algorithms and increase their execution time. In this paper, we therefore propose an approach called BV-kmeans (Bayesian Vectorisation along with k-means) that aims to improve document representation models for text clustering. The approach integrates k-means document clustering with a Bayesian Vectoriser, which computes the probability distribution of the documents in the vector space, in order to overcome the problems of high-dimensional data and reduce running time. We used three similarity measures, namely K divergence, squared Euclidean distance and squared χ² distance, to determine which metrics are effective for modelling the similarity between documents under the proposed approach. We evaluated the approach on a set of common newspaper websites that have high-dimensional data. Experimental results show that, compared with the standard k-means algorithm, the proposed approach increases the degree to which a cluster encases documents from a specific category by 85% and lowers the runtime by 95%.
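The three similarity measures named in this abstract can be sketched for a pair of discrete probability distributions. This is a minimal illustration only: the abstract does not give the formulas, so the definitions below follow the forms commonly used for these names and are an assumption, as are the function names, the `normalise` helper, and the toy term counts.

```python
import math


def squared_euclidean(p, q):
    """Sum of squared component differences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def squared_chi2(p, q):
    """Squared chi-square distance: sum of (p - q)^2 / (p + q).

    Components where both entries are zero contribute nothing.
    """
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def k_divergence(p, q):
    """K divergence: sum of p * ln(2p / (p + q)).

    Assumed definition; terms with p == 0 are taken as zero.
    """
    return sum(a * math.log(2 * a / (a + b)) for a, b in zip(p, q) if a > 0)


def normalise(counts):
    """Turn raw term counts into a probability distribution (hypothetical helper)."""
    total = sum(counts)
    return [c / total for c in counts]


# Toy term counts for two documents over a four-term vocabulary (made up).
doc_a = normalise([3, 1, 0, 4])
doc_b = normalise([2, 2, 1, 3])

for measure in (squared_euclidean, squared_chi2, k_divergence):
    print(measure.__name__, measure(doc_a, doc_b))
```

Note that K divergence, as defined here, is not symmetric in its arguments, whereas the two squared distances are; that difference matters when a measure is used as the dissimilarity inside k-means.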


2009 ◽  
Vol 35 (7) ◽  
pp. 859-866
Author(s):  
Ming LIU ◽  
Xiao-Long WANG ◽  
Yuan-Chao LIU

Author(s):  
Punit Rathore ◽  
James C. Bezdek ◽  
Dheeraj Kumar ◽  
Sutharshan Rajasegarar ◽  
Marimuthu Palaniswami

Symmetry ◽  
2020 ◽  
Vol 13 (1) ◽  
pp. 19
Author(s):  
Hsiuying Wang

The high-dimensional data recognition problem based on the Gaussian mixture model (GMM) has useful applications in many areas, such as audio signal recognition, image analysis, and biological evolution. The expectation-maximization algorithm is a popular approach to deriving the maximum likelihood estimators of the GMM. An alternative is to adopt a generalized Bayes estimator for parameter estimation. In this study, an estimator based on the generalized Bayes approach is established. A simulation study shows that the proposed approach performs competitively with the conventional method in high-dimensional GMM recognition. We use a musical data example to illustrate the recognition problem: suppose that we have audio data from a piece of music and know that it comes from one of four compositions, but we do not know which one. In this example, the generalized Bayes method achieves a higher average recognition rate than the conventional method, which shows that it is a competitive alternative in this real application.
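The recognition step described in this abstract can be sketched as follows: given one fitted GMM per candidate class, assign the observed samples to the class whose model gives the highest log-likelihood. This is only an illustration under stated assumptions. It does not implement the generalized Bayes estimator or the EM algorithm; the mixture parameters are taken as already estimated, covariances are assumed diagonal, and every name and number below is hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(samples, weights, means, variances):
    """Total log-likelihood of the samples under one Gaussian mixture."""
    total = 0.0
    for x in samples:
        comps = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Log-sum-exp over mixture components for numerical stability.
        peak = max(comps)
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total


def recognise(samples, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(
        models, key=lambda name: gmm_log_likelihood(samples, *models[name])
    )


# Hypothetical fitted models: name -> (weights, means, variances).
models = {
    "composition_A": ([0.5, 0.5], [[0.0, 0.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "composition_B": ([0.5, 0.5], [[5.0, 5.0], [7.0, 7.0]], [[1.0, 1.0], [1.0, 1.0]]),
}

# Made-up feature vectors standing in for features extracted from audio.
samples = [[0.1, -0.2], [1.9, 2.1], [0.3, 0.4]]
print(recognise(samples, models))
```

Whichever estimator produces the parameters (maximum likelihood via EM or a Bayes-type estimator), the decision rule above stays the same, so the recognition rate depends on how well the estimated parameters match each class.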

