Clustering using kernel entropy principal component analysis and variable kernel estimator

Author(s):  
Loubna El Fattahi ◽  
El Hassan Sbai

Clustering, as an unsupervised learning method, is the task of dividing data objects into clusters with common characteristics. In the present paper, we introduce an enhanced version of the existing EPCA data transformation method. By incorporating a kernel function into EPCA, the input space can be mapped implicitly into a high-dimensional feature space. Shannon's entropy, estimated via the inertia contributed by every mapped data object, is then the key measure for determining the optimal extracted feature space. Our proposed method works very well with the clustering algorithm based on a fast search for cluster centers computed from local densities. Experimental results show that the approach is feasible and efficient in terms of query performance.

Author(s):  
S. Schmitz ◽  
U. Weidner ◽  
H. Hammer ◽  
A. Thiele

Abstract. In this paper, the nonlinear dimension reduction algorithm Uniform Manifold Approximation and Projection (UMAP) is investigated to visualize information contained in high-dimensional feature representations of Polarimetric Interferometric Synthetic Aperture Radar (PolInSAR) data. Based on polarimetric parameters, target decomposition methods and interferometric coherences, a wide range of features is extracted that spans the high-dimensional feature space. UMAP is applied to determine a representation of the data in 2D and 3D Euclidean space that preserves local and global structures of the data and is still suited for classification. The performance of UMAP in terms of generating expressive visualizations is evaluated on PolInSAR data acquired by the F-SAR sensor and compared to that of Principal Component Analysis (PCA), Laplacian Eigenmaps (LE) and t-distributed Stochastic Neighbor Embedding (t-SNE). For this purpose, a visual analysis of 2D embeddings is performed. In addition, a quantitative analysis is provided to evaluate how well the low-dimensional representations preserve information with respect to the separability of different land cover classes. The results show that UMAP exceeds the capability of PCA and LE in these regards and is competitive with t-SNE.
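An illustrative sketch of two of the baselines compared here, PCA and Laplacian Eigenmaps (scikit-learn's SpectralEmbedding), projecting a high-dimensional feature set to 2D; the digits dataset is a stand-in for the PolInSAR features. UMAP itself lives in the third-party umap-learn package (`umap.UMAP`), which exposes the same `fit_transform` interface:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import SpectralEmbedding

X, y = load_digits(return_X_y=True)  # 64-D stand-in feature space

# Linear baseline: PCA to 2D.
emb_pca = PCA(n_components=2).fit_transform(X)

# Nonlinear baseline: Laplacian Eigenmaps to 2D.
emb_le = SpectralEmbedding(n_components=2, n_neighbors=10).fit_transform(X)
```

Both 2D embeddings can then be scatter-plotted with class labels as colors for the kind of visual analysis described above.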


2019 ◽  
Vol 9 (17) ◽  
pp. 3566 ◽  
Author(s):  
Francisco-Manuel Melgarejo-Meseguer ◽  
Francisco-Javier Gimeno-Blanes ◽  
María-Eladia Salar-Alcaraz ◽  
Juan-Ramón Gimeno-Blanes ◽  
Juan Martínez-Sánchez ◽  
...  

Recent research has proven the existence of a statistical relation between fragmented QRS and several highly prevalent diseases, such as cardiac sarcoidosis, acute coronary syndrome, arrhythmogenic cardiomyopathies, Brugada syndrome, and hypertrophic cardiomyopathy. One out of five hundred people suffers from hypertrophic cardiomyopathy. The relation between fragmentation and arrhythmias motivates the objective of this work, which is to propose a valid method for QRS fragmentation detection. With that aim, we followed a two-stage approach. First, we identified the features that best characterize the fragmentation by analyzing the physiological interpretation of multivariate approaches, such as principal component analysis (PCA) and independent component analysis (ICA). Second, we created an invariant transformation method for the multilead electrocardiogram (ECG) by scrutinizing the statistical distributions of the PCA eigenvectors and of the ICA transformation arrays, in order to anchor the desired elements in the suitable leads in the feature space. A complete database was compiled, incorporating real fragmented ECGs, surrogate registers created by synthetically adding fragmented activity to real non-fragmented ECG registers, and standard clean ECGs. Results showed that the creation of beat templates together with the application of PCA over eight independent leads achieves a 0.995 fragmentation enhancement ratio and a 0.07 dispersion coefficient. In the case of ICA over twelve leads, the results were a 0.995 fragmentation enhancement ratio and a 0.70 dispersion coefficient. We conclude that the algorithm presented in this work constructs a new paradigm by creating a systematic and powerful tool for clinical anamnesis and evaluation based on the multilead ECG.
This approach consistently consolidates the inconspicuous elements present in multiple leads onto designated variables in the output space, hence offering additional and valid visual and non-visual information to the standard clinical review, and opening the door to more accurate automatic detection and a statistically valid systematic approach for a wide number of applications. In this direction, the companion paper presents further developments applying this technique to fragmentation detection.
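A hedged sketch of the multilead decomposition step with scikit-learn's PCA and FastICA; the synthetic sinusoidal sources and random mixing matrix are stand-ins for the real beat templates across leads:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)

# Synthetic stand-in for 8 leads: independent oscillatory sources
# mixed linearly, mimicking how lead signals share underlying activity.
t = np.linspace(0, 1, 500)
S = np.column_stack([np.sin(2 * np.pi * f * t) for f in (1, 2, 3, 5, 7, 11, 13, 17)])
A = rng.standard_normal((8, 8))     # unknown mixing
leads = S @ A.T                     # 500 samples x 8 leads

# PCA yields orthogonal directions ranked by variance (eigenvectors);
# ICA recovers statistically independent components.
pca_scores = PCA(n_components=8).fit_transform(leads)
ica_sources = FastICA(n_components=8, random_state=0, max_iter=1000).fit_transform(leads)
```

In the paper's setting, the distributions of the PCA eigenvectors and ICA transformation arrays are then inspected to anchor fragmentation-related components to consistent output variables.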


Symmetry ◽  
2019 ◽  
Vol 11 (2) ◽  
pp. 163
Author(s):  
Baobin Duan ◽  
Lixin Han ◽  
Zhinan Gou ◽  
Yi Yang ◽  
Shuangshuang Chen

With the universal presence of mixed data with numerical and categorical attributes in the real world, a variety of clustering algorithms have been developed to discover the potential information hidden in mixed data. Most existing clustering algorithms compute the distances or similarities between data objects on the original data, which may make the clustering results unstable in the presence of noise. In this paper, a clustering framework is proposed to explore the grouping structure of mixed data. First, the categorical attributes, transformed by one-hot encoding, and the normalized numerical attributes are input to stacked denoising autoencoders to learn internal feature representations. Secondly, based on these feature representations, the distances between all data objects in feature space are calculated, and the local density and relative distance of each data object can also be computed. Thirdly, the density peaks clustering algorithm is improved and employed to allocate the data objects to clusters. Finally, experiments conducted on several UCI datasets demonstrate that the proposed algorithm for clustering mixed data outperforms three baseline algorithms in terms of clustering accuracy and the Rand index.
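A minimal NumPy sketch of the two density peaks quantities named above, local density (rho) and relative distance (delta, the distance to the nearest point of higher density); cluster centers are the points where both are large. The two-blob data, Gaussian kernel, and 2% cutoff-distance heuristic are illustrative assumptions:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),   # blob around (0, 0)
               rng.normal(3, 0.3, (50, 2))])  # blob around (3, 3)

D = squareform(pdist(X))
dc = np.percentile(D[D > 0], 2)                # cutoff distance (heuristic)
rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1   # Gaussian-kernel local density

# delta: distance to nearest higher-density point; the global density
# peak gets its maximum distance to any point by convention.
delta = np.zeros(len(X))
for i in range(len(X)):
    higher = np.where(rho > rho[i])[0]
    delta[i] = D[i].max() if higher.size == 0 else D[i, higher].min()

# Centers maximize gamma = rho * delta; here we pick the top two.
centers = np.argsort(rho * delta)[-2:]
```

The remaining points are then assigned to the cluster of their nearest higher-density neighbor, which is the step the paper improves on top of the autoencoder features.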


Author(s):  
CHANG WOO LEE ◽  
HYUN KANG ◽  
HANG JOON KIM ◽  
KEECHUL JUNG

The current paper proposes a font classification method for document images that uses non-negative matrix factorization (NMF), which is able to learn part-based representations of objects. The basic idea of the proposed method is based on the fact that the characteristics of each font are derived from parts of individual characters in each font rather than from holistic textures. Spatial localities, i.e., the parts composing font images, are automatically extracted using NMF and then used as features representing each font. Using a hierarchical clustering algorithm, these feature sets are generalized for font classification, resulting in the construction of prototype templates. In both the prototype construction and the font classification, the earth mover's distance (EMD) is used as the distance metric, which is more suitable for the NMF feature space than cosine or Euclidean distance. In the experiments, the distribution of the features and their appropriateness for specifying each font are investigated, and the results are compared with a related algorithm: principal component analysis (PCA). The proposed method is expected to improve the performance of optical character recognition (OCR) and of document indexing and retrieval systems, when such systems adopt a font classifier as a preprocessor.
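A sketch of the two building blocks, NMF for part-based encodings and EMD as the comparison metric, using scikit-learn and SciPy. The random 100 x 64 matrix stands in for vectorized font patches and is not the paper's data:

```python
import numpy as np
from sklearn.decomposition import NMF
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
V = rng.random((100, 64))            # 100 "font patches", 64 pixels each

# NMF: V ~ W @ H with all factors non-negative, so H's rows act as parts
# and W's rows as per-sample part activations.
nmf = NMF(n_components=8, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(V)             # per-sample encodings
H = nmf.components_                  # learned parts

# EMD between the normalized encodings of two samples, used as the
# distance metric instead of cosine or Euclidean distance.
a = W[0] + 1e-12
b = W[1] + 1e-12
bins = np.arange(8)
emd = wasserstein_distance(bins, bins, a / a.sum(), b / b.sum())
```

In the paper's pipeline, such EMD comparisons against the clustered prototype templates decide the font class.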


Author(s):  
Md Kamal Uddin ◽  
Amran Bhuiyan ◽  
Mahmudul Hasan ◽  
...  

In the fast-moving field of computer vision, re-identification of an individual across a camera network is a very challenging task. Existing methods mainly focus on feature-learning strategies, which provide a feature space in which images of the same person lie closer together than images of different individuals. These methods rely to a large extent on high-dimensional feature vectors to achieve high re-identification accuracy, which makes them difficult to deploy in practical applications because of their computational cost. To address these problems, we comprehensively analyzed the effect of kernel-based principal component analysis (PCA) on several existing high-dimensional person re-identification feature extractors. We first apply a kernel function to the extracted features and then apply PCA, significantly reducing the feature dimension. We show that the kernel is very effective on different state-of-the-art high-dimensional feature descriptors. Finally, a thorough experimental evaluation on benchmark person re-identification datasets shows that the proposed method is superior to more advanced techniques while remaining computationally feasible.
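A hedged sketch of the reduction step with scikit-learn's KernelPCA. The 2000-D random vectors are stand-ins for high-dimensional re-identification descriptors, and the RBF kernel, gamma, and 64-D target size are illustrative choices:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
features = rng.standard_normal((300, 2000))   # 300 images, 2000-D descriptors

# Kernelize the descriptors, then project onto the leading principal
# components to shrink the dimension before the matching stage.
kpca = KernelPCA(n_components=64, kernel="rbf", gamma=1.0 / 2000)
reduced = kpca.fit_transform(features)
```

Matching then runs on the 64-D `reduced` vectors, which is where the computational savings over the raw descriptors come from.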


2018 ◽  
Vol 7 (2.11) ◽  
pp. 27 ◽  
Author(s):  
Kahkashan Kouser ◽  
Amrita Priyam

One of the open problems of modern data mining is clustering high-dimensional data. For this, a new technique called GA-HDClustering is proposed in this paper, which works in two steps. First, a GA-based feature selection algorithm is designed to determine the optimal feature subset, consisting of the important features of the entire data set. Next, a K-means algorithm is applied using the optimal feature subset to find the clusters. On the other hand, the traditional K-means algorithm is applied on the full-dimensional feature space. Finally, the result of GA-HDClustering is compared with that of the traditional clustering algorithm. For the comparison, different validity indices are used, such as the sum of squared errors (SSE), within-group average distance (WGAD), between-group distance (BGD), and Davies-Bouldin index (DBI). GA-HDClustering uses a genetic algorithm to search for an effective feature subspace in a large feature space made up of all dimensions of the data set. Experiments performed on standard data sets revealed that GA-HDClustering is superior to the traditional clustering algorithm.
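A toy GA-style feature-subset search followed by K-means, in the spirit of (but not identical to) GA-HDClustering. The population size, mutation rate, and the use of plain SSE as the fitness (which tends to favor small subsets; the paper also considers WGAD, BGD and DBI) are all illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

rng = np.random.default_rng(0)
X, _ = make_blobs(n_samples=150, n_features=10, centers=3, random_state=0)

def fitness(mask):
    """Fitness of a feature mask: negative K-means SSE on the subset."""
    if not mask.any():
        return -np.inf
    km = KMeans(n_clusters=3, n_init=5, random_state=0).fit(X[:, mask])
    return -km.inertia_           # lower SSE -> higher fitness

pop = rng.random((10, 10)) < 0.5  # population of boolean feature masks
for _ in range(5):                # generations
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-5:]]   # selection: keep best half
    children = parents.copy()
    children ^= rng.random(children.shape) < 0.1   # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[int(np.argmax([fitness(m) for m in pop]))]
final = KMeans(n_clusters=3, n_init=5, random_state=0).fit(X[:, best])
```

The baseline for comparison is simply `KMeans(n_clusters=3).fit(X)` on all ten dimensions.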


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Loubna El Fattahi ◽  
El Hassan Sbai

In the present study, we introduce a new approach to nonlinear process monitoring based on kernel entropy principal component analysis (KEPCA) and the notion of inertia. KEPCA plays a double role. First, it reduces the data in the high-dimensional space. Second, it constructs the model. Before the reduction, KEPCA transforms the input data into a high-dimensional feature space using a nonlinear kernel function and automatically determines the number of principal components (PCs) from the computation of the inertia. The retained PCs express the maximum entropy of inertia of the data in the feature space. Then, we use the Parzen window estimator to compute the upper control limit (UCL) for inertia-based KEPCA instead of relying on a Gaussian assumption. Our second contribution is a new combined index based on the monitoring indices T2 and SPE, which simplifies the fault detection task and prevents confusion. The proposed approaches have been applied to process fault detection and diagnosis on the well-known benchmark Tennessee Eastman (TE) process. The results were promising.
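A sketch of the Parzen-window control limit: rather than assuming the monitoring statistic is Gaussian, estimate its density with a kernel estimator and take a high quantile as the UCL. The chi-square stand-in data and the 99% level are illustrative assumptions:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
stat = rng.chisquare(df=4, size=1000)   # stand-in for in-control T2/SPE values

# Parzen window estimate (Gaussian kernel) of the statistic's density.
kde = gaussian_kde(stat)

# Approximate the CDF on a grid and read off the 99% quantile as the UCL.
grid = np.linspace(stat.min(), stat.max() + 10, 2000)
cdf = np.cumsum(kde(grid))
cdf /= cdf[-1]
ucl = grid[np.searchsorted(cdf, 0.99)]
```

New samples whose monitoring statistic exceeds `ucl` would then be flagged as faults.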


2021 ◽  
Vol 12 (2) ◽  
pp. 144-148
Author(s):  
D. Usman ◽  
S.F. Sani

Clustering is a useful technique that organizes a large quantity of unordered data into a small number of meaningful and coherent clusters. Every clustering method is based on an index of similarity or dissimilarity between data points; ideally, the similarity formula defined and embedded in the clustering criterion function should correctly describe the true intrinsic structure of the data. This paper uses the squared Euclidean distance and the Manhattan distance to investigate which is the better method for measuring similarity between data objects in sparse and high-dimensional domains, i.e., which is fast, consistent, and capable of providing high-quality clustering results. The performance of the two methods is reported on simulated high-dimensional datasets.
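The two dissimilarities can be computed side by side with SciPy; the sparse random rows below are an illustrative stand-in for the simulated high-dimensional data:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
# Sparse high-dimensional points: ~5% of the 1000 coordinates are nonzero.
X = rng.random((5, 1000)) * (rng.random((5, 1000)) < 0.05)

# Squared Euclidean distance: sum of squared coordinate differences.
d_sqeuclid = cdist(X, X, metric="sqeuclidean")

# Manhattan (city-block) distance: sum of absolute coordinate differences.
d_manhattan = cdist(X, X, metric="cityblock")
```

Either matrix can be plugged into a distance-based clustering criterion, which is exactly the comparison the paper carries out.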


2009 ◽  
Vol 35 (7) ◽  
pp. 859-866
Author(s):  
Ming LIU ◽  
Xiao-Long WANG ◽  
Yuan-Chao LIU
