Affinity Propagation Clustering Algorithm Based on PCA

2014 ◽  
Vol 590 ◽  
pp. 688-692
Author(s):  
Bei Chen ◽  
Kun Song

Overlap information usually exits in the high-dimensional data. Misclassified points may be more when affinity propagation clustering is applied to these data. Concerning this problem, a new method combining principal components analysis and affinity propagation clustering is proposed. In this method, dimensionality of the original data is reduced on the premise of reserving most information of the variables. Then, affinity propagation clustering is implemented in the low-dimensional space. Thus, because the redundant information is deleted, the classification is accurate. Experiment is done by using this new method, the results of the experiment explain that this method is effective.

Author(s):  
Jing Wang ◽  
Jinglin Zhou ◽  
Xiaolu Chen

AbstractIndustrial data variables show obvious high dimension and strong nonlinear correlation. Traditional multivariate statistical monitoring methods, such as PCA, PLS, CCA, and FDA, are only suitable for solving the high-dimensional data processing with linear correlation. The kernel mapping method is the most common technique to deal with the nonlinearity, which projects the original data in the low-dimensional space to the high-dimensional space through appropriate kernel functions so as to achieve the goal of linear separability in the new space. However, the space projection from the low dimension to the high dimension is contradictory to the actual requirement of dimensionality reduction of the data. So kernel-based method inevitably increases the complexity of data processing.


Author(s):  
MIAO CHENG ◽  
BIN FANG ◽  
YUAN YAN TANG ◽  
HENGXIN CHEN

Many problems in pattern classification and feature extraction involve dimensionality reduction as a necessary processing. Traditional manifold learning algorithms, such as ISOMAP, LLE, and Laplacian Eigenmap, seek the low-dimensional manifold in an unsupervised way, while the local discriminant analysis methods identify the underlying supervised submanifold structures. In addition, it has been well-known that the intraclass null subspace contains the most discriminative information if the original data exist in a high-dimensional space. In this paper, we seek for the local null space in accordance with the null space LDA (NLDA) approach and reveal that its computational expense mainly depends on the quantity of connected edges in graphs, which may be still unacceptable if a great deal of samples are involved. To address this limitation, an improved local null space algorithm is proposed to employ the penalty subspace to approximate the local discriminant subspace. Compared with the traditional approach, the proposed method can achieve more efficiency so that the overload problem is avoided, while slight discriminant power is lost theoretically. A comparative study on classification shows that the performance of the approximative algorithm is quite close to the genuine one.


2020 ◽  
Vol 0 (10/2019) ◽  
pp. 25-29
Author(s):  
Chung Tran ◽  
Andrzej Ameljańczyk

The paper presents a proposal of a new method for clustering search results. The method uses an external knowledge resource, which can be, for example, Wikipedia. Wikipedia – the largest encyclopedia, is a free and popular knowledge resource which is used to extract topics from short texts. Similarities between documents are calculated based on the similarities between these topics. After that, affinity propagation clustering algorithm is employed to cluster web search results. Proposed method is tested by AMBIENT dataset and evaluated within the experimental framework provided by a SemEval-2013 task. The paper also suggests new method to compare global performance of algorithms using multi – criteria analysis.


2020 ◽  
Vol 49 (3) ◽  
pp. 421-437
Author(s):  
Genggeng Liu ◽  
Lin Xie ◽  
Chi-Hua Chen

Dimensionality reduction plays an important role in the data processing of machine learning and data mining, which makes the processing of high-dimensional data more efficient. Dimensionality reduction can extract the low-dimensional feature representation of high-dimensional data, and an effective dimensionality reduction method can not only extract most of the useful information of the original data, but also realize the function of removing useless noise. The dimensionality reduction methods can be applied to all types of data, especially image data. Although the supervised learning method has achieved good results in the application of dimensionality reduction, its performance depends on the number of labeled training samples. With the growing of information from internet, marking the data requires more resources and is more difficult. Therefore, using unsupervised learning to learn the feature of data has extremely important research value. In this paper, an unsupervised multilayered variational auto-encoder model is studied in the text data, so that the high-dimensional feature to the low-dimensional feature becomes efficient and the low-dimensional feature can retain mainly information as much as possible. Low-dimensional feature obtained by different dimensionality reduction methods are used to compare with the dimensionality reduction results of variational auto-encoder (VAE), and the method can be significantly improved over other comparison methods.


Author(s):  
Samuel Melton ◽  
Sharad Ramanathan

Abstract Motivation Recent technological advances produce a wealth of high-dimensional descriptions of biological processes, yet extracting meaningful insight and mechanistic understanding from these data remains challenging. For example, in developmental biology, the dynamics of differentiation can now be mapped quantitatively using single-cell RNA sequencing, yet it is difficult to infer molecular regulators of developmental transitions. Here, we show that discovering informative features in the data is crucial for statistical analysis as well as making experimental predictions. Results We identify features based on their ability to discriminate between clusters of the data points. We define a class of problems in which linear separability of clusters is hidden in a low-dimensional space. We propose an unsupervised method to identify the subset of features that define a low-dimensional subspace in which clustering can be conducted. This is achieved by averaging over discriminators trained on an ensemble of proposed cluster configurations. We then apply our method to single-cell RNA-seq data from mouse gastrulation, and identify 27 key transcription factors (out of 409 total), 18 of which are known to define cell states through their expression levels. In this inferred subspace, we find clear signatures of known cell types that eluded classification prior to discovery of the correct low-dimensional subspace. Availability and implementation https://github.com/smelton/SMD. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Akira Imakura ◽  
Momo Matsuda ◽  
Xiucai Ye ◽  
Tetsuya Sakurai

Dimensionality reduction methods that project highdimensional data to a low-dimensional space by matrix trace optimization are widely used for clustering and classification. The matrix trace optimization problem leads to an eigenvalue problem for a low-dimensional subspace construction, preserving certain properties of the original data. However, most of the existing methods use only a few eigenvectors to construct the low-dimensional space, which may lead to a loss of useful information for achieving successful classification. Herein, to overcome the deficiency of the information loss, we propose a novel complex moment-based supervised eigenmap including multiple eigenvectors for dimensionality reduction. Furthermore, the proposed method provides a general formulation for matrix trace optimization methods to incorporate with ridge regression, which models the linear dependency between covariate variables and univariate labels. To reduce the computational complexity, we also propose an efficient and parallel implementation of the proposed method. Numerical experiments indicate that the proposed method is competitive compared with the existing dimensionality reduction methods for the recognition performance. Additionally, the proposed method exhibits high parallel efficiency.


Sign in / Sign up

Export Citation Format

Share Document