A new Kmeans clustering model and its generalization achieved by joint spectral embedding and rotation

2021 ◽  
Vol 7 ◽  
pp. e450
Author(s):  
Wenna Huang ◽  
Yong Peng ◽  
Yuan Ge ◽  
Wanzeng Kong

Kmeans clustering and spectral clustering are two popular methods for grouping similar data points according to their similarities. However, the performance of Kmeans clustering can be quite unstable due to the random initialization of the cluster centroids. Spectral clustering methods, in turn, typically employ a two-step strategy of spectral embedding followed by a discretization postprocessing step to obtain the cluster assignment, and this postprocessing can easily deviate far from the true discrete solution. In this paper, based on the connection between Kmeans clustering and spectral clustering, we propose a new Kmeans formulation, termed KMSR, that jointly performs spectral embedding and spectral rotation, an effective postprocessing approach for the discretization. Further, instead of directly using the dot-product similarity measure, we generalize KMSR by incorporating more advanced data similarity measures and call the generalized model KMSR-G. We derive an efficient optimization method for the KMSR (KMSR-G) objective and provide its complexity and convergence analyses. We conduct experiments on extensive benchmark datasets to validate the proposed models, and the results demonstrate that they outperform related methods in most cases.
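As a rough illustration of the spectral-rotation idea that KMSR builds on, the following sketch embeds a similarity matrix and then alternates between discretizing the rotated embedding and solving a Procrustes problem for the best rotation. This is an assumed re-implementation of the generic two-step technique (in the style of multiclass spectral clustering with rotation), not the authors' joint KMSR model or released code.

```python
import numpy as np
from scipy.linalg import svd

def spectral_rotation_clustering(S, k, n_iter=30):
    """Cluster from a nonnegative similarity matrix S via spectral embedding + rotation."""
    d = S.sum(axis=1)
    A = S / np.sqrt(np.outer(d, d))                # normalized affinity D^{-1/2} S D^{-1/2}
    F = np.linalg.eigh(A)[1][:, -k:]               # top-k eigenvectors: relaxed indicator
    F = F / np.linalg.norm(F, axis=1, keepdims=True)  # row-normalize the embedding
    R = np.eye(k)                                  # start from the identity rotation
    for _ in range(n_iter):
        labels = np.argmax(F @ R, axis=1)          # discretize the rotated embedding
        Y = np.eye(k)[labels]                      # one-hot cluster indicator matrix
        U, _, Vt = svd(F.T @ Y)                    # Procrustes step: best rotation toward Y
        R = U @ Vt
    return labels
```

Joint formulations such as KMSR optimize the embedding and the rotation together rather than in the two separate steps shown here, which is what avoids the postprocessing deviation the abstract describes.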

Author(s):  
Jun Guo ◽  
Jiahui Ye

Clustering multi-view data has attracted increasing attention in the past decades. Most previous studies assume that each instance appears in all views, or that there is at least one view containing all instances. However, real-world data often have some instances missing in each view, which leads to the research problem of partial multi-view clustering. To address this issue, this paper proposes a simple yet effective Anchor-based Partial Multi-view Clustering (APMC) method, which utilizes anchors to reconstruct instance-to-instance relationships for clustering. APMC is conceptually simple and easy to implement in practice; moreover, it has clear intuitions and non-trivial empirical guarantees. Specifically, APMC first integrates intra- and inter-view similarities through anchors. Then, spectral clustering is performed on the fused similarities to obtain a unified clustering result. Compared with existing partial multi-view clustering methods, APMC has three notable advantages: 1) it can capture more non-linear relations among instances with the help of kernel-based similarities; 2) it has a much lower time complexity by virtue of a non-iterative scheme; 3) it can inherently handle data with negative entries and can be extended to more than two views. Finally, we extensively evaluate the proposed method on five benchmark datasets. Experimental results demonstrate the superiority of APMC over state-of-the-art approaches.
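The anchor mechanism can be sketched as follows, with assumed details: per-view Gaussian-kernel similarities between instances and anchors, averaged over the views in which each instance is present, followed by spectral clustering on the fused similarity. The function apmc_like, its mask convention, and the RBF-kernel choice are illustrative assumptions, not the published algorithm.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

def apmc_like(views, masks, anchors, n_clusters, gamma=1.0):
    """views[v]: features of the instances present in view v;
    masks[v]: boolean (n,) marking which of the n instances appear in view v;
    anchors[v]: (m, d_v) anchor points for view v (e.g., K-means centers)."""
    n = masks[0].shape[0]
    m = anchors[0].shape[0]
    Z_sum, counts = np.zeros((n, m)), np.zeros((n, 1))
    for X, msk, A in zip(views, masks, anchors):
        Z = np.zeros((n, m))
        Z[msk] = rbf_kernel(X, A, gamma=gamma)   # kernel similarity to the anchors
        Z_sum += Z
        counts[:, 0] += msk
    Z_bar = Z_sum / np.maximum(counts, 1)        # average over the views that saw each instance
    S = Z_bar @ Z_bar.T                          # fused instance-to-instance similarity
    sc = SpectralClustering(n_clusters=n_clusters, affinity='precomputed')
    return sc.fit_predict(S)
```

Because the instance-to-instance similarity is built from a small anchor set, the fusion step stays cheap and non-iterative, consistent with the complexity advantage claimed above.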


2021 ◽  
Author(s):  
Mohammadreza Sadeghi ◽  
Narges Armanfard

Deep clustering incorporates embedding into clustering to find a lower-dimensional space suited to clustering. Most existing methods try to group similar data points by simultaneously minimizing clustering and reconstruction losses with an autoencoder (AE). However, they ignore the useful information available in pairwise data relationships. In this paper we propose a novel deep clustering framework with self-supervision using pairwise data similarities (DCSS). The proposed method consists of two successive phases. First, we propose a novel AE-based approach that aggregates similar data points near a common group center in the AE's latent space. This latent space is obtained by minimizing weighted reconstruction and centering losses of data points, where the weights are defined based on the similarity of data points to group centers. In the second phase, we map the AE's latent space, using a fully connected network MNet, onto a K-dimensional space from which the final cluster assignments are derived, where K is the number of clusters. MNet is trained to strengthen (weaken) the similarity of similar (dissimilar) samples. Experimental results on multiple benchmark datasets demonstrate the effectiveness of DCSS for data clustering and as a general framework for boosting state-of-the-art clustering methods.
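A minimal PyTorch sketch of the first-phase objective, as the abstract describes it, is given below. The network sizes, the softmax form of the similarity weights, and the equal weighting of the two loss terms are assumptions made for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, d_in, d_lat, k):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(), nn.Linear(128, d_lat))
        self.dec = nn.Sequential(nn.Linear(d_lat, 128), nn.ReLU(), nn.Linear(128, d_in))
        self.centers = nn.Parameter(torch.randn(k, d_lat))  # learnable group centers

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

def dcss_phase1_loss(model, x):
    z, x_hat = model(x)
    dist2 = torch.cdist(z, model.centers).pow(2)   # (batch, k) squared center distances
    w = torch.softmax(-dist2, dim=1)               # soft similarity to each center (assumed form)
    rec = (w.max(dim=1).values * (x_hat - x).pow(2).mean(dim=1)).mean()  # weighted reconstruction
    cen = (w * dist2).sum(dim=1).mean()            # centering: pull points toward their centers
    return rec + cen
```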


2016 ◽  
Vol 2016 ◽  
pp. 1-9
Author(s):  
Mingming Liu ◽  
Bing Liu ◽  
Chen Zhang ◽  
Wei Sun

As is well known, traditional spectral clustering (SC) methods are developed based on the manifold assumption, namely, that two nearby data points in a high-density region of a low-dimensional data manifold share the same cluster label. For some high-dimensional and sparse data, however, this assumption may not hold, and the clustering performance of SC degrades sharply in such cases. To solve this problem, we propose a general spectral embedded framework that embeds the true cluster assignment matrix for high-dimensional data into a nonlinear space through a predefined embedding function. Based on this framework, several algorithms are presented using different embedding functions, each of which simultaneously learns the final cluster assignment matrix and a transformation into a low-dimensional space. More importantly, the proposed method naturally handles the out-of-sample extension problem. Experimental results on benchmark datasets demonstrate that the proposed method significantly outperforms existing clustering methods.
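The out-of-sample idea can be illustrated with a simple sketch: learn an explicit map from input features to the spectral embedding (here a ridge-regression map, one naive choice of embedding function), so unseen points are projected and assigned without recomputing the eigendecomposition. All modeling choices below are assumptions for illustration, not the paper's algorithms.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge
from sklearn.metrics.pairwise import rbf_kernel

def fit_spectral_embedded(X, k, gamma=1.0):
    S = rbf_kernel(X, gamma=gamma)                 # affinity on the training data
    d = S.sum(axis=1)
    F = np.linalg.eigh(S / np.sqrt(np.outer(d, d)))[1][:, -k:]  # spectral embedding
    mapper = Ridge(alpha=1.0).fit(X, F)            # explicit (here linear) embedding function
    km = KMeans(n_clusters=k, n_init=10).fit(F)    # cluster in the embedded space
    return mapper, km

def predict_out_of_sample(mapper, km, X_new):
    return km.predict(mapper.predict(X_new))       # embed new points, assign nearest centroid
```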


2017 ◽  
Author(s):  
Alexander J. Gates ◽  
Yong-Yeol Ahn

Clustering is a central approach for unsupervised learning. Once clustering is applied, the most fundamental analysis is to quantitatively compare clusterings. Such comparisons are crucial for the evaluation of clustering methods as well as for other tasks such as consensus clustering. It is often argued that, to establish a baseline, clustering similarity should be assessed in the context of a random ensemble of clusterings. The prevailing assumption for the random clustering ensemble is the permutation model, in which the number and sizes of clusters are fixed. However, this assumption does not necessarily hold in practice; for example, multiple runs of K-means clustering return clusterings with a fixed number of clusters, while the cluster size distribution varies greatly. Here, we derive corrected variants of two clustering similarity measures (the Rand index and Mutual Information) in the context of two random clustering ensembles in which the number and sizes of clusters vary. In addition, we study the impact of one-sided comparisons in the scenario with a reference clustering. The consequences of different random models are illustrated using synthetic examples, handwriting recognition, and gene expression data. We demonstrate that the choice of random model can have a drastic impact on the ranking of similar clustering pairs and on the evaluation of a clustering method against a random baseline; thus, the choice of random clustering model should be carefully justified.
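The contrast between random models can be made concrete with a small sketch: the standard adjusted Rand index corrects against the permutation model, while an empirical correction can be computed against an ensemble in which cluster sizes vary. The uniform random labeling used as that ensemble below is an assumed stand-in for the paper's random models; it also illustrates a one-sided comparison against a fixed reference clustering.

```python
import numpy as np
from sklearn.metrics import rand_score, adjusted_rand_score

def corrected_rand(labels_a, labels_b, k, n_samples=200, seed=0):
    """Rescale the Rand index against an ensemble where cluster sizes vary."""
    rng = np.random.default_rng(seed)
    n = len(labels_a)
    observed = rand_score(labels_a, labels_b)
    baseline = np.mean([rand_score(labels_a, rng.integers(0, k, size=n))
                        for _ in range(n_samples)])   # one-sided: labels_a is the reference
    return (observed - baseline) / (1.0 - baseline)   # 1 = identical, ~0 = chance level

labels_a = np.array([0, 0, 1, 1, 2, 2])
labels_b = np.array([0, 0, 1, 2, 2, 2])
print(adjusted_rand_score(labels_a, labels_b))  # permutation-model correction
print(corrected_rand(labels_a, labels_b, k=3))  # varying-size random-model correction
```

The two scores generally differ, which is exactly the paper's point: the baseline, and hence the corrected similarity, depends on the chosen random model.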


2016 ◽  
Vol 6 (6) ◽  
pp. 1235-1240
Author(s):  
H. Alizadeh ◽  
B. Minaei Bidgoli

The main aim of this study was to introduce a comprehensive model for evaluating bank customers' loyalty based on assessing and comparing the performance of different clustering methods. The study also pursues the following specific objectives: a) using different clustering methods and comparing them for customer classification, b) finding the variables that are effective in determining customer loyalty, and c) using different ensemble classification methods to increase the modeling accuracy and comparing the results with the basic methods. Since loyal customers generate more profit, this study introduces a two-step model for classifying customers and their loyalty. For this purpose, several clustering methods (K-medoids, X-means, and K-means) were used, of which K-means performed best according to the Davies-Bouldin index. Customers were clustered into four groups using K-means, and the members of each cluster were analyzed and labeled. Then, a predictive model was built on the customers' demographic variables using various classification methods, namely DT (Decision Tree), ANN (Artificial Neural Networks), NB (Naive Bayes), KNN (K-Nearest Neighbors), and SVM (Support Vector Machine), as well as their bagging and boosting variants, to predict the class of loyal customers. The results showed that bagging-ANN was the most accurate method for predicting loyal customers. This two-stage model can be used in banks and financial institutions with similar data to identify the type of future customers.
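A condensed scikit-learn sketch of this two-stage pipeline might look as follows; the cluster count, feature split, and hyperparameters are placeholders rather than the study's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import davies_bouldin_score
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

def two_stage_loyalty_model(X_behavior, X_demographic, n_clusters=4):
    """Stage 1: cluster customers on behavior; Stage 2: predict cluster from demographics."""
    Xb = StandardScaler().fit_transform(X_behavior)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(Xb)
    print("Davies-Bouldin index:", davies_bouldin_score(Xb, km.labels_))  # lower is better
    clf = BaggingClassifier(estimator=MLPClassifier(max_iter=500), n_estimators=10)
    clf.fit(X_demographic, km.labels_)   # bagging-ANN trained on the cluster (loyalty) labels
    return km, clf
```

The Davies-Bouldin score printed in stage 1 mirrors how the study compared clustering methods before committing to K-means.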


Author(s):  
Laura Macia

In this article I discuss cluster analysis as an exploratory tool to support the identification of associations within qualitative data. While not appropriate for all qualitative projects, cluster analysis can be particularly helpful in identifying patterns where numerous cases are studied. Using a research project on Latino grievances as an illustration, I offer a detailed explanation of the main steps in cluster analysis, providing specific considerations for its use with qualitative data. I specifically describe the issues of data transformation, the choice of clustering methods and similarity measures, the identification of a cluster solution, and the interpretation of the data in a qualitative context.
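One way to operationalize these steps, sketched here with toy data: qualitative codes are transformed into a binary case-by-code matrix, cases are compared with Jaccard distance (a similarity measure suited to presence/absence codes), and a cluster solution is read off an average-linkage hierarchy. The coding matrix and the cut level are illustrative assumptions, not the article's data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Rows: cases (e.g., interviews); columns: presence/absence of qualitative codes.
cases = np.array([[1, 0, 1, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 1, 1, 1]])
D = pdist(cases, metric='jaccard')               # distance suited to binary codes
Z = linkage(D, method='average')                 # agglomerative (average-linkage) clustering
labels = fcluster(Z, t=2, criterion='maxclust')  # cut the tree into 2 clusters
print(labels)
```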


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yonghua Tang ◽  
Qiang Fan ◽  
Peng Liu

The traditional teaching model cannot adapt to the needs of the smart-teaching era. Accordingly, this paper applies data mining technology to teaching reform, constructing a computer-aided system based on data mining and designing its functions around actual teaching conditions. The resulting system supports teaching across multiple subjects. Moreover, the paper uses the data mining system to mine teaching resources and applies spectral clustering to integrate multiple teaching resources, improving the practicality of the data mining algorithms. In addition, teaching resources are processed with digital technology. Finally, after building the system, experiments are designed to verify its performance. The results show that the system constructed in this paper has demonstrable teaching and practical effects and can be applied to a broader teaching scope in subsequent research.

