Link-Based Cluster Ensemble Method for Improved Meta-clustering Algorithm

Clustering is an important tool for data exploration. Several clustering algorithms exist, and new algorithms are frequently proposed in the literature. These algorithms have been very successful in a large number of real-world problems. However, no clustering algorithm that optimizes only a single criterion is able to reveal all types of structures (homogeneous or heterogeneous) present in a dataset. To deal with this problem, several multi-objective clustering and cluster ensemble methods have been proposed in the literature, including our multi-objective clustering ensemble algorithm. In this chapter, we present an overview of these methods, which are largely based on combining various aspects of traditional clustering algorithms.

Sc-GPE: A Graph Partitioning-Based Cluster Ensemble Method for Single-Cell

Frontiers in Genetics ◽

10.3389/fgene.2020.604790 ◽

2020 ◽

Vol 11 ◽

Author(s):

Xiaoshu Zhu ◽

Jian Li ◽

Hong-Dong Li ◽

Miao Xie ◽

Jianxin Wang

Keyword(s):

Single Cell ◽

Graph Partitioning ◽

Biological Significance ◽

Ensemble Method ◽

Clustering Methods ◽

Sequencing Data ◽

Cluster Ensemble ◽

Ensemble Strategy ◽

Consensus Matrix ◽

The Individual

Clustering is an efficient way to analyze single-cell RNA sequencing data. It is commonly used to identify cell types, which helps in understanding cell differentiation processes. However, different single-cell clustering methods can produce different results, sometimes leading to conflicting conclusions, which makes it hard for biologists to choose the right clustering and interpret its biological significance. A cluster ensemble strategy can be an effective solution to this problem. Because graph partitioning-based clustering methods perform well on single-cell data, we developed Sc-GPE, a novel cluster ensemble method that combines five single-cell graph partitioning-based clustering methods: SNN-cliq, PhenoGraph, SC3, SSNN-Louvain, and MPGS-Louvain. In Sc-GPE, a consensus matrix is constructed from the five clustering solutions by calculating the probability that each pair of cells is assigned to the same cluster. This avoids a problem of hypergraph-based ensemble approaches: each individual method assigns its own cluster labels, so corresponding clusters are difficult to match across methods. Then, to account for the differing importance of each method in the ensemble, a weighted consensus matrix is constructed using an importance score strategy. Finally, hierarchical clustering is performed on the weighted consensus matrix to cluster the cells. To evaluate performance, we compared Sc-GPE with the individual clustering methods and the state-of-the-art SAME-clustering on 12 single-cell RNA-seq datasets. The results show that Sc-GPE obtained the best average performance and achieved the highest NMI and ARI values on five datasets.
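
The consensus step described in this abstract can be illustrated with a minimal Python sketch. It is not the Sc-GPE implementation: the function names `weighted_consensus` and `consensus_clusters` are hypothetical, the per-method weights are placeholders for the paper's importance scores (whose formula is not given here), and average linkage is an assumed choice for the hierarchical clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def weighted_consensus(labelings, weights):
    """Weighted share of base clusterings that put each pair of cells together.

    labelings: one row of cluster labels per base method (methods x cells).
    weights:   one non-negative weight per method (placeholder importance scores).
    """
    labelings = np.asarray(labelings)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so entries stay in [0, 1]
    n_cells = labelings.shape[1]
    consensus = np.zeros((n_cells, n_cells))
    for labels, weight in zip(labelings, weights):
        # Only co-membership is compared, so label names need not match across methods.
        consensus += weight * (labels[:, None] == labels[None, :])
    return consensus


def consensus_clusters(consensus, n_clusters):
    """Cut an average-linkage tree built on 1 - consensus into n_clusters groups."""
    distance = 1.0 - consensus
    np.fill_diagonal(distance, 0.0)  # guard against floating-point residue
    condensed = squareform(distance, checks=False)
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Because the matrix records only whether two cells share a cluster, a method that calls a group "0" and another that calls the same group "1" contribute identically, which is the label-correspondence issue the abstract mentions.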

Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis

BMC Bioinformatics ◽

10.1186/s12859-019-3179-5 ◽

2019 ◽

Vol 20 (S19) ◽

Cited By ~ 5

Author(s):

Thomas A. Geddes ◽

Taiyun Kim ◽

Lihao Nan ◽

James G. Burchfield ◽

Jean Y. H. Yang ◽

...

Keyword(s):

Data Analysis ◽

Single Cell ◽

Clustering Algorithm ◽

Dimensional Space ◽

Clustering Algorithms ◽

Random Projection ◽

Computational Technique ◽

Cell Type ◽

Cluster Ensemble ◽

Cell Type Specific

Background: Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the high feature-dimensionality of the transcriptome (i.e. the large number of measured genes in each cell) and because only a small fraction of genes are cell type-specific and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification. Results: Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets to generate clusters of cells. We employ four evaluation metrics to benchmark clustering performance and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement in some cases is up to 100%, depending on the evaluation metric used. Conclusions: Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses.
The code for creating the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/scCCESS
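
The pipeline shape described in the abstract (random gene subsets, compression, then clustering of each encoded dataset) can be sketched as follows. This is not the authors' code and does not reproduce the linked repository. To stay dependency-light, the autoencoder is replaced by a plain PCA projection, which is a deliberate stand-in; the k-means routine is a basic Lloyd's implementation, and all function names and default parameters are hypothetical.

```python
import numpy as np


def compress(block, n_components):
    """Project a cells-by-genes block onto its top principal components.

    PCA is used here as a stand-in for the autoencoder named in the abstract.
    """
    centered = block - block.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def kmeans(points, k, rng, n_iter=50):
    """Basic Lloyd's k-means with initial centers drawn from the data."""
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def subspace_labelings(expr, k, n_runs=10, n_genes=15, n_components=3, seed=0):
    """Cluster several compressed random gene subsets of a cells-by-genes matrix.

    Returns an array of shape (n_runs, n_cells): one base clustering per run.
    """
    rng = np.random.default_rng(seed)
    labelings = []
    for _ in range(n_runs):
        genes = rng.choice(expr.shape[1], size=n_genes, replace=False)
        encoded = compress(expr[:, genes], n_components)
        labelings.append(kmeans(encoded, k, rng))
    return np.array(labelings)
```

Each row of the result is one base clustering. The rows can then be merged by any ensemble step, for example by counting how often each pair of cells shares a cluster across runs.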

A Novel Clustering Ensemble Method Based on One-Class Support Vector Machine

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.321-324.1943 ◽

2013 ◽

Vol 321-324 ◽

pp. 1943-1946

Author(s):

Lei Gu

Keyword(s):

Support Vector Machine ◽

Initial Data ◽

Clustering Algorithm ◽

Ensemble Method ◽

Support Vector ◽

Data Sets ◽

Clustering Ensemble ◽

Random Initial Data ◽

New Approach ◽

Vector Machines

A clustering algorithm based on the one-class support vector machine has been proposed recently. Because it uses the kernel technique, this approach can be preferable to traditional k-means clustering. A clustering ensemble method combines several partitions of the unlabeled data into a single clustering to obtain better clustering results. In this paper, the clustering ensemble method is applied to the clustering algorithm based on one-class support vector machines. Several partitions, obtained from multiple runs with different random initial data sets, are combined into a final clustering result. Experiments show that the new approach can improve clustering performance.

A cluster ensemble method for clustering categorical data

Information Fusion ◽

10.1016/j.inffus.2004.03.001 ◽

2005 ◽

Vol 6 (2) ◽

pp. 143-151 ◽

Cited By ~ 48

Author(s):

Zengyou He ◽

Xiaofei Xu ◽

Shengchun Deng

Keyword(s):

Categorical Data ◽

Ensemble Method ◽

Cluster Ensemble

Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis

10.1101/773903 ◽

2019 ◽

Author(s):

Thomas A Geddes ◽

Taiyun Kim ◽

Lihao Nan ◽

James G Burchfield ◽

Jean YH Yang ◽

...

Keyword(s):

Data Analysis ◽

Single Cell ◽

Clustering Algorithm ◽

Dimensional Space ◽

Clustering Algorithms ◽

Evaluation Metrics ◽

Computational Technique ◽

Cell Type ◽

Cluster Ensemble ◽

Cell Type Specific

Background: Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the high feature-dimensionality of the transcriptome (i.e. the large number of measured genes in each cell) and because only a small fraction of genes are cell type-specific and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification. Results: Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets for generating clusters of cells. We employ four evaluation metrics to benchmark clustering performance and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement in some cases is up to 100%, depending on the evaluation metrics used. Conclusions: Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses.
The code for creating the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/autoencoder_cluster_ensemble
