The Clustering Algorithm Study of Gene Expression Data

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.183-185.93 ◽

2011 ◽

Vol 183-185 ◽

pp. 93-98

Author(s):

Rui He ◽

Chun Mei Lin

Keyword(s):

Gene Expression ◽

Cell Cycle ◽

Yeast Cell ◽

Gene Expression Data ◽

Clustering Algorithm ◽

Undirected Graph ◽

Similarity Measurement ◽

Expression Data ◽

Clustering Method ◽

Self Organized

This paper proposes an evolutionary self-organized clustering method for genes based on an undirected graph representation. The vertices of the graph represent genes, and the weight of the edge between two vertices serves as the similarity measurement between the two genes. The similarities among genes can then be extracted from the spatial features of the graph using an immune evolutionary method. To demonstrate its effectiveness, the method is tested on a yeast cell cycle expression dataset; the results suggest that it is capable of clustering genes.
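The graph construction described in this abstract can be sketched roughly as follows. The abstract does not say which similarity measure is used, so Pearson correlation is an assumption here, and the gene names and profiles are hypothetical. The immune evolutionary step is not shown.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def build_similarity_graph(profiles):
    """Return an undirected weighted graph as a dict of dicts.

    Vertices are gene names; the weight of edge (g, h) is the assumed
    similarity (Pearson correlation) between the two expression profiles.
    """
    genes = sorted(profiles)
    graph = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            w = pearson(profiles[g], profiles[h])
            graph[g][h] = w  # store both directions so the graph is undirected
            graph[h][g] = w
    return graph

# Hypothetical expression profiles over four time points
example_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
example_graph = build_similarity_graph(example_profiles)
```

Storing each weight in both directions keeps lookups symmetric, which matches the undirected graph the abstract describes.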

Incorporating Gene Ontology Information in Gene Expression Data Clustering Using Multiobjective Evolutionary Optimization: Application in Yeast Cell Cycle Data

Multi-Objective Optimization ◽

10.1007/978-981-13-1471-1_3 ◽

2018 ◽

pp. 55-78

Author(s):

Anirban Mukhopadhyay

Keyword(s):

Gene Expression ◽

Cell Cycle ◽

Gene Ontology ◽

Yeast Cell ◽

Gene Expression Data ◽

Data Clustering ◽

Yeast Cell Cycle ◽

Expression Data ◽

Gene Expression Data Clustering ◽

Gene Ontology Information

ENTROPY-BASED CLUSTER VALIDATION AND ESTIMATION OF THE NUMBER OF CLUSTERS IN GENE EXPRESSION DATA

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720012500114 ◽

2012 ◽

Vol 10 (05) ◽

pp. 1250011

Author(s):

NATALIA NOVOSELOVA ◽

IGOR TOM

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Clustering Algorithm ◽

Selection Procedure ◽

Biological Knowledge ◽

Consensus Clustering ◽

Expression Data ◽

Cluster Validation ◽

Number Of Clusters ◽

Validity Measure

Many external and internal validity measures have been proposed to estimate the number of clusters in gene expression data, but as a rule they do not consider the stability of the groupings produced by a clustering algorithm. Building on approaches that assess the predictive power or stability of a partitioning, we propose a new cluster validation measure and a selection procedure for determining a suitable number of clusters. The validity measure is based on estimating the "clearness" of the consensus matrix, which is the result of a resampling clustering scheme (consensus clustering). In the proposed selection procedure, a stable clustering result is determined with reference to the validity measure under the null hypothesis that encodes the absence of clusters. The final number of clusters is selected by analyzing the distance between the validity plots for the initial and permuted data sets. We applied the selection procedure to evaluate the clustering results on several datasets. The procedure produced accurate and robust estimates of the number of clusters, which agree with biological knowledge and gold standards of cluster quality.
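As a rough illustration of the consensus matrix mentioned in this abstract, the sketch below counts how often two genes land in the same cluster across resampled runs. The abstract does not give the formula for "clearness", so the mean binary entropy used here is only an assumed stand-in suggested by the paper's title; the function names and the input format (one label dictionary per run) are likewise hypothetical.

```python
from math import log2

def consensus_matrix(runs, items):
    """Fraction of runs in which each pair of items was clustered together.

    runs  : list of dicts mapping item -> cluster label; an item missing
            from a dict was not sampled in that run.
    items : ordered list of all items; fixes the row/column order.
    """
    n = len(items)
    index = {item: k for k, item in enumerate(items)}
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for labels in runs:
        present = [item for item in items if item in labels]
        for a in present:
            for b in present:
                i, j = index[a], index[b]
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

def mean_binary_entropy(matrix):
    """Assumed clearness proxy: average binary entropy of off-diagonal entries.

    Entries near 0 or 1 (stable pairs) contribute little; entries near 0.5
    (unstable pairs) contribute the most. Lower values mean a clearer matrix.
    """
    n = len(matrix)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            p = matrix[i][j]
            if 0.0 < p < 1.0:
                total += -p * log2(p) - (1.0 - p) * log2(1.0 - p)
            count += 1
    return total / count if count else 0.0
```

Under this proxy, comparing the score on the original data with the score on permuted data would mirror the null-hypothesis comparison the abstract describes, though the paper's actual procedure may differ.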

Kernel-Based Self-Organized Maps Trained with Supervised Bias for Gene Expression Data Mining

Intelligent Knowledge-Based Systems ◽

10.1007/978-1-4020-7829-3_49 ◽

2005 ◽

pp. 1777-1793

Author(s):

Stergios Papadimitriou

Keyword(s):

Gene Expression ◽

Data Mining ◽

Gene Expression Data ◽

Expression Data ◽

Self Organized

A New Two-steps Gene Expression Data Clustering Method

2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery ◽

10.1109/fskd.2009.481 ◽

2009 ◽

Author(s):

Yanjie Zhang ◽

Veronique Prinet ◽

Shuanhu Wu

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Data Clustering ◽

Expression Data ◽

Clustering Method ◽

Gene Expression Data Clustering

Multi-cancer samples clustering via graph regularized low-rank representation method under sparse and symmetric constraints

BMC Bioinformatics ◽

10.1186/s12859-019-3231-5 ◽

2019 ◽

Vol 20 (S22) ◽

Author(s):

Juan Wang ◽

Cong-Hai Lu ◽

Jin-Xing Liu ◽

Ling-Yun Dai ◽

Xiang-Zhen Kong

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Clustering Algorithm ◽

Low Rank ◽

Expression Data ◽

Geometrical Structures ◽

Graph Regularization ◽

Raw Data ◽

Clustering Quality ◽

Low Rank Representation

Background: Identifying different types of cancer based on gene expression data has become a hotspot in bioinformatics research. Clustering cancer gene expression data from multiple cancers into their own classes is a significant solution. However, the high dimensionality and small sample sizes of gene expression data, together with noise in the data, make data mining and research difficult. Although there are many effective and feasible methods for dealing with this problem, the possibility remains that these methods are flawed.

Results: In this paper, we propose the graph regularized low-rank representation under symmetric and sparse constraints (sgLRR) method, in which we introduce graph regularization based on manifold learning and symmetric sparse constraints into the traditional low-rank representation (LRR). In sgLRR, the symmetric and sparse constraints alleviate the effect of raw data noise on the low-rank representation. Further, by introducing graph regularization, sgLRR preserves the important intrinsic local geometrical structures of the raw data. We apply this method to cluster multi-cancer samples based on gene expression data, which improves the clustering quality. First, the gene expression data are decomposed by sgLRR, yielding a lowest-rank representation matrix that is symmetric and sparse. Then, an affinity matrix is constructed, and the multi-cancer samples are clustered with a spectral clustering algorithm, namely normalized cuts (Ncuts).

Conclusions: A series of comparative experiments demonstrates that the sgLRR method, based on low-rank representation, has a great advantage and remarkable performance in the clustering of multi-cancer samples.
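The last stage of this pipeline, going from a representation matrix to an affinity matrix and then to a normalized-cuts partition, can be illustrated with the objective that normalized cuts seeks to minimize. This sketch does not solve the sgLRR optimization and does not implement the spectral solver; the symmetrization rule is a common convention assumed here, and the matrix values are hypothetical.

```python
def affinity_from_representation(z):
    """Assumed convention: W = (|Z| + |Z^T|) / 2, which is symmetric and non-negative."""
    n = len(z)
    return [[(abs(z[i][j]) + abs(z[j][i])) / 2.0 for j in range(n)] for i in range(n)]

def ncut_value(affinity, part_a):
    """Normalized cut of the bipartition (part_a, remaining samples).

    Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V),
    where cut sums the weights crossing the partition and assoc sums all
    weights attached to one side.
    """
    n = len(affinity)
    a = set(part_a)
    b = set(range(n)) - a
    cut = sum(affinity[i][j] for i in a for j in b)
    assoc_a = sum(affinity[i][j] for i in a for j in range(n))
    assoc_b = sum(affinity[i][j] for i in b for j in range(n))
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")  # a side with no connections is a degenerate split
    return cut / assoc_a + cut / assoc_b

# Hypothetical representation matrix for four samples: 0 and 1 resemble each
# other, 2 and 3 resemble each other, and cross-group coefficients are weak.
z_example = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.9],
    [0.1, 0.1, 0.9, 0.0],
]
w_example = affinity_from_representation(z_example)
good_split = ncut_value(w_example, [0, 1])  # separates the two groups
bad_split = ncut_value(w_example, [0, 2])   # mixes the two groups
```

A partition that follows the group structure yields a smaller value than one that mixes the groups, which is the behavior a spectral solver exploits when it approximates the minimum.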

OverDBC: A new density-based clustering method with the ability of detecting overlapped clusters from gene expression data

Intelligent Data Analysis ◽

10.3233/ida-150784 ◽

2015 ◽

Vol 19 (6) ◽

pp. 1311-1321 ◽

Cited By ~ 2

Author(s):

Mansooreh Mirzaie ◽

Ahmad Barani ◽

Naser Nematbakkhsh ◽

Majid Beigi

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Data ◽

Clustering Method ◽

Density Based Clustering

An optimal hierarchical clustering algorithm for gene expression data

Information Processing Letters ◽

10.1016/j.ipl.2004.11.001 ◽

2005 ◽

Vol 93 (3) ◽

pp. 143-147 ◽

Cited By ~ 13

Author(s):

Sudip Seal ◽

Srikanth Komarina ◽

Srinivas Aluru

Keyword(s):

Gene Expression ◽

Hierarchical Clustering ◽

Gene Expression Data ◽

Clustering Algorithm ◽

Expression Data ◽

Hierarchical Clustering Algorithm

A modified QT-clustering algorithm over Gene Expression data

2012 1st International Conference on Recent Advances in Information Technology (RAIT) ◽

10.1109/rait.2012.6194618 ◽

2012 ◽

Cited By ~ 1

Author(s):

Nirupam Choudhury ◽

Rosy Sarmah ◽

Suranjon Sarma

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Clustering Algorithm ◽

Expression Data

CURVE-BASED CLUSTERING OF TIME COURSE GENE EXPRESSION DATA USING SELF-ORGANIZING MAPS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720009004291 ◽

2009 ◽

Vol 07 (04) ◽

pp. 645-661 ◽

Cited By ~ 11

Author(s):

XIN CHEN

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Regulatory Networks ◽

Time Course ◽

Clustering Algorithm ◽

Expression Patterns ◽

Self Organizing Map ◽

Expression Data ◽

Wide Range ◽

Self Organizing

There is increasing interest in clustering time course gene expression data to investigate a wide range of biological processes. However, developing a clustering algorithm well suited to time course gene expression data is still challenging. Because timing is an important factor in defining true clusters, a clustering algorithm should exploit expression correlations between time points in order to achieve high clustering accuracy. Moreover, inter-cluster gene relationships are often desired in order to facilitate the computational inference of biological pathways and regulatory networks. In this paper, a new clustering algorithm called CurveSOM is developed to offer both features. It first represents each gene by a cubic smoothing spline fitted to the time course expression profile, and then groups genes into clusters by applying self-organizing map-based clustering to the resulting splines. CurveSOM has been tested on three well-studied yeast cell cycle datasets and compared with four popular programs: Cluster 3.0, GENECLUSTER, MCLUST, and SSClust. The results show that CurveSOM is a very promising tool for the exploratory analysis of time course expression data, as it is able not only to group genes into clusters with high accuracy but also to find true time-shifted correlations of expression patterns across clusters.
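To make the notion of a time-shifted correlation between two expression profiles concrete, the sketch below scans a few candidate lags and reports the one with the highest Pearson correlation. This is not CurveSOM's procedure (the spline fitting and self-organizing map are omitted); the function names and the sample profiles are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def lagged_correlation(a, b, lag):
    """Correlate a[t] with b[t + lag] over the overlapping time points."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    if lag < 0 or lag >= len(a) - 1:
        raise ValueError("lag must leave at least two overlapping points")
    overlap = len(a) - lag
    return pearson(a[:overlap], b[lag:])

def best_lag(a, b, max_lag):
    """Return (lag, correlation) for the lag in 0..max_lag with the highest correlation."""
    scores = [(lagged_correlation(a, b, lag), lag) for lag in range(max_lag + 1)]
    corr, lag = max(scores, key=lambda pair: pair[0])
    return lag, corr

# Hypothetical periodic profiles: profile_b follows profile_a one time point later.
profile_a = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
profile_b = [-1.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0]
lag, corr = best_lag(profile_a, profile_b, max_lag=3)
```

With these profiles the unshifted correlation is zero, so a method that only compares matching time points would miss the relationship that the one-step lag reveals.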

Wavelet Packet Decomposition-Based Fuzzy Clustering Algorithm for Gene Expression Data

APCCAS 2006 - 2006 IEEE Asia Pacific Conference on Circuits and Systems ◽

10.1109/apccas.2006.342263 ◽

2006 ◽

Cited By ~ 4

Author(s):

Guangzhao Cui ◽

Xianghong Cao ◽

Yanfeng Wang ◽

Lingzhi Cao ◽

Buyi Huang ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Fuzzy Clustering ◽

Clustering Algorithm ◽

Wavelet Packet ◽

Wavelet Packet Decomposition ◽

Expression Data ◽

Fuzzy Clustering Algorithm
