Efficient Mining Frequent Closed Discriminative Biclusters by Sample-Growth

Author(s):  
Miao Wang ◽  
Xuequn Shang ◽  
Shaohua Zhang ◽  
Zhanhuai Li

DNA microarray technology has generated a large amount of gene expression data. Biclustering is a methodology that clusters the condition set and the gene set simultaneously. It finds clusters of genes possessing similar characteristics together with the biological conditions that create these similarities. Almost all current biclustering algorithms find biclusters in a single microarray dataset. To reduce the influence of noise and find more biologically meaningful biclusters, the authors propose the FDCluster algorithm, which mines frequent closed discriminative biclusters across multiple microarray datasets. FDCluster uses the Apriori property and several novel pruning techniques to mine biclusters efficiently. To improve space usage, FDCluster also utilizes several techniques to generate frequent closed biclusters without maintaining candidates in memory. The experimental results show that FDCluster is more effective than traditional methods on both single and multiple microarray datasets. The paper tests biological significance using GO (Gene Ontology) to show that the proposed method is able to produce biologically relevant biclusters.
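The abstract does not give FDCluster's details, so the following is only a minimal sketch of the general idea it names: growing sample sets, pruning with the Apriori (anti-monotone) property, and testing closedness directly instead of keeping candidates in memory. It assumes a single, already discretized dataset in which each sample maps to the set of genes considered "expressed". The function name `closed_biclusters`, the parameter `min_genes`, and the toy data are hypothetical, and the multi-dataset and discriminative parts of the method are not modeled.

```python
def closed_biclusters(expressed, min_genes):
    """Enumerate closed (sample set, gene set) pairs by sample-growth.

    expressed: dict mapping sample name -> set of genes marked as expressed
               (an assumed discretization, not taken from the paper).
    min_genes: minimum number of shared genes a bicluster must keep.
    """
    samples = sorted(expressed)
    results = []

    def grow(sample_set, genes, start):
        # Closedness test without a stored candidate list: the closure is
        # every sample that contains all of the shared genes.
        closure = {s for s in samples if genes <= expressed[s]}
        if closure == set(sample_set):
            results.append((tuple(sample_set), frozenset(genes)))
        for i in range(start, len(samples)):
            shared = genes & expressed[samples[i]]
            # Apriori-style pruning: adding samples can only shrink the
            # shared gene set, so a branch that is too small is abandoned.
            if len(shared) >= min_genes:
                grow(sample_set + [samples[i]], shared, i + 1)

    for i, s in enumerate(samples):
        if len(expressed[s]) >= min_genes:
            grow([s], set(expressed[s]), i + 1)
    return results


if __name__ == "__main__":
    toy = {
        "s1": {"g1", "g2", "g3"},
        "s2": {"g1", "g2"},
        "s3": {"g2", "g3"},
    }
    for sample_set, genes in closed_biclusters(toy, min_genes=2):
        print(sample_set, sorted(genes))
```

On the toy data, the sample set `("s2",)` is not reported on its own because sample `s1` shares the same genes, so the closed form is `("s1", "s2")`.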



At present, triclustering is a well-known data mining technique for the analysis of 3D gene expression data (GST). Triclustering simultaneously clusters a subset of Genes (G), a subset of Samples (S), and a subset of Time points (T). The triclustering approach identifies a coherent pattern in the 3D gene expression data using the Mean Correlation Value (MCV). In this chapter, a hybrid PSO-based algorithm is developed for triclustering of 3D gene expression data. This algorithm can effectively find coherent patterns in high-volume triclusters. The experimental study is conducted on the yeast cycle dataset to study the biological significance of the coherent tricluster using a gene ontology tool.


Author(s):  
Soumya Raychaudhuri

The most interesting and challenging gene expression data sets to analyze are large multidimensional data sets that contain expression values for many genes across multiple conditions. In these data sets the use of scientific text can be particularly useful, since there are a myriad of genes examined under vastly different conditions, each of which may induce or repress expression of the same gene for different reasons. There is an enormous complexity to the data that we are examining: each gene is associated with dozens if not hundreds of expression values as well as multiple documents built up from vocabularies consisting of thousands of words. In Section 2.4 we reviewed common gene expression strategies, most of which revolve around defining groups of genes based on common profiles. A limitation of many gene expression analytic approaches is that they do not incorporate comprehensive background knowledge about the genes into the analysis. We present computational methods that leverage the peer-reviewed literature in the automatic analysis of gene expression data sets. Including the literature in gene expression data analysis offers an opportunity to incorporate background functional information about the genes when defining expression clusters. In Chapter 5 we saw how literature-based approaches could help in the analysis of single condition experiments. Here we will apply the strategies introduced in Chapter 6 to assess the coherence of groups of genes to enhance gene expression analysis approaches. The methods proposed here could, in fact, be applied to any multivariate genomics data type. The key concepts discussed in this chapter are listed in the frame box. We begin with a discussion of gene groups and their role in expression analysis; we briefly discuss strategies to assign keywords to groups and strategies to assess their functional coherence.
We apply functional coherence measures to gene expression analysis; as an example, we focus on a yeast expression data set. We first demonstrate how functional coherence can be used to focus in on the key biologically relevant gene groups derived by clustering methods such as self-organizing maps and k-means clustering.


Blood ◽  
2006 ◽  
Vol 108 (11) ◽  
pp. 4288-4288
Author(s):  
Marta Campo ◽  
Andrea Zangrando ◽  
Luca Trentin ◽  
Rui Li ◽  
Wei-min Liu ◽  
...  

Gene expression microarrays have been used to classify known tumor types and various hematological malignancies (Yeoh et al, Cancer Cell 2002; Kohlmann et al, Genes Chromosomes Cancer 2003), supporting the objective that microarray analysis could soon be introduced into the routine classification of cancer (Haferlach et al, Blood 2005). However, there are still doubts about the performance of gene expression experiments in clinical laboratory diagnosis. For instance, the quality of starting material is a major concern in microarray technology, and there are no data on the variation in gene expression profiles ensuing from different RNA extraction procedures. Here, as part of the internal multicenter MILE Study program, we assess the impact of different RNA preparation methods on gene expression data, analyzing 27 patients representative of nine different subtypes of pediatric acute leukemias. We compared the three protocols currently most used to isolate RNA for routine diagnosis (PCR assays) and microarray experiments. They are named method A: lysis of mononuclear leukemia cells, followed by lysate homogenization, followed by total RNA isolation; method B: TRIzol RNA isolation; and method C: TRIzol RNA isolation followed by a total RNA purification step. The methods were analyzed in triplicate for each sample (24), and three additional samples were run as technical replicates of three data sets for each preparation (HG-U133 Plus 2.0). Method A results in better total RNA quality, as demonstrated by 3′/5′ GAPD ratios and by RNA degradation plots. High comparability of gene expression data is found between samples in the same leukemia subclasses collected with different RNA preparation methods, thus demonstrating that sample preparation procedures do not impair the overall signal distribution. Unsupervised analyses showed clustering of samples first by each patient’s replicate conditions, then by leukemia type, and finally by leukemia lineage.
In fact, B-ALL samples are clustered together, separately from T-ALL and AML, demonstrating that clustering reflects biological differences between leukemias and that the RNA isolation method has only a secondary effect. Also, supervised cluster analyses highlight that samples are grouped depending on intra-lineage features (i.e., chromosomal aberrations), thus confirming the clustering organizations reported in recent gene expression profiling studies of acute leukemias. Our study shows that the biological features of pediatric acute leukemia classes largely exceed the variations between different total RNA sample preparation protocols. However, technical replicate analyses reveal that gene expression data from method A have the lowest degree of variation and are more reproducible and more precise than data from the other two methods. Furthermore, compared to methods B and C, method A produces more differentially expressed probe sets between distinct leukemia classes and is therefore considered the most robust RNA isolation procedure for gene expression experiments using high-density microarray technology. We therefore conclude that method A (initial homogenization of the leukemia cell lysate followed by total RNA isolation) combined with a standardized microarray analysis protocol is highly reproducible, contributes to the robustness of gene expression data, and is the most practical procedure for routine laboratory use.

