CLUSTERING AND RE-CLUSTERING FOR PATTERN DISCOVERY IN GENE EXPRESSION DATA

2005 ◽  
Vol 03 (02) ◽  
pp. 281-301 ◽  
Author(s):  
PATRICK C. H. MA ◽  
KEITH C. C. CHAN ◽  
DAVID K. Y. CHIU

The combined interpretation of gene expression data and gene sequences is important for the investigation of the intricate relationships of gene expression at the transcription level. The expression data produced by microarray hybridization experiments can lead to the identification of clusters of co-expressed genes that are likely co-regulated by the same regulatory mechanisms. By analyzing the promoter regions of co-expressed genes, the common regulatory patterns characterized by transcription factor binding sites can be revealed. Many clustering algorithms have been used to uncover inherent clusters in gene expression data. In this paper, based on experiments using simulated and real data, we show that the performance of these algorithms could be further improved. For the clustering of expression data, which are typically characterized by considerable noise, we propose a two-phase clustering algorithm consisting of an initial clustering phase and a second re-clustering phase. The proposed algorithm has several desirable features: (i) it utilizes both local and global information by computing both a "local" pairwise distance between two gene expression profiles in Phase 1 and a "global" probabilistic measure of interestingness of cluster patterns in Phase 2, (ii) it distinguishes between relevant and irrelevant expression values when performing re-clustering, and (iii) it makes explicit the patterns discovered in each cluster for possible interpretation. Experimental results show that the proposed algorithm can be effective for discovering clusters in the presence of very noisy data. The patterns that are discovered in each cluster are found to be meaningful and statistically significant, and cannot otherwise be easily discovered. Based on these discovered patterns, genes co-expressed under the same experimental conditions and range of expression levels have been identified and evaluated. 
When identifying regulatory patterns at the promoter regions of the co-expressed genes, we also discovered well-known transcription factor binding sites in them. These binding sites can provide explanations for the co-expressed patterns.
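
The two-phase idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the seed-based nearest-profile assignment in Phase 1, the three-level discretization, the standardized-residual score used as the "interestingness" measure, and the 1.96 cutoff are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
from collections import Counter


def discretize(value, cuts=(-0.5, 0.5)):
    """Map an expression value to a level: 0 (low), 1 (medium), 2 (high).
    The cut points are illustrative assumptions."""
    return sum(value > c for c in cuts)


def phase1(profiles, seeds):
    """Phase 1 (local): assign each profile to the nearest seed profile
    using the pairwise Euclidean distance."""
    return [min(range(len(seeds)), key=lambda k: math.dist(p, seeds[k]))
            for p in profiles]


def significant_patterns(profiles, labels, n_clusters, z_crit=1.96):
    """Find (condition, level) pairs that are over-represented in a cluster.
    The score is a standardized residual, used here as a stand-in for a
    probabilistic interestingness measure."""
    n = len(profiles)
    sizes = Counter(labels)
    patterns = {k: {} for k in range(n_clusters)}
    for j in range(len(profiles[0])):
        levels = [discretize(p[j]) for p in profiles]
        level_totals = Counter(levels)
        joint = Counter(zip(labels, levels))
        for (k, lev), observed in joint.items():
            expected = sizes[k] * level_totals[lev] / n
            z = (observed - expected) / math.sqrt(expected)
            if z > z_crit:
                patterns[k][(j, lev)] = z
    return patterns


def phase2(profiles, labels, n_clusters):
    """Phase 2 (global): re-assign each profile to the cluster whose
    significant patterns it matches best. Expression values that match no
    significant pattern are treated as irrelevant and contribute nothing."""
    patterns = significant_patterns(profiles, labels, n_clusters)
    new_labels = []
    for p, old in zip(profiles, labels):
        scores = [sum(z for (j, lev), z in patterns[k].items()
                      if discretize(p[j]) == lev)
                  for k in range(n_clusters)]
        best = max(range(n_clusters), key=lambda k: scores[k])
        # Keep the Phase 1 label when no significant pattern matches.
        new_labels.append(best if scores[best] > 0 else old)
    return new_labels
```

In this sketch, a profile with one extreme value under an uninformative condition may be pulled into the wrong cluster by the Phase 1 distance, while Phase 2 ignores that condition because no level of it is significantly associated with any cluster. The patterns returned by `significant_patterns` also serve as an explicit, readable description of each cluster.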

Genes ◽  
2018 ◽  
Vol 9 (9) ◽  
pp. 446 ◽  
Author(s):  
Shijie Xin ◽  
Xiaohui Wang ◽  
Guojun Dai ◽  
Jingjing Zhang ◽  
Tingting An ◽  
...  

The proinflammatory cytokine interleukin-6 (IL-6) plays a critical role in many chronic inflammatory diseases, particularly inflammatory bowel disease. To investigate the regulation of IL-6 gene expression at the molecular level, genomic DNA sequencing of Jinghai yellow chickens (Gallus gallus) was performed to detect single-nucleotide polymorphisms (SNPs) in the region from 2200 base pairs (bp) upstream to 500 bp downstream of IL-6. Transcription factor binding sites and CpG islands in the IL-6 promoter region were predicted using bioinformatics software. Twenty-eight SNP sites were identified in IL-6. Four of these 28 SNPs, three in the 5′ regulatory region [−357 (G > A), −447 (C > G), and −663 (A > G)] and one in the 3′ non-coding region [3177 (C > T)], are not labelled in GenBank. Bioinformatics analysis revealed 11 SNPs within the promoter region that altered putative transcription factor binding sites. Furthermore, the C-939G mutation in the promoter region may change the number of CpG islands, and SNPs in the 5′ regulatory region may influence IL-6 gene expression by altering transcription factor binding or CpG methylation status. Genetic diversity analysis revealed that the newly discovered A-663G site significantly deviated from Hardy-Weinberg equilibrium. These results provide a basis for further exploration of the promoter function of the IL-6 gene and the relationships of these SNPs to intestinal inflammation resistance in chickens.
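
A deviation from Hardy-Weinberg equilibrium, such as the one reported for the A-663G site, is commonly assessed with a chi-square goodness-of-fit test on genotype counts. The sketch below shows that standard test. The abstract does not state which test the authors used, and the function name and any genotype counts passed to it are hypothetical, not data from the study.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg equilibrium at a biallelic site.

    Arguments are observed counts of the two homozygotes (n_aa, n_bb) and
    the heterozygote (n_ab). Assumes both alleles are present, so that no
    expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                        # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Three genotype classes minus one estimated allele frequency minus one
    # leaves 1 degree of freedom; for 1 df the upper tail probability is
    # erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A small p-value from `hwe_chi_square` indicates that the observed genotype counts are unlikely under equilibrium. For small samples or rare alleles an exact test is usually preferred over this approximation.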


2013 ◽  
Vol 33 (suppl_1) ◽  
Author(s):  
Nathan Airhart ◽  
John Curci

Background: We have previously shown that vascular smooth muscle cells (VSMC) from abdominal aortic aneurysms (AAA) are unique compared to cells from normal aorta (NAA) and carotid endarterectomy (CEA), with increased production of matrix metalloproteinases and elastin-degrading activity. The purpose of this study was to explore the mechanisms behind this phenotype. Methods: Tissue for VSMC cultures was obtained from patients undergoing AAA repair and CEA. NAA tissue was obtained from renal transplant patients. Total RNA was isolated from VSMC and subjected to whole-genome microarray analysis. Enrichment of binding sites for transcription factors (TF) within 5 kb of transcription start sites of upregulated genes was identified using Whole Genome rVista. Enriched gene ontology terms were identified using the Database for Annotation, Visualization, and Integrated Discovery (DAVID). Results: Gene profiles of 22 AAA, 29 CEA, and 17 NAA cell lines were compared. We identified 120 upregulated genes in AAA-VSMC relative to NAA and CEA-VSMC (FDR<0.05). Analysis of transcription factor binding sites of these genes showed enrichment of TFs including members of the ETS, AP-1, and Rel/Ankyrin families. Gene ontology (GO) analysis revealed enrichment of developmental process and immune system genes. Analysis by cell compartment showed enrichment of extracellular matrix and intermediate filament cytoskeleton genes (Table 1). Conclusion: This is the first study to demonstrate enrichment of TF families such as ETS, AP-1, and Rel/Ankyrin in AAA VSMC. This suggests that VSMC in AAA may not just be responding to inflammatory or other local stimuli, but may be directly contributing to the ECM changes that define AAA.


2017 ◽  
Author(s):  
Ella Preger-Ben Noon ◽  
Gonzalo Sabarís ◽  
Daniela Ortiz ◽  
Jonathan Sager ◽  
Anna Liebowitz ◽  
...  

Developmental genes can have complex cis-regulatory regions, with multiple enhancers scattered across stretches of DNA spanning tens or hundreds of kilobases. Early work revealed remarkable modularity of enhancers, where distinct regions of DNA, bound by combinations of transcription factors, drive gene expression in defined spatio-temporal domains. Nevertheless, a few reports have shown that enhancer function may be required in multiple developmental stages, implying that regulatory elements can be pleiotropic. In these cases, it is not clear whether the pleiotropic enhancers employ the same transcription factor binding sites to drive expression at multiple developmental stages or whether enhancers function as chromatin scaffolds, where independent sets of transcription factor binding sites act at different stages. In this work we have studied the activity of the enhancers of the shavenbaby gene throughout D. melanogaster development. We found that all seven shavenbaby enhancers drive gene expression in multiple tissues and developmental stages at varying levels of redundancy. We have explored how this pleiotropy is encoded in two of these enhancers. In one enhancer, the same transcription factor binding sites contribute to embryonic and pupal expression, whereas for a second enhancer, these roles are largely encoded by distinct transcription factor binding sites. Our data suggest that enhancer pleiotropy might be a common feature of cis-regulatory regions of developmental genes and that this pleiotropy can be encoded through multiple genetic architectures.


2013 ◽  
Vol 2013 ◽  
pp. 1-6
Author(s):  
Jia Song ◽  
Li Xu ◽  
Hong Sun

Identifying transcription factor binding sites with experimental methods is often expensive and time consuming. Although many computational approaches and tools have been developed for this problem, their prediction accuracy is not satisfactory. In this paper, we develop a new computational approach that models the relationships among all short sequence segments in the promoter regions with a graph-theoretic model. Based on this model, finding the locations of transcription factor binding sites is reduced to computing maximum weighted cliques in a graph with weighted edges. We have implemented this approach and used it to predict the binding sites in two organisms, Caenorhabditis elegans and Mus musculus. We compared the prediction accuracy with that of the Gibbs Motif Sampler. We found that the accuracy of our approach is higher than or comparable to that of the Gibbs Motif Sampler for most of the tested data, and that it can accurately identify binding sites in cases where the Gibbs Motif Sampler has difficulty predicting their locations.
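
The reduction to a maximum weighted clique can be sketched on a toy scale. The abstract does not give the paper's graph construction or its clique algorithm, so the following are assumptions for illustration only: each node is a fixed-width segment of one promoter sequence, edges join segments from different sequences, an edge weight is the number of matching positions, and the search is exhaustive. The function names are hypothetical.

```python
from itertools import combinations, product


def matches(a, b):
    """Edge weight (an assumption): number of positions where two
    equal-length segments carry the same base."""
    return sum(x == y for x, y in zip(a, b))


def best_clique(sequences, width):
    """Return the maximum total edge weight and the (sequence index, start)
    pairs of the best clique that uses one segment per sequence.

    Segments of the same sequence are not connected, so every clique has at
    most one node per sequence. Exhaustive search is exponential in the
    number of sequences and is only feasible for tiny inputs.
    """
    candidates = [
        [(i, s, seq[s:s + width]) for s in range(len(seq) - width + 1)]
        for i, seq in enumerate(sequences)
    ]
    best_weight, best_nodes = -1, None
    for nodes in product(*candidates):
        weight = sum(matches(u[2], v[2]) for u, v in combinations(nodes, 2))
        if weight > best_weight:
            best_weight, best_nodes = weight, nodes
    return best_weight, [(i, s) for i, s, _ in best_nodes]
```

Calling `best_clique` on a few short sequences that share an embedded motif returns the start positions of the mutually most similar segments. A practical tool would need a heuristic or bounded search in place of the exhaustive enumeration used here.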


BMC Genomics ◽  
2004 ◽  
Vol 5 (1) ◽  
Author(s):  
Vijayalakshmi H Nagaraj ◽  
Ruadhan A O'Flanagan ◽  
Adrian R Bruning ◽  
Jonathan R Mathias ◽  
Andrew K Vershon ◽  
...  
