Mining gene expression data for positive and negative co-regulated gene clusters

2004, Vol 20 (16), pp. 2711-2718
Author(s): L. Ji, K.-L. Tan

2005, Vol 14 (04), pp. 577-597
Author(s): Chun Tang, Aidong Zhang

Microarray technologies can simultaneously measure the signals for thousands of messenger RNAs and large numbers of proteins from single samples. Arrays are now widely used in basic biomedical research for mRNA expression profiling and are increasingly being used to explore patterns of gene expression in clinical research. Most research has focused on interpreting microarray data, which are transformed into gene expression matrices in which the rows usually represent genes and the columns represent samples. Samples can be clustered by analyzing the genes and eliminating irrelevant ones. However, most existing methods are supervised (or assisted by domain knowledge), and less attention has been paid to unsupervised approaches, which are important when little domain knowledge is available. In this paper, we present a new framework for unsupervised analysis of gene expression data that applies an interrelated two-way clustering approach to gene expression matrices. The goal of clustering is to identify important genes and to discover clusters among the samples. The advantage of this approach is that we can dynamically manipulate the relationship between the gene clusters and sample groups while clustering iteratively across both dimensions. The performance of the proposed method is illustrated on various gene expression data sets.
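As a rough illustration of the interrelated two-way idea, the sketch below alternates between clustering the samples on the currently selected genes and re-selecting the genes that best separate the resulting sample groups. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function names (`two_way_cluster`, `gene_score`), the k-means step with max-min initialisation, the between/within variance ratio used to score genes, and the fixed number of rounds are all hypothetical choices.

```python
from statistics import mean

def sqdist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic max-min initialisation; returns one label per point."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        if new == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [mean(col) for col in zip(*members)]
    return labels

def gene_score(row, sample_labels):
    """Hypothetical gene score: between-group variance over within-group variance."""
    groups = {}
    for value, g in zip(row, sample_labels):
        groups.setdefault(g, []).append(value)
    overall = mean(row)
    between = sum(len(v) * (mean(v) - overall) ** 2 for v in groups.values())
    within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    return between / (within + 1e-9)

def two_way_cluster(matrix, k=2, n_genes=2, rounds=3):
    """matrix[i][j] = expression of gene i in sample j (rows = genes, columns = samples)."""
    n_total, n_samples = len(matrix), len(matrix[0])
    selected = list(range(n_total))  # start from all genes
    sample_labels = []
    for _ in range(rounds):
        # Cluster the samples using only the currently selected genes.
        samples = [[matrix[g][j] for g in selected] for j in range(n_samples)]
        sample_labels = kmeans(samples, k)
        # Re-select the genes that best separate the sample groups just found.
        ranked = sorted(range(n_total),
                        key=lambda g: gene_score(matrix[g], sample_labels),
                        reverse=True)
        selected = sorted(ranked[:n_genes])
    return sample_labels, selected
```

On a toy matrix in which only the first two genes differ between two groups of samples, the loop settles on those two genes and on the matching sample split. A real implementation would choose the number of genes and clusters adaptively rather than fixing them up front.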


Author(s): Ramon Viñas, Helena Andrés-Terré, Pietro Liò, Kevin Bryson

Abstract
Motivation: High-throughput gene expression can be used to address a wide range of fundamental biological problems, but datasets of an appropriate size are often unavailable. Moreover, existing transcriptomics simulators have been criticized for failing to emulate key properties of gene expression data. In this article, we develop a method based on a conditional generative adversarial network to generate realistic transcriptomics data for Escherichia coli and humans. We assess the performance of our approach across several tissues and cancer types.
Results: We show that our model preserves several gene expression properties significantly better than widely used simulators such as SynTReN or GeneNetWeaver. The synthetic data preserve tissue- and cancer-specific properties of transcriptomics data. Moreover, they exhibit real gene clusters and ontologies at both local and global scales, suggesting that the model learns to approximate the gene expression manifold in a biologically meaningful way.
Availability and implementation: Code is available at: https://github.com/rvinas/adversarial-gene-expression.
Supplementary information: Supplementary data are available at Bioinformatics online.


2003, Vol 12 (1), pp. 96-109
Author(s): Makoto Kano, Kunihiro Nishimura, Shuichi Tsutsumi, Hiroyuki Aburatani, Koichi Hirota, ...

In this paper, we discuss possible applications of virtual reality technologies, such as immersive projection technology (IPT), in the field of genome science, and propose a cluster-oriented visualization that emphasizes the separation of large, multivariate gene data sets. Based on these strategies, we developed the cluster overlap distribution map (CDCM), a visualization methodology that uses IPT for pairwise comparison between cluster sets generated from different gene expression data sets. This methodology effectively indicates to the user which gene clusters are worth closer examination. In addition, by using the plate window manager system, which lets the user manipulate existing 2D GUI applications in the virtual 3D space, we developed a virtual environment that supports the whole analysis, from presenting these indications to further examination by consulting databases on Web sites. Our system was applied to the comparison between the gene expression data sets of hepatocellular carcinomas and hepatoblastomas, and its effectiveness was confirmed.
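A pairwise comparison between two cluster sets rests on how many genes each pair of clusters shares. The sketch below computes that overlap table for two clusterings of the same genes, together with the fraction of each cluster in the first set that falls into each cluster of the second. It covers only the underlying tabulation, not the CDCM visualization itself; the dictionary input format and the function names are hypothetical.

```python
from collections import Counter

def overlap_table(clusters_a, clusters_b):
    """clusters_a / clusters_b map gene ID -> cluster label.

    Only genes present in both clusterings are compared. Returns a Counter
    keyed by (cluster in A, cluster in B) with the number of shared genes.
    """
    shared = clusters_a.keys() & clusters_b.keys()
    return Counter((clusters_a[g], clusters_b[g]) for g in shared)

def overlap_fractions(clusters_a, clusters_b):
    """For each (a, b) pair, the fraction of cluster a's shared genes that fall in cluster b."""
    table = overlap_table(clusters_a, clusters_b)
    sizes = Counter()
    for (a, _), n in table.items():
        sizes[a] += n
    return {(a, b): n / sizes[a] for (a, b), n in table.items()}
```

Cells with high fractions point to clusters that persist across the two data sets, while clusters spread thinly over many partners are the kind that may merit closer examination.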


BMC Genomics, 2021, Vol 22 (1)
Author(s): Xinpeng Guo, Yafei Song, Shuhui Liu, Meihong Gao, Yang Qi, ...

Abstract
Background: Genome-wide association studies (GWAS) that link genotype to phenotype are an effective means of associating an individual's genetic background with a disease or trait. However, single-omics data provide only limited information on biological mechanisms, so integrating multi-omics data is necessary to improve the accuracy of predicting the biological association between genotype and phenotype. Typically, gene expression data are integrated to analyze the effect of single nucleotide polymorphisms (SNPs) on phenotype. Such multi-omics data integration mainly follows two approaches, multi-staged analysis and meta-dimensional analysis, which ignore intra-omics and inter-omics associations, respectively. Moreover, both approaches require omics data from a single sample set, and the large feature set of SNPs necessitates a large sample size for model establishment, yet it is difficult to obtain multi-omics data from a single, large sample set.
Results: To address this problem, we propose a method of genotype-phenotype association based on multi-omics data from small samples. The workflow of this method comprises: clustering genes using a protein-protein interaction network and gene expression data; screening gene clusters with group lasso; obtaining the SNP clusters corresponding to the selected gene clusters through expression quantitative trait locus data; integrating the SNP clusters, the corresponding gene clusters, and phenotypes into three-layer network blocks; analyzing and predicting based on each block; and obtaining the final prediction by averaging across blocks.
Conclusions: We compare this method to others using two datasets and find that our method shows better results in both cases. Our method can effectively solve the prediction problem for small-sample multi-omics data and provides valuable resources for further studies on the fusion of more omics data.
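Two steps of this workflow lend themselves to a short sketch: screening gene clusters with group lasso, where a cluster is kept only if its whole coefficient block is non-zero, and forming the final prediction as the mean of the per-block predictions. The code below is a minimal pure-Python sketch assuming a squared-error loss minimised by proximal gradient descent with a fixed step size; the function names, step size, and iteration count are hypothetical and are not taken from the paper.

```python
import math

def group_soft_threshold(block, threshold):
    """Proximal operator of threshold * ||block||_2.

    Shrinks the whole block toward zero and sets it exactly to zero if its norm is small.
    """
    norm = math.sqrt(sum(v * v for v in block))
    if norm <= threshold:
        return [0.0] * len(block)
    scale = 1.0 - threshold / norm
    return [scale * v for v in block]

def group_lasso(X, y, groups, lam, step=0.01, iters=2000):
    """Minimise (1/2n)||y - X b||^2 + lam * sum_g ||b_g||_2 by proximal gradient descent.

    X: list of n rows; y: list of n targets; groups: list of column-index lists
    (here, one list per gene cluster).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        # Gradient of the squared-error term: X^T (X b - y) / n.
        residual = [sum(xj * bj for xj, bj in zip(row, beta)) - yi for row, yi in zip(X, y)]
        grad = [sum(X[i][j] * residual[i] for i in range(n)) / n for j in range(p)]
        beta = [b - step * g for b, g in zip(beta, grad)]
        # Proximal step applied group by group.
        for cols in groups:
            shrunk = group_soft_threshold([beta[j] for j in cols], step * lam)
            for j, v in zip(cols, shrunk):
                beta[j] = v
    return beta

def selected_groups(beta, groups):
    """Indices of gene clusters whose coefficient block survived screening."""
    return [k for k, cols in enumerate(groups) if any(beta[j] != 0.0 for j in cols)]

def average_prediction(block_predictions):
    """Final prediction as the mean of the per-block predictions."""
    return sum(block_predictions) / len(block_predictions)
```

Because the penalty acts on the Euclidean norm of each block, a gene cluster is either kept as a whole or dropped as a whole. That is what makes group lasso suitable for screening clusters rather than individual genes.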


2017
Author(s): Basel Abu-Jamous, Steven Kelly

Abstract
Identification of co-expressed gene clusters can provide evidence for genetic or physical interactions between genes. Thus, co-expression clustering is a routine step in large-scale analyses of gene expression data. We show that commonly used clustering methods produce results that substantially disagree with each other and do not match the biological expectations of co-expressed gene clusters. Furthermore, these clusters can contain up to 50% unreliably assigned genes. Consequently, downstream analyses of these clusters (e.g. functional term enrichment analysis) suffer from high error rates. We present clust, an automated method that solves these problems by extracting clusters that match the biological expectations of co-expressed genes. Using 100 datasets from five model organisms, we demonstrate that clusters generated by clust are better than those produced by other methods, both numerically and for use in functional analysis. Finally, we show that clust can simultaneously cluster multiple datasets, enabling users to leverage the large quantity of public expression data for novel comparative analysis.
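Disagreement between clustering methods can be quantified with a pair-counting agreement score. The sketch below uses the adjusted Rand index (ARI), a standard measure that equals 1.0 for identical partitions (regardless of label names) and is close to 0 for chance-level agreement. ARI is our illustrative choice; the paper's own comparison metrics may differ.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same items (label values are arbitrary)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no item pairs to disagree on
    # Pairs of items placed together in both clusterings (contingency-table cells).
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each clustering separately (row and column sums).
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both put everything in one cluster
        return 1.0
    return (index - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` returns 1.0 because the two partitions are the same up to renaming, whereas partially overlapping clusterings score between 0 and 1.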


2013, Vol 61 (1)
Author(s): Lim Fong Tee, Mohd Saberi Mohamad, Safaai Deris, Ahmad ‘Athif Mohd Faudzi, Muhammad Shafie Abd Latiff, ...

Hierarchical clustering is an unsupervised technique and a common approach to studying protein and gene expression data. In clustering, genes with similar expression patterns are grouped into distinct clusters, and genes in the same cluster are assumed to be potentially functionally related or influenced by a common upstream factor. Although clustering has rapidly become one of the standard computational approaches in microarray gene expression data analysis, the uncertainty in its results remains a concern. Experimental repetitions are generally performed to overcome biological and technical variability. In this study, the authors propose using repeated measurements to evaluate the stability of gene clusters. This paper aims to show that gene-cluster stability, assessed with repeated measurements, can be used for further analysis.
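A minimal way to assess stability across repeated measurements is to cluster each replicate separately and check how consistently pairs of genes end up in the same cluster. The sketch below assumes average-linkage agglomerative clustering with Euclidean distance, cut at a fixed number of clusters, and scores stability as the fraction of gene pairs whose co-membership agrees, averaged over all pairs of replicates. These choices and the function names are hypothetical, not the procedure from the paper.

```python
from itertools import combinations
from math import dist

def average_linkage(profiles, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise Euclidean distance until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]

    def linkage(a, b):
        return sum(dist(profiles[i], profiles[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    labels = [0] * len(profiles)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def co_membership_agreement(labels_a, labels_b):
    """Fraction of gene pairs that are together in both clusterings or apart in both."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs)
    return agree / len(pairs)

def replicate_stability(replicates, n_clusters):
    """replicates[r][g] = expression profile of gene g in replicate r.

    Returns the mean co-membership agreement over all pairs of replicates.
    """
    labelings = [average_linkage(rep, n_clusters) for rep in replicates]
    scores = [co_membership_agreement(a, b) for a, b in combinations(labelings, 2)]
    return sum(scores) / len(scores)
```

A score near 1.0 means the same gene groupings reappear in every replicate. Lower scores flag clusters that depend on measurement noise. The naive merge loop is cubic in the number of genes, so it only suits small illustrations.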

