Identifying time-lagged gene clusters using gene expression data

2004 ◽  
Vol 21 (4) ◽  
pp. 509-516 ◽  
Author(s):  
L. Ji ◽  
K.-L. Tan
2021 ◽  
Author(s):  
Huan-Huan Wei ◽  
Hui Lu ◽  
Hongyu Zhao

Abstract
Many computational methods have been developed for inferring causality among genes using cross-sectional gene expression data, such as single-cell RNA sequencing (scRNA-seq) data. However, because of the limitations of scRNA-seq technologies, existing methods may miss time-lagged causal relationships. In this work, we propose a method called causal inference with time-lagged information (CITL) to infer time-lagged causal relationships from scRNA-seq data by assessing conditional independence between the changing and current expression levels of genes. CITL estimates the changing expression levels of genes by "RNA velocity". We demonstrate the accuracy and stability of CITL for inferring time-lagged causality on simulated data against other leading approaches. We have applied CITL to real scRNA-seq data and inferred 878 pairs of time-lagged causal relationships, many of which are supported by the literature.
Author summary
Computational causal inference is a promising way to survey causal relationships between genes efficiently. Although many causal inference methods have been applied to gene expression data, none considers time-lagged causal relationships, in which a gene may take some time to affect its target genes through several reactions. If relationships between genes are time-lagged, the assumptions of existing methods are violated and the relationships become difficult to recognize; we demonstrate through simulation that this is indeed the case. We therefore develop a method for inferring time-lagged causal relationships from single-cell gene expression data. We assume that a time-lagged causal relationship should show a strong association between the cause and the change in the effect. To calculate this association, we first estimate the derivative of gene expression using information from unspliced transcripts. We then use conditional independence tests to search for gene pairs that satisfy our assumption. Our results suggest that the method can accurately infer time-lagged causal gene pairs that are validated by published literature. This method may complement gene regulatory analysis and provide candidate gene pairs for further controlled experiments.
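To make "conditional independence between the changing and current expression levels" concrete, here is a minimal Python sketch under stated assumptions; it is not the CITL implementation. It estimates velocity with a simplified steady-state model, v = u - γs (u = unspliced, s = spliced abundance, γ fitted through the origin on all cells), and it uses a Gaussian partial-correlation (Fisher-z) test that conditions only on the target's own current expression, as a stand-in for whatever test and conditioning sets CITL actually uses. `simulate_pair` is a hypothetical toy generator.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def steady_state_velocity(unspliced, spliced):
    """Velocity v = u - gamma * s, with gamma fitted by least squares through the origin.

    Simplification (assumption): all cells are used for the fit, whereas
    velocyto-style estimates typically fit gamma on extreme-quantile cells only.
    """
    gamma = sum(u * s for u, s in zip(unspliced, spliced)) / sum(s * s for s in spliced)
    return [u - gamma * s for u, s in zip(unspliced, spliced)]


def partial_corr_test(x, y, z):
    """Fisher-z test of H0: corr(x, y | z) = 0 with a single conditioning variable z.

    Returns (partial correlation, two-sided p-value); assumes roughly Gaussian data.
    """
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    r = (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
    stat = math.atanh(r) * math.sqrt(len(x) - 1 - 3)  # sqrt(n - |S| - 3) with |S| = 1
    p = math.erfc(abs(stat) / math.sqrt(2))  # two-sided standard-normal tail
    return r, p


def simulate_pair(n=200, seed=0):
    """Hypothetical toy data: gene x drives the target's unspliced level, gene w does not."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]  # current expression of the candidate cause
    w = [rng.gauss(0, 1) for _ in range(n)]  # an unrelated gene
    s = [5 + rng.gauss(0, 1) for _ in range(n)]  # spliced abundance of the target
    u = [0.5 * si + 0.8 * xi + 0.3 * rng.gauss(0, 1) for si, xi in zip(s, x)]
    return x, w, u, s
```

With the toy data, testing x against the target's velocity given the target's spliced level yields a strong partial correlation, while the unrelated gene w does not; a time-lagged effect shows up in the velocity even though it need not appear in a same-time correlation.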


2005 ◽  
Vol 14 (04) ◽  
pp. 577-597 ◽  
Author(s):  
CHUN TANG ◽  
AIDONG ZHANG

Microarray technologies are capable of simultaneously measuring the signals for thousands of messenger RNAs and large numbers of proteins from single samples. Arrays are now widely used in basic biomedical research for mRNA expression profiling and are increasingly being used to explore patterns of gene expression in clinical research. Most research has focused on interpreting microarray data, which are transformed into gene expression matrices in which the rows usually represent genes and the columns represent samples. Samples can be clustered by analyzing genes and eliminating irrelevant ones. However, most such methods are supervised (or assisted by domain knowledge), and less attention has been paid to unsupervised approaches, which are important when little domain knowledge is available. In this paper, we present a new framework for unsupervised analysis of gene expression data that applies an interrelated two-way clustering approach to gene expression matrices. The goal of clustering is to identify important genes and perform cluster discovery on samples. The advantage of this approach is that the relationship between gene clusters and sample groups can be manipulated dynamically while clustering iteratively over both. The performance of the proposed method is illustrated on various gene expression data sets.
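The interrelated two-way idea (cluster samples on the current gene subset, then use the resulting sample groups to decide which genes matter, and repeat) can be illustrated with a much-simplified alternation. The sketch below is an assumption-laden stand-in, not Tang and Zhang's framework: it uses a deterministic 2-means for samples, a signal-to-noise score |mean_A - mean_B| / (sd_A + sd_B) to rank genes, and a hypothetical `n_keep` parameter for how many genes to retain; it also reduces "gene clusters" to a single retained gene subset.

```python
import math


def _dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, max_iter=100):
    """Deterministic 2-means: seeds are point 0 and the point farthest from it."""
    far = max(range(len(points)), key=lambda i: _dist2(points[0], points[i]))
    cents = [list(points[0]), list(points[far])]
    labels = None
    for _ in range(max_iter):
        new = [0 if _dist2(p, cents[0]) <= _dist2(p, cents[1]) else 1 for p in points]
        if new == labels:
            break
        labels = new
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                cents[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def snr(values, labels):
    """Signal-to-noise score of one gene for the current two sample groups."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    if not a or not b:
        return 0.0
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sa = math.sqrt(sum((v - ma) ** 2 for v in a) / len(a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b) / len(b))
    return abs(ma - mb) / (sa + sb + 1e-9)


def two_way_cluster(matrix, n_keep, max_iter=10):
    """Alternate sample clustering and gene selection until the sample partition is stable.

    matrix[g][j] is the expression of gene g in sample j.
    Returns (sample labels, sorted indices of retained genes).
    """
    genes = list(range(len(matrix)))
    labels = None
    for _ in range(max_iter):
        samples = [[matrix[g][j] for g in genes] for j in range(len(matrix[0]))]
        new = two_means(samples)
        if new == labels:
            break
        labels = new
        ranked = sorted(range(len(matrix)), key=lambda g: snr(matrix[g], labels), reverse=True)
        genes = sorted(ranked[:n_keep])
    return labels, genes
```

Rows are genes and columns are samples, matching the matrix orientation described in the abstract. Re-clustering on the retained genes is what couples the two directions: noisy genes stop influencing the sample partition once they drop out.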


Author(s):  
Ramon Viñas ◽  
Helena Andrés-Terré ◽  
Pietro Liò ◽  
Kevin Bryson

Abstract
Motivation: High-throughput gene expression can be used to address a wide range of fundamental biological problems, but datasets of an appropriate size are often unavailable. Moreover, existing transcriptomics simulators have been criticized because they fail to emulate key properties of gene expression data. In this article, we develop a method based on a conditional generative adversarial network to generate realistic transcriptomics data for Escherichia coli and humans. We assess the performance of our approach across several tissues and cancer types.
Results: We show that our model preserves several gene expression properties significantly better than widely used simulators, such as SynTReN or GeneNetWeaver. The synthetic data preserve tissue- and cancer-specific properties of transcriptomics data. Moreover, they exhibit real gene clusters and ontologies at both local and global scales, suggesting that the model learns to approximate the gene expression manifold in a biologically meaningful way.
Availability and implementation: Code is available at: https://github.com/rvinas/adversarial-gene-expression.
Supplementary information: Supplementary data are available at Bioinformatics online.
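One simple property against which synthetic expression data can be checked is the gene-gene correlation structure. The sketch below scores real versus synthetic data by the mean absolute difference between their off-diagonal Pearson correlations. This metric is an illustrative assumption, not necessarily one of the measures used in the paper, and treating zero-variance genes as uncorrelated is likewise an assumption. Rows are samples and columns are genes.

```python
import math


def pearson(a, b):
    """Pearson correlation; zero-variance inputs are treated as uncorrelated (assumption)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def corr_matrix(samples):
    """Gene-by-gene correlation matrix from a samples-by-genes table."""
    cols = list(zip(*samples))
    return [[pearson(ci, cj) for cj in cols] for ci in cols]


def corr_discrepancy(real, synthetic):
    """Mean absolute difference of off-diagonal gene-gene correlations (0 = identical structure)."""
    cr, cs = corr_matrix(real), corr_matrix(synthetic)
    g = len(cr)
    diffs = [abs(cr[i][j] - cs[i][j]) for i in range(g) for j in range(g) if i != j]
    return sum(diffs) / len(diffs)
```

A generator that matches each gene's marginal distribution but scrambles co-expression would still score poorly here, which is the kind of failure the abstract attributes to simpler simulators.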


2003 ◽  
Vol 12 (1) ◽  
pp. 96-109 ◽  
Author(s):  
Makoto Kano ◽  
Kunihiro Nishimura ◽  
Shuichi Tsutsumi ◽  
Hiroyuki Aburatani ◽  
Koichi Hirota ◽  
...  

In this paper, we discuss possible applications of virtual reality technologies, such as immersive projection technology (IPT), in the field of genome science, and propose cluster-oriented visualization that emphasizes data separation in large gene data sets with multiple variables. Based on these strategies, we developed the cluster overlap distribution map (CDCM), a visualization methodology that uses IPT for pairwise comparison between cluster sets generated from different gene expression data sets. This methodology effectively indicates to the user which gene clusters are worth a close examination. In addition, using the plate window manager system, which enables the user to manipulate existing 2D GUI applications in virtual 3D space, we developed a virtual environment for comprehensive analysis, from providing these indications to further examination by consulting databases on Web sites. Our system was applied to a comparison between the gene expression data sets of hepatocellular carcinomas and hepatoblastomas, and the effectiveness of the system was confirmed.
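Underlying any pairwise comparison of two cluster sets is a contingency table: for every pair of clusters (one from each data set), count the genes they share. The sketch below builds that table and ranks cluster pairs by Jaccard overlap as a hypothetical proxy for "worth a close examination". It covers only the numbers a CDCM-style display might encode, not the IPT visualization itself. Both inputs are assumed to map gene IDs to cluster labels, and only genes present in both data sets are counted.

```python
from collections import Counter


def overlap_table(clusters_a, clusters_b):
    """Count shared genes for every (cluster in A, cluster in B) pair that co-occurs."""
    shared = clusters_a.keys() & clusters_b.keys()
    return Counter((clusters_a[g], clusters_b[g]) for g in shared)


def rank_pairs(clusters_a, clusters_b):
    """Rank cluster pairs by Jaccard overlap, computed over genes present in both data sets."""
    shared = clusters_a.keys() & clusters_b.keys()
    size_a = Counter(clusters_a[g] for g in shared)
    size_b = Counter(clusters_b[g] for g in shared)
    table = overlap_table(clusters_a, clusters_b)
    scores = {
        (la, lb): n / (size_a[la] + size_b[lb] - n)
        for (la, lb), n in table.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Jaccard is used here so that a large cluster does not dominate merely by size; raw counts from `overlap_table` are the alternative if absolute overlap is what matters.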


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xinpeng Guo ◽  
Yafei Song ◽  
Shuhui Liu ◽  
Meihong Gao ◽  
Yang Qi ◽  
...  

Abstract
Background: Genome-wide association studies (GWAS) that link genotype to phenotype are an effective means of associating an individual's genetic background with a disease or trait. However, single-omics data provide only limited information on biological mechanisms, and integrating multi-omics data is necessary to improve the accuracy of predicting the biological association between genotype and phenotype. Typically, gene expression data are integrated to analyze the effect of single nucleotide polymorphisms (SNPs) on phenotype. Such multi-omics data integration mainly follows two approaches, multi-staged analysis and meta-dimensional analysis, which respectively ignore intra-omics and inter-omics associations. Moreover, both approaches require omics data from a single sample set, and the large feature set of SNPs necessitates a large sample size for model establishment, yet it is difficult to obtain multi-omics data from a single, large sample set.
Results: To address this problem, we propose a method of genotype-phenotype association based on multi-omics data from small samples. The workflow of this method includes clustering genes using a protein-protein interaction network and gene expression data; screening gene clusters with group lasso; obtaining SNP clusters corresponding to the selected gene clusters through expression quantitative trait locus (eQTL) data; integrating SNP clusters, the corresponding gene clusters, and phenotypes into three-layer network blocks; analyzing and predicting based on each block; and obtaining the final prediction by taking the average.
Conclusions: We compare this method with others on two datasets and find that our method performs better in both cases. Our method can effectively address the prediction problem for multi-omics data from small samples and provide valuable resources for further studies on the fusion of more omics data.
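Group lasso fits the screening step because it keeps or drops whole gene clusters at once. Below is a minimal sketch of group lasso for a linear model, assuming a squared-error loss (1/(2n))·||y - Xb||² plus λ·Σ_g ||b_g||₂, solved by proximal gradient descent with block soft-thresholding; the paper's loss, penalty weights, and solver may differ. Each entry of `groups` lists the column indices of one gene cluster, and a cluster counts as selected when its coefficient block is nonzero.

```python
import math


def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal-gradient solver for (1/(2n)) * ||y - X b||^2 + lam * sum_g ||b_g||_2.

    X is a list of n rows with p features each; groups is a list of column-index lists.
    """
    n, p = len(X), len(X[0])
    # ||X||_F^2 / n bounds the gradient's Lipschitz constant, so this step size is safe.
    step = n / sum(v * v for row in X for v in row)
    b = [0.0] * p
    for _ in range(n_iter):
        resid = [yi - sum(xij * bj for xij, bj in zip(row, b)) for row, yi in zip(X, y)]
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        z = [bj - step * gj for bj, gj in zip(b, grad)]
        for g in groups:
            # Block soft-thresholding: shrink the whole group, or zero it out entirely.
            norm = math.sqrt(sum(z[j] ** 2 for j in g))
            shrink = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            for j in g:
                z[j] *= shrink
        b = z
    return b


def selected_groups(b, groups, tol=1e-8):
    """Indices of groups whose coefficient block is nonzero."""
    return [k for k, g in enumerate(groups) if math.sqrt(sum(b[j] ** 2 for j in g)) > tol]
```

Larger λ removes more clusters; in a workflow like the one described, the surviving group indices would then determine which SNP clusters to pull in through eQTL data.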

