Identifying time-lagged gene clusters using gene expression data

Abstract:

Many computational methods have been developed for inferring causality among genes using cross-sectional gene expression data, such as single-cell RNA sequencing (scRNA-seq) data. However, due to the limitations of scRNA-seq technologies, time-lagged causal relationships may be missed by existing methods. In this work, we propose a method, called causal inference with time-lagged information (CITL), to infer time-lagged causal relationships from scRNA-seq data by assessing conditional independence between the changing and current expression levels of genes. CITL estimates the changing expression levels of genes by “RNA velocity”. We demonstrate the accuracy and stability of CITL for inferring time-lagged causality on simulation data against other leading approaches. We have applied CITL to real scRNA-seq data and inferred 878 pairs of time-lagged causal relationships, with many of these inferred results supported by the literature.

Author summary:

Computational causal inference is a promising way to survey causal relationships between genes efficiently. Though many causal inference methods have been applied to gene expression data, none considers time-lagged causal relationships, in which a gene takes some time, and several intermediate reactions, to affect its target genes. If relationships between genes are time-lagged, the assumptions of existing methods are violated and the relationships become challenging to recognize. We demonstrate that this is indeed the case through simulation. Therefore, we develop a method for inferring time-lagged causal relationships from single-cell gene expression data. We assume that a time-lagged causal relationship should present a strong association between the cause and the change in the effect. To calculate this association, we first estimate the derivative of gene expression using the information from unspliced transcripts. Then, we use conditional independence tests to search for gene pairs satisfying our assumption. Our results suggest that we can accurately infer time-lagged causal gene pairs validated by published literature. This method may complement gene regulatory analysis and provide candidate gene pairs for further controlled experiments.
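The idea of testing the association between a cause's current expression and the change in its effect can be illustrated with a small sketch. It is not the CITL implementation: partial correlation with a Fisher z statistic is used here as one possible stand-in for a conditional independence test, and the data are synthetic. All names (`a_cur`, `b_cur`, `b_vel`, `c_cur`) and the simulation coefficients are hypothetical; in practice the "changing" values would come from an RNA velocity estimate.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def fisher_z(r, n, k=1):
    """Fisher z statistic for H0: partial correlation is zero,
    with k conditioning variables and n samples (cells)."""
    return math.sqrt(n - k - 3) * math.atanh(r)


# Synthetic cells (assumption): the change of gene B depends on the
# current level of gene A and on B's own current level (degradation).
rng = random.Random(0)
n_cells = 500
a_cur = [rng.gauss(0, 1) for _ in range(n_cells)]
b_cur = [rng.gauss(0, 1) for _ in range(n_cells)]
c_cur = [rng.gauss(0, 1) for _ in range(n_cells)]  # unrelated gene
b_vel = [0.8 * a - 0.5 * b + rng.gauss(0, 0.3) for a, b in zip(a_cur, b_cur)]

# Association between a candidate cause's current level and B's change,
# conditioned on B's current level.
r_ab = partial_corr(a_cur, b_vel, b_cur)
r_cb = partial_corr(c_cur, b_vel, b_cur)
z_ab = fisher_z(r_ab, n_cells)
z_cb = fisher_z(r_cb, n_cells)

print(f"A -> dB/dt: r = {r_ab:.3f}, z = {z_ab:.2f}")
print(f"C -> dB/dt: r = {r_cb:.3f}, z = {z_cb:.2f}")
```

In this toy setting the simulated cause should stand out clearly from the unrelated gene. Real data would additionally require conditioning on more than one gene and correcting for multiple testing.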

Clust: automatic extraction of optimal co-expressed gene clusters from gene expression data

Genome Biology ◽

10.1186/s13059-018-1536-8 ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 28

Author(s):

Basel Abu-Jamous ◽

Steven Kelly

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Clusters ◽

Automatic Extraction ◽

Expression Data

INTERRELATED TWO-WAY CLUSTERING AND ITS APPLICATION ON GENE EXPRESSION DATA

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213005002272 ◽

2005 ◽

Vol 14 (04) ◽

pp. 577-597 ◽

Cited By ~ 6

Author(s):

CHUN TANG ◽

AIDONG ZHANG

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Domain Knowledge ◽

Gene Clusters ◽

Data Sets ◽

Messenger Rnas ◽

Expression Data ◽

Large Numbers ◽

Clustering Approach ◽

Mrna Expression Profiling

Microarray technologies are capable of simultaneously measuring the signals for thousands of messenger RNAs and large numbers of proteins from single samples. Arrays are now widely used in basic biomedical research for mRNA expression profiling and are increasingly being used to explore patterns of gene expression in clinical research. Most research has focused on interpreting microarray data, which are transformed into gene expression matrices in which the rows usually represent genes and the columns represent samples. Samples can be clustered by analyzing and eliminating irrelevant genes. However, the majority of methods are supervised (or assisted by domain knowledge); less attention has been paid to unsupervised approaches, which are important when little domain knowledge is available. In this paper, we present a new framework for unsupervised analysis of gene expression data, which applies an interrelated two-way clustering approach to the gene expression matrices. The goal of clustering is to identify important genes and perform cluster discovery on samples. The advantage of this approach is that we can dynamically manipulate the relationship between the gene clusters and sample groups while iteratively clustering both of them. The performance of the proposed method on various gene expression data sets is also illustrated.

Mining gene expression data for positive and negative co-regulated gene clusters

Bioinformatics ◽

10.1093/bioinformatics/bth312 ◽

2004 ◽

Vol 20 (16) ◽

pp. 2711-2718 ◽

Cited By ~ 34

Author(s):

L. Ji ◽

K.-L. Tan

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Clusters ◽

Expression Data

Assessing reliability of gene clusters from gene expression data

Functional & Integrative Genomics ◽

10.1007/s101420000019 ◽

2000 ◽

Vol 1 (3) ◽

pp. 156-173 ◽

Cited By ~ 41

Author(s):

Kui Zhang ◽

Hongyu Zhao

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Clusters ◽

Expression Data

Adversarial generation of gene expression data

Bioinformatics ◽

10.1093/bioinformatics/btab035 ◽

2021 ◽

Author(s):

Ramon Viñas ◽

Helena Andrés-Terré ◽

Pietro Liò ◽

Kevin Bryson

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Synthetic Data ◽

Gene Clusters ◽

Supplementary Information ◽

Expression Data ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Wide Range ◽

Transcriptomics Data

Abstract:

Motivation: High-throughput gene expression can be used to address a wide range of fundamental biological problems, but datasets of an appropriate size are often unavailable. Moreover, existing transcriptomics simulators have been criticized because they fail to emulate key properties of gene expression data. In this article, we develop a method based on a conditional generative adversarial network to generate realistic transcriptomics data for Escherichia coli and humans. We assess the performance of our approach across several tissues and cancer types.

Results: We show that our model preserves several gene expression properties significantly better than widely used simulators, such as SynTReN or GeneNetWeaver. The synthetic data preserve tissue- and cancer-specific properties of transcriptomics data. Moreover, they exhibit real gene clusters and ontologies at both local and global scales, suggesting that the model learns to approximate the gene expression manifold in a biologically meaningful way.

Availability and implementation: Code is available at https://github.com/rvinas/adversarial-gene-expression.

Supplementary information: Supplementary data are available at Bioinformatics online.

Cluster Overlap Distribution Map: Visualization for Gene Expression Analysis Using Immersive Projection Technology

Presence Teleoperators & Virtual Environments ◽

10.1162/105474603763835369 ◽

2003 ◽

Vol 12 (1) ◽

pp. 96-109 ◽

Cited By ~ 2

Author(s):

Makoto Kano ◽

Kunihiro Nishimura ◽

Shuichi Tsutsumi ◽

Hiroyuki Aburatani ◽

Koichi Hirota ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Web Sites ◽

Pairwise Comparison ◽

Gene Clusters ◽

Data Sets ◽

Expression Data ◽

Distribution Map ◽

Large Gene ◽

Multiple Variables

In this paper, we discuss possible applications of virtual reality technologies, such as immersive projection technology (IPT), in the field of genome science, and propose cluster-oriented visualization that attaches importance to data separation of large gene data sets with multiple variables. Based on these strategies, we developed the cluster overlap distribution map (CODM), which is a visualization methodology using IPT for pairwise comparison between cluster sets generated from different gene expression data sets. This methodology effectively provides the user with indications of gene clusters that are worth a close examination. In addition, by using the plate window manager system, which enables the user to manipulate existing 2D GUI applications in the virtual 3D space, we developed a virtual environment for comprehensive analysis, from providing these indications through to further examination by referring to databases on Web sites. Our system was applied to the comparison between the gene expression data sets of hepatocellular carcinomas and hepatoblastomas, and the effectiveness of the system was confirmed.
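The pairwise comparison between two cluster sets can be made concrete with a small tabulation. The sketch below does not reproduce the visualization described in the abstract; it only assumes that such a comparison starts from counting how many genes each pair of clusters shares. The gene names, cluster labels, and the choice of the Jaccard index as the overlap score are illustrative assumptions.

```python
from collections import Counter


def overlap_counts(labels_a, labels_b):
    """Count shared genes for every (cluster in A, cluster in B) pair.

    labels_a and labels_b map gene name -> cluster label; only genes
    present in both data sets are compared.
    """
    shared = labels_a.keys() & labels_b.keys()
    return Counter((labels_a[g], labels_b[g]) for g in shared)


def jaccard_table(labels_a, labels_b):
    """Jaccard index per cluster pair, computed over the shared genes."""
    shared = labels_a.keys() & labels_b.keys()
    size_a = Counter(labels_a[g] for g in shared)
    size_b = Counter(labels_b[g] for g in shared)
    counts = overlap_counts(labels_a, labels_b)
    return {
        (ca, cb): n / (size_a[ca] + size_b[cb] - n)
        for (ca, cb), n in counts.items()
    }


# Hypothetical clusterings of two expression data sets.
set_one = {"g1": 0, "g2": 0, "g3": 1, "g4": 1, "g5": 2}
set_two = {"g1": "x", "g2": "x", "g3": "x", "g4": "y", "g6": "y"}

counts = overlap_counts(set_one, set_two)
scores = jaccard_table(set_one, set_two)

# Cluster pairs with the highest overlap are candidates for closer inspection.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, counts[pair], round(score, 3))
```

Genes that appear in only one of the two data sets (`g5` and `g6` here) are ignored, which is one possible convention among several.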

Linking genotype to phenotype in multi-omics data of small sample

BMC Genomics ◽

10.1186/s12864-021-07867-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Xinpeng Guo ◽

Yafei Song ◽

Shuhui Liu ◽

Meihong Gao ◽

Yang Qi ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Clusters ◽

Small Sample ◽

Large Sample Size ◽

Genome Wide Association Studies ◽

Omics Data ◽

Expression Data ◽

Large Sample ◽

Sample Set

Abstract:

Background: Genome-wide association studies (GWAS) that link genotype to phenotype represent an effective means to associate an individual genetic background with a disease or trait. However, single-omics data only provide limited information on biological mechanisms, and it is necessary to improve the accuracy of predicting the biological association between genotype and phenotype by integrating multi-omics data. Typically, gene expression data are integrated to analyze the effect of single nucleotide polymorphisms (SNPs) on phenotype. Such multi-omics data integration mainly follows two approaches: multi-staged analysis and meta-dimensional analysis, which respectively ignore intra-omics and inter-omics associations. Moreover, both approaches require omics data from a single sample set, and the large feature set of SNPs necessitates a large sample size for model establishment, but it is difficult to obtain multi-omics data from a single, large sample set.

Results: To address this problem, we propose a method of genotype-phenotype association based on multi-omics data from small samples. The workflow of this method includes clustering genes using a protein-protein interaction network and gene expression data, screening gene clusters with group lasso, obtaining SNP clusters corresponding to the selected gene clusters through expression quantitative trait locus data, integrating SNP clusters and corresponding gene clusters and phenotypes into three-layer network blocks, analyzing and predicting based on each block, and obtaining the final prediction by taking the average.

Conclusions: We compare this method to others using two datasets and find that our method shows better results in both cases. Our method can effectively solve the prediction problem for small-sample multi-omics data and provides valuable resources for further studies on the fusion of more omics data.

Mining heterogeneous gene expression data with time lagged recurrent neural networks

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02 ◽

10.1145/775047.775106 ◽

2002 ◽

Cited By ~ 2

Author(s):

Yulan Liang ◽

Arpad Kelemen

Keyword(s):

Gene Expression ◽

Neural Networks ◽

Gene Expression Data ◽

Recurrent Neural Networks ◽

Expression Data ◽

Time Lagged
