SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network

2021 ◽  
Author(s):  
Jian Hu ◽  
Xiangjie Li ◽  
Kyle Coleman ◽  
Amelia Schroeder ◽  
Nan Ma ◽  
...  
2020 ◽  
Author(s):  
Jian Hu ◽  
Xiangjie Li ◽  
Kyle Coleman ◽  
Amelia Schroeder ◽  
David J. Irwin ◽  
...  

Abstract: Recent advances in spatial transcriptomics technologies have enabled comprehensive characterization of gene expression patterns in the context of tissue microenvironment. To elucidate spatial gene expression variation, we present SpaGCN, a graph convolutional network approach that integrates gene expression, spatial location and histology in spatial transcriptomics data analysis. Through graph convolution, SpaGCN aggregates gene expression of each spot from its neighboring spots, which enables the identification of spatial domains with coherent expression and histology. The subsequent domain-guided differential expression analysis then detects genes with enriched expression patterns in the identified domains. Analyzing five spatially resolved transcriptomics datasets using SpaGCN, we show it can detect genes with much more enriched spatial expression patterns than existing methods. Furthermore, genes detected by SpaGCN are transferable and can be utilized to study spatial variation of gene expression in other datasets. SpaGCN is computationally fast, making it a desirable tool for spatial transcriptomics studies.
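The aggregation step described in this abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel on Euclidean distance between spots and a hypothetical `length_scale` parameter; neither is stated in the abstract, and histology is left out entirely, so this is only an illustration of distance-weighted neighbor averaging, not SpaGCN's implementation.

```python
import numpy as np

def aggregate_expression(coords, expr, length_scale=1.0):
    """Replace each spot's expression with a distance-weighted average over all spots."""
    coords = np.asarray(coords, dtype=float)
    expr = np.asarray(expr, dtype=float)
    # Pairwise squared Euclidean distances between spots.
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    # Gaussian kernel weights (assumed form); a spot weights itself with 1.
    weights = np.exp(-sq_dist / (2.0 * length_scale ** 2))
    # Row-normalise so that each spot's weights sum to 1.
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ expr

# Toy data: two adjacent spots and one distant spot, two genes each.
coords = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
expr = np.array([[10.0, 0.0], [8.0, 2.0], [0.0, 10.0]])
smoothed = aggregate_expression(coords, expr, length_scale=1.0)
```

In this toy example the two adjacent spots pull toward each other, while the distant spot keeps essentially its own profile because its kernel weights to the other spots are close to zero.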


2020 ◽  
Author(s):  
Mingyao Li ◽  
Jian Hu ◽  
Xiangjie Li ◽  
Kyle Coleman ◽  
Amelia Schroeder ◽  
...  


2021 ◽  
Author(s):  
Ivano Legnini ◽  
Lisa Emmenegger ◽  
Ricardo Wurmus ◽  
Alessandra Zappulo ◽  
Anna Oliveras Martinez ◽  
...  

Abstract: Quantifying gene expression in space, for example by spatial transcriptomics, is essential for describing the biology of cells and their interactions in complex tissues. Perturbation experiments, at single-cell resolution and conditional on both space and time, are necessary for dissecting the molecular mechanisms of these interactions. To this end, we combined optogenetics and CRISPR technologies to activate or knock down RNA of target genes, at single-cell resolution and in programmable spatial patterns. As a proof of principle, we optogenetically induced Sonic Hedgehog (SHH) signaling at a distinct spatial location within human neural organoids. This robustly induced known SHH spatial domains of gene expression – cell-autonomously and across the entire organoid. In principle, our approach can be used to induce or knock down RNAs from any gene of interest in specific spatial locations or patterns of complex biological systems.


Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 772
Author(s):  
Seonghun Kim ◽  
Seockhun Bae ◽  
Yinhua Piao ◽  
Kyuri Jo

Genomic profiles of cancer patients such as gene expression have become a major source to predict responses to drugs in the era of personalized medicine. As large-scale drug screening data with cancer cell lines are available, a number of computational methods have been developed for drug response prediction. However, few methods incorporate both gene expression data and the biological network, which can harbor essential information about the underlying process of the drug response. We proposed an analysis framework called DrugGCN for prediction of Drug response using a Graph Convolutional Network (GCN). DrugGCN first generates a gene graph by combining a Protein-Protein Interaction (PPI) network and gene expression data with feature selection of drug-related genes, and the GCN model detects local features, such as subnetworks of genes that contribute to the drug response, by localized filtering. We demonstrated the effectiveness of DrugGCN on biological data, where it showed high prediction accuracy compared with competing methods.
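To make "localized filtering" on a gene graph concrete, here is a minimal sketch of a generic first-order graph convolution layer with symmetric normalisation. This is a common textbook propagation rule and may differ from the filter DrugGCN actually uses; the function name `gcn_layer`, the toy graph and the weights are all illustrative assumptions.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    adj = np.asarray(adj, dtype=float)
    # Add self-loops so each gene keeps its own signal.
    a_hat = adj + np.eye(adj.shape[0])
    # Symmetric degree normalisation.
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Mix each gene's features with its direct neighbours, then apply ReLU.
    return np.maximum(a_norm @ features @ weight, 0.0)

# Toy gene graph: three genes connected in a path 0 - 1 - 2 (hypothetical PPI edges).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
features = np.array([[1.0], [2.0], [3.0]])  # one expression value per gene
weight = np.array([[1.0]])                  # identity weight for readability
hidden = gcn_layer(adj, features, weight)
```

Each output row depends only on a gene and its direct neighbours in the graph, which is the sense in which the filter is local; stacking layers widens that neighbourhood.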


Nature ◽  
1989 ◽  
Vol 340 (6232) ◽  
pp. 363-367 ◽  
Author(s):  
Wolfgang Driever ◽  
Gudrun Thoma ◽  
Christiane Nüsslein-Volhard

Development ◽  
2001 ◽  
Vol 128 (24) ◽  
pp. 4979-4991 ◽  
Author(s):  
James Y. H. Li ◽  
Alexandra L. Joyner

Otx2 and Gbx2 are among the earliest genes expressed in the neuroectoderm, dividing it into anterior and posterior domains with a common border that marks the mid-hindbrain junction. Otx2 is required for development of the forebrain and midbrain, and Gbx2 for the anterior hindbrain. Furthermore, opposing interactions between Otx2 and Gbx2 play an important role in positioning the mid-hindbrain boundary, where an organizer forms that regulates midbrain and cerebellum development. We show that the expression domains of Otx2 and Gbx2 are initially established independently of each other at the early headfold stage, and then their expression rapidly becomes interdependent by the late headfold stage. As we demonstrate that the repression of Otx2 by retinoic acid is dependent on an induction of Gbx2 in the anterior brain, molecules other than retinoic acid must regulate the initial expression of Otx2 in vivo. In contrast to previous suggestions that an interaction between Otx2- and Gbx2-expressing cells may be essential for induction of mid-hindbrain organizer factors such as Fgf8, we find that Fgf8 and other essential mid-hindbrain genes are induced in a correct temporal manner in mouse embryos deficient for both Otx2 and Gbx2. However, expression of these genes is abnormally co-localized in a broad anterior region of the neuroectoderm. Finally, we find that by removing Otx2 function, development of rhombomere 3 is rescued in Gbx2–/– embryos, showing that Gbx2 plays a permissive, not instructive, role in rhombomere 3 development. Our results provide new insights into induction and maintenance of the mid-hindbrain genetic cascade by showing that a mid-hindbrain competence region is initially established independent of the division of the neuroectoderm into an anterior Otx2-positive domain and posterior Gbx2-positive domain. Furthermore, Otx2 and Gbx2 are required to suppress hindbrain and midbrain development, respectively, and thus allow establishment of the normal spatial domains of Fgf8 and other genes.


2020 ◽  
Author(s):  
Minsheng Hao ◽  
Kui Hua ◽  
Xuegong Zhang

Abstract: Recent developments of spatial transcriptomic sequencing technologies provide powerful tools for understanding cells in the physical context of tissue micro-environments. A fundamental task in spatial gene expression analysis is to identify genes with spatially variable expression patterns, or spatially variable genes (SVgenes). Several computational methods have been developed for this task, but their high computational complexity limits their scalability to the latest and future large-scale spatial expression data. We present SOMDE, an efficient method for identifying SVgenes in large-scale spatial expression data. SOMDE uses a self-organizing map (SOM) to cluster neighboring cells into nodes, and then uses a Gaussian process to fit the node-level spatial gene expression to identify SVgenes. Experiments show that SOMDE is about 5-50 times faster than existing methods with comparable results. The adjustable resolution of SOMDE makes it the only method that can give results in ~5 minutes in large datasets of more than 20,000 sequencing sites. SOMDE is available as a Python package on PyPI at https://pypi.org/project/somde.
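The idea of condensing neighboring cells into nodes before model fitting can be sketched as follows. Note the substitution: this sketch uses plain grid binning as a simple stand-in for the self-organizing map, and it omits the Gaussian process fit altogether. The function `condense_to_nodes` and its `n_bins` parameter are hypothetical and are not part of the somde package.

```python
import numpy as np

def condense_to_nodes(coords, expr, n_bins=2):
    """Group cells into grid nodes and average position and expression per node."""
    coords = np.asarray(coords, dtype=float)
    expr = np.asarray(expr, dtype=float)
    lo = coords.min(axis=0)
    span = coords.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero on a degenerate axis
    # Bin index along each axis, clipped to the range 0 .. n_bins - 1.
    idx = np.minimum(((coords - lo) / span * n_bins).astype(int), n_bins - 1)
    node_id = idx[:, 0] * n_bins + idx[:, 1]
    nodes = np.unique(node_id)
    # Node-level summaries: mean location and mean expression of member cells.
    node_coords = np.array([coords[node_id == n].mean(axis=0) for n in nodes])
    node_expr = np.array([expr[node_id == n].mean(axis=0) for n in nodes])
    return node_coords, node_expr

# Toy data: two cells near the origin, two near the opposite corner, one gene.
coords = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.0]])
expr = np.array([[1.0], [3.0], [10.0], [20.0]])
node_coords, node_expr = condense_to_nodes(coords, expr, n_bins=2)
```

Any downstream spatial model then sees one row per occupied node rather than one per cell, which is where the speed-up from a coarser resolution would come from; raising `n_bins` trades speed for spatial detail.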


2019 ◽  
Author(s):  
Wei Wang ◽  
Gang Ren ◽  
Ni Hong ◽  
Wenfei Jin

Abstract: Background: CCCTC-Binding Factor (CTCF), also known as 11-zinc finger protein, participates in many cellular processes, including insulator activity, transcriptional regulation and organization of chromatin architecture. Based on single cell flow cytometry and single cell RNA-FISH analyses, our previous study showed that deletion of a CTCF binding site led to a significant increase in cellular variation of its target gene. However, the effect of CTCF on the genome-wide landscape of cell-to-cell variation is unclear. Results: We knocked down CTCF in EL4 cells using shRNA, and conducted single cell RNA-seq on both wild type (WT) cells and CTCF-Knockdown (CTCF-KD) cells using the Fluidigm C1 system. Principal component analysis of single cell RNA-seq data showed that WT and CTCF-KD cells concentrated in two different clusters on PC1, indicating that gene expression profiles of WT and CTCF-KD cells were systematically different. Interestingly, GO terms including regulation of transcription, DNA binding, Zinc finger and transcription factor binding were significantly enriched in CTCF-KD-specific highly variable genes, indicating that tissue-specific genes such as transcription factors were highly sensitive to CTCF level. The dysregulation of transcription factors potentially explains why knockdown of CTCF leads to systematic changes in gene expression. In contrast, housekeeping genes, such as those involved in rRNA processing, DNA repair and tRNA processing, were significantly enriched in WT-specific highly variable genes, potentially due to a higher cellular variation of cell activity in WT cells compared to CTCF-KD cells. We further found that cellular variation-increased genes were significantly enriched in down-regulated genes, indicating that CTCF knockdown simultaneously reduced the expression levels and increased the expression noise of its regulated genes. Conclusions: To our knowledge, this is the first attempt to explore the genome-wide landscape of cellular variation after CTCF knockdown. Our study not only advances our understanding of CTCF function in maintaining gene expression and reducing expression noise, but also provides a framework for examining gene function.


Author(s):  
Justin Lakkis ◽  
David Wang ◽  
Yuanchao Zhang ◽  
Gang Hu ◽  
Kui Wang ◽  
...  

Abstract: Recent development of single-cell RNA-seq (scRNA-seq) technologies has led to enormous biological discoveries. As the scale of scRNA-seq studies increases, a major challenge in analysis is batch effect, which is inevitable in studies involving human tissues. Most existing methods remove batch effect in a low-dimensional embedding space. Although useful for clustering, batch effect is still present in the gene expression space, leaving downstream gene-level analysis susceptible to batch effect. Recent studies have shown that batch effect correction in the gene expression space is much harder than in the embedding space. Popular methods such as Seurat3.0 rely on the mutual nearest neighbor (MNN) approach to remove batch effect in the gene expression space, but MNN can only analyze two batches at a time and it becomes computationally infeasible when the number of batches is large. Here we present CarDEC, a joint deep learning model that simultaneously clusters and denoises scRNA-seq data, while correcting batch effect both in the embedding and the gene expression space. Comprehensive evaluations spanning different species and tissues showed that CarDEC consistently outperforms scVI, DCA, and MNN. With CarDEC denoising, those non-highly variable genes offer as much signal for clustering as the highly variable genes, suggesting that CarDEC substantially boosted information content in scRNA-seq. We also showed that trajectory analysis using CarDEC’s denoised and batch corrected expression as input revealed marker genes and transcription factors that are otherwise obscured in the presence of batch effect. CarDEC is computationally fast, making it a desirable tool for large-scale scRNA-seq studies.


2021 ◽  
Author(s):  
Kangning Dong ◽  
Shihua Zhang

Recent advances in spatially resolved transcriptomics have enabled comprehensive measurements of gene expression patterns while retaining the spatial context of the tissue microenvironment. Deciphering the spatial context of spots in a tissue requires careful use of their spatial information. To this end, we developed a graph attention auto-encoder framework, STGATE, to accurately identify spatial domains by learning low-dimensional latent embeddings via integrating spatial information and gene expression profiles. To better characterize the spatial similarity at the boundary of spatial domains, STGATE adopts an attention mechanism to adaptively learn the similarity of neighboring spots, and an optional cell type-aware module that integrates a pre-clustering of gene expression. We validated STGATE on diverse spatial transcriptomics datasets generated by different platforms with different spatial resolutions. STGATE could substantially improve the identification accuracy of spatial domains, and denoise the data while preserving spatial expression patterns. Importantly, STGATE could be extended to multiple consecutive sections for reducing batch effects between sections and extracting 3D expression domains from the reconstructed 3D tissue effectively.
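The phrase "adaptively learn the similarity of neighboring spots" can be illustrated with a minimal attention sketch. It assumes dot-product scores followed by a softmax restricted to each spot's neighbors; the abstract does not specify the scoring function, so this should be read as a generic masked-attention example rather than STGATE's layer. The name `attention_aggregate` is hypothetical.

```python
import numpy as np

def attention_aggregate(embeddings, adjacency):
    """Softmax attention over each spot's neighbours (and itself), then weighted sum."""
    emb = np.asarray(embeddings, dtype=float)
    # Only neighbours and the spot itself may receive attention.
    mask = np.asarray(adjacency, dtype=bool) | np.eye(len(emb), dtype=bool)
    # Dot-product similarity as the attention score (assumed scoring function).
    scores = np.where(mask, emb @ emb.T, -np.inf)
    # Numerically stable softmax per row; masked entries become exactly 0.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights, weights @ emb

# Toy data: spots 0 and 1 share a profile; spot 2 differs and neighbours spot 1 only.
embeddings = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])
weights, updated = attention_aggregate(embeddings, adjacency)
```

Here spot 2 sits at a domain boundary: it attends more to itself than to its dissimilar neighbor, so its updated embedding is blurred less than a plain neighbor average would blur it.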

