Unsupervised gene selection for predicting cell spatial positions in the Drosophila embryo

F1000Research ◽  
2021 ◽  
Vol 9 ◽  
pp. 124
Author(s):  
Yang Chen ◽  
Disheng Mao ◽  
Yuping Zhang ◽  
Zhengqing Ouyang

Analyzing single-cell RNA-seq data is important for deciphering the spatial relationships, expression patterns, and developmental processes of cells. By combining such data with in situ hybridization-based gene expression atlas images, some studies have successfully recovered the spatial locations of cells in zebrafish and Drosophila embryos. In this article, we describe a highly ranked method in the DREAM Single Cell Transcriptomics Challenge for predicting cell positions in the Drosophila embryo. The method performs unsupervised feature extraction to select a small number of driver genes and then uses them to predict the gene expression and spatial position of each individual cell. First, hierarchical clustering is used to select a subset of driver genes. Second, the similarity matrix of single cells in the bins of the reference atlas is computed. Based on the similarity matrix, the spatial positions of cells are then determined by hierarchical clustering. The method is evaluated on the cell positions and gene expression data in the DREAM Single Cell Transcriptomics Challenge. Comparison with the “silver standard” suggests that our method is effective in reconstructing cell spatial positions and gene expression patterns in tissues.
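The first step described in the abstract above, selecting driver genes by hierarchical clustering, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the distance measure, the linkage rule, or how a representative gene is chosen, so the correlation distance, average linkage, and highest-variance rule used here are assumptions, and all function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance


def correlation_distance(x, y):
    """Return 1 - Pearson correlation between two expression vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 1.0  # assumption: treat constant genes as uncorrelated
    return 1.0 - cov / (sx * sy)


def cluster_genes(expr, n_clusters):
    """Average-linkage agglomerative clustering of genes.

    expr maps gene name -> list of expression values across cells.
    """
    genes = list(expr)
    dist = {
        frozenset((g, h)): correlation_distance(expr[g], expr[h])
        for g, h in combinations(genes, 2)
    }
    clusters = [[g] for g in genes]
    while len(clusters) > n_clusters:
        # Find the two clusters with the smallest average pairwise distance.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = mean(
                dist[frozenset((g, h))]
                for g in clusters[i]
                for h in clusters[j]
            )
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def select_driver_genes(expr, n_clusters):
    """Pick the highest-variance gene of each cluster as its representative."""
    return [
        max(cluster, key=lambda g: pvariance(expr[g]))
        for cluster in cluster_genes(expr, n_clusters)
    ]


# Toy data: geneB tracks geneA, geneD tracks geneC.
toy_expr = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [5, 1, 4, 2],
    "geneD": [10, 2, 8, 4],
}
print(select_driver_genes(toy_expr, n_clusters=2))
```

The pure-Python clustering loop is quadratic per merge and is meant for reading, not for real scRNA-seq matrices; a library implementation would be the practical choice at scale.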

F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 1014
Author(s):  
Maryam Zand ◽  
Jianhua Ruan

Advances in single-cell RNA sequencing technologies allow us to obtain transcriptomes at single-cell resolution. However, the original spatial context of cells, which is crucial for understanding cellular and tissue-level functions, is often lost during sequencing. To address this issue, the DREAM Single Cell Transcriptomics Challenge launched a community-wide effort to seek computational solutions for spatial mapping of single cells in tissues using single-cell RNA-seq (scRNA-seq) data and a reference atlas obtained from in situ hybridization data. As a top-performing team in this competition, we approach this problem in three steps. The first step involves identifying a set of the most informative genes based on the consistency between gene expression similarity and cell proximity. For this step, we propose two different approaches: an unsupervised approach that does not utilize the gold standard locations of the cells provided by the challenge organizers, and a supervised approach that relies on the gold standard locations. In the second step, a Particle Swarm Optimization algorithm is used to optimize the weights of different genes in order to maximize matches between the predicted locations and the gold standard locations. Finally, the information embedded in the cell topology is used to improve the predicted cell-location scores by weighted averaging of scores from neighboring locations. Evaluation results based on DREAM scores show that our method accurately predicts the location of single cells, and the predictions lead to successful recovery of the spatial expression patterns for most landmark genes. In addition, investigating the selected genes demonstrates that most predictive genes are cluster specific and stable across our supervised and unsupervised gene selection frameworks. Overall, the promising results obtained by our methods in the DREAM challenge demonstrate that topological consistency is a useful concept in identifying marker genes and constructing predictive models for spatial mapping of single cells.
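The final step in the abstract above, refining cell-location scores by weighted averaging over neighboring locations, can be illustrated with a minimal sketch. The abstract does not give the weighting scheme or the neighborhood definition, so the `self_weight` parameter, the equal weighting of neighbors, and the function name `smooth_scores` are all assumptions made for illustration.

```python
def smooth_scores(scores, neighbors, self_weight=0.5):
    """Blend each location's score with the mean score of its neighbors.

    scores:     list of scores for one cell, one entry per location
    neighbors:  dict mapping a location index to a list of neighbor indices
    self_weight: hypothetical weight kept on the location's own score
    """
    smoothed = []
    for loc, own in enumerate(scores):
        nbrs = neighbors.get(loc, [])
        if not nbrs:
            # Isolated locations keep their original score.
            smoothed.append(own)
            continue
        nbr_mean = sum(scores[n] for n in nbrs) / len(nbrs)
        smoothed.append(self_weight * own + (1.0 - self_weight) * nbr_mean)
    return smoothed


# Four locations arranged in a chain: 0 - 1 - 2 - 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(smooth_scores([0.0, 1.0, 0.0, 0.2], chain))
```

Smoothing of this kind spreads a single sharp peak over adjacent bins, which is one way topology could make predictions more robust to noise in individual location scores.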


2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Seon-Jin Yoon ◽  
Hye Young Son ◽  
Jin-Kyoung Shim ◽  
Ju Hyung Moon ◽  
Eui-Hyun Kim ◽  
...  

Abstract Background: Driver genes of glioblastoma (GBM) may be crucial for the onset of isocitrate dehydrogenase (IDH)-wildtype (WT) GBM. However, it is still unknown whether these genes are expressed in the same cluster of cells. Here, we examined the gene expression patterns of GBM tissues and patient-derived tumorspheres (TSs) and aimed to find a progression-related gene. Methods: We retrospectively collected primary IDH-WT GBM tissue samples (n = 58) and tumor-free cortical tissue samples (control, n = 20). TSs were isolated from the IDH-WT GBM tissue with B27 neurobasal medium. Associations among the driver genes were explored using bulk tissue, bulk cell, and single-cell RNA sequencing (scRNAseq) techniques, considering the alteration status of TP53, PTEN, EGFR, and the TERT promoter as well as MGMT promoter methylation. Transcriptomic perturbation by temozolomide (TMZ) was examined in two TSs. Results: We comprehensively compared the gene expression of the known driver genes as well as MGMT, PTPRZ1, and IDH1. Bulk RNAseq databases of primary GBM tissue revealed a significant association between TERT and TP53 (p < 0.001, R = 0.28), and this association increased in the recurrent tumor (p < 0.001, R = 0.86). TSs reflected the tissue-level patterns of association between the two genes (p < 0.01, R = 0.59, n = 20). scRNAseq data from a TS revealed that the TERT- and TP53-expressing cells are in the same single-cell cluster. The driver-enriched cluster predominantly expressed the glioma-associated long noncoding RNAs. Most of the driver-associated genes were downregulated after TMZ, except IGFBP5. Conclusions: GBM tissue-level expression patterns of EGFR, TERT, PTEN, IDH1, PTPRZ1, and MGMT are observed in the GBM TSs. The driver gene-associated cluster of GBM single cells was enriched with the glioma-associated long noncoding RNAs.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 124
Author(s):  
Yang Chen ◽  
Disheng Mao ◽  
Yuping Zhang ◽  
Zhengqing Ouyang

Single-cell RNA sequencing (scRNA-seq) data analysis is important for building a global transcription landscape of all cell types in tissues, tracing cell lineages, and reconstructing cell spatial organization. In this article, we propose an unsupervised learning method to predict the spatial positions and gene expression of individual cells in Drosophila embryos using a small number of driver genes. Specifically, we develop a two-stage clustering approach and compute a probability matrix of the spatial positions of single cells. The method is applied to the dataset of the DREAM Single Cell Transcriptomics Challenge. Comparison with the “gold standard” suggests that our method is effective in reconstructing cell positions and gene expression patterns in spatial tissues.


Author(s):  
Kenneth H. Hu ◽  
John P. Eichorst ◽  
Chris S. McGinnis ◽  
David M. Patterson ◽  
Eric D. Chow ◽  
...  

Abstract Spatial transcriptomics seeks to integrate single-cell transcriptomic data within the 3-dimensional space of multicellular biology. Current methods use glass substrates pre-seeded with matrices of barcodes or fluorescence hybridization of a limited number of probes. We developed an alternative approach, called ‘ZipSeq’, that uses patterned illumination and photocaged oligonucleotides to serially print barcodes (Zipcodes) onto live cells within intact tissues, in real time and with on-the-fly selection of patterns. Using ZipSeq, we mapped gene expression in three settings: in vitro wound healing, live lymph node sections, and a live tumor microenvironment (TME). In all cases, we discovered new gene expression patterns associated with histological structures. In the TME, this demonstrated a trajectory of myeloid and T cell differentiation, from periphery inward. A variation of ZipSeq efficiently scales to the level of single cells, providing a pathway for complete mapping of live tissues, subsequent to real-time imaging or perturbation.


2021 ◽  
Author(s):  
Chaohao Gu ◽  
Zhandong Liu

Abstract Spatial gene expression is a crucial determinant of cell fate and behavior. Recent advances in imaging and sequencing technology have enabled scientists to develop new tools that use spatial information to measure gene expression at close to single-cell levels. Yet, while Fluorescence In Situ Hybridization (FISH) can quantify transcript numbers at single-cell resolution, it is limited to a small number of genes. Similarly, Slide-seq was designed to measure spatial expression profiles at the single-cell level but has a relatively low gene-capture rate. And although single-cell RNA-seq enables deep cellular gene expression profiling, it loses spatial information during sample collection. These major limitations have stymied the broader application of these methods in the field. To overcome the limitations of spatial-omics technologies and better understand spatial patterns at single-cell resolution, we designed a computational algorithm that uses glmSMA to predict cell locations by integrating scRNA-seq data with a spatial-omics reference atlas. We treated cell mapping as a convex optimization problem, minimizing the differences between cellular expression profiles and location expression profiles with an L1 regularization and a graph Laplacian-based L2 regularization to ensure a sparse and smooth mapping. We validated the mapping results by reconstructing spatial expression patterns of well-known marker genes in complex tissues, such as the mouse cerebellum and hippocampus. We used the biological literature to verify that the reconstructed patterns can recapitulate cell-type and anatomical structures. Our work thus far shows that glmSMA can be used to accurately assign single cells to their original reference-atlas locations.
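To make the optimization described in the abstract above more concrete, the sketch below evaluates one plausible form of such an objective for a single cell: a squared-error fit term plus an L1 penalty on the mapping weights and a graph Laplacian quadratic penalty. The exact glmSMA formulation is not given in the abstract, so the form of the objective, the symbols `w`, `y`, `X`, `lam1`, and `lam2`, and both function names are assumptions. No solver is included; the code only scores candidate mappings.

```python
def graph_laplacian(n, edges):
    """Unnormalized Laplacian L = D - A of an undirected graph on n locations."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    return lap


def mapping_objective(w, y, X, lap, lam1, lam2):
    """Assumed objective: ||y - X w||^2 + lam1 * ||w||_1 + lam2 * w^T L w.

    w:   mapping weights of one cell over locations
    y:   the cell's expression profile (one value per gene)
    X:   genes x locations matrix of reference expression (list of rows)
    lap: graph Laplacian over locations
    """
    n_loc = len(w)
    fit = sum(
        (y[g] - sum(X[g][j] * w[j] for j in range(n_loc))) ** 2
        for g in range(len(X))
    )
    sparsity = sum(abs(v) for v in w)
    smoothness = sum(
        w[i] * lap[i][j] * w[j] for i in range(n_loc) for j in range(n_loc)
    )
    return fit + lam1 * sparsity + lam2 * smoothness


# Three locations in a chain (0 - 1 - 2) and two genes.
lap = graph_laplacian(3, [(0, 1), (1, 2)])
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
y = [1.0, 0.0]
print(mapping_objective([1.0, 0.0, 0.0], y, X, lap, lam1=0.1, lam2=0.5))
print(mapping_objective([0.5, 0.0, 0.5], y, X, lap, lam1=0.1, lam2=0.5))
```

In this toy case both candidate mappings fit the expression profile equally well, so the difference in their scores comes entirely from the Laplacian term, which penalizes differences in weight between adjacent locations.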


Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 183-183
Author(s):  
Kai Wu ◽  
Qianyi Ma ◽  
Darren King ◽  
Jun Li ◽  
Sami Malek

Introduction: Despite achievement of complete remission (CR) following chemotherapy, Acute Myelogenous Leukemia (AML) relapses in the majority of adult patients. While relapsed AML is almost always clonally related to the disease at diagnosis, the actual molecular and cellular contributors to chemotherapy resistance and to AML relapse remain incompletely understood. Some molecular determinants of relapse have been identified in genomic, epigenetic and proteomic aberrations, while cellular relapse reservoirs have been identified in leukemia stem cells as well as in more mature leukemic cell compartments. Here, we set out to determine the cellular composition, gene mutation status and gene expression of paired AML specimens procured at diagnosis and at relapse aiming at a better understanding of the AML relapse process. Methods: We employed the drop-seq 3' single cell RNA sequencing (scRNA-seq) method (Macosko 2015) with minor modifications to analyze the mRNA expression in single cells derived from 12 paired AML specimens procured at diagnosis and at relapse from prior CR. We obtained scRNA-seq data on 1000-2000 single cells per sample detecting approximately 2000-3000 unique molecular identifiers (UMIs) and 800-1500 genes per cell. Using WES or panel-based sequencing we determined mutations in known driver genes. Subsequently, we optimized novel methods for detection and mapping of mutated driver genes to individual cells using mutation specific PCR conditions and novel bioinformatics approaches. We annotated scRNA-seq expression profiles of the diagnosis and relapsed AML specimens individually using publicly available data for cell type-specific RNA markers derived from sorted normal cell populations and further compared the scRNA-seq data to scRNA-seq data of 5 pooled normal human bone marrows generated for this study. 
Results: Through analyses of scRNA-seq data of paired diagnosis and relapse AML specimens via principal component analysis (PCA) or t-distributed stochastic neighbor embedding (t-SNE), we detected varying degrees of separation of cell clusters in all cases analyzed, indicative of substantial changes in single-cell gene expression between AML diagnosis and relapse. A few of these observed cluster shifts were paralleled by gain or loss of mutated genes (e.g. FLT3-ITD) at relapse, while most others lacked obvious clonal genomic markers. Through subsequent comparison of the expression similarities of single AML cells to sorted normal human bone marrow cells, we detected two distinct AML relapse patterns: i) a pattern of relapse suggesting simple leukemia regrowth, as evidenced by similar proportions of leukemia cells mapping onto discrete normal bone marrow cells (e.g. monocyte-like or GMPs or CMPs), and ii) a pattern of relapse whereby the gene expression of relapsed cells (but not diagnosis cells) had similarity to normal hematopoietic cells that are conventionally placed more apical in the classical hematopoiesis differentiation cascade (HSCs, MPPs, CMPs; a phenotypic shift to immaturity). In addition, no leukemia sample mapped to just one classically defined bone marrow cell type but instead to multiple cell types, suggesting that most AML cells harbor aberrant hybrid cell gene expression patterns. Finally, we detected quantitative shifts in T cells and NK cells in some samples at relapse, which will be analyzed in greater detail. Conclusions: The comparative analysis of scRNA-seq data of paired AML specimens procured at diagnosis and relapse identifies frequent and previously unrecognized changes in gene expression in leukemia cells at relapse. Through a comparison of gene mutation and gene expression at single-cell resolution, we identify two distinct AML relapse patterns in adult AML. Disclosures: No relevant conflicts of interest to declare.


2021 ◽  
Author(s):  
Fang Ye ◽  
Guodong Zhang ◽  
Weigao E ◽  
Haide Chen ◽  
Chengxuan Yu ◽  
...  

Abstract The Mexican axolotl (Ambystoma mexicanum) is a promising tetrapod model for regeneration and developmental studies. Remarkably, neotenic axolotls may undergo metamorphosis, during which their regeneration capacity and lifespan gradually decline. However, a system-level single-cell analysis of molecular characteristics in neotenic and metamorphosed axolotls is still lacking. Here, we developed a single-cell RNA-seq method based on combinatorial hybridization to generate a tissue-based transcriptomic atlas of the adult axolotl. We performed gene expression profiling of over 1 million single cells across 19 tissues to construct the first adult axolotl cell atlas. Comparison of single-cell transcriptomes between the tissues of neotenic and metamorphosed axolotls revealed the heterogeneity of structural cells in different tissues and established their regulatory network. Furthermore, we described dynamic gene expression patterns during limb development in neotenic axolotls. These data serve as a resource to explore the molecular identity of the axolotl as well as its metamorphosis.


Genes ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 28
Author(s):  
Shruti Gupta ◽  
Ajay Kumar Verma ◽  
Shandar Ahmad

Single-cell transcriptomics data, when combined with in situ hybridization patterns of specific genes, can help in recovering the spatial information lost during cell isolation. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) consortium conducted a crowd-sourced competition known as the DREAM Single Cell Transcriptomics Challenge (SCTC) to predict the masked locations of single cells from sets of 60, 40, and 20 genes out of the 84 in situ gene patterns known in the Drosophila embryo. We applied a genetic algorithm (GA) to predict the most important genes that carry positional and proximity information about the single-cell origins, in combination with the base distance mapping algorithm DistMap. The resulting gene selection was found to perform well and was ranked among the top 10 in two of the three sub-challenges. However, the details of the method did not make it into the main challenge publication, due to an intricate aggregation ranking. In this work, we discuss the detailed implementation of the GA and its post-challenge parameterization, with a view to identifying potential areas where GA-based approaches to gene-set selection for topological association prediction may be improved. We believe this work provides additional insights into feature-selection strategies and their relevance to single-cell similarity prediction, and will form a strong addendum to the recently published work from the consortium.
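A genetic algorithm for choosing a fixed-size gene subset, as mentioned in the abstract above, might look roughly like the sketch below. It is a generic illustration rather than the authors' implementation: the selection, crossover, and mutation operators, all parameter defaults, and the function name `ga_select` are assumptions. The `fitness` callable is a hypothetical placeholder; in the challenge setting it would score a gene subset by the quality of the resulting location predictions (for example via DistMap), which is not implemented here.

```python
import random


def ga_select(genes, subset_size, fitness, generations=50,
              pop_size=30, mutation_rate=0.2, seed=0):
    """Search for a high-fitness gene subset of a fixed size.

    genes:    list of candidate gene names
    fitness:  callable taking a list of genes and returning a number
              (higher is better); hypothetical, supplied by the caller
    """
    rng = random.Random(seed)
    population = [rng.sample(genes, subset_size) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw the child from the union of both parents.
            pool = list(dict.fromkeys(a + b))
            child = rng.sample(pool, subset_size)
            # Mutation: swap one gene for a gene outside the child.
            if rng.random() < mutation_rate:
                outside = [g for g in genes if g not in child]
                if outside:
                    child[rng.randrange(subset_size)] = rng.choice(outside)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Toy fitness: each gene has a made-up score and a subset's fitness is the sum.
toy_genes = [f"g{i}" for i in range(10)]
toy_score = {g: i for i, g in enumerate(toy_genes)}
best = ga_select(toy_genes, 3, lambda subset: sum(toy_score[g] for g in subset))
print(sorted(best))
```

Because the surviving parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next, although the search is not guaranteed to reach the global optimum.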


2021 ◽  
Author(s):  
Chenxu Zhu ◽  
Yanxiao Zhang ◽  
Yang Eric Li ◽  
Jacinta Lucero ◽  
M. Margarita Behrens ◽  
...  

Abstract We describe here Paired-Tag, a high-throughput multi-omics method for joint profiling of histone modifications and gene expression in single cells. The assay is based on a combinatorial barcoding indexing strategy that does not require special instruments. It can be performed with nuclei extracted from cultured cells or frozen tissues, in standard molecular biology laboratories.

