ScLRTC: imputation for single-cell RNA-seq data via low-rank tensor completion

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xiutao Pan ◽  
Zhong Li ◽  
Shengwei Qin ◽  
Minzhe Yu ◽  
Hang Hu

Abstract Background: With single-cell RNA sequencing (scRNA-seq) methods, gene expression patterns can be revealed at single-cell resolution. However, owing to current technical limitations, dropout events in scRNA-seq introduce missing data and noise into the gene-cell expression matrix and adversely affect downstream analyses. Accordingly, the true gene expression level should be recovered before downstream analysis is carried out. Results: In this paper, a novel low-rank tensor completion-based method, termed scLRTC, is proposed to impute the dropout entries of a given scRNA-seq expression matrix. It first exploits the similarity of single cells to build a third-order low-rank tensor and employs tensor decomposition to denoise the data. It then reconstructs the cell expression with a low-rank tensor completion algorithm, which can restore gene-to-gene and cell-to-cell correlations. scLRTC is compared with other state-of-the-art methods on simulated datasets and on real scRNA-seq datasets of different sizes. On simulated datasets, scLRTC imputes dropouts closer to the original expression values than the other methods, as assessed by both the sum of squared error (SSE) and the Pearson correlation coefficient (PCC). On real datasets, scLRTC achieves the most accurate cell classification results regardless of the clustering method chosen (e.g., SC3 or t-SNE followed by K-means), as evaluated by the adjusted Rand index (ARI) and normalized mutual information (NMI). Lastly, scLRTC is shown to be effective in cell visualization and in inferring cell lineage trajectories. Conclusions: The novel low-rank tensor completion-based method scLRTC gave better imputation results than the state-of-the-art tools. Source code of scLRTC can be accessed at https://github.com/jianghuaijie/scLRTC.
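The two accuracy measures named in the abstract for simulated data, SSE and PCC, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names `sse` and `pcc` are hypothetical, and the sketch assumes both matrices are compared over all entries (the paper may restrict the comparison to dropout positions).

```python
import numpy as np

def sse(true, imputed):
    """Sum of squared errors between the true and imputed expression matrices."""
    true = np.asarray(true, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    return float(np.sum((true - imputed) ** 2))

def pcc(true, imputed):
    """Pearson correlation coefficient over all flattened matrix entries."""
    t = np.asarray(true, dtype=float).ravel()
    p = np.asarray(imputed, dtype=float).ravel()
    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal is the PCC.
    return float(np.corrcoef(t, p)[0, 1])
```

A lower SSE and a higher PCC both indicate imputed values closer to the original expression.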


Author(s):  
Kenneth H. Hu ◽  
John P. Eichorst ◽  
Chris S. McGinnis ◽  
David M. Patterson ◽  
Eric D. Chow ◽  
...  

Abstract Spatial transcriptomics seeks to integrate single-cell transcriptomic data within the 3-dimensional space of multicellular biology. Current methods use glass substrates pre-seeded with matrices of barcodes or fluorescence hybridization of a limited number of probes. We developed an alternative approach, called ‘ZipSeq’, that uses patterned illumination and photocaged oligonucleotides to serially print barcodes (Zipcodes) onto live cells within intact tissues, in real time and with on-the-fly selection of patterns. Using ZipSeq, we mapped gene expression in three settings: in vitro wound healing, live lymph node sections, and a live tumor microenvironment (TME). In all cases, we discovered new gene expression patterns associated with histological structures. In the TME, this demonstrated a trajectory of myeloid and T cell differentiation, from periphery inward. A variation of ZipSeq efficiently scales to the level of single cells, providing a pathway for complete mapping of live tissues, subsequent to real-time imaging or perturbation.


Yeast ◽  
2000 ◽  
Vol 1 (3) ◽  
pp. 211-217 ◽  
Author(s):  
Gerard Brady

Increasingly, mRNA expression patterns established using a variety of molecular technologies such as cDNA microarrays, SAGE and cDNA display are being used to identify potential regulatory genes and to provide valuable insights into the biological status of the starting sample. Until recently, the application of these techniques has been limited to mRNA isolated from millions or, at very best, several thousand cells, thereby restricting the study of small samples and complex tissues. To overcome this limitation, a variety of amplification approaches have been developed which are capable of broadly evaluating mRNA expression patterns in single cells. This review will describe approaches that have been employed to examine global gene expression patterns either in small numbers of cells or, wherever possible, in actual isolated single cells. The first half of the review will summarize the technical aspects of methods developed for single-cell analysis and the latter half will describe the areas of biological research that have benefited from single-cell expression analysis.


2021 ◽  
Author(s):  
Chaohao Gu ◽  
Zhandong Liu

Abstract Spatial gene-expression is a crucial determinant of cell fate and behavior. Recent imaging and sequencing-technology advancements have enabled scientists to develop new tools that use spatial information to measure gene-expression at close to single-cell levels. Yet, while Fluorescence In-situ Hybridization (FISH) can quantify transcript numbers at single-cell resolution, it is limited to a small number of genes. Similarly, slide-seq was designed to measure spatial-expression profiles at the single-cell level but has a relatively low gene-capture rate. And although single-cell RNA-seq enables deep cellular gene-expression profiling, it loses spatial information during sample-collection. These major limitations have stymied these methods’ broader application in the field. To overcome spatio-omics technology’s limitations and better understand spatial patterns at single-cell resolution, we designed a computational algorithm, glmSMA, that predicts cell locations by integrating scRNA-seq data with a spatial-omics reference atlas. We treated cell-mapping as a convex optimization problem, minimizing the differences between cellular-expression profiles and location-expression profiles with an L1 regularization and a graph-Laplacian-based L2 regularization to ensure a sparse and smooth mapping. We validated the mapping results by reconstructing spatial-expression patterns of well-known marker genes in complex tissues, like the mouse cerebellum and hippocampus. We used the biological literature to verify that the reconstructed patterns can recapitulate cell-type and anatomical structures. Our work thus far shows that we can use glmSMA to accurately assign single cells to their original reference-atlas locations.
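One plausible way to write the optimization problem described in this abstract is given below. The notation is an assumption introduced for illustration and is not taken from the paper: y is one cell's expression vector over G genes, A is the G x P reference-atlas matrix of expression at P locations, w is the vector of mapping weights over locations, L is the graph Laplacian of a spatial neighbor graph on the locations, and λ1, λ2 are regularization weights.

```latex
\min_{w \in \mathbb{R}^{P}} \;
  \tfrac{1}{2}\,\lVert y - A w \rVert_2^2
  \;+\; \lambda_1 \lVert w \rVert_1
  \;+\; \lambda_2\, w^{\top} L\, w
```

Under these assumptions, the L1 term pushes most entries of w to zero (a sparse mapping), while the Laplacian term penalizes differences in weight between neighboring locations (a smooth mapping). Since L is positive semidefinite, each term is convex, which is consistent with the abstract's description of a convex problem.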


2021 ◽  
Author(s):  
Fang Ye ◽  
Guodong Zhang ◽  
Weigao E ◽  
Haide Chen ◽  
Chengxuan Yu ◽  
...  

Abstract The Mexican axolotl (Ambystoma mexicanum) is a promising tetrapod model for regeneration and developmental studies. Remarkably, neotenic axolotls may undergo metamorphosis, during which their regeneration capacity and lifespan gradually decline. However, a system-level single-cell analysis of molecular characteristics in neotenic and metamorphosed axolotls is still lacking. Here, we developed a single-cell RNA-seq method based on combinatorial hybridization to generate a tissue-based transcriptomic atlas of the adult axolotl. We performed gene expression profiling of over 1 million single cells across 19 tissues to construct the first adult axolotl cell atlas. Comparison of single-cell transcriptomes between the tissues of neotenic and metamorphosed axolotls revealed the heterogeneity of structural cells in different tissues and established their regulatory network. Furthermore, we described dynamic gene expression patterns during limb development in neotenic axolotls. These data serve as a resource to explore the molecular identity of the axolotl as well as its metamorphosis.


2019 ◽  
Author(s):  
Yiliang Zhang ◽  
Kexuan Liang ◽  
Molei Liu ◽  
Yue Li ◽  
Hao Ge ◽  
...  

Abstract Single-cell RNA sequencing technologies have been widely used in recent years as a powerful tool for observing gene expression at the resolution of single cells. Two of the major challenges in scRNA-seq data analysis are dropout events and batch effects. The degree of zero inflation (the dropout rate) varies substantially across single cells. Evidence has shown that technical noise, including batch effects, explains a notable proportion of this cell-to-cell variation. To capture biological variation, it is necessary to quantify and remove technical variation. Here, we introduce SCRIBE (Single-Cell Recovery Imputation with Batch Effects), a principled framework that imputes dropout events and corrects batch effects simultaneously. We demonstrate, through real examples, that SCRIBE outperforms existing scRNA-seq data analysis tools in recovering cell-specific gene expression patterns, removing batch effects and retaining biological variation across cells. Our software is freely available online at https://github.com/YiliangTracyZhang/SCRIBE.


2018 ◽  
Author(s):  
George C. Linderman ◽  
Jun Zhao ◽  
Yuval Kluger

Abstract Single-cell RNA-sequencing (scRNA-seq) methods have revolutionized the study of gene expression but are plagued by dropout events, a phenomenon where genes actually expressed in a given cell are incorrectly measured as unexpressed. We present a method based on low-rank approximation which successfully replaces these dropouts (zero expression levels of unobserved expressed genes) by nonzero values, while preserving biologically non-expressed genes (true biological zeros) at zero expression levels. We validate our approach and compare it to two state-of-the-art methods. We show that it recovers true expression of marker genes while preserving biological zeros, increases separation of known cell types and improves correlation of simulated cells to their true profiles. Furthermore, our method is dramatically more scalable, allowing practitioners to quickly and easily recover expression of even the largest scRNA-seq datasets.
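The general idea of imputing with a low-rank approximation while keeping some entries at zero can be sketched as below. This is an illustrative sketch and not the authors' published algorithm: the function name `low_rank_impute`, the fixed rank `k`, and the per-gene thresholding rule (zeroing values no larger than the magnitude of that gene's most negative reconstructed value) are all assumptions made for the example.

```python
import numpy as np

def low_rank_impute(X, k):
    """Rank-k approximation of a cells x genes matrix, followed by per-gene thresholding.

    X : 2-D array, rows are cells and columns are genes.
    k : number of singular components to keep (assumed to be chosen by the user).
    """
    X = np.asarray(X, dtype=float)
    # Truncated SVD: keep the k largest singular values and vectors.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Assumed heuristic: negative reconstructed values are noise around a true zero,
    # so each gene's most negative value sets the noise magnitude for that gene.
    thresh = np.abs(np.minimum(X_k.min(axis=0), 0.0))
    # Entries at or below the gene's threshold are treated as zeros.
    X_k[X_k <= thresh] = 0.0
    return X_k
```

With this rule the output is non-negative by construction; entries that survive the threshold are the candidates for recovered dropouts.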


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Kwangbom Choi ◽  
Narayanan Raghupathy ◽  
Gary A. Churchill

Abstract Allele-specific expression (ASE) at single-cell resolution is a critical tool for understanding the stochastic and dynamic features of gene expression. However, low read coverage and high biological variability present challenges for analyzing ASE. We demonstrate that discarding multi-mapping reads leads to higher variability in estimates of allelic proportions and an increased frequency of sampling zeros, and can lead to spurious findings of dynamic and monoallelic gene expression. Here, we report a method for ASE analysis from single-cell RNA-Seq data that accurately classifies allelic expression states and improves estimation of allelic proportions by pooling information across cells. We further demonstrate that combining information across cells using a hierarchical mixture model reduces sampling variability without sacrificing cell-to-cell heterogeneity. We applied our approach to re-evaluate the statistical independence of allelic bursting and track changes in the allele-specific expression patterns of cells sampled over a developmental time course.


2018 ◽  
Author(s):  
Kent A. Riemondy ◽  
Monica Ransom ◽  
Christopher Alderman ◽  
Austin E. Gillen ◽  
Rui Fu ◽  
...  

Abstract Single-cell RNA sequencing (scRNA-seq) methods generate sparse gene expression profiles for thousands of single cells in a single experiment. The information in these profiles is sufficient to classify cell types by distinct expression patterns, but the high complexity of scRNA-seq libraries often prevents full characterization of transcriptomes from individual cells. To extract more focused gene expression information from scRNA-seq libraries, we developed a strategy to physically recover the DNA molecules comprising transcriptome subsets, enabling deeper interrogation of the isolated molecules by another round of DNA sequencing. We applied the method in cell-centric and gene-centric modes to isolate cDNA fragments from scRNA-seq libraries. First, we resampled the transcriptomes of rare, single megakaryocytes from a complex mixture of lymphocytes and analyzed them in a second round of DNA sequencing, yielding up to 20-fold greater sequencing depth per cell and increasing the number of genes detected per cell from a median of 1,313 to 2,002. We similarly isolated mRNAs from targeted T cells to improve the reconstruction of their VDJ-rearranged immune receptor mRNAs. Second, we isolated CD3D mRNA fragments expressed across cells in a scRNA-seq library prepared from a clonal T cell line, increasing the number of cells with detected CD3D expression from 59.7% to 100%. Transcriptome resampling is a general approach to recover targeted gene expression information from single-cell RNA sequencing libraries that enhances the utility of these costly experiments, and may be applicable to the targeted recovery of molecules from other single-cell assays.


2020 ◽  
Vol 3 (4) ◽  
pp. 72
Author(s):  
Anupama Prakash ◽  
Antónia Monteiro

Butterflies are well known for their beautiful wings and have been great systems to understand the ecology, evolution, genetics, and development of patterning and coloration. These color patterns are mosaics on the wing created by the tiling of individual units called scales, which develop from single cells. Traditionally, bulk RNA sequencing (RNA-seq) has been used extensively to identify the loci involved in wing color development and pattern formation. RNA-seq provides an averaged gene expression landscape of the entire wing tissue or of small dissected wing regions under consideration. However, to understand the gene expression patterns of the units of color, which are the scales, and to identify different scale cell types within a wing that produce different colors and scale structures, it is necessary to study single cells. This has recently been facilitated by the advent of single-cell sequencing. Here, we provide a detailed protocol for the dissociation of cells from Bicyclus anynana pupal wings to obtain a viable single-cell suspension for downstream single-cell sequencing. We outline our experimental design and the use of fluorescence-activated cell sorting (FACS) to obtain putative scale-building and socket cells based on size. Finally, we discuss some of the current challenges of this technique in studying single-cell scale development and suggest future avenues to address these challenges.

