A Compositional Model to Assess Expression Changes from Single-Cell RNA-Seq Data

2019 ◽  
Author(s):  
Xiuyu Ma ◽  
Keegan Korthauer ◽  
Christina Kendziorski ◽  
Michael A. Newton

Abstract: On the problem of scoring genes for evidence of changes in the distribution of single-cell expression, we introduce an empirical Bayesian mixture approach and evaluate its operating characteristics in a range of numerical experiments. The proposed approach leverages cell-subtype structure revealed by cluster analysis to boost gene-level information on expression changes. Cell clustering informs gene-level analysis through a specially constructed prior distribution over pairs of multinomial probability vectors; this prior meshes with available model-based tools that score patterns of differential expression over multiple subtypes. We derive an explicit formula for the posterior probability that a gene has the same distribution in two cellular conditions, allowing for a gene-specific mixture over subtypes in each condition. Advantage is gained from the compositional structure of the model, which allows a host of gene-specific mixture components while constraining the mixing proportions at the whole-cell level. This structure leads to a novel form of information sharing through which the cell-clustering results support gene-level scoring of differential distribution. According to our numerical experiments, the result is improved sensitivity compared with several standard approaches for detecting distributional expression changes.
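The idea of scoring whether two multinomial probability vectors are equal can be illustrated, in a deliberately simplified form, with a conjugate Dirichlet-multinomial comparison. This is a minimal sketch, not the authors' formula: it assumes independent symmetric Dirichlet priors (parameter `alpha`) in place of the paper's specially constructed prior over pairs of probability vectors, a prior probability `p0` that the two conditions share one vector, and per-subtype counts `x` and `y` for one gene in each condition. All function names are hypothetical.

```python
from math import exp, lgamma, log

def log_dm_marginal(counts, alpha):
    """Log marginal likelihood of a count vector under a Dirichlet(alpha) prior on
    the multinomial probabilities. The multinomial coefficient is omitted because
    it cancels in the ratio computed below."""
    a0, n = sum(alpha), sum(counts)
    return (lgamma(a0) - lgamma(a0 + n)
            + sum(lgamma(a + c) - lgamma(a) for a, c in zip(alpha, counts)))

def prob_equal_distribution(x, y, alpha, p0=0.5):
    """Posterior probability that subtype counts x and y (one gene, two conditions)
    come from one shared probability vector (ED) rather than two independent
    vectors (DD), under the simplified prior assumed above."""
    pooled = [a + b for a, b in zip(x, y)]
    log_ed = log(p0) + log_dm_marginal(pooled, alpha)
    log_dd = log(1 - p0) + log_dm_marginal(x, alpha) + log_dm_marginal(y, alpha)
    # Normalise on the log scale to avoid underflow with large counts.
    m = max(log_ed, log_dd)
    return exp(log_ed - m) / (exp(log_ed - m) + exp(log_dd - m))
```

With identical counts such as `x = y = [10, 10, 10]` and `alpha = [1, 1, 1]`, the sketch favours equal distribution (posterior above 0.5), whereas counts concentrated in different subtypes drive the posterior toward zero.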




2021 ◽  
Vol 17 (6) ◽  
pp. e1009118
Author(s):  
Jing Qi ◽  
Yang Zhou ◽  
Zicen Zhao ◽  
Shuilin Jin

Single-cell RNA sequencing (scRNA-seq) technologies measure gene expression at single-cell resolution and provide a tool for exploring cell heterogeneity and cell types. Because the amount of mRNA extracted from each cell is low, scRNA-seq data exhibit a large number of dropouts, which hinders downstream analysis. We propose a statistical method, SDImpute (Single-cell RNA-seq Dropout Imputation), that performs block imputation of dropout events in scRNA-seq data. SDImpute automatically identifies dropout events based on gene expression levels and on the variation of gene expression across similar cells and similar genes, and it imputes dropouts in blocks using expression values from similar cells that are unaffected by dropouts. In experiments, the results on simulated and real datasets suggest that SDImpute is an effective tool for recovering the data and preserving the heterogeneity of gene expression across cells. Compared with state-of-the-art imputation methods, SDImpute improves the accuracy of downstream analyses, including clustering, visualization, and differential expression analysis.
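A much simpler relative of imputing dropouts from similar cells is k-nearest-neighbour imputation, sketched below. It is not SDImpute: it assumes every zero is a dropout (SDImpute instead identifies dropouts from expression levels and their variation), uses Euclidean distance between cells, and fills one entry at a time rather than in blocks. The function name and the parameter `k` are hypothetical.

```python
from math import dist

def impute_zeros_knn(matrix, k=2):
    """matrix: list of cells, each a list of (e.g. log-normalised) gene values.
    Returns a new matrix in which each zero is replaced by the mean of the
    non-zero values of the same gene among the cell's k nearest neighbours."""
    n = len(matrix)
    out = [row[:] for row in matrix]  # leave the input untouched
    for i in range(n):
        # Rank all other cells by Euclidean distance to cell i.
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: dist(matrix[i], matrix[j]))[:k]
        for g, value in enumerate(matrix[i]):
            if value == 0:
                donors = [matrix[j][g] for j in neighbours if matrix[j][g] != 0]
                if donors:  # keep the zero if no neighbour expresses the gene
                    out[i][g] = sum(donors) / len(donors)
    return out
```

Treating every zero as a dropout is the main simplification here; a method that separates true zeros from dropouts avoids inflating genes that are genuinely silent in a cell type.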



2018 ◽  
Author(s):  
Nan Papili Gao ◽  
Thomas Hartmann ◽  
Tao Fang ◽  
Rudiyanto Gunawan

Summary: We present CALISTA (Clustering and Lineage Inference in Single-Cell Transcriptional Analysis), a numerically efficient and highly scalable toolbox for end-to-end analysis of single-cell transcriptomic profiles. CALISTA includes four essential single-cell analyses for cell differentiation studies: single-cell clustering, reconstruction of cell lineage specification, transition gene identification, and pseudotemporal cell ordering. In these analyses, we employ a likelihood-based approach in which single-cell mRNA counts are described by a probability distribution associated with stochastic transcriptional bursts and random technical dropout events. We evaluated the performance of CALISTA by analyzing single-cell gene expression datasets from in silico simulations and various single-cell transcriptional profiling technologies, comprising from a few hundred to tens of thousands of cells. A comparison with existing single-cell expression analyses, including MONOCLE 2 and SCANPY, demonstrated the superiority of CALISTA in reconstructing cell lineage progression and ordering cells along cell differentiation paths. CALISTA is freely available at https://www.cabselab.com/calista.
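Likelihood-based models of bursty transcription with dropout are often built on a zero-inflated negative binomial: short bursts of geometrically distributed size give a negative binomial steady-state count distribution, and technical dropout adds extra zeros. The sketch below assumes that form, with parameters `mean`, `size` and dropout probability `pi` (0 <= pi < 1); CALISTA's actual distribution may differ, and the function names are hypothetical.

```python
from math import exp, lgamma, log

def zinb_logpmf(x, mean, size, pi):
    """Log-probability of count x under a zero-inflated negative binomial.
    mean, size: NB mean and dispersion; pi: extra (dropout) zero probability."""
    p = size / (size + mean)  # NB success probability in the (size, p) form
    log_nb = (lgamma(x + size) - lgamma(size) - lgamma(x + 1)
              + size * log(p) + x * log(1 - p))
    if x == 0:
        # A zero arises either from dropout or from the NB itself.
        return log(pi + (1 - pi) * exp(log_nb))
    return log(1 - pi) + log_nb

def log_likelihood(counts, mean, size, pi):
    """Log-likelihood of one gene's counts across cells, assuming independence."""
    return sum(zinb_logpmf(x, mean, size, pi) for x in counts)
```

Keeping the dropout term separate from the burst-driven count model lets a likelihood distinguish technical zeros from low expression, which is the role the summary assigns to the dropout component.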





2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Michael F. Z. Wang ◽  
Madhav Mantri ◽  
Shao-Pei Chou ◽  
Gaetano J. Scuderi ◽  
David W. McKellar ◽  
...  

Abstract: Conventional scRNA-seq expression analyses rely on the availability of a high-quality genome annotation. Yet, as we show here with scRNA-seq experiments and analyses spanning human, mouse, chicken, mole rat, lemur and sea urchin, genome annotations are often incomplete, particularly for organisms that are not routinely studied. To overcome this hurdle, we created a scRNA-seq analysis routine that recovers biologically relevant transcriptional activity beyond the scope of the best available genome annotation by performing scRNA-seq analysis on any region of the genome in which transcriptional products are detected. Our tool generates a single-cell expression matrix for all transcriptionally active regions (TARs), performs single-cell TAR expression analysis to identify biologically significant TARs, and then annotates TARs using gene homology analysis. This procedure uses single-cell expression analyses as a filter to direct annotation efforts to biologically significant transcripts, and thereby uncovers biology to which scRNA-seq would otherwise be blind.
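One simple way to obtain transcriptionally active regions without an annotation is to scan per-position read coverage for runs above a threshold and then count each cell's reads per region. This is an illustrative assumption, not the authors' TAR-calling procedure; `min_cov`, `max_gap` and the function names are hypothetical, positions are 0-based, and intervals are half-open.

```python
def call_active_regions(coverage, min_cov=1, max_gap=0):
    """coverage: per-position read counts along one chromosome strand.
    Returns (start, end) intervals where coverage >= min_cov, merging
    neighbouring intervals separated by at most max_gap positions."""
    regions, start = [], None
    for pos, c in enumerate(coverage):
        if c >= min_cov and start is None:
            start = pos
        elif c < min_cov and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(coverage)))
    merged = []
    for s, e in regions:
        if merged and s - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], e)  # bridge a short gap
        else:
            merged.append((s, e))
    return merged

def tar_count_matrix(reads, regions):
    """reads: iterable of (cell_barcode, position).
    Returns {cell_barcode: [read count per region]}."""
    matrix = {}
    for cell, pos in reads:
        row = matrix.setdefault(cell, [0] * len(regions))
        for i, (s, e) in enumerate(regions):
            if s <= pos < e:
                row[i] += 1
                break
    return matrix
```

The resulting cell-by-region matrix can then go through the same clustering and marker analyses as a gene matrix, which is what allows expression analysis to act as a filter before annotation.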



2021 ◽  
Vol 1738 ◽  
pp. 012078
Author(s):  
Yaxuan Cui ◽  
Kunjie Luo ◽  
Zheyu Zhang ◽  
Saijia Liu


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ryan B. Patterson-Cross ◽  
Ariel J. Levine ◽  
Vilas Menon

Abstract

Background: Generating and analysing single-cell data has become a widespread approach to examining tissue heterogeneity, and numerous algorithms exist for clustering these datasets to identify putative cell types with shared transcriptomic signatures. However, many of these clustering workflows rely on user-tuned parameter values, tailored to each dataset, to identify a set of biologically relevant clusters. Although users often develop their own intuition about the optimal range of parameters for clustering each dataset, the lack of systematic approaches to identify this range can be daunting to new users of any given workflow. In addition, an optimal parameter set does not guarantee that all clusters are equally well resolved, given the heterogeneity of transcriptomic signatures in most biological systems.

Results: Here, we illustrate a subsampling-based approach (chooseR) that simultaneously guides parameter selection and characterizes cluster robustness. Through bootstrapped iterative clustering across a range of parameters, chooseR was used to select parameter values for two distinct clustering workflows (Seurat and scVI). In each case, chooseR identified parameters that produced biologically relevant clusters from both well-characterized (human PBMC) and complex (mouse spinal cord) datasets. Moreover, it provided a simple “robustness score” for each of these clusters, facilitating the assessment of cluster quality.

Conclusion: chooseR is a simple, conceptually understandable tool that can be used flexibly across clustering algorithms, workflows, and datasets to guide clustering parameter selection and characterize cluster robustness.
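The subsampling idea can be sketched as follows: cluster many random subsets of cells, record how often each pair of cells lands in the same cluster, and summarise each reference cluster by its within-cluster co-clustering frequency. This is a simplified stand-in, not chooseR's code; the 80% subsampling fraction, the iteration count, the hypothetical `cluster_fn` callback and the median-based score are assumptions, and chooseR's own robustness score may be defined differently.

```python
import random
from statistics import median

def coclustering_frequency(n_cells, cluster_fn, n_iter=100, frac=0.8, seed=0):
    """Cluster random subsamples of cells; return an n x n matrix whose (i, j)
    entry is the fraction of subsamples containing both cells in which they
    received the same label. cluster_fn(idx) returns one label per index."""
    rng = random.Random(seed)
    together = [[0] * n_cells for _ in range(n_cells)]
    sampled = [[0] * n_cells for _ in range(n_cells)]
    for _ in range(n_iter):
        idx = rng.sample(range(n_cells), int(frac * n_cells))
        labels = cluster_fn(idx)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n_cells)] for i in range(n_cells)]

def cluster_robustness(freq, reference_labels):
    """Median within-cluster co-clustering frequency per reference cluster
    (singleton clusters get 1.0 by convention here)."""
    scores = {}
    for k in set(reference_labels):
        members = [i for i, lab in enumerate(reference_labels) if lab == k]
        pairs = [freq[i][j] for i in members for j in members if i < j]
        scores[k] = median(pairs) if pairs else 1.0
    return scores
```

In practice `cluster_fn` would wrap one parameter setting of a clustering workflow; repeating the procedure across settings gives the per-parameter comparison the abstract describes, and clusters whose members are frequently split across subsamples receive low scores.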



Cancers ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 1250
Author(s):  
Guangchun Han ◽  
Ansam Sinjab ◽  
Kieko Hara ◽  
Warapen Treekitkarnmongkol ◽  
Patrick Brennan ◽  
...  

The novel coronavirus SARS-CoV-2 is the causative agent of the COVID-19 pandemic. Severely symptomatic COVID-19 is associated with lung inflammation, pneumonia, and respiratory failure, raising concerns about an elevated risk of COVID-19-associated mortality among lung cancer patients. Angiotensin-converting enzyme 2 (ACE2) is the major receptor for SARS-CoV-2 entry into lung cells. The single-cell expression landscape of ACE2 and other SARS-CoV-2-related genes in pulmonary tissues of lung cancer patients remains unknown. We sought to delineate single-cell expression profiles of ACE2 and other SARS-CoV-2-related genes in pulmonary tissues of lung adenocarcinoma (LUAD) patients. We examined the expression levels and cellular distribution of ACE2 and the SARS-CoV-2-priming proteases TMPRSS2 and TMPRSS4 in 5 LUADs and 14 matched normal tissues by single-cell RNA-sequencing (scRNA-seq) analysis. scRNA-seq of 186,916 cells revealed epithelial-specific expression of ACE2, TMPRSS2, and TMPRSS4. Analysis of 70,030 LUAD- and normal-derived epithelial cells showed that ACE2 levels were highest in normal alveolar type 2 (AT2) cells and that TMPRSS2 was expressed in 65% of normal AT2 cells. Conversely, TMPRSS4 expression was highest and most frequently detected (75%) in lung cells with malignant features. ACE2-positive cells co-expressed genes implicated in lung pathobiology, including COPD-associated HHIP and the scavengers CD36 and DMBT1. Notably, the viral scavenger DMBT1 was significantly positively correlated with ACE2 expression in AT2 cells. We describe normal and tumor lung epithelial populations that express the SARS-CoV-2 receptor and proteases, as well as major host defense genes, and that therefore represent potential treatment targets for COVID-19, particularly among lung cancer patients.



Author(s):  
T. Ichiki ◽  
T. Ujiie ◽  
T. Hara ◽  
Y. Horiike ◽  
K. Yasuda

