Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM

2019 ◽  
Author(s):  
Marcus Alvarez ◽  
Elior Rahmani ◽  
Brandon Jew ◽  
Kristina M. Garske ◽  
Zong Miao ◽  
...  

Abstract Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. In contrast to single-cell RNA-seq (scRNA-seq), we observe that snRNA-seq is commonly subject to contamination by high amounts of extranuclear background RNA, which, if overlooked, can lead to the identification of spurious cell types in downstream clustering analyses. We present a novel approach to remove debris-contaminated droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distributions of debris and cell types, which are estimated using EM. We evaluated DIEM using three snRNA-seq data sets: 1) human differentiating preadipocytes in vitro, 2) fresh mouse brain tissue, and 3) human frozen adipose tissue (AT) from six individuals. All three data sets showed various degrees of extranuclear RNA contamination. We observed that existing methods failed to account for contaminated droplets, which led to spurious cell types. Compared with filtering using these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher-quality clusters. Although DIEM was designed for snRNA-seq data, we also successfully applied it to single-cell data. To conclude, our novel method DIEM removes debris-contaminated droplets from single-cell-based data quickly and effectively, leading to cleaner downstream analysis. Our code is freely available for use at https://github.com/marcalva/diem.
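The idea of separating debris from nuclei with a likelihood model fitted by EM can be sketched as below. This is a minimal illustration and not the DIEM implementation: it assumes that each droplet's counts follow a multinomial distribution over genes, it uses a hypothetical total-count threshold only to initialize the labels, and the function name and toy counts are invented for the example.

```python
import math

def em_multinomial_mixture(counts, init_labels, n_iter=30, pseudocount=0.01):
    """Fit a K-component multinomial mixture to droplet-by-gene counts with EM.

    counts: list of droplets, each a list of per-gene counts.
    init_labels: initial hard component label (0..K-1) for each droplet.
    Returns (posteriors, mixing_weights, gene_probabilities).
    """
    n, g = len(counts), len(counts[0])
    k = max(init_labels) + 1
    # Start from hard assignments encoded as 0/1 posteriors.
    post = [[1.0 if init_labels[i] == c else 0.0 for c in range(k)] for i in range(n)]
    for _ in range(n_iter):
        # M-step: mixing weights and per-component gene probabilities.
        weights = [sum(post[i][c] for i in range(n)) / n for c in range(k)]
        probs = []
        for c in range(k):
            totals = [pseudocount + sum(post[i][c] * counts[i][j] for i in range(n))
                      for j in range(g)]
            s = sum(totals)
            probs.append([t / s for t in totals])
        # E-step: posterior membership of each droplet, computed in log space.
        for i in range(n):
            logp = []
            for c in range(k):
                lw = math.log(weights[c]) if weights[c] > 0 else float("-inf")
                logp.append(lw + sum(counts[i][j] * math.log(probs[c][j]) for j in range(g)))
            m = max(logp)
            unnorm = [math.exp(x - m) for x in logp]
            z = sum(unnorm)
            post[i] = [u / z for u in unnorm]
    return post, weights, probs

# Toy data (made up): gene 0 is background-like, genes 1 and 2 are nuclear.
counts = [
    [8, 1, 1], [9, 0, 1], [7, 2, 1],        # low-count debris droplets
    [1, 20, 19], [0, 25, 15], [2, 18, 20],  # nuclei
    [30, 3, 2],                             # high-count droplet with a debris-like profile
]
# Hypothetical initialization: droplets with fewer than 15 counts start as debris (0).
init = [0 if sum(row) < 15 else 1 for row in counts]
post, weights, probs = em_multinomial_mixture(counts, init)
labels = [max(range(len(p)), key=p.__getitem__) for p in post]
```

In this toy setting a count threshold alone would keep the last droplet as a nucleus, whereas the fitted expression profiles assign it to the debris component.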

2021 ◽  
Author(s):  
Zhengyu Ouyang ◽  
Nathanael Bourgeois ◽  
Eugenia Lyashenko ◽  
Paige Cundiff ◽  
Patrick F Cullen ◽  
...  

Induced pluripotent stem cell (iPSC)-derived cell types are increasingly employed as in vitro model systems for drug discovery. For these studies to be meaningful, it is important to understand the reproducibility of the iPSC-derived cultures and their similarity to equivalent endogenous cell types. Single-cell and single-nucleus RNA sequencing (RNA-seq) are useful for gaining such understanding, but they are expensive and time-consuming, while bulk RNA-seq data can be generated more quickly and at lower cost. In silico cell type decomposition is an efficient, inexpensive, and convenient alternative that can leverage bulk RNA-seq to derive more fine-grained information about these cultures. We developed CellMap, a computational tool that derives cell type profiles from publicly available single-cell and single-nucleus datasets to infer cell types in bulk RNA-seq data from iPSC-derived cell lines.
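In silico cell type decomposition is often posed as a non-negative least-squares problem: find proportions that, applied to per-cell-type expression profiles, best reproduce the bulk profile. The abstract does not state which algorithm CellMap uses, so the sketch below is only one common formulation, solved here with multiplicative updates; the function name and the toy signature matrix are hypothetical.

```python
def deconvolve(signatures, bulk, n_iter=500):
    """Estimate cell type proportions p >= 0 that minimise ||S p - b||^2.

    signatures[g][t]: expression of gene g in cell type t (non-negative).
    bulk[g]: bulk expression of gene g (non-negative).
    Returns proportions normalised to sum to 1.
    """
    n_genes, n_types = len(signatures), len(signatures[0])
    props = [1.0 / n_types] * n_types
    # Precompute S^T b and S^T S once.
    stb = [sum(signatures[g][t] * bulk[g] for g in range(n_genes)) for t in range(n_types)]
    sts = [[sum(signatures[g][t] * signatures[g][u] for g in range(n_genes))
            for u in range(n_types)] for t in range(n_types)]
    for _ in range(n_iter):
        denom = [sum(sts[t][u] * props[u] for u in range(n_types)) for t in range(n_types)]
        # Multiplicative update keeps every proportion non-negative.
        props = [props[t] * stb[t] / denom[t] if denom[t] > 0 else 0.0
                 for t in range(n_types)]
    total = sum(props)
    return [p / total for p in props]

# Toy signatures (made up): rows are genes, columns are two cell types.
signatures = [[10.0, 1.0], [1.0, 10.0], [5.0, 5.0]]
true_props = [0.7, 0.3]
# Simulated bulk sample as an exact mixture of the two profiles.
bulk = [sum(s * p for s, p in zip(row, true_props)) for row in signatures]
estimated = deconvolve(signatures, bulk)
```

Real bulk samples are noisy and rarely an exact mixture of the reference profiles, so the recovered proportions are estimates rather than exact values.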


2021 ◽  
Author(s):  
Hanbyeol Kim ◽  
Joongho Lee ◽  
Keunsoo Kang ◽  
Seokhyun Yoon

Abstract Cell type identification is a key step in the downstream analysis of single-cell RNA-seq experiments. The indispensable information for this step is gene expression, which is used to cluster cells, train the model and set rejection thresholds. The problem is that expression levels are subject to batch effects arising from different platforms and preprocessing. We present MarkerCount, which uses the number of markers expressed, regardless of their expression level, to initially identify cell types and then reassigns cell types on a per-cluster basis. MarkerCount works in both reference-based and marker-based modes: the latter uses only existing lists of markers, while the former requires a pre-annotated dataset to train the model. Its performance was evaluated and compared with that of existing identifiers, both marker-based and reference-based, that can be customized with publicly available datasets and marker DBs. The results show that MarkerCount provides stable performance compared with other reference-based and marker-based cell type identifiers.
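The two steps described above (counting expressed markers, then reassigning labels per cluster) can be sketched as follows. This is a simplified illustration under stated assumptions, not the MarkerCount code: a marker counts as expressed when its count is above zero, clusters are taken as given, reassignment is a plain majority vote, and the marker lists and cells are made up.

```python
from collections import Counter

def count_markers(cell, marker_sets):
    """Number of markers with non-zero expression, per candidate cell type."""
    return {ctype: sum(1 for gene in genes if cell.get(gene, 0) > 0)
            for ctype, genes in marker_sets.items()}

def annotate(cells, clusters, marker_sets):
    """Return (initial per-cell labels, labels after per-cluster reassignment)."""
    initial = {}
    for cell_id, expr in cells.items():
        scores = count_markers(expr, marker_sets)
        # The expression level is ignored; only how many markers are detected matters.
        initial[cell_id] = max(scores, key=scores.get)
    majority = {}
    for cluster in set(clusters.values()):
        votes = Counter(initial[c] for c in cells if clusters[c] == cluster)
        majority[cluster] = votes.most_common(1)[0][0]
    final = {c: majority[clusters[c]] for c in cells}
    return initial, final

# Hypothetical marker lists and toy cells (gene -> count).
markers = {
    "T cell": ["CD3D", "CD3E", "CD2"],
    "B cell": ["CD79A", "MS4A1", "CD19"],
}
cells = {
    "c1": {"CD3D": 5, "CD3E": 1},
    "c2": {"CD3D": 1, "CD2": 2, "CD79A": 40},  # one very high B marker, two T markers
    "c3": {"CD79A": 1, "MS4A1": 1},            # noisy cell inside the T cluster
    "c4": {"CD79A": 3, "CD19": 2},
    "c5": {"MS4A1": 7, "CD79A": 2, "CD3D": 1},
}
clusters = {"c1": 0, "c2": 0, "c3": 0, "c4": 1, "c5": 1}
initial, final = annotate(cells, clusters, markers)
```

Because only detection is counted, the single highly expressed B-cell marker in `c2` does not outweigh its two detected T-cell markers, which is how a count-based score can be less sensitive to platform-dependent expression scales.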


2018 ◽  
Author(s):  
Changlin Wan ◽  
Wennan Chang ◽  
Yu Zhang ◽  
Fenil Shah ◽  
Xiaoyu Lu ◽  
...  

Abstract A key challenge in modeling single-cell RNA-seq (scRNA-seq) data is to capture the diverse gene expression states regulated by different transcriptional regulatory inputs across single cells, which is further complicated by a large number of observed zero and low expression values. We developed a left truncated mixture Gaussian (LTMG) model that stems from the kinetic relationships between the transcriptional regulatory inputs, the metabolism of mRNA, and gene expression abundance in a cell. LTMG infers the expression multi-modalities across single cells, representing a gene’s diverse expression states; meanwhile, dropouts and low expression values are treated as left truncated, specifically representing an expression state that is under suppression. We demonstrated that LTMG has a significantly better goodness of fit on an extensive number of single-cell data sets, compared with three other state-of-the-art models. In addition, our systems kinetic approach to handling low and zero expression values and the correctness of the identified multimodality are validated on several independent experimental data sets. Application to data from complex tissues demonstrated the capability of LTMG to extract varied expression states specific to cell types or cell functions. Based on LTMG, a differential gene expression test and a co-regulation module identification method, namely LTMG-DGE and LTMG-GCR, are further developed. We experimentally validated that LTMG-DGE has higher sensitivity and specificity in detecting differentially expressed genes, compared with five other popular methods, and that LTMG-GCR is capable of retrieving the gene co-regulation modules corresponding to perturbed transcriptional regulations. A user-friendly R package with all the analysis power is available at https://github.com/zy26/LTMGSCA.


2019 ◽  
Vol 47 (18) ◽  
pp. e111-e111 ◽  
Author(s):  
Changlin Wan ◽  
Wennan Chang ◽  
Yu Zhang ◽  
Fenil Shah ◽  
Xiaoyu Lu ◽  
...  

Abstract A key challenge in modeling single-cell RNA-seq data is to capture the diversity of gene expression states regulated by different transcriptional regulatory inputs across individual cells, which is further complicated by the large number of observed zero and low expression values. We developed a left truncated mixture Gaussian (LTMG) model from the kinetic relationships between the transcriptional regulatory inputs, mRNA metabolism and abundance in single cells. LTMG infers the expression multi-modalities across single cells; meanwhile, dropouts and low expression values are treated as left truncated. We demonstrated that LTMG has a significantly better goodness of fit on an extensive number of scRNA-seq data sets, compared with three other state-of-the-art models. Our biological assumption about the low non-zero expression values, the rationality of the multimodality setting, and the capability of LTMG to extract expression states specific to cell types or functions are validated on independent experimental data sets. A differential gene expression test and a co-regulation module identification method are further developed. We experimentally validated that our differential expression test has higher sensitivity and specificity, compared with five other popular methods. The co-regulation analysis is capable of retrieving gene co-regulation modules corresponding to perturbed transcriptional regulations. A user-friendly R package with all the analysis power is available at https://github.com/zy26/LTMGSCA.
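One way to read "treated as left truncated" is that values at or below a cutoff contribute only the probability mass of the mixture below that cutoff, while values above it contribute the usual Gaussian mixture density. The sketch below computes such a log-likelihood. It is an assumption-laden illustration, not the LTMGSCA package: the cutoff, the parameter values and the function names are hypothetical, and no parameter fitting is performed.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def left_censored_mixture_loglik(values, cutoff, weights, mus, sigmas):
    """Log-likelihood of a Gaussian mixture with values <= cutoff left-censored."""
    components = list(zip(weights, mus, sigmas))
    total = 0.0
    for x in values:
        if x <= cutoff:
            # Only the fact that the value is below the cutoff is used.
            lik = sum(w * normal_cdf(cutoff, m, s) for w, m, s in components)
        else:
            lik = sum(w * normal_pdf(x, m, s) for w, m, s in components)
        total += math.log(lik)
    return total

# Toy log-expression values for one gene (made up): five dropouts, five active cells.
values = [0.0, 0.0, 0.0, 0.0, 0.0, 2.8, 3.0, 3.2, 2.9, 3.1]
cutoff = 0.5
# Single broad component versus a suppressed state plus an active state.
ll_one = left_censored_mixture_loglik(values, cutoff, [1.0], [1.5], [1.5])
ll_two = left_censored_mixture_loglik(values, cutoff, [0.5, 0.5], [-1.0, 3.0], [1.0, 0.2])
```

With these hand-picked parameters the two-state model has the higher log-likelihood, which matches the intuition that a suppressed state below the cutoff and an active state above it describe such data better than one broad peak.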


2021 ◽  
Vol 17 (6) ◽  
pp. e1009118
Author(s):  
Jing Qi ◽  
Yang Zhou ◽  
Zicen Zhao ◽  
Shuilin Jin

Single-cell RNA sequencing (scRNA-seq) technologies measure gene expression at single-cell resolution and provide a tool for exploring cell heterogeneity and cell types. Because of the low number of mRNA copies extracted per cell, scRNA-seq data exhibit a large number of dropouts, which hinder downstream analysis. We propose a statistical method, SDImpute (Single-cell RNA-seq Dropout Imputation), to implement block imputation for dropout events in scRNA-seq data. SDImpute automatically identifies dropout events based on the gene expression levels and the variations of gene expression across similar cells and similar genes, and it implements block imputation for dropouts by using gene expression values unaffected by dropouts from similar cells. In experiments on simulated and real datasets, the results suggest that SDImpute is an effective tool for recovering the data and preserving the heterogeneity of gene expression across cells. Compared with state-of-the-art imputation methods, SDImpute improves the accuracy of downstream analyses, including clustering, visualization, and differential expression analysis.
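The general pattern of imputing a suspected dropout from similar cells can be sketched as below. This is a deliberately simplified stand-in and not the SDImpute algorithm: similar cells are the nearest neighbours by squared Euclidean distance, a zero is flagged as a dropout when at least a chosen fraction of neighbours express the gene, and the imputed value is the mean of the neighbours' non-zero values. The function name, thresholds and toy matrix are all hypothetical.

```python
def impute_dropouts(matrix, k=2, min_frac=0.5):
    """Impute zeros that look like dropouts using each cell's k nearest cells.

    matrix: list of cells, each a list of per-gene expression values.
    Returns a new matrix; the input is left unchanged.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n = len(matrix)
    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        # Nearest cells are found on the original (un-imputed) values.
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: sq_dist(row, matrix[j]))[:k]
        for g, value in enumerate(row):
            if value != 0:
                continue
            observed = [matrix[j][g] for j in neighbours if matrix[j][g] > 0]
            # Treat the zero as a dropout only if enough similar cells express the gene.
            if neighbours and len(observed) / len(neighbours) >= min_frac:
                imputed[i][g] = sum(observed) / len(observed)
    return imputed

# Toy matrix (made up): two cell groups; cell 1 has a likely dropout in gene 1.
matrix = [
    [5, 4, 0, 0],
    [6, 0, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 7, 8],
    [0, 0, 6, 9],
    [0, 0, 8, 7],
]
imputed = impute_dropouts(matrix)
```

Zeros that are shared by a cell's neighbours are left alone, so genes that are genuinely silent in a cell group are not filled in.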


2019 ◽  
Author(s):  
Ugur M. Ayturk ◽  
Joseph P. Scollan ◽  
Alexander Vesprey ◽  
Christina M. Jacobsen ◽  
Paola Divieti Pajevic ◽  
...  

Abstract Single-cell RNA-seq (scRNA-seq) is emerging as a powerful technology to examine transcriptomes of individual cells. We determined whether scRNA-seq could be used to detect the effect of environmental and pharmacologic perturbations on osteoblasts. We began with a commonly used in vitro system in which freshly isolated neonatal mouse calvarial cells are expanded and induced to produce a mineralized matrix. We used scRNA-seq to compare the relative cell type abundances and the transcriptomes of freshly isolated cells to those that had been cultured for 12 days in vitro. We observed that the percentage of macrophage-like cells increased from 6% in freshly isolated calvarial cells to 34% in cultured cells. We also found that Bglap transcripts were abundant in freshly isolated osteoblasts but nearly undetectable in the cultured calvarial cells. Thus, scRNA-seq revealed significant differences between the heterogeneity of cells in vivo and in vitro. We next performed scRNA-seq on freshly recovered long bone endocortical cells from mice that received either vehicle or Sclerostin-neutralizing antibody for 1 week. Bone anabolism-associated transcripts were also not significantly increased in immature and mature osteoblasts recovered from Sclerostin-neutralizing antibody-treated mice; this is likely a consequence of being underpowered to detect modest changes in gene expression, since only 7% of the sequenced endocortical cells were osteoblasts, and a limited portion of their transcriptomes was sampled. We conclude that scRNA-seq can detect changes in cell abundance, identity, and gene expression in skeletally derived cells. In order to detect modest changes in osteoblast gene expression at the single-cell level in the appendicular skeleton, larger numbers of osteoblasts from endocortical bone are required.


2021 ◽  
Author(s):  
Tallulah S Andrews ◽  
Jawairia Atif ◽  
Jeff C Liu ◽  
Catia T Perciani ◽  
Xue-Zhong Ma ◽  
...  

The critical functions of the human liver are coordinated through the interactions of hepatic parenchymal and non-parenchymal cells. Recent advances in single-cell transcriptional approaches have enabled an examination of the human liver with unprecedented resolution. However, dissociation-related cell perturbation can limit the ability to fully capture the human liver's parenchymal cell fraction, which limits the ability to comprehensively profile this organ. Here, we report the transcriptional landscape of 73,295 cells from the human liver using matched single-cell RNA sequencing (scRNA-seq) and single-nucleus RNA sequencing (snRNA-seq). The addition of snRNA-seq enabled the characterization of interzonal hepatocytes at single-cell resolution, revealed the presence of rare subtypes of hepatic stellate cells previously only seen in disease, and detected cholangiocyte progenitors that had only been observed during in vitro differentiation experiments. However, T and B lymphocytes and NK cells were only distinguishable using scRNA-seq, highlighting the importance of applying both technologies to obtain a complete map of tissue-resident cell types. We validated the distinct spatial distribution of the hepatocyte, cholangiocyte and stellate cell populations using an independent spatial transcriptomics dataset and immunohistochemistry. Our study provides a systematic comparison of the transcriptomes captured by scRNA-seq and snRNA-seq and delivers a high-resolution map of the parenchymal cell populations in the healthy human liver.


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Dylan Kotliar ◽  
Adrian Veres ◽  
M Aurel Nagy ◽  
Shervin Tabrizi ◽  
Eran Hodis ◽  
...  

Identifying gene expression programs underlying both cell-type identity and cellular activities (e.g. life-cycle processes, responses to environmental cues) is crucial for understanding the organization of cells and tissues. Although single-cell RNA-Seq (scRNA-Seq) can quantify transcripts in individual cells, each cell’s expression profile may be a mixture of both types of programs, making them difficult to disentangle. Here, we benchmark and enhance the use of matrix factorization to solve this problem. We show with simulations that a method we call consensus non-negative matrix factorization (cNMF) accurately infers identity and activity programs, including their relative contributions in each cell. To illustrate the insights this approach enables, we apply it to published brain organoid and visual cortex scRNA-Seq datasets; cNMF refines cell types and identifies both expected (e.g. cell cycle and hypoxia) and novel activity programs, including programs that may underlie a neurosecretory phenotype and synaptogenesis.


2020 ◽  
Author(s):  
Jixing Zhong ◽  
Gen Tang ◽  
Jiacheng Zhu ◽  
Xin Qiu ◽  
Weiying Wu ◽  
...  

Abstract Parkinson’s disease (PD) is a neurodegenerative disease leading to impaired execution of movement. PD pathogenesis has been widely investigated, but studies have been restricted either to the bulk level or to certain cell types, and so have failed to capture cellular heterogeneity and the intrinsic interplay among distinct cell types. To overcome this, we applied single-nucleus RNA-seq and single-cell ATAC-seq to cerebellum, midbrain and striatum of PD mouse and matched control. With 74,493 cells in total, we comprehensively depicted the dysfunctions under PD pathology, covering proteostasis, neuroinflammation, calcium homeostasis and extracellular neurotransmitter homeostasis. In addition, using a multi-omics approach, we identified putative biomarkers for the early stage of PD, based on the relationships between transcriptomic and epigenetic profiles. We located certain cell types that primarily contribute to early PD pathology, narrowing the gap between genotypes and phenotypes. Taken together, our study provides a valuable resource for dissecting the molecular mechanism of PD pathogenesis at the single-cell level, which could facilitate the development of novel methods for diagnosis, monitoring and practical therapies against PD at an early stage.

