Decontamination of ambient RNA in single-cell RNA-seq with DecontX

2019
Author(s): Shiyi Yang, Sean E. Corbett, Yusuke Koga, Zhe Wang, W. Evan Johnson, ...

Droplet-based microfluidic devices have become widely used to perform single-cell RNA sequencing (scRNA-seq) and discover novel cellular heterogeneity in complex biological systems. However, ambient RNA present in the cell suspension can be incorporated into these droplets and aberrantly counted along with a cell’s native mRNA. This results in cross-contamination of transcripts between different cell populations and can potentially decrease the precision of downstream analyses. We developed a novel hierarchical Bayesian method called DecontX to estimate and remove contamination in individual cells from scRNA-seq data. DecontX accurately predicted the proportion of contaminated counts in a mixture of mouse and human cells. Decontamination of PBMC datasets removed aberrant expression of cell-type-specific marker genes from other cell types and improved the overall separation of cell clusters. In general, DecontX can be incorporated into scRNA-seq workflows to assess the quality of dissociation protocols and improve downstream analyses.
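
The core idea of estimating a per-cell contamination proportion can be illustrated with a much simpler model than DecontX's hierarchical Bayesian one. The Python sketch below assumes that every count in a cell is drawn from a two-component mixture, (1 - theta) * native + theta * ambient, where the native and ambient expression profiles are already known and each sums to 1, and it estimates theta for that single cell by expectation-maximization. The function names estimate_contamination and decontaminate and the toy profiles are hypothetical; this is not DecontX's model, inference procedure, or API.

```python
def estimate_contamination(counts, native, ambient, theta=0.5, tol=1e-8, max_iter=1000):
    """Estimate the ambient fraction theta for one cell by EM.

    Simplified, hypothetical model: each count of gene g is drawn from
    (1 - theta) * native[g] + theta * ambient[g], with both profiles known.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("cell has no counts")
    for _ in range(max_iter):
        # E-step: expected number of ambient-derived counts, summed over genes
        ambient_counts = 0.0
        for x, p, q in zip(counts, native, ambient):
            mix = (1 - theta) * p + theta * q
            if x > 0 and mix > 0:  # genes the model cannot explain are skipped
                ambient_counts += x * theta * q / mix
        # M-step: the new theta is the expected ambient share of all counts
        new_theta = ambient_counts / total
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


def decontaminate(counts, native, ambient, theta):
    """Expected native (non-ambient) counts per gene under the same mixture model."""
    cleaned = []
    for x, p, q in zip(counts, native, ambient):
        mix = (1 - theta) * p + theta * q
        cleaned.append(x * (1 - theta) * p / mix if mix > 0 else 0.0)
    return cleaned
```

With toy counts generated exactly from a 25% ambient mixture (for example counts [475, 250, 175, 100], native [0.6, 0.3, 0.1, 0.0], ambient [0.1, 0.1, 0.4, 0.4]), the estimate converges to about 0.25, and the gene absent from the native profile is assigned no native counts. In real data neither profile is known in advance, so this sketch only shows the mixture reasoning, not a usable decontamination tool.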

2020
Author(s): Mohit Goyal, Guillermo Serrano, Ilan Shomorony, Mikel Hernaez, Idoia Ochoa

Single-cell RNA-seq is a powerful tool for studying the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation based on supervised learning have been proposed. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a neural-network-based framework for automated cell-type identification that directly learns a low-dimensional representation (latent code) in which cell types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profiles of unseen cells are mapped onto the previously learned latent space, avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable and more than five times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
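
The idea of cell-type-specific confidence thresholds can be sketched independently of the neural network. The hypothetical Python example below assumes a classifier has already produced a probability per cell type for each held-out validation cell, and picks, for each predicted type, the lowest confidence cutoff at which the retained validation predictions reach an assumed target precision (target_precision=0.9); cells below their type's cutoff are labeled "unassigned". The names fit_thresholds and classify and the selection rule itself are illustrative assumptions, not JIND's actual procedure.

```python
import math


def fit_thresholds(val_probs, val_labels, target_precision=0.9):
    """Per-class confidence cutoffs from held-out predictions (hypothetical rule).

    val_probs: list of dicts mapping cell type -> predicted probability.
    val_labels: list of true cell types for the same cells.
    """
    by_class = {}
    for probs, label in zip(val_probs, val_labels):
        pred = max(probs, key=probs.get)
        by_class.setdefault(pred, []).append((probs[pred], pred == label))
    thresholds = {}
    for cls, preds in by_class.items():
        thresholds[cls] = math.inf  # reject everything unless some cutoff works
        for cutoff in sorted({conf for conf, _ in preds}):
            kept = [ok for conf, ok in preds if conf >= cutoff]
            if sum(kept) / len(kept) >= target_precision:
                thresholds[cls] = cutoff  # lowest cutoff reaching the target
                break
    return thresholds


def classify(probs, thresholds):
    """Return the top cell type, or "unassigned" if below that type's cutoff."""
    pred = max(probs, key=probs.get)
    if probs[pred] >= thresholds.get(pred, math.inf):
        return pred
    return "unassigned"
```

Because each type gets its own cutoff, a type whose validation predictions are noisier ends up with a stricter threshold than a type that is easy to separate, instead of one global cutoff being applied to all.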


2021
Author(s): Lorenzo Martini, Roberta Bardini, Stefano Di Carlo

The mammalian cortex contains a great variety of neuronal cells. In particular, GABAergic interneurons, which play a major role in neuronal circuit function, exhibit an extraordinary diversity of cell types. In this regard, single-cell RNA-seq analysis is crucial for studying cellular heterogeneity. To identify and analyze rare cell types, cells must be reliably labeled through known markers, so all related studies depend on the quality of the marker genes employed. In this work, we therefore investigate how well a chosen set of inhibitory interneuron markers performs. The gene set comprises both immunohistochemistry-derived genes and genes from single-cell RNA-seq taxonomies. We employed various human and mouse brain cortex datasets, all processed with the Monocle3 pipeline. We defined metrics based on the relations between unsupervised clustering results and marker expression. Specifically, we calculated the specificity, the fraction of cells expressing each marker, and metrics derived from decision tree analysis such as entropy gain and impurity reduction. The results highlighted the strong reliability of some markers but also the low quality of others. More interestingly, a correlation emerges between the overall performance of the gene set and the experimental quality of the datasets. The proposed method therefore allows evaluating the quality of a dataset with respect to its reliability for studying the cellular heterogeneity of inhibitory interneurons.
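
The marker metrics named above can be computed from a marker's expression and the cluster labels. The Python sketch below binarizes expression (count > 0), treats "fraction expressing" as the share of target-cluster cells that express the marker and "specificity" as the share of expressing cells that belong to the target cluster, and scores a one-level decision split on marker expression by entropy gain and Gini impurity reduction. These exact definitions, the binarization rule, and the function names are assumptions for illustration; the study's own metric definitions may differ.

```python
import math


def entropy(labels):
    """Shannon entropy (bits) of a list of labels."""
    n = len(labels)
    ent = 0.0
    for v in set(labels):
        p = labels.count(v) / n
        ent -= p * math.log2(p)
    return ent


def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(v) / n) ** 2 for v in set(labels))


def marker_metrics(expr, clusters, target):
    """Score one marker gene against one target cluster (assumed definitions)."""
    in_target = [c == target for c in clusters]
    expressed = [e > 0 for e in expr]
    n_target = sum(in_target)
    n_expressed = sum(expressed)
    n_both = sum(t and e for t, e in zip(in_target, expressed))
    # Split cells on "marker expressed?" and measure how much purer the target label gets
    left = [t for t, e in zip(in_target, expressed) if e]
    right = [t for t, e in zip(in_target, expressed) if not e]
    n = len(in_target)

    def reduction(measure):
        return (measure(in_target)
                - len(left) / n * measure(left)
                - len(right) / n * measure(right))

    return {
        "fraction_expressing": n_both / n_target if n_target else 0.0,
        "specificity": n_both / n_expressed if n_expressed else 0.0,
        "entropy_gain": reduction(entropy),
        "impurity_reduction": reduction(gini),
    }
```

A marker expressed in exactly the target cluster yields an entropy gain of 1 bit on a balanced two-cluster example, while a marker expressed in every cell yields zero gain even though its fraction expressing is 1.0, which is why both kinds of metric are useful together.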


2021, Vol 22 (1)
Author(s): Hongyu Guo, Jun Li

On single-cell RNA-sequencing data, we consider the problem of assigning cells to known cell types, assuming that the identities of cell-type-specific marker genes are given but their exact expression levels are unavailable, that is, without using a reference dataset. Based on the observation that the expected over-expression of marker genes is often absent in a non-negligible proportion of cells, we develop a method called scSorter. scSorter allows marker genes to be expressed at a low level and borrows information from the expression of non-marker genes. On both simulated and real data, scSorter shows much higher power than existing methods.


PLoS ONE, 2021, Vol 16 (7), pp. e0254194
Author(s): Hong-Tae Park, Woo Bin Park, Suji Kim, Jong-Sung Lim, Gyoungju Nah, ...

Mycobacterium avium subsp. paratuberculosis (MAP) is a causative agent of Johne’s disease, a chronic and debilitating disease in ruminants. MAP is also considered a possible cause of Crohn’s disease in humans. However, few studies have examined the interactions between MAP and human macrophages to elucidate the pathogenesis of Crohn’s disease. We sought to determine the initial responses of human THP-1 cells to MAP infection using single-cell RNA-seq analysis. Clustering analysis showed that, in response to phorbol-12-myristate-13-acetate (PMA) treatment, THP-1 cells divided into seven distinct clusters. The characteristics of each cluster were investigated by identifying cluster-specific marker genes. We found that classically differentiated cells express CD14, CD36, and TLR2, and that this cell type showed the most active responses to MAP infection. These responses included the expression of proinflammatory cytokines and chemokines such as CCL4, CCL3, IL1B, IL8, and CCL20. In addition, we discovered the Mreg cell type, a novel cell type differentiated from THP-1 cells. These findings suggest that different cell types arise even when the same cell line is treated under the same conditions. Overall, analyzing gene expression patterns via scRNA-seq classification allows a more detailed observation of each cell type’s response to infection.
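
One common way to find cluster-specific marker genes is a one-versus-rest comparison. The minimal Python sketch below ranks genes for one cluster by the log2 fold change of mean expression inside versus outside that cluster, with a pseudocount of 1. The function name, the pseudocount, and the toy values with placeholder gene names are assumptions for illustration; the study's actual marker-detection method is not described in this abstract.

```python
import math


def rank_markers(expr, clusters, target, pseudocount=1.0):
    """Rank genes by log2 fold change of mean expression, target cluster vs all others.

    expr: list of dicts (one per cell) mapping gene -> normalized expression.
    clusters: cluster label for each cell, in the same order.
    """
    inside = [cell for cell, c in zip(expr, clusters) if c == target]
    outside = [cell for cell, c in zip(expr, clusters) if c != target]
    if not inside or not outside:
        raise ValueError("need cells both inside and outside the target cluster")
    genes = sorted({g for cell in expr for g in cell})

    def mean(cells, gene):
        # Genes missing from a cell's dict are treated as zero expression
        return sum(cell.get(gene, 0.0) for cell in cells) / len(cells)

    lfc = {g: math.log2((mean(inside, g) + pseudocount) / (mean(outside, g) + pseudocount))
           for g in genes}
    return sorted(lfc.items(), key=lambda item: item[1], reverse=True)
```

Fold change alone ignores within-cluster variability, so in practice such a ranking is usually paired with a statistical test before a gene is called a marker.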


2018
Author(s): Wennan Chang, Changlin Wan, Xiaoyu Lu, Szu-wei Tu, Yifan Sun, ...

We developed a novel deconvolution method, Inference of Cell Types and Deconvolution (ICTD), that addresses the fundamental issues of identifiability and robustness in current tissue data deconvolution. ICTD provides substantially new capabilities for omics-data-based characterization of a tissue microenvironment, including (1) maximizing the resolution in identifying resident cell types and subtypes that truly exist in a tissue, (2) identifying the most reliable marker genes for each cell type, which are tissue- and dataset-specific, (3) handling the stability problem posed by collinear cell types, (4) co-deconvolving with available matched multi-omics data, and (5) inferring functional variations specific to one or several cell types. ICTD is empowered by (i) rigorously derived mathematical conditions for identifiable cell types and cell-type-specific functions in tissue transcriptomics data, (ii) a semi-supervised approach that maximizes the transfer of knowledge about cell type and functional marker genes identified in single-cell or bulk cell data to the analysis of tissue data, and (iii) a novel unsupervised approach that minimizes the bias introduced by training data. Applying ICTD to real tissue data and to tissue data simulated from single-cell data showed consistently good performance for tissue data from different species, tissue microenvironments, and experimental platforms. Beyond these new capabilities, ICTD outperformed other state-of-the-art deconvolution methods in prediction accuracy, resolution of identifiable cell types, detection of unknown cell subtypes, and assessment of cell-type-specific functions. ICTD also shows promise for characterizing cell-cell interactions and for discovering cell types and prognostic markers that are predictive of clinical outcomes.


2018, Vol 29 (8), pp. 2060-2068
Author(s): Nikos Karaiskos, Mahdieh Rahmatollahi, Anastasiya Boltengagen, Haiyue Liu, Martin Hoehne, ...

Background: Three different cell types constitute the glomerular filter: mesangial cells, endothelial cells, and podocytes. However, to what extent cellular heterogeneity exists within healthy glomerular cell populations remains unknown. Methods: We used nanodroplet-based highly parallel transcriptional profiling to characterize the cellular content of purified wild-type mouse glomeruli. Results: Unsupervised clustering of nearly 13,000 single-cell transcriptomes identified the three known glomerular cell types. We provide a comprehensive online atlas of gene expression in glomerular cells that can be queried and visualized using an interactive and freely available database. Novel marker genes for all glomerular cell types were identified and supported by immunohistochemistry images obtained from the Human Protein Atlas. Subclustering of endothelial cells revealed a subset of endothelium that expressed marker genes related to endothelial proliferation. By comparison, the podocyte population appeared more homogeneous but contained three smaller, previously unknown subpopulations. Conclusions: Our study comprehensively characterized gene expression in individual glomerular cells and sets the stage for the dissection of glomerular function at the single-cell level in health and disease.


2018
Author(s): Xuran Wang, Jihwan Park, Katalin Susztak, Nancy R. Zhang, Mingyao Li

We present MuSiC, a method that uses cell-type-specific gene expression from single-cell RNA sequencing (scRNA-seq) data to characterize cell type compositions from bulk RNA-seq data in complex tissues. When applied to pancreatic islet and whole kidney expression data from humans, mice, and rats, MuSiC outperformed existing methods, especially for tissues with closely related cell types. MuSiC enables characterization of the cellular heterogeneity of complex tissues for the identification of disease mechanisms.
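
The bulk deconvolution problem can be written as bulk ≈ S · p, where each column of S is a cell type's reference expression profile (for example, averaged from scRNA-seq) and p holds non-negative cell-type weights. The Python sketch below solves plain non-negative least squares by projected gradient descent and normalizes the weights to proportions. It is not MuSiC's estimator; the function names, the step-size rule, the iteration count, and the toy matrix are assumptions chosen for illustration.

```python
def nnls_pgd(S, b, n_iter=5000):
    """Minimize ||S p - b||^2 subject to p >= 0 by projected gradient descent.

    S: genes x cell-types reference matrix (list of rows); b: bulk expression per gene.
    """
    n_genes, n_types = len(S), len(S[0])
    # 1 / ||S||_F^2 is a safe step: it never exceeds 1 / (largest eigenvalue of S^T S)
    step = 1.0 / sum(v * v for row in S for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        resid = [sum(S[g][k] * p[k] for k in range(n_types)) - b[g] for g in range(n_genes)]
        grad = [sum(S[g][k] * resid[g] for g in range(n_genes)) for k in range(n_types)]
        # Gradient step, then project back onto the non-negative orthant
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    return p


def estimate_proportions(S, b, n_iter=5000):
    """Normalize the non-negative weights so they sum to 1."""
    w = nnls_pgd(S, b, n_iter)
    total = sum(w)
    return [x / total for x in w] if total > 0 else w
```

For a toy reference S = [[10, 0], [0, 10], [5, 5]] and a bulk profile b = [3, 7, 5] built from 30% and 70% of the two types, the estimate recovers approximately [0.3, 0.7]. With closely related cell types the columns of S become nearly collinear and this plain estimator becomes unstable, which is the setting the abstract highlights.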


2021
Author(s): Risa Karakida Kawaguchi, Ziqi Tang, Stephan Fischer, Rohit Tripathy, Peter K. Koo, ...

Background: Single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) measures genome-wide chromatin accessibility for the discovery of cell-type-specific regulatory networks. Combined with single-cell RNA sequencing (scRNA-seq), scATAC-seq offers important avenues for ongoing research, such as novel cell-type-specific activation of enhancers and transcription factor binding sites, as well as chromatin changes specific to cell states. On the other hand, scATAC-seq data are known to be challenging to interpret because of their high proportion of zeros and the heterogeneity arising from different protocols. Because of the stochastic absence of marker gene activity, cell type identification from scATAC-seq remains difficult even at the cluster level. Results: In this study, we exploit reference knowledge obtained from external scATAC-seq or scRNA-seq datasets to define existing cell types and uncover the genomic regions that drive cell-type-specific gene regulation. To investigate the robustness of existing cell-typing methods, we collected 7 scATAC-seq datasets targeting the mouse brain for a meta-analytic comparison of neuronal cell-type annotation, including a reference atlas generated by the BRAIN Initiative Cell Census Network (BICCN). Comparing the areas under the receiver operating characteristic curves (AUROCs) for the three major cell types (inhibitory, excitatory, and non-neuronal cells) shows that cell-typing performance with single markers is highly variable, even for known marker genes, owing to study-specific biases. However, aggregating the signal of a large, redundant marker gene set optimized across multiple scRNA-seq datasets achieves the highest cell-typing performance among 5 existing marker gene sets, from the individual-cell to the cluster level. That gene set is also highly consistent with the cluster-specific genes of inhibitory subtypes in two well-annotated datasets, suggesting applicability to rare cell types.
Next, we comprehensively assess scATAC-seq cell typing using exhaustive combinations of the marker gene sets with supervised learning methods, including machine learning classifiers and joint clustering methods. Our results show that combinations using robust marker gene sets systematically ranked at the top, not only for model-based prediction using a large reference dataset but also for a simple summation of expression strengths across markers. To demonstrate the utility of this robust cell-typing approach, we trained a deep neural network to predict chromatin accessibility in each subtype using only DNA sequence. Through model interpretation methods, we identify key motifs enriched around the robust gene sets for each neuronal subtype. Conclusions: Through this meta-analytic evaluation of scATAC-seq cell-typing methods, we develop a novel method set that exploits the BICCN reference atlas. Our study strongly supports the value of robust marker gene selection as a feature selection tool, and of cross-dataset comparison between scATAC-seq datasets, for improving the alignment of scATAC-seq data to known biology. With these novel, high-quality epigenetic data, genomic analysis of regulatory regions can reveal sequence motifs that drive cell-type-specific regulatory programs.
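
The comparison above rests on AUROC and on aggregating signal across a marker set. The Python sketch below computes AUROC through its rank-based (Mann-Whitney) interpretation, the probability that a randomly chosen target cell outscores a randomly chosen non-target cell with ties counted as one half, and scores each cell by summing its accessibility over a marker set. The toy per-cell values and marker names are invented for illustration and the function names are hypothetical; they are not the study's data or code.

```python
def auroc(scores, is_positive):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative cell")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def marker_set_score(cell, markers):
    """Aggregate signal: sum of a cell's accessibility values over a marker set."""
    return sum(cell.get(gene, 0.0) for gene in markers)


# Toy cells (invented values): sparse accessibility of three hypothetical markers.
cells = [
    {"marker_1": 1, "marker_2": 0, "marker_3": 1},
    {"marker_1": 0, "marker_2": 1, "marker_3": 1},
    {"marker_1": 1, "marker_2": 1, "marker_3": 0},
    {"marker_1": 1, "marker_2": 0, "marker_3": 0},
    {"marker_1": 0, "marker_2": 0, "marker_3": 0},
    {"marker_1": 0, "marker_2": 0, "marker_3": 1},
]
is_target = [True, True, True, False, False, False]

single = auroc([c["marker_1"] for c in cells], is_target)
combined = auroc([marker_set_score(c, ["marker_1", "marker_2", "marker_3"]) for c in cells],
                 is_target)
```

On these toy values a single marker gives an AUROC of about 0.67, while the summed marker set reaches 1.0, illustrating how aggregation over redundant markers can compensate for sparse, stochastically missing signal in individual markers.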


2020
Author(s): Alexandre P. Marand, Zongliang Chen, Andrea Gallavotti, Robert J. Schmitz

Cis-regulatory elements (CREs) encode the genomic blueprints for coordinating spatiotemporal gene expression programs underlying highly specialized cell functions. To identify CREs underlying cell-type specification and developmental transitions, we performed single-cell Assay for Transposase-Accessible Chromatin sequencing (scATAC-seq) across an atlas of Zea mays organs. Using a novel implementation of regularized quasibinomial logistic regression, we describe 92 distinct states of chromatin accessibility across more than 165,913 putative CREs, 56,575 cells, and 52 known cell types in maize. Cell states were largely determined by the combinatorial accessibility of transcription factors (TFs) and their binding sites. A neural network revealed that cell identity could be accurately predicted (>0.94) solely from TF binding site accessibility. Co-accessible chromatin recapitulated higher-order chromatin interactions, with distinct sets of TFs coordinating cell-type-specific regulatory dynamics. Pseudotime reconstruction and alignment with Arabidopsis thaliana trajectories identified conserved TFs, associated motifs, and cis-regulatory regions specifying sequential developmental progressions. Cell-type-specific accessible chromatin regions were enriched with phenotype-associated genetic variants and signatures of selection, revealing the major cell types and putative CREs targeted by modern maize breeding. Collectively, our analysis provides a comprehensive framework for understanding cellular heterogeneity, evolution, and the cis-regulatory grammar of cell-type specification in a major crop species.


2016
Author(s): Vincent Gardeux, Fabrice David, Adrian Shajkofci, Petra C Schwalie, Bart Deplancke

Motivation: Single-cell RNA-sequencing (scRNA-seq) allows whole-transcriptome profiling of thousands of individual cells, enabling the molecular exploration of tissues at the cellular level. Such analytical capacity is of great interest to many research groups worldwide, yet these groups often lack the expertise to handle complex scRNA-seq data sets. Results: We developed a fully integrated, web-based platform for the complete analysis of scRNA-seq data after genome alignment: from parsing, filtering, and normalizing the input count data files, to visualizing the data, identifying cell clusters and differentially expressed genes (including cluster-specific marker genes), and performing functional gene set enrichment. This Automated Single-cell Analysis Pipeline (ASAP) combines a wide range of commonly used algorithms with sophisticated visualization tools. Compared with existing scRNA-seq analysis platforms, ASAP lets researchers (including those lacking computational expertise) interact with the data in a straightforward fashion and in real time. Furthermore, given the overlap between scRNA-seq and bulk RNA-seq analysis workflows, ASAP should conceptually be broadly applicable to any RNA-seq dataset. As a validation, we demonstrate how ASAP can simply reproduce the results of a single-cell study of 91 mouse cells involving five distinct cell types. Availability: The tool is freely available at http://[email protected]
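
Two of the early steps listed above, filtering and normalization of the input counts, can be sketched briefly. The Python example below drops cells with too few detected genes and then applies counts-per-million scaling followed by log(1 + x). The cutoff (min_genes=200), the choice of log-CPM, and the function names are assumptions for illustration, not necessarily the algorithms ASAP offers.

```python
import math


def filter_cells(matrix, min_genes=200):
    """Keep cells (dict: cell -> {gene: count}) with at least min_genes detected genes."""
    return {cell: counts for cell, counts in matrix.items()
            if sum(1 for v in counts.values() if v > 0) >= min_genes}


def log_cpm(counts):
    """Scale one cell's counts to counts per million, then apply log(1 + x)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cell has no counts")
    return {gene: math.log1p(v * 1e6 / total) for gene, v in counts.items()}


# Usage on a toy matrix with a low cutoff so the example stays small
matrix = {"cell_1": {"gene_a": 5, "gene_b": 5}, "cell_2": {"gene_a": 1, "gene_b": 0}}
kept = filter_cells(matrix, min_genes=2)
normalized = {cell: log_cpm(counts) for cell, counts in kept.items()}
```

Scaling by each cell's total count removes differences in sequencing depth between cells, and the log transform compresses the range so that a few highly expressed genes do not dominate later clustering.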

