Statistical significance of cluster membership for determination of cell identities in single cell genomics

2018 ◽  
Author(s):  
Neo Christopher Chung

Abstract Single cell RNA sequencing (scRNA-seq) allows us to dissect transcriptional heterogeneity arising from cellular types, spatio-temporal contexts, and environmental stimuli. Cell identities of samples derived from heterogeneous subpopulations are routinely determined by clustering of scRNA-seq data. Computational cell identities are then used in downstream analysis, feature selection, and visualization. However, how can we examine if cell identities are accurately inferred? To this end, we introduce non-parametric methods to evaluate cell identities by testing cluster memberships of single cell samples in an unsupervised manner. We propose posterior inclusion probabilities for cluster memberships to select and visualize samples relevant to subpopulations. Beyond simulation studies, we examined two scRNA-seq datasets: a mixture of Jurkat and 293T cells and a large family of peripheral blood mononuclear cells. We demonstrated probabilistic feature selection and improved t-SNE visualization. By learning uncertainty in clustering, the proposed methods enable rigorous testing of cell identities in scRNA-seq.

2020 ◽  
Vol 36 (10) ◽  
pp. 3107-3114 ◽  
Author(s):  
Neo Christopher Chung

Abstract Motivation Single-cell RNA-sequencing (scRNA-seq) allows us to dissect transcriptional heterogeneity arising from cellular types, spatio-temporal contexts and environmental stimuli. Transcriptional heterogeneity may reflect phenotypes and molecular signatures that are often unmeasured or unknown a priori. Cell identities of samples derived from heterogeneous subpopulations are then determined by clustering of scRNA-seq data. These cell identities are used in downstream analyses. How can we examine if cell identities are accurately inferred? Unlike external measurements or labels for single cells, using clustering-based cell identities results in spurious signals and false discoveries. Results We introduce non-parametric methods to evaluate cell identities by testing cluster memberships in an unsupervised manner. Diverse simulation studies demonstrate accuracy of the jackstraw test for cluster membership. We propose a posterior probability that a cell should be included in that clustering-based subpopulation. Posterior inclusion probabilities (PIPs) for cluster memberships can be used to select and visualize samples relevant to subpopulations. The proposed methods are applied to three scRNA-seq datasets. First, a mixture of Jurkat and 293T cell lines provides two distinct cellular populations. Second, Cell Hashing yields cell identities corresponding to eight donors which are independently analyzed by the jackstraw. Third, peripheral blood mononuclear cells are used to explore heterogeneous immune populations. The proposed P-values and PIPs lead to probabilistic feature selection of single cells that can be visualized using principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE) and others. By learning uncertainty in clustering high-dimensional data, the proposed methods enable unsupervised evaluation of cluster membership. Availability and implementation https://cran.r-project.org/package=jackstraw.
Supplementary information Supplementary data are available at Bioinformatics online.
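
A non-parametric test of this kind ultimately compares each cell's observed membership statistic against statistics obtained under a null. The Python sketch below illustrates only that final comparison step. It is not the jackstraw package's API (the package is written in R), and the function name `empirical_pvalues` and the add-one correction are assumptions made for illustration; how the null statistics are generated is not reproduced here.

```python
import numpy as np

def empirical_pvalues(observed, null):
    """Empirical p-values: share of null statistics at least as large as each observed one.

    `observed` and `null` are 1-D collections of test statistics where larger
    values indicate stronger evidence of cluster membership (an assumption of
    this sketch). The +1 terms keep p-values strictly above zero.
    """
    observed = np.asarray(observed, dtype=float)
    null_sorted = np.sort(np.asarray(null, dtype=float))
    # Index of the first null value that is >= the observed value.
    first_ge = np.searchsorted(null_sorted, observed, side="left")
    n_ge = null_sorted.size - first_ge
    return (n_ge + 1.0) / (null_sorted.size + 1.0)
```

Cells with small p-values are those whose membership statistic is rarely matched under the null; a posterior inclusion probability would then be derived from the full set of p-values.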


2021 ◽  
Author(s):  
Wenkai Han ◽  
Yuqi Cheng ◽  
Jiayang Chen ◽  
Huawen Zhong ◽  
Zhihang Hu ◽  
...  

Single-cell RNA-sequencing (scRNA-seq) has become a powerful tool to reveal the complex biological diversity and heterogeneity among cell populations. However, the technical noise and bias of the technology still have negative impacts on the downstream analysis. Here, we present a self-supervised Contrastive LEArning framework for scRNA-seq (CLEAR) profile representation and the downstream analysis. CLEAR overcomes the heterogeneity of the experimental data with a specifically designed representation learning task and thus can handle batch effects and dropout events. In the task, the deep learning model learns to pull together the representations of similar cells while pushing apart distinct cells, without manual labeling. It achieves superior performance on a broad range of fundamental tasks, including clustering, visualization, dropout correction, batch effect removal, and pseudo-time inference. The proposed method successfully identifies and illustrates inflammatory-related mechanisms in a COVID-19 disease study with 43,695 single cells from peripheral blood mononuclear cells. Further experiments to process a million-scale single-cell dataset demonstrate the scalability of CLEAR. This scalable method generates effective scRNA-seq data representation while eliminating technical noise, and it will serve as a general computational framework for single-cell data analysis.
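
The idea of pulling similar cells together while pushing distinct cells apart can be illustrated with a generic contrastive objective. The sketch below uses an InfoNCE-style loss in NumPy; the abstract does not say which loss CLEAR uses, so the loss choice, the function name `info_nce_loss`, and the `temperature` default are assumptions for illustration only.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss for paired embeddings.

    Row i of `positives` is treated as the similar view of row i of `anchors`;
    every other row in the batch acts as a negative (distinct cell).
    """
    a = np.asarray(anchors, dtype=float)
    p = np.asarray(positives, dtype=float)
    # Normalise so the dot product is cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    logits = a @ p.T / temperature          # shape (n, n); diagonal = positive pairs
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

The loss is low when each embedding is closest to its own positive and high when positives are mismatched, which is the behaviour a contrastive representation learner optimises for.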


2021 ◽  
Vol 17 (11) ◽  
pp. e1009548
Author(s):  
Qunlun Shen ◽  
Shihua Zhang

With the rapid accumulation of biological omics datasets, decoding the underlying relationships of cross-dataset genes becomes an important issue. Previous studies have attempted to identify differentially expressed genes across datasets. However, it is hard for them to detect interrelated ones. Moreover, existing correlation-based algorithms can only measure the relationship between genes within a single dataset or two multi-modal datasets from the same samples. It is still unclear how to quantify the strength of association of the same gene across two biological datasets with different samples. To this end, we propose Approximate Distance Correlation (ADC) to select interrelated genes with statistical significance across two different biological datasets. ADC first obtains the k most correlated genes for each target gene as its approximate observations, and then calculates the distance correlation (DC) for the target gene across two datasets. ADC repeats this process for all genes and then performs the Benjamini-Hochberg adjustment to control the false discovery rate. We demonstrate the effectiveness of ADC with simulation data and four real applications to select highly interrelated genes across two datasets. These four applications include 21 cancer RNA-seq datasets of different tissues; six single-cell RNA-seq (scRNA-seq) datasets of mouse hematopoietic cells across six different cell types along the hematopoietic cell lineage; five scRNA-seq datasets of pancreatic islet cells across five different technologies; and coupled single-cell ATAC-seq (scATAC-seq) and scRNA-seq data of peripheral blood mononuclear cells (PBMC). Extensive results demonstrate that ADC is a powerful tool to uncover interrelated genes with strong biological implications and is scalable to large-scale datasets. Moreover, the number of such genes can serve as a metric to measure the similarity between two datasets, which could characterize the relative difference of diverse cell types and technologies.
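
Two building blocks named in this abstract, distance correlation and the Benjamini-Hochberg adjustment, can be sketched in a few lines of NumPy. This is a minimal illustration of the standard sample statistics, not the authors' implementation: it assumes paired observations are already available, and it does not reproduce how ADC constructs approximate observations from the k most correlated genes. The function names are hypothetical.

```python
import numpy as np

def _double_centered_distances(x):
    """Pairwise Euclidean distance matrix with row, column and grand means removed."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation between paired observations x and y (same length)."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()
    dvar2 = (a * a).mean() * (b * b).mean()
    if dvar2 <= 0:
        return 0.0  # at least one input is constant
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out
```

Genes whose adjusted p-value falls below a chosen false discovery rate threshold would be the ones reported as interrelated.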


2021 ◽  
Author(s):  
Emily Stephenson ◽  
Gary Reynolds ◽  
Rachel A. Botting ◽  
Fernando J. Calero-Nieto ◽  
...  

Abstract Analysis of human blood immune cells provides insights into the coordinated response to viral infections such as severe acute respiratory syndrome coronavirus 2, which causes coronavirus disease 2019 (COVID-19). We performed single-cell transcriptome, surface proteome and T and B lymphocyte antigen receptor analyses of over 780,000 peripheral blood mononuclear cells from a cross-sectional cohort of 130 patients with varying severities of COVID-19. We identified expansion of nonclassical monocytes expressing complement transcripts (CD16+C1QA/B/C+) that sequester platelets and were predicted to replenish the alveolar macrophage pool in COVID-19. Early, uncommitted CD34+ hematopoietic stem/progenitor cells were primed toward megakaryopoiesis, accompanied by expanded megakaryocyte-committed progenitors and increased platelet activation. Clonally expanded CD8+ T cells and an increased ratio of CD8+ effector T cells to effector memory T cells characterized severe disease, while circulating follicular helper T cells accompanied mild disease. We observed a relative loss of IgA2 in symptomatic disease despite an overall expansion of plasmablasts and plasma cells. Our study highlights the coordinated immune response that contributes to COVID-19 pathogenesis and reveals discrete cellular components that can be targeted for therapy.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ailu Chen ◽  
Maria P. Diaz-Soto ◽  
Miguel F. Sanmamed ◽  
Taylor Adams ◽  
Jonas C. Schupp ◽  
...  

Abstract Background Asthma has been associated with impaired interferon response. Multiple cell types have been implicated in such response impairment and may be responsible for asthma immunopathology. However, existing models to study the immune response in asthma are limited by bulk profiling of cells. Our objective was to characterize a model of peripheral blood mononuclear cells (PBMCs) of patients with severe asthma (SA) and its response to the TLR3 agonist Poly I:C using two single-cell methods. Methods Two complementary single-cell methods, DropSeq for single-cell RNA sequencing (scRNA-Seq) and mass cytometry (CyTOF), were used to profile PBMCs of SA patients and healthy controls (HC). Poly I:C-stimulated and unstimulated cells were analyzed in this study. Results PBMCs (n = 9414) from five SA (n = 6099) and three HC (n = 3315) were profiled using scRNA-Seq. Six main cell subsets, namely CD4+ T cells, CD8+ T cells, natural killer (NK) cells, B cells, dendritic cells (DCs), and monocytes, were identified. CD4+ T cells were the main cell type in SA and demonstrated a pro-inflammatory profile characterized by increased JAK1 expression. Following Poly I:C stimulation, PBMCs from SA had a robust induction of interferon pathways compared with HC. CyTOF profiling of Poly I:C-stimulated and unstimulated PBMCs (n = 160,000) from the same individuals (SA = 5; HC = 3) demonstrated higher CD8+ and CD8+ effector T cells in SA at baseline, followed by a decrease of CD8+ effector T cells after Poly I:C stimulation. Conclusions Single-cell profiling of an in vitro model using PBMCs in patients with SA identified activation of pro-inflammatory pathways at baseline and strong response to Poly I:C, as well as quantitative changes in CD8+ effector cells. Thus, transcriptomic and cell quantitative changes are associated with immune cell heterogeneity in this model to evaluate interferon responses in severe asthma.


2021 ◽  
Author(s):  
Cantong Zhang ◽  
Xiaoping Hong ◽  
Haiyan Yu ◽  
Hongwei Wu ◽  
Huixuan Xu ◽  
...  

Abstract Rheumatoid arthritis (RA) is a chronic autoinflammatory disease with an elusive etiology. Assays for transposase-accessible chromatin with single-cell sequencing (scATAC-seq) contribute to the progress in epigenetic studies. However, the impact of epigenetic technology on autoimmune diseases has not been objectively analyzed. Therefore, scATAC-seq was performed to generate a high-resolution map of accessible loci in peripheral blood mononuclear cells (PBMCs) of RA patients at the single-cell level. The purpose of our project was to discover the transcription factors (TFs) involved in the pathogenesis of RA at single-cell resolution. In our research, we obtained 22 accessible chromatin patterns. We found that 10 key TFs were involved in RA pathogenesis by regulating the activity of MAP kinase. Two genes (PTPRC, SPAG9) regulated by these 10 key TFs may be associated with RA pathogenesis, and these TFs were significantly enriched in RA patients (p < 0.05, FC > 1.2). With further qPCR validation of PTPRC and SPAG9 in monocytes, we found differential expression of these two genes, which were regulated by eight TFs (ZNF384, HNF1B, DMRTA2, MEF2A, NFE2L1, CREB3L4 (var. 2), FOSL2::JUNB (var. 2), MEF2B). Moreover, the eight TFs showed highly accessible binding sites in RA patients. These findings demonstrate the value of using scATAC-seq to reveal transcriptional regulatory variation in RA-derived PBMCs, providing insights on therapy from an epigenetic perspective.


2021 ◽  
Author(s):  
Zhibin Li ◽  
Chengcheng Sun ◽  
Fei Wang ◽  
Xiran Wang ◽  
Jiacheng Zhu ◽  
...  

Background: Immune cells play important roles in mediating immune response and host defense against invading pathogens. However, insights into the molecular mechanisms governing circulating immune cell diversity among multiple species are limited. Methods: In this study, we compared the single-cell transcriptomes of 77 957 immune cells from 12 species using single-cell RNA-sequencing (scRNA-seq). Distinct molecular profiles were characterized for different immune cell types, including T cells, B cells, natural killer cells, monocytes, and dendritic cells. Results: The results revealed the heterogeneity and compositions of circulating immune cells among 12 different species. Additionally, we explored the conserved and divergent cellular cross-talks and genetic regulatory networks among vertebrate immune cells. Notably, the ligand and receptor pair VIM-CD44 was highly conserved among the immune cells. Conclusions: This study is the first to provide a comprehensive analysis of the cross-species single-cell atlas for peripheral blood mononuclear cells (PBMCs). This research should advance our understanding of the cellular taxonomy and fundamental functions of PBMCs, with important implications in evolutionary biology, developmental biology, and immune system disorders.


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Alex R Schuurman ◽  
Tom DY Reijnders ◽  
Anno Saris ◽  
Ivan Ramirez Moral ◽  
Michiel Schinkel ◽  
...  

The exact immunopathophysiology of community-acquired pneumonia (CAP) caused by SARS-CoV-2 (COVID-19) remains clouded by a general lack of relevant disease controls. The scarcity of single-cell investigations in the broader population of patients with CAP renders it difficult to distinguish immune features unique to COVID-19 from the common characteristics of a dysregulated host response to pneumonia. We performed integrated single-cell transcriptomic and proteomic analyses in peripheral blood mononuclear cells from a matched cohort of eight patients with COVID-19, eight patients with CAP caused by Influenza A or other pathogens, and four non-infectious control subjects. Using this balanced, multi-omics approach, we describe shared and diverging transcriptional and phenotypic patterns—including increased levels of type I interferon-stimulated natural killer cells in COVID-19, cytotoxic CD8 T EMRA cells in both COVID-19 and influenza, and distinctive monocyte compositions between all groups—and thereby expand our understanding of the peripheral immune response in different etiologies of pneumonia.


2018 ◽  
Vol 20 (suppl_6) ◽  
pp. vi125-vi125
Author(s):  
Sophie Dusoswa ◽  
Jan Verhoeff ◽  
Matheus Crommentuijn ◽  
Tom Würdinger ◽  
David Noske ◽  
...  
