Versatile workflow for cell type resolved transcriptional and epigenetic profiles from cryopreserved human lung

Abstract The complexity of the lung microenvironment together with changes in cellular composition during disease progression make it exceptionally hard to understand the molecular mechanisms leading to the development of chronic lung diseases. Although recent advances in cell type resolved and single-cell sequencing approaches hold great promise for studying complex diseases, their implementation greatly relies on local access to fresh tissue, as traditional methods to process and store tissue do not allow viable cell isolation. To overcome these hurdles, we developed a novel, versatile workflow that allows long-term storage of human lung tissue with high cell viability, permits thorough sample quality check before cell isolation, and is compatible with next generation sequencing-based profiling, including single-cell approaches. We demonstrate that cryopreservation is suitable for isolation of multiple cell types from different lung locations and is applicable to both healthy and diseased tissue, including COPD and tumor samples. Basal cells isolated from cryopreserved airways retain the ability to differentiate, indicating that cellular identity is not altered by cryopreservation. Importantly, using RNA sequencing (RNA-seq) and Illumina EPIC Array, we show that genome-wide gene expression and DNA methylation signatures are preserved upon cryopreservation, emphasizing the suitability of our workflow for -omics profiling of human lung cells. In addition, we obtained high-quality single-cell RNA sequencing data of cells isolated from cryopreserved human lung, demonstrating that cryopreservation empowers single-cell approaches. Overall, thanks to its simplicity, our cryopreservation workflow is well-suited for prospective tissue collection by academic collaborators and biobanks, opening worldwide access to human tissue.


Critical downstream analysis steps for single-cell RNA sequencing data

Briefings in Bioinformatics ◽

10.1093/bib/bbab105 ◽

2021 ◽

Author(s):

Zilong Zhang ◽

Feifei Cui ◽

Chen Lin ◽

Lingling Zhao ◽

Chunyu Wang ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Noisy Data ◽

Single Cell Level ◽

Cell Type ◽

Sequencing Data ◽

Cell Level ◽

Bioinformatics Tool ◽

Single Cell RNA Sequencing ◽

Downstream Analysis

Abstract Single-cell RNA sequencing (scRNA-seq) has enabled us to study biological questions at the single-cell level. Currently, many analysis tools are available to better utilize these relatively noisy data. In this review, we summarize the most widely used methods for critical downstream analysis steps (i.e. clustering, trajectory inference, cell-type annotation and integrating datasets). The advantages and limitations are comprehensively discussed, and we provide suggestions for choosing proper methods in different situations. We hope this paper will be useful for scRNA-seq data analysts and bioinformatics tool developers.


scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data

BMC Bioinformatics ◽

10.1186/s12859-021-04028-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Bobby Ranjan ◽

Florian Schmidt ◽

Wenjie Sun ◽

Jinyu Park ◽

Mohammad Amin Honardoost ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Differentially Expressed Genes ◽

Cell Types ◽

Unsupervised Clustering ◽

Differentially Expressed ◽

Consensus Clustering ◽

Cell Type ◽

Sequencing Data ◽

Single Cell RNA Sequencing

Abstract Background: Clustering is a crucial step in the analysis of single-cell data. Clusters identified in an unsupervised manner are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering approaches have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. Results: We present scConsensus, an R framework for generating a consensus clustering by (1) integrating results from both unsupervised and supervised approaches and (2) refining the consensus clusters using differentially expressed genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. Conclusions: scConsensus combines the merits of unsupervised and supervised approaches to partition cells with better cluster separation and homogeneity, thereby increasing our confidence in detecting distinct cell types. scConsensus is implemented in R and is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.
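The idea of integrating an unsupervised and a supervised labeling can be illustrated with a small sketch. This is not the scConsensus implementation (which is an R package); it is a generic Python illustration under assumed rules: each cell is placed in the intersection of its two labels, and intersections smaller than a hypothetical `min_cells` threshold are folded into the largest intersection of the same unsupervised cluster. The function name, the threshold, and the toy labels are all assumptions, and the refinement step based on differentially expressed genes is not shown.

```python
from collections import Counter


def consensus_labels(unsupervised, supervised, min_cells=2):
    """Combine two per-cell labelings into consensus cluster labels.

    Each cell starts in the intersection (unsupervised, supervised).
    Intersections with fewer than min_cells cells are folded into the
    largest intersection that shares the same unsupervised cluster.
    """
    if len(unsupervised) != len(supervised):
        raise ValueError("both labelings must cover the same cells")

    pairs = list(zip(unsupervised, supervised))
    sizes = Counter(pairs)

    # For every unsupervised cluster, find its most frequent supervised label.
    dominant = {}
    for (u, s), n in sizes.items():
        if u not in dominant or n > sizes[(u, dominant[u])]:
            dominant[u] = s

    result = []
    for u, s in pairs:
        if sizes[(u, s)] < min_cells:
            s = dominant[u]  # too small to stand alone: merge it
        result.append(f"{u}|{s}")
    return result


# Toy input: seven cells, two unsupervised clusters, three reference labels.
unsup = ["c1", "c1", "c1", "c1", "c2", "c2", "c2"]
sup = ["T", "T", "B", "T", "NK", "NK", "NK"]
labels = consensus_labels(unsup, sup)
```

With the toy input, the single `B` cell inside `c1` is too small an intersection to form its own cluster and is merged into `c1|T`; setting `min_cells=1` would keep it separate as `c1|B`.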


FR-Match: Robust matching of cell type clusters from single cell RNA sequencing data using the Friedman-Rafsky non-parametric test

10.1101/2020.05.01.073445 ◽

2020 ◽

Author(s):

Yun Zhang ◽

Brian D. Aevermann ◽

Trygve E. Bakken ◽

Jeremy A. Miller ◽

Rebecca D. Hodge ◽

...

Keyword(s):

Human Brain ◽

Single Cell ◽

RNA Sequencing ◽

Cell Types ◽

R Package ◽

Brain Regions ◽

Cortical Layer ◽

Middle Temporal Gyrus ◽

Cell Type ◽

Sequencing Data

Abstract Single cell/nucleus RNA sequencing (scRNAseq) is emerging as an essential tool to unravel the phenotypic heterogeneity of cells in complex biological systems. While computational methods for scRNAseq cell type clustering have advanced, the ability to integrate datasets to identify common and novel cell types across experiments remains a challenge. Here, we introduce a cluster-to-cluster cell type matching method – FR-Match – that utilizes supervised feature selection for dimensionality reduction and incorporates shared information among cells to determine whether two cell type clusters share the same underlying multivariate gene expression distribution. FR-Match is benchmarked with existing cell-to-cell and cell-to-cluster cell type matching methods using both simulated and real scRNAseq data. FR-Match proved to be a stringent method that produced fewer erroneous matches of distinct cell subtypes and had the unique ability to identify novel cell phenotypes in new datasets. In silico validation demonstrated that the proposed workflow is the only self-contained algorithm that was robust to increasing numbers of true negatives (i.e. non-represented cell types). FR-Match was applied to two human brain scRNAseq datasets sampled from cortical layer 1 and full thickness middle temporal gyrus. When mapping cell types identified in specimens isolated from these overlapping human brain regions, FR-Match precisely recapitulated the laminar characteristics of matched cell type clusters, reflecting their distinct neuroanatomical distributions. An R package and Shiny application are provided at https://github.com/JCVenterInstitute/FRmatch for users to interactively explore and match scRNAseq cell type clusters with complementary visualization tools.
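The Friedman-Rafsky test named in the title rests on a simple graph statistic, sketched below in Python. This is not the FRmatch R package; it shows only the core count under stated assumptions: the two clusters are pooled, a minimum spanning tree is built on Euclidean distances, and the edges joining cells from different clusters are counted. Under the null hypothesis that both clusters come from the same distribution, the expected number of such edges is 2mn/(m+n) for cluster sizes m and n, so a count far below that suggests different distributions. The function names and toy coordinates are assumptions, and the variance term needed for a full standardized test is omitted.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_from = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to the tree.
        i = min((k for k in range(n) if not in_tree[k]),
                key=lambda k: best_dist[k])
        in_tree[i] = True
        if best_from[i] >= 0:
            edges.append((best_from[i], i))
        # Update the distance from the tree to every remaining point.
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[i], points[j])
                if d < best_dist[j]:
                    best_dist[j] = d
                    best_from[j] = i
    return edges


def fr_cross_edges(sample_a, sample_b):
    """Count MST edges joining the two samples, plus the null expectation."""
    pooled = list(sample_a) + list(sample_b)
    m, n = len(sample_a), len(sample_b)
    is_a = [True] * m + [False] * n
    cross = sum(1 for i, j in mst_edges(pooled) if is_a[i] != is_a[j])
    expected = 2 * m * n / (m + n)
    return cross, expected


# Two well-separated toy clusters in a 2-D feature space.
cluster_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
cluster_b = [(10, 10), (10, 11), (11, 10), (11, 11)]
separated = fr_cross_edges(cluster_a, cluster_b)

# Two interleaved toy clusters on a line.
mixed = fr_cross_edges([(0,), (2,), (4,)], [(1,), (3,), (5,)])
```

For the separated clusters only one tree edge bridges the two groups against an expectation of 4, whereas for the interleaved clusters all five edges are cross edges against an expectation of 3.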


Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data

F1000Research ◽

10.12688/f1000research.18490.2 ◽

2019 ◽

Vol 8 ◽

pp. 296 ◽

Cited By ~ 1

Author(s):

J. Javier Diaz-Mejia ◽

Elaine C. Meng ◽

Alexander R. Pico ◽

Sonya A. MacParland ◽

Troy Ketela ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Human Peripheral Blood ◽

Characteristic Curve ◽

Cell Type ◽

Sequencing Data ◽

Cell Clusters ◽

Reference Cell ◽

Mouse Tissues ◽

Single Cell RNA Sequencing

Background: Identification of cell type subpopulations from complex cell mixtures using single-cell RNA-sequencing (scRNA-seq) data includes automated steps from normalization to cell clustering. However, assigning cell type labels to cell clusters is often conducted manually, resulting in limited documentation, low reproducibility and uncontrolled vocabularies. This is partially due to the scarcity of reference cell type signatures and because some methods support limited cell type signatures. Methods: In this study, we benchmarked five methods representing first-generation enrichment analysis (ORA), second-generation approaches (GSEA and GSVA), machine learning tools (CIBERSORT) and network-based neighbor voting (METANEIGHBOR), for the task of assigning cell type labels to cell clusters from scRNA-seq data. We used five scRNA-seq datasets: human liver, 11 Tabula Muris mouse tissues, two human peripheral blood mononuclear cell datasets, and mouse retinal neurons, for which reference cell type signatures were available. The datasets span Drop-seq, 10X Chromium and Seq-Well technologies and range in size from ~3,700 to ~68,000 cells. Results: Our results show that, in general, all five methods perform well in the task as evaluated by receiver operating characteristic curve analysis (average area under the curve (AUC) = 0.91, sd = 0.06), whereas precision-recall analyses show a wide variation depending on the method and dataset (average AUC = 0.53, sd = 0.24). We observed an influence of the number of genes in cell type signatures on performance, with smaller signatures leading more frequently to incorrect results. Conclusions: GSVA was the overall top performer and was more robust in cell type signature subsampling simulations, although different methods performed well using different datasets. METANEIGHBOR and GSVA were the fastest methods. CIBERSORT and METANEIGHBOR were more influenced than the other methods by analyses including only expected cell types. We provide an extensible framework that can be used to evaluate other methods and datasets at https://github.com/jdime/scRNAseq_cell_cluster_labeling.
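Of the five benchmarked approaches, over-representation analysis (ORA) is the simplest to write down. The Python sketch below is not taken from the authors' framework; it assumes the common formulation of ORA as a one-sided hypergeometric test on the overlap between a cluster's marker genes and a cell type signature, and assigns the label with the smallest p-value. The function names, the toy signatures, and the universe size of 20,000 genes are illustrative assumptions, and both gene sets are assumed to lie within the universe.

```python
from math import comb


def ora_p_value(cluster_genes, signature_genes, universe_size):
    """One-sided hypergeometric p-value for the overlap of two gene sets.

    Returns P(overlap >= observed) when len(cluster_genes) genes are drawn
    at random from a universe containing len(signature_genes) signature genes.
    """
    cluster = set(cluster_genes)
    signature = set(signature_genes)
    overlap = len(cluster & signature)
    n_draws = len(cluster)
    n_success = len(signature)
    total = comb(universe_size, n_draws)
    tail = sum(
        comb(n_success, x) * comb(universe_size - n_success, n_draws - x)
        for x in range(overlap, min(n_draws, n_success) + 1)
    )
    return tail / total


def best_label(cluster_genes, signatures, universe_size):
    """Return the signature name with the smallest p-value, plus all scores."""
    scores = {
        name: ora_p_value(cluster_genes, genes, universe_size)
        for name, genes in signatures.items()
    }
    return min(scores, key=scores.get), scores


# Toy signatures and cluster markers, for illustration only.
signatures = {
    "T cell": ["CD3D", "CD3E", "CD2"],
    "B cell": ["CD79A", "MS4A1", "CD19"],
}
cluster_markers = ["CD3D", "CD3E", "IL7R"]
label, scores = best_label(cluster_markers, signatures, universe_size=20000)
```

Because the p-value depends on only a handful of overlapping genes when signatures are short, a single missing or spurious marker can flip the assignment, which is consistent with the observation above that smaller signatures more often lead to incorrect results.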


SciBet: a portable and fast single cell type identifier

10.1101/645358 ◽

2019 ◽

Cited By ~ 2

Author(s):

Chenwei Li ◽

Baolin Liu ◽

Boxi Kang ◽

Zedao Liu ◽

Yedan Liu ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Cell Type ◽

Sequencing Data ◽

Local Computation ◽

Local Data ◽

Single Cell RNA Sequencing ◽

Cross Platform ◽

Single Cell Type ◽

User Friendly

Abstract Fast, robust and technology-independent computational methods are needed for supervised cell type annotation of single-cell RNA sequencing data. We present SciBet, a Bayesian classifier that accurately predicts cell identity for newly sequenced cells or cell clusters. We enable web client deployment of SciBet for rapid local computation without uploading local data to the server. This user-friendly and cross-platform tool can be widely useful for single cell type identification.
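To make the notion of a Bayesian classifier for cell type annotation concrete, the sketch below implements a generic multinomial naive Bayes model in Python. It should not be read as SciBet's actual model or API, which the abstract does not describe; the use of add-one smoothing, the function names `train` and `predict`, and the toy reference counts are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def train(reference):
    """Fit a multinomial naive Bayes model.

    reference: list of (cell_type, {gene: count}) pairs from labelled cells.
    Returns {cell_type: (log_prior, {gene: log_probability})}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    n_cells = defaultdict(int)
    genes = set()
    for cell_type, counts in reference:
        n_cells[cell_type] += 1
        type_totals = totals[cell_type]
        for gene, c in counts.items():
            type_totals[gene] += c
            genes.add(gene)

    n_total = sum(n_cells.values())
    model = {}
    for cell_type, type_totals in totals.items():
        # Add-one smoothing so unseen genes never get zero probability.
        denom = sum(type_totals.values()) + len(genes)
        log_prob = {
            g: math.log((type_totals.get(g, 0.0) + 1.0) / denom) for g in genes
        }
        model[cell_type] = (math.log(n_cells[cell_type] / n_total), log_prob)
    return model


def predict(model, counts):
    """Return the cell type with the highest posterior log-score."""
    def score(cell_type):
        log_prior, log_prob = model[cell_type]
        return log_prior + sum(
            c * log_prob[g] for g, c in counts.items() if g in log_prob
        )
    return max(model, key=score)


# Toy reference panel with two genes and two cell types.
reference = [
    ("T cell", {"CD3D": 9, "MS4A1": 0}),
    ("T cell", {"CD3D": 7, "MS4A1": 1}),
    ("B cell", {"CD3D": 0, "MS4A1": 8}),
    ("B cell", {"CD3D": 1, "MS4A1": 6}),
]
model = train(reference)
query_label = predict(model, {"CD3D": 5, "MS4A1": 0})
```

Genes in the query that are absent from the reference are ignored in this sketch; a production classifier would also need normalization and feature selection, neither of which is shown here.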


scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data

10.1101/2020.04.22.056473 ◽

2020 ◽

Author(s):

Bobby Ranjan ◽

Florian Schmidt ◽

Wenjie Sun ◽

Jinyu Park ◽

Mohammad Amin Honardoost ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Cell Types ◽

Unsupervised Clustering ◽

Differentially Expressed ◽

Consensus Clustering ◽

Cell Type ◽

Sequencing Data ◽

Single Cell RNA Sequencing ◽

Data Clusters

Clustering is a crucial step in the analysis of single-cell data. Clusters identified using unsupervised clustering are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering strategies have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. We present scConsensus, an R framework for generating a consensus clustering by (i) integrating the results from both unsupervised and supervised approaches and (ii) refining the consensus clusters using differentially expressed (DE) genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. scConsensus is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.


A sparse differential clustering algorithm for tracing cell type changes via single-cell RNA-sequencing data

Nucleic Acids Research ◽

10.1093/nar/gkx1113 ◽

2017 ◽

Vol 46 (3) ◽

pp. e14-e14 ◽

Cited By ~ 7

Author(s):

Martin Barron ◽

Siyuan Zhang ◽

Jun Li

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Clustering Algorithm ◽

Cell Type ◽

Sequencing Data ◽

Single Cell RNA Sequencing


Comprehensive Integration of Single-Cell Transcriptional Profiling Reveals the Heterogeneities of Non-cardiomyocytes in Healthy and Ischemic Hearts

Frontiers in Cardiovascular Medicine ◽

10.3389/fcvm.2020.615161 ◽

2020 ◽

Vol 7 ◽

Author(s):

Lingfang Zhuang ◽

Lin Lu ◽

Ruiyan Zhang ◽

Kang Chen ◽

Xiaoxiang Yan

Keyword(s):

Myocardial Infarction ◽

Single Cell ◽

RNA Sequencing ◽

Molecular Mechanisms ◽

Developmental Trajectories ◽

Transcriptional Profiling ◽

Sequencing Data ◽

Signature Genes

Advances in single-cell RNA sequencing (scRNA-seq) technology have recently shed light on the molecular mechanisms of the spatial and temporal changes of thousands of cells simultaneously under homeostatic and ischemic conditions. The aim of this study is to investigate whether it is possible to integrate multiple similar scRNA-seq datasets for a more comprehensive understanding of diseases. In this study, we integrated three representative scRNA-seq datasets of 27,349 non-cardiomyocytes isolated at 3 and 7 days after myocardial infarction or sham surgery. In total, seven lineages, including macrophages, fibroblasts, endothelia, and lymphocytes, were identified in this analysis with distinct dynamic and functional properties in healthy and nonhealthy hearts. Myofibroblasts and endothelia were recognized as the central hubs of cellular communication via ligand-receptor interactions. Additionally, we showed that macrophages from different origins exhibited divergent transcriptional signatures, pathways, developmental trajectories, and transcriptional regulons. It was found that myofibroblasts predominantly expand at 7 days after myocardial infarction with pro-reparative characteristics. We identified signature genes of myofibroblasts, such as Postn, Cthrc1, and Ddah1, among which Ddah1 was exclusively expressed on activated fibroblasts and exhibited concordant upregulation in bulk RNA sequencing data and in vivo and in vitro experiments. Collectively, this compendium of scRNA-seq data provides a valuable entry point for understanding the transcriptional and dynamic changes of non-cardiomyocytes in healthy and nonhealthy hearts by integrating multiple datasets.


An interpretable deep-learning architecture of capsule networks for identifying cell-type gene expression programs from single-cell RNA-sequencing data

Nature Machine Intelligence ◽

10.1038/s42256-020-00244-4 ◽

2020 ◽

Vol 2 (11) ◽

pp. 693-703 ◽

Cited By ~ 1

Author(s):

Lifei Wang ◽

Rui Nie ◽

Zeyang Yu ◽

Ruyue Xin ◽

Caihong Zheng ◽

...

Keyword(s):

Gene Expression ◽

Deep Learning ◽

Single Cell ◽

RNA Sequencing ◽

Cell Type ◽

Sequencing Data ◽

Single Cell RNA Sequencing ◽

Type Gene


deconvSeq: deconvolution of cell mixture distribution in sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btz444 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5095-5102 ◽

Cited By ~ 10

Author(s):

Rose Du ◽

Vince Carey ◽

Scott T Weiss

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Whole Blood ◽

Bisulfite Sequencing ◽

Supplementary Information ◽

Tissue Type ◽

Cell Type ◽

Sequencing Data ◽

Blood Samples ◽

Whole Blood Samples

Abstract Motivation: Although single-cell sequencing is becoming more widely available, many tissue samples such as intracranial aneurysms are both fibrous and minute, and therefore not easily dissociated into single cells. Accounting for the cell type heterogeneity in such tissues therefore requires a computational method. We present a computational deconvolution method, deconvSeq, for sequencing data (RNA and bisulfite) obtained from bulk tissue. This method can also be applied to single-cell RNA sequencing data. Results: DeconvSeq utilizes a generalized linear model to model effects of tissue type on feature quantification, which is specific to the data structure of the sequencing type used. Estimated model coefficients can then be used to predict the cell type mixture within a tissue. Predicted cell type mixtures were validated against actual cell counts in whole blood samples. Using this method, we obtained a mean correlation of 0.998 (95% CI 0.995–0.999) from the RNA sequencing data of 35 whole blood samples and 0.95 (95% CI 0.91–0.98) from the reduced representation bisulfite sequencing data from 35 whole blood samples. Using symmetric balances to obtain the correlation between compositional parts, we found that the lowest correlation occurred for monocytes for both RNA and bisulfite sequencing. Comparison with other methods of decomposition such as deconRNAseq, CIBERSORT, MuSiC and EpiDISH showed that deconvSeq is able to achieve good prediction using mean correlation with far fewer genes or CpG sites in the signature set. Availability and implementation: Software implementing deconvSeq is available at https://github.com/rosedu1/deconvSeq. Supplementary information: Supplementary data are available at Bioinformatics online.
