Effects of sample size on plant single-cell RNA profiling

Abstract Background: Single-cell RNA (scRNA) profiling, or scRNA-sequencing (scRNA-seq), is a rapidly developing technology and an important frontier of molecular biology. scRNA profiling makes it possible to investigate in parallel the diverse molecular features of multiple cell types in a given plant tissue, and promotes the elucidation of cellular heterogeneity and the discovery of developmental processes underpinning cell differentiation. While it is assumed that the power of scRNA profiling in uncovering cellular heterogeneity largely depends on the depth of scRNA-seq, no study of the effect of the number of sequenced cells on the power of plant scRNA-seq has been reported. Results: In this study, on the basis of analyzing the sample coverage of 1,244 available scRNA-seq studies (including 30 in plants) and the effect of sample coverage on cell clustering and identification of cell types, we evaluated the effects of sample size (i.e., cell number) on the outcome of single-cell transcriptome analysis by sampling different numbers of cells from a pool of ~57,000 Arabidopsis thaliana root cells integrated from five published studies. Our results indicated that the most significant principal components could be achieved when 20,000-30,000 cells were sampled, a relatively high reliability of cell clustering could be achieved by using ~20,000 cells with little further improvement by using more cells, 96% of the differentially expressed genes could be successfully identified with no more than 20,000 cells, and a relatively stable pseudotime could be estimated in the sub-sample with 5,000 cells. Conclusions: Our results imply that ~20,000 (or 10,000-30,000) cells are enough for profiling Arabidopsis root cells using scRNA-seq, although the applicability of this number to other Arabidopsis tissues and other plants remains to be determined by analyzing scRNA-seq data generated from diverse tissues of different plant species. Nevertheless, our results provide a general guide for optimizing the sample size to be used in plant scRNA-seq studies.

Effects of Sample Size on Plant Single-Cell RNA Profiling

Current Issues in Molecular Biology ◽

10.3390/cimb43030119 ◽

2021 ◽

Vol 43 (3) ◽

pp. 1685-1697

Author(s):

Hongyu Chen ◽

Yang Lv ◽

Xinxin Yin ◽

Xi Chen ◽

Qinjie Chu ◽

...

Keyword(s):

Sample Size ◽

Single Cell ◽

High Reliability ◽

Cell Number ◽

Root Cells ◽

RNA Profiling ◽

Molecular Features ◽

Cell Clustering ◽

Cell Transcriptome ◽

Single Cell Transcriptome

Single-cell RNA (scRNA) profiling, or scRNA-sequencing (scRNA-seq), makes it possible to investigate in parallel the diverse molecular features of multiple cell types in a given plant tissue and to discover cell developmental processes. In this study, we evaluated the effects of sample size (i.e., cell number) on the outcome of single-cell transcriptome analysis by sampling different numbers of cells from a pool of ~57,000 Arabidopsis thaliana root cells integrated from five published studies. Our results indicated that the most significant principal components could be achieved when 20,000–30,000 cells were sampled, a relatively high reliability of cell clustering could be achieved by using ~20,000 cells with little further improvement by using more cells, 96% of the differentially expressed genes could be successfully identified with no more than 20,000 cells, and a relatively stable pseudotime could be estimated in the subsample with 5000 cells. Finally, our results provide a general guide for optimizing the sample size to be used in plant scRNA-seq studies.
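
The subsampling design described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the synthetic cells-by-genes matrix, the function name `subsample_pca_variance`, and the choice of "fraction of variance captured by the top principal components" as the tracked quantity are all assumptions made for illustration. The idea is only to show how one might draw subsamples of increasing size from a pooled dataset and watch a summary statistic stabilize.

```python
import numpy as np

def subsample_pca_variance(expr, sizes, n_components=10, seed=0):
    """For each subsample size, return the fraction of total variance
    captured by the top `n_components` principal components.
    (Hypothetical helper, not taken from the study.)"""
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        # Draw n cells (rows) without replacement from the pooled matrix.
        idx = rng.choice(expr.shape[0], size=n, replace=False)
        centered = expr[idx] - expr[idx].mean(axis=0)
        # Squared singular values are proportional to the PC variances.
        s = np.linalg.svd(centered, compute_uv=False)
        var = s ** 2
        results[n] = float(var[:n_components].sum() / var.sum())
    return results

# Synthetic stand-in for a pooled cells x genes matrix (assumption):
# 2000 cells drawn from 4 artificial "cell types" over 50 genes.
rng = np.random.default_rng(1)
centers = rng.normal(0.0, 3.0, size=(4, 50))
labels = rng.integers(0, 4, size=2000)
expr = centers[labels] + rng.normal(0.0, 1.0, size=(2000, 50))

fractions = subsample_pca_variance(expr, sizes=[100, 500, 2000])
for n, frac in fractions.items():
    print(n, round(frac, 3))
```

In a real analysis the same loop would be repeated over several random seeds per size, so that the variability between subsamples of equal size can be compared with the change between sizes.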

A single-cell atlas of the human healthy airways

10.1101/2019.12.21.884759 ◽

2019 ◽

Cited By ~ 13

Author(s):

Marie Deprez ◽

Laure-Emmanuelle Zaragosi ◽

Marin Truchi ◽

Sandra Ruiz Garcia ◽

Marie-Jeanne Arguel ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Cellular Heterogeneity ◽

Specific Gene ◽

Stable Gene ◽

Brush Cells ◽

RNA Profiling ◽

Human Airway Epithelium ◽

Population Distributions

Abstract Rationale: The respiratory tract constitutes an elaborate line of defense based on a unique cellular ecosystem. Single-cell profiling methods enable the investigation of cell population distributions and transcriptional changes along the airways. Methods: We have explored cellular heterogeneity of the human airway epithelium in 10 healthy living volunteers by single-cell RNA profiling. A total of 77,969 cells were collected by bronchoscopy at 35 distinct locations, from the nose to the 12th division of the airway tree. Results: The resulting atlas is composed of a high percentage of epithelial cells (89.1%), but also immune (6.2%) and stromal (4.7%) cells, with specific cellular proportions at different sites of the airways. It reveals differential gene expression between identical cell types (suprabasal, secretory, and multiciliated cells) from the nose (MUC4, PI3, SIX3) and tracheobronchial (SCGB1A1, TFF3) airways. By contrast, cell-type-specific gene expression was stable across all tracheobronchial samples. Our atlas improves the description of ionocytes, pulmonary neuro-endocrine (PNEC) and brush cells, which are likely derived from a common population of precursor cells. We also report a population of KRT13-positive cells with a high percentage of dividing cells, which are reminiscent of “hillock” cells previously described in mouse. Conclusions: Robust characterization of this unprecedentedly large single-cell cohort establishes an important resource for future investigations. The precise description of the continuum existing from nasal epithelium to successive divisions of lung airways and the stable gene expression profile of these regions better defines conditions under which relevant tracheobronchial proxies of human respiratory diseases can be developed.

Molecular and Cellular Dynamics of Aortic Aneurysms Revealed by Single-Cell Transcriptomics

Arteriosclerosis Thrombosis and Vascular Biology ◽

10.1161/atvbaha.121.315852 ◽

2021 ◽

Author(s):

Yanming Li ◽

Scott A. LeMaire ◽

Ying H. Shen

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Cell Types ◽

Cellular Level ◽

Aortic Aneurysms ◽

Cellular Heterogeneity ◽

Cell Populations ◽

Sequencing Analysis ◽

Molecular Features ◽

Single Cell RNA Sequencing

The aorta is highly heterogeneous, containing many different types of cells that perform sophisticated functions to maintain aortic homeostasis. Recently, single-cell RNA sequencing studies have provided substantial new insight into the heterogeneity of vascular cell types, the comprehensive molecular features of each cell type, and the phenotypic interrelationship between these cell populations. This new information has significantly improved our understanding of aortic biology and aneurysms at the molecular and cellular level. Here, we summarize these findings, with a focus on what single-cell RNA sequencing analysis has revealed about cellular heterogeneity, cellular transitions, communications among cell populations, and critical transcription factors in the vascular wall. We also review the information learned from single-cell RNA sequencing that has contributed to our understanding of the pathogenesis of vascular disease, such as the identification of cell types in which aneurysm-related genes and genetic variants function. Finally, we discuss the challenges and future directions of single-cell RNA sequencing applications in studies of aortic biology and diseases.

FlsnRNA-seq: protoplasting-free full-length single-nucleus RNA profiling in plants

Genome Biology ◽

10.1186/s13059-021-02288-0 ◽

2021 ◽

Vol 22 (1) ◽

Cited By ~ 2

Author(s):

Yanping Long ◽

Zhijian Liu ◽

Jinbu Jia ◽

Weipeng Mo ◽

Liang Fang ◽

...

Keyword(s):

Single Cell ◽

Cell Walls ◽

Large Scale ◽

Full Length ◽

Cell Level ◽

Root Cells ◽

RNA Profiling ◽

Different Types ◽

Long Read ◽

Single Nucleus

Abstract: The broad application of single-cell RNA profiling in plants has been hindered by the prerequisite of protoplasting, which requires digesting the cell walls of different types of plant tissues. Here, we present a protoplasting-free approach, flsnRNA-seq, for large-scale full-length RNA profiling at a single-nucleus level in plants using isolated nuclei. Combined with 10x Genomics and Nanopore long-read sequencing, we validate the robustness of this approach in Arabidopsis root cells and the developing endosperm. Sequencing results demonstrate that it allows for uncovering alternative splicing and polyadenylation-related RNA isoform information at the single-cell level, which facilitates characterizing cell identities.

Single-cell transcriptomics following ischemic injury identifies a role for B2M in cardiac repair

Communications Biology ◽

10.1038/s42003-020-01636-3 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Bas Molenaar ◽

Louk T. Timmer ◽

Marjolein Droog ◽

Ilaria Perini ◽

Danielle Versteeg ◽

...

Keyword(s):

Single Cell ◽

Communication Networks ◽

Cardiac Remodeling ◽

Cardiac Injury ◽

Ischemic Injury ◽

Cell Types ◽

Repair Process ◽

Cardiac Repair ◽

Cellular Heterogeneity ◽

Intercellular Signaling

Abstract: The efficiency of the repair process following ischemic cardiac injury is a crucial determinant for the progression into heart failure and is controlled by both intra- and intercellular signaling within the heart. An enhanced understanding of this complex interplay will enable better exploitation of these mechanisms for therapeutic use. We used single-cell transcriptomics to collect gene expression data of all main cardiac cell types at different time points after ischemic injury. These data unveiled cellular and transcriptional heterogeneity and changes in cellular function during cardiac remodeling. Furthermore, we established potential intercellular communication networks after ischemic injury. Follow-up experiments confirmed that cardiomyocytes express and secrete elevated levels of beta-2 microglobulin in response to ischemic damage, which can activate fibroblasts in a paracrine manner. Collectively, our data indicate phase-specific changes in cellular heterogeneity during different stages of cardiac remodeling and allow for the identification of therapeutic targets relevant for cardiac repair.

Selecting single cell clustering parameter values using subsampling-based robustness metrics

BMC Bioinformatics ◽

10.1186/s12859-021-03957-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Ryan B. Patterson-Cross ◽

Ariel J. Levine ◽

Vilas Menon

Keyword(s):

Single Cell ◽

Optimal Parameter ◽

Clustering Algorithms ◽

Cell Types ◽

Parameter Selection ◽

Data Set ◽

Biologically Relevant ◽

Cell Clustering ◽

Parameter Values ◽

Robustness Metrics

Abstract Background: Generating and analysing single-cell data has become a widespread approach to examine tissue heterogeneity, and numerous algorithms exist for clustering these datasets to identify putative cell types with shared transcriptomic signatures. However, many of these clustering workflows rely on user-tuned parameter values, tailored to each dataset, to identify a set of biologically relevant clusters. Whereas users often develop their own intuition as to the optimal range of parameters for clustering on each data set, the lack of systematic approaches to identify this range can be daunting to new users of any given workflow. In addition, an optimal parameter set does not guarantee that all clusters are equally well-resolved, given the heterogeneity in transcriptomic signatures in most biological systems. Results: Here, we illustrate a subsampling-based approach (chooseR) that simultaneously guides parameter selection and characterizes cluster robustness. Through bootstrapped iterative clustering across a range of parameters, chooseR was used to select parameter values for two distinct clustering workflows (Seurat and scVI). In each case, chooseR identified parameters that produced biologically relevant clusters from both well-characterized (human PBMC) and complex (mouse spinal cord) datasets. Moreover, it provided a simple “robustness score” for each of these clusters, facilitating the assessment of cluster quality. Conclusion: chooseR is a simple, conceptually understandable tool that can be used flexibly across clustering algorithms, workflows, and datasets to guide clustering parameter selection and characterize cluster robustness.
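
The general idea of subsampling-based cluster robustness can be sketched as follows. This is not chooseR itself (which the abstract describes as wrapping Seurat and scVI): the small NumPy k-means, the 80% subsampling fraction, the helper names, and the synthetic two-blob data are all assumptions chosen to keep the example self-contained. The sketch repeatedly clusters random subsets of cells and records how often each pair of cells lands in the same cluster, among the rounds in which both were sampled.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means; a stand-in for any clustering workflow."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def coclustering_frequency(X, k, n_rounds=30, frac=0.8, seed=0):
    """Fraction of rounds in which each pair of cells co-clusters,
    counted only over rounds where both cells were subsampled."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_rounds):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sampled > 0, together / sampled, np.nan)

# Synthetic data (assumption): two well-separated groups of 30 "cells".
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 2)),
               rng.normal(10.0, 1.0, size=(30, 2))])
freq = coclustering_frequency(X, k=2)
within = np.nanmean(freq[:30, :30])   # pairs from the same group
across = np.nanmean(freq[:30, 30:])   # pairs from different groups
print(within, across)
```

A per-cluster robustness summary could then be taken as the average co-clustering frequency among the members of each final cluster, and the procedure repeated over a grid of clustering parameters to compare them.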

Dissecting Cellular Heterogeneity Based on Network Denoising of scRNA-seq Using Local Scaling Self-Diffusion

Frontiers in Genetics ◽

10.3389/fgene.2021.811043 ◽

2022 ◽

Vol 12 ◽

Author(s):

Xin Duan ◽

Wei Wang ◽

Minghui Tang ◽

Feng Gao ◽

Xudong Lin

Keyword(s):

Metric Learning ◽

Cell Types ◽

Primary Objective ◽

The Self ◽

Cellular Heterogeneity ◽

Clustering Methods ◽

Local Scaling ◽

Self Diffusion ◽

Cell Clustering ◽

High Level

Identifying the phenotypes and interactions of various cells is the primary objective in cellular heterogeneity dissection. A key step of this methodology is unsupervised clustering, which, however, is often hampered by high levels of noise and by redundant information. To overcome these limitations, we proposed self-diffusion on local scaling affinity (LSSD) to enhance the metric learning of cell similarities for dissecting cellular heterogeneity. Local scaling infers the self-tuning of cell-to-cell distances that are used to construct cell affinity. Our approach implements the self-diffusion process by propagating the affinity matrices to further improve the cell similarities for the downstream clustering analysis. To demonstrate its effectiveness and usefulness, we applied LSSD to two simulated and four real scRNA-seq datasets. Compared with other single-cell clustering methods, our approach demonstrates much better clustering performance, and the cell types identified in colorectal tumors show strong biological interpretability.
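
The two ingredients named in this abstract, a locally scaled affinity and a diffusion step over it, can be sketched under stated assumptions. The abstract does not give LSSD's formulas, so the code below uses a commonly cited self-tuning affinity, exp(-d_ij^2 / (sigma_i * sigma_j)) with sigma_i set to the distance from cell i to its k-th nearest neighbour, and one published form of self-diffusion, W <- W P + I with P the row-normalized affinity. Whether LSSD uses exactly these definitions is an assumption, as are the function names, the value k=7, and the synthetic data.

```python
import numpy as np

def local_scaling_affinity(X, k=7):
    """Affinity with a per-cell scale (assumed form, see lead-in)."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Column 0 of the sorted distances is the cell itself (distance 0),
    # so column k is the distance to the k-th nearest neighbour.
    sigma = np.sort(d, axis=1)[:, k]
    A = np.exp(-(d ** 2) / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(A, 0.0)
    return A

def self_diffusion(A, n_steps=3):
    """Propagate affinities with W <- W @ P + I (assumed update rule)."""
    P = A / A.sum(axis=1, keepdims=True)  # row-normalized transitions
    W = A.copy()
    identity = np.eye(len(A))
    for _ in range(n_steps):
        W = W @ P + identity
    return W

# Synthetic data (assumption): two groups of 20 "cells" in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 5)),
               rng.normal(6.0, 1.0, size=(20, 5))])
A = local_scaling_affinity(X)
W = self_diffusion(A)
print(A[:20, :20].mean(), A[:20, 20:].mean())
```

The diffused matrix `W` would then be handed to a downstream clustering method (for example spectral clustering) in place of the raw affinity.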

Single-cell analysis reveals cellular heterogeneity and molecular determinants of hypothalamic leptin-receptor cells

10.1101/2020.07.23.217729 ◽

2020 ◽

Author(s):

N. Kakava-Georgiadou ◽

J.F. Severens ◽

A.M. Jørgensen ◽

K.M. Garner ◽

M.C.M Luijendijk ◽

...

Keyword(s):

Single Cell ◽

Leptin Receptor ◽

Single Cell Analysis ◽

Cell Types ◽

Cellular Heterogeneity ◽

Molecular Signature ◽

Neuronal Populations ◽

Hypothalamic Nuclei ◽

Satiety Hormone ◽

Multiple Cell

Abstract: Hypothalamic nuclei which regulate homeostatic functions express leptin receptor (LepR), the primary target of the satiety hormone leptin. Single-cell RNA sequencing (scRNA-seq) has facilitated the discovery of a variety of hypothalamic cell types. However, low abundance of LepR transcripts prevented further characterization of LepR cells. Therefore, we perform scRNA-seq on isolated LepR cells and identify eight neuronal clusters, including three uncharacterized Trh-expressing populations as well as 17 non-neuronal populations including tanycytes, oligodendrocytes and endothelial cells. Food restriction had a major impact on Agrp neurons and changed the expression of obesity-associated genes. Multiple cell clusters were enriched for GWAS signals of obesity. We further explored changes in the gene regulatory landscape of LepR cell types. We thus reveal the molecular signature of distinct populations with diverse neurochemical profiles, which will aid efforts to illuminate the multi-functional nature of leptin’s action in the hypothalamus.

Infinity Flow: High-throughput single-cell quantification of 100s of proteins using conventional flow cytometry and machine learning

10.1101/2020.06.17.152926 ◽

2020 ◽

Author(s):

Etienne Becht ◽

Daniel Tolstrup ◽

Charles-Antoine Dutertre ◽

Florent Ginhoux ◽

Evan W. Newell ◽

...

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

Single Cell ◽

Low Cost ◽

Expression Patterns ◽

Cell Types ◽

Cellular Heterogeneity ◽

Supervised Machine Learning ◽

Melanoma Metastasis ◽

Immunologic Research

Abstract: Modern immunologic research increasingly requires high-dimensional analyses in order to understand the complex milieu of cell types that comprise the tissue microenvironments of disease. To achieve this, we developed Infinity Flow, which combines hundreds of overlapping flow cytometry panels using machine learning to enable the simultaneous analysis of the co-expression patterns of 100s of surface-expressed proteins across millions of individual cells. In this study, we demonstrate that this approach allows the comprehensive analysis of the cellular constituency of the steady-state murine lung and the identification of novel cellular heterogeneity in the lungs of melanoma metastasis-bearing mice. We show that by using supervised machine learning, Infinity Flow enhances the accuracy and depth of clustering or dimensionality reduction algorithms. Infinity Flow is a highly scalable, low-cost and accessible solution to single-cell proteomics in complex tissues.

A computational method to aid the design and analysis of single cell RNA-seq experiments for cell type identification

10.1101/247114 ◽

2018 ◽

Cited By ~ 1

Author(s):

Douglas Abrams ◽

Parveen Kumar ◽

R. Krishna Murthy Karuturi ◽

Joshy George

Keyword(s):

Experimental Design ◽

Single Cell ◽

Single Cells ◽

Cell Types ◽

Cell Number ◽

Fold Change ◽

Computational Method ◽

Marker Genes ◽

Cell Type ◽

Estimate Sample Size

Abstract Background: The advent of single cell RNA sequencing (scRNA-seq) enabled researchers to study transcriptomic activity within individual cells and identify inherent cell types in the sample. Although numerous computational tools have been developed to analyze single cell transcriptomes, no published studies or analytical packages are available to guide experimental design and to devise a suitable analysis procedure for cell type identification. Results: We have developed an empirical methodology to address this important gap in single cell experimental design and analysis, and implemented it in an easy-to-use tool called SCEED (Single Cell Empirical Experimental Design and analysis). With SCEED, users can choose among a variety of combinations of tools for analysis, conduct performance analysis of analytical procedures and choose the best procedure, and estimate the sample size (number of cells to be profiled) required for a given analytical procedure at varying levels of cell type rarity and other experimental parameters. Using SCEED, we examined 3 single cell algorithms using 48 simulated single cell datasets that were generated for varying numbers of cell types and their proportions, number of genes expressed per cell, number of marker genes and their fold change, and number of single cells successfully profiled in the experiment. Conclusions: Based on our study, we found that when marker genes are expressed at a fold change of 4 or more relative to the rest of the genes, either the Seurat or the SIMLR algorithm can be used to analyze a single cell dataset for any number of single cells isolated (a minimum of 1000 single cells was tested). However, when marker genes are expected to be upregulated by a fold change of at most 2, the choice of single cell algorithm depends on the number of single cells isolated and the proportion of the rare cell type to be identified. In conclusion, our work allows the assessment of various single cell methods and also aids in examining the single cell experimental design.
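
As a simple complement to the empirical sample-size estimation described here, the dependence of cell number on cell type rarity can be illustrated with a back-of-the-envelope binomial calculation. This is not SCEED's procedure: it only asks how many cells must be profiled so that, with a chosen probability, at least a minimum number of cells of a rare type are captured, assuming each profiled cell independently belongs to that type with a fixed frequency. The function names, the 95% target, and the search step of 100 cells are assumptions.

```python
from math import comb

def prob_at_least(n_cells, freq, min_cells):
    """P(at least `min_cells` rare cells among `n_cells` profiled),
    under a binomial model with rare-type frequency `freq`."""
    p_fewer = sum(
        comb(n_cells, i) * freq ** i * (1.0 - freq) ** (n_cells - i)
        for i in range(min_cells)
    )
    return 1.0 - p_fewer

def cells_needed(freq, min_cells, confidence=0.95, step=100,
                 max_cells=1_000_000):
    """Smallest multiple of `step` that reaches the target probability."""
    n = step
    while n <= max_cells:
        if prob_at_least(n, freq, min_cells) >= confidence:
            return n
        n += step
    raise ValueError("target not reached within max_cells")

# Example (assumed numbers): a cell type at 1% frequency, of which we
# want at least 10 cells with 95% probability.
n_required = cells_needed(freq=0.01, min_cells=10)
print(n_required, round(prob_at_least(n_required, 0.01, 10), 3))
```

Capturing enough rare cells is a necessary condition rather than a sufficient one: whether a given algorithm then separates them also depends on marker fold change and the other parameters the abstract lists.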
