Semisoft clustering of single-cell data

2018 ◽  
Vol 116 (2) ◽  
pp. 466-471 ◽  
Author(s):  
Lingxue Zhu ◽  
Jing Lei ◽  
Lambertus Klei ◽  
Bernie Devlin ◽  
Kathryn Roeder

Motivated by the dynamics of development, in which cells of recognizable types, or pure cell types, transition into other types over time, we propose a method of semisoft clustering that can classify both pure and intermediate cell types from data on gene expression from individual cells. Called semisoft clustering with pure cells (SOUP), this algorithm reveals the clustering structure for both pure cells and transitional cells with soft memberships. SOUP involves a two-step process: Identify the set of pure cells and then estimate a membership matrix. To find pure cells, SOUP uses the special block structure in the expression similarity matrix. Once pure cells are identified, they provide the key information from which the membership matrix can be computed. By modeling cells as a continuous mixture of K discrete types we obtain more parsimonious results than obtained with standard clustering algorithms. Moreover, using soft membership estimates of cell type cluster centers leads to better estimates of developmental trajectories. The strong performance of SOUP is documented via simulation studies, which show its robustness to violations of modeling assumptions. The advantages of SOUP are illustrated by analyses of two independent datasets of gene expression from a large number of cells from fetal brain.
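The idea of a cell as a continuous mixture of discrete types can be illustrated with a small sketch. This is not the SOUP estimator: it assumes only K = 2 types, assumes the pure cells are already known, and swaps in a plain least-squares projection onto the line between the two pure-type centers. All names (`mean_profile`, `two_type_membership`) and the toy expression values are hypothetical.

```python
from typing import List, Sequence


def mean_profile(cells: Sequence[Sequence[float]]) -> List[float]:
    """Average expression profile of a group of cells (one row per cell)."""
    n = len(cells)
    return [sum(column) / n for column in zip(*cells)]


def two_type_membership(x: Sequence[float],
                        center_a: Sequence[float],
                        center_b: Sequence[float]) -> float:
    """Weight theta on type A in the model x ~ theta*center_a + (1-theta)*center_b.

    Least-squares projection onto the segment between the two centers,
    clipped to [0, 1]. A simplification for illustration, not SOUP itself.
    """
    diff = [a - b for a, b in zip(center_a, center_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two centers must differ")
    theta = sum((xi - bi) * d for xi, bi, d in zip(x, center_b, diff)) / denom
    return min(1.0, max(0.0, theta))


# Toy data (assumed): two pure cells per type, three genes.
pure_a = [[10.0, 0.0, 2.0], [12.0, 0.0, 2.0]]
pure_b = [[0.0, 8.0, 2.0], [0.0, 10.0, 2.0]]
center_a = mean_profile(pure_a)
center_b = mean_profile(pure_b)

# A transitional cell lying halfway between the two centers.
transitional = [5.5, 4.5, 2.0]
theta = two_type_membership(transitional, center_a, center_b)
memberships = [theta, 1.0 - theta]  # soft membership over (type A, type B)
```

A pure cell gets a membership vector close to (1, 0) or (0, 1), while a transitional cell gets fractional weights. This is the distinction the abstract draws between pure and intermediate cell types.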

2018 ◽  
Author(s):  
Lingxue Zhu ◽  
Jing Lei ◽  
Bernie Devlin ◽  
Kathryn Roeder

Motivated by the dynamics of development, in which cells of recognizable types, or pure cell types, transition into other types over time, we propose a method of semi-soft clustering that can classify both pure and intermediate cell types from data on gene expression or protein abundance from individual cells. Called SOUP, for Semi-sOft clUstering with Pure cells, this novel algorithm reveals the clustering structure for both pure cells, which belong to a single cluster, and transitional cells with soft memberships. SOUP involves a two-step process: identify the set of pure cells and then estimate a membership matrix. To find pure cells, SOUP uses the special block structure that the K cell types form in a similarity matrix, devised by pairwise comparison of the gene expression profiles of individual cells. Once pure cells are identified, they provide the key information from which the membership matrix can be computed. SOUP is applicable to general clustering problems as well, as long as the unrestrictive modeling assumptions hold. The performance of SOUP is documented via extensive simulation studies. Using SOUP to analyze two single-cell data sets from brain shows that it produces sensible and interpretable results.
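The block structure mentioned above can be seen in a toy example. The abstract says only that the similarity matrix comes from pairwise comparison of expression profiles, so the use of cosine similarity here is an assumption made for illustration, as are the function name and the data.

```python
import math
from typing import List, Sequence


def cosine_similarity_matrix(cells: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarity between expression profiles.

    Assumes every cell has at least one nonzero expression value.
    """
    norms = [math.sqrt(sum(v * v for v in cell)) for cell in cells]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
         for cj, nj in zip(cells, norms)]
        for ci, ni in zip(cells, norms)
    ]


# Toy data (assumed): cells 0-1 express genes 0-1 only, cells 2-3 express genes 2-3 only.
cells = [
    [5.0, 1.0, 0.0, 0.0],
    [10.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 6.0],
    [0.0, 0.0, 1.0, 2.0],
]
sim = cosine_similarity_matrix(cells)
# Cells of the same pure type are highly similar; cells of different pure types
# are not, so the matrix has one diagonal block per cell type.
```

With cells sorted by type, the high-similarity entries fall into one square block per type along the diagonal. A transitional cell would show intermediate similarity to more than one block.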


2018 ◽  
Author(s):  
Ken Jean-Baptiste ◽  
José L. McFaline-Figueroa ◽  
Cristina M. Alexandre ◽  
Michael W. Dorrity ◽  
Lauren Saunders ◽  
...  

Single-cell RNA-seq can yield high-resolution cell-type-specific expression signatures that reveal new cell types and the developmental trajectories of cell lineages. Here, we apply this approach to A. thaliana root cells to capture gene expression in 3,121 root cells. We analyze these data with Monocle 3, which orders single-cell transcriptomes in an unsupervised manner and uses machine learning to reconstruct single-cell developmental trajectories along pseudotime. We identify hundreds of genes with cell-type-specific expression, with pseudotime analysis of several cell lineages revealing both known and novel genes that are expressed along a developmental trajectory. We identify transcription factor motifs that are enriched in early and late cells, together with the corresponding candidate transcription factors that likely drive the observed expression patterns. We assess and interpret changes in total RNA expression along developmental trajectories and show that trajectory branch points mark developmental decisions. Finally, by applying heat stress to whole seedlings, we address the longstanding question of possible heterogeneity among cell types in the response to an abiotic stress. Although the response of canonical heat shock genes dominates expression across cell types, subtle but significant differences in other genes can be detected among cell types. Taken together, our results demonstrate that single-cell transcriptomics holds promise for studying plant development and plant physiology with unprecedented resolution.


2021 ◽  
Author(s):  
Katherine Rhodes ◽  
Kenneth A Barr ◽  
Joshua M Popp ◽  
Benjamin J Strober ◽  
Alexis Battle ◽  
...  

Most disease-associated loci, though located in putatively regulatory regions, have not yet been confirmed to affect gene expression. One reason for this could be that we have not examined gene expression in the most relevant cell types or conditions. Indeed, even large-scale efforts to study gene expression broadly across tissues are limited by the necessity of obtaining human samples post-mortem, and almost exclusively from adults. Thus, there is an acute need to expand gene regulatory studies in humans to the most relevant cell types, tissues, and states. We propose that embryoid bodies (EBs), which are organoids that contain a multitude of cell types in dynamic states, can provide an answer. Single-cell RNA sequencing now provides a way to interrogate developmental trajectories in EBs and enhance the potential to uncover dynamic regulatory processes that would be missed in studies of static adult tissue. Here, we examined the properties of the EB model for the purpose of mapping inter-individual regulatory differences in a large variety of cell types.


F1000Research ◽  
2019 ◽  
Vol 7 ◽  
pp. 1522 ◽  
Author(s):  
Brendan T. Innes ◽  
Gary D. Bader

Single-cell RNA sequencing (scRNAseq) represents a new kind of microscope that can measure the transcriptome profiles of thousands of individual cells from complex cellular mixtures, such as in a tissue, in a single experiment. This technology is particularly valuable for characterization of tissue heterogeneity because it can be used to identify and classify all cell types in a tissue. This is generally done by clustering the data, based on the assumption that cells of a particular type share similar transcriptomes, distinct from other cell types in the tissue. However, nearly all clustering algorithms have tunable parameters which affect the number of clusters they will identify in data. The R Shiny software tool described here, scClustViz, provides a simple interactive graphical user interface for exploring scRNAseq data and assessing the biological relevance of clustering results. Given that cell types are expected to have distinct gene expression patterns, scClustViz uses differential gene expression between clusters as a metric for assessing the fit of a clustering result to the data at multiple cluster resolution levels. This helps select a clustering parameter for further analysis. scClustViz also provides interactive visualisation of: cluster-specific distributions of technical factors, such as predicted cell cycle stage and other metadata; cluster-wise gene expression statistics to simplify annotation of cell types and identification of cell type specific marker genes; and gene expression distributions over all cells and cell types. scClustViz provides an interactive interface for visualisation, assessment, and biological interpretation of cell-type classifications in scRNAseq experiments that can be easily added to existing analysis pipelines, enabling customization by bioinformaticians while enabling biologists to explore their results without the need for computational expertise. It is available at https://baderlab.github.io/scClustViz/.
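The idea of scoring a clustering by differential expression between clusters can be sketched as follows. This is not the scClustViz API or its statistical test. It is a simplified stand-in that counts genes whose mean expression differs between two clusters by at least a log2 fold-change threshold. The function name, the threshold, the pseudocount, and the data are all assumptions.

```python
import math
from typing import Sequence


def count_de_genes(cluster_a: Sequence[Sequence[float]],
                   cluster_b: Sequence[Sequence[float]],
                   min_abs_log2fc: float = 1.0,
                   pseudocount: float = 1.0) -> int:
    """Count genes whose mean expression differs between two clusters.

    Rows are cells, columns are genes. A gene counts as differentially
    expressed when |log2 fold change| of the cluster means (plus a
    pseudocount) reaches the threshold. No significance test is applied.
    """
    mean_a = [sum(col) / len(cluster_a) for col in zip(*cluster_a)]
    mean_b = [sum(col) / len(cluster_b) for col in zip(*cluster_b)]
    count = 0
    for a, b in zip(mean_a, mean_b):
        log2fc = math.log2((a + pseudocount) / (b + pseudocount))
        if abs(log2fc) >= min_abs_log2fc:
            count += 1
    return count


# Toy data (assumed): two cells per cluster, three genes.
cluster_1 = [[7.0, 3.0, 0.0], [7.0, 3.0, 0.0]]
cluster_2 = [[1.0, 3.0, 3.0], [1.0, 3.0, 3.0]]

n_de_distinct = count_de_genes(cluster_1, cluster_2)
n_de_oversplit = count_de_genes(cluster_1, cluster_1)
```

A pair of clusters with few or no differentially expressed genes between them suggests that the resolution parameter has split one cell type in two. Comparing such counts across resolution levels is one way to pick a clustering for further analysis.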


Development ◽  
1994 ◽  
Vol 120 (7) ◽  
pp. 1873-1881 ◽  
Author(s):  
D. Gu ◽  
M.S. Lee ◽  
T. Krahl ◽  
N. Sarvetnick

We examined the spectrum of intermediate cell types in the regenerating pancreas as duct epithelial cells progressed through their differentiation pathway to become mature endocrine cells. The model used was transgenic mice in which the pancreatic islets continue to grow during adulthood, unlike normal mice, whose islet cell formation ceases early in life. Because the intermediate cells migrated into islet-like clusters at specific locations, we propose a specific pathway for islet development. Endocrine cells are derived from duct cells co-expressing a duct cell antigen, carbonic anhydrase II (CA II), and an exocrine enzyme, amylase. The CA II/amylase cells become amylase/endocrine intermediate cells as they exit from their luminal location. The abluminal amylase/endocrine cells continue to differentiate into multihormone-bearing young endocrine cells, which migrate to form clusters with other differentiating endocrine cells.


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
John W Wizeman ◽  
Qiuxia Guo ◽  
Elliott M Wilion ◽  
James YH Li

We applied single-cell RNA sequencing to profile genome-wide gene expression in about 9400 individual cerebellar cells from the mouse embryo at embryonic day 13.5. Reiterative clustering identified the major cerebellar cell types and subpopulations of different lineages. Through pseudotemporal ordering to reconstruct developmental trajectories, we identified novel transcriptional programs controlling cell fate specification of populations arising from the ventricular zone and the rhombic lip, two distinct germinal zones of the embryonic cerebellum. Together, our data revealed cell-specific markers for studying the cerebellum, gene-expression cascades underlying cell fate specification, and a number of previously unknown subpopulations that may play an integral role in the formation and function of the cerebellum. Our findings will facilitate new discovery by providing insights into the molecular and cell type diversity in the developing cerebellum.


2018 ◽  
Author(s):  
Yan Wu ◽  
Pablo Tamayo ◽  
Kun Zhang

High-throughput single-cell gene expression profiling has enabled the characterization of novel cell types and developmental trajectories. Visualizing these datasets is crucial to biological interpretation, and the most popular method is t-distributed Stochastic Neighbor Embedding (t-SNE), which visualizes local patterns better than other methods, but often distorts global structure, such as distances between clusters. We developed Similarity Weighted Nonnegative Embedding (SWNE), which enhances interpretation of datasets by embedding the genes and factors that separate cell states alongside the cells on the visualization, captures local structure better than t-SNE and existing methods, and maintains fidelity when visualizing global structure. SWNE uses nonnegative matrix factorization to decompose the gene expression matrix into biologically relevant factors, embeds the cells, genes and factors in a 2D visualization, and uses a similarity matrix to smooth the embeddings. We demonstrate SWNE on single-cell RNA-seq data from hematopoietic progenitors and human brain cells.
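The factorization step can be sketched with a minimal nonnegative matrix factorization. The code below uses the standard multiplicative-update rule for the squared-error objective, chosen here for illustration. The abstract does not say which NMF solver SWNE uses, and the embedding and smoothing steps are not shown. The matrix orientation (rows as genes, columns as cells), the function names, and the toy data are assumptions.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def frob_error(a: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of A - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for ra, rb in zip(a, wh) for x, y in zip(ra, rb))


def nmf(a: Matrix, k: int, n_iter: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor a nonnegative matrix A (n x m) into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(a), len(a[0])
    # Strictly positive random start so that no entry is stuck at zero.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T A) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(k)]
        # W <- W * (A H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_rows)]
    return w, h


# Toy expression matrix (assumed): 5 genes x 4 cells, built from two programs.
expr = [
    [4.0, 4.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
w, h = nmf(expr, k=2)
error = frob_error(expr, w, h)
```

In this orientation, each column of `w` is a gene-loading vector for one factor and each column of `h` gives one cell's factor weights. Because all entries stay nonnegative, the factors can be read as additive expression programs.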


Author(s):  
Caihuan Tian ◽  
Qingwei Du ◽  
Mengxue Xu ◽  
Fei Du ◽  
Yuling Jiao

Single-cell transcriptomics is revolutionizing our understanding of development and response to environmental cues [1–3]. Recent advances in single-cell RNA sequencing (scRNA-seq) technology have enabled profiling of gene expression patterns in heterogeneous tissues and organs at the single-cell level and have been widely applied in human and animal research [4,5]. Nevertheless, the existence of cell walls has significantly encumbered its application in plant research. Protoplasts have been applied for scRNA-seq analysis, but mostly restricted to tissues amenable to wall digestion, such as root tips [6–10]. However, many cell types are resistant to protoplasting, and protoplasting may yield ectopic gene expression and bias proportions of cell types. Here we demonstrate a method with minimal artifacts for high-throughput single-nucleus RNA sequencing (snRNA-seq) that we use to profile tomato shoot apex cells. The obtained high-resolution expression atlas identifies numerous distinct cell types covering major shoot tissues and developmental stages, and delineates developmental trajectories of mesophyll cells, vasculature cells, epidermal cells, and trichome cells. In addition, we identify key developmental regulators and reveal their hierarchy. Collectively, this study demonstrates the power of snRNA-seq for plant research and provides an unprecedented spatiotemporal gene expression atlas of heterogeneous shoot cells.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Bastiaan van der Veen ◽  
Sampath K. T. Kapanaiah ◽  
Kasyoka Kilonzo ◽  
Peter Steele-Perkins ◽  
Martin M. Jendryka ◽  
...  

Pathological impulsivity is a debilitating symptom of multiple psychiatric diseases with few effective treatment options. To identify druggable receptors with anti-impulsive action we developed a systematic target discovery approach combining behavioural chemogenetics and gene expression analysis. Spatially restricted inhibition of three subdivisions of the prefrontal cortex of mice revealed that the anterior cingulate cortex (ACC) regulates premature responding, a form of motor impulsivity. Probing three G-protein cascades with designer receptors, we found that the activation of Gi-signalling in layer-5 pyramidal cells (L5-PCs) of the ACC strongly, reproducibly, and selectively decreased challenge-induced impulsivity. Differential gene expression analysis across murine ACC cell-types and 402 GPCRs revealed that, among Gi-coupled receptor-encoding genes, Grm2 is the most selectively expressed in L5-PCs while alternative targets were scarce. Validating our approach, we confirmed that mGluR2 activation reduced premature responding. These results suggest Gi-coupled receptors in ACC L5-PCs as therapeutic targets for impulse control disorders.

