scholarly journals propeller: testing for differences in cell type proportions in single cell data

2021 ◽  
Author(s):  
Belinda Phipson ◽  
Choon Boon Sim ◽  
Enzo R. Porrello ◽  
Alex W Hewitt ◽  
Joseph Powell ◽  
...  

Single cell RNA Sequencing (scRNA-seq) has rapidly gained popularity over the last few years for profiling the transcriptomes of thousands to millions of single cells. To date, there are more than a thousand software packages that have been developed to analyse scRNA-seq data. These focus predominantly on visualization, dimensionality reduction and cell type identification. Single cell technology is now being used to analyse experiments with complex designs including biological replication. One question that can be asked from single cell experiments which has not been possible to address with bulk RNA-seq data is whether the cell type proportions are different between two or more experimental conditions. As well as gene expression changes, the relative depletion or enrichment of a particular cell type can be the functional consequence of disease or treatment. However, cell type proportions estimates from scRNA-seq data are variable and statistical methods that can correctly account for different sources of variability are needed to confidently identify statistically significant shifts in cell type composition between experimental conditions. We present propeller, a robust and flexible method that leverages biological replication to find statistically significant differences in cell type proportions between groups. The propeller method is publicly available in the open source speckle R package (https://github.com/Oshlack/speckle).

2016 ◽  
Author(s):  
Benedict Anchang ◽  
Sylvia K. Plevritis

AbstractCell sorting or gating homogenous subpopulations from single-cell data enables cell-type specific characterization, such as cell-type genomic profiling as well as the study of tumor progression. This highlight summarizes recently developed automated gating algorithms that are optimized for both population identification and sorting homogeneous single cells in heterogeneous single-cell data. Data-driven gating strategies identify and/or sort homogeneous subpopulations from a heterogeneous population without relying on expert knowledge thereby removing human bias and variability. We further describe an optimized cell sorting strategy called CCAST based on Clustering, Classification and Sorting Trees which identifies the relevant gating markers, gating hierarchy and partitions that define underlying cell subpopulations. CCAST identifies more homogeneous subpopulations in several applications compared to prior sorting strategies and reveals simultaneous intracellular signaling across different lineage subtypes under different experimental conditions.


2020 ◽  
Author(s):  
Benjamin Chidester ◽  
Tianming Zhou ◽  
Jian Ma

AbstractSpatial transcriptomics technologies promise to reveal spatial relationships of cell-type composition in complex tissues. However, the development of computational methods that capture the unique properties of single-cell spatial transcriptome data to unveil cell identities remains a challenge. Here, we report SpiceMix, a new probabilistic model that enables effective joint analysis of spatial information and gene expression of single cells based on spatial transcriptome data. Both simulation and real data evaluations demonstrate that SpiceMix consistently improves upon the inference of the intrinsic cell types compared with existing approaches. As a proof-of-principle, we use SpiceMix to analyze single-cell spatial transcriptome data of the mouse primary visual cortex acquired by seqFISH+ and STARmap. We find that SpiceMix can improve cell identity assignments and uncover potentially new cell subtypes. SpiceMix is a generalizable framework for analyzing spatial transcriptome data that may provide critical insights into the cell-type composition and spatial organization of cells in complex tissues.


2020 ◽  
Author(s):  
Jinjin Tian ◽  
Jiebiao Wang ◽  
Kathryn Roeder

AbstractMotivationGene-gene co-expression networks (GCN) are of biological interest for the useful information they provide for understanding gene-gene interactions. The advent of single cell RNA-sequencing allows us to examine more subtle gene co-expression occurring within a cell type. Many imputation and denoising methods have been developed to deal with the technical challenges observed in single cell data; meanwhile, several simulators have been developed for benchmarking and assessing these methods. Most of these simulators, however, either do not incorporate gene co-expression or generate co-expression in an inconvenient manner.ResultsTherefore, with the focus on gene co-expression, we propose a new simulator, ESCO, which adopts the idea of the copula to impose gene co-expression, while preserving the highlights of available simulators, which perform well for simulation of gene expression marginally. Using ESCO, we assess the performance of imputation methods on GCN recovery and find that imputation generally helps GCN recovery when the data are not too sparse, and the ensemble imputation method works best among leading methods. In contrast, imputation fails to help in the presence of an excessive fraction of zero counts, where simple data aggregating methods are a better choice. These findings are further verified with mouse and human brain cell data.AvailabilityThe ESCO implementation is available as R package SplatterESCO (https://github.com/JINJINT/SplatterESCO)[email protected]


2021 ◽  
Author(s):  
Yee Voan Teo ◽  
Ashley E Webb ◽  
Nicola Neretti

Compositional and transcriptional changes in the hematopoietic system have been used as biomarkers of immunosenescence and aging. Here, we use single-cell RNA-sequencing to study the aging peripheral blood in mice, and characterize the changes in cell-type composition and transcriptional profiles associated with age. We identified 17 clusters from a total of 14,588 single cells. We detected a general upregulation of antigen processing and presentation and chemokine signaling pathways and a downregulation of genes involved in ribosome pathways with age. We also observed increased percentage of cells expressing markers of senescence, Cdkn1a and Cdkn2a, in old peripheral blood. In addition, we detected a cluster of activated T cells that are exclusively found in old blood, with lower expression of Cd28 and higher expression of Bcl2 and Cdkn2a, suggesting that the cells are senescent and resistant to apoptosis.


2021 ◽  
Author(s):  
Wenjing Ma ◽  
Sumeet Sharma ◽  
Peng Jin ◽  
Shannon L Gourley ◽  
Zhaohui Qin

The rapid proliferation of single-cell RNA-sequencing (scRNA-seq) datasets have revealed cell heterogeneity at unprecedented scales. Several deconvolution methods have been developed to decompose bulk experiments to reveal cell type contributions. However, these methods lack power in identifying the accurate cell type composition when having a considerable amount of sub-cell types in the reference dataset. Here, we present LRcell, a R Bioconductor package (http://bioconductor.org/packages/release/bioc/html/LRcell.html) aiming to identify specific sub-cell type(s) that drives the changes observed in a bulk RNA-seq differential gene expression experiment. In addition, LRcell provides pre-embedded marker genes computed from putative single-cell RNA-seq experiments as options to execute the analyses.


2018 ◽  
Author(s):  
Martin Pirkl ◽  
Niko Beerenwinkel

AbstractMotivationNew technologies allow for the elaborate measurement of different traits of single cells. These data promise to elucidate intra-cellular networks in unprecedented detail and further help to improve treatment of diseases like cancer. However, cell populations can be very heterogeneous.ResultsWe developed a mixture of Nested Effects Models (M&NEM) for single-cell data to simultaneously identify different cellular sub-populations and their corresponding causal networks to explain the heterogeneity in a cell population. For inference, we assign each cell to a network with a certain probability and iteratively update the optimal networks and cell probabilities in an Expectation Maximization scheme. We validate our method in the controlled setting of a simulation study and apply it to three data sets of pooled CRISPR screens generated previously by two novel experimental techniques, namely Crop-Seq and Perturb-Seq.AvailabilityThe mixture Nested Effects Model (M&NEM) is available as the R-package mnem at https://github.com/cbgethz/mnem/[email protected], [email protected] informationSupplementary data are available.online.


Author(s):  
Francisco Avila Cobos ◽  
José Alquicira-Hernandez ◽  
Joseph Powell ◽  
Pieter Mestdagh ◽  
Katleen De Preter

AbstractMany computational methods to infer cell type proportions from bulk transcriptomics data have been developed. Attempts comparing these methods revealed that the choice of reference marker signatures is far more important than the method itself. However, a thorough evaluation of the combined impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the results is still lacking.Using different single-cell RNA-sequencing (scRNA-seq) datasets, we generated hundreds of pseudo-bulk mixtures to evaluate the combined impact of these factors on the deconvolution results. Along with methods to perform deconvolution of bulk RNA-seq data we also included five methods specifically designed to infer the cell type composition of bulk data using scRNA-seq data as reference.Both bulk and single-cell deconvolution methods perform best when applied to data in linear scale and the choice of normalization can have a dramatic impact on the performance of some, but not all methods. Overall, single-cell methods have comparable performance to the best performing bulk methods and bulk methods based on semi-supervised approaches showed higher error and lower correlation values between the computed and the expected proportions. Moreover, failure to include cell types in the reference that are present in a mixture always led to substantially worse results, regardless of any of the previous choices. Taken together, we provide a thorough evaluation of the combined impact of the different factors affecting the computational deconvolution task across different datasets and propose general guidelines to maximize its performance.


2019 ◽  
Author(s):  
Evan Appleton ◽  
Noushin Mehdipour ◽  
Tristan Daifuku ◽  
Demarcus Briers ◽  
Iman Haghighi ◽  
...  

AbstractMulti-cellular organisms originate from a single cell, ultimately giving rise to mature organisms of heterogeneous cell type composition in complex structures. Recent work in the areas of stem cell biology and tissue engineering have laid major groundwork in the ability to convert certain types of cells into other types, but there has been limited progress in the ability to control the morphology of cellular masses as they grow. Contemporary approaches to this problem have included the use of artificial scaffolds, 3D bioprinting, and complex media formulations, however, there are no existing approaches to controlling this process purely through genetics and from a single-cell starting point. Here we describe a computer-aided design approach for designing recombinase-based genetic circuits for controlling the formation of multi-cellular masses into arbitrary shapes in human cells.


2019 ◽  
Vol 16 (4) ◽  
pp. 311-314 ◽  
Author(s):  
Yue Deng ◽  
Feng Bao ◽  
Qionghai Dai ◽  
Lani F. Wu ◽  
Steven J. Altschuler

2019 ◽  
Author(s):  
Mahmoud M Ibrahim ◽  
Rafael Kramann

ABSTRACTMarker genes identified in single cell experiments are expected to be highly specific to a certain cell type and highly expressed in that cell type. Detecting a gene by differential expression analysis does not necessarily satisfy those two conditions and is typically computationally expensive for large cell numbers.Here we present genesorteR, an R package that ranks features in single cell data in a manner consistent with the expected definition of marker genes in experimental biology research. We benchmark genesorteR using various data sets and show that it is distinctly more accurate in large single cell data sets compared to other methods. genesorteR is orders of magnitude faster than current implementations of differential expression analysis methods, can operate on data containing millions of cells and is applicable to both single cell RNA-Seq and single cell ATAC-Seq data.genesorteR is available at https://github.com/mahmoudibrahim/genesorteR.


Sign in / Sign up

Export Citation Format

Share Document