tascCODA: Bayesian Tree-Aggregated Analysis of Compositional Amplicon and Single-Cell Data

2021 ◽  
Vol 12 ◽  
Author(s):  
Johannes Ostner ◽  
Salomé Carcy ◽  
Christian L. Müller

Accurate generative statistical modeling of count data is of critical relevance for the analysis of biological datasets from high-throughput sequencing technologies. Important instances include the modeling of microbiome compositions from amplicon sequencing surveys and the analysis of cell type compositions derived from single-cell RNA sequencing. Microbial and cell type abundance data share remarkably similar statistical features, including their inherent compositionality and a natural hierarchical ordering of the individual components from taxonomic or cell lineage tree information, respectively. To this end, we introduce a Bayesian model for tree-aggregated amplicon and single-cell compositional data analysis (tascCODA) that seamlessly integrates hierarchical information and experimental covariate data into the generative modeling of compositional count data. By combining latent parameters based on the tree structure with spike-and-slab Lasso penalization, tascCODA can determine covariate effects across different levels of the population hierarchy in a data-driven parsimonious way. In the context of differential abundance testing, we validate tascCODA’s excellent performance on a comprehensive set of synthetic benchmark scenarios. Our analyses on human single-cell RNA-seq data from ulcerative colitis patients and amplicon data from patients with irritable bowel syndrome, respectively, identified aggregated cell type and taxon compositional changes that were more predictive and parsimonious than those proposed by other schemes. We posit that tascCODA constitutes a valuable addition to the growing statistical toolbox for generative modeling and analysis of compositional changes in microbial or cell population data.
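The idea of placing latent effects on tree nodes and aggregating them down to the leaves can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy tree, the `ancestor` matrix, the function names, and the softmax link from log-scale effects to expected proportions are all assumptions made for the example. The spike-and-slab Lasso prior is not modeled here; the sparse `node_effects` vector simply stands in for the kind of parsimonious solution such a prior is meant to favor.

```python
import numpy as np

# Hypothetical toy tree (an assumption for illustration):
#   root -> {A, B};  A -> {a1, a2};  B -> {b1}
# Rows are leaves (a1, a2, b1); columns are nodes (A, B, a1, a2, b1).
# An entry is 1 when the node is the leaf itself or one of its ancestors.
ancestor = np.array([
    [1, 0, 1, 0, 0],  # a1 sits under A
    [1, 0, 0, 1, 0],  # a2 sits under A
    [0, 1, 0, 0, 1],  # b1 sits under B
], dtype=float)


def leaf_effects(node_effects):
    """Sum each leaf's own effect and the effects of all its ancestors."""
    return ancestor @ np.asarray(node_effects, dtype=float)


def expected_composition(base, node_effects, x):
    """Map log-scale effects to proportions with a softmax (assumed link)."""
    logits = np.asarray(base, dtype=float) + x * leaf_effects(node_effects)
    z = np.exp(logits - logits.max())  # subtract max for numerical stability
    return z / z.sum()


# One nonzero effect on the internal node A shifts both of its leaves at once.
node_effects = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
base = np.zeros(3)
control = expected_composition(base, node_effects, x=0.0)
treated = expected_composition(base, node_effects, x=1.0)
```

In this sketch a single parameter on node A explains a joint change in a1 and a2, which is the sense in which a tree-aggregated description can be more parsimonious than one effect per leaf.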

2021 ◽  
Author(s):  
Johannes Ostner ◽  
Salomé Carcy ◽  
Christian Lorenz Müller

Accurate generative statistical modeling of count data is of critical relevance for the analysis of biological datasets from high-throughput sequencing technologies. Important instances include the modeling of microbiome compositions from amplicon sequencing surveys and the analysis of cell type compositions derived from single-cell RNA sequencing. Microbial and cell type abundance data share remarkably similar statistical features, including their inherent compositionality and a natural hierarchical ordering of the individual components from taxonomic or cell lineage tree information, respectively. To this end, we introduce a Bayesian model for tree-aggregated amplicon and single-cell compositional data analysis (tascCODA) that seamlessly integrates hierarchical information and experimental covariate data into the generative modeling of compositional count data. By combining latent parameters based on the tree structure with spike-and-slab Lasso penalization, tascCODA can determine covariate effects across different levels of the population hierarchy in a data-driven parsimonious way. In the context of differential abundance testing, we validate tascCODA's excellent performance on a comprehensive set of synthetic benchmark scenarios. Our analyses on human single-cell RNA-seq data from ulcerative colitis patients and amplicon data from patients with irritable bowel syndrome, respectively, identified aggregated cell type and taxon compositional changes that were more predictive and parsimonious than those proposed by other schemes. We posit that tascCODA constitutes a valuable addition to the growing statistical toolbox for generative modeling and analysis of compositional changes in microbial or cell population data.


2019 ◽  
Author(s):  
Minjie Hu ◽  
Xiaobin Zheng ◽  
Chen-Ming Fan ◽  
Yixian Zheng

Many hard and soft corals harbor algae for photosynthesis. The algae live inside coral cells in a specialized membrane compartment called the symbiosome, which shares the photosynthetically fixed carbon with coral host cells, while host cells provide inorganic carbon for photosynthesis [1]. This endosymbiotic relationship is critical for corals, but increased environmental stresses are causing corals to expel their endosymbiotic algae, a process known as coral bleaching, leading to coral death and degradation of marine ecosystems [2]. To date, the molecular pathways that orchestrate algal recognition, uptake, and maintenance in coral cells remain poorly understood. We report a chromosome-level genome assembly of a fast-growing soft coral, Xenia species (sp.) [3], and its use as a model to decipher the coral-algae endosymbiosis. Single cell RNA-sequencing (scRNA-seq) identified 13 cell types, including gastrodermis and cnidocytes, in Xenia sp. Importantly, we identified the endosymbiotic cell type that expresses a unique set of genes implicated in the recognition, phagocytosis/endocytosis, maintenance of algae, and host coral cell immune modulation. By applying scRNA-seq to investigate algal uptake in our new Xenia sp. regeneration model, we uncovered a dynamic lineage progression from an endosymbiotic progenitor state to mature endosymbiotic and post-endosymbiotic cell states. The evolutionarily conserved genes associated with the endosymbiotic process reported herein open the door to deciphering common principles by which different corals take up and expel their endosymbionts. Our study demonstrates the potential of single cell analyses to examine the similarities and differences of the endosymbiotic lifestyle among different coral species.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
M. Büttner ◽  
J. Ostner ◽  
C. L. Müller ◽  
F. J. Theis ◽  
B. Schubert

Compositional changes of cell types are main drivers of biological processes. Their detection through single-cell experiments is difficult due to the compositionality of the data and low sample sizes. We introduce scCODA (https://github.com/theislab/scCODA), a Bayesian model that addresses these issues and enables the study of complex cell type effects in disease and in response to other stimuli. scCODA demonstrated excellent detection performance, while reliably controlling for false discoveries, and identified experimentally verified cell type changes that were missed in original analyses.
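Why compositionality complicates detection can be seen with a small worked example. The counts below are made up for illustration and do not come from the paper, and the helper `proportions` is a hypothetical name; the sketch does not use or represent the scCODA API.

```python
import numpy as np


def proportions(counts):
    """Convert absolute cell counts into relative abundances."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()


# Invented absolute counts for three cell types (illustration only).
control = np.array([100, 100, 100])
stimulated = np.array([300, 100, 100])  # only the first cell type expands

p_control = proportions(control)    # one third each
p_stim = proportions(stimulated)    # 0.6, 0.2, 0.2
```

Only the first cell type changes in absolute terms, yet the proportions of the other two drop from one third to one fifth. Testing each cell type's proportion in isolation could therefore flag all three as changed, which is the kind of false discovery a joint compositional model is meant to avoid.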


2020 ◽  
Vol 117 (25) ◽  
pp. 13886-13895 ◽  
Author(s):  
August Yue Huang ◽  
Pengpeng Li ◽  
Rachel E. Rodin ◽  
Sonia N. Kim ◽  
Yanmei Dou ◽  
...  

Elucidating the lineage relationships among different cell types is key to understanding human brain development. Here we developed parallel RNA and DNA analysis after deep sequencing (PRDD-seq), which combines RNA analysis of neuronal cell types with analysis of nested spontaneous DNA somatic mutations as cell lineage markers, identified from joint analysis of single-cell and bulk DNA sequencing by single-cell MosaicHunter (scMH). PRDD-seq enables simultaneous reconstruction of neuronal cell type, cell lineage, and sequential neuronal formation (“birthdate”) in postmortem human cerebral cortex. Analysis of two human brains showed remarkable quantitative details that relate mutation mosaic frequency to clonal patterns, confirming an early divergence of precursors for excitatory and inhibitory neurons, and an “inside-out” layer formation of excitatory neurons as seen in other species. In addition, our analysis allows an estimate of the number of excitatory neuron-restricted precursors (about 10) that generate the excitatory neurons within a cortical column. Inhibitory neurons showed complex, subtype-specific patterns of neurogenesis, including some patterns of development conserved relative to mouse, but also some aspects of primate cortical interneuron development not seen in mouse. PRDD-seq can be broadly applied to characterize cell identity and lineage from diverse archival samples with single-cell resolution and in potentially any developmental or disease condition.


2020 ◽  
Author(s):  
August Yue Huang ◽  
Pengpeng Li ◽  
Rachel E. Rodin ◽  
Sonia N. Kim ◽  
Yanmei Dou ◽  
...  

Elucidating the lineage relationships among different cell types is key to understanding human brain development. Here we developed Parallel RNA and DNA analysis after Deep-sequencing (PRDD-seq), which combines RNA analysis of neuronal cell types with analysis of nested spontaneous DNA somatic mutations as cell lineage markers, identified from joint analysis of single cell and bulk DNA sequencing by single-cell MosaicHunter (scMH). PRDD-seq enables the first-ever simultaneous reconstruction of neuronal cell type, cell lineage, and sequential neuronal formation (“birthdate”) in postmortem human cerebral cortex. Analysis of two human brains showed remarkable quantitative details that relate mutation mosaic frequency to clonal patterns, confirming an early divergence of precursors for excitatory and inhibitory neurons, and an “inside-out” layer formation of excitatory neurons as seen in other species. In addition, our analysis allows the first estimate of the number of excitatory neuron-restricted precursors (about 10) that generate the excitatory neurons within a cortical column. Inhibitory neurons showed complex, subtype-specific patterns of neurogenesis, including some patterns of development conserved relative to mouse, but also some aspects of primate cortical interneuron development not seen in mouse. PRDD-seq can be broadly applied to characterize cell identity and lineage from diverse archival samples with single-cell resolution and in potentially any developmental or disease condition.
Significance Statement: Stem cells and progenitors undergo a series of cell divisions to generate the neurons of the brain, and understanding this sequence is critical to studying the mechanisms that control cell division and migration in the developing brain. Mutations that occur as cells divide are known as the basis of cancer, but have more recently been shown to occur with normal cell divisions, creating a permanent, forensic map of the clonal patterns that define the brain. Here we develop new technology to analyze both DNA mutations and RNA gene expression patterns in single cells from human postmortem brain, allowing us to define clonal patterns among different types of human brain neurons, gaining the first direct insight into how they form.


2020 ◽  
Author(s):  
Zhuoxin Chen ◽  
Chang Ye ◽  
Zhan Liu ◽  
Shanjun Deng ◽  
Xionglei He ◽  
...  

It has been challenging to characterize the lineage relationships among cells in vertebrates, which comprise a great number of cells. Fortunately, recent progress has been made by combining the CRISPR barcoding system with single-cell sequencing technologies to provide an unprecedented opportunity to track lineage at single-cell resolution. However, due to errors and/or dropouts introduced by amplification and sequencing, reconstruction of accurate lineage relationships in complex organisms remains a challenge. Thus, improvements in both experimental design and computational analysis are necessary for lineage inference. In this study, we employed single-cell Lineage tracing On Endogenous Scarring Sites (scLOESS), a lineage recording strategy based on the CRISPR-Cas9 system, to trace cell fate commitments in zebrafish larvae. With rigorous quality control, we demonstrated that lineage commitments of complex organisms could be inferred from a limited number of barcoding sites. Together with cell-type characterization, our method could homogeneously recover lineage information. By combining the cell-type and lineage information, we depicted the developmental histories of germ layers as well as cell types. Furthermore, when combined with trajectory analysis, our methods could capture and resolve the ongoing lineage commitment events to gain further biological insights into later development and differentiation in complex organisms.


2020 ◽  
Author(s):  
M. Büttner ◽  
J. Ostner ◽  
C. L. Müller ◽  
F. J. Theis ◽  
B. Schubert

Compositional changes of cell types are main drivers of biological processes. Their detection through single-cell experiments is difficult due to the compositionality of the data and low sample sizes. We introduce scCODA (https://github.com/theislab/scCODA), a Bayesian model that addresses these issues and enables the study of complex cell type effects in disease and in response to other stimuli. scCODA demonstrated excellent detection performance and identified experimentally verified cell type changes that were missed in original analyses.


2018 ◽  
Author(s):  
Jun Ding ◽  
Chieh Lin ◽  
Ziv Bar-Joseph

Several recent studies focus on the inference of developmental and response trajectories from single cell RNA-Seq (scRNA-Seq) data. A number of computational methods, often referred to as pseudo-time ordering, have been developed for this task. Recently, CRISPR has also been used to reconstruct lineage trees by inserting random mutations. However, both approaches suffer from drawbacks that limit their use. Here we develop a method to detect significant, cell type specific, sequence mutations from scRNA-Seq data. We show that only a few mutations are enough for reconstructing good branching models. Integrating these mutations with expression data further improves the accuracy of the reconstructed models. As we show, the majority of mutations we identify are likely RNA editing events, indicating that such information can be used to distinguish cell types.


Biomedicines ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 368
Author(s):  
Shi-Xun Ma ◽  
Su Bin Lim

Single-cell and single-nucleus RNA sequencing (sc/snRNA-seq) technologies have enhanced the understanding of the molecular pathogenesis of neurodegenerative disorders, including Parkinson’s disease (PD). Nonetheless, their application in PD has been limited, mainly due to the technical challenges resulting from the scarcity of postmortem brain tissue and low quality associated with RNA degradation. Despite such challenges, recent advances in animal and human in vitro models that recapitulate features of PD, along with sequencing assays, have fueled studies aiming to obtain an unbiased and global view of cellular composition and phenotype of PD at single-cell resolution. Here, we reviewed recent sc/snRNA-seq efforts that have successfully characterized diverse cell-type populations and identified cell type-specific disease associations in PD. We also examined how these studies have employed computational and analytical tools to analyze and interpret the rich information derived from sc/snRNA-seq. Finally, we highlighted important limitations and emerging technologies for addressing key technical challenges currently limiting the integration of new findings into clinical practice.

