SC2disease: a manually curated database of single-cell transcriptome for human diseases

2020 ◽  
Vol 49 (D1) ◽  
pp. D1413-D1419 ◽  
Author(s):  
Tianyi Zhao ◽  
Shuxuan Lyu ◽  
Guilin Lu ◽  
Liran Juan ◽  
Xi Zeng ◽  
...  

Abstract SC2disease (http://easybioai.com/sc2disease/) is a manually curated database that aims to provide a comprehensive and accurate resource of gene expression profiles in various cell types for different diseases. With the development of single-cell RNA sequencing (scRNA-seq) technologies, uncovering the cellular heterogeneity of different tissues in different diseases has become feasible by profiling transcriptomes across cell types at the cellular level. In particular, comparing gene expression profiles between different cell types and identifying cell-type-specific genes in various diseases offers new possibilities to address biological and medical questions. However, systematic, hierarchical and large-scale databases of gene expression profiles in human diseases at the cellular level are lacking. Thus, we reviewed the literature prior to March 2020 for studies that used scRNA-seq to study diseases with human samples, and developed the SC2disease database to summarize all the data by disease, tissue and cell type. SC2disease documents 946 481 entries, corresponding to 341 cell types, 29 tissues and 25 diseases. Each entry in the SC2disease database contains comparisons of differentially expressed genes between different cell types, tissues and disease-related health status. Furthermore, we reanalyzed the gene expression matrices with a unified pipeline to improve the comparability between different studies. For each disease, we also compare cell-type-specific genes with the corresponding genes of lead single nucleotide polymorphisms (SNPs) identified in genome-wide association studies (GWAS) to implicate cell type specificity of the traits.
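
The abstract does not say how cell-type-specific genes are compared with GWAS lead-SNP genes. One common way to quantify such a comparison is an overlap enrichment test; the sketch below is an illustration under that assumption and is not the SC2disease pipeline. The function name `overlap_pvalue` and all gene identifiers are hypothetical placeholders.

```python
from math import comb

def overlap_pvalue(n_universe, n_marker, n_gwas, n_overlap):
    """One-sided hypergeometric tail: P(overlap >= n_overlap) by chance.

    n_universe: number of genes that could have been picked
    n_marker:   number of cell-type-specific genes (hypothetical input)
    n_gwas:     number of genes mapped from GWAS lead SNPs (hypothetical input)
    n_overlap:  number of genes found in both sets
    """
    total = comb(n_universe, n_gwas)
    upper = min(n_marker, n_gwas)
    tail = sum(
        comb(n_marker, k) * comb(n_universe - n_marker, n_gwas - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Placeholder gene sets; real symbols would come from the database and a GWAS catalog.
universe = {f"GENE_{i}" for i in range(10)}
marker_genes = {"GENE_0", "GENE_1", "GENE_2", "GENE_3", "GENE_4"}
gwas_genes = {"GENE_0", "GENE_1", "GENE_2", "GENE_3", "GENE_4"}

shared = marker_genes & gwas_genes
p_value = overlap_pvalue(len(universe), len(marker_genes), len(gwas_genes), len(shared))
```

A small p-value here would only suggest that the overlap is larger than expected by chance for the chosen gene universe; the choice of universe strongly affects the result.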

Author(s):  
Johan Gustafsson ◽  
Felix Held ◽  
Jonathan Robinson ◽  
Elias Björnson ◽  
Rebecka Jörnsten ◽  
...  

Abstract Background: Cell-type specific gene expression profiles are needed for many computational methods operating on bulk RNA-Seq samples, such as deconvolution of cell-type fractions and digital cytometry. However, the gene expression profile of a cell type can vary substantially due to both technical factors and biological differences in cell state and surroundings, reducing the efficacy of such methods. Here, we investigated which factors contribute most to this variation. Results: We evaluated different normalization methods, quantified the magnitude of variation introduced by different sources, and examined the differences between UMI-based single-cell RNA-Seq and bulk RNA-Seq. We applied methods such as random forest regression to a collection of publicly available bulk and single-cell RNA-Seq datasets containing B and T cells, and found that the technical variation across laboratories is of the same magnitude as the biological variation across cell types. Tissue of origin and cell subtype are less important but still substantial factors, while the difference between individuals is relatively small. We also show that many of the differences between UMI-based single-cell and bulk RNA-Seq methods can be explained by the number of read duplicates per mRNA molecule in the single-cell sample. Conclusions: Our work shows the importance of either matching or correcting for technical factors when creating cell-type specific gene expression profiles that are to be used together with bulk samples.


Author(s):  
Meng Zhang ◽  
Stephen W. Eichhorn ◽  
Brian Zingg ◽  
Zizhen Yao ◽  
Hongkui Zeng ◽  
...  

Abstract A mammalian brain is comprised of numerous cell types organized in an intricate manner to form functional neural circuits. Single-cell RNA sequencing provides a powerful approach to identify cell types based on their gene expression profiles and has revealed many distinct cell populations in the brain [1-3]. Single-cell epigenomic profiling [4,5] further provides information on gene-regulatory signatures of different cell types. Understanding how different cell types contribute to brain function, however, requires knowledge of their spatial organization and connectivity, which is not preserved in sequencing-based methods that involve cell dissociation [3,6]. Here, we used an in situ single-cell transcriptome-imaging method, multiplexed error-robust fluorescence in situ hybridization (MERFISH) [7], to generate a molecularly defined and spatially resolved cell atlas of the mouse primary motor cortex (MOp). We profiled ∼300,000 cells in the MOp, identified 95 neuronal and non-neuronal cell clusters, and revealed a complex spatial map in which not only excitatory neuronal clusters but also most inhibitory neuronal clusters adopted layered organizations. Notably, intratelencephalic (IT) cells, the largest branch of neurons in the MOp, formed a continuous spectrum of cells with gradual changes in both gene expression profiles and cortical depth positions in a highly correlated manner. Furthermore, we integrated MERFISH with retrograde tracing to probe the projection targets for different MOp neuronal cell types and found that projections of MOp neurons to other cortical regions formed a many-to-many network with each target region receiving input preferentially from a different composition of IT clusters. Overall, our results provide a high-resolution spatial and projection map of molecularly defined cell types in the MOp. We anticipate that the imaging platform described here can be broadly applied to create high-resolution cell atlases of a wide range of systems.


2020 ◽  
Author(s):  
Johan Gustafsson ◽  
Felix Held ◽  
Jonathan Robinson ◽  
Elias Björnson ◽  
Rebecka Jörnsten ◽  
...  

Abstract Cell-type specific gene expression profiles are needed for many computational methods operating on bulk RNA-Seq samples, such as deconvolution of cell-type fractions and digital cytometry. However, the gene expression profile of a cell type can vary substantially due to both technical factors and biological differences in cell state and surroundings, reducing the efficacy of such methods. Here, we investigated which factors contribute most to this variation. We evaluated different normalization methods, quantified the variance explained by different factors, evaluated the effect on deconvolution of cell type fractions, and examined the differences between UMI-based single-cell RNA-Seq and bulk RNA-Seq. We investigated a collection of publicly available bulk and single-cell RNA-Seq datasets containing B and T cells, and found that the technical variation across laboratories is substantial, even for genes specifically selected for deconvolution, and has a confounding effect on deconvolution. Tissue of origin is also a substantial factor, highlighting the challenge of applying cell type profiles derived from blood on mixtures from other tissues. We also show that much of the differences between UMI-based single-cell and bulk RNA-Seq methods can be explained by the number of read duplicates per mRNA molecule in the single-cell sample. Our work shows the importance of either matching or correcting for technical factors when creating cell-type specific gene expression profiles that are to be used together with bulk samples.


2017 ◽  
Author(s):  
Lingxue Zhu ◽  
Jing Lei ◽  
Bernie Devlin ◽  
Kathryn Roeder

Recent advances in technology have enabled the measurement of RNA levels for individual cells. Compared to traditional tissue-level bulk RNA-seq data, single cell sequencing yields valuable insights about gene expression profiles for different cell types, which is potentially critical for understanding many complex human diseases. However, developing quantitative tools for such data remains challenging because of high levels of technical noise, especially the “dropout” events. A “dropout” happens when the RNA for a gene fails to be amplified prior to sequencing, producing a “false” zero in the observed data. In this paper, we propose a Unified RNA-Sequencing Model (URSM) for both single cell and bulk RNA-seq data, formulated as a hierarchical model. URSM borrows the strength from both data sources and carefully models the dropouts in single cell data, leading to a more accurate estimation of cell type specific gene expression profile. In addition, URSM naturally provides inference on the dropout entries in single cell data that need to be imputed for downstream analyses, as well as the mixing proportions of different cell types in bulk samples. We adopt an empirical Bayes approach, where parameters are estimated using the EM algorithm and approximate inference is obtained by Gibbs sampling. Simulation results illustrate that URSM outperforms existing approaches both in correcting for dropouts in single cell data, as well as in deconvolving bulk samples. We also demonstrate an application to gene expression data on fetal brains, where our model successfully imputes the dropout genes and reveals cell type specific expression patterns.
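
To make the notion of a "dropout" concrete, the following sketch simulates false zeros in single-cell counts. It is a toy simulation and not URSM itself: the logistic link between expression level and the chance of being observed, the parameter names `kappa` and `tau`, and all numeric values are assumptions chosen for illustration.

```python
import numpy as np

def simulate_dropout(true_counts, kappa=-1.0, tau=1.5, rng=None):
    """Zero out entries at random; highly expressed genes drop out less often.

    kappa and tau are hypothetical parameters of an assumed logistic link.
    Returns the observed counts and a boolean mask of dropped entries.
    """
    rng = np.random.default_rng() if rng is None else rng
    logit = kappa + tau * np.log1p(true_counts)
    p_observed = 1.0 / (1.0 + np.exp(-logit))
    kept = rng.random(true_counts.shape) < p_observed
    return np.where(kept, true_counts, 0), ~kept

rng = np.random.default_rng(0)
mean_expr = rng.gamma(shape=2.0, scale=2.0, size=200)   # per-gene mean expression
true_counts = rng.poisson(mean_expr, size=(50, 200))    # 50 cells x 200 genes
observed, dropped = simulate_dropout(true_counts, rng=rng)

# "False" zeros are entries that were truly expressed but observed as zero.
false_zero_rate = np.mean((observed == 0) & (true_counts > 0))
```

Comparing `observed` with `true_counts` shows why imputing dropout entries matters: the observed matrix contains more zeros than the underlying expression would produce.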


2020 ◽  
Vol 7 (5) ◽  
pp. 881-896 ◽  
Author(s):  
Dongxu He ◽  
Aiqin Mao ◽  
Chang-Bo Zheng ◽  
Hao Kan ◽  
Ka Zhang ◽  
...  

Abstract The aorta, with ascending, arch, thoracic and abdominal segments, responds to the heartbeat, senses metabolites and distributes blood to all parts of the body. However, the heterogeneity across aortic segments and how metabolic pathologies change it are not known. Here, a total of 216 612 individual cells from the ascending aorta, aortic arch, and thoracic and abdominal segments of mouse aortas under normal conditions or with high blood glucose levels, high dietary salt, or high fat intake were profiled using single-cell RNA sequencing. We generated a compendium of 10 distinct cell types, mainly endothelial (EC), smooth muscle (SMC), stromal and immune cells. The distributions of the different cells and their intercommunication were influenced by the hemodynamic microenvironment across anatomical segments, and the spatial heterogeneity of ECs and SMCs may contribute to differential vascular dilation and constriction that were measured by wire myography. Importantly, the composition of aortic cells, their gene expression profiles and their regulatory intercellular networks broadly changed in response to high fat/salt/glucose conditions. Notably, the abdominal aorta showed the most dramatic changes in cellular composition, particularly involving ECs, fibroblasts and myeloid cells with cardiovascular risk factor-related regulons and gene expression networks. Our study elucidates the nature and range of aortic cell diversity, with implications for the treatment of metabolic pathologies.


2019 ◽  
Author(s):  
Alexandra Grubman ◽  
Gabriel Chew ◽  
John F. Ouyang ◽  
Guizhi Sun ◽  
Xin Yi Choo ◽  
...  

Abstract Alzheimer’s disease (AD) is a heterogeneous disease that is largely dependent on the complex cellular microenvironment in the brain. This complexity impedes our understanding of how individual cell types contribute to disease progression and outcome. To characterize the molecular and functional cell diversity in the human AD brain we utilized single nuclei RNA-seq in AD and control patient brains in order to map the landscape of cellular heterogeneity in AD. We detail gene expression changes at the level of cells and cell subclusters, highlighting specific cellular contributions to global gene expression patterns between control and Alzheimer’s patient brains. We observed distinct cellular regulation of APOE which was repressed in oligodendrocyte progenitor cells (OPCs) and astrocyte AD subclusters, and highly enriched in a microglial AD subcluster. In addition, oligodendrocyte and microglia AD subclusters show discordant expression of APOE. Integration of transcription factor regulatory modules with downstream GWAS gene targets revealed subcluster-specific control of AD cell fate transitions. For example, this analysis uncovered that astrocyte diversity in AD was under the control of transcription factor EB (TFEB), a master regulator of lysosomal function, which initiated a regulatory cascade containing multiple AD GWAS genes. These results establish functional links between specific cellular sub-populations in AD, and provide new insights into the coordinated control of AD GWAS genes and their cell-type specific contribution to disease susceptibility. Finally, we created an interactive reference web resource which will help brain and AD researchers explore the molecular architecture of subtype and AD-specific cell identity, molecular and functional diversity at the single cell level.

Highlights
We generated the first human single cell transcriptome in AD patient brains.
Our study unveiled 9 clusters of cell-type specific and common gene expression patterns between control and AD brains, including clusters of genes that present properties of different cell types (i.e. astrocytes and oligodendrocytes).
Our analyses also uncovered functionally specialized sub-cellular clusters: 5 microglial clusters, 8 astrocyte clusters, 6 neuronal clusters, 6 oligodendrocyte clusters, 4 OPC and 2 endothelial clusters, each enriched for specific ontological gene categories.
Our analyses found manifold AD GWAS genes specifically associated with one cell-type, and sets of AD GWAS genes co-ordinately and differentially regulated between different brain cell-types in AD sub-cellular clusters.
We mapped the regulatory landscape driving transcriptional changes in AD brain, and identified transcription factor networks which we predict to control cell fate transitions between control and AD sub-cellular clusters.
Finally, we provide an interactive web-resource that allows the user to further visualise and interrogate our dataset.

Data resource web interface: http://adsn.ddnetbio.com


2019 ◽  
Author(s):  
Arnav Moudgil ◽  
Michael N. Wilkinson ◽  
Xuhua Chen ◽  
June He ◽  
Alex J. Cammack ◽  
...  

Abstract In situ measurements of transcription factor (TF) binding are confounded by cellular heterogeneity and represent averaged profiles in complex tissues. Single cell RNA-seq (scRNA-seq) is capable of resolving different cell types based on gene expression profiles, but no technology exists to directly link specific cell types to the binding pattern of TFs in those cell types. Here, we present self-reporting transposons (SRTs) and their use in single cell calling cards (scCC), a novel assay for simultaneously capturing gene expression profiles and mapping TF binding sites in single cells. First, we show how the genomic locations of SRTs can be recovered from mRNA. Next, we demonstrate that SRTs deposited by the piggyBac transposase can be used to map the genome-wide localization of the TFs SP1, through a direct fusion of the two proteins, and BRD4, through its native affinity for piggyBac. We then present the scCC method, which maps SRTs from scRNA-seq libraries, thus enabling concomitant identification of cell types and TF binding sites in those same cells. As a proof-of-concept, we show recovery of cell type-specific BRD4 and SP1 binding sites from cultured cells. Finally, we map Brd4 binding sites in the mouse cortex at single cell resolution, thus establishing a new technique for studying TF biology in situ.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Bárbara Andrade Barbosa ◽  
Saskia D. van Asten ◽  
Ji Won Oh ◽  
Arantza Farina-Sarasqueta ◽  
Joanne Verheij ◽  
...  

Abstract Deconvolution of bulk gene expression profiles into the cellular components is pivotal to portraying a tissue’s complex cellular make-up, such as the tumor microenvironment. However, the inherently variable nature of gene expression requires a comprehensive statistical model and reliable prior knowledge of individual cell types that can be obtained from single-cell RNA sequencing. We introduce BLADE (Bayesian Log-normAl Deconvolution), a unified Bayesian framework to estimate both cellular composition and gene expression profiles for each cell type. Unlike previous comprehensive statistical approaches, BLADE can handle > 20 types of cells due to the efficient variational inference. Through an intensive evaluation with > 700 simulated and real datasets, BLADE demonstrated enhanced robustness against gene expression variability and better completeness than conventional methods, in particular, to reconstruct gene expression profiles of each cell type. In summary, BLADE is a powerful tool to unravel heterogeneous cellular activity in complex biological systems from standard bulk gene expression data.
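
For readers new to deconvolution, the sketch below shows the basic problem in its simplest form: given a signature matrix of per-cell-type expression and one bulk profile, estimate non-negative cell-type fractions. This is a plain non-negative least squares baseline solved by projected gradient descent, not BLADE and not a Bayesian model. The function name `nnls_fractions` and all simulated values are assumptions for illustration.

```python
import numpy as np

def nnls_fractions(signature, bulk, n_iter=2000):
    """Estimate non-negative cell-type fractions by projected gradient descent.

    signature: genes x cell types matrix of reference expression
    bulk:      length-genes vector of bulk expression
    """
    gram = signature.T @ signature
    target = signature.T @ bulk
    # Step size 1/L, where L is the largest eigenvalue of the Gram matrix.
    step = 1.0 / np.linalg.eigvalsh(gram).max()
    n_types = signature.shape[1]
    fractions = np.full(n_types, 1.0 / n_types)
    for _ in range(n_iter):
        gradient = gram @ fractions - target
        fractions = np.maximum(0.0, fractions - step * gradient)
    return fractions / fractions.sum()   # rescale so the fractions sum to 1

rng = np.random.default_rng(1)
signature = rng.gamma(shape=2.0, scale=1.0, size=(100, 3))   # simulated reference
true_fractions = np.array([0.6, 0.3, 0.1])
noise = rng.lognormal(mean=0.0, sigma=0.05, size=100)        # multiplicative noise
bulk = (signature @ true_fractions) * noise

estimated = nnls_fractions(signature, bulk)
```

A baseline like this treats the signature matrix as fixed and exact. The variability of gene expression that the abstract emphasizes is precisely what such a baseline ignores.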


2018 ◽  
Author(s):  
Lingxue Zhu ◽  
Jing Lei ◽  
Bernie Devlin ◽  
Kathryn Roeder

Abstract Motivated by the dynamics of development, in which cells of recognizable types, or pure cell types, transition into other types over time, we propose a method of semi-soft clustering that can classify both pure and intermediate cell types from data on gene expression or protein abundance from individual cells. Called SOUP, for Semi-sOft clUstering with Pure cells, this novel algorithm reveals the clustering structure for both pure cells, which belong to one single cluster, as well as transitional cells with soft memberships. SOUP involves a two-step process: identify the set of pure cells and then estimate a membership matrix. To find pure cells, SOUP uses the special block structure the K cell types form in a similarity matrix, devised by pairwise comparison of the gene expression profiles of individual cells. Once pure cells are identified, they provide the key information from which the membership matrix can be computed. SOUP is applicable to general clustering problems as well, as long as the unrestrictive modeling assumptions hold. The performance of SOUP is documented via extensive simulation studies. Using SOUP to analyze two single cell data sets from brain shows that it produces sensible and interpretable results.
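
The second step described above, computing soft memberships once pure cells are known, can be illustrated with a toy example. The sketch below is a simplification and not the SOUP algorithm: it assumes the pure cells have already been identified, uses their mean profiles as cluster centers, and fits each cell as a non-negative mixture of those centers. All names and simulated values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_genes = 50
centers = rng.normal(0.0, 1.0, size=(2, n_genes))      # two simulated pure types

# 20 pure cells of type A, 20 of type B, and 9 transitional cells in between.
true_theta = np.concatenate([np.ones(20), np.zeros(20), np.linspace(0.1, 0.9, 9)])
true_membership = np.column_stack([true_theta, 1.0 - true_theta])
cells = true_membership @ centers + rng.normal(0.0, 0.1, size=(49, n_genes))

# Assumption: the pure-cell indices are given (SOUP's first step would find them).
pure_a, pure_b = np.arange(0, 20), np.arange(20, 40)
est_centers = np.vstack([cells[pure_a].mean(axis=0), cells[pure_b].mean(axis=0)])

def soft_membership(cells, est_centers):
    """Least-squares mixture weights, clipped at zero and scaled to sum to 1."""
    coef, *_ = np.linalg.lstsq(est_centers.T, cells.T, rcond=None)
    coef = np.clip(coef.T, 0.0, None)
    return coef / coef.sum(axis=1, keepdims=True)

membership = soft_membership(cells, est_centers)
```

Each row of `membership` is close to one-hot for pure cells and fractional for transitional cells, which is the distinction that semi-soft clustering is meant to capture.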


2021 ◽  
Author(s):  
Jianbo Li ◽  
Ligang Wang ◽  
Dawei Yu ◽  
Junfeng Hao ◽  
Longchao Zhang ◽  
...  

Thoracolumbar vertebra (TLV) and rib primordium (RP) development is a common evolutionary feature across vertebrates although whole-organism analysis of TLV and RP gene expression dynamics has been lacking. Here we investigated the single-cell transcriptomic landscape of thoracic vertebra (TV), lumbar vertebra (LV), and RP cells from a pig embryo at 27 days post-fertilization (dpf) and identified six cell types with distinct gene-expression signatures. In-depth dissection of the gene-expression dynamics and RNA velocity revealed a coupled process of osteogenesis and angiogenesis during TLV and rib development. Further analysis of cell-type-specific and strand-specific expression uncovered the extremely high levels of HOXA10 3'-UTR sequence specific to osteoblast of LV cells, which may function as anti-HOXA10-antisense by counteracting the HOXA10-antisense effect to determine TLV transition. Thus, this work provides a valuable resource for understanding embryonic osteogenesis and angiogenesis underlying vertebrate TLV and RP development at the cell-type-specific resolution, which serves as a comprehensive view on the transcriptional profile of animal embryo development.

