Cobolt: integrative analysis of multimodal single-cell sequencing data

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Boying Gong ◽  
Yun Zhou ◽  
Elizabeth Purdom

Abstract A growing number of single-cell sequencing platforms enable joint profiling of multiple omics from the same cells. We present Cobolt, a novel method that not only allows for analyzing the data from joint-modality platforms, but also provides a coherent framework for the integration of multiple datasets measured on different modalities. We demonstrate its performance on multi-modality data of gene expression and chromatin accessibility and illustrate the integration abilities of Cobolt by jointly analyzing this multi-modality data with single-cell RNA-seq and ATAC-seq datasets.

2021 ◽  
Author(s):  
Boying Gong ◽  
Yun Zhou ◽  
Elizabeth Purdom

Abstract Single-cell measurements of different cellular features or modalities from cells from the same system allow for a comprehensive understanding of a biological process. While the most common single-cell sequencing technologies require separate input cells for different modalities, there are a growing number of platforms that allow for measuring several modalities on a single cell. We present a novel method, Cobolt, for analyzing such multi-modality single-cell sequencing datasets. Cobolt jointly models the multiple modalities via a novel application of Multimodal Variational Autoencoder (MVAE) to a hierarchical generative model. We first demonstrate its performance on data from the multi-modality platform SNARE-seq, consisting of measurements of gene expression and chromatin accessibility on the same cells. We then illustrate the ability of Cobolt to integrate multi-modality platforms with single-modality platforms by jointly analyzing a SNARE-seq dataset, a single-cell gene expression dataset, and a single-cell chromatin accessibility dataset. We compare Cobolt with current options for analyzing such datasets and show that Cobolt provides robust and flexible results for integration of single-cell data on multiple modalities.


2019 ◽  
Author(s):  
Christina Huan Shi ◽  
Kevin Y. Yip

Abstract K-mer counting has many applications in sequencing data processing and analysis. However, sequencing errors can produce many false k-mers that substantially increase the memory requirement during counting. We propose a fast k-mer counting method, CQF-deNoise, which has a novel component for dynamically identifying and removing false k-mers while preserving counting accuracy. Compared with four state-of-the-art k-mer counting methods, CQF-deNoise consumed 49-76% less memory than the second best method, but still ran competitively fast. The k-mer counts from CQF-deNoise produced cell clusters from single-cell RNA-seq data highly consistent with CellRanger but required only 5% of the running time at the same memory consumption, suggesting that CQF-deNoise can be used for a preview of cell clusters for an early detection of potential data problems, before running a much more time-consuming full analysis pipeline.
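
To make the memory problem concrete, the following is a minimal sketch of exact k-mer counting in Python, showing how a single substitution error adds up to k spurious k-mers to the table. It is not the CQF-deNoise algorithm: the abstract does not describe its internals, and the function names, the toy reads, and the fixed `min_count` threshold below are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def drop_rare_kmers(counts, min_count=2):
    """Keep only k-mers seen at least min_count times (hypothetical threshold)."""
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Toy data (assumed): three error-free copies of a read, plus one copy
# carrying a single substitution (A -> T at 0-based position 4).
reads = ["ACGTACGTGA"] * 3 + ["ACGTTCGTGA"]
k = 4

counts = count_kmers(reads, k)
filtered = drop_rare_kmers(counts)

# The k-mers that overlap the erroneous base appear only once each.
false_kmers = sorted(kmer for kmer, n in counts.items() if n == 1)
print(len(counts), len(filtered), false_kmers)
```

In this toy case the one erroneous base creates four singleton k-mers, so the table grows from six distinct entries to ten. A static threshold like this is only a stand-in; the abstract describes identifying and removing false k-mers dynamically during counting.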


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Giancarlo Bonora ◽  
Vijay Ramani ◽  
Ritambhara Singh ◽  
He Fang ◽  
Dana L. Jackson ◽  
...  

Abstract Background Mammalian development is associated with extensive changes in gene expression, chromatin accessibility, and nuclear structure. Here, we follow such changes associated with mouse embryonic stem cell differentiation and X inactivation by integrating, for the first time, allele-specific data from these three modalities obtained by high-throughput single-cell RNA-seq, ATAC-seq, and Hi-C. Results Allele-specific contact decay profiles obtained by single-cell Hi-C clearly show that the inactive X chromosome has a unique profile in differentiated cells that have undergone X inactivation. Loss of this inactive X-specific structure at mitosis is followed by its reappearance during the cell cycle, suggesting a “bookmark” mechanism. Differentiation of embryonic stem cells to follow the onset of X inactivation is associated with changes in contact decay profiles that occur in parallel on both the X chromosomes and autosomes. Single-cell RNA-seq and ATAC-seq show evidence of a delay in female versus male cells, due to the presence of two active X chromosomes at early stages of differentiation. The onset of the inactive X-specific structure in single cells occurs later than gene silencing, consistent with the idea that chromatin compaction is a late event of X inactivation. Single-cell Hi-C highlights evidence of discrete changes in nuclear structure characterized by the acquisition of very long-range contacts throughout the nucleus. Novel computational approaches allow for the effective alignment of single-cell gene expression, chromatin accessibility, and 3D chromosome structure. Conclusions Based on trajectory analyses, three distinct nuclear structure states are detected reflecting discrete and profound simultaneous changes not only to the structure of the X chromosomes, but also to that of autosomes during differentiation. 
Our study reveals that long-range structural changes to chromosomes appear as discrete events, unlike progressive changes in gene expression and chromatin accessibility.


2021 ◽  
Vol 15 (Supplement_1) ◽  
pp. S062-S062
Author(s):  
A Lewis ◽  
B Pan-Castillo ◽  
G Berti ◽  
C Felice ◽  
H Gordon ◽  
...  

Abstract Background Histone-deacetylase (HDAC) enzymes are a broad class of ubiquitously expressed enzymes that modulate histone acetylation, chromatin accessibility and gene expression. In models of inflammatory bowel disease (IBD), HDAC inhibitors, such as valproic acid (VPA), are proven anti-inflammatory agents and evidence suggests that they also inhibit fibrosis in non-intestinal organs. However, the role of HDAC enzymes in stricturing Crohn’s disease (CD) has not been characterised; this is key to understanding the molecular mechanism and developing novel therapies. Methods To evaluate HDAC expression in the intestine of SCD patients, we performed unbiased single-cell RNA sequencing (sc-RNA-seq) of over 10,000 cells isolated from full-thickness surgical resection specimens of non-SCD (NSCD; n=2) and SCD intestine (n=3). Approximately 1000 fibroblasts were identified for further analysis, including a distinct cluster of myofibroblasts. Changes in gene expression were compared between myofibroblasts and other resident intestinal fibroblasts using the sc-RNA-seq analysis pipeline in Partek. Changes in HDAC expression and markers of HDAC activity (H3K27ac) were confirmed by immunohistochemistry in FFPE tissue from patient matched NSCD and SCD intestine (n=14 pairs). The function of HDACs in intestinal fibroblasts in the CCD-18co cell line and primary CD myofibroblast cultures (n=16 cultures) was assessed using VPA, a class I HDAC inhibitor. Cells were analysed using a variety of molecular techniques including ATAC-seq, gene expression arrays, qPCR, western blot and immunofluorescent protein analysis. Results Class I HDAC (HDAC1, p= 2.11E-11; HDAC2, p= 4.28E-11; HDAC3, p= 1.60E-07; and HDAC8, p= 2.67E-03) expression was increased in myofibroblasts compared to other intestinal fibroblast subtypes. 
IHC also showed an increase in the percentage of stromal HDAC2 positive cells, coupled with a decrease in the percentage of H3K27ac positive cells, in the mucosa overlying SCD intestine relative to matched NSCD areas. In the CCD-18co cell line and primary myofibroblast cultures, VPA reduced chromatin accessibility at Collagen-I gene promoters and suppressed their transcription. VPA also inhibited TGFB-induced up-regulation of Collagen-I, in part by inhibiting TGFB1I1/SMAD4 signalling. TGFB1I1 was identified as a mesenchymal specific target of VPA and siRNA knockdown of TGFB1I1 was sufficient to suppress TGFB-induced up-regulation of Collagen-I. Conclusion In SCD patients, class I HDAC expression is increased in myofibroblasts. Class I HDAC inhibitors impair TGFB-signalling and inhibit Collagen-I expression. Selective targeting of TGFB1I1 offers the opportunity to increase treatment specificity by selectively targeting mesenchymal cells.


2020 ◽  
Author(s):  
Xiaomei Li ◽  
Lin Liu ◽  
Greg Goodall ◽  
Andreas Schreiber ◽  
Taosheng Xu ◽  
...  

Abstract Breast cancer prognosis is challenging due to the heterogeneity of the disease. Various computational methods using bulk RNA-seq data have been proposed for breast cancer prognosis. However, these methods suffer from limited performances or ambiguous biological relevance, as a result of the neglect of intra-tumor heterogeneity. Recently, single cell RNA-sequencing (scRNA-seq) has emerged for studying tumor heterogeneity at cellular levels. In this paper, we propose a novel method, scPrognosis, to improve breast cancer prognosis with scRNA-seq data. scPrognosis uses the scRNA-seq data of the biological process Epithelial-to-Mesenchymal Transition (EMT). It firstly infers the EMT pseudotime and a dynamic gene co-expression network, then uses an integrative model to select genes important in EMT based on their expression variation and differentiation in different stages of EMT, and their roles in the dynamic gene co-expression network. To validate and apply the selected signatures to breast cancer prognosis, we use them as the features to build a prediction model with bulk RNA-seq data. The experimental results show that scPrognosis outperforms other benchmark breast cancer prognosis methods that use bulk RNA-seq data. Moreover, the dynamic changes in the expression of the selected signature genes in EMT may provide clues to the link between EMT and clinical outcomes of breast cancer. scPrognosis will also be useful when applied to scRNA-seq datasets of different biological processes other than EMT. Author summary Various computational methods have been developed for breast cancer prognosis. However, those methods mainly use the gene expression data generated by the bulk RNA sequencing techniques, which average the expression level of a gene across different cell types. As breast cancer is a heterogeneous disease, the bulk gene expression may not be the ideal resource for cancer prognosis. 
In this study, we propose a novel method to improve breast cancer prognosis using scRNA-seq data. The proposed method has been applied to the EMT scRNA-seq dataset for identifying breast cancer signatures for prognosis. In comparison with existing bulk expression data based methods in breast cancer prognosis, our method shows a better performance. Our single-cell-based signatures provide clues to the relation between EMT and clinical outcomes of breast cancer. In addition, the proposed method can also be useful when applied to scRNA-seq datasets of different biological processes other than EMT.


2019 ◽  
Author(s):  
Alemu Takele Assefa ◽  
Jo Vandesompele ◽  
Olivier Thas

Summary SPsimSeq is a semi-parametric simulation method for bulk and single cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the data characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts. Availability and implementation The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq. Supplementary information Supplementary data are available at bioRχiv online.


2020 ◽  
Author(s):  
Weimiao Wu ◽  
Qile Dai ◽  
Yunqing Liu ◽  
Xiting Yan ◽  
Zuoheng Wang

Abstract Single-cell RNA sequencing provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses. We propose a novel method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and other existing methods to seven single-cell datasets to compare their performance. Our results demonstrated that G2S3 is superior in recovering true expression levels, identifying cell subtypes, improving differential expression analyses, and recovering gene regulatory relationships, especially for mildly expressed genes.
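
The idea of borrowing information from adjacent genes can be illustrated with a small Python sketch that blends each gene's per-cell values with the mean of its graph neighbours. This is only an illustration of neighbour-based smoothing, not the G2S3 method: here the gene graph is specified by hand rather than learned from the data, and the function name, the `weight` parameter, and the toy expression values are all assumptions.

```python
def smooth_over_gene_graph(expr, neighbors, weight=0.5):
    """Blend each gene's values with the mean of its neighbouring genes.

    expr:      dict mapping gene name -> list of per-cell expression values
    neighbors: dict mapping gene name -> list of adjacent gene names
    weight:    share taken from the neighbours (hypothetical parameter)
    """
    smoothed = {}
    for gene, values in expr.items():
        adjacent = neighbors.get(gene, [])
        if not adjacent:
            # Genes without neighbours are left unchanged.
            smoothed[gene] = list(values)
            continue
        blended = []
        for cell in range(len(values)):
            neighbour_mean = sum(expr[g][cell] for g in adjacent) / len(adjacent)
            blended.append((1 - weight) * values[cell] + weight * neighbour_mean)
        smoothed[gene] = blended
    return smoothed


# Toy data (assumed): geneA has a zero in the second cell that may be a dropout;
# geneB is its only neighbour in a hand-written graph; geneC has no neighbours.
expr = {
    "geneA": [4.0, 0.0, 2.0],
    "geneB": [4.0, 2.0, 2.0],
    "geneC": [0.0, 0.0, 8.0],
}
neighbors = {"geneA": ["geneB"], "geneB": ["geneA"]}

smoothed = smooth_over_gene_graph(expr, neighbors)
print(smoothed["geneA"])
```

With these inputs the zero for geneA in the second cell is pulled toward its neighbour's value, while geneC is untouched. Note that uniform blending also lowers geneB's value in that cell, one reason a real method needs a more careful graph and weighting than this sketch uses.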


2020 ◽  
Author(s):  
Timothy J. Durham ◽  
Riza M. Daza ◽  
Louis Gevirtzman ◽  
Darren A. Cusanovich ◽  
William Stafford Noble ◽  
...  

Abstract Recently developed single cell technologies allow researchers to characterize cell states at ever greater resolution and scale. C. elegans is a particularly tractable system for studying development, and recent single cell RNA-seq studies characterized the gene expression patterns for nearly every cell type in the embryo and at the second larval stage (L2). Gene expression patterns are useful for learning about gene function and give insight into the biochemical state of different cell types; however, in order to understand these cell types, we must also determine how these gene expression levels are regulated. We present the first single cell ATAC-seq study in C. elegans. We collected data in L2 larvae to match the available single cell RNA-seq data set, and we identify tissue-specific chromatin accessibility patterns that align well with existing data, including the L2 single cell RNA-seq results. Using a novel implementation of the latent Dirichlet allocation algorithm, we leverage the single-cell resolution of the sci-ATAC-seq data to identify accessible loci at the level of individual cell types, providing new maps of putative cell type-specific gene regulatory sites, with promise for better understanding of cellular differentiation and gene regulation in the worm.


2020 ◽  
Author(s):  
Shreya Johri ◽  
Deepali Jain ◽  
Ishaan Gupta

Abstract Besides severe respiratory distress, recent reports in Covid-19 patients have found a strong association between platelet counts and patient survival. Along with hemodynamic changes such as prolonged clotting time, high fibrin degradation products and D-dimers, increased levels of monocytes with disturbed morphology have also been identified. In this study, through an integrated analysis of bulk RNA-sequencing data from Covid-19 patients with data from single-cell sequencing studies on lung tissues, we found that most of the cell-types that contributed to the altered gene expression were of hematopoietic origin. We also found that differentially expressed genes in Covid-19 patients formed a significant pool of the expressing genes in phagocytic cells such as Monocytes and platelets. Interestingly, while we observed a general enrichment for Monocytes in Covid-19 patients, we found that the signal for FCGR3A+ Monocytes was depleted. Further, we found evidence that age-associated gene expression changes in Monocytes and platelets, associated with inflammation, mirror gene expression changes in Covid-19 patients, suggesting that pro-inflammatory signalling during aging may worsen the infection in older patients. We identified more than 20 genes that change in the same direction between Covid-19 infection and aging cells that may act as potential therapeutic targets. Of particular interest were IL2RG, GNLY and GZMA expressed in platelets, which facilitate cytokine signalling in Monocytes through an interaction with platelets. To understand whether infection can directly manipulate the biology of Monocytes and platelets, we hypothesize that these non-ACE2 expressing cells may be infected by the virus through the phagocytic route. We observed that phagocytic cells such as Monocytes, T-cells, and platelets have a significantly higher expression of genes that are a part of the Covid-19 viral interactome. 
Hence these cell-types may have an active rather than a reactive role in viral pathogenesis to manifest clinical symptoms such as coagulopathy. Therefore, our results present molecular evidence for pursuing both anti-inflammatory and anticoagulation therapy for better patient management especially in older patients.

