Predicting the impact of sequence motifs on gene regulation using single-cell data

2020 ◽  
Author(s):  
Jacob Hepkema ◽  
Nicholas Keone Lee ◽  
Benjamin J. Stewart ◽  
Siwat Ruangroengkulrith ◽  
Varodom Charoensawan ◽  
...  

Abstract Binding of transcription factors (TFs) at proximal promoters and distal enhancers is central to gene regulation. Yet, identification of TF binding sites, also known as regulatory motifs, and quantification of their impact remain challenging. Here we present scover, a convolutional neural network model that can discover putative regulatory motifs along with their cell type-specific importance from single-cell data. Analysis of scRNA-seq data from human kidney shows that ETS, YY1 and NRF1 are the most important motif families for proximal promoters. Using multiple mouse tissues we obtain for the first time a model with cell type resolution which explains 34% of the variance in gene expression. Finally, by applying scover to distal enhancers identified using scATAC-seq from the mouse cerebral cortex we highlight the emergence of layer-specific regulatory patterns during development.

2021 ◽  
Author(s):  
Lei Xiong ◽  
Kang Tian ◽  
Yuzhe Li ◽  
Qiangfeng Zhang

Abstract Single-cell RNA-seq and ATAC-seq analyses have been widely applied to decipher cell-type and regulation complexities. However, experimental conditions often confound biological variations when comparing data from different samples. For integrative single-cell data analysis, we have developed SCALEX, a deep generative framework that maps cells into a generalized, batch-invariant cell-embedding space. We demonstrate that SCALEX accurately and efficiently integrates heterogeneous single-cell data using multiple benchmarks. It outperforms competing methods, especially for datasets with partial overlaps, accurately aligning similar cell populations while retaining true biological differences. We demonstrate the advantages of SCALEX by constructing continuously expandable single-cell atlases for human, mouse, and COVID-19, which were assembled from multiple data sources and can keep growing through the inclusion of new incoming data. Analyses based on these atlases revealed the complex cellular landscapes of human and mouse tissues and identified multiple peripheral immune subtypes associated with COVID-19 disease severity.


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Alexander J Tarashansky ◽  
Jacob M Musser ◽  
Margarita Khariton ◽  
Pengyang Li ◽  
Detlev Arendt ◽  
...  

Comparing single-cell transcriptomic atlases from diverse organisms can elucidate the origins of cellular diversity and assist the annotation of new cell atlases. Yet, comparison between distant relatives is hindered by complex gene histories and diversifications in expression programs. Previously, we introduced the self-assembling manifold (SAM) algorithm to robustly reconstruct manifolds from single-cell data (Tarashansky et al., 2019). Here, we build on SAM to map cell atlas manifolds across species. This new method, SAMap, identifies homologous cell types with shared expression programs across distant species within phyla, even in complex examples where homologous tissues emerge from distinct germ layers. SAMap also finds many genes with more similar expression to their paralogs than their orthologs, suggesting paralog substitution may be more common in evolution than previously appreciated. Lastly, comparing species across animal phyla, spanning mouse to sponge, reveals ancient contractile and stem cell families, which may have arisen early in animal evolution.


2021 ◽  
Author(s):  
Yakir A Reshef ◽  
Laurie Rumker ◽  
Joyce B Kang ◽  
Aparna Nathan ◽  
Megan B Murray ◽  
...  

As single-cell datasets grow in sample size, there is a critical need to characterize cell states that vary across samples and associate with sample attributes like clinical phenotypes. Current statistical approaches typically map cells to cell-type clusters and examine sample differences through that lens alone. Here we present covarying neighborhood analysis (CNA), an unbiased method to identify cell populations of interest with greater flexibility and granularity. CNA characterizes dominant axes of variation across samples by identifying groups of very small regions in transcriptional space, termed neighborhoods, that covary in abundance across samples, suggesting shared function or regulation. CNA can then rigorously test for associations between any sample-level attribute and the abundances of these covarying neighborhood groups. We show in simulation that CNA enables more powerful and accurate identification of disease-associated cell states than a cluster-based approach. When applied to published datasets, CNA captures a Notch activation signature in rheumatoid arthritis, redefines monocyte populations expanded in sepsis, and identifies a previously undiscovered T-cell population associated with progression to active tuberculosis.


2020 ◽  
Author(s):  
Jinjin Tian ◽  
Jiebiao Wang ◽  
Kathryn Roeder

Abstract Motivation: Gene-gene co-expression networks (GCN) are of biological interest for the useful information they provide for understanding gene-gene interactions. The advent of single cell RNA-sequencing allows us to examine more subtle gene co-expression occurring within a cell type. Many imputation and denoising methods have been developed to deal with the technical challenges observed in single cell data; meanwhile, several simulators have been developed for benchmarking and assessing these methods. Most of these simulators, however, either do not incorporate gene co-expression or generate co-expression in an inconvenient manner. Results: Therefore, with the focus on gene co-expression, we propose a new simulator, ESCO, which adopts the idea of the copula to impose gene co-expression, while preserving the highlights of available simulators, which perform well for simulation of gene expression marginally. Using ESCO, we assess the performance of imputation methods on GCN recovery and find that imputation generally helps GCN recovery when the data are not too sparse, and the ensemble imputation method works best among leading methods. In contrast, imputation fails to help in the presence of an excessive fraction of zero counts, where simple data aggregating methods are a better choice. These findings are further verified with mouse and human brain cell data. Availability: The ESCO implementation is available as R package SplatterESCO (https://github.com/JINJINT/SplatterESCO).
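
The copula idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the ESCO implementation: it assumes a Gaussian copula, only two genes, and Poisson marginals chosen purely for simplicity, and the names `poisson_quantile`, `simulate_pair`, and `pearson` are hypothetical helpers. Correlated standard normals are mapped to uniforms through the normal CDF and then pushed through each gene's marginal quantile function, so the marginal count distributions are kept while the dependence is imposed separately.

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, mean):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(mean)."""
    k = 0
    pmf = math.exp(-mean)
    cdf = pmf
    # The cap guards against u being numerically equal to 1.0.
    while cdf < u and k < 10_000:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


def simulate_pair(n_cells, rho, mean_a, mean_b, seed=0):
    """Simulate counts for two genes coupled by a Gaussian copula.

    rho is the correlation of the latent normals (an assumption of this
    sketch); the marginals are Poisson with the given means.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    counts_a, counts_b = [], []
    for _ in range(n_cells):
        # Step 1: draw a correlated pair of standard normals.
        z_a = rng.gauss(0.0, 1.0)
        z_b = rho * z_a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Step 2: normal CDF turns each latent value into a uniform.
        # Step 3: the marginal quantile function turns it into a count.
        counts_a.append(poisson_quantile(std_normal.cdf(z_a), mean_a))
        counts_b.append(poisson_quantile(std_normal.cdf(z_b), mean_b))
    return counts_a, counts_b


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    a, b = simulate_pair(n_cells=2000, rho=0.8, mean_a=5.0, mean_b=5.0)
    print("correlation of simulated counts:", round(pearson(a, b), 3))
```

Separating the dependence structure (the latent correlation) from the marginals is what lets a simulator of this kind reuse an existing per-gene expression model and add co-expression on top of it.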


2020 ◽  
Vol 36 (11) ◽  
pp. 3585-3587
Author(s):  
Lin Wang ◽  
Francisca Catalan ◽  
Karin Shamardani ◽  
Husam Babikir ◽  
Aaron Diaz

Abstract Summary: Single-cell data are being generated at an accelerating pace. How best to project data across single-cell atlases is an open problem. We developed a boosted learner that overcomes the greatest challenge with status quo classifiers: low sensitivity, especially when dealing with rare cell types. By comparing novel and published data from distinct scRNA-seq modalities that were acquired from the same tissues, we show that this approach preserves cell-type labels when mapping across diverse platforms. Availability and implementation: https://github.com/diazlab/ELSA Supplementary information: Supplementary data are available at Bioinformatics online.


Blood ◽  
2021 ◽  
Vol 138 (Supplement 1) ◽  
pp. 145-145
Author(s):  
Federico Gaiti ◽  
Allegra Hawkins ◽  
Paulina Chamely ◽  
Ariel Swett ◽  
Xiaoguang Dai ◽  
...  

Abstract Splicing factor mutations are recurrent genetic alterations in blood disorders, highlighting the importance of alternative splicing regulation in hematopoiesis. Specifically, mutations in splicing factor 3B subunit 1 (SF3B1) are implicated in the pathogenesis of myelodysplastic syndromes (MDS) and linked to a high risk of leukemic transformation in clonal hematopoiesis (CH). SF3B1 mutations are associated with aberrant RNA splicing, leading to increased cryptic 3' splice site (ss) usage and MDS with ring sideroblasts phenotype. The study of mutant SF3B1-driven splicing aberrations in humans has been hampered by the inability to distinguish mutant and wildtype single cells in patient samples and the inadequate coverage of short-read sequencing over splice junctions. To overcome these limitations, we developed GoT-Splice by integrating Genotyping of Transcriptomes (GoT; Nam et al. 2019) with Nanopore long-read single-cell transcriptome profiling and CITE-seq (Fig. A). This allowed for the simultaneous single-cell profiling of protein and gene expression, somatic mutation status, and alternative splicing. Our method selectively enriched full-length sequencing reads with the accurate structure, enabling the capture of a higher number of junctions per cell and greater coverage uniformity vs. short-read sequencing (10x Genomics; Fig. B, C). We applied GoT-Splice to CD34+ bone marrow progenitor cells from MDS (n = 15,436 cells across 3 patients; VAF: [0.38-0.4]) to study how SF3B1 mutations corrupt human hematopoiesis (Fig. D). High-resolution mapping of SF3B1 mut vs. SF3B1 wt hematopoietic progenitors revealed an increasing fitness advantage of SF3B1 mut cells towards the megakaryocytic-erythroid lineage, resulting in an expansion of SF3B1 mut erythroid progenitor (EP) cells (Fig. E, F). Accordingly, SF3B1 mut EP cells displayed higher protein expression of erythroid lineage markers, CD71 and CD36, vs. SF3B1 wt cells (Fig. G).
In these SF3B1 mut EP cells, we identified up-regulation of genes involved in regulation of cell cycle and checkpoint controls (e.g., CCNE1, TP53), and mRNA translation (eIFs gene family; Fig. H). Next, while SF3B1 mut cells showed the expected increase of cryptic 3' splicing vs. SF3B1 wt cells (Fig. I), they exhibited distinct cryptic 3' ss usage as a function of hematopoietic progenitor cell identity, displaying stage-specific aberrant splicing during erythroid maturation (Fig. J). In less differentiated EP cells, we observed mis-splicing of genes involved in iron homeostasis, such as the hypoxia-inducible factor HIF1A, and key regulators of erythroid cell growth, such as SEPT2. At later stages, we observed mis-splicing of BAX, a pro-apoptotic member of the Bcl-2 gene family and transcriptional target of p53, and erythroid-specific genes (e.g., PPOX). We further predicted 54% of the aberrantly spliced mRNAs to introduce premature stop codons, promoting RNA degradation through nonsense-mediated decay (NMD). In line with this notion, we observed a significant decrease in expression of NMD-inducing genes in SF3B1 mut vs. SF3B1 wt EP cells (Fig. K). Lastly, splicing factor mutations observed in CH subjects provide an opportunity to interrogate the downstream impact of SF3B1 mutations prior to development of disease. As in MDS, applying GoT-Splice to CD34+ progenitor cells from SF3B1 mut CH subjects (n = 9,007 cells across 2 subjects; VAF: [0.15-0.22]; Fig. L) revealed increased mutant cell frequency in EP cells (Fig. M) with concomitant increased expression of genes involved in mRNA translation (Fig. N), consistent with SF3B1 mutation causing mis-splicing injury to translational machinery and ineffective erythropoiesis. Notably, CH patients already exhibited cell-type specific cryptic 3' ss usage in SF3B1 mut cells (Fig. O).
In summary, we developed a novel multi-omics single-cell toolkit to examine the impact of splicing factor mutations on cellular fitness directly in human samples. With this approach, we showed that, while SF3B1 mutations arise in uncommitted HSCs, their effect on fitness increases with differentiation into committed EPs, in line with the mutant SF3B1-driven dyserythropoiesis phenotype. We revealed that SF3B1 mutations exert cell-type specific mis-splicing that leads to abnormal erythropoiesis. Finally, we demonstrated that the impact of SF3B1 mutations on EP cells begins before disease onset, as observed in CH subjects. Disclosures Dai: Oxford Nanopore Technologies: Current Employment. Beaulaurier: Oxford Nanopore Technologies: Current Employment. Drong: Oxford Nanopore Technologies: Current Employment. Hickey: Oxford Nanopore Technologies: Current Employment. Juul: Oxford Nanopore Technologies: Current Employment. Wiseman: Astex: Research Funding; Novartis: Consultancy; Bristol Myers Squibb: Consultancy; Takeda: Consultancy; StemLine: Consultancy. Harrington: Oxford Nanopore Technologies: Current Employment. Ghobrial: AbbVie, Adaptive, Aptitude Health, BMS, Cellectar, Curio Science, Genetch, Janssen, Janssen Central American and Caribbean, Karyopharm, Medscape, Oncopeptides, Sanofi, Takeda, The Binding Site, GNS, GSK: Consultancy. Abdel-Wahab: H3B Biomedicine: Consultancy, Research Funding; Foundation Medicine Inc: Consultancy; Merck: Consultancy; Prelude Therapeutics: Consultancy; LOXO Oncology: Consultancy, Research Funding; Lilly: Consultancy; AIChemy: Current holder of stock options in a privately-held company, Membership on an entity's Board of Directors or advisory committees; Envisagenics Inc.: Current holder of stock options in a privately-held company, Membership on an entity's Board of Directors or advisory committees.


2020 ◽  
Author(s):  
Xin Shao ◽  
Haihong Yang ◽  
Xiang Zhuang ◽  
Jie Liao ◽  
Yueren Yang ◽  
...  

Abstract Advances in single-cell RNA sequencing (scRNA-seq) have furthered the simultaneous classification of thousands of cells in a single assay based on transcriptome profiling. In most analysis protocols, single-cell type annotation relies on marker genes or RNA-seq profiles, resulting in poor extrapolation. Here, we introduce scDeepSort (https://github.com/ZJUFanLab/scDeepSort), a reference-free cell-type annotation tool for single-cell transcriptomics that uses a deep learning model with a weighted graph neural network. Using human and mouse scRNA-seq data resources, we demonstrate the feasibility of scDeepSort and its high accuracy in labeling 764,741 cells involving 56 human and 32 mouse tissues. Significantly, scDeepSort outperformed reference-dependent methods in annotating 76 external testing scRNA-seq datasets, including 126,384 cells (85.79%) from ten human tissues and 134,604 cells from 12 mouse tissues (81.30%). scDeepSort accurately revealed cell identities without prior reference knowledge, thus potentially providing new insights into mechanisms underlying biological processes, disease pathogenesis, and disease progression at a single-cell resolution.


2020 ◽  
Author(s):  
Andreas Fønss Møller ◽  
Kedar Nath Natarajan

Abstract Recent single-cell RNA-sequencing atlases have surveyed and identified major cell-types across different mouse tissues. Here, we computationally reconstruct gene regulatory networks from 3 major mouse cell atlases to capture functional regulators critical for cell identity, while accounting for a variety of technical differences including sampled tissues, sequencing depth and author-assigned cell-type labels. Extracting the regulatory crosstalk from mouse atlases, we identify and distinguish global regulons active in multiple cell-types from specialised cell-type specific regulons. We demonstrate that regulon activities accurately distinguish individual cell types, despite differences between individual atlases. We generate an integrated network that further uncovers regulon modules with coordinated activities critical for cell-types, and validate modules using available experimental data. Inferring regulatory networks during myeloid differentiation from wildtype and Irf8 KO cells, we uncover the functional contribution of Irf8 regulon activity and composition towards the monocyte lineage. Our analysis provides an avenue to further extract and integrate the regulatory crosstalk from single-cell expression data. Summary: Integrated single-cell gene regulatory network from three mouse cell atlases captures global and cell-type specific regulatory modules and crosstalk, important for cellular identity.

