SnapATAC: A Comprehensive Analysis Package for Single Cell ATAC-seq

Mapping Intimacies ◽

10.1101/615179 ◽

2019 ◽

Cited By ~ 37

Author(s):

Rongxin Fang ◽

Sebastian Preissl ◽

Yang Li ◽

Xiaomeng Hou ◽

Jacinta Lucero ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Expression Patterns ◽

Sampling Technique ◽

Regulatory Elements ◽

Cellular Heterogeneity ◽

Low Rank ◽

Specific Gene ◽

Open Chromatin ◽

Process Data

Identification of the cis-regulatory elements controlling cell-type specific gene expression patterns is essential for understanding the origin of cellular diversity. Conventional assays to map regulatory elements via open chromatin analysis of primary tissues are hindered by the heterogeneity of the samples. Single cell analysis of transposase-accessible chromatin (scATAC-seq) can overcome this limitation. However, the high level of noise in each single cell profile and the large volume of data could pose unique computational challenges. Here, we introduce SnapATAC, a software package for analyzing scATAC-seq datasets. SnapATAC can efficiently dissect cellular heterogeneity in an unbiased manner and map the trajectories of cellular states. Using the Nyström method, a sampling technique that generates a low-rank embedding for large-scale datasets, SnapATAC can process data from up to a million cells. Furthermore, SnapATAC incorporates existing tools into a comprehensive package for analyzing single cell ATAC-seq datasets. As a demonstration of its utility, SnapATAC was applied to 55,592 single-nucleus ATAC-seq profiles from the mouse secondary motor cortex. The analysis revealed ∼370,000 candidate regulatory elements in 31 distinct cell populations in this brain region and inferred candidate transcriptional regulators in each of the cell types.
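To make the Nyström idea concrete, here is a minimal sketch, assuming a binarized cell-by-bin accessibility matrix and a Jaccard similarity kernel (the abstract names neither the input representation nor the kernel; both are assumptions here, as are the function names `jaccard_kernel` and `nystrom_embedding`). A random subset of m "landmark" cells is sampled, the m×m landmark kernel W and the n×m cross kernel C are computed, and every cell is embedded as Z = C V Λ^(-1/2), where W = V Λ Vᵀ. This makes Z Zᵀ approximate C W⁺ Cᵀ without ever forming the full n×n kernel. It is an illustration of the technique, not SnapATAC's implementation.

```python
import numpy as np


def jaccard_kernel(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise Jaccard similarity between rows of two binary cell-by-bin matrices."""
    a = a.astype(float)
    b = b.astype(float)
    inter = a @ b.T  # number of accessible bins shared by each pair of cells
    union = a.sum(axis=1)[:, None] + b.sum(axis=1)[None, :] - inter
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)


def nystrom_embedding(x: np.ndarray, n_landmarks: int, n_components: int, seed: int = 0):
    """Low-rank embedding of all cells from a random subset of landmark cells.

    Returns (embedding, landmark_indices). Only an n x m kernel block is built,
    never the full n x n matrix, which is what makes very large n tractable.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(x.shape[0], size=n_landmarks, replace=False)
    landmarks = x[idx]
    w = jaccard_kernel(landmarks, landmarks)  # m x m landmark kernel
    c = jaccard_kernel(x, landmarks)          # n x m cross kernel
    vals, vecs = np.linalg.eigh(w)            # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[top], vecs[:, top]
    keep = vals > 1e-10                       # drop null or numerically negative directions
    vals, vecs = vals[keep], vecs[:, keep]
    # Nystrom extension: Z = C V Lambda^{-1/2}, so Z Z^T approximates C W^+ C^T.
    z = (c @ vecs) / np.sqrt(vals)
    return z, idx


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    cells = (rng.random((200, 50)) < 0.3).astype(np.uint8)  # synthetic binary matrix
    emb, landmark_idx = nystrom_embedding(cells, n_landmarks=30, n_components=10)
    print(emb.shape)
```

The cost is dominated by the n×m cross kernel and an m×m eigendecomposition, so memory grows linearly in the number of cells for a fixed number of landmarks; the quality of the approximation depends on how well the sampled landmarks cover the cell populations.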

Comprehensive analysis of single cell ATAC-seq data with SnapATAC

Nature Communications ◽

10.1038/s41467-021-21583-9 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Rongxin Fang ◽

Sebastian Preissl ◽

Yang Li ◽

Xiaomeng Hou ◽

Jacinta Lucero ◽

...

Keyword(s):

Single Cell ◽

Single Cell Analysis ◽

Expression Patterns ◽

Regulatory Elements ◽

Cellular Heterogeneity ◽

Specific Gene ◽

Open Chromatin ◽

Cell Type ◽

Process Data ◽

Cell Type Specific

Identification of the cis-regulatory elements controlling cell-type specific gene expression patterns is essential for understanding the origin of cellular diversity. Conventional assays to map regulatory elements via open chromatin analysis of primary tissues are hindered by sample heterogeneity. Single cell analysis of accessible chromatin (scATAC-seq) can overcome this limitation. However, the high level of noise in each single cell profile and the large volume of data pose unique computational challenges. Here, we introduce SnapATAC, a software package for analyzing scATAC-seq datasets. SnapATAC dissects cellular heterogeneity in an unbiased manner and maps the trajectories of cellular states. Using the Nyström method, SnapATAC can process data from up to a million cells. Furthermore, SnapATAC incorporates existing tools into a comprehensive package for analyzing single cell ATAC-seq datasets. As a demonstration of its utility, SnapATAC is applied to 55,592 single-nucleus ATAC-seq profiles from the mouse secondary motor cortex. The analysis reveals ~370,000 candidate regulatory elements in 31 distinct cell populations in this brain region and infers candidate cell-type specific transcriptional regulators.

Single-cell RNA-seq reveals the transcriptional landscape in ischemic stroke

Journal of Cerebral Blood Flow & Metabolism ◽

10.1177/0271678x211026770 ◽

2021 ◽

pp. 0271678X2110267

Author(s):

Kai Zheng ◽

Lingmin Lin ◽

Wei Jiang ◽

Lin Chen ◽

Xiyue Zhang ◽

...

Keyword(s):

Ischemic Stroke ◽

Single Cell ◽

Expression Patterns ◽

Cellular Heterogeneity ◽

Ependymal Cells ◽

Specific Gene ◽

Specific Cell ◽

Cell Type ◽

Cns Inflammation ◽

Cell Type Specific

Ischemic stroke (IS) is a detrimental neurological disease with limited treatment options. It has been challenging to define the roles of brain cell subsets in IS onset and progression because of cellular heterogeneity in the CNS. Here, we employed single-cell RNA sequencing (scRNA-seq) to comprehensively map the cell populations in the mouse model of MCAO (middle cerebral artery occlusion). We identified 17 principal brain clusters with cell-type specific gene expression patterns, as well as specific cell subpopulations and their functions in various pathways. CNS inflammation triggered upregulation of key cell-type specific genes not previously reported. Notably, microglia showed differentiation diversity after stroke across their five distinct subtypes. Importantly, we found potential trajectory branches of the monocyte/macrophage subsets. Finally, we also identified distinct subclusters among brain vasculature cells, ependymal cells and other glial cells. Overall, scRNA-seq revealed the precise transcriptional changes during neuroinflammation at the single-cell level, opening up a new field for exploring disease mechanisms and for drug discovery in stroke based on cell-subtype specific molecules.

Prediction of single-cell mechanisms for disease progression in hypertrophic remodelling by a trans-omics approach

Scientific Reports ◽

10.1038/s41598-021-86821-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Momoko Hamano ◽

Seitaro Nomura ◽

Midori Iida ◽

Issei Komuro ◽

Yoshihiro Yamanishi

Keyword(s):

Heart Failure ◽

Single Cell ◽

Large Scale ◽

Molecular Mechanisms ◽

Expression Patterns ◽

Marker Genes ◽

Specific Gene ◽

Rna Seq ◽

Aortic Constriction ◽

Multiple Risk Factors

Heart failure is a heterogeneous disease with multiple risk factors and various pathophysiological types, which makes it difficult to understand the molecular mechanisms involved. In this study, we proposed a trans-omics approach for predicting molecular pathological mechanisms of heart failure and identifying marker genes to distinguish heterogeneous phenotypes, by integrating multiple omics data including single-cell RNA-seq, ChIP-seq, and gene interactome data. We detected a significant increase in the expression level of natriuretic peptide A (Nppa) after stress loading with transverse aortic constriction (TAC), and showed that cardiomyocytes with high Nppa expression displayed specific gene expression patterns. Multiple members of the NADH ubiquinone complex family, which are associated with the mitochondrial electron transport system, were negatively correlated with Nppa expression during the early stages of cardiac hypertrophy. Large-scale ChIP-seq data analysis showed that Nkx2-5 and Gtf2b were transcription factors characteristic of high-Nppa-expressing cardiomyocytes. Nppa expression levels may, therefore, represent a useful diagnostic marker for heart failure.
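The abstract reports genes whose expression is negatively correlated with Nppa across cardiomyocytes but does not state which correlation measure was used. As a minimal sketch, assuming a per-cell Spearman rank correlation (an assumption, chosen because it is robust to the skewed distributions typical of single-cell counts), each gene's expression vector can be correlated against the marker's. The gene names and values below are hypothetical placeholders, not data from the study.

```python
from typing import Dict, List, Sequence


def average_ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")  # a constant gene has no defined correlation
    return cov / (va * vb) ** 0.5


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


def correlate_with_marker(expr: Dict[str, List[float]], marker: str) -> Dict[str, float]:
    """Correlate every gene (values per cell, same cell order) with one marker gene."""
    ref = expr[marker]
    return {gene: spearman(values, ref) for gene, values in expr.items() if gene != marker}


if __name__ == "__main__":
    # Hypothetical toy matrix: five cells, one marker and two placeholder genes.
    expr = {"Nppa": [0, 1, 2, 3, 4], "gene_down": [9, 7, 5, 3, 1], "gene_up": [1, 2, 3, 5, 8]}
    print(correlate_with_marker(expr, "Nppa"))
```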

Abstract 348: Multi-Species Genome-Wide Analyses of the Specification of Individual Cardiac Cell Fates

Circulation Research ◽

10.1161/res.115.suppl_1.348 ◽

2014 ◽

Vol 115 (suppl_1) ◽

Author(s):

Brian Busser ◽

Julian Haimovich ◽

Guokai Chen ◽

Ivan Ovcharenko ◽

Alan Michelson

Keyword(s):

Gene Expression ◽

Large Scale ◽

Expression Patterns ◽

Embryonic Stem ◽

Regulatory Elements ◽

Cardiac Cells ◽

Specific Gene ◽

Cardiac Cell ◽

Cell Fates ◽

Cell Fate Decisions

There are remarkable molecular and embryological similarities in cardiogenesis between Drosophila and vertebrates. Cells comprising the Drosophila heart can be subdivided into individual identities based on differences in morphology, function and gene expression patterns. Recent studies have shown that differential modifications of histone proteins, in vivo transcription factor (TF) binding, and the presence of particular TF binding motifs can be used as predictive signatures of the enhancers that govern cell-specific gene expression. Here we used discriminative training methods within an integrative, multi-species framework to uncover the motifs, enhancers and genes underlying cardiac cell fate decisions. As an initial step, we undertook a large-scale validation of Drosophila heart enhancers, which revealed enhancer activities in distinct subpopulations of cardiac cells. To identify related cell-specific regulatory elements, we used the validated enhancers as a training set in a machine learning approach that integrated TF motifs with ChIP data for both TF binding and histone modifications. Empirical validation of candidate enhancers predicted by this method confirmed activity in the appropriate cardiac cells. By clustering the motifs derived from the individual cardiac classifiers, we identified and validated sequence features which discriminate specific cellular identities. Next, we asked if similar predictive signatures underlie mouse and human cardiomyocyte (CM) differentiation from embryonic stem cells (ESCs). We show that the distribution of histone marks found within differentiating human and mouse ESCs indeed predicts genes potentially critical for CM differentiation, with the best predictions provided by the overlapping mouse and human candidates.
We evaluated this result in a large-scale RNAi-based screen of Drosophila orthologs of the mammalian genes, which uncovered dozens of novel cardiogenic regulators whose function is being tested in differentiating human ESCs. In total, these results document the utility of computational modeling combined with empirical testing to uncover the enhancers, TF motifs and genes which characterize individual cardiac cell fates in both invertebrate and mammalian species.

Classifying cells with Scasat - a tool to analyse single-cell ATAC-seq

10.1101/227397 ◽

2017 ◽

Cited By ~ 1

Author(s):

Syed Murtuza Baker ◽

Connor Rogerson ◽

Andrew Hayes ◽

Andrew D. Sharrocks ◽

Magnus Rattray

Keyword(s):

Single Cell ◽

Dna Sequences ◽

Mammalian Cells ◽

Cell Types ◽

Regulatory Elements ◽

Cellular Heterogeneity ◽

Supplementary Information ◽

Open Chromatin ◽

Analysis Tool ◽

Link Type

Motivation: The assay for transposase-accessible chromatin using sequencing (ATAC-seq) reveals the landscape and principles of DNA regulatory mechanisms by identifying the accessible genome of mammalian cells. When done at single-cell resolution, it provides insight into the cell-to-cell variability that emerges from identical DNA sequences by identifying the variability in the genomic location of open chromatin sites in each of the cells. Processing single-cell ATAC-seq data requires a number of steps, and a simple pipeline to process and analyse single-cell ATAC-seq data is not yet available. Results: This paper presents ScAsAT (single-cell ATAC-seq analysis tool), a complete pipeline to process scATAC-seq data with simple steps. The pipeline is developed in a Jupyter notebook environment that holds the executable code along with the necessary description and results. For the initial sequence processing steps, the pipeline uses a number of well-known tools, which it executes from a Python environment for each of the fastq files. The functions for the data analysis part are mostly written in R, and the pipeline is robust, flexible, interactive and easy to extend. The pipeline was applied to a single-cell ATAC-seq dataset in order to identify different cell types from a complex cell mixture. The results from Scasat showed that open chromatin locations corresponding to potential regulatory elements can account for cellular heterogeneity and can identify regulatory regions that separate cells from a complex population. Availability: The Jupyter notebook with the complete pipeline applied to the dataset published with this paper is publicly available on GitHub (https://github.com/ManchesterBioinference/Scasat). An additional notebook is also provided for analysis of a publicly available dataset. The fastq files have been submitted to the ArrayExpress database at EMBL-EBI (www.ebi.ac.uk/arrayexpress) under accession number […] and […]. Supplementary information: Supplementary data are available at bioRxiv online.

A Global Vista of the Epigenomic State of the Mouse Submandibular Gland

Journal of Dental Research ◽

10.1177/00220345211012000 ◽

2021 ◽

pp. 002203452110120

Author(s):

C. Gluck ◽

S. Min ◽

A. Oyelakin ◽

M. Che ◽

E. Horeth ◽

...

Keyword(s):

Gene Expression ◽

Transcriptional Control ◽

Expression Patterns ◽

Global Gene Expression ◽

Regulatory Elements ◽

Specific Gene ◽

Submandibular Salivary Gland ◽

Data Sets ◽

Control Mechanisms ◽

Chromatin Immunoprecipitation Sequencing

The parotid, submandibular, and sublingual glands represent a trio of oral secretory glands whose primary function is to produce saliva, facilitate digestion of food, provide protection against microbes, and maintain oral health. While recent studies have begun to shed light on the global gene expression patterns and profiles of salivary glands, particularly those of mice, relatively little is known about the location and identity of transcriptional control elements. Here we have established the epigenomic landscape of the mouse submandibular salivary gland (SMG) by performing chromatin immunoprecipitation sequencing experiments for 4 key histone marks. Our analysis of the comprehensive SMG data sets and comparisons with those from other adult organs have identified critical enhancers and super-enhancers of the mouse SMG. By further integrating these findings with complementary RNA-sequencing based gene expression data, we have unearthed a number of molecular regulators such as members of the Fox family of transcription factors that are enriched and likely to be functionally relevant for SMG biology. Overall, our studies provide a powerful atlas of cis-regulatory elements that can be leveraged for better understanding the transcriptional control mechanisms of the mouse SMG, discovery of novel genetic switches, and modulating tissue-specific gene expression in a targeted fashion.

Infinity Flow: High-throughput single-cell quantification of 100s of proteins using conventional flow cytometry and machine learning

10.1101/2020.06.17.152926 ◽

2020 ◽

Author(s):

Etienne Becht ◽

Daniel Tolstrup ◽

Charles-Antoine Dutertre ◽

Florent Ginhoux ◽

Evan W. Newell ◽

...

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

Single Cell ◽

Low Cost ◽

Expression Patterns ◽

Cell Types ◽

Cellular Heterogeneity ◽

Supervised Machine Learning ◽

Melanoma Metastasis ◽

Immunologic Research

Modern immunologic research increasingly requires high-dimensional analyses in order to understand the complex milieu of cell types that comprise the tissue microenvironments of disease. To achieve this, we developed Infinity Flow, which combines hundreds of overlapping flow cytometry panels using machine learning to enable the simultaneous analysis of the co-expression patterns of 100s of surface-expressed proteins across millions of individual cells. In this study, we demonstrate that this approach allows the comprehensive analysis of the cellular constituency of the steady-state murine lung and the identification of novel cellular heterogeneity in the lungs of melanoma metastasis-bearing mice. We show that by using supervised machine learning, Infinity Flow enhances the accuracy and depth of clustering or dimensionality reduction algorithms. Infinity Flow is a highly scalable, low-cost and accessible solution to single cell proteomics in complex tissues.
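One way to picture "combining overlapping panels with machine learning" is as per-marker regression imputation. The sketch below assumes that every panel measures a shared set of "backbone" markers plus one panel-specific exploratory marker (an assumption; the abstract does not describe the panel layout). For each exploratory marker, a model is fitted on the cells of its own panel, mapping backbone to marker, and then applied to the pooled cells from all panels, so every cell receives a predicted value for every marker. Ordinary least squares is a stand-in chosen for simplicity; the published tool may use other, nonlinear regressors, and the function name is hypothetical.

```python
from typing import List, Tuple

import numpy as np

# (backbone matrix [cells x b], exploratory marker values [cells]) for one panel.
Panel = Tuple[np.ndarray, np.ndarray]


def impute_exploratory_markers(panels: List[Panel]) -> Tuple[np.ndarray, np.ndarray]:
    """Return (pooled backbone [N x b], imputed exploratory markers [N x n_panels])."""
    pooled = np.vstack([backbone for backbone, _ in panels])
    design_all = np.hstack([pooled, np.ones((pooled.shape[0], 1))])  # add intercept column
    imputed = []
    for backbone, target in panels:
        design = np.hstack([backbone, np.ones((backbone.shape[0], 1))])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)  # fit on this panel's cells only
        imputed.append(design_all @ coef)                       # predict for every pooled cell
    return pooled, np.column_stack(imputed)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic example: three panels, four shared backbone markers, 50 cells each.
    panels = []
    for _ in range(3):
        backbone = rng.normal(size=(50, 4))
        panels.append((backbone, backbone @ rng.normal(size=4) + 2.0))
    pooled, imputed = impute_exploratory_markers(panels)
    print(pooled.shape, imputed.shape)
```

Note that for a panel's own cells this returns fitted rather than measured values; a real workflow would keep the measured values there and would validate predictions on held-out cells.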

Cellular heterogeneity and lineage restriction during mouse digit tip regeneration at single cell resolution

10.1101/737023 ◽

2019 ◽

Author(s):

Gemma L. Johnson ◽

Erick J. Masias ◽

Jessica A. Lehoczky

Keyword(s):

Single Cell ◽

Limb Regeneration ◽

Cell Types ◽

Cellular Heterogeneity ◽

Specific Gene ◽

Genetic Lineage ◽

Epimorphic Regeneration ◽

Related Cell ◽

Lower Vertebrates ◽

Single Cell Rna Sequencing

Innate regeneration following digit tip amputation is one of the few examples of epimorphic regeneration in mammals. Digit tip regeneration is mediated by the blastema, the same structure invoked during limb regeneration in some lower vertebrates. By genetic lineage analyses in mice, the digit tip blastema has been defined as a population of heterogeneous, lineage restricted progenitor cells. These previous studies, however, do not comprehensively evaluate blastema heterogeneity or address lineage restriction of closely related cell types. In this report we present single cell RNA sequencing of over 38,000 cells from mouse digit tip blastemas and unamputated control digit tips and generate an atlas of the cell types participating in digit tip regeneration. We define the differentiation trajectories of vascular, monocytic, and fibroblastic lineages over regeneration, and while our data confirm broad lineage restriction of progenitors, our analysis reveals an early blastema fibroblast population expressing a novel regeneration-specific gene, Mest.

Transcriptome Profiling of Adipose Tissue Reveals Depot-Specific Metabolic Alterations Among Patients with Colorectal Cancer

The Journal of Clinical Endocrinology & Metabolism ◽

10.1210/jc.2019-00461 ◽

2019 ◽

Vol 104 (11) ◽

pp. 5225-5237 ◽

Cited By ~ 3

Author(s):

Mariam Haffa ◽

Andreana N Holowatyj ◽

Mario Kratz ◽

Reka Toth ◽

Axel Benner ◽

...

Keyword(s):

Gene Expression ◽

Colorectal Cancer ◽

Adipose Tissue ◽

Large Scale ◽

Expression Profiles ◽

Expression Patterns ◽

Human Study ◽

Subcutaneous Fat ◽

Specific Gene ◽

Adipose Tissue Depots

Context: Adipose tissue inflammation and dysregulated energy homeostasis are key mechanisms linking obesity and cancer. Distinct adipose tissue depots strongly differ in their metabolic profiles; however, comprehensive studies of depot-specific perturbations among patients with cancer are lacking. Objective: We compared transcriptome profiles of visceral adipose tissue (VAT) and subcutaneous adipose tissue (SAT) from patients with colorectal cancer and assessed the associations of different anthropometric measures with depot-specific gene expression. Design: Whole transcriptomes of VAT and SAT were measured in 233 patients from the ColoCare Study, and visceral and subcutaneous fat area were quantified via CT. Results: VAT compared with SAT showed elevated gene expression of cytokines, cell adhesion molecules, and key regulators of metabolic homeostasis. Increased fat area was associated with downregulated lipid and small molecule metabolism and upregulated inflammatory pathways in both compartments. Comparing these patterns between depots revealed specific and more pronounced gene expression alterations in SAT and identified unique associations of integrins and lipid metabolism–related enzymes. VAT gene expression patterns that were associated with visceral fat area poorly overlapped with patterns associated with self-reported body mass index (BMI). However, subcutaneous fat area and BMI showed similar associations with SAT gene expression. Conclusions: This large-scale human study demonstrates pronounced disparities between distinct adipose tissue depots and reveals that BMI poorly correlates with fat mass–associated changes in VAT. Taken together, these results provide crucial evidence for the necessity to differentiate between distinct adipose tissue depots for a correct characterization of gene expression profiles that may affect the metabolic health of patients with colorectal cancer.

rCASC: reproducible classification analysis of single-cell sequencing data

GigaScience ◽

10.1093/gigascience/giz105 ◽

2019 ◽

Vol 8 (9) ◽

Cited By ~ 5

Author(s):

Luca Alessandrì ◽

Francesca Cordero ◽

Marco Beccuti ◽

Maddalena Arigoni ◽

Martina Olivero ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Cellular Heterogeneity ◽

Cell Subpopulation ◽

Integrated Analysis ◽

Specific Gene ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Cell Stability ◽

Reproducible Analysis

Background: Single-cell RNA sequencing is essential for investigating cellular heterogeneity and highlighting cell subpopulation-specific signatures. Single-cell sequencing applications have spread from conventional RNA sequencing to epigenomics, e.g., ATAC-seq. Many related algorithms and tools have been developed, but few computational workflows provide analysis flexibility while also achieving functional reproducibility (i.e., information about the data and the tools used is saved as metadata) and computational reproducibility (i.e., a real image of the computational environment used to generate the data is stored) through a user-friendly environment. Findings: rCASC is a modular workflow providing an integrated analysis environment (from count generation to cell subpopulation identification) that exploits Docker containerization to achieve both functional and computational reproducibility in data analysis. rCASC provides preprocessing tools to remove low-quality cells and/or specific biases, e.g., cell cycle. Subpopulation discovery can then be achieved using different clustering techniques based on different distance metrics. Cluster quality is then estimated through the new metric "cell stability score" (CSS), which describes the stability of a cell in a cluster as a consequence of a perturbation induced by removing a random set of cells from the cell population. CSS provides better cluster robustness information than the silhouette metric. Moreover, rCASC's tools can identify cluster-specific gene signatures. Conclusions: rCASC is a modular workflow with new features that could help researchers define cell subpopulations and detect subpopulation-specific markers. It uses Docker for ease of installation and to achieve a computationally reproducible analysis. A Java GUI is provided to welcome users without computational skills in R.
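The cell stability score described above can be illustrated with a small perturbation loop: cluster all cells once, then repeatedly remove a random subset, re-cluster the rest, and count how often each cell stays with its original cluster. This is a hypothetical Python re-expression of the idea, not rCASC's own R/Docker implementation, whose exact definition the abstract does not give. In particular, matching new clusters to reference clusters by majority vote, the removal fraction, and the toy 1-D `largest_gap_clusters` clusterer are all assumptions made for the sketch.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence

# A clusterer maps a list of cells (here 1-D values) to one integer label per cell.
Clusterer = Callable[[Sequence[float]], List[int]]


def largest_gap_clusters(values: Sequence[float]) -> List[int]:
    """Toy clusterer (hypothetical): split 1-D values into two groups at the largest gap."""
    ordered = sorted(values)
    gaps = [ordered[k + 1] - ordered[k] for k in range(len(ordered) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    threshold = ordered[cut]
    return [0 if v <= threshold else 1 for v in values]


def cell_stability_scores(cells: Sequence[float], cluster_fn: Clusterer,
                          n_perturbations: int = 50, frac_removed: float = 0.1,
                          seed: int = 0) -> List[float]:
    """Fraction of perturbations (in which a cell was kept) where it stays in its cluster."""
    rng = random.Random(seed)
    n = len(cells)
    reference = cluster_fn(cells)
    n_removed = max(1, round(frac_removed * n))
    stayed = [0] * n
    retained = [0] * n
    for _ in range(n_perturbations):
        removed = set(rng.sample(range(n), n_removed))
        kept = [i for i in range(n) if i not in removed]
        labels = cluster_fn([cells[i] for i in kept])
        # Perturbed labels are arbitrary, so map each new cluster to the
        # reference cluster that most of its members came from.
        votes: Dict[int, Counter] = {}
        for i, label in zip(kept, labels):
            votes.setdefault(label, Counter())[reference[i]] += 1
        mapping = {label: c.most_common(1)[0][0] for label, c in votes.items()}
        for i, label in zip(kept, labels):
            retained[i] += 1
            stayed[i] += int(mapping[label] == reference[i])
    return [s / r if r else float("nan") for s, r in zip(stayed, retained)]


if __name__ == "__main__":
    cells = [0.0, 0.1, 0.2, 0.3, 0.4, 10.0, 10.1, 10.2, 10.3, 10.4]
    print(cell_stability_scores(cells, largest_gap_clusters))
```

With two well-separated groups every cell scores 1.0; cells lying between clusters would score lower, because removing a few neighbours can flip their assignment. Any real clusterer can be passed as `cluster_fn` as long as it returns one label per input cell.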