ArchR: An integrative and scalable software package for single-cell chromatin accessibility analysis

Author(s):  
Jeffrey M. Granja ◽  
M. Ryan Corces ◽  
Sarah E. Pierce ◽  
S. Tansu Bagdatli ◽  
Hani Choudhry ◽  
...  

Abstract: The advent of large-scale single-cell chromatin accessibility profiling has accelerated our ability to map gene regulatory landscapes, but has outpaced the development of robust, scalable software to rapidly extract biological meaning from these data. Here we present a software suite for single-cell analysis of regulatory chromatin in R (ArchR; www.ArchRProject.com) that enables fast and comprehensive analysis of single-cell chromatin accessibility data. ArchR provides an intuitive, user-focused interface for complex single-cell analyses including doublet removal, single-cell clustering and cell type identification, robust peak set generation, cellular trajectory identification, DNA element to gene linkage, transcription factor footprinting, mRNA expression level prediction from chromatin accessibility, and multi-omic integration with scRNA-seq. Enabling the analysis of over 1.2 million single cells within 8 hours on a standard Unix laptop, ArchR is a comprehensive analytical suite for end-to-end analysis of single-cell chromatin accessibility data that will accelerate the understanding of gene regulation at the resolution of individual cells.

2021 ◽  
Vol 53 (3) ◽  
pp. 403-411 ◽  
Author(s):  
Jeffrey M. Granja ◽  
M. Ryan Corces ◽  
Sarah E. Pierce ◽  
S. Tansu Bagdatli ◽  
Hani Choudhry ◽  
...  

Abstract: The advent of single-cell chromatin accessibility profiling has accelerated the ability to map gene regulatory landscapes but has outpaced the development of scalable software to rapidly extract biological meaning from these data. Here we present a software suite for single-cell analysis of regulatory chromatin in R (ArchR; https://www.archrproject.com/) that enables fast and comprehensive analysis of single-cell chromatin accessibility data. ArchR provides an intuitive, user-focused interface for complex single-cell analyses, including doublet removal, single-cell clustering and cell type identification, unified peak set generation, cellular trajectory identification, DNA element-to-gene linkage, transcription factor footprinting, mRNA expression level prediction from chromatin accessibility and multi-omic integration with single-cell RNA sequencing (scRNA-seq). Enabling the analysis of over 1.2 million single cells within 8 h on a standard Unix laptop, ArchR is a comprehensive software suite for end-to-end analysis of single-cell chromatin accessibility that will accelerate the understanding of gene regulation at the resolution of individual cells.
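The abstract lists mRNA expression level prediction from chromatin accessibility among ArchR's features. As a hedged illustration of the general idea only, and not ArchR's implementation, a gene-level activity score can be computed as a distance-weighted sum of accessibility around the gene's transcription start site (TSS). The function name, the exponential decay with a 5 kb scale, the constant offset and the 100 kb window are all assumptions chosen for illustration.

```python
import math


def gene_activity_score(tss, peaks, decay_bp=5000.0, offset=math.exp(-1), max_dist_bp=100_000):
    """Hypothetical gene score: distance-weighted sum of accessibility near a TSS.

    tss: TSS position in bp; peaks: iterable of (center_bp, accessibility) pairs
    on the same chromosome. Each peak within max_dist_bp contributes
    accessibility * (exp(-|d| / decay_bp) + offset); the constants are
    illustrative assumptions, not values taken from the ArchR abstract.
    """
    score = 0.0
    for center, accessibility in peaks:
        d = abs(center - tss)
        if d > max_dist_bp:
            continue  # peaks outside the window do not contribute
        score += accessibility * (math.exp(-d / decay_bp) + offset)
    return score


# Usage: a peak at the TSS, one 5 kb away, and one far outside the window.
peaks = [(10_000, 3.0), (15_000, 1.0), (500_000, 9.0)]
print(gene_activity_score(10_000, peaks))
```

The decay term makes nearby peaks dominate, while the offset lets accessibility anywhere in the window contribute a little. A real pipeline would also need gene-body and strand handling, which this sketch omits.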


2018 ◽  
Author(s):  
Nikos Konstantinides ◽  
Katarina Kapuralin ◽  
Chaimaa Fadil ◽  
Luendreo Barboza ◽  
Rahul Satija ◽  
...  

Summary: Transcription factors regulate the molecular, morphological, and physiological characters of neurons and generate their impressive cell type diversity. To gain insight into general principles that govern how transcription factors regulate cell type diversity, we used large-scale single-cell mRNA sequencing to characterize the extensive cellular diversity in the Drosophila optic lobes. We sequenced 55,000 single optic lobe neurons and glia and assigned them to 52 clusters of transcriptionally distinct single cells. We validated the clustering and annotated many of the clusters using RNA sequencing of characterized FACS-sorted single cell types, as well as marker genes specific to given clusters. To identify transcription factors responsible for inducing specific terminal differentiation features, we used machine learning to generate a ‘random forest’ model. The predictive power of the model was confirmed by showing that two transcription factors expressed specifically in cholinergic (apterous) and glutamatergic (traffic-jam) neurons are necessary for the expression of ChAT and VGlut in many, but not all, cholinergic or glutamatergic neurons, respectively. We used a transcriptome-wide approach to show that the same terminal characters, including but not restricted to neurotransmitter identity, can be regulated by different transcription factors in different cell types, arguing for extensive phenotypic convergence. Our data provide a deep understanding of the developmental and functional specification of a complex brain structure.


2017 ◽  
Author(s):  
Aparna Bhaduri ◽  
Tomasz J. Nowakowski ◽  
Alex A. Pollen ◽  
Arnold R. Kriegstein

Abstract: High-throughput methods for profiling the transcriptomes of single cells have recently emerged as transformative approaches for large-scale population surveys of cellular diversity in heterogeneous primary tissues. Efficient generation of such an atlas will depend on sufficient sampling of the diverse cell types while remaining cost-effective to enable a comprehensive examination of organs, developmental stages, and individuals. To examine the relationship between cell number and transcriptional heterogeneity in the context of unbiased cell type classification, we explicitly explored the population structure of a publicly available 1.3 million cell dataset from the E18.5 mouse brain. We propose a computational framework for inferring the saturation point of cluster discovery in a single cell mRNA-seq experiment, centered around cluster preservation in downsampled datasets. In addition, we introduce a “complexity index”, which characterizes the heterogeneity of cells in a given dataset. Using Cajal-Retzius cells as an example of a limited-complexity dataset, we explored whether biological distinctions relate to technical clustering. Surprisingly, we found that clustering distinctions carrying biologically interpretable meaning are achieved with far fewer cells (20,000). Together, these findings suggest that most of the biologically interpretable insights from the 1.3 million cells can be recapitulated by analyzing 50,000 randomly selected cells, indicating that instead of profiling few individuals at high “cellular coverage”, the much-anticipated cell atlasing studies may instead benefit from profiling more individuals, or many time points at lower cellular coverage.
Recent efforts seek to create a comprehensive cell atlas of the human body1,2. Current technology, however, makes it prohibitively expensive to perform analysis of every cell. Therefore, designing effective sampling strategies will be critical to generate a working atlas in an efficient, cost-effective, and streamlined manner. The advent of single cell and single nucleus mRNA sequencing (RNAseq) in droplet format3,4 now enables large-scale sampling of cells from any tissue, and a recently released publicly available dataset of 1.3 million single cells from the E18.5 mouse brain generated with the 10X Chromium5 provides an opportunity to explore the relationship between population structure and the number of sampled cells necessary to reveal the underlying diversity of cell types. Here, we present a framework for how researchers can evaluate whether a dataset has reached saturation, and we estimate how many cells would be required to generate an atlas of the sample analyzed here. This framework can be applied to any organ or cell type specific atlas for any organism.
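The saturation framework above rests on checking whether clusters found in the full dataset are preserved when the data are downsampled. A minimal sketch of that check is below. It assumes k-means as a stand-in clustering method and the adjusted Rand index (ARI) as the preservation score; the authors' actual clustering pipeline, metric and the function names here are assumptions, not taken from the abstract.

```python
import numpy as np
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same cells."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=np.int64)
    np.add.at(table, (ai, bi), 1)  # contingency table of co-assignments
    sum_cells = sum(comb(int(n), 2) for n in table.ravel())
    sum_rows = sum(comb(int(n), 2) for n in table.sum(axis=1))
    sum_cols = sum(comb(int(n), 2) for n in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(len(ai), 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings have one cluster
    return (sum_cells - expected) / (max_index - expected)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means; random first center, then farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])  # next center: point farthest from existing ones
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def cluster_preservation(X, k, fractions, n_reps=5, seed=0):
    """Mean ARI between full-data clusters and clusters re-found in random subsamples."""
    rng = np.random.default_rng(seed)
    full_labels = kmeans(X, k, seed=seed)
    scores = {}
    for f in fractions:
        n = max(k, int(round(f * len(X))))
        aris = []
        for _ in range(n_reps):
            idx = rng.choice(len(X), size=n, replace=False)
            sub_labels = kmeans(X[idx], k, seed=int(rng.integers(2**31)))
            aris.append(adjusted_rand_index(full_labels[idx], sub_labels))
        scores[f] = float(np.mean(aris))
    return scores
```

Plotting the mean score against the sampled fraction gives a rough saturation curve. Once the score plateaus near 1, sampling more cells no longer changes the recovered cluster structure.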


2020 ◽  
Author(s):  
Ying Lei ◽  
Mengnan Cheng ◽  
Zihao Li ◽  
Zhenkun Zhuang ◽  
Liang Wu ◽  
...  

Non-human primates (NHP) provide a unique opportunity to study human neurological diseases, yet detailed characterization of the cell types and transcriptional regulatory features in the NHP brain is lacking. We applied a combinatorial indexing assay, sci-ATAC-seq, as well as single-nuclei RNA-seq, to profile chromatin accessibility in 43,793 single cells and transcriptomics in 11,477 cells, respectively, from the prefrontal cortex, primary motor cortex, and primary visual cortex of the adult cynomolgus monkey Macaca fascicularis. Integrative analysis of these two datasets resolved regulatory elements and transcription factors that specify cell type distinctions, and discovered area-specific diversity in chromatin accessibility and gene expression within excitatory neurons. We also constructed the dynamic landscape of chromatin accessibility and gene expression of oligodendrocyte maturation to characterize adult remyelination. Furthermore, we identified cell type-specific enrichment of differentially spliced gene isoforms and disease-associated single nucleotide polymorphisms. Our datasets permit integrative exploration of complex regulatory dynamics in macaque brain tissue at single-cell resolution.


2021 ◽  
Author(s):  
Michael P. Meers ◽  
Derek H. Janssens ◽  
Steven Henikoff

Chromatin profiling at locus resolution uncovers gene regulatory features that define cell types and developmental trajectories, but it remains challenging to map and compare distinct chromatin-associated proteins within the same sample. Here we describe a scalable antibody barcoding approach for profiling multiple chromatin features simultaneously in the same individual cells, Multiple Target Identification by Tagmentation (MulTI-Tag). MulTI-Tag is optimized to retain high sensitivity and specificity of enrichment for multiple chromatin targets in the same assay. We use MulTI-Tag to resolve distinct cell types using multiple chromatin features on a commercial single-cell platform, and to distinguish unique, coordinated patterns of active and repressive element regulatory usage in the same individual cells. Multifactorial profiling allows us to detect novel associations between histone marks in single cells and holds promise for comprehensively characterizing cell-specific gene regulatory landscapes in development and disease.
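Antibody barcoding means each sequenced fragment must be assigned back to the chromatin target whose antibody carried its barcode. The sketch below shows one simple way to do that assignment. It assumes a hypothetical read layout in which each read begins with a short antibody-specific barcode, and it allows at most one mismatch. The barcodes, target names, function names and read structure are illustrative only; the actual MulTI-Tag design may differ.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_target(read, barcodes, max_mismatch=1):
    """Return the target whose barcode best matches the start of the read, or None."""
    hits = []
    for target, bc in barcodes.items():
        if len(read) < len(bc):
            continue
        d = hamming(read[: len(bc)], bc)
        if d <= max_mismatch:
            hits.append((d, target))
    if not hits:
        return None
    hits.sort()
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None  # equally good matches to two barcodes: treat as ambiguous
    return hits[0][1]


def demultiplex(reads, barcodes, max_mismatch=1):
    """Count reads per chromatin target; unmatched or ambiguous reads go to 'unassigned'."""
    counts = {target: 0 for target in barcodes}
    counts["unassigned"] = 0
    for read in reads:
        target = assign_target(read, barcodes, max_mismatch)
        counts[target if target is not None else "unassigned"] += 1
    return counts


# Usage with hypothetical 6-nt barcodes for two histone-mark antibodies.
barcodes = {"H3K27me3": "ACGTAC", "H3K4me2": "TTGCAG"}
reads = ["ACGTACGGGG", "ACGTTCGGGG", "TTGCAGAAAA", "GGGGGGAAAA"]
print(demultiplex(reads, barcodes))
```

Barcodes that differ at several positions keep single sequencing errors from moving a read to the wrong target. The tie check discards reads that cannot be assigned unambiguously.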


2020 ◽  
Vol 36 (12) ◽  
pp. 3825-3832
Author(s):  
Wenming Wu ◽  
Xiaoke Ma

Abstract: Motivation: Single-cell RNA-sequencing (scRNA-seq) profiles the transcriptomes of individual cells, which enables the discovery of cell types or subtypes by unsupervised clustering. Current algorithms perform dimension reduction before cell clustering because of the noise, high dimensionality and linear inseparability of scRNA-seq data. However, performing dimension reduction and clustering independently fails to fully characterize patterns in the data, resulting in undesirable performance. Results: In this study, we propose a flexible and accurate algorithm for scRNA-seq data that jointly learns dimension reduction and cell clustering (DRjCC), where dimension reduction is performed by projected matrix decomposition and cell type clustering by non-negative matrix factorization. We first formulate joint learning of dimension reduction and cell clustering as a constrained optimization problem and then derive the optimization rules. The advantage of DRjCC is that feature selection in dimension reduction is guided by cell clustering, significantly improving the performance of cell type discovery. Eleven scRNA-seq datasets are used to validate the performance of the algorithms, where the number of single cells varies from 49 to 68 579 and the number of cell types ranges from 3 to 14. The experimental results demonstrate that DRjCC significantly outperforms 13 state-of-the-art methods on various measures of cell type clustering (an average improvement of 17.44%). Furthermore, DRjCC is efficient and robust across different scRNA-seq datasets from various tissues. The proposed model and methods provide an effective strategy to analyze scRNA-seq data. Availability and implementation: The software is implemented in MATLAB and is freely available for academic use at https://github.com/xkmaxidian/DRjCC. Supplementary information: Supplementary data are available at Bioinformatics online.
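DRjCC clusters cells by non-negative matrix factorization (NMF) as part of a joint objective with projected matrix decomposition. That joint formulation is not reproduced here. As a simpler point of reference, the sketch below shows only standard, non-joint NMF clustering. It uses the Lee-Seung multiplicative updates for the Frobenius loss and assigns each cell to the factor with the largest coefficient. The function names and parameters are illustrative assumptions.

```python
import numpy as np


def nmf(X, k, n_iter=500, seed=0, eps=1e-10):
    """Factor non-negative X (genes x cells) as W @ H via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k)) + eps  # gene loadings, kept non-negative
    H = rng.random((k, n_cells)) + eps  # per-cell factor coefficients
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def nmf_clusters(X, k, **kwargs):
    """Assign each cell (column of X) to the factor with the largest coefficient."""
    _, H = nmf(X, k, **kwargs)
    return H.argmax(axis=0)


# Usage: two gene modules, each expressed in a different group of cells.
X = np.full((10, 20), 0.01)
X[:5, :10] += 5.0
X[5:, 10:] += 5.0
print(nmf_clusters(X, 2))
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout. This keeps each factor interpretable as an additive gene program.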


2020 ◽  
Author(s):  
Noa Liscovitch-Brauer ◽  
Antonino Montalbano ◽  
Jiale Deng ◽  
Alejandro Méndez-Mancilla ◽  
Hans-Hermann Wessels ◽  
...  

Abstract: Pooled CRISPR screens have been used to identify genes responsible for specific phenotypes and diseases, and, more recently, to connect genetic perturbations with multi-dimensional gene expression profiles. Here, we describe a method to link genome-wide chromatin accessibility to genetic perturbations in single cells. This scalable, cost-effective method combines pooled CRISPR perturbations with a single-cell combinatorial indexing assay for transposase-accessible chromatin (CRISPR-sciATAC). Using a human and mouse species-mixing experiment, we show that CRISPR-sciATAC separates single cells with a low doublet rate. Then, in human myelogenous leukemia cells, we apply CRISPR-sciATAC to target 21 chromatin-related genes that are frequently mutated in cancer and 84 subunits and cofactors of chromatin remodeling complexes, generating chromatin accessibility data for ~30,000 single cells. Using this large-scale atlas, we correlate loss of specific chromatin remodelers with changes in accessibility, both globally and at the binding sites of individual transcription factors. For example, we show that loss of the H3K27 methyltransferase EZH2 leads to increased accessibility at heterochromatic regions involved in embryonic development and triggers expression of multiple genes in the HOXA and HOXD clusters. At a subset of regulatory sites, we also analyze dynamic changes in nucleosome spacing upon loss of chromatin remodelers. CRISPR-sciATAC is a high-throughput, low-cost single-cell method that can be applied broadly to study the effects of genetic perturbations on chromatin in normal and disease states.
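In a species-mixing experiment, the doublet rate is read off from barcodes whose reads come from both genomes. The sketch below shows the usual arithmetic. Barcodes are classified by the fraction of reads mapping to each species. The observed cross-species rate is then corrected upward, because same-species doublets look like singlets. The purity and minimum-read thresholds and the function names are assumptions for illustration, not the CRISPR-sciATAC pipeline.

```python
def classify_barcodes(human_reads, mouse_reads, purity=0.9, min_reads=100):
    """Label each barcode 'human', 'mouse', 'mixed' (cross-species collision) or 'low'."""
    labels = []
    for h, m in zip(human_reads, mouse_reads):
        total = h + m
        if total < min_reads:
            labels.append("low")  # too few reads to call reliably
        elif h / total >= purity:
            labels.append("human")
        elif m / total >= purity:
            labels.append("mouse")
        else:
            labels.append("mixed")
    return labels


def estimate_doublet_rate(labels):
    """Return (observed cross-species rate, estimated total doublet rate).

    Same-species doublets are invisible here, so the observed 'mixed' fraction
    is divided by 2p(1 - p), the probability that a random doublet pairs one
    cell of each species when a fraction p of cells is human.
    """
    called = [lab for lab in labels if lab != "low"]
    n_human, n_mouse = called.count("human"), called.count("mouse")
    if not called or n_human == 0 or n_mouse == 0:
        raise ValueError("need both human and mouse barcodes")
    observed = called.count("mixed") / len(called)
    p = n_human / (n_human + n_mouse)  # approximated from singlet-like calls
    return observed, observed / (2 * p * (1 - p))


# Usage: 45 human, 45 mouse and 10 mixed barcodes (a 1:1 mix).
print(estimate_doublet_rate(["human"] * 45 + ["mouse"] * 45 + ["mixed"] * 10))
```

For an equal mix, 2p(1 - p) = 0.5, so the estimated total doublet rate is roughly twice the observed cross-species rate.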


2020 ◽  
Author(s):  
Larisa M. Soto ◽  
Juan P. Bernal-Tamayo ◽  
Robert Lehmann ◽  
Subash Balsamy ◽  
Xabier Martinez-de-Morentin ◽  
...  

Abstract: Recent progress in single-cell genomics has generated multiple tools for cell clustering, annotation, and trajectory inference; yet inferring their associated regulatory mechanisms remains unresolved. Here we present scMomentum, a model-based data-driven formulation to predict gene regulatory networks and energy landscapes from single-cell transcriptomic data without requiring temporal or perturbation experiments. scMomentum provides significant advantages over existing methods with respect to computational efficiency, scalability, network structure, and biological application. Availability: scMomentum is available as a Python package at https://github.com/larisa-msoto/scMomentum.git
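The abstract pairs a gene regulatory network with an "energy landscape" but does not define the energy. As an explicitly assumed example, the sketch below uses the classic Hopfield energy E(x) = -1/2 xᵀWx - bᵀx over a symmetric interaction matrix W. In this model, stored expression states sit in low-energy basins. The Hebbian weight rule, the ±1 gene states and the function names are illustrative choices and are not claimed to be scMomentum's method.

```python
import numpy as np


def hopfield_energy(x, W, b=None):
    """E(x) = -1/2 x^T W x - b^T x for state vector x and symmetric couplings W."""
    x = np.asarray(x, dtype=float)
    e = -0.5 * x @ W @ x
    if b is not None:
        e -= np.asarray(b, dtype=float) @ x
    return float(e)


def hebbian_weights(patterns):
    """Store +/-1 gene-state patterns (rows) with the Hebbian rule; no self-coupling."""
    P = np.asarray(patterns, dtype=float)
    W = P.T @ P / P.shape[0]
    np.fill_diagonal(W, 0.0)
    return W


# Usage: two hypothetical cell-type expression states over four genes.
patterns = [[1, 1, -1, -1], [1, -1, 1, -1]]
W = hebbian_weights(patterns)
print(hopfield_energy(patterns[0], W), hopfield_energy([1, 1, 1, 1], W))
```

A stored state has lower energy than an arbitrary state. Evaluating E across states therefore gives a simple picture of attractor basins that could correspond to cell types.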


2021 ◽  
Author(s):  
Jonathan Moody ◽  
Tsukasa Kouno ◽  
Akari Suzuki ◽  
Youtaro Shibayama ◽  
Chikashi Terao ◽  
...  

Profiling of cis-regulatory elements (CREs, mostly promoters and enhancers) in single cells allows the interrogation of the cell-type and -state specific contexts of gene regulation and genetic predisposition to diseases. Here we demonstrate that single-cell RNA-5′end-sequencing (sc-end5-seq) methods can detect transcribed CREs (tCREs), enabling simultaneous quantification of gene expression and enhancer activities in a single assay with no extra cost. We show that enhancer RNAs can be effectively detected using sc-end5-seq methods with either random or oligo(dT) priming. To analyze tCREs in single cells, we developed SCAFE (Single Cell Analysis of Five-prime Ends) to identify genuine tCREs and analyze their activities (https://github.com/chung-lab/scafe). Compared with accessible CREs (aCREs, based on chromatin accessibility), tCREs are more accurate in predicting CRE interactions by co-activity, more sensitive in detecting shifts in alternative promoter usage and more enriched for disease heritability. Our results highlight additional dimensions within sc-end5-seq data which can be used for interrogating gene regulation and disease heritability.
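"Predicting CRE interactions by co-activity" means that two elements whose activities rise and fall together across cells are nominated as a candidate interacting pair. The sketch below shows that idea in its simplest form. It computes the Pearson correlation of per-cell activity for element pairs within a genomic distance window. The window size, correlation cutoff and function name are assumptions; SCAFE's actual procedure is not described in the abstract.

```python
import numpy as np


def coactivity_links(activity, positions, max_distance=500_000, min_r=0.3):
    """Candidate CRE pairs whose activity profiles are correlated across cells.

    activity: array (n_cres, n_cells) of per-cell counts; positions: CRE
    centers (bp) on one chromosome. Returns (i, j, r) for pairs within
    max_distance with Pearson r >= min_r.
    """
    activity = np.asarray(activity, dtype=float)
    r = np.corrcoef(activity)  # CRE x CRE correlation across cells
    links = []
    for i in range(activity.shape[0]):
        for j in range(i + 1, activity.shape[0]):
            if abs(positions[i] - positions[j]) <= max_distance and r[i, j] >= min_r:
                links.append((i, j, float(r[i, j])))
    return links


# Usage: CREs 0 and 1 co-vary across five cells; CRE 2 is anti-correlated.
activity = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1]]
print(coactivity_links(activity, positions=[0, 1_000, 2_000]))
```

Sparse single-cell counts make per-cell correlations noisy. In practice, cells are often pooled into groups before correlating, a step this sketch leaves out.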


2021 ◽  
Vol 23 (1) ◽  
Author(s):  
Bhupinder Pal ◽  
Yunshun Chen ◽  
Michael J. G. Milevskiy ◽  
François Vaillant ◽  
Lexie Prokopuk ◽  
...  

Abstract: Background: Heterogeneity within the mouse mammary epithelium and potential lineage relationships have been recently explored by single-cell RNA profiling. To further understand how cellular diversity changes during mammary ontogeny, we profiled single cells from nine different developmental stages spanning late embryogenesis, early postnatal, prepuberty, adult, mid-pregnancy, late-pregnancy, and post-involution, as well as the transcriptomes of micro-dissected terminal end buds (TEBs) and subtending ducts during puberty. Methods: The single cell transcriptomes of 132,599 mammary epithelial cells from 9 different developmental stages were determined on the 10x Genomics Chromium platform, and integrative analyses were performed to compare specific time points. Results: The mammary rudiment at E18.5 closely aligned with the basal lineage, while prepubertal epithelial cells exhibited lineage segregation but to a less differentiated state than their adult counterparts. Comparison of micro-dissected TEBs versus ducts showed that luminal cells within TEBs harbored intermediate expression profiles. Ductal basal cells exhibited increased chromatin accessibility of luminal genes compared to their TEB counterparts, suggesting that lineage-specific chromatin is established within the subtending ducts during puberty. An integrative analysis of five stages spanning the pregnancy cycle revealed distinct stage-specific profiles and the presence of cycling basal, mixed-lineage, and 'late' alveolar intermediates in pregnancy. Moreover, a number of intermediates were uncovered along the basal-luminal progenitor cell axis, suggesting a continuum of alveolar-restricted progenitor states. Conclusions: This extended single cell transcriptome atlas of mouse mammary epithelial cells provides the most complete coverage for mammary epithelial cells during morphogenesis to date.
Together with chromatin accessibility analysis of TEB structures, it represents a valuable framework for understanding developmental decisions within the mouse mammary gland.

