Comprehensive integration of single cell data

Single cell transcriptomics (scRNA-seq) has transformed our ability to discover and annotate cell types and states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, including high-dimensional immunophenotypes, chromatin accessibility, and spatial positioning, a key analytical challenge is to integrate these datasets into a harmonized atlas that can be used to better understand cellular identity and function. Here, we develop a computational strategy to “anchor” diverse datasets together, enabling us to integrate and compare single cell measurements not only across scRNA-seq technologies, but different modalities as well. After demonstrating substantial improvement over existing methods for data integration, we anchor scRNA-seq experiments with scATAC-seq datasets to explore chromatin differences in closely related interneuron subsets, and project single cell protein measurements onto a human bone marrow atlas to annotate and characterize lymphocyte populations. Lastly, we demonstrate how anchoring can harmonize in-situ gene expression and scRNA-seq datasets, allowing for the transcriptome-wide imputation of spatial gene expression patterns, and the identification of spatial relationships between mapped cell types in the visual cortex. Our work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets.
Availability: Installation instructions, documentation, and tutorials are available at: https://www.satijalab.org/seurat

Identifying cell types from single-cell data based on similarities and dissimilarities between cells

BMC Bioinformatics, 10.1186/s12859-020-03873-z, 2021, Vol 22 (S3)

Author(s): Yuanyuan Li, Ping Luo, Yi Lu, Fang-Xiang Wu

Keyword(s): Gene Expression, Single Cell, Spectral Clustering, Incidence Matrix, Expression Patterns, Cell Types, Clustering Method, Different Types, Cell Data, Spectral Clustering Method

Abstract
Background: With the development of single-cell sequencing technology, revealing homogeneity and heterogeneity between cells has become a new area of computational systems biology research. However, the clustering of cell types becomes more complex with the mutual penetration between different types of cells and the instability of gene expression. One way of overcoming this problem is to group similar, related single cells together by means of various clustering analysis methods. Although some methods such as spectral clustering can do well in the identification of cell types, they only consider the similarities between cells and ignore the influence of dissimilarities on clustering results. This may limit the performance of most conventional clustering algorithms in identifying clusters, so special methods need to be developed for high-dimensional sparse categorical data.
Results: Inspired by the phenomenon that cells of the same type have similar gene expression patterns, whereas cells of different types have dissimilar gene expression patterns, we improve the existing spectral clustering method for clustering single-cell data so that it is based on both similarities and dissimilarities between cells. The method first measures the similarity/dissimilarity among cells, then constructs the incidence matrix by fusing the similarity matrix with the dissimilarity matrix, and finally uses the eigenvalues of the incidence matrix to perform dimensionality reduction and employs the K-means algorithm in the low-dimensional space to achieve clustering. The proposed improved spectral clustering method is compared with the conventional spectral clustering method in recognizing cell types on several real single-cell RNA-seq datasets.
Conclusions: In summary, we show that adding intercellular dissimilarity can effectively improve accuracy and achieve robustness, and that the improved spectral clustering method outperforms the traditional spectral clustering method in grouping cells.
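The three steps named in the Results (measure similarity and dissimilarity, fuse them into one matrix, embed and run K-means) can be sketched roughly as below. This is a minimal NumPy illustration, not the authors' implementation: the Gaussian similarity, the choice of dissimilarity as 1 - similarity, the fusion rule W = S - alpha * D, the parameter alpha, and both function names are assumptions made for the example. It also follows the usual spectral clustering convention of embedding cells with the leading eigenvectors of the fused matrix.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Plain Lloyd's K-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest chosen centre.
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def fused_spectral_clustering(X, k, alpha=0.5):
    """Cluster rows of X (cells x genes) using similarity and dissimilarity.

    Assumptions (not taken from the paper): Gaussian similarity with a
    median-distance bandwidth, dissimilarity D = 1 - S, fusion W = S - alpha * D.
    """
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    sigma2 = np.median(sq[sq > 0])          # bandwidth heuristic
    S = np.exp(-sq / sigma2)                # step 1a: similarity in (0, 1]
    D = 1.0 - S                             # step 1b: dissimilarity (assumed form)
    W = S - alpha * D                       # step 2: fused matrix (assumed rule)
    _, vecs = np.linalg.eigh(W)             # W is symmetric, eigenvalues ascending
    U = vecs[:, -k:]                        # step 3a: k leading eigenvectors
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    U = U / np.maximum(norms, 1e-12)        # row-normalise the embedding
    return kmeans(U, k)                     # step 3b: K-means in the low-dimensional space
```

Calling fused_spectral_clustering(X, k=2) on a cells-by-genes array returns one integer label per cell. Setting alpha=0 reduces the sketch to a similarity-only variant, which makes it easy to compare the two on the same data.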

Comprehensive characterization of tissue-specific chromatin accessibility in L2 Caenorhabditis elegans nematodes

10.1101/2020.09.15.299123, 2020

Author(s): Timothy J. Durham, Riza M. Daza, Louis Gevirtzman, Darren A. Cusanovich, William Stafford Noble, ...

Keyword(s): Gene Expression, Single Cell, Expression Patterns, Cell Types, Chromatin Accessibility, Gene Expression Patterns, Rna Seq, Cell Type, Tissue Specific, C Elegans

Abstract: Recently developed single cell technologies allow researchers to characterize cell states at ever greater resolution and scale. C. elegans is a particularly tractable system for studying development, and recent single cell RNA-seq studies characterized the gene expression patterns for nearly every cell type in the embryo and at the second larval stage (L2). Gene expression patterns are useful for learning about gene function and give insight into the biochemical state of different cell types; however, in order to understand these cell types, we must also determine how these gene expression levels are regulated. We present the first single cell ATAC-seq study in C. elegans. We collected data in L2 larvae to match the available single cell RNA-seq data set, and we identify tissue-specific chromatin accessibility patterns that align well with existing data, including the L2 single cell RNA-seq results. Using a novel implementation of the latent Dirichlet allocation algorithm, we leverage the single-cell resolution of the sci-ATAC-seq data to identify accessible loci at the level of individual cell types, providing new maps of putative cell type-specific gene regulatory sites, with promise for better understanding of cellular differentiation and gene regulation in the worm.

Combined aptamer and transcriptome sequencing of single cells

10.1101/228338, 2017, Cited By ~ 2

Author(s): Cyrille L. Delley, Leqian Liu, Maen F. Sarhan, Adam R. Abate

Keyword(s): Single Cell, Single Cells, Expression Patterns, Cell Types, Single Cell Protein, Protein Characterization, Cell Protein, Surface Binding, Distinct Cell

Abstract: The transcriptome and proteome encode distinct information that is important for characterizing heterogeneous biological systems. We demonstrate a method to simultaneously characterize the transcriptomes and proteomes of single cells at high throughput using aptamer probes and droplet-based single cell sequencing. With our method, we differentiate distinct cell types based on aptamer surface binding and gene expression patterns. Aptamers provide advantages over antibodies for single cell protein characterization, including rapid, in vitro, and high-purity generation via SELEX, and the ability to amplify and detect them with PCR and sequencing.

scAMACE: Model-based approach to the joint analysis of single-cell data on chromatin accessibility, gene expression and methylation

10.1101/2021.03.29.437485, 2021

Author(s): Jiaxuan Wangwu, Zexuan Sun, Zhixiang Lin

Keyword(s): Gene Expression, Single Cell, Cell Types, Chromatin Accessibility, Integrative Analysis, Joint Analysis, Data Types, Link Type, Complex Biological Process, Cell Data

Abstract: The advancement in technologies and the growth of available single-cell datasets motivate integrative analysis of multiple single-cell genomic datasets. Integrative analysis of multimodal single-cell datasets combines complementary information offered by single-omic datasets and can offer deeper insights into complex biological processes. Clustering methods that identify the unknown cell types are among the first few steps in the analysis of single-cell datasets, and they are important for downstream analysis built upon the identified cell types. We propose scAMACE for the integrative analysis and clustering of single-cell data on chromatin accessibility, gene expression and methylation. We demonstrate that cell types are better identified and characterized through analyzing the three data types jointly. We develop an efficient expectation-maximization (EM) algorithm to perform statistical inference, and evaluate our methods on both a simulation study and real data applications. We also provide a GPU implementation of scAMACE, making it scalable to large datasets. The software and datasets are available at https://github.com/cuhklinlab/scAMACE_py (Python implementation) and https://github.com/cuhklinlab/scAMACE (R implementation).
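As a rough illustration of what an expectation-maximization loop looks like in model-based clustering, the sketch below fits a simple Bernoulli mixture to a binary cell-by-feature matrix (for example, binarized accessibility calls). It is deliberately much simpler than scAMACE, which models three data types jointly, and it does not use the scAMACE packages. The function name, the random initialisation, and the smoothing constants are all assumptions made for the example.

```python
import numpy as np


def bernoulli_mixture_em(X, k, n_iter=100, seed=0):
    """Fit a k-component Bernoulli mixture to a binary matrix X (cells x features).

    Returns mixing weights pi (k,), feature probabilities theta (k, p) and
    responsibilities resp (n, k). Illustrative only; not the scAMACE model.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pi = np.full(k, 1.0 / k)
    theta = rng.uniform(0.25, 0.75, size=(k, p))   # random start breaks symmetry
    resp = np.full((n, k), 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each cell,
        # computed in log space for numerical stability.
        log_post = (X @ np.log(theta).T
                    + (1.0 - X) @ np.log(1.0 - theta).T
                    + np.log(pi))
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments.
        nk = resp.sum(axis=0)
        pi = (nk + 1e-9) / (n + k * 1e-9)          # tiny offset keeps log(pi) finite
        theta = (resp.T @ X + 1.0) / (nk[:, None] + 2.0)  # smoothed to stay in (0, 1)
    return pi, theta, resp
```

Hard cluster labels can be read off with resp.argmax(axis=1). The rows of theta then summarise which features tend to be "on" in each inferred cluster.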

A single cell brain atlas in human Alzheimer’s disease

10.1101/628347, 2019, Cited By ~ 4

Author(s): Alexandra Grubman, Gabriel Chew, John F. Ouyang, Guizhi Sun, Xin Yi Choo, ...

Keyword(s): Gene Expression, Transcription Factor, Single Cell, Cell Fate, Expression Patterns, Cell Types, Gene Expression Patterns, Cell Type, Web Resource, Cell Type Specific

Abstract: Alzheimer’s disease (AD) is a heterogeneous disease that is largely dependent on the complex cellular microenvironment in the brain. This complexity impedes our understanding of how individual cell types contribute to disease progression and outcome. To characterize the molecular and functional cell diversity in the human AD brain we utilized single nuclei RNA-seq in AD and control patient brains in order to map the landscape of cellular heterogeneity in AD. We detail gene expression changes at the level of cells and cell subclusters, highlighting specific cellular contributions to global gene expression patterns between control and Alzheimer’s patient brains. We observed distinct cellular regulation of APOE which was repressed in oligodendrocyte progenitor cells (OPCs) and astrocyte AD subclusters, and highly enriched in a microglial AD subcluster. In addition, oligodendrocyte and microglia AD subclusters show discordant expression of APOE. Integration of transcription factor regulatory modules with downstream GWAS gene targets revealed subcluster-specific control of AD cell fate transitions. For example, this analysis uncovered that astrocyte diversity in AD was under the control of transcription factor EB (TFEB), a master regulator of lysosomal function, which initiated a regulatory cascade containing multiple AD GWAS genes. These results establish functional links between specific cellular sub-populations in AD, and provide new insights into the coordinated control of AD GWAS genes and their cell-type specific contribution to disease susceptibility.
Finally, we created an interactive reference web resource which will help brain and AD researchers explore the molecular architecture of subtype and AD-specific cell identity, molecular and functional diversity at the single cell level.

Highlights:
We generated the first human single cell transcriptome in AD patient brains.
Our study unveiled 9 clusters of cell-type specific and common gene expression patterns between control and AD brains, including clusters of genes that present properties of different cell types (i.e. astrocytes and oligodendrocytes).
Our analyses also uncovered functionally specialized sub-cellular clusters: 5 microglial clusters, 8 astrocyte clusters, 6 neuronal clusters, 6 oligodendrocyte clusters, 4 OPC and 2 endothelial clusters, each enriched for specific ontological gene categories.
Our analyses found manifold AD GWAS genes specifically associated with one cell-type, and sets of AD GWAS genes co-ordinately and differentially regulated between different brain cell-types in AD sub-cellular clusters.
We mapped the regulatory landscape driving transcriptional changes in AD brain, and identified transcription factor networks which we predict to control cell fate transitions between control and AD sub-cellular clusters.
Finally, we provide an interactive web-resource that allows the user to further visualise and interrogate our dataset.
Data resource web interface: http://adsn.ddnetbio.com

Semi-soft Clustering of Single Cell Data

10.1101/285056, 2018

Author(s): Lingxue Zhu, Jing Lei, Bernie Devlin, Kathryn Roeder

Keyword(s): Gene Expression, Single Cell, Expression Profiles, Gene Expression Profiles, Pairwise Comparison, Cell Types, Intermediate Cell, Soft Clustering, Membership Matrix, Cell Data

Abstract: Motivated by the dynamics of development, in which cells of recognizable types, or pure cell types, transition into other types over time, we propose a method of semi-soft clustering that can classify both pure and intermediate cell types from data on gene expression or protein abundance from individual cells. Called SOUP, for Semi-sOft clUstering with Pure cells, this novel algorithm reveals the clustering structure for both pure cells, which belong to one single cluster, as well as transitional cells with soft memberships. SOUP involves a two-step process: identify the set of pure cells and then estimate a membership matrix. To find pure cells, SOUP uses the special block structure the K cell types form in a similarity matrix, devised by pairwise comparison of the gene expression profiles of individual cells. Once pure cells are identified, they provide the key information from which the membership matrix can be computed. SOUP is applicable to general clustering problems as well, as long as the unrestrictive modeling assumptions hold. The performance of SOUP is documented via extensive simulation studies. Using SOUP to analyze two single cell data sets from brain shows that it produces sensible and interpretable results.

Functional Inference of Gene Regulation using Single-Cell Multi-Omics

10.1101/2021.07.28.453784, 2021

Author(s): Vinay K Kartha, Fabiana M Duarte, Yan Hu, Sai Ma, Jennifer G Chew, ...

Keyword(s): Gene Expression, Single Cell, Regulatory Networks, Immunological Response, Cell Types, Regulatory Elements, Chromatin Accessibility, Human Blood Cells, Functional Inference, Omic Data

Cells require coordinated control over gene expression when responding to environmental stimuli. Here, we apply scATAC-seq and scRNA-seq in resting and stimulated human blood cells. Collectively, we generate ~91,000 single-cell profiles, allowing us to probe the cis-regulatory landscape of immunological response across cell types, stimuli and time. Advancing tools to integrate multi-omic data, we develop FigR, a framework to computationally pair scATAC-seq with scRNA-seq cells, connect distal cis-regulatory elements to genes, and infer gene regulatory networks (GRNs) to identify candidate TF regulators. Utilizing these paired multi-omic data, we define Domains of Regulatory Chromatin (DORCs) of immune stimulation and find that cells alter chromatin accessibility prior to production of gene expression at time scales of minutes. Further, the construction of the stimulation GRN elucidates TF activity at disease-associated DORCs. Overall, FigR enables the elucidation of regulatory interactions across single-cell data, providing new opportunities to understand the function of cells within tissues.

Kidney Single-cell Transcriptomes Predict Spatial Corticomedullary Gene Expression and Tissue Osmolality Gradients

Journal of the American Society of Nephrology, 10.1681/asn.2020070930, 2020, pp. ASN.2020070930

Author(s): Christian Hinze, Nikos Karaiskos, Anastasiya Boltengagen, Katharina Walentin, Klea Redo, ...

Keyword(s): Gene Expression, Single Cell, Collecting Duct, Expression Patterns, Cell Types, Spatial Position, Sufficient Information, Additional Information, Spatial Reconstruction, Tissue Osmolality

Background: Single-cell transcriptomes from dissociated tissues provide insights into cell types and their gene expression and may harbor additional information on spatial position and the local microenvironment. The kidney’s cells are embedded into a gradient of increasing tissue osmolality from the cortex to the medulla, which may alter their transcriptomes and provide cues for spatial reconstruction.
Methods: Single-cell or single-nuclei mRNA sequencing of dissociated mouse kidneys and of dissected cortex, outer, and inner medulla, to represent the corticomedullary axis, was performed. Computational approaches predicted the spatial ordering of cells along the corticomedullary axis and quantitated expression levels of osmo-responsive genes. In situ hybridization validated computational predictions of spatial gene-expression patterns. The strategy was used to compare single-cell transcriptomes from wild-type mice to those of mice with a collecting duct–specific knockout of the transcription factor grainyhead-like 2 (Grhl2CD−/−), which display reduced renal medullary osmolality.
Results: Single-cell transcriptomics from dissociated kidneys provided sufficient information to approximately reconstruct the spatial position of kidney tubule cells and to predict corticomedullary gene expression. Spatial gene expression in the kidney changes gradually and osmo-responsive genes follow the physiologic corticomedullary gradient of tissue osmolality. Single-nuclei transcriptomes from Grhl2CD−/− mice indicated a flattened expression gradient of osmo-responsive genes compared with control mice, consistent with their physiologic phenotype.
Conclusions: Single-cell transcriptomics from dissociated kidneys facilitated the prediction of spatial gene expression along the corticomedullary axis and quantitation of osmotically regulated genes, allowing the prediction of a physiologic phenotype.

Model-Based Approach to the Joint Analysis of Single-Cell Data on Chromatin Accessibility and Gene Expression

Statistical Science, 10.1214/19-sts714, 2020, Vol 35 (1), pp. 2-13, Cited By ~ 2

Author(s): Zhixiang Lin, Mahdi Zamanighomi, Timothy Daley, Shining Ma, Wing Hung Wong

Keyword(s): Gene Expression, Single Cell, Chromatin Accessibility, Joint Analysis, Model Based, Cell Data

Decision tree models and cell fate choice

10.1101/2020.12.19.423629, 2020

Author(s): Ivan Croydon Veleslavov, Michael P.H. Stumpf

Keyword(s): Gene Expression, Single Cell, Cell Fate, Cell Types, Lineage Tree, Tree Models, Fate Decision, Average Gene, Lineage Trees, Cell Data

Abstract: Single cell transcriptomics has laid bare the heterogeneity of apparently identical cells at the level of gene expression. For many cell-types we now know that there is variability in the abundance of many transcripts, and that average transcript abundance or average gene expression can be an unhelpful concept. A range of clustering and other classification methods have been proposed which use the signal in single cell data to classify, that is assign cell types, to cells based on their transcriptomic states. In many cases, however, we would like to have not just a classifier, but also a set of interpretable rules by which this classification occurs. Here we develop and demonstrate the interpretive power of one such approach, which sets out to establish a biologically interpretable classification scheme. In particular we are interested in capturing the chain of regulatory events that drive cell-fate decision making across a lineage tree or lineage sequence. We find that suitably defined decision trees can help to resolve gene regulatory programs involved in shaping lineage trees. Our approach combines predictive power with interpretability and can extract logical rules from single cell data.
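To make the idea of extracting logical rules from a decision tree concrete, here is a small, generic sketch. It is not the authors' method: it is an ordinary greedy decision tree with Gini impurity, written in plain NumPy, and the function names, the max_depth parameter, and the rule-string format are assumptions made for illustration.

```python
import numpy as np


def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_split(X, y):
    """Return (weighted impurity, gene index, threshold), or None if no split exists."""
    best = None
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])                  # sorted distinct values
        for t in (values[:-1] + values[1:]) / 2.0:   # midpoints as candidate thresholds
            left = X[:, j] <= t
            score = (left.sum() * gini(y[left])
                     + (~left).sum() * gini(y[~left])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def extract_rules(X, y, gene_names, depth=0, max_depth=3, path=()):
    """Grow a tree greedily and return a list of (rule string, predicted label)."""
    labels, counts = np.unique(y, return_counts=True)
    split = best_split(X, y) if depth < max_depth and len(labels) > 1 else None
    if split is None:
        # Leaf: the rule is the conjunction of tests on the path to this node.
        return [(" AND ".join(path) or "always", labels[np.argmax(counts)])]
    _, j, t = split
    left = X[:, j] <= t
    low = path + (f"{gene_names[j]} <= {t:.2f}",)
    high = path + (f"{gene_names[j]} > {t:.2f}",)
    return (extract_rules(X[left], y[left], gene_names, depth + 1, max_depth, low)
            + extract_rules(X[~left], y[~left], gene_names, depth + 1, max_depth, high))
```

Given an expression matrix X (cells by genes), a vector of cell-type labels y, and a list of gene names, extract_rules(X, y, gene_names) returns one readable rule per leaf, each pairing a conjunction of expression thresholds with the majority cell type at that leaf.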
