HiCImpute: A Bayesian Hierarchical Model for Identifying Structural Zeros and Enhancing Single Cell Hi-C Data.

Mapping Intimacies ◽

10.1101/2021.09.01.458575 ◽

2021 ◽

Author(s):

Qing Xie ◽

Chengong Han ◽

Victor Jin ◽

Shili Lin

Keyword(s):

Quality Improvement ◽

Data Quality ◽

Single Cell ◽

Single Cells ◽

High Sensitivity ◽

Real Data ◽

Cell Types ◽

Sequencing Depth ◽

2D Data ◽

Structural Zeros

Single-cell Hi-C techniques enable one to study cell-to-cell variability in chromatin interactions. However, single-cell Hi-C (scHi-C) data suffer severely from sparsity, that is, the existence of excess zeros due to insufficient sequencing depth. Further complicating matters, not all zeros are created equal: some arise because loci truly do not interact owing to the underlying biological mechanism (structural zeros), whereas others are indeed due to insufficient sequencing depth (sampling zeros), especially for loci that interact infrequently. Differentiating between structural zeros and sampling zeros is important, since correct inference would improve downstream analyses such as clustering and the discovery of subtypes. Nevertheless, distinguishing between these two types of zeros has received little attention in the single-cell Hi-C literature, where sparsity has been addressed mainly as a data-quality improvement problem. To fill this gap, we propose HiCImpute, a Bayesian hierarchical model that goes beyond data-quality improvement by also identifying observed zeros that are in fact structural zeros. HiCImpute takes the spatial dependencies of the 2D scHi-C data structure into account while also borrowing information from similar single cells and from bulk data, when such data are available. Through an extensive set of analyses of synthetic and real data, we demonstrate the ability of HiCImpute to identify structural zeros with high sensitivity and to accurately impute dropout values at sampling zeros. Downstream analyses using data improved by HiCImpute yielded much more accurate clustering of cell types than using the observed data or data improved by several comparison methods. Most significantly, HiCImpute-improved data led to the identification of subtypes within each of the excitatory neuronal cell types of L4 and L5 in the prefrontal cortex.
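The abstract states that HiCImpute borrows information from spatially adjacent entries of a cell's 2D matrix and from similar single cells. The plain-Python sketch below is not HiCImpute's Bayesian hierarchical model; it only illustrates the idea of pooling those two sources into one estimate for a single entry, under assumed choices: a 3×3 neighborhood, a fixed mixing weight `w`, and the hypothetical helper names `neighborhood_mean` and `pooled_estimate`. How HiCImpute actually combines these sources is specified in the paper, not here.

```python
from statistics import mean

Matrix = list[list[float]]


def neighborhood_mean(m: Matrix, i: int, j: int) -> float:
    """Mean of the entries in the 3x3 window around (i, j), excluding (i, j) itself.
    Windows are truncated at the matrix borders."""
    n = len(m)
    vals = [
        m[a][b]
        for a in range(max(0, i - 1), min(n, i + 2))
        for b in range(max(0, j - 1), min(n, j + 2))
        if (a, b) != (i, j)
    ]
    return mean(vals)


def pooled_estimate(cell: Matrix, similar: list[Matrix], i: int, j: int, w: float = 0.5) -> float:
    """Hypothetical pooled value for entry (i, j): a weighted mix of the spatial
    neighbor mean in this cell and the same entry averaged over similar cells.
    Falls back to the spatial mean alone when no similar cells are given."""
    spatial = neighborhood_mean(cell, i, j)
    across = mean(c[i][j] for c in similar) if similar else spatial
    return w * spatial + (1 - w) * across
```

For a zero at the center of a cell whose neighbors average 1, with two similar cells holding 3 and 5 at that position, the pooled value with `w = 0.5` is 0.5 × 1 + 0.5 × 4 = 2.5.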

scHiCSRS: A Self-Representation Smoothing Method with Gaussian Mixture Model for Imputing single-cell Hi-C Data

10.1101/2021.11.09.467824 ◽

2021 ◽

Author(s):

Shili Lin ◽

Qing Xie

Keyword(s):

Single Cell ◽

Gaussian Mixture Model ◽

Mixture Model ◽

Single Cells ◽

High Sensitivity ◽

Gaussian Mixture ◽

Smoothing Method ◽

Intermediate Step ◽

Downstream Analysis ◽

Structural Zeros

Motivation: Single-cell Hi-C techniques make it possible to study cell-to-cell variability in genomic features. However, excess zeros are commonly seen in single-cell Hi-C (scHi-C) data, making scHi-C matrices extremely sparse and complicating downstream analysis. The observed zeros arise from two kinds of events: structural zeros, for which the loci never interact because of underlying biological mechanisms, and dropouts or sampling zeros, for which the two loci interact but the interaction is not captured because of insufficient sequencing depth. Although quality improvement approaches have been proposed as an intermediate step for analyzing scHi-C data, little has been done to address these two types of zeros. We believe that differentiating between structural zeros and dropouts would benefit downstream analyses such as clustering. Results: We propose scHiCSRS, a self-representation smoothing method that improves data quality, together with a Gaussian mixture model that identifies structural zeros among the observed zeros. scHiCSRS not only takes the spatial dependencies of the 2D scHi-C data structure into account but also borrows information from similar single cells. Through an extensive set of simulation studies, we demonstrate the ability of scHiCSRS to identify structural zeros with high sensitivity and to accurately impute dropout values at sampling zeros. Downstream analyses of three real datasets show that data improved by scHiCSRS yield more accurate clustering of cells than either the observed data or data improved by several comparison methods.
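The scHiCSRS abstract pairs smoothing with a Gaussian mixture model to decide which observed zeros are structural. The sketch below is a generic illustration, not the scHiCSRS model: it assumes each observed-zero position has already received a smoothed value from some imputation step, fits a two-component one-dimensional Gaussian mixture to those values by expectation-maximization (EM), and flags positions whose posterior probability under the lower-mean component exceeds 0.5 as candidate structural zeros. The function names, the initialization, the variance floor `min_var`, and the 0.5 threshold are all assumptions.

```python
import math


def _normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def _e_step(values, mu, var, weight):
    """Posterior probability that each value belongs to component 0."""
    resp = []
    for x in values:
        p0 = weight[0] * _normal_pdf(x, mu[0], var[0])
        p1 = weight[1] * _normal_pdf(x, mu[1], var[1])
        total = p0 + p1
        resp.append(p0 / total if total > 0 else 0.5)
    return resp


def fit_two_gaussians(values: list[float], iters: int = 200, min_var: float = 1e-8):
    """EM for a 1-D two-component Gaussian mixture.
    Component 0 starts at the smallest value, component 1 at the largest."""
    n = len(values)
    mu = [min(values), max(values)]
    center = sum(values) / n
    v0 = max(sum((x - center) ** 2 for x in values) / n, min_var)
    var, weight = [v0, v0], [0.5, 0.5]
    for _ in range(iters):
        r0 = _e_step(values, mu, var, weight)
        # M-step: re-estimate mean, variance, and weight of each component
        for k, r in enumerate((r0, [1 - g for g in r0])):
            nk = max(sum(r), 1e-12)
            mu[k] = sum(g * x for g, x in zip(r, values)) / nk
            var[k] = max(sum(g * (x - mu[k]) ** 2 for g, x in zip(r, values)) / nk, min_var)
            weight[k] = nk / n
    return mu, var, weight


def classify_zeros(smoothed: list[float], threshold: float = 0.5) -> list[bool]:
    """Flag a position as a candidate structural zero when its posterior
    probability under the lower-mean component exceeds `threshold`."""
    mu, var, weight = fit_two_gaussians(smoothed)
    r0 = _e_step(smoothed, mu, var, weight)
    low_is_0 = mu[0] <= mu[1]
    return [(g if low_is_0 else 1 - g) > threshold for g in r0]
```

The intuition is that positions whose smoothed values stay near zero even after borrowing from neighbors and similar cells are more plausibly structural, while positions that smoothing lifts well above zero look like dropouts.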

HiCluster: A Robust Single-Cell Hi-C Clustering Method Based on Convolution and Random Walk

10.1101/506717 ◽

2018 ◽

Cited By ~ 2

Author(s):

Jingtian Zhou ◽

Jianzhu Ma ◽

Yusi Chen ◽

Chuankai Cheng ◽

Bokan Bao ◽

...

Keyword(s):

Random Walk ◽

Single Cell ◽

Clustering Algorithm ◽

Single Cell Analysis ◽

Single Cells ◽

Genome Structure ◽

Real Data ◽

Cell Types ◽

3D Genome ◽

Cell Clustering

3D genome structure plays a pivotal role in gene regulation and cellular function. Single-cell analysis of genome architecture has been achieved using imaging and chromatin conformation capture methods such as Hi-C. To study variation in chromosome structure between different cell types, computational approaches are needed that can utilize sparse and heterogeneous single-cell Hi-C data. However, few methods exist that are able to accurately and efficiently cluster such data into constituent cell types. Here, we describe HiCluster, a single-cell clustering algorithm for Hi-C contact matrices that is based on imputations using linear convolution and random walk. Using both simulated and real data as benchmarks, we show that HiCluster significantly improves clustering accuracy on low-coverage Hi-C datasets compared with existing methods. After imputation by HiCluster, structures similar to topologically associating domains (TADs) could be identified within single cells, and their consensus boundaries among cells were enriched at the TAD boundaries observed in bulk samples. In summary, HiCluster facilitates visualization and comparison of single-cell 3D genomes.
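HiCluster imputes each cell's contact matrix with linear convolution followed by a random walk. The sketch below shows one common way to write those two steps in plain Python, with the random walk written as a random walk with restart; it is an illustration under stated assumptions, not HiCluster's code. The mean filter with `pad=1`, zero padding at the borders, the restart probability of 0.5, the convergence tolerance, and the self-loop for empty rows are all illustrative choices.

```python
Matrix = list[list[float]]


def convolve_mean(m: Matrix, pad: int = 1) -> Matrix:
    """Linear convolution with a (2*pad+1) x (2*pad+1) all-ones filter, divided by
    the window size; out-of-range positions are treated as zeros (zero padding)."""
    n = len(m)
    size = (2 * pad + 1) ** 2
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for a in range(i - pad, i + pad + 1):
                for b in range(j - pad, j + pad + 1):
                    if 0 <= a < n and 0 <= b < n:
                        s += m[a][b]
            out[i][j] = s / size
    return out


def row_normalize(m: Matrix) -> Matrix:
    """Turn the smoothed matrix into a row-stochastic transition matrix P."""
    out = []
    for i, row in enumerate(m):
        total = sum(row)
        if total > 0:
            out.append([v / total for v in row])
        else:  # empty bin: add a self-loop so the row stays stochastic (assumption)
            out.append([1.0 if j == i else 0.0 for j in range(len(row))])
    return out


def random_walk_with_restart(p: Matrix, restart: float = 0.5,
                             tol: float = 1e-6, max_iter: int = 100) -> Matrix:
    """Iterate Q <- (1 - restart) * Q P + restart * I until entries change by < tol."""
    n = len(p)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    q = [row[:] for row in identity]
    for _ in range(max_iter):
        qp = [[sum(q[i][k] * p[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
        new_q = [[(1 - restart) * qp[i][j] + restart * identity[i][j] for j in range(n)]
                 for i in range(n)]
        delta = max(abs(new_q[i][j] - q[i][j]) for i in range(n) for j in range(n))
        q = new_q
        if delta < tol:
            break
    return q


def impute(contacts: Matrix, pad: int = 1, restart: float = 0.5) -> Matrix:
    """Convolution, then row normalization, then random walk with restart."""
    return random_walk_with_restart(row_normalize(convolve_mean(contacts, pad)), restart)
```

Because Q starts at the identity and P is row-stochastic, every row of the result remains a probability distribution. Bin pairs that are zero in a sparse input, such as bins 0 and 3 in a small banded matrix, receive nonzero scores once contacts near them are observed.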

Multifactorial chromatin regulatory landscapes at single cell resolution

10.1101/2021.07.08.451691 ◽

2021 ◽

Author(s):

Michael P. Meers ◽

Derek H. Janssens ◽

Steven Henikoff

Keyword(s):

Single Cell ◽

Target Identification ◽

Developmental Trajectories ◽

Single Cells ◽

High Sensitivity ◽

Cell Types ◽

Specific Gene ◽

Gene Regulatory ◽

Chromatin Profiling ◽

Regulatory Landscapes

Chromatin profiling at locus resolution uncovers gene regulatory features that define cell types and developmental trajectories, but it remains challenging to map and compare distinct chromatin-associated proteins within the same sample. Here we describe a scalable antibody barcoding approach for profiling multiple chromatin features simultaneously in the same individual cells, Multiple Target Identification by Tagmentation (MulTI-Tag). MulTI-Tag is optimized to retain high sensitivity and specificity of enrichment for multiple chromatin targets in the same assay. We use MulTI-Tag to resolve distinct cell types using multiple chromatin features on a commercial single-cell platform, and to distinguish unique, coordinated patterns of active and repressive element regulatory usage in the same individual cells. Multifactorial profiling allows us to detect novel associations between histone marks in single cells and holds promise for comprehensively characterizing cell-specific gene regulatory landscapes in development and disease.

scPNMF: sparse gene encoding of single cells to facilitate gene selection for targeted gene profiling

10.1101/2021.02.09.430550 ◽

2021 ◽

Author(s):

Dongyuan Song ◽

Kexin Aileen Li ◽

Zachary Hemminger ◽

Roy Wollman ◽

Jingyi Jessica Li

Keyword(s):

Single Cell ◽

Gene Selection ◽

Spatial Information ◽

Dimensional Space ◽

Single Cells ◽

High Sensitivity ◽

Cell Types ◽

Gene Profiling ◽

Selection Methods ◽

Low Dimensional

Single-cell RNA sequencing (scRNA-seq) captures whole-transcriptome information of individual cells. While scRNA-seq measures thousands of genes, researchers are often interested in only dozens to hundreds of genes for closer study. One question, then, is how to select those informative genes from scRNA-seq data. Moreover, single-cell targeted gene profiling technologies are gaining popularity for their low costs, high sensitivity, and extra (e.g., spatial) information; however, they typically can measure only up to a few hundred genes. Another challenging question, then, is how to select genes for targeted gene profiling based on existing scRNA-seq data. Here we develop the single-cell Projective Non-negative Matrix Factorization (scPNMF) method to select informative genes from scRNA-seq data in an unsupervised way. Compared with existing gene selection methods, scPNMF has two advantages. First, the informative genes it selects can better distinguish cell types. Second, it enables the alignment of new targeted gene profiling data with reference data in a low-dimensional space to facilitate the prediction of cell types in the new data. Technically, scPNMF modifies the PNMF algorithm for gene selection by changing the initialization and adding a basis selection step, which selects informative bases to distinguish cell types. We demonstrate that scPNMF outperforms state-of-the-art gene selection methods on diverse scRNA-seq datasets. Moreover, we show that scPNMF can guide the design of targeted gene profiling experiments and cell-type annotation on targeted gene profiling data.
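scPNMF builds on projective nonnegative matrix factorization (PNMF), which approximates a genes × cells matrix X by W W^T X with a nonnegative loading matrix W, so that W^T X is a low-dimensional embedding of the cells. The sketch below is a minimal plain-Python illustration of plain PNMF using a multiplicative update for the Euclidean objective, followed by a hypothetical top-loading gene selection rule. It omits scPNMF's modified initialization and basis selection step; the update form, random initialization, iteration count, and the helper `select_genes` are assumptions rather than scPNMF's implementation.

```python
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(r) for r in zip(*a)]


def pnmf(x: Matrix, k: int, iters: int = 200, seed: int = 0, eps: float = 1e-12) -> Matrix:
    """Projective NMF: nonnegative W (genes x k) with X ~= W W^T X, updated by
    W <- W * 2 X X^T W / (W W^T X X^T W + X X^T W W^T W), elementwise."""
    rng = random.Random(seed)
    p = len(x)
    w = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(p)]
    xxt = matmul(x, transpose(x))                 # genes x genes
    for _ in range(iters):
        xxtw = matmul(xxt, w)                     # X X^T W
        wt = transpose(w)
        term1 = matmul(w, matmul(wt, xxtw))       # W W^T X X^T W
        term2 = matmul(xxtw, matmul(wt, w))       # X X^T W W^T W
        w = [[w[i][j] * 2 * xxtw[i][j] / (term1[i][j] + term2[i][j] + eps)
              for j in range(k)] for i in range(p)]
    return w


def select_genes(w: Matrix, genes: list[str], top: int = 2) -> set[str]:
    """Hypothetical selection rule: union of the `top` highest-loading genes per basis."""
    chosen: set[str] = set()
    for j in range(len(w[0])):
        ranked = sorted(range(len(genes)), key=lambda i: w[i][j], reverse=True)
        chosen.update(genes[i] for i in ranked[:top])
    return chosen
```

Multiplicative updates keep W nonnegative whenever it starts positive, which is what makes each basis interpretable as a set of gene loadings.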

Genomic Architecture of Cells in Tissues (GeACT): Study of Human Mid-gestation Fetus

10.1101/2020.04.12.038000 ◽

2020 ◽

Author(s):

Feng Tian ◽

Fan Zhou ◽

Xiang Li ◽

Wenping Ma ◽

Honggui Wu ◽

...

Keyword(s):

Transcription Factors ◽

Single Cell ◽

Human Cell ◽

Expression Profiles ◽

Single Cells ◽

Cell Types ◽

List Type ◽

Cell Type ◽

Genomic Architecture ◽

Gene Modules

By circumventing cellular heterogeneity, single-cell omics have been widely used for cell typing in human tissues, culminating in the Human Cell Atlas, an undertaking aimed at characterizing all human cell types. Even more important, however, is probing the gene regulatory networks, underlying chromatin architecture, and critical transcription factors of each cell type. Here we report the Genomic Architecture of Cells in Tissues (GeACT), a comprehensive genomic database that collectively addresses these needs, with the goal of understanding the functional genome in action. GeACT was made possible by our novel single-cell RNA-seq (MALBAC-DT) and ATAC-seq (METATAC) methods, which offer high detectability and precision. We exemplified GeACT by first studying representative organs in the human mid-gestation fetus. In particular, correlated gene modules (CGMs) were observed and found to be cell-type-dependent. We linked gene expression profiles to the underlying chromatin states and identified the key transcription factors for representative CGMs.

Highlights:
Genomic Architecture of Cells in Tissues (GeACT) data for human mid-gestation fetus
Determining correlated gene modules (CGMs) in different cell types by MALBAC-DT
Measuring chromatin open regions in single cells with high detectability by METATAC
Integrating transcriptomics and chromatin accessibility to reveal key TFs for a CGM

Simultaneous trimodal single-cell measurement of transcripts, epitopes, and chromatin accessibility using TEA-seq

eLife ◽

10.7554/elife.63632 ◽

2021 ◽

Vol 10 ◽

Author(s):

Elliott Swanson ◽

Cara Lord ◽

Julian Reading ◽

Alexander T Heubeck ◽

Palak C Genge ◽

...

Keyword(s):

Gene Regulation ◽

Single Cell ◽

Human Peripheral Blood ◽

Single Cells ◽

Cell Types ◽

Chromatin Accessibility ◽

Specific Gene ◽

Test Case ◽

Cell Assays ◽

Paired Measurement

Single-cell measurements of cellular characteristics have been instrumental in understanding the heterogeneous pathways that drive differentiation, cellular responses to signals, and human disease. Recent advances have allowed paired capture of protein abundance and transcriptomic state, but a lack of epigenetic information in these assays has left a missing link to gene regulation. Using the heterogeneous mixture of cells in human peripheral blood as a test case, we developed a novel scATAC-seq workflow that increases signal-to-noise and allows paired measurement of cell surface markers and chromatin accessibility: integrated cellular indexing of chromatin landscape and epitopes, called ICICLE-seq. We extended this approach using a droplet-based multiomics platform to develop a trimodal assay that simultaneously measures transcriptomics (scRNA-seq), epitopes, and chromatin accessibility (scATAC-seq) from thousands of single cells, which we term TEA-seq. Together, these multimodal single-cell assays provide a novel toolkit to identify type-specific gene regulation and expression grounded in phenotypically defined cell types.

A computational method to aid the design and analysis of single cell RNA-seq experiments for cell type identification

10.1101/247114 ◽

2018 ◽

Cited By ~ 1

Author(s):

Douglas Abrams ◽

Parveen Kumar ◽

R. Krishna Murthy Karuturi ◽

Joshy George

Keyword(s):

Experimental Design ◽

Single Cell ◽

Single Cells ◽

Cell Types ◽

Cell Number ◽

Fold Change ◽

Computational Method ◽

Marker Genes ◽

Cell Type ◽

Estimate Sample Size

Background: The advent of single-cell RNA sequencing (scRNA-seq) has enabled researchers to study transcriptomic activity within individual cells and to identify the inherent cell types in a sample. Although numerous computational tools have been developed to analyze single-cell transcriptomes, no published studies or analytical packages are available to guide experimental design and to devise suitable analysis procedures for cell type identification.

Results: We have developed an empirical methodology to address this important gap in single-cell experimental design and analysis, and implemented it in an easy-to-use tool called SCEED (Single Cell Empirical Experimental Design and analysis). With SCEED, users can choose from a variety of combinations of analysis tools, evaluate the performance of analytical procedures and choose the best one, and estimate the sample size (number of cells to be profiled) required for a given analytical procedure at varying levels of cell type rarity and other experimental parameters. Using SCEED, we examined three single-cell algorithms on 48 simulated single-cell datasets generated with varying numbers of cell types and their proportions, numbers of genes expressed per cell, numbers of marker genes and their fold changes, and numbers of single cells successfully profiled in the experiment.

Conclusions: We found that when marker genes are expressed at a fold change of 4 or more relative to the rest of the genes, either the Seurat or the SIMLR algorithm can be used to analyze a single-cell dataset for any number of isolated single cells (the minimum tested was 1,000 cells). However, when marker genes are expected to be upregulated only up to a fold change of 2, the choice of single-cell algorithm depends on the number of single cells isolated and the proportion of the rare cell type to be identified. In conclusion, our work allows the assessment of various single-cell methods and aids in examining single-cell experimental design.
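SCEED estimates sample size empirically, by simulating combinations of tools and experimental parameters. A much simpler back-of-the-envelope calculation, shown here as an assumption-labeled sketch rather than SCEED's procedure, treats the number of rare cells captured as Binomial(n, p) and finds the smallest number of profiled cells n that yields at least k cells of a type with proportion p at a chosen confidence. The function names and default values are hypothetical.

```python
from math import comb


def prob_at_least(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of capturing at least k rare cells."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def min_cells(p: float, k: int = 1, confidence: float = 0.95, n_max: int = 1_000_000) -> int:
    """Smallest number of profiled cells n with P(X >= k) >= confidence."""
    for n in range(k, n_max + 1):
        if prob_at_least(n, p, k) >= confidence:
            return n
    raise ValueError("no n <= n_max reaches the requested confidence")
```

For example, a cell type that makes up 1% of the sample needs 299 profiled cells for a 95% chance of capturing at least one such cell: 1 - 0.99^299 ≈ 0.9505, whereas 1 - 0.99^298 ≈ 0.94996 falls just short. Detecting a cluster reliably requires more than one cell, which is one reason an empirical, algorithm-aware approach such as SCEED's can demand larger samples than this bound.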

Methods and sensors for functional genomic studies of cell-cycle transitions in single cells

Physiological Genomics ◽

10.1152/physiolgenomics.00065.2020 ◽

2020 ◽

Vol 52 (10) ◽

pp. 468-477

Author(s):

Alexander C. Zambon ◽

Tom Hsu ◽

Seunghee Erin Kim ◽

Miranda Klinck ◽

Jennifer Stowe ◽

...

Keyword(s):

Cell Cycle ◽

Single Cell ◽

Single Cell Analysis ◽

Imaging System ◽

Single Cells ◽

Cell Types ◽

Cell Analysis ◽

Mcf7 Cells ◽

Genomic Studies ◽

Cell Subpopulations

Much of our understanding of the regulatory mechanisms governing the cell cycle in mammals has relied heavily on methods that measure the aggregate state of a population of cells. While instrumental in shaping our current understanding of cell proliferation, these approaches mask the genetic signatures of rare subpopulations such as quiescent (G0) and very slowly dividing (SD) cells. Results described in this study and those of others using single-cell analysis reveal that even in clonally derived immortalized cancer cells, ∼1–5% of cells can exhibit G0 and SD phenotypes. Therefore, to enable the study of these rare cell phenotypes, we established an integrated molecular, computational, and imaging approach to track, isolate, and genetically perturb single cells as they proliferate. A genetically encoded cell-cycle reporter (K67p-FUCCI) was used to track single cells as they traversed the cell cycle. A set of R scripts was written to quantify K67p-FUCCI over time. To enable further study of G0 and SD phenotypes, we retrofitted a live-cell imaging system with a micromanipulator to enable single-cell targeting for functional validation studies. Single-cell analysis revealed that HT1080 and MCF7 cells had doubling times of ∼24 and ∼48 h, respectively, with highly variable G1 and G2 phase durations. Direct single-cell microinjection of mRNA encoding green fluorescent protein (GFP) yields detectable GFP fluorescence within ∼5 h in both cell types. These findings, coupled with the possibility of targeting several hundred single cells, improve throughput and sensitivity over conventional methods for studying rare cell subpopulations.

Single-cell RNA cap and tail sequencing (scRCAT-seq) reveals subtype-specific isoforms differing in transcript demarcation

Nature Communications ◽

10.1038/s41467-020-18976-7 ◽

2020 ◽

Vol 11 (1) ◽

Author(s):

Youjin Hu ◽

Jiawei Zhong ◽

Yuhua Xiao ◽

Zheng Xing ◽

Katherine Sheu ◽

...

Keyword(s):

Single Cell ◽

Single Cells ◽

Single Gene ◽

Cell Types ◽

Machine Learning Algorithms ◽

Translation Efficiency ◽

Transcription Start Sites ◽

Long Read ◽

Mrna Gene ◽

Gene Isoforms

The differences in transcription start sites (TSS) and transcription end sites (TES) among gene isoforms can affect the stability, localization, and translation efficiency of mRNA. Gene isoforms allow a single gene to serve diverse functions across different cell types, and isoform dynamics allow different functions over time. However, methods to efficiently identify and quantify RNA isoforms genome-wide in single cells are still lacking. Here, we introduce single-cell RNA Cap And Tail sequencing (scRCAT-seq), a method to demarcate the boundaries of isoforms based on short-read sequencing, with higher efficiency and lower cost than existing long-read sequencing methods. In conjunction with machine learning algorithms, scRCAT-seq demarcates RNA transcripts with unprecedented accuracy. We identified hundreds of previously uncharacterized transcripts and thousands of alternative transcripts for known genes, revealed cell-type-specific isoforms for various cell types across different species, and generated a cell atlas of isoform dynamics during the development of retinal cones.

Self-reporting transposons enable simultaneous readout of gene expression and transcription factor binding in single cells

10.1101/538553 ◽

2019 ◽

Cited By ~ 3

Author(s):

Arnav Moudgil ◽

Michael N. Wilkinson ◽

Xuhua Chen ◽

June He ◽

Alex J. Cammack ◽

...

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Single Cell ◽

Binding Sites ◽

Expression Profiles ◽

Single Cells ◽

Gene Expression Profiles ◽

Cell Types ◽

Specific Cell

In situ measurements of transcription factor (TF) binding are confounded by cellular heterogeneity and represent averaged profiles in complex tissues. Single cell RNA-seq (scRNA-seq) is capable of resolving different cell types based on gene expression profiles, but no technology exists to directly link specific cell types to the binding pattern of TFs in those cell types. Here, we present self-reporting transposons (SRTs) and their use in single cell calling cards (scCC), a novel assay for simultaneously capturing gene expression profiles and mapping TF binding sites in single cells. First, we show how the genomic locations of SRTs can be recovered from mRNA. Next, we demonstrate that SRTs deposited by the piggyBac transposase can be used to map the genome-wide localization of the TFs SP1, through a direct fusion of the two proteins, and BRD4, through its native affinity for piggyBac. We then present the scCC method, which maps SRTs from scRNA-seq libraries, thus enabling concomitant identification of cell types and TF binding sites in those same cells. As a proof-of-concept, we show recovery of cell type-specific BRD4 and SP1 binding sites from cultured cells. Finally, we map Brd4 binding sites in the mouse cortex at single cell resolution, thus establishing a new technique for studying TF biology in situ.
