scPNMF: sparse gene encoding of single cells to facilitate gene selection for targeted gene profiling

2021 ◽  
Author(s):  
Dongyuan Song ◽  
Kexin Aileen Li ◽  
Zachary Hemminger ◽  
Roy Wollman ◽  
Jingyi Jessica Li

Single-cell RNA sequencing (scRNA-seq) captures whole-transcriptome information of individual cells. While scRNA-seq measures thousands of genes, researchers are often interested in only dozens to hundreds of genes for closer study. One question is therefore how to select those informative genes from scRNA-seq data. Moreover, single-cell targeted gene profiling technologies are gaining popularity for their low cost, high sensitivity, and extra (e.g., spatial) information; however, they can typically measure only up to a few hundred genes. A second, more challenging question is how to select genes for targeted gene profiling based on existing scRNA-seq data. Here we develop the single-cell Projective Non-negative Matrix Factorization (scPNMF) method to select informative genes from scRNA-seq data in an unsupervised way. Compared with existing gene selection methods, scPNMF has two advantages. First, its selected informative genes can better distinguish cell types. Second, it enables the alignment of new targeted gene profiling data with reference data in a low-dimensional space to facilitate the prediction of cell types in the new data. Technically, scPNMF modifies the PNMF algorithm for gene selection by changing the initialization and adding a basis selection step, which selects informative bases to distinguish cell types. We demonstrate that scPNMF outperforms state-of-the-art gene selection methods on diverse scRNA-seq datasets. Moreover, we show that scPNMF can guide the design of targeted gene profiling experiments and cell-type annotation on targeted gene profiling data.
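
The abstract refers to the PNMF algorithm without spelling it out. As a rough illustration only, the sketch below shows generic projective NMF with the commonly cited multiplicative update for the Euclidean objective, which seeks a non-negative W such that X ≈ W Wᵀ X. It is not the scPNMF implementation: the function name `pnmf`, the random initialization, the iteration count, and the spectral-norm rescaling are all assumptions, and the modified initialization and basis selection step described in the abstract are not reproduced.

```python
import numpy as np

def pnmf(X, n_bases, n_iter=200, seed=0, eps=1e-12):
    """Generic projective NMF sketch (hypothetical helper, not scPNMF itself).

    X: non-negative array of shape (genes, cells).
    Returns a non-negative W of shape (genes, n_bases) with X ~ W @ W.T @ X.
    """
    rng = np.random.default_rng(seed)
    # Assumed random positive initialization; scPNMF uses its own scheme.
    W = rng.uniform(0.1, 1.0, size=(X.shape[0], n_bases))
    XXt = X @ X.T  # genes x genes, computed once
    for _ in range(n_iter):
        XXtW = XXt @ W
        numer = 2.0 * XXtW
        denom = W @ (W.T @ XXtW) + XXtW @ (W.T @ W) + eps
        W = W * numer / denom          # multiplicative update keeps W >= 0
        W = W / np.linalg.norm(W, 2)   # rescale by spectral norm for stability
    return W

# Toy usage on simulated counts (100 genes, 50 cells).
X_toy = np.random.default_rng(1).poisson(2.0, size=(100, 50)).astype(float)
W_toy = pnmf(X_toy, n_bases=5)
```

Each column of `W_toy` is a non-negative basis over genes, so genes with large weights in a basis are natural candidates to inspect when thinking about gene selection.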

2021 ◽  
Author(s):  
Qing Xie ◽  
Chengong Han ◽  
Victor Jin ◽  
Shili Lin

Single-cell Hi-C techniques enable the study of cell-to-cell variability in chromatin interactions. However, single-cell Hi-C (scHi-C) data suffer severely from sparsity, that is, the existence of excess zeros due to insufficient sequencing depth. Complicating things further is the fact that not all zeros are created equal: some are due to loci truly not interacting because of the underlying biological mechanism (structural zeros), whereas others are indeed due to insufficient sequencing depth (sampling zeros), especially for loci that interact infrequently. Differentiating between structural zeros and sampling zeros is important since correct inference would improve downstream analyses such as clustering and discovery of subtypes. Nevertheless, distinguishing between these two types of zeros has received little attention in the single-cell Hi-C literature, where the issue of sparsity has been addressed mainly as a data quality improvement problem. To fill this gap, in this paper, we propose HiCImpute, a Bayesian hierarchy model that goes beyond data quality improvement by also identifying observed zeros that are in fact structural zeros. HiCImpute takes spatial dependencies of the scHi-C 2D data structure into account while also borrowing information from similar single cells and bulk data, when such are available. Through an extensive set of analyses of synthetic and real data, we demonstrate the ability of HiCImpute to identify structural zeros with high sensitivity and to accurately impute dropout values in sampling zeros. Downstream analyses using data improved by HiCImpute yielded much more accurate clustering of cell types compared to using observed data or data improved by several comparison methods. Most significantly, HiCImpute-improved data have led to the identification of subtypes within each of the excitatory neuronal cells of L4 and L5 in the prefrontal cortex.
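
To make the distinction between structural and sampling zeros concrete, the sketch below uses a plain zero-inflated Poisson model, which is an assumption chosen for illustration and not the HiCImpute model. Given an assumed prior probability `pi_structural` that a locus pair never interacts and an assumed Poisson mean `lam` for pairs that do, Bayes' rule gives the probability that an observed zero is structural.

```python
import math

def prob_structural_zero(pi_structural, lam):
    """P(structural zero | observed count is 0) under a zero-inflated Poisson.

    pi_structural: assumed prior probability that the pair never interacts.
    lam: assumed Poisson mean of the count when the pair can interact.
    """
    if not 0.0 <= pi_structural <= 1.0:
        raise ValueError("pi_structural must lie in [0, 1]")
    if lam < 0:
        raise ValueError("lam must be non-negative")
    # A sampling zero requires a non-structural pair that happens to yield 0 reads.
    p_sampling_zero = (1.0 - pi_structural) * math.exp(-lam)
    total = pi_structural + p_sampling_zero
    return pi_structural / total if total > 0 else 0.0

# A zero at a pair expected to interact often is more likely to be structural
# than a zero at a pair expected to interact rarely.
frequent = prob_structural_zero(0.2, 5.0)
rare = prob_structural_zero(0.2, 0.5)
```

This shows why infrequently interacting loci are the hard case: when `lam` is small, a sampling zero is nearly as likely as a structural one, so the observed zero carries little information by itself.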


Author(s):  
Samuel Melton ◽  
Sharad Ramanathan

Motivation: Recent technological advances produce a wealth of high-dimensional descriptions of biological processes, yet extracting meaningful insight and mechanistic understanding from these data remains challenging. For example, in developmental biology, the dynamics of differentiation can now be mapped quantitatively using single-cell RNA sequencing, yet it is difficult to infer molecular regulators of developmental transitions. Here, we show that discovering informative features in the data is crucial for statistical analysis as well as making experimental predictions. Results: We identify features based on their ability to discriminate between clusters of the data points. We define a class of problems in which linear separability of clusters is hidden in a low-dimensional space. We propose an unsupervised method to identify the subset of features that define a low-dimensional subspace in which clustering can be conducted. This is achieved by averaging over discriminators trained on an ensemble of proposed cluster configurations. We then apply our method to single-cell RNA-seq data from mouse gastrulation, and identify 27 key transcription factors (out of 409 total), 18 of which are known to define cell states through their expression levels. In this inferred subspace, we find clear signatures of known cell types that eluded classification prior to discovery of the correct low-dimensional subspace. Availability and implementation: https://github.com/smelton/SMD. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Livnat Jerby-Arnon ◽  
Aviv Regev

Tissue homeostasis relies on orchestrated multicellular circuits, where interactions between different cell types dynamically balance tissue function. While single-cell genomics identifies tissues’ cellular components, deciphering their coordinated action remains a major challenge. Here, we tackle this problem through a new framework of multicellular programs: combinations of distinct cellular programs in different cell types that are coordinated together in the tissue, thus forming a higher order functional unit at the tissue, rather than only cell, level. We develop the open-access DIALOGUE algorithm to systematically uncover such multicellular programs not only from spatial data, but even from tissue dissociated and profiled as single cells, e.g., by single-cell RNA-Seq. Tested on spatial transcriptomes from the mouse hypothalamus, DIALOGUE recovered spatial information, predicted the properties of a cell’s environment only based on its transcriptome, and identified multicellular programs that mark animal behavior. Applied to brain samples and colon biopsies profiled by scRNA-Seq, DIALOGUE identified multicellular configurations that mark Alzheimer’s disease and ulcerative colitis (UC), including a program spanning five cell types that is predictive of response to anti-TNF therapy in UC patients and enriched for UC risk genes from GWAS, each acting in different cell types, but all cells acting in concert. Taken together, our study provides a novel conceptual and methodological framework to unravel multicellular regulation in health and disease.


2021 ◽  
Author(s):  
Nicholas Navin ◽  
Runmin Wei ◽  
Siyuan He ◽  
Shanshan Bai ◽  
Emi Sei ◽  
...  

Single-cell RNA sequencing (scRNA-seq) methods can profile the transcriptomes of single cells but cannot preserve spatial information. Conversely, spatial transcriptomics (ST) assays can profile spatial regions in tissue sections, but do not have single-cell genomic resolution. Here, we developed a computational approach called SChart that combines these two types of data to achieve single-cell spatial mapping of cell types, cell states and continuous phenotypes. To validate our approach, we applied SChart to reconstruct cellular spatial structures in existing datasets from normal mouse brain and kidney tissues. We also performed scRNA-seq and ST experiments on two ductal carcinoma in situ (DCIS) tissues and applied SChart to identify subclones that were restricted to different ducts, and specific T cell states adjacent to the tumor areas. Our data show that SChart can accurately map single cells in diverse tissue types to resolve their spatial organization into cellular neighborhoods and tissue structures.


2021 ◽  
Author(s):  
Dongshunyi Li ◽  
Jun Ding ◽  
Ziv Bar-Joseph

One of the first steps in the analysis of single-cell RNA-sequencing (scRNA-Seq) data is the assignment of cell types. While a number of supervised methods have been developed for this, in most cases such assignment is performed by first clustering cells in a low-dimensional space and then assigning cell types to different clusters. To overcome noise and to improve cell type assignments, we developed UNIFAN, a neural network method that simultaneously clusters and annotates cells using known gene sets. UNIFAN combines a low-dimensional representation of all genes with cell-specific gene set activity scores to determine the clustering. We applied UNIFAN to human and mouse scRNA-Seq datasets from several different organs. As we show, by using knowledge of gene sets, UNIFAN greatly outperforms prior methods developed for clustering scRNA-Seq data. The gene sets assigned by UNIFAN to different clusters provide strong evidence for the cell type represented by each cluster, making annotation easier.


Genes ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 28
Author(s):  
Shruti Gupta ◽  
Ajay Kumar Verma ◽  
Shandar Ahmad

Single-cell transcriptomics data, when combined with in situ hybridization patterns of specific genes, can help in recovering the spatial information lost during cell isolation. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) consortium conducted a crowd-sourced competition known as the DREAM Single Cell Transcriptomics Challenge (SCTC) to predict the masked locations of single cells from sets of 60, 40 and 20 genes out of 84 in situ gene patterns known in the Drosophila embryo. We applied a genetic algorithm (GA) to predict the most important genes that carry positional and proximity information of the single-cell origins, in combination with the base distance mapping algorithm DistMap. The resulting gene selection was found to perform well and was ranked among the top 10 in two of the three sub-challenges. However, the details of the method did not make it to the main challenge publication, due to an intricate aggregation ranking. In this work, we discuss the detailed implementation of GA and its post-challenge parameterization, with a view to identifying potential areas where GA-based approaches of gene-set selection for topological association prediction may be improved to be more effective. We believe this work provides additional insights into feature-selection strategies and their relevance to single-cell similarity prediction and will form a strong addendum to the recently published work from the consortium.
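
The abstract does not give the GA's operators or parameters, so the following is only a minimal sketch of how a genetic algorithm can search for a fixed-size gene subset. Everything here is an assumption for illustration: the function name `ga_select`, truncation selection, union-based crossover, single-gene mutation, and the toy fitness function. In the setting described above, the fitness would instead score how well a subset recovers cell locations (for example, through DistMap), which is not modeled here.

```python
import random

def ga_select(n_genes, subset_size, fitness, pop_size=30, generations=40,
              mutation_rate=0.2, seed=0):
    """Hypothetical GA sketch: return the best gene-index subset found."""
    rng = random.Random(seed)
    pop = [tuple(sorted(rng.sample(range(n_genes), subset_size)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]        # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = list(set(a) | set(b))      # crossover: draw from the parents' union
            child = set(rng.sample(pool, subset_size))
            if rng.random() < mutation_rate:  # mutation: swap one gene for an unused one
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([g for g in range(n_genes) if g not in child]))
            children.append(tuple(sorted(child)))
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness: count how many selected genes fall in a made-up informative set.
informative = set(range(20))
toy_fitness = lambda subset: len(informative & set(subset))
best_subset = ga_select(n_genes=84, subset_size=20, fitness=toy_fitness)
```

Because the top half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next, which makes the search easy to reason about at the cost of less diversity.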


2018 ◽  
Author(s):  
Peng Xie ◽  
Mingxuan Gao ◽  
Chunming Wang ◽  
Pawan Noel ◽  
Chaoyong Yang ◽  
...  

Characterization of individual cell types is fundamental to the study of multicellular samples such as tumor tissues. Single-cell RNA-seq techniques, which allow high-throughput expression profiling of individual cells, have significantly advanced our ability to perform this task. Currently, most scRNA-seq data analyses begin with unsupervised clustering of cells followed by visualization of clusters in a low-dimensional space. Clusters are often assigned to different cell types based on canonical markers. However, the efficiency of characterizing known cell types in this way is low and limited by the investigator's knowledge. In this study, we present a technical framework for training an expandable supervised classifier to reveal single-cell identities based on their RNA expression profiles. Using multiple scRNA-seq datasets, we demonstrate the superior accuracy, robustness, compatibility and expandability of this new solution compared to traditional methods. We use two examples of model upgrade to demonstrate how the projected evolution of the cell-type classifier is realized.


2021 ◽  
Author(s):  
Michael P. Meers ◽  
Derek H. Janssens ◽  
Steven Henikoff

Chromatin profiling at locus resolution uncovers gene regulatory features that define cell types and developmental trajectories, but it remains challenging to map and compare distinct chromatin-associated proteins within the same sample. Here we describe a scalable antibody barcoding approach for profiling multiple chromatin features simultaneously in the same individual cells, Multiple Target Identification by Tagmentation (MulTI-Tag). MulTI-Tag is optimized to retain high sensitivity and specificity of enrichment for multiple chromatin targets in the same assay. We use MulTI-Tag to resolve distinct cell types using multiple chromatin features on a commercial single-cell platform, and to distinguish unique, coordinated patterns of active and repressive element regulatory usage in the same individual cells. Multifactorial profiling allows us to detect novel associations between histone marks in single cells and holds promise for comprehensively characterizing cell-specific gene regulatory landscapes in development and disease.


2020 ◽  
Author(s):  
Zhenlan Liang ◽  
Min Li ◽  
Ruiqing Zheng ◽  
Yu Tian ◽  
Xuhua Yan ◽  
...  

Accurate identification of cell types from single-cell RNA sequencing (scRNA-seq) data plays a critical role in a variety of scRNA-seq analysis studies. It corresponds to solving an unsupervised clustering problem, in which the similarity measurement between cells in a high-dimensional space affects the result significantly. Although many approaches have been proposed recently, the accuracy of cell type identification still needs to be improved. In this study, we propose a novel single-cell clustering framework based on similarity learning, called SSRE. In SSRE, we model the relationships between cells based on a subspace assumption and generate a sparse representation of the cell-to-cell similarity, which retains the most similar neighbors for each cell. In addition, we adopt classical pairwise similarities combined with a gene selection and enhancement strategy to further improve the effectiveness of SSRE. For performance evaluation, we applied SSRE to clustering, visualization, and other exploratory data analysis processes on various scRNA-seq datasets. Experimental results show that SSRE achieves superior performance in most cases compared to several state-of-the-art methods.
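
The idea of a sparse cell-to-cell similarity that keeps only each cell's most similar neighbors can be sketched as follows. This is a deliberately simpler stand-in, not SSRE: it sparsifies plain cosine similarity by keeping the top `k` neighbors per cell, whereas the abstract describes a sparse representation learned under a subspace assumption. The function names and the toy data are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two expression vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def knn_sparse_similarity(cells, k):
    """Return an n x n similarity matrix keeping only each cell's top-k neighbors."""
    n = len(cells)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        scores = [(cosine(cells[i], cells[j]), j) for j in range(n) if j != i]
        for s, j in sorted(scores, reverse=True)[:k]:
            sim[i][j] = s  # all other entries in row i stay zero
    return sim

# Toy usage: four cells over two genes, forming two obvious groups.
toy_cells = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_sim = knn_sparse_similarity(toy_cells, k=1)
```

The resulting matrix is generally not symmetric, since cell j being among the nearest neighbors of cell i does not imply the reverse; a clustering step would typically symmetrize it first.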


2021 ◽  
Author(s):  
Junil Kim ◽  
Michaela Mrugala Rothová ◽  
Linbu Liao ◽  
Siyeon Rhee ◽  
Guangzheng Weng ◽  
...  

Cells continuously communicate with neighboring cells during development. Direct interaction of different cell types can induce molecular signals dictating lineage specification and cell fate decisions. Current single-cell RNA-seq (scRNAseq) technology cannot be used to study cell contact-dependent gene expression due to the loss of spatial information. To overcome this issue and determine cell contact-specific gene expression during embryogenesis, we performed RNA sequencing of physically interacting cells (PICseq) and assessed it alongside our single-cell transcriptomes (scRNAseq) derived from developing mouse embryos between embryonic day (E) 7.5 and E9.5. Analysis of PICseq data identifies an interesting suite of gene expression signatures depending on neighboring cell types. For instance, neural progenitor (NP) cells expressed Nkx2-1 when interacting with definitive endoderm (DE), and DE cells expressed Gsc when interacting with NP. Based on the identified cell contact-specific genes, we devised a means to predict the neighboring cell types from individual cell transcriptomes. We further developed spatial-tSNE to show the pseudo-spatial distribution of cells in a 2-dimensional space. In sum, we suggest an approach to study contact-specific gene regulation during embryogenesis.

