Hybrid Clustering of Single-Cell Gene Expression and Spatial Information via Integrated NMF and K-Means

2021 ◽  
Vol 12 ◽  
Author(s):  
Sooyoun Oh ◽  
Haesun Park ◽  
Xiuwei Zhang

Advances in single cell transcriptomics have allowed us to study the identity of single cells. This has led to the discovery of new cell types and high resolution tissue maps of them. Technologies that measure multiple modalities of such data add more detail, but they also complicate data integration. We offer an integrated analysis of the spatial location and gene expression profiles of cells to determine their identity. We propose scHybridNMF (single-cell Hybrid Nonnegative Matrix Factorization), which performs cell type identification by combining sparse nonnegative matrix factorization (sparse NMF) with k-means clustering to cluster high-dimensional gene expression and low-dimensional location data. We show that, under multiple scenarios, including the cases where there is a small number of genes profiled and the location data is noisy, scHybridNMF outperforms sparse NMF, k-means, and an existing method that uses a hidden Markov random field to encode cell location and gene expression data for cell type identification.
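As a rough illustration of how a factorization of expression data can be coupled with a k-means step on cell locations, the Python sketch below first factors a genes-by-cells matrix with plain NMF and then refines the labels with a location-aware reassignment. It is not the scHybridNMF algorithm: the function names, the two-stage scheme, the cost function, the weight `alpha`, and the toy data are all assumptions made for illustration, and plain rather than sparse NMF is used.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def nmf(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF by multiplicative updates: X (genes x cells) ~ W times H."""
    rng = random.Random(seed)
    W = [[rng.random() + eps for _ in range(rank)] for _ in X]
    H = [[rng.random() + eps for _ in X[0]] for _ in range(rank)]
    for _ in range(n_iter):
        Wt = [list(c) for c in zip(*W)]
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(*rows)] for rows in zip(H, num, den)]
        Ht = [list(c) for c in zip(*H)]
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(*rows)] for rows in zip(W, num, den)]
    return W, H

def hybrid_labels(X, locations, k, alpha=1.0, n_rounds=10, seed=0):
    """Hypothetical two-stage heuristic, not the published method."""
    _, H = nmf(X, k, seed=seed)
    n_cells = len(locations)
    # Soft memberships: normalise each cell's column of H to sum to 1.
    member = []
    for j in range(n_cells):
        col = [H[c][j] for c in range(k)]
        total = sum(col) or 1.0
        member.append([v / total for v in col])
    # Initial labels come from the expression data alone.
    labels = [max(range(k), key=lambda c: member[j][c]) for j in range(n_cells)]
    for _ in range(n_rounds):
        # k-means style step: location centroid of each non-empty cluster.
        centroids = {}
        for c in range(k):
            pts = [locations[j] for j in range(n_cells) if labels[j] == c]
            if pts:
                centroids[c] = [sum(x) / len(pts) for x in zip(*pts)]

        def cost(j, c):
            # Expression term is low when NMF favours cluster c;
            # location term is the squared distance to the centroid of c.
            d2 = sum((p - q) ** 2 for p, q in zip(locations[j], centroids[c]))
            return (1.0 - member[j][c]) + alpha * d2

        new = [min(centroids, key=lambda c: cost(j, c)) for j in range(n_cells)]
        if new == labels:
            break
        labels = new
    return labels

# Toy data (assumed): 4 genes x 6 cells, two expression blocks,
# and two spatially separated groups of cells.
expression = [
    [5, 4, 6, 0, 0, 0],
    [3, 3, 2, 0, 0, 0],
    [0, 0, 0, 4, 5, 3],
    [0, 0, 0, 2, 3, 2],
]
locations = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
             (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
labels = hybrid_labels(expression, locations, k=2, alpha=1.0)
print(labels)
```

In this sketch, `alpha` controls how strongly location influences the final labels: with `alpha=0` the labels are determined by the NMF coefficients alone, while large values make the refinement behave like k-means on the coordinates.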

2020 ◽  
Author(s):  
Sooyoun Oh ◽  
Haesun Park ◽  
Xiuwei Zhang

Abstract: Recent advances in single-cell transcriptomics have allowed us to examine the identity of each single cell, which has led to the discovery of new cell types and provided a high-resolution map of cell type composition in tissues. Technologies that measure another type of data from a single cell in addition to gene expression provide a more comprehensive picture of the cell, but they also pose challenges for data integration. We consider the spatial location of cells, an important cellular feature, together with the cells' gene expression profiles to determine cell type identity. We aim to jointly classify cells based on their locations relative to other cells in the system as well as their gene expression profiles. We have developed scHybridNMF (single-cell Hybrid Nonnegative Matrix Factorization), which performs cell type identification by incorporating single-cell gene expression data with cell location data. We combine two classical methods, nonnegative matrix factorization and k-means clustering, to represent high-dimensional gene expression data and low-dimensional location data, respectively. Our method adds a novel cell location term to the gene expression clustering. We show that scHybridNMF can make use of the location data to improve cell type clustering. In particular, we show that under multiple scenarios, including when the number of genes profiled is low and when the location data is noisy, scHybridNMF outperforms the standalone algorithms NMF and k-means, as well as HMRF, an existing method that also uses cell location and gene expression data for cell type identification.


2020 ◽  
Author(s):  
Feng Tian ◽  
Fan Zhou ◽  
Xiang Li ◽  
Wenping Ma ◽  
Honggui Wu ◽  
...  

Summary: By circumventing cellular heterogeneity, single-cell omics have been widely used for cell typing in human tissues, culminating in the undertaking of the Human Cell Atlas, which aims to characterize all human cell types. However, more important is the probing of gene regulatory networks, the underlying chromatin architecture, and the critical transcription factors for each cell type. Here we report the Genomic Architecture of Cells in Tissues (GeACT), a comprehensive genomic database that collectively addresses the above needs with the goal of understanding the functional genome in action. GeACT was made possible by our novel single-cell RNA-seq (MALBAC-DT) and ATAC-seq (METATAC) methods of high detectability and precision. We exemplified GeACT by first studying representative organs in the human mid-gestation fetus. In particular, correlated gene modules (CGMs) are observed and found to be cell-type-dependent. We linked gene expression profiles to the underlying chromatin states and found the key transcription factors for representative CGMs.

Highlights: Genomic Architecture of Cells in Tissues (GeACT) data for human mid-gestation fetus; determining correlated gene modules (CGMs) in different cell types by MALBAC-DT; measuring chromatin open regions in single cells with high detectability by METATAC; integrating transcriptomics and chromatin accessibility to reveal key TFs for a CGM.


2021 ◽  
Vol 9 (Suppl 1) ◽  
pp. A12.1-A12
Author(s):  
Y Arjmand Abbassi ◽  
N Fang ◽  
W Zhu ◽  
Y Zhou ◽  
Y Chen ◽  
...  

Recent advances in high-throughput single-cell sequencing technologies have greatly improved our understanding of complex biological systems. Heterogeneous samples such as tumor tissues commonly harbor cancer cell-specific genetic variants and gene expression profiles, both of which have been shown to be related to the mechanisms of disease development, progression, and response to treatment. Furthermore, stromal and immune cells within the tumor microenvironment interact with cancer cells and play important roles in tumor responses to systemic therapy such as immunotherapy or cell therapy. However, most current high-throughput single-cell sequencing methods detect only gene expression levels or epigenetic events such as chromatin conformation. Information on important genetic variants, including mutations and fusions, is not captured. To better understand the mechanisms of tumor responses to systemic therapy, it is essential to decipher the connection between genotype and gene expression patterns of both tumor cells and cells in the tumor microenvironment. We developed FocuSCOPE, a high-throughput multi-omics sequencing solution that can detect both genetic variants and the transcriptome from the same single cells. FocuSCOPE has been used to successfully perform single-cell analysis of both gene expression profiles and point mutations, fusion genes, or intracellular viral sequences from thousands of cells simultaneously, delivering comprehensive insights into tumor and immune cells in the tumor microenvironment at single-cell resolution.

Disclosure Information: Y. Arjmand Abbassi: None. N. Fang: None. W. Zhu: None. Y. Zhou: None. Y. Chen: None. U. Deutsch: None.


2019 ◽  
Author(s):  
Daiwei Tang ◽  
Seyoung Park ◽  
Hongyu Zhao

Abstract

Motivation: A number of computational methods have been proposed recently to profile the tumor microenvironment (TME) from bulk RNA data, and they have proved useful for understanding microenvironment differences among therapeutic response groups. However, these methods cannot account for tumor proportion or for variable mRNA levels across cell types.

Results: In this article, we propose a Nonnegative Matrix Factorization-based Immune-TUmor MIcroenvironment Deconvolution (NITUMID) framework for TME profiling that addresses these limitations. It is designed to provide robust estimates of tumor and immune cell proportions simultaneously, while accommodating mRNA level differences across cell types. Through comprehensive simulations and real data analyses, we demonstrate that NITUMID not only accurately estimates tumor fractions and cell types' mRNA levels, which are currently unavailable in other methods, but also outperforms most existing deconvolution methods in regular cell type profiling accuracy. Moreover, we show that NITUMID can detect clinical and prognostic signals from gene expression profiles in tumors more effectively than other methods.

Availability and implementation: The algorithm is implemented in R. The source code can be downloaded at https://github.com/tdw1221/NITUMID.

Supplementary information: Supplementary data are available at Bioinformatics online.


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Dylan Kotliar ◽  
Adrian Veres ◽  
M Aurel Nagy ◽  
Shervin Tabrizi ◽  
Eran Hodis ◽  
...  

Identifying gene expression programs underlying both cell-type identity and cellular activities (e.g. life-cycle processes, responses to environmental cues) is crucial for understanding the organization of cells and tissues. Although single-cell RNA-Seq (scRNA-Seq) can quantify transcripts in individual cells, each cell’s expression profile may be a mixture of both types of programs, making them difficult to disentangle. Here, we benchmark and enhance the use of matrix factorization to solve this problem. We show with simulations that a method we call consensus non-negative matrix factorization (cNMF) accurately infers identity and activity programs, including their relative contributions in each cell. To illustrate the insights this approach enables, we apply it to published brain organoid and visual cortex scRNA-Seq datasets; cNMF refines cell types and identifies both expected (e.g. cell cycle and hypoxia) and novel activity programs, including programs that may underlie a neurosecretory phenotype and synaptogenesis.


2019 ◽  
Author(s):  
Arnav Moudgil ◽  
Michael N. Wilkinson ◽  
Xuhua Chen ◽  
June He ◽  
Alex J. Cammack ◽  
...  

Abstract: In situ measurements of transcription factor (TF) binding are confounded by cellular heterogeneity and represent averaged profiles in complex tissues. Single cell RNA-seq (scRNA-seq) is capable of resolving different cell types based on gene expression profiles, but no technology exists to directly link specific cell types to the binding pattern of TFs in those cell types. Here, we present self-reporting transposons (SRTs) and their use in single cell calling cards (scCC), a novel assay for simultaneously capturing gene expression profiles and mapping TF binding sites in single cells. First, we show how the genomic locations of SRTs can be recovered from mRNA. Next, we demonstrate that SRTs deposited by the piggyBac transposase can be used to map the genome-wide localization of the TFs SP1, through a direct fusion of the two proteins, and BRD4, through its native affinity for piggyBac. We then present the scCC method, which maps SRTs from scRNA-seq libraries, thus enabling concomitant identification of cell types and TF binding sites in those same cells. As a proof-of-concept, we show recovery of cell type-specific BRD4 and SP1 binding sites from cultured cells. Finally, we map Brd4 binding sites in the mouse cortex at single cell resolution, thus establishing a new technique for studying TF biology in situ.


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Jing Wu ◽  
Bin Chen ◽  
Tao Han

Nonnegative matrix factorization (NMF) is a popular method for the multivariate analysis of nonnegative data. It involves decomposing a data matrix into a product of two factor matrices with all entries restricted to being nonnegative. Orthogonal nonnegative matrix factorization (ONMF) has been introduced recently. This method has demonstrated remarkable performance in clustering tasks, such as gene expression classification. In this study, we introduce two convergent methods for solving ONMF. First, we design a convergent orthogonal algorithm based on the Lagrange multiplier method. Second, we propose an approach that is based on the alternating direction method. Finally, we demonstrate that the two proposed approaches tend to deliver higher-quality solutions and perform better in clustering tasks compared with a state-of-the-art ONMF method.
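The basic decomposition described here, a nonnegative data matrix approximated by the product of two nonnegative factors, can be sketched with the classical multiplicative update rules for plain NMF. This is a minimal illustration only, not the ONMF algorithms proposed in the study: no orthogonality constraint is imposed, and the function names, the iteration count, the small constant `eps`, and the toy matrix are assumptions.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def frobenius_error(X, W, H):
    """Squared Frobenius norm of X minus the product W H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))

def nmf(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor nonnegative X (m x n) into W (m x rank) and H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    # Strictly positive random start so every entry can be rescaled.
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    errors = []
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, X)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num = matmul(X, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
        errors.append(frobenius_error(X, W, H))
    return W, H, errors

# Toy nonnegative matrix (assumed) with two obvious blocks.
X = [
    [5, 5, 0, 0],
    [4, 4, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 6, 6],
]
W, H, errors = nmf(X, rank=2)
print(round(errors[0], 4), round(errors[-1], 6))
```

Because every update multiplies a nonnegative entry by a nonnegative ratio, the factors stay nonnegative throughout. For clustering, each column of `X` is typically assigned to the row of `H` with the largest coefficient.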

