Hybrid Clustering of single-cell gene-expression and cell spatial information via integrated NMF and k-means

2020 ◽  
Author(s):  
Sooyoun Oh ◽  
Haesun Park ◽  
Xiuwei Zhang

Abstract Recent advances in single-cell transcriptomics have allowed us to examine the identity of each single cell, leading to the discovery of new cell types and providing a high-resolution map of cell type composition in tissues. Technologies that measure another type of data from a single cell in addition to gene expression provide a more comprehensive picture of the cell, but they also pose challenges for data integration. We consider the spatial location of cells, an important cellular feature, together with the cells’ gene-expression profiles to determine cell type identity. We aim to jointly classify cells based on their locations relative to other cells in the system as well as their gene-expression profiles. We have developed scHybridNMF (single-cell Hybrid Nonnegative Matrix Factorization), which performs cell type identification by incorporating single-cell gene expression data with cell location data. We combine two classical methods, nonnegative matrix factorization and k-means clustering, to represent high-dimensional gene expression data and low-dimensional location data, respectively. Our method adds a novel cell location term to the gene expression clustering. We show that scHybridNMF can make use of the location data to improve cell type clustering. In particular, we show that under multiple scenarios, including when the number of genes profiled is low and when the location data is noisy, scHybridNMF outperforms the standalone algorithms NMF and k-means, as well as HMRF, an existing method that also uses cell location and gene-expression data for cell type identification.
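To make the NMF building block mentioned in this abstract concrete, the following is a minimal sketch of clustering cells by plain nonnegative matrix factorization with multiplicative updates. It is not the scHybridNMF objective and does not include the location term. The function names (`nmf`, `cluster_labels`), the toy matrix, and the iteration count are all assumptions made for illustration.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(X, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate nonnegative X (genes x cells) as W (genes x k) times H (k x cells)."""
    rng = random.Random(seed)
    n_genes, n_cells = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_genes)]
    H = [[rng.random() + 0.1 for _ in range(n_cells)] for _ in range(k)]
    for _ in range(n_iter):
        # Multiplicative update: H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cells)]
             for i in range(k)]
        # Multiplicative update: W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_genes)]
    # Remove the scale ambiguity: make each column of W sum to 1, rescale H to match.
    for c in range(k):
        s = sum(W[g][c] for g in range(n_genes)) or 1.0
        for g in range(n_genes):
            W[g][c] /= s
        H[c] = [h * s for h in H[c]]
    return W, H

def cluster_labels(H):
    """Assign each cell (column of H) to the factor with the largest coefficient."""
    return [max(range(len(H)), key=lambda i: H[i][j]) for j in range(len(H[0]))]

# Toy data (hypothetical): 4 genes x 6 cells.
# Cells 0-2 mostly express genes 0-1; cells 3-5 mostly express genes 2-3.
X = [
    [5, 6, 5, 0, 0, 1],
    [4, 5, 6, 1, 0, 0],
    [0, 1, 0, 6, 5, 4],
    [1, 0, 0, 5, 6, 5],
]
W, H = nmf(X, k=2)
labels = cluster_labels(H)
print(labels)
```

Reading the cluster off the largest entry in each column of `H` is one common convention; which factor index ends up attached to which group depends on the random initialization.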

2020 ◽  
Vol 17 (6) ◽  
pp. 621-628 ◽  
Author(s):  
Zhichao Miao ◽  
Pablo Moreno ◽  
Ni Huang ◽  
Irene Papatheodorou ◽  
Alvis Brazma ◽  
...  

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Bárbara Andrade Barbosa ◽  
Saskia D. van Asten ◽  
Ji Won Oh ◽  
Arantza Farina-Sarasqueta ◽  
Joanne Verheij ◽  
...  

Abstract Deconvolution of bulk gene expression profiles into their cellular components is pivotal to portraying a tissue’s complex cellular make-up, such as the tumor microenvironment. However, the inherently variable nature of gene expression requires a comprehensive statistical model and reliable prior knowledge of individual cell types, which can be obtained from single-cell RNA sequencing. We introduce BLADE (Bayesian Log-normAl Deconvolution), a unified Bayesian framework to estimate both cellular composition and gene expression profiles for each cell type. Unlike previous comprehensive statistical approaches, BLADE can handle > 20 types of cells owing to its efficient variational inference. In an intensive evaluation with > 700 simulated and real datasets, BLADE demonstrated enhanced robustness against gene expression variability and better completeness than conventional methods, in particular when reconstructing gene expression profiles of each cell type. In summary, BLADE is a powerful tool to unravel heterogeneous cellular activity in complex biological systems from standard bulk gene expression data.
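The basic idea of deconvolution can be illustrated with a much simpler model than the one described above: treat a bulk profile as a nonnegative mixture of known cell-type signatures and solve for the mixing fractions. The sketch below uses a multiplicative-update least-squares solver. It is not BLADE, which is a Bayesian log-normal model with variational inference. The signature matrix, the fractions, and the function name `estimate_fractions` are hypothetical.

```python
def estimate_fractions(S, b, n_iter=500, eps=1e-12):
    """Estimate nonnegative mixing fractions f such that S f is close to b.

    S: signature matrix (genes x cell types), nonnegative.
    b: bulk expression vector (one value per gene), nonnegative.
    """
    n_genes, n_types = len(S), len(S[0])
    f = [1.0 / n_types] * n_types  # start from a uniform mixture
    # Precompute S^T b and S^T S, which do not change across iterations.
    Stb = [sum(S[g][t] * b[g] for g in range(n_genes)) for t in range(n_types)]
    StS = [[sum(S[g][t] * S[g][u] for g in range(n_genes)) for u in range(n_types)]
           for t in range(n_types)]
    for _ in range(n_iter):
        StSf = [sum(StS[t][u] * f[u] for u in range(n_types)) for t in range(n_types)]
        # Multiplicative update keeps every entry of f nonnegative.
        f = [f[t] * Stb[t] / (StSf[t] + eps) for t in range(n_types)]
    total = sum(f) or 1.0
    return [x / total for x in f]  # normalise so the fractions sum to 1

# Hypothetical signatures: 4 genes x 3 cell types.
S = [
    [10.0, 1.0, 1.0],
    [1.0, 8.0, 2.0],
    [2.0, 1.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_fractions = [0.5, 0.3, 0.2]
# Simulate a noise-free bulk sample as the weighted sum of the signatures.
bulk = [sum(s * f for s, f in zip(row, true_fractions)) for row in S]
estimated = estimate_fractions(S, bulk)
print([round(x, 4) for x in estimated])
```

This toy recovers the fractions only because the simulated bulk sample is noise-free and the signatures are fixed and known; handling variable expression is the problem the abstract's statistical model addresses.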


2020 ◽  
Author(s):  
Bárbara Andrade Barbosa ◽  
Saskia van Asten ◽  
Ji-won Oh ◽  
Arantza Fariña-Sarasqueta ◽  
Joanne Verheij ◽  
...  

Abstract High-resolution deconvolution of bulk gene expression profiles is pivotal to characterizing the complex cellular make-up of tissues, such as the tumor microenvironment. Single-cell RNA-seq provides reliable prior knowledge for deconvolution; however, a comprehensive statistical model is required to use it efficiently because of the inherently variable nature of gene expression. We introduce BLADE (Bayesian Log-normAl Deconvolution), a comprehensive probabilistic framework to estimate both cellular make-up and gene expression profiles of each cell type in each sample. Unlike previous comprehensive statistical approaches, BLADE can handle >20 cell types thanks to its efficient variational inference. In an intensive evaluation using >700 datasets, BLADE showed enhanced robustness against gene expression variability and better completeness than conventional methods, in particular when reconstructing gene expression profiles of each cell type. All in all, BLADE is a powerful tool to unravel heterogeneous cellular activity in complex biological systems based on standard bulk gene expression data.


2014 ◽  
Vol 13s2 ◽  
pp. CIN.S13777 ◽  
Author(s):  
Zheng Chang ◽  
Zhenjia Wang ◽  
Cody Ashby ◽  
Chuan Zhou ◽  
Guojun Li ◽  
...  

Identifying clinically relevant subtypes of a cancer using gene expression data is a challenging and important problem in medicine, and is a necessary premise for providing specific and efficient treatments to patients of different subtypes. Matrix factorization provides a solution by finding checkerboard patterns in the matrices of gene expression data. In gene expression profiles of cancer patients, these checkerboard patterns correspond to genes that are up- or down-regulated in patients with particular cancer subtypes. Recently, a new matrix factorization framework for biclustering called Maximum Block Improvement (MBI) was proposed; however, it still suffers from several problems when applied to cancer gene expression data analysis. In this study, we developed a number of effective strategies to improve MBI and designed a new program called enhanced MBI (eMBI), which is more effective and efficient at identifying cancer subtypes. Our tests on several gene expression profiling datasets of cancer patients consistently indicate that eMBI achieves significant improvements over MBI in terms of cancer subtype prediction accuracy, robustness, and running time. In addition, the performance of eMBI is much better than that of another widely used matrix factorization method, nonnegative matrix factorization (NMF), and that of hierarchical clustering, which is often the first choice of clinical analysts in practice.


2021 ◽  
Vol 12 ◽  
Author(s):  
Sooyoun Oh ◽  
Haesun Park ◽  
Xiuwei Zhang

Advances in single-cell transcriptomics have allowed us to study the identity of single cells. This has led to the discovery of new cell types and to high-resolution maps of their distribution in tissues. Technologies that measure multiple modalities of such data add more detail, but they also complicate data integration. We offer an integrated analysis of the spatial location and gene expression profiles of cells to determine their identity. We propose scHybridNMF (single-cell Hybrid Nonnegative Matrix Factorization), which performs cell type identification by combining sparse nonnegative matrix factorization (sparse NMF) with k-means clustering to cluster high-dimensional gene expression and low-dimensional location data. We show that, under multiple scenarios, including cases where only a small number of genes is profiled and where the location data is noisy, scHybridNMF outperforms sparse NMF, k-means, and an existing method that uses a hidden Markov random field to encode cell location and gene expression data for cell type identification.
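As an illustration of the k-means component applied to low-dimensional location data, here is a minimal sketch of Lloyd's algorithm on 2D cell coordinates. It shows only the standalone k-means step, not the combined scHybridNMF objective. The coordinates, the function name `kmeans`, and the seeded initialization are assumptions for the example.

```python
import random

def kmeans(points, k, n_iter=50, seed=0):
    """Cluster points (tuples of coordinates) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids

# Hypothetical cell locations: two spatially separated groups of four cells.
locations = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
]
loc_labels, loc_centroids = kmeans(locations, k=2)
print(loc_labels, loc_centroids)
```

On real tissue data, spatial clusters and expression clusters need not agree, which is the motivation for the joint treatment described in the abstract.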


2015 ◽  
Vol 11 (1) ◽  
pp. 86-96 ◽  
Author(s):  
Aakash Chavan Ravindranath ◽  
Nolen Perualila-Tan ◽  
Adetayo Kasim ◽  
Georgios Drakakis ◽  
Sonia Liggi ◽  
...  

Integrating gene expression profiles with certain proteins can improve our understanding of the fundamental mechanisms in protein–ligand binding.

