SC-JNMF: Single-cell clustering integrating multiple quantification methods based on joint non-negative matrix factorization

2020 ◽  
Author(s):  
Mikio Shiga ◽  
Shigeto Seno ◽  
Makoto Onizuka ◽  
Hideo Matsuda

Abstract: Unsupervised cell clustering is important for discovering cell diversity and subpopulations. Single-cell clustering using gene expression profiles is known to give different results depending on the expression quantification method; nevertheless, most single-cell clustering methods do not take the quantification method into account. In this article, we propose a robust and highly accurate clustering method using joint non-negative matrix factorization (joint NMF) based on multiple gene expression profiles quantified using different methods. Matrix factorization is an excellent method for dimension reduction and feature extraction of data. In particular, NMF approximates the data matrix as the product of two matrices in which all entries are non-negative. Our joint NMF can extract common factors among multiple gene expression profiles by applying an NMF to each of them under the constraint that one of the factorized matrices is shared among the multiple NMFs. By leveraging multiple quantification methods, the joint NMF determines more robust and accurate cell clustering results than conventional clustering methods, which use only a single quantification method. In conclusion, our study showed that our clustering method using multiple gene expression profiles is more accurate than other popular methods.

PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12087
Author(s):  
Mikio Shiga ◽  
Shigeto Seno ◽  
Makoto Onizuka ◽  
Hideo Matsuda

Single-cell RNA-sequencing is a rapidly evolving technology that enables us to understand biological processes at unprecedented resolution. Single-cell expression analysis requires a complex data processing pipeline, which is divided into two main parts: the quantification part, which converts the sequence information into gene-cell matrix data, and the analysis part, which analyzes the matrix data using statistics and/or machine learning techniques. In the analysis part, unsupervised cell clustering plays an important role in identifying cell types and discovering cell diversity and subpopulations. Identified cell clusters are also used for subsequent analysis, such as finding differentially expressed genes and inferring cell trajectories. However, single-cell clustering results are greatly affected by the quantification method used in the upstream process. In other words, even if the original RNA-sequence data are the same, gene expression profiles processed by different quantification methods will produce different clusters. In this article, we propose a robust and highly accurate clustering method based on joint non-negative matrix factorization (joint-NMF) that utilizes the information from multiple gene expression profiles quantified using different methods from the same RNA-sequence data. Our joint-NMF can extract common factors among multiple gene expression profiles by applying an NMF to each profile under the constraint that one of the factorized matrices is shared among the multiple NMFs. By leveraging multiple quantification methods, the joint-NMF determines more robust and accurate cell clustering results than conventional clustering methods, which use only a single gene expression profile. Additionally, we showed the usefulness of the features extracted by our method for discovering marker genes.
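
The shared-factor constraint described in this abstract can be illustrated with a minimal sketch. This is not the authors' SC-JNMF implementation: it assumes a plain sum of squared reconstruction errors, standard multiplicative updates, and a simple "largest shared factor" rule for assigning cells to clusters, and the published method may add weights or regularization that are not shown here. The function names joint_nmf and reconstruction_error and the toy matrices X1 and X2 are hypothetical.

```python
import numpy as np


def joint_nmf(matrices, rank, n_iter=300, eps=1e-9, seed=0):
    """Factorize each X_k (genes_k x cells) as W_k @ H with one shared H.

    Assumption: the objective is sum_k ||X_k - W_k H||_F^2 with no extra
    terms, minimized by Lee-Seung style multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n_cells = matrices[0].shape[1]
    if any(X.shape[1] != n_cells for X in matrices):
        raise ValueError("all matrices must describe the same cells (columns)")
    if any((X < 0).any() for X in matrices):
        raise ValueError("NMF requires non-negative input")

    # One basis matrix per quantification method, one shared coefficient matrix.
    Ws = [rng.random((X.shape[0], rank)) + eps for X in matrices]
    H = rng.random((rank, n_cells)) + eps

    for _ in range(n_iter):
        # Update each method-specific W_k with H held fixed.
        for k, X in enumerate(matrices):
            W = Ws[k]
            Ws[k] = W * (X @ H.T) / (W @ (H @ H.T) + eps)
        # Update the shared H using every profile at once.
        numer = sum(W.T @ X for W, X in zip(Ws, matrices))
        denom = sum(W.T @ W for W in Ws) @ H + eps
        H = H * numer / denom
    return Ws, H


def reconstruction_error(matrices, Ws, H):
    """Total squared Frobenius error over all profiles."""
    return sum(np.linalg.norm(X - W @ H) ** 2 for X, W in zip(matrices, Ws))


# Toy data (hypothetical): the same 10 cells quantified by two "methods"
# that report different numbers of genes.
rng = np.random.default_rng(1)
true_H = np.kron(np.eye(2), np.ones((1, 5)))  # 2 x 10, two groups of 5 cells
X1 = rng.random((30, 2)) @ true_H + 0.01 * rng.random((30, 10))
X2 = rng.random((40, 2)) @ true_H + 0.01 * rng.random((40, 10))

Ws, H = joint_nmf([X1, X2], rank=2)
labels = H.argmax(axis=0)  # assumed rule: cluster = index of largest shared factor
print(labels, reconstruction_error([X1, X2], Ws, H))
```

Because H is the only matrix that every profile must agree on, the cell-level structure it encodes reflects all quantification methods jointly, while each W_k absorbs method-specific gene-level differences.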


2019 ◽  
Vol 9 (24) ◽  
pp. 5552 ◽  
Author(s):  
Gabriella Casalino ◽  
Mauro Coluccia ◽  
Maria L. Pati ◽  
Alessandra Pannunzio ◽  
Angelo Vacca ◽  
...  

Microarray data are numerical, non-negative data used to collect gene expression profiles. Because the number of genes is huge, these data are usually high-dimensional and therefore require dimensionality reduction and clustering techniques to extract useful information. In this paper, we use non-negative matrix factorization (NMF) to analyze microarray data and also develop “intelligent” visualization of the results to facilitate analysis by domain experts. For this purpose, a case study based on the gene expression profiles (GEPs) representative of human multiple myeloma was carried out in 40 human myeloma cell lines (HMCLs). The aim of the experiments was to study the genes involved in arachidonic acid metabolism in order to detect gene patterns that could be connected to the different gene expression profiles of multiple myeloma. NMF results were verified by western blotting analysis, in six HMCLs, of proteins expressed by some of the most abundantly expressed genes. The experiments showed the effectiveness of NMF in intelligently analyzing microarray data.


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A4-A4
Author(s):  
Anushka Dikshit ◽  
Dan Zollinger ◽  
Karen Nguyen ◽  
Jill McKay-Fleisch ◽  
Kit Fuhrman ◽  
...  

Background: The canonical WNT-β-catenin signaling pathway is vital for development and tissue homeostasis but becomes strongly tumorigenic when dysregulated. [...] and alter the transcriptional signature of a cell to promote malignant transformation. However, thorough characterization of these transcriptomic signatures has been challenging because traditional methods lack either spatial information, multiplexing, or sensitivity/specificity. To overcome these challenges, we developed a novel workflow combining the single molecule and single cell visualization capabilities of the RNAscope in situ hybridization (ISH) assay with the highly multiplexed spatial profiling capabilities of the GeoMx™ Digital Spatial Profiler (DSP) RNA assays. Using these methods, we sought to spatially profile and compare gene expression signatures of tumor niches with high and low CTNNB1 expression.
Methods: After screening 120 tumor cores from multiple tumors for CTNNB1 expression by the RNAscope assay, we identified melanoma as the tumor type with the highest CTNNB1 expression, while prostate tumors had the lowest expression. Using the RNAscope Multiplex Fluorescence assay, we selected regions of high CTNNB1 expression within 3 melanoma tumors as well as regions with low CTNNB1 expression within 3 prostate tumors. These selected regions of interest (ROIs) were then transcriptionally profiled using the GeoMx DSP RNA assay for a set of 78 genes relevant in immuno-oncology. Target genes that were differentially expressed were further visualized and spatially assessed using the RNAscope Multiplex Fluorescence assay to confirm GeoMx DSP data with single cell resolution.
Results: The GeoMx DSP analysis comparing the melanoma and prostate tumors revealed that they had significantly different gene expression profiles, and many of these genes showed concordance with CTNNB1 expression. Furthermore, immunoregulatory targets such as ICOSLG, CTLA4, PDCD1 and ARG1 also demonstrated significant correlation with CTNNB1 expression. On validating selected targets using the RNAscope assay, we could distinctly visualize that they were not only highly expressed in melanoma compared to the prostate tumor, but that their expression levels also changed proportionally to that of CTNNB1 within the same tumors, suggesting that these differentially expressed genes may be regulated by the WNT-β-catenin pathway.
Conclusions: In summary, by combining the RNAscope ISH assay and the GeoMx DSP RNA assay into one joint workflow, we transcriptionally profiled regions of high and low CTNNB1 expression within melanoma and prostate tumors and identified genes potentially regulated by the WNT-β-catenin pathway. This novel workflow can be fully automated and is well suited for interrogating the tumor and stroma and their interactions.
GeoMx Assays are for RESEARCH ONLY, not for diagnostics.


2008 ◽  
Vol 5 (2) ◽  
Author(s):  
Li Teng ◽  
Laiwan Chan

Summary: Traditional analysis of gene expression profiles uses clustering to find groups of coexpressed genes that have similar expression patterns. However, clustering is time-consuming and can be difficult for very large-scale datasets. We propose the idea of Discovering Distinct Patterns (DDP) in gene expression profiles. Because the patterns shown by gene expression reveal the underlying regulatory mechanisms, it is important to find all the different patterns in a dataset when there is little prior knowledge. It is also a helpful starting point for further analysis. We propose an algorithm for DDP that iteratively picks out pairs of gene expression patterns with the largest dissimilarities. This method can also be used as a preprocessing step to initialize centers for clustering methods such as K-means. Experiments on both synthetic and real gene expression datasets show that our method is efficient and very effective in finding distinct patterns that have gene functional significance.
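
The abstract does not give the details of the DDP algorithm, so the following is only a rough sketch of the general idea of repeatedly picking the most dissimilar pair of patterns. It assumes Euclidean distance as the dissimilarity and adds a hypothetical radius parameter that drops patterns already represented by a picked one; neither choice is confirmed by the source, and the function name distinct_patterns is invented for illustration.

```python
import numpy as np


def distinct_patterns(X, radius, max_patterns=None):
    """Greedily pick mutually dissimilar rows of X (genes x conditions).

    radius is a hypothetical parameter: rows within this distance of a
    picked pattern are treated as already represented and are dropped.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all expression patterns.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))

    remaining = np.ones(len(X), dtype=bool)
    picked = []
    while remaining.any():
        if max_patterns is not None and len(picked) >= max_patterns:
            break
        idx = np.flatnonzero(remaining)
        sub = D[np.ix_(idx, idx)]
        # Most dissimilar pair among the patterns still under consideration.
        i, j = np.unravel_index(sub.argmax(), sub.shape)
        for p in (idx[i], idx[j]):
            if remaining[p] and (max_patterns is None or len(picked) < max_patterns):
                picked.append(int(p))
                remaining &= D[p] > radius  # drop p and its near-duplicates
    return picked


# Toy data (hypothetical): 12 genes following three trends across 5 conditions.
rng = np.random.default_rng(0)
bases = np.array([[0, 1, 2, 3, 4], [4, 3, 2, 1, 0], [2, 2, 2, 2, 2]], dtype=float)
X = np.repeat(bases, 4, axis=0) + rng.uniform(-0.05, 0.05, size=(12, 5))

picked = distinct_patterns(X, radius=1.0)
initial_centers = X[picked]  # could seed K-means, as the abstract suggests
print(picked)
```

On this toy input the rising and falling trends form the most dissimilar pair and are picked first, after which a representative of the flat trend is picked from what remains.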


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5285 ◽  
Author(s):  
Mei Sze Tan ◽  
Siow-Wee Chang ◽  
Phaik Leng Cheah ◽  
Hwa Jen Yap

Although most cervical cancer cases are reported to be closely related to Human Papillomavirus (HPV) infection, there is a need to study the genes that stand out differentially in the final development of cervical cancers following HPV infection. In this study, we proposed an integrative machine learning approach to analyse multiple gene expression profiles in cervical cancer in order to identify a set of genetic markers that are associated with, and may eventually aid in, the diagnosis or prognosis of cervical cancers. The proposed integrative analysis is composed of three steps: (i) gene expression analysis of individual datasets; (ii) meta-analysis of multiple datasets; and (iii) feature selection and machine learning analysis. As a result, 21 gene expressions were identified through the integrative machine learning analysis, which included seven supervised methods and one unsupervised method. A functional analysis with GSEA (Gene Set Enrichment Analysis) was performed on the selected 21-gene expression set and showed significant enrichment in a potential nine-gene expression signature, namely PEG3, SPON1, BTD and RPLP2 (upregulated genes) and PRDX3, COPB2, LSM3, SLC5A3 and AS1B (downregulated genes).

