On α-divergence based nonnegative matrix factorization for clustering cancer gene expression data

2008 ◽  
Vol 44 (1) ◽  
pp. 1-5 ◽  
Author(s):  
Weixiang Liu ◽  
Kehong Yuan ◽  
Datian Ye


2014 ◽  
Vol 678 ◽  
pp. 12-18
Author(s):  
Duo Wang

Clustering analysis of cancer gene expression data can provide a basis for the early diagnosis of cancer and the accurate classification of cancer subtypes. Based on the characteristics of cancer gene expression data, an algorithm named FNMF-ITWC (Fast Nonnegative Matrix Factorization Interrelated Two_Way Clustering) is proposed. The FNMF-ITWC algorithm first selects genes from the original gene expression data, applies nonnegative matrix factorization along the rows (gene dimension), and then performs clustering along the columns (sample dimension). Experimental results in Matlab show that the FNMF-ITWC algorithm improves computing speed and reduces data storage space. At the same time, it is able to reveal correlations among genes under certain experimental conditions and correlations among experiments for some genes.
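The three steps named in this abstract (gene selection, NMF on the gene dimension, clustering on the sample dimension) can be sketched in plain Python as follows. This is a minimal illustration, not the FNMF-ITWC algorithm itself: the variance-based gene filter, the standard multiplicative updates for the Frobenius objective, the argmax-based cluster assignment, the function names, and the toy matrix are all assumptions made for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def select_genes(v, n_keep):
    """Assumed filter: keep the n_keep genes (rows) with the highest variance."""
    def variance(row):
        mean = sum(row) / len(row)
        return sum((x - mean) ** 2 for x in row) / len(row)
    ranked = sorted(range(len(v)), key=lambda i: variance(v[i]), reverse=True)
    return [v[i] for i in sorted(ranked[:n_keep])]


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate nonnegative v (genes x samples) by w times h using multiplicative updates."""
    rng = random.Random(seed)
    n_genes, n_samples = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_genes)]
    h = [[rng.random() + eps for _ in range(n_samples)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_samples)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_genes)]
    return w, h


def cluster_samples(w, h):
    """Assign each sample (column of h) to its dominant factor.

    Each row of h is rescaled by the matching column sum of w so that
    coefficients of different factors are comparable.
    """
    scales = [sum(col) for col in zip(*w)]
    scaled = [[s * x for x in row] for s, row in zip(scales, h)]
    n_samples = len(scaled[0])
    return [max(range(len(scaled)), key=lambda k: scaled[k][j]) for j in range(n_samples)]


# Toy expression matrix (hypothetical): 5 genes x 6 samples, two sample groups.
toy = [
    [5.0, 6.0, 5.5, 0.1, 0.2, 0.1],
    [4.0, 5.0, 4.5, 0.2, 0.1, 0.3],
    [0.1, 0.2, 0.1, 6.0, 5.0, 5.5],
    [0.3, 0.1, 0.2, 5.0, 4.0, 4.5],
    [1.0, 1.0, 1.1, 1.0, 0.9, 1.0],  # nearly flat gene, dropped by the filter
]
selected = select_genes(toy, n_keep=4)
w, h = nmf(selected, rank=2)
labels = cluster_samples(w, h)
print(labels)
```

On the toy matrix, the first three samples and the last three samples should fall into different clusters; which cluster receives which index depends on the random initialization.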


2020 ◽  
Vol 13 (5) ◽  
pp. 858-863
Author(s):  
Shaily Malik ◽  
Poonam Bansal

Background: Medical data, in the form of prescriptions and test reports, is very extensive and needs comprehensive analysis. Objective: A gene expression data set comprises a very large number of genes associated with thousands of samples. Identifying the relevant biological information from these complex associations is a difficult task. Methods: For this purpose, a variety of classification algorithms are available that can automatically detect the desired information. K-Nearest Neighbour, Latent Dirichlet Allocation, Gaussian Naïve Bayes, and Support Vector Classifier are some of the well-known algorithms used for the classification task. Nonnegative Matrix Factorization (NMF) is a technique that has gained a lot of popularity because of its nonnegativity constraints. This technique can be used to improve the interpretability of data. Results: In this paper, we applied NMF as a pre-processing step for better results. We also evaluated the given classifiers on four criteria: accuracy, precision, specificity, and recall. Conclusion: The experimental results show that these classifiers perform better when NMF is applied as a pre-processing step before the data is passed to them. The Gaussian Naïve Bayes algorithm showed a significant improvement in classification after the application of NMF at pre-processing.
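The four evaluation criteria named in this abstract have standard definitions in terms of confusion-matrix counts. The sketch below computes them for the binary case; the function name, the choice of a binary setting, and the example labels are assumptions for illustration and are not taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    def ratio(num, den):
        # Return 0.0 when a denominator is empty instead of raising.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical labels: 4 positive and 6 negative samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here the counts are 3 true positives, 5 true negatives, 1 false positive, and 1 false negative, giving accuracy 0.8, precision 0.75, recall 0.75, and specificity 5/6.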


2014 ◽  
Vol 13s2 ◽  
pp. CIN.S13777 ◽  
Author(s):  
Zheng Chang ◽  
Zhenjia Wang ◽  
Cody Ashby ◽  
Chuan Zhou ◽  
Guojun Li ◽  
...  

Identifying clinically relevant subtypes of a cancer using gene expression data is a challenging and important problem in medicine, and is a necessary premise for providing specific and efficient treatments for patients of different subtypes. Matrix factorization provides a solution by finding checkerboard patterns in the matrices of gene expression data. In the context of gene expression profiles of cancer patients, these checkerboard patterns correspond to genes that are up- or down-regulated in patients with particular cancer subtypes. Recently, a new matrix factorization framework for biclustering called Maximum Block Improvement (MBI) was proposed; however, it still suffers from several problems when applied to cancer gene expression data analysis. In this study, we developed many effective strategies to improve MBI and designed a new program called enhanced MBI (eMBI), which is more effective and efficient at identifying cancer subtypes. Our tests on several gene expression profiling datasets of cancer patients consistently indicate that eMBI achieves significant improvements over MBI in terms of cancer subtype prediction accuracy, robustness, and running time. In addition, eMBI performs much better than another widely used matrix factorization method, nonnegative matrix factorization (NMF), and than hierarchical clustering, which is often the first choice of clinical analysts in practice.


2016 ◽  
Vol 113 (16) ◽  
pp. 4290-4295 ◽  
Author(s):  
Siqi Wu ◽  
Antony Joseph ◽  
Ann S. Hammonds ◽  
Susan E. Celniker ◽  
Bin Yu ◽  
...  

Spatial gene expression patterns enable the detection of local covariability and are extremely useful for identifying local gene interactions during normal development. The abundance of spatial expression data in recent years has led to the modeling and analysis of regulatory networks. The inherent complexity of such data makes it a challenge to extract biological information. We developed staNMF, a method that combines a scalable implementation of nonnegative matrix factorization (NMF) with a new stability-driven model selection criterion. When applied to a set of Drosophila early embryonic spatial gene expression images, one of the largest datasets of its kind, staNMF identified 21 principal patterns (PP). Providing a compact yet biologically interpretable representation of Drosophila expression patterns, PP are comparable to a fate map generated experimentally by laser ablation and show exceptional promise as a data-driven alternative to manual annotations. Our analysis mapped genes to cell-fate programs and assigned putative biological roles to uncharacterized genes. Finally, we used the PP to generate local transcription factor regulatory networks. Spatially local correlation networks were constructed for six PP that span along the embryonic anterior–posterior axis. Using a two-tail 5% cutoff on correlation, we reproduced 10 of the 11 links in the well-studied gap gene network. The performance of PP with the Drosophila data suggests that staNMF provides informative decompositions and constitutes a useful computational lens through which to extract biological insight from complex and often noisy gene expression data.


2020 ◽  
Vol 526 ◽  
pp. 245-262
Author(s):  
Yunhe Wang ◽  
Zhiqiang Ma ◽  
Ka-Chun Wong ◽  
Xiangtao Li
