Matrix Factorization-based Improved Classification of Gene Expression Data

Background: Medical data, in the form of prescriptions and test reports, is extensive and requires comprehensive analysis. Objective: A gene expression data set comprises a very large number of genes associated with thousands of samples. Identifying the relevant biological information from these complex associations is a difficult task. Methods: A variety of classification algorithms can be used to detect the desired information automatically. K-Nearest Neighbour, Latent Dirichlet Allocation, Gaussian Naïve Bayes and Support Vector Classifier are some of the well-known algorithms used for the classification task. Nonnegative Matrix Factorization (NMF) is a technique that has gained popularity because of its nonnegativity constraints, which can make the data easier to interpret. Results: In this paper, we applied NMF as a pre-processing step. We also evaluated the given classifiers on four criteria: accuracy, precision, specificity and recall. Conclusion: The experimental results show that these classifiers perform better when NMF is applied as a pre-processing step before the data are passed to them. The Gaussian Naïve Bayes algorithm showed a significant improvement in classification after the application of NMF at pre-processing.
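The four evaluation criteria named in the abstract can all be read off a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the function name `binary_metrics`, the choice of `1` as the positive class, and the toy labels are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, specificity and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),   # correct / all
        "precision": ratio(tp, tp + fp),          # true positives / predicted positives
        "specificity": ratio(tn, tn + fp),        # true negatives / actual negatives
        "recall": ratio(tp, tp + fn),             # true positives / actual positives
    }


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    y_true = [1, 1, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 1]
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

Comparing these four numbers for a classifier trained on raw features and on NMF-reduced features is one way to reproduce the kind of before/after comparison the abstract describes.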

The estimation of dimensionality in gene expression data using Nonnegative Matrix Factorization

2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽ 10.1109/bibm.2015.7359922 ◽ 2015 ◽ Cited By ~ 1

Author(s): Conor J. Kelton ◽ Waishing Lee ◽ Matthew Rusay ◽ Ondrej Maxian ◽ Elana J. Fertig ◽ ...

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Matrix Factorization ◽ Nonnegative Matrix Factorization ◽ Nonnegative Matrix ◽ Expression Data

Regularized nonnegative matrix factorization for clustering gene expression data

2013 IEEE International Conference on Bioinformatics and Biomedicine ◽ 10.1109/bibm.2013.6732613 ◽ 2013 ◽ Cited By ~ 2

Author(s): Weixiang Liu ◽ Tianfu Wang ◽ Siping Chen

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Matrix Factorization ◽ Nonnegative Matrix Factorization ◽ Nonnegative Matrix ◽ Expression Data

FNMF-ITWC Algorithm Applied to the Cancer Gene Expression Data

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.678.12 ◽ 2014 ◽ Vol 678 ◽ pp. 12-18

Author(s): Duo Wang

Keyword(s): Gene Expression ◽ Data Storage ◽ Gene Expression Data ◽ Matrix Factorization ◽ Nonnegative Matrix ◽ Cancer Gene ◽ Expression Data ◽ Experimental Conditions ◽ Cancer Subtypes ◽ Early Diagnosis Of Cancer

Clustering analysis of cancer gene expression data can provide a basis for the early diagnosis of cancer and the accurate classification of cancer subtypes. To suit the characteristics of cancer gene expression data, an algorithm named FNMF-ITWC (Fast Nonnegative Matrix Factorization Interrelated Two_Way Clustering) is proposed. The FNMF-ITWC algorithm first selects genes from the original gene expression data, applies non-negative matrix factorization along the rows (gene dimension), and then clusters along the columns (sample dimension). MATLAB experimental results show that FNMF-ITWC improves computing speed and reduces data storage space. It can also reveal correlations among genes under certain experimental conditions and correlations among experiments for some genes.
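The general idea of factorizing along the gene dimension and then clustering samples can be sketched as follows. This is not the FNMF-ITWC algorithm itself: it uses plain NMF with the standard Lee-Seung multiplicative updates and assigns each sample to its dominant factor. The function names, the iteration count, and the toy matrix are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix V (genes x samples) as W @ H.

    Uses Lee-Seung multiplicative updates for the squared Frobenius loss.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Each update rescales the entries and keeps them nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_samples(W, H):
    """Assign each sample (column of H) to the factor with the largest weight."""
    # Move each factor's scale from W into H so the rows of H are comparable.
    scaled_H = H * W.sum(axis=0)[:, None]
    return np.argmax(scaled_H, axis=0)


if __name__ == "__main__":
    # Toy data, invented for illustration: two gene blocks, two sample groups.
    V = np.zeros((6, 6))
    V[:3, :3] = 5.0
    V[3:, 3:] = 3.0
    W, H = nmf(V, rank=2)
    print("sample clusters:", cluster_samples(W, H))
```

The rescaling step in `cluster_samples` is a design choice: W and H are only determined up to a per-factor scale, so comparing raw rows of H could favor whichever factor happens to carry the larger scale.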

On α-divergence based nonnegative matrix factorization for clustering cancer gene expression data

Artificial Intelligence in Medicine ◽ 10.1016/j.artmed.2008.05.001 ◽ 2008 ◽ Vol 44 (1) ◽ pp. 1-5 ◽ Cited By ~ 9

Author(s): Weixiang Liu ◽ Kehong Yuan ◽ Datian Ye

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Matrix Factorization ◽ Nonnegative Matrix Factorization ◽ Nonnegative Matrix ◽ Cancer Gene ◽ Expression Data

An Empirical Comparison Study on Kernel Based Support Vector Machine for Classification of Gene Expression Data Set

Procedia Engineering ◽ 10.1016/j.proeng.2012.06.165 ◽ 2012 ◽ Vol 38 ◽ pp. 1340-1345 ◽ Cited By ~ 3

Author(s): Smruti Rekha Das ◽ Kaberi Das ◽ Debahuti Mishra ◽ Kailash Shaw ◽ Sashikala Mishra

Keyword(s): Gene Expression ◽ Support Vector Machine ◽ Gene Expression Data ◽ Support Vector ◽ Expression Data ◽ Comparison Study ◽ Empirical Comparison ◽ Data Set

Sparse Nonnegative Matrix Factorization for Classification of Gene Expression Data

2007 1st International Conference on Bioinformatics and Biomedical Engineering ◽ 10.1109/icbbe.2007.49 ◽ 2007 ◽ Cited By ~ 2

Author(s): Weixiang Liu ◽ Kehong Yuan ◽ Zhenhua Xie

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Matrix Factorization ◽ Nonnegative Matrix Factorization ◽ Nonnegative Matrix ◽ Expression Data ◽ Sparse Nonnegative Matrix Factorization

A NOTE ON CLASSIFICATION OF GENE EXPRESSION DATA USING SUPPORT VECTOR MACHINES

Journal of Biological Systems ◽ 10.1142/s0218339003000658 ◽ 2003 ◽ Vol 11 (01) ◽ pp. 43-56 ◽ Cited By ~ 9

Author(s): KRZYSZTOF FUJAREWICZ ◽ MAREK KIMMEL ◽ JOANNA RZESZOWSKA-WOLNY ◽ ANDRZEJ SWIERNIAK

Keyword(s): Gene Expression ◽ Support Vector Machines ◽ Gene Expression Data ◽ Lymphoblastic Leukemia ◽ Heuristic Method ◽ Support Vector ◽ Expression Data ◽ Validation Data ◽ Data Set ◽ Vector Machines

Microarrays provide a new technique for measuring gene expression that has attracted a lot of research interest in recent years. It has been suggested that gene expression data from microarrays (biochips) can be utilized in many biomedical areas, for example in cancer classification. Whereas several classification methods, both new and existing, have been tested, selecting a proper (optimal) set of genes whose expression is used for classification is still an open problem. In this paper we propose a heuristic method of choosing a suboptimal set of genes by using support vector machines (SVM). The obtained set of genes optimizes the leave-one-out cross-validation error. The method is tested on microarray gene expression data from samples of two cancer types: acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). The results show that the quality of classification is much better than for sets obtained using other methods of feature selection. In addition, we demonstrate that maximum separation in a training data set may lead to deterioration of performance in an independent validation data set, a phenomenon akin to overfitting.
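The criterion mentioned in the abstract, leave-one-out cross-validation error for a candidate gene set, can be illustrated with a short sketch. To keep the example dependency-light, it swaps in a nearest-centroid classifier in place of the SVM used in the paper. The function name `loocv_error`, the toy expression matrix, and the labels are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def loocv_error(X, y, genes):
    """Leave-one-out error rate of a nearest-centroid classifier.

    X: samples x genes array, y: integer class labels (NumPy array),
    genes: list of column indices forming the candidate gene subset.
    """
    Xs = X[:, genes]
    n = len(y)
    errors = 0
    for i in range(n):
        train = np.arange(n) != i  # hold out sample i
        classes = np.unique(y[train])
        # One centroid per class, computed without the held-out sample.
        centroids = np.array(
            [Xs[train & (y == c)].mean(axis=0) for c in classes]
        )
        dists = np.linalg.norm(centroids - Xs[i], axis=1)
        errors += int(classes[np.argmin(dists)] != y[i])
    return errors / n


if __name__ == "__main__":
    # Toy data, invented for illustration: gene 0 separates the two
    # classes, gene 1 carries no class information.
    X = np.array([
        [0.0, 1.0], [0.1, 2.0], [0.2, 3.0],
        [5.0, 1.0], [5.1, 2.0], [5.2, 3.0],
    ])
    y = np.array([0, 0, 0, 1, 1, 1])
    for subset in ([0], [1], [0, 1]):
        print(subset, loocv_error(X, y, subset))
```

A gene-selection heuristic would call such a function repeatedly on different subsets and keep the one with the lowest error; as the abstract cautions, a subset that minimizes this error on the training data may still perform worse on an independent validation set.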

Sparse p-norm Nonnegative Matrix Factorization for clustering gene expression data

International Journal of Data Mining and Bioinformatics ◽ 10.1504/ijdmb.2008.020524 ◽ 2008 ◽ Vol 2 (3) ◽ pp. 236 ◽ Cited By ~ 5

Author(s): Weixiang Liu ◽ Kehong Yuan

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Matrix Factorization ◽ Nonnegative Matrix Factorization ◽ Nonnegative Matrix ◽ Expression Data

eMBI: Boosting Gene Expression-based Clustering for Cancer Subtypes

Cancer Informatics ◽ 10.4137/cin.s13777 ◽ 2014 ◽ Vol 13s2 ◽ pp. CIN.S13777 ◽ Cited By ~ 2

Author(s): Zheng Chang ◽ Zhenjia Wang ◽ Cody Ashby ◽ Chuan Zhou ◽ Guojun Li ◽ ...

Keyword(s): Gene Expression ◽ Cancer Patients ◽ Gene Expression Data ◽ Matrix Factorization ◽ Expression Profiles ◽ Nonnegative Matrix ◽ Gene Expression Profiles ◽ Expression Data ◽ Cancer Subtypes ◽ First Choice

Identifying clinically relevant subtypes of a cancer using gene expression data is a challenging and important problem in medicine, and is a necessary premise for providing specific and efficient treatments for patients of different subtypes. Matrix factorization provides a solution by finding checkerboard patterns in the matrices of gene expression data. In the context of gene expression profiles of cancer patients, these checkerboard patterns correspond to genes that are up- or down-regulated in patients with particular cancer subtypes. Recently, a new matrix factorization framework for biclustering called Maximum Block Improvement (MBI) was proposed; however, it still suffers from several problems when applied to cancer gene expression data analysis. In this study, we developed several strategies to improve MBI and designed a new program called enhanced MBI (eMBI), which is more effective and efficient at identifying cancer subtypes. Our tests on several gene expression profiling datasets of cancer patients consistently indicate that eMBI achieves significant improvements over MBI in terms of cancer subtype prediction accuracy, robustness, and running time. In addition, eMBI performs much better than another widely used matrix factorization method, nonnegative matrix factorization (NMF), and than hierarchical clustering, which is often the first choice of clinical analysts in practice.

Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks

Proceedings of the National Academy of Sciences ◽ 10.1073/pnas.1521171113 ◽ 2016 ◽ Vol 113 (16) ◽ pp. 4290-4295 ◽ Cited By ~ 53

Author(s): Siqi Wu ◽ Antony Joseph ◽ Ann S. Hammonds ◽ Susan E. Celniker ◽ Bin Yu ◽ ...

Keyword(s): Gene Expression ◽ Matrix Factorization ◽ Gene Networks ◽ Regulatory Networks ◽ Nonnegative Matrix Factorization ◽ Expression Patterns ◽ Selection Criterion ◽ Nonnegative Matrix ◽ Biological Information ◽ Expression Data

Spatial gene expression patterns enable the detection of local covariability and are extremely useful for identifying local gene interactions during normal development. The abundance of spatial expression data in recent years has led to the modeling and analysis of regulatory networks. The inherent complexity of such data makes it a challenge to extract biological information. We developed staNMF, a method that combines a scalable implementation of nonnegative matrix factorization (NMF) with a new stability-driven model selection criterion. When applied to a set of Drosophila early embryonic spatial gene expression images, one of the largest datasets of its kind, staNMF identified 21 principal patterns (PP). Providing a compact yet biologically interpretable representation of Drosophila expression patterns, PP are comparable to a fate map generated experimentally by laser ablation and show exceptional promise as a data-driven alternative to manual annotations. Our analysis mapped genes to cell-fate programs and assigned putative biological roles to uncharacterized genes. Finally, we used the PP to generate local transcription factor regulatory networks. Spatially local correlation networks were constructed for six PP that span along the embryonic anterior–posterior axis. Using a two-tail 5% cutoff on correlation, we reproduced 10 of the 11 links in the well-studied gap gene network. The performance of PP with the Drosophila data suggests that staNMF provides informative decompositions and constitutes a useful computational lens through which to extract biological insight from complex and often noisy gene expression data.
