Matrix Factorization-based Improved Classification of Gene Expression Data
Background: The medical data, in the form of prescriptions and test reports, is very extensive which needs a comprehensive analysis. Objective: The gene expression data set is formulated using a very large number of genes associated to thousands of samples. Identifying the relevant biological information from these complex associations is a difficult task. Methods: For this purpose, a variety of classification algorithms are available which can be used to automatically detect the desired information. K-Nearest Neighbour Algorithm, Latent Dirichlet Allocation, Gaussian Naïve Bayes and support Vector Classifier are some of the well known algorithms used for the classification task. Nonnegative Matrix Factorization is a technique which has gained a lot of popularity because of its nonnegativity constraints. This technique can be used for better interpretability of data. Results: In this paper, we applied NMF as a pre-processing step for better results. We also evaluated the given classifiers on the basis of four criteria: accuracy, precision, specificity and Recall. Conclusion: The experimental results shows that these classifiers give better performance when NMF is applied at pre-processing of data before giving it to the said classifiers. Gaussian Naïve Bias algorithm showed a significant improvement in classification after the application of NMF at preprocessing.