A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification

2018 ◽  
Vol 13 (3) ◽  
pp. 753-771 ◽  
Author(s):  
Zakariya Yahya Algamal ◽  
Muhammad Hisyam Lee
Author(s):  
Zhenqiu Liu ◽  
Feng Jiang ◽  
Guoliang Tian ◽  
Suna Wang ◽  
Fumiaki Sato ◽  
...  

In this paper, we propose a novel method for sparse logistic regression with non-convex regularization Lp (p <1). Based on smooth approximation, we develop several fast algorithms for learning the classifier that is applicable to high dimensional dataset such as gene expression. To the best of our knowledge, these are the first algorithms to perform sparse logistic regression with an Lp and elastic net (Le) penalty. The regularization parameters are decided through maximizing the area under the ROC curve (AUC) of the test data. Experimental results on methylation and microarray data attest the accuracy, sparsity, and efficiency of the proposed algorithms. Biomarkers identified with our methods are compared with that in the literature. Our computational results show that Lp Logistic regression (p <1) outperforms the L1 logistic regression and SCAD SVM. Software is available upon request from the first author.


Now a day’s cancer has become a deathly disease due to the abnormal growth of the cell. Many researchers are working in this area for the early prediction of cancer. For the proper classification of cancer data, demands for the identification of proper set of genes by analyzing the genomic data. Most of the researchers used microarrays to identify the cancerous genomes. However, such kind of data is high dimensional where number of genes are more compared to samples. Also the data consists of many irrelevant features and noisy data. The classification technique deal with such kind of data influences the performance of algorithm. A popular classification algorithm (i.e., Logistic Regression) is considered in this work for gene classification. Regularization techniques like Lasso with L1 penalty, Ridge with L2 penalty, and hybrid Lasso with L1/2+2 penalty used to minimize irrelevant features and avoid overfitting. However, these methods are of sparse parametric and limits to linear data. Also methods have not produced promising performance when applied to high dimensional genome data. For solving these problems, this paper presents an Additive Sparse Logistic Regression with Additive Regularization (ASLR) method to discriminate linear and non-linear variables in gene classification. The results depicted that the proposed method proved to be the best-regularized method for classifying microarray data compared to standard methods


2021 ◽  
Vol 29 ◽  
pp. 287-295
Author(s):  
Zhiming Zhou ◽  
Haihui Huang ◽  
Yong Liang

BACKGROUND: In genome research, it is particularly important to identify molecular biomarkers or signaling pathways related to phenotypes. Logistic regression model is a powerful discrimination method that can offer a clear statistical explanation and obtain the classification probability of classification label information. However, it is unable to fulfill biomarker selection. OBJECTIVE: The aim of this paper is to give the model efficient gene selection capability. METHODS: In this paper, we propose a new penalized logsum network-based regularization logistic regression model for gene selection and cancer classification. RESULTS: Experimental results on simulated data sets show that our method is effective in the analysis of high-dimensional data. For a large data set, the proposed method has achieved 89.66% (training) and 90.02% (testing) AUC performances, which are, on average, 5.17% (training) and 4.49% (testing) better than mainstream methods. CONCLUSIONS: The proposed method can be considered a promising tool for gene selection and cancer classification of high-dimensional biological data.


2018 ◽  
Vol 8 (9) ◽  
pp. 1569 ◽  
Author(s):  
Shengbing Wu ◽  
Hongkun Jiang ◽  
Haiwei Shen ◽  
Ziyi Yang

In recent years, gene selection for cancer classification based on the expression of a small number of gene biomarkers has been the subject of much research in genetics and molecular biology. The successful identification of gene biomarkers will help in the classification of different types of cancer and improve the prediction accuracy. Recently, regularized logistic regression using the L 1 regularization has been successfully applied in high-dimensional cancer classification to tackle both the estimation of gene coefficients and the simultaneous performance of gene selection. However, the L 1 has a biased gene selection and dose not have the oracle property. To address these problems, we investigate L 1 / 2 regularized logistic regression for gene selection in cancer classification. Experimental results on three DNA microarray datasets demonstrate that our proposed method outperforms other commonly used sparse methods ( L 1 and L E N ) in terms of classification performance.


2011 ◽  
Vol 7 (3) ◽  
pp. 142-146 ◽  
Author(s):  
Kohbalan Moorthy ◽  
Mohd Saberi Mohamad

Gene ◽  
2019 ◽  
Vol 706 ◽  
pp. 188-200 ◽  
Author(s):  
Xiao Zheng ◽  
Wenyang Zhu ◽  
Chang Tang ◽  
Minhui Wang

Sign in / Sign up

Export Citation Format

Share Document