A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification

Zakariya Yahya Algamal; Muhammad Hisyam Lee

doi:10.1007/s11634-018-0334-1

Advances in Data Analysis and Classification ◽

10.1007/s11634-018-0334-1 ◽

2018 ◽

Vol 13 (3) ◽

pp. 753-771 ◽

Cited By ~ 10

Author(s):

Zakariya Yahya Algamal ◽

Muhammad Hisyam Lee

Keyword(s):

Logistic Regression ◽

Microarray Data ◽

Gene Selection ◽

Data Classification ◽

High Dimensional ◽

Two Stage ◽

Sparse Logistic Regression

Sparse Logistic Regression with Lp Penalty for Biomarker Identification

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1248 ◽

2007 ◽

Vol 6 (1) ◽

Cited By ~ 30

Author(s):

Zhenqiu Liu ◽

Feng Jiang ◽

Guoliang Tian ◽

Suna Wang ◽

Fumiaki Sato ◽

...

Keyword(s):

Logistic Regression ◽

Microarray Data ◽

High Dimensional ◽

Computational Results ◽

Smooth Approximation ◽

Biomarker Identification ◽

Sparse Logistic Regression ◽

Regularization Parameters ◽

Novel Method ◽

Convex Regularization

In this paper, we propose a novel method for sparse logistic regression with non-convex Lp regularization (p < 1). Based on smooth approximation, we develop several fast algorithms for learning classifiers that are applicable to high-dimensional datasets such as gene expression data. To the best of our knowledge, these are the first algorithms to perform sparse logistic regression with an Lp and elastic net (Le) penalty. The regularization parameters are chosen by maximizing the area under the ROC curve (AUC) on the test data. Experimental results on methylation and microarray data attest to the accuracy, sparsity, and efficiency of the proposed algorithms. Biomarkers identified with our methods are compared with those in the literature. Our computational results show that Lp logistic regression (p < 1) outperforms L1 logistic regression and SCAD SVM. Software is available upon request from the first author.
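
The abstract above does not say which smooth approximation is used, so the following is only a minimal sketch under one common assumption: each |w|^p term is replaced by (w^2 + eps)^(p/2), which is differentiable at zero. The function names, the smoothing constant `eps`, and the weight `lam` are illustrative and are not taken from the paper.

```python
import math


def smooth_lp_penalty(weights, p=0.5, eps=1e-6):
    """Smoothed Lp penalty: sum_j (w_j^2 + eps)^(p/2) (assumed form)."""
    return sum((w * w + eps) ** (p / 2.0) for w in weights)


def smooth_lp_gradient(weights, p=0.5, eps=1e-6):
    """Gradient of the smoothed penalty with respect to each weight."""
    return [p * w * (w * w + eps) ** (p / 2.0 - 1.0) for w in weights]


def penalized_loss(weights, rows, labels, lam, p=0.5, eps=1e-6):
    """Mean logistic loss plus lam times the smoothed Lp penalty.

    rows is a list of feature lists, labels is a list of 0/1 values.
    """
    total = 0.0
    for x, y in zip(rows, labels):
        z = sum(w * xi for w, xi in zip(weights, x))
        # Signed margin: positive when the prediction agrees with the label.
        margin = z if y == 1 else -z
        # Numerically stable evaluation of log(1 + exp(-margin)).
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(rows) + lam * smooth_lp_penalty(weights, p, eps)
```

Because the smoothed objective is differentiable everywhere, any gradient-based optimizer can in principle be applied to it; the value of `lam` would then be chosen by a criterion such as the AUC mentioned in the abstract.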

An Additive Sparse Logistic Regularization Method for Cancer Classification in Microarray Data

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/2/10 ◽

2021 ◽

Vol 18 (2) ◽

Keyword(s):

Logistic Regression ◽

Microarray Data ◽

High Dimensional ◽

Cancer Data ◽

Abnormal Growth ◽

Sparse Logistic Regression ◽

Genome Data ◽

Regularization Techniques ◽

Performance Of Algorithm

Cancer, caused by the abnormal growth of cells, has become a deadly disease, and many researchers are working on its early prediction. Proper classification of cancer data requires identifying an appropriate set of genes by analyzing genomic data. Most researchers use microarrays to identify cancerous genomes. However, such data are high dimensional: the number of genes exceeds the number of samples. The data also contain many irrelevant features and noise, and how a classification technique handles such data influences the performance of the algorithm. A popular classification algorithm, logistic regression, is considered in this work for gene classification. Regularization techniques such as Lasso with an L1 penalty, Ridge with an L2 penalty, and hybrid Lasso with an L1/2+2 penalty are used to minimize irrelevant features and avoid overfitting. However, these methods are sparse parametric methods and are limited to linear data. They have also not produced promising performance when applied to high-dimensional genome data. To solve these problems, this paper presents an Additive Sparse Logistic Regression with Additive Regularization (ASLR) method to discriminate linear and non-linear variables in gene classification. The results show that the proposed method was the best regularized method for classifying microarray data compared with standard methods.

Cancer classification and biomarker selection via a penalized logsum network-based logistic regression model

Technology and Health Care ◽

10.3233/thc-218026 ◽

2021 ◽

Vol 29 ◽

pp. 287-295

Author(s):

Zhiming Zhou ◽

Haihui Huang ◽

Yong Liang

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Gene Selection ◽

Simulated Data ◽

Biological Data ◽

Cancer Classification ◽

High Dimensional ◽

Data Set ◽

Biomarker Selection

BACKGROUND: In genome research, it is particularly important to identify molecular biomarkers or signaling pathways related to phenotypes. The logistic regression model is a powerful discrimination method that offers a clear statistical interpretation and yields classification probabilities for the class labels. However, it cannot perform biomarker selection by itself. OBJECTIVE: The aim of this paper is to give the model efficient gene selection capability. METHODS: In this paper, we propose a new penalized logsum network-based regularization logistic regression model for gene selection and cancer classification. RESULTS: Experimental results on simulated data sets show that our method is effective in the analysis of high-dimensional data. For a large data set, the proposed method achieved AUC values of 89.66% (training) and 90.02% (testing), which are, on average, 5.17% (training) and 4.49% (testing) better than mainstream methods. CONCLUSIONS: The proposed method can be considered a promising tool for gene selection and cancer classification of high-dimensional biological data.
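
The abstract above names a log-sum penalty combined with network-based regularization but gives no formula, so the following is a rough sketch under stated assumptions: the sparsity term is taken as the sum of log(1 + |w_j| / eps), and the network term as the sum of squared coefficient differences over the edges of a gene network. Both forms, and all names and parameters (`eps`, `lam_sparse`, `lam_graph`, `edges`), are hypothetical and may differ from the paper's definitions.

```python
import math


def logsum_penalty(weights, eps=0.1):
    """Assumed log-sum sparsity penalty: sum_j log(1 + |w_j| / eps)."""
    return sum(math.log(1.0 + abs(w) / eps) for w in weights)


def network_penalty(weights, edges):
    """Assumed network smoothness term over (i, j) index pairs.

    Coefficients of genes that are linked in the network are pulled
    toward each other by penalizing their squared difference.
    """
    return sum((weights[i] - weights[j]) ** 2 for i, j in edges)


def combined_penalty(weights, edges, lam_sparse=1.0, lam_graph=1.0, eps=0.1):
    """Weighted sum of the two terms, to be added to a logistic loss."""
    return (lam_sparse * logsum_penalty(weights, eps)
            + lam_graph * network_penalty(weights, edges))
```

In this sketch the log-sum term drives individual coefficients toward zero, while the network term favors similar coefficients for connected genes, so the two weights trade sparsity against agreement with the network.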

High-Dimensional Classification by Sparse Logistic Regression

IEEE Transactions on Information Theory ◽

10.1109/tit.2018.2884963 ◽

2019 ◽

Vol 65 (5) ◽

pp. 3068-3079 ◽

Cited By ~ 5

Author(s):

Felix Abramovich ◽

Vadim Grinshtein

Keyword(s):

Logistic Regression ◽

High Dimensional ◽

Sparse Logistic Regression ◽

Dimensional Classification

Gene Selection in Cancer Classification Using Sparse Logistic Regression with L1/2 Regularization

Applied Sciences ◽

10.3390/app8091569 ◽

2018 ◽

Vol 8 (9) ◽

pp. 1569 ◽

Cited By ~ 3

Author(s):

Shengbing Wu ◽

Hongkun Jiang ◽

Haiwei Shen ◽

Ziyi Yang

Keyword(s):

Logistic Regression ◽

Gene Selection ◽

Classification Performance ◽

Cancer Classification ◽

Sparse Logistic Regression ◽

The Subject ◽

Selection For ◽

Microarray Datasets ◽

Sparse Methods

In recent years, gene selection for cancer classification based on the expression of a small number of gene biomarkers has been the subject of much research in genetics and molecular biology. The successful identification of gene biomarkers will help in the classification of different types of cancer and improve the prediction accuracy. Recently, regularized logistic regression using L1 regularization has been successfully applied in high-dimensional cancer classification to estimate gene coefficients and perform gene selection simultaneously. However, L1 regularization yields biased gene selection and does not have the oracle property. To address these problems, we investigate L1/2 regularized logistic regression for gene selection in cancer classification. Experimental results on three DNA microarray datasets demonstrate that our proposed method outperforms other commonly used sparse methods (L1 and LEN) in terms of classification performance.
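
As a small illustration of why an L1/2 penalty can favor sparser solutions than L1, the sketch below compares the two penalties on two coefficient vectors with the same L1 norm. The helper `lq_penalty` and the example vectors are assumptions made for illustration and do not come from the paper.

```python
def lq_penalty(weights, q):
    """Sum of |w_j|^q over all coefficients (q = 1 is L1, q = 0.5 is L1/2)."""
    return sum(abs(w) ** q for w in weights)


# Two coefficient vectors with the same L1 norm of 1.0.
sparse = [1.0, 0.0]
dense = [0.5, 0.5]

for name, w in (("sparse", sparse), ("dense", dense)):
    print(name, lq_penalty(w, 1.0), round(lq_penalty(w, 0.5), 4))
```

The L1 penalty is 1.0 for both vectors, whereas the L1/2 penalty is 1.0 for the sparse vector and about 1.4142 for the dense one, so at equal L1 norm the L1/2 penalty charges more for spreading weight across several genes.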

Random forest for gene selection and microarray data classification

Bioinformation ◽

10.6026/97320630007142 ◽

2011 ◽

Vol 7 (3) ◽

pp. 142-146 ◽

Cited By ~ 27

Author(s):

Kohbalan Moorthy ◽

Mohd Saberi Mohamad

Keyword(s):

Random Forest ◽

Microarray Data ◽

Gene Selection ◽

Data Classification

Extreme Value Distribution Based Gene Selection Criteria for Discriminant Microarray Data Analysis Using Logistic Regression

Journal of Computational Biology ◽

10.1089/1066527041410445 ◽

2004 ◽

Vol 11 (2-3) ◽

pp. 215-226 ◽

Cited By ~ 16

Author(s):

Wentian Li ◽

Fengzhu Sun ◽

Ivo Grosse

Keyword(s):

Logistic Regression ◽

Data Analysis ◽

Microarray Data ◽

Selection Criteria ◽

Gene Selection ◽

Extreme Value Distribution ◽

Extreme Value ◽

Value Distribution ◽

Microarray Data Analysis

Gene Selection for Microarray Data Classification Using Hybrid Meta-Heuristics

Modelling and Implementation of Complex Systems - Lecture Notes in Networks and Systems ◽

10.1007/978-3-030-05481-6_9 ◽

2018 ◽

pp. 119-132 ◽

Cited By ~ 1

Author(s):

Nassima Dif ◽

Mohamed Walid Attaoui ◽

Zakaria Elberrichi

Keyword(s):

Microarray Data ◽

Gene Selection ◽

Data Classification ◽

Selection For

Gene selection for microarray data classification via adaptive hypergraph embedded dictionary learning

Gene ◽

10.1016/j.gene.2019.04.060 ◽

2019 ◽

Vol 706 ◽

pp. 188-200 ◽

Cited By ~ 2

Author(s):

Xiao Zheng ◽

Wenyang Zhu ◽

Chang Tang ◽

Minhui Wang

Keyword(s):

Microarray Data ◽

Dictionary Learning ◽

Gene Selection ◽

Data Classification ◽

Selection For

A Multiple-Filter-Multiple-Wrapper Approach to Gene Selection and Microarray Data Classification

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2008.46 ◽

2010 ◽

Vol 7 (1) ◽

pp. 108-117 ◽

Cited By ~ 92

Author(s):

Yukyee Leung ◽

Yeungsam Hung

Keyword(s):

Microarray Data ◽

Gene Selection ◽

Data Classification ◽

Wrapper Approach
