A simple and efficient algorithm for gene selection using sparse logistic regression

S. K. Shevade; S. S. Keerthi

doi:10.1093/bioinformatics/btg308

Gene Selection in Cancer Classification Using Sparse Logistic Regression with L1/2 Regularization

Applied Sciences ◽

10.3390/app8091569 ◽

2018 ◽

Vol 8 (9) ◽

pp. 1569 ◽

Cited By ~ 3

Author(s):

Shengbing Wu ◽

Hongkun Jiang ◽

Haiwei Shen ◽

Ziyi Yang

Keyword(s):

Logistic Regression ◽

Gene Selection ◽

Classification Performance ◽

Cancer Classification ◽

Sparse Logistic Regression ◽

The Subject ◽

Selection For ◽

Microarray Datasets ◽

Sparse Methods

In recent years, gene selection for cancer classification based on the expression of a small number of gene biomarkers has been the subject of much research in genetics and molecular biology. The successful identification of gene biomarkers will help in the classification of different types of cancer and improve the prediction accuracy. Recently, regularized logistic regression using L1 regularization has been successfully applied in high-dimensional cancer classification to estimate gene coefficients and perform gene selection simultaneously. However, L1 regularization yields biased gene selection and does not have the oracle property. To address these problems, we investigate L1/2 regularized logistic regression for gene selection in cancer classification. Experimental results on three DNA microarray datasets demonstrate that our proposed method outperforms other commonly used sparse methods (L1 and L_EN) in terms of classification performance.
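
As a rough illustration of the penalty described in this abstract, the sketch below evaluates a logistic loss with a penalty of the form lam * sum(|beta_j|^q), where q = 0.5 corresponds to an L1/2-style penalty and q = 1 to L1. The function name, the 0/1 label coding, the use of a mean rather than a sum, the absence of an intercept, and the toy data are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def penalized_logistic_loss(beta, X, y, lam, q=0.5):
    """Mean logistic negative log-likelihood plus lam * sum(|beta_j| ** q).

    q=0.5 gives an L1/2-style penalty, q=1.0 the usual L1 (lasso) penalty.
    Assumptions: labels y are coded 0/1 and no intercept term is included.
    """
    z = X @ beta
    # log(1 + exp(z)) evaluated stably via logaddexp(0, z)
    nll = np.mean(np.logaddexp(0.0, z) - y * z)
    penalty = lam * np.sum(np.abs(beta) ** q)
    return nll + penalty


# Toy data (made up for illustration): 2 samples, 2 "genes"
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 0.0])
beta = np.array([4.0, 0.0])

loss_half = penalized_logistic_loss(beta, X, y, lam=1.0, q=0.5)
loss_l1 = penalized_logistic_loss(beta, X, y, lam=1.0, q=1.0)
```

For the single nonzero coefficient of 4 in this toy example, the q = 0.5 penalty term contributes 2 where the L1 term contributes 4, so a large coefficient is shrunk less aggressively under the L1/2-style penalty.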

Gene selection in cancer classification using sparse logistic regression with Bayesian regularization

Bioinformatics ◽

10.1093/bioinformatics/btl386 ◽

2006 ◽

Vol 22 (19) ◽

pp. 2348-2355 ◽

Cited By ~ 139

Author(s):

G. C. Cawley ◽

N. L. C. Talbot

Keyword(s):

Logistic Regression ◽

Gene Selection ◽

Cancer Classification ◽

Bayesian Regularization ◽

Sparse Logistic Regression

A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification

Advances in Data Analysis and Classification ◽

10.1007/s11634-018-0334-1 ◽

2018 ◽

Vol 13 (3) ◽

pp. 753-771 ◽

Cited By ~ 10

Author(s):

Zakariya Yahya Algamal ◽

Muhammad Hisyam Lee

Keyword(s):

Logistic Regression ◽

Microarray Data ◽

Gene Selection ◽

Data Classification ◽

High Dimensional ◽

Two Stage ◽

Sparse Logistic Regression

Sparse logistic regression with a L1/2 penalty for gene selection in cancer classification

BMC Bioinformatics ◽

10.1186/1471-2105-14-198 ◽

2013 ◽

Vol 14 (1) ◽

Cited By ~ 77

Author(s):

Yong Liang ◽

Cheng Liu ◽

Xin-Ze Luan ◽

Kwong-Sak Leung ◽

Tak-Ming Chan ◽

...

Keyword(s):

Logistic Regression ◽

Gene Selection ◽

Cancer Classification ◽

Sparse Logistic Regression

Early diagnosis model of Alzheimer’s disease based on sparse logistic regression with the generalized elastic net

Biomedical Signal Processing and Control ◽

10.1016/j.bspc.2020.102362 ◽

2021 ◽

Vol 66 ◽

pp. 102362

Author(s):

Ruyi Xiao ◽

Xinchun Cui ◽

Hong Qiao ◽

Xiangwei Zheng ◽

Yiquan Zhang ◽

...

Keyword(s):

Alzheimer's Disease ◽

Logistic Regression ◽

Early Diagnosis ◽

Elastic Net ◽

Sparse Logistic Regression ◽

Model of Alzheimer's Disease ◽

Diagnosis Model

GENE SELECTION USING LOGISTIC REGRESSIONS BASED ON AIC, BIC AND MDL CRITERIA

New Mathematics and Natural Computation ◽

10.1142/s179300570500007x ◽

2005 ◽

Vol 01 (01) ◽

pp. 129-145 ◽

Cited By ~ 15

Author(s):

XIAOBO ZHOU ◽

XIAODONG WANG ◽

EDWARD R. DOUGHERTY

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Gene Selection ◽

Information Criterion ◽

Cancer Classification ◽

Data Sets ◽

Classification Methods ◽

Gene Expressions ◽

Experimental Conditions

In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables (gene expressions) and the small number of experimental conditions. Many gene-selection and classification methods have been proposed; however, most of these treat gene selection and classification separately, and not under the same model. We propose a Bayesian approach to gene selection using the logistic regression model. The Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the minimum description length (MDL) principle are used in constructing the posterior distribution of the chosen genes. The same logistic regression model is then used for cancer classification. Fast implementation issues for these methods are discussed. The proposed methods are tested on several data sets including those arising from hereditary breast cancer, small round blue-cell tumors, lymphoma, and acute leukemia. The experimental results indicate that the proposed methods show high classification accuracies on these data sets. Some robustness and sensitivity properties of the proposed methods are also discussed. Finally, mixing logistic-regression-based gene selection with other classification methods and mixing logistic-regression-based classification with other gene-selection methods are considered.
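
For readers unfamiliar with the criteria named in this abstract, the sketch below computes AIC and BIC from a fitted model's log-likelihood using their standard textbook definitions. How the paper turns these scores into a posterior over gene subsets is not stated here, so that step is not shown; the function names and the example numbers are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


# Hypothetical numbers: a logistic model with 3 parameters
# fitted to 100 samples, with maximized log-likelihood -10
example_aic = aic(-10.0, 3)
example_bic = bic(-10.0, 3, 100)
```

Because ln(n) exceeds 2 once n is larger than about 7, BIC penalizes each additional gene more heavily than AIC on all but the smallest data sets, and so tends to favor smaller gene subsets.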

Cancer classification and biomarker selection via a penalized logsum network-based logistic regression model

Technology and Health Care ◽

10.3233/thc-218026 ◽

2021 ◽

Vol 29 ◽

pp. 287-295

Author(s):

Zhiming Zhou ◽

Haihui Huang ◽

Yong Liang

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Gene Selection ◽

Simulated Data ◽

Biological Data ◽

Cancer Classification ◽

High Dimensional ◽

Data Set ◽

Biomarker Selection

BACKGROUND: In genome research, it is particularly important to identify molecular biomarkers or signaling pathways related to phenotypes. The logistic regression model is a powerful discrimination method that offers a clear statistical explanation and yields the classification probability for each class label. However, it cannot perform biomarker selection. OBJECTIVE: The aim of this paper is to give the model efficient gene selection capability. METHODS: In this paper, we propose a new penalized logsum network-based regularization logistic regression model for gene selection and cancer classification. RESULTS: Experimental results on simulated data sets show that our method is effective in the analysis of high-dimensional data. For a large data set, the proposed method achieved AUC values of 89.66% (training) and 90.02% (testing), which are, on average, 5.17% (training) and 4.49% (testing) better than mainstream methods. CONCLUSIONS: The proposed method can be considered a promising tool for gene selection and cancer classification of high-dimensional biological data.

High-Dimensional Classification by Sparse Logistic Regression

IEEE Transactions on Information Theory ◽

10.1109/tit.2018.2884963 ◽

2019 ◽

Vol 65 (5) ◽

pp. 3068-3079 ◽

Cited By ~ 5

Author(s):

Felix Abramovich ◽

Vadim Grinshtein

Keyword(s):

Logistic Regression ◽

High Dimensional ◽

Sparse Logistic Regression ◽

Dimensional Classification

Nonconvex Sparse Logistic Regression via Proximal Gradient Descent

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2018.8462617 ◽

2018 ◽

Author(s):

Xinyue Shen ◽

Yuantao Gu

Keyword(s):

Logistic Regression ◽

Gradient Descent ◽

Sparse Logistic Regression ◽

Proximal Gradient Descent

Credit Card Fraud Detection Using Autoencoder Model in Unbalanced Datasets

Journal of Advances in Mathematics and Computer Science ◽

10.9734/jamcs/2019/v33i530192 ◽

2019 ◽

pp. 1-16 ◽

Cited By ~ 4

Author(s):

M. A. Al-Shabi

Keyword(s):

Logistic Regression ◽

Deep Learning ◽

Efficient Algorithm ◽

Credit Card ◽

Learning Algorithm ◽

Credit Cards ◽

Reconstruction Error ◽

Insurance Companies ◽

Deep Learning Algorithm ◽

Unbalanced Dataset

Fraudulent credit card transactions remain one of the problems facing companies and the banking sector, causing them to lose billions of dollars every year. Designing an efficient algorithm is one of the most important challenges in this area. This paper proposes an efficient approach that automatically detects credit card fraud related to insurance companies using a deep learning algorithm called an autoencoder. The effectiveness of the proposed method has been demonstrated in identifying fraud in actual data from transactions made by credit cards in September 2013 by European cardholders. In addition, this paper provides a solution for data imbalance, which affects most current algorithms. The suggested solution relies on training the autoencoder to reconstruct normal data. Anomalies are detected by defining a reconstruction error threshold and treating cases whose error exceeds the threshold as anomalies. The algorithm detected 64% of fraudulent transactions at a threshold of 5, 79% at a threshold of 3, and 91% at a threshold of 0.7, outperforming logistic regression, which detected 57% on the unbalanced dataset.
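
The thresholding step described in this abstract can be sketched as follows. The sketch assumes a trained autoencoder has already produced reconstructions `X_hat` of the inputs `X`; training is not shown. Using mean squared error per row as the reconstruction error, along with the function names and toy arrays, is an assumption for illustration rather than a detail confirmed by the paper.

```python
import numpy as np


def reconstruction_error(X, X_hat):
    """Per-sample mean squared error between inputs and reconstructions."""
    return np.mean((X - X_hat) ** 2, axis=1)


def flag_anomalies(X, X_hat, threshold):
    """Mark a sample as anomalous when its error exceeds the threshold."""
    return reconstruction_error(X, X_hat) > threshold


# Toy inputs and stand-in reconstructions (made up for illustration)
X = np.array([[0.0, 0.0], [0.0, 0.0]])
X_hat = np.array([[1.0, 1.0], [3.0, 3.0]])  # per-row errors: 1.0 and 9.0

strict = flag_anomalies(X, X_hat, threshold=5.0)
loose = flag_anomalies(X, X_hat, threshold=0.7)
```

Lowering the threshold flags more samples, which is consistent with the abstract reporting a higher detection rate at a threshold of 0.7 than at 5; the cost, not quantified in the abstract, is that more normal transactions may also be flagged.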
