Adaptive L1/2 Shooting Regularization Method for Survival Analysis Using Gene Expression Data

2013 ◽  
Vol 2013 ◽  
pp. 1-5 ◽  
Author(s):  
Xiao-Ying Liu ◽  
Yong Liang ◽  
Zong-Ben Xu ◽  
Hai Zhang ◽  
Kwong-Sak Leung

A new adaptive L1/2 shooting regularization method for variable selection based on Cox's proportional hazards model is proposed. The adaptive L1/2 shooting algorithm is obtained by optimizing a reweighted iterative series of L1 penalties combined with a shooting strategy for the L1/2 penalty. Simulation results on high-dimensional artificial data show that the adaptive L1/2 shooting regularization method can be more accurate for variable selection than the Lasso and adaptive Lasso methods. Results on a real gene expression dataset (diffuse large B-cell lymphoma, DLBCL) also indicate that the L1/2 regularization method performs competitively.
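The reweighting idea can be sketched in Python. This is a minimal illustration, not the authors' implementation: it swaps the Cox partial likelihood for a least-squares loss so that each shooting (cyclic coordinate-descent) update has a closed form, and it approximates the L1/2 penalty lam*|b_j|^(1/2) by a weighted L1 penalty whose weights 1/(2*sqrt(|b_j|) + eps) come from the previous iterate. The function names, the eps guard, and the default iteration counts are assumptions made for this sketch.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_shooting(X, y, lam, weights, beta, max_iter=500, tol=1e-8):
    """Shooting (cyclic coordinate descent) for the weighted lasso
    (1/(2n)) * ||y - X beta||^2 + lam * sum_j weights[j] * |beta_j|.
    Least-squares loss is an assumption here; the paper uses the Cox model."""
    n, p = len(X), len(X[0])
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    # Residuals r = y - X beta for the warm start.
    r = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual (beta_j added back).
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def adaptive_l_half_shooting(X, y, lam, n_reweight=10, eps=1e-6):
    """Reweighted-L1 approximation of L1/2-penalized least squares.
    d|b|^(1/2)/d|b| = 1/(2*sqrt(|b|)), so each pass uses those weights."""
    p = len(X[0])
    beta = [0.0] * p
    weights = [1.0] * p  # the first pass is a plain lasso
    for _ in range(n_reweight):
        beta = weighted_shooting(X, y, lam, weights, beta)
        # Zero coefficients get a huge weight (1/eps) and therefore stay at zero.
        weights = [1.0 / (2.0 * math.sqrt(abs(b)) + eps) for b in beta]
    return beta
```

Usage: with a design `X` (list of rows) and response `y`, `adaptive_l_half_shooting(X, y, lam=0.1)` returns a coefficient list in which unselected variables are exactly 0.0. Because later passes shrink large coefficients less than the initial lasso pass, the surviving estimates are typically less biased than plain lasso estimates.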

2013 ◽  
Vol 12 ◽  
pp. CIN.S10212 ◽  
Author(s):  
Lingkang Huang ◽  
Hao Helen Zhang ◽  
Zhao-Bang Zeng ◽  
Pierre R. Bushel

Background. Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges, because the number of variables vastly exceeds the sample size and multi-type tumors are complex in nature. Support vector machines (SVMs) have shown superior performance in cancer classification owing to their ability to handle high-dimensional, low-sample-size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure uses all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties during learning to enforce solution sparsity.
Results. The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not perform variable selection. We improved the method by introducing soft-thresholding type penalties that incorporate variable selection into multi-class classification for high-dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the different methods overlap significantly, suggesting general agreement among the variable selection schemes.
Conclusions. High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and for defining targets of therapeutic intervention.
Availability. The source MATLAB code is available from http://math.arizona.edu/~hzhang/software.html.
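To make the penalized objective concrete, here is a hedged Python sketch of an L1-penalized Crammer-Singer multi-class hinge loss together with the soft-thresholding operator that a proximal-type solver would apply to the weights. It is an illustration under assumptions, not the authors' formulation or solver (which are not described in this abstract): the plain L1 penalty, the function names, and the rule that a gene counts as selected when any class weight for it is nonzero are all choices made for this sketch.

```python
def cs_hinge(W, x, y):
    """Crammer-Singer multi-class hinge loss for one sample.

    W[k] is the weight vector of class k, x a feature vector, y the true class.
    """
    scores = [sum(w * xj for w, xj in zip(wk, x)) for wk in W]
    best_other = max(s for k, s in enumerate(scores) if k != y)
    return max(0.0, 1.0 + best_other - scores[y])


def l1_cs_objective(W, X, Y, lam):
    """Mean Crammer-Singer hinge loss plus an L1 penalty on every weight."""
    loss = sum(cs_hinge(W, x, y) for x, y in zip(X, Y)) / len(X)
    return loss + lam * sum(abs(w) for wk in W for w in wk)


def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0), the proximal map of t*|z|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def prox_l1(W, t):
    """Soft-threshold every weight; small weights become exactly 0."""
    return [[soft_threshold(w, t) for w in wk] for wk in W]


def selected_features(W):
    """A feature (gene) is kept if any class gives it a nonzero weight."""
    return [j for j in range(len(W[0])) if any(wk[j] != 0.0 for wk in W)]
```

The per-sample loss is zero only when the true class outscores every other class by a margin of at least 1. Soft-thresholding is what produces exact zeros, and a gene whose weights are zero in every class drops out of the classifier.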


Mathematics ◽  
2019 ◽  
Vol 7 (5) ◽  
pp. 457 ◽  
Author(s):  
Md Sarker ◽  
Michael Pokojovy ◽  
Sangjin Kim

In high-dimensional gene expression data analysis, the accuracy and reliability of cancer classification and of the selection of important genes play a crucial role. To identify these important genes and predict future outcomes (tumor vs. non-tumor), various methods have been proposed in the literature, but only a few of them take into account correlation patterns and grouping effects among the genes. In this article, we propose a rank-based modification of the popular penalized logistic regression procedure, based on a combination of ℓ1 and ℓ2 penalties, that can handle possible correlation among genes in different groups. While the ℓ1 penalty maintains sparsity, the ℓ2 penalty induces smoothness based on information from the Laplacian matrix, which represents the correlation pattern among genes. We combined logistic regression with the BH-FDR (Benjamini and Hochberg false discovery rate) screening procedure and a newly developed rank-based selection method to arrive at an optimal model that retains the important genes. Through simulation studies and a real-world application to high-dimensional colon cancer gene expression data, we demonstrated that the proposed rank-based method outperforms currently popular methods such as the lasso, adaptive lasso, and elastic net in both gene selection and classification.
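The BH-FDR screening step mentioned above is the standard Benjamini-Hochberg step-up procedure. A minimal Python sketch follows; how the authors compute the per-gene p-values and which FDR level q they use are not stated here, so the default q = 0.05 and the function name are assumptions.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure at FDR level q.

    Sort the m p-values, find the largest rank k with p_(k) <= k * q / m,
    and reject the k hypotheses with the smallest p-values.
    Returns the sorted original indices of the rejected hypotheses.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank that passes its threshold
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:
            k = rank
    return sorted(order[:k])
```

Because the procedure steps up from the largest passing rank, a p-value can be rejected even when it fails its own threshold: for `[0.02, 0.024, 0.9, 0.9]`, 0.02 exceeds 1 * 0.05 / 4, yet both of the first two hypotheses are rejected because 0.024 <= 2 * 0.05 / 4.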


Scientifica ◽  
2016 ◽  
Vol 2016 ◽  
pp. 1-8 ◽  
Author(s):  
Hamid Alavi Majd ◽  
Soodeh Shahsavari ◽  
Ahmad Reza Baghestani ◽  
Seyyed Mohammad Tabatabaei ◽  
Naghme Khadem Bashi ◽  
...  

Background. Biclustering algorithms have been proposed for the analysis of high-dimensional gene expression data. Among them, the plaid model is arguably one of the most flexible biclustering models to date.
Objective. The main goal of this study is to evaluate plaid models. To that end, we investigate this model on both simulated data and real gene expression datasets.
Methods. Two simulated matrices with different degrees of overlap and noise are generated, and the intrinsic structure of these data is then compared with the biclustering results. We also searched the discovered biclusters for biological significance using Gene Ontology (GO) analysis.
Results. When there is no noise, the algorithm discovers almost all of the biclusters; with moderate noise in the dataset, it performs poorly at finding overlapping biclusters; and with large noise, the biclustering results are not reliable.
Conclusion. The plaid model needs to be modified, because it cannot find good biclusters when the noise in the data is moderate or large. It is a statistical model and quite a flexible one. In summary, to reduce the errors, the model can be adjusted and the error distribution can be changed.
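The plaid model writes each expression value as a background level plus a sum of additive layers (biclusters): Y_ij = mu_0 + sum_k (mu_k + alpha_ik + beta_jk) * rho_ik * kappa_jk + eps_ij, where rho_ik and kappa_jk indicate whether row i and column j belong to layer k. The Python sketch below generates a simulated matrix with overlapping layers and Gaussian noise, in the spirit of the simulations described above. The function name, the layer encoding, and the i.i.d. Gaussian noise are assumptions; the study's actual simulation settings are not reproduced.

```python
import random


def plaid_matrix(n_rows, n_cols, mu0, layers, noise_sd=0.0, seed=0):
    """Simulate Y[i][j] = mu0 + sum_k (mu_k + alpha_ik + beta_jk) * rho_ik * kappa_jk + eps_ij.

    Each layer is (rows, cols, mu, alpha, beta): rows/cols are the index sets
    where rho_ik = kappa_jk = 1, and alpha/beta map row/column indices to
    their effects (missing indices have effect 0).
    """
    rng = random.Random(seed)
    # Background level plus i.i.d. Gaussian noise eps_ij.
    Y = [[mu0 + rng.gauss(0.0, noise_sd) for _ in range(n_cols)]
         for _ in range(n_rows)]
    for rows, cols, mu, alpha, beta in layers:
        for i in rows:
            for j in cols:
                # Overlapping layers simply add up in shared cells.
                Y[i][j] += mu + alpha.get(i, 0.0) + beta.get(j, 0.0)
    return Y
```

Cells shared by two layers carry the sum of both layer effects. This additive overlap is what a fitted plaid model must disentangle, and increasing `noise_sd` makes it progressively harder to do so.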


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Ge-Jin Chu ◽  
Yong Liang ◽  
Jia-Xuan Wang

Variable selection is an important issue in regression, and a number of variable selection methods involving nonconvex penalty functions have been proposed. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq (1/2 < q < 1) regularizations, to select key risk factors in Cox's proportional hazards model using microarray gene expression data. The harmonic regularization method can be solved efficiently with our proposed direct path seeking approach, which produces solutions that closely approximate those for the convex loss function with the nonconvex regularization. Simulation results on artificial datasets and four real microarray gene expression datasets, including the diffuse large B-cell lymphoma (DLBCL), lung cancer, and AML datasets, show that the harmonic regularization method can be more accurate for variable selection than existing Lasso-type methods.
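Both Cox-model abstracts above minimize a negative log partial likelihood plus a sparsity penalty. The Python sketch below computes the Cox log partial likelihood, l(beta) = sum over events i of [x_i . beta - log sum_{j: t_j >= t_i} exp(x_j . beta)], and adds a plain nonconvex Lq penalty lam * sum |beta_j|^q. Two assumptions apply: the event times are taken to have no ties, and the plain Lq penalty stands in for the harmonic penalty, whose exact form is not given in this abstract. The function names and the default q = 0.75 are illustrative choices.

```python
import math


def cox_log_partial_likelihood(beta, X, times, events):
    """Cox log partial likelihood, assuming no tied event times.

    The risk set of subject i is {j : times[j] >= times[i]}; only subjects
    with events[i] == 1 (observed events, not censored) contribute terms.
    """
    eta = [sum(b * xj for b, xj in zip(beta, x)) for x in X]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:
            risk = [eta[j] for j in range(len(times)) if times[j] >= t_i]
            m = max(risk)  # log-sum-exp shift for numerical stability
            total += eta[i] - (m + math.log(sum(math.exp(e - m) for e in risk)))
    return total


def lq_penalty(beta, lam, q):
    """Nonconvex Lq penalty lam * sum |b|^q for 0 < q < 1."""
    return lam * sum(abs(b) ** q for b in beta)


def penalized_cox_objective(beta, X, times, events, lam, q=0.75):
    """Objective to minimize: negative log partial likelihood plus Lq penalty."""
    return -cox_log_partial_likelihood(beta, X, times, events) + lq_penalty(beta, lam, q)
```

As a check, with all linear predictors equal to zero and events at the first two of three ordered times, the log partial likelihood is -(log 3 + log 2) = -log 6. The penalty is nonconvex for q < 1, which is why path-seeking or reweighting schemes such as those in these papers are needed in place of a single convex solve.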

