A variable selection approach for highly correlated predictors in high-dimensional genomic data

Author(s):  
Wencan Zhu ◽  
Céline Lévy-Leduc ◽  
Nils Ternès

Abstract Motivation In genomic studies, identifying biomarkers associated with a variable of interest is a major concern in biomedical research. Regularized approaches are classically used to perform variable selection in high-dimensional linear models. However, these methods can fail in highly correlated settings. Results We propose a novel variable selection approach called WLasso that takes these correlations into account. It consists of rewriting the initial high-dimensional linear model to remove the correlation between the biomarkers (predictors) and then applying the generalized Lasso criterion. The performance of WLasso is assessed using synthetic data in several scenarios and compared with recent alternative approaches. The results show that when the biomarkers are highly correlated, WLasso outperforms the other approaches in sparse high-dimensional frameworks. The method is also illustrated on publicly available gene expression data in breast cancer. Availability and implementation Our method is implemented in the WLasso R package, which is available from the Comprehensive R Archive Network (CRAN). Supplementary information Supplementary data are available at Bioinformatics online.
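
One way to read the rewriting step described above, sketched under assumptions the abstract does not state: suppose the predictors have an invertible covariance matrix \Sigma (a symbol introduced here purely for illustration) and the model is decorrelated by multiplying the design matrix by \Sigma^{-1/2}. The L1 penalty on the original coefficients then becomes a penalty on a linear transform of the new coefficients, which has the generalized Lasso form.

```latex
% Assumed notation (not from the abstract): y = X\beta + \varepsilon,
% with \Sigma the (invertible) covariance matrix of the rows of X.
\begin{align*}
y &= X\beta + \varepsilon
   = \underbrace{X\Sigma^{-1/2}}_{\widetilde{X}}\,
     \underbrace{\Sigma^{1/2}\beta}_{\widetilde{\beta}} + \varepsilon
   = \widetilde{X}\widetilde{\beta} + \varepsilon, \\
\operatorname{Cov}(\text{rows of } \widetilde{X})
  &= \Sigma^{-1/2}\,\Sigma\,\Sigma^{-1/2} = I, \\
\widehat{\widetilde{\beta}}
  &= \arg\min_{\widetilde{\beta}}
     \; \lVert y - \widetilde{X}\widetilde{\beta} \rVert_2^2
     + \lambda \lVert \Sigma^{-1/2}\widetilde{\beta} \rVert_1 .
\end{align*}
```

In practice \Sigma would have to be estimated from the data, and the WLasso package may differ from this sketch in its details.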

2012 ◽  
Vol 55 (2) ◽  
pp. 327-347 ◽  
Author(s):  
Dengke Xu ◽  
Zhongzhan Zhang ◽  
Liucang Wu

2017 ◽  
Vol 33 (22) ◽  
pp. 3595-3602 ◽  
Author(s):  
Yao-Hwei Fang ◽  
Jie-Huei Wang ◽  
Chao A Hsiung

2018 ◽  
Vol 67 (4) ◽  
pp. 813-839 ◽  
Author(s):  
Anna Bonnet ◽  
Céline Lévy‐Leduc ◽  
Elisabeth Gassiat ◽  
Roberto Toro ◽  
Thomas Bourgeron

Author(s):  
Frédéric Bertrand ◽  
Ismaïl Aouadi ◽  
Nicolas Jung ◽  
Raphael Carapito ◽  
Laurent Vallat ◽  
...  

Abstract Motivation With the growth of big data, variable selection has become one of the critical challenges in statistics. Although many methods have been proposed in the literature, their performance in terms of recall (sensitivity) and precision (positive predictive value) is limited in a context where the number of variables far exceeds the number of observations or in a highly correlated setting. Results In this article, we propose a general algorithm, which improves the precision of any existing variable selection method. This algorithm is based on highly intensive simulations and takes into account the correlation structure of the data. Our algorithm can either produce a confidence index for variable selection or be used in an experimental design planning perspective. We demonstrate the performance of our algorithm on both simulated and real data. We then apply it in two different ways to improve biological network reverse-engineering. Availability and implementation Code is available as the SelectBoost package on CRAN, https://cran.r-project.org/package=SelectBoost. Some network reverse-engineering functionalities are available in the Patterns CRAN package, https://cran.r-project.org/package=Patterns. Supplementary information Supplementary data are available at Bioinformatics online.
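
To make the two metrics named in the abstract above concrete, here is a minimal Python sketch of how recall and precision can be computed for a selected variable set when the truly active variables are known, as in a simulation. The function name `precision_recall` and the gene labels are hypothetical and are not part of the SelectBoost package.

```python
def precision_recall(selected, true_support):
    """Return (precision, recall) of a selected variable set against the true support."""
    selected, true_support = set(selected), set(true_support)
    true_positives = len(selected & true_support)
    # Precision: share of selected variables that are truly active.
    # Returning 0.0 for an empty selection is a convention chosen for this sketch.
    precision = true_positives / len(selected) if selected else 0.0
    # Recall (sensitivity): share of truly active variables that were selected.
    recall = true_positives / len(true_support) if true_support else 0.0
    return precision, recall


# Hypothetical example: 4 variables selected, 3 of them among the 5 truly active ones.
precision, recall = precision_recall(
    {"g1", "g2", "g3", "g9"}, {"g1", "g2", "g3", "g4", "g5"}
)
print(precision, recall)  # 0.75 0.6
```

A method that improves precision, as the abstract describes, would shrink the number of selected variables that fall outside the true support, ideally without losing many true positives.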


Author(s):  
Kevin He ◽  
Xiang Zhou ◽  
Hui Jiang ◽  
Xiaoquan Wen ◽  
Yi Li

Abstract Modern biotechnologies have produced a vast amount of high-throughput data in which the number of predictors far exceeds the sample size. Penalized variable selection has emerged as a powerful and efficient dimension reduction tool. However, control of false discoveries (i.e. inclusion of irrelevant variables) for penalized high-dimensional variable selection presents serious challenges. To effectively control the fraction of false discoveries in penalized variable selection, we propose a false discovery controlling procedure. The proposed method is general and flexible, and can work with a broad class of variable selection algorithms, not only for linear regression, but also for generalized linear models and survival analysis.


2019 ◽  
Vol 36 (6) ◽  
pp. 1785-1794
Author(s):  
Jun Li ◽  
Qing Lu ◽  
Yalu Wen

Abstract Motivation The use of human genome discoveries and other established factors to build an accurate risk prediction model is an essential step toward precision medicine. While multi-layer high-dimensional omics data provide unprecedented data resources for prediction studies, their corresponding analytical methods are much less developed. Results We present a multi-kernel penalized linear mixed model with adaptive lasso (MKpLMM), a predictive modeling framework that extends the standard linear mixed models widely used in genomic risk prediction, for multi-omics data analysis. MKpLMM can capture not only the predictive effects from each layer of omics data but also their interactions by using multiple kernel functions. It adopts a data-driven approach to select predictive regions as well as predictive layers of omics data, and achieves robust selection performance. Through extensive simulation studies, the analyses of PET-imaging outcomes from the Alzheimer’s Disease Neuroimaging Initiative study, and the analyses of 64 drug responses, we demonstrate that MKpLMM consistently outperforms competing methods in phenotype prediction. Availability and implementation The R package is available at https://github.com/YaluWen/OmicPred. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (14) ◽  
pp. 4189-4190
Author(s):  
Yang Liu ◽  
Vinod Kumar Singh ◽  
Deyou Zheng

Abstract Summary Visualization in 3D space is a standard but critical process for examining the complex structure of high-dimensional data. Stereoscopic imaging technology can be adopted to enhance 3D representation of many complex data, especially those consisting of points and lines. We illustrate the simple steps that are involved and strongly recommend that others implement it when designing visualization software. To facilitate its application, we created new software that can convert a regular 3D scatterplot or network figure to a pair of stereo images. Availability and implementation Stereo3D is freely available as an open source R package released under an MIT license at https://github.com/bioinfoDZ/Stereo3D. Others can integrate the code and implement the method in academic software. Supplementary information Supplementary data are available at Bioinformatics online.

