A Sparse Additive Model for High-Dimensional Interactions with an Exposure Variable

2018 · Author(s): Sahir R Bhatnagar, Tianyuan Lu, Amanda Lovato, David L Olds, Michael S Kobor, ...

Abstract: A conceptual paradigm for the onset of a new disease is often considered to be the result of changes in entire biological networks whose states are affected by a complex interaction of genetic and environmental factors. However, when modelling a relevant phenotype as a function of high-dimensional measurements, power to estimate interactions is low, the number of possible interactions could be enormous, and their effects may be non-linear. Existing approaches for high-dimensional modelling, such as the lasso, might keep an interaction but remove a main effect, which is problematic for interpretation. In this work, we introduce a method called sail for detecting non-linear interactions with a key environmental or exposure variable in high-dimensional settings that respects either the strong or the weak heredity constraint. We prove that asymptotically, our method possesses the oracle property, i.e., it performs as well as if the true model were known in advance. We develop a computationally efficient fitting algorithm with automatic tuning parameter selection, which scales to high-dimensional datasets. Through an extensive simulation study, we show that sail outperforms existing penalized regression methods in terms of prediction accuracy and support recovery when there are non-linear interactions with an exposure variable. We then apply sail to detect non-linear interactions between genes and a prenatal psychosocial intervention program on cognitive performance in children at 4 years of age. Results from our method show that individuals who are genetically predisposed to lower educational attainment are those who stand to benefit the most from the intervention. Our algorithms are implemented in an R package available on CRAN (https://cran.r-project.org/package=sail).
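To make the model structure concrete, the sketch below (Python with scikit-learn, simulated toy data; not the sail package itself, which is distributed as an R package) builds the kind of design matrix the abstract describes: spline-expanded main effects f_j(X_j), an exposure main effect E, and E × f_j(X_j) interaction columns. For illustration it fits a plain lasso, which, unlike sail, does not enforce the strong or weak heredity constraint (i.e., it may select an interaction without its main effect). All names, dimensions, and data below are assumptions for the example.

```python
# Illustrative sketch only: spline main effects, an exposure, and
# exposure-by-spline interactions, fitted with an ordinary lasso.
# sail itself uses grouped penalties that enforce heredity; this does not.
import numpy as np
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 200, 10                       # toy sample size and number of predictors
X = rng.uniform(size=(n, p))         # predictors (kept small for illustration)
E = rng.binomial(1, 0.5, size=n)     # binary exposure variable
# simulated outcome: non-linear main effect of X1 plus an E-by-X2 interaction
y = np.sin(2 * np.pi * X[:, 0]) + 2 * E * X[:, 1] ** 2 + rng.normal(scale=0.5, size=n)

spline = SplineTransformer(degree=3, n_knots=5, include_bias=False)
B = spline.fit_transform(X)                            # basis expansion of each X_j
design = np.hstack([B, E[:, None], E[:, None] * B])    # mains, exposure, interactions

fit = LassoCV(cv=5).fit(design, y)
print("selected (nonzero) design columns:", np.flatnonzero(fit.coef_))
```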

2020 · Vol 7 (1) · pp. 209-226 · Author(s): Yunan Wu, Lan Wang

Penalized (or regularized) regression, as represented by the lasso and its variants, has become a standard technique for analyzing high-dimensional data when the number of variables substantially exceeds the sample size. The performance of penalized regression relies crucially on the choice of the tuning parameter, which determines the amount of regularization and hence the sparsity level of the fitted model. The optimal choice of the tuning parameter depends on both the structure of the design matrix and the unknown random error distribution (variance, tail behavior, etc.). This article reviews the current literature on tuning parameter selection for high-dimensional regression from both theoretical and practical perspectives. We discuss various strategies that choose the tuning parameter to achieve prediction accuracy or support recovery. We also review several recently proposed methods for tuning-free high-dimensional regression.
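As a concrete illustration of the trade-off between prediction accuracy and support recovery discussed above, the sketch below (Python with scikit-learn, simulated data; not taken from the article) contrasts the cross-validation-minimizing lambda with the common one-standard-error rule, which typically selects a larger lambda and hence a sparser model. The simulation design is an assumption for the example.

```python
# Two common tuning choices for the lasso: lambda minimizing CV error
# (prediction-oriented) versus the one-standard-error rule (sparser support).
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

rng = np.random.default_rng(1)
n, p = 100, 500                               # n << p high-dimensional setting
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0                                # five truly active variables
y = X @ beta + rng.normal(size=n)

cv = LassoCV(cv=10).fit(X, y)
order = np.argsort(cv.alphas_)[::-1]          # sort lambdas from largest to smallest
alphas = cv.alphas_[order]
mean_mse = cv.mse_path_[order].mean(axis=1)   # CV error per candidate lambda
se_mse = cv.mse_path_[order].std(axis=1) / np.sqrt(cv.mse_path_.shape[1])

i_min = mean_mse.argmin()
# largest lambda whose CV error is within one standard error of the minimum
i_1se = np.where(mean_mse <= mean_mse[i_min] + se_mse[i_min])[0][0]
fit_1se = Lasso(alpha=alphas[i_1se]).fit(X, y)

print("lambda_min:", cv.alpha_, "nonzeros:", np.sum(cv.coef_ != 0))
print("lambda_1se:", alphas[i_1se], "nonzeros:", np.sum(fit_1se.coef_ != 0))
```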


2017 · Vol 2017 · pp. 1-14 · Author(s): Anne-Laure Boulesteix, Riccardo De Bin, Xiaoyu Jiang, Mathias Fuchs

As modern biotechnologies advance, it has become increasingly frequent that different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, are collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied in the last fifteen years, little has been done in the statistical literature on the integration of multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method to address this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (Integrative LASSO with Penalty Factors) and implemented in the R package ipflasso, with the standard LASSO and the sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and code are available on the companion website to ensure reproducibility.
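The sketch below is a minimal Python illustration of the penalty-factor idea (the authors' own implementation is the R package ipflasso, which this does not reproduce). It relies on the standard equivalence between per-feature penalty factors and column rescaling: dividing column x_j by its factor pf_j, fitting an ordinary lasso, and rescaling the coefficient back recovers the weighted-penalty solution. The modality-2 factor is then chosen from a small grid by cross-validated error, in the spirit of the data-driven choice described above. Dimensions, the candidate grid, and the simulated data are assumptions for the example.

```python
# IPF-LASSO-style per-modality penalty factors via column rescaling plus a
# standard lasso; the factor for modality 2 is picked by cross-validated error.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
n, p1, p2 = 150, 100, 400                     # modality 1 (e.g. clinical), modality 2 (e.g. omics)
X1, X2 = rng.normal(size=(n, p1)), rng.normal(size=(n, p2))
y = X1[:, :5] @ np.full(5, 1.5) + X2[:, :2] @ np.full(2, 0.5) + rng.normal(size=n)

best = None
for pf in [0.5, 1.0, 2.0, 4.0]:               # candidate penalty factors for modality 2
    pf_vec = np.r_[np.ones(p1), np.full(p2, pf)]
    X_scaled = np.hstack([X1, X2]) / pf_vec   # rescale each column x_j by 1 / pf_j
    cv = LassoCV(cv=5).fit(X_scaled, y)
    cv_err = cv.mse_path_.mean(axis=1).min()
    if best is None or cv_err < best[0]:
        coef = cv.coef_ / pf_vec              # map coefficients back to the original scale
        best = (cv_err, pf, coef)

print("chosen penalty factor for modality 2:", best[1])
print("nonzeros per modality:", np.sum(best[2][:p1] != 0), np.sum(best[2][p1:] != 0))
```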


