Simultaneous dimension reduction and variable selection in modeling high dimensional data

2017 ◽  
Vol 112 ◽  
pp. 242-256 ◽  
Author(s):  
Joseph Ryan G. Lansangan ◽  
Erniel B. Barrios

2017 ◽
Author(s):  
Sahir Rai Bhatnagar ◽  
Yi Yang ◽  
Budhachandra Khundrakpam ◽  
Alan C Evans ◽  
Mathieu Blanchette ◽  
...  

Predicting a phenotype and understanding which variables improve that prediction are two very challenging and overlapping problems in the analysis of high-dimensional data such as those arising from genomic and brain imaging studies. It is often believed that the number of truly important predictors is small relative to the total number of variables, making computational approaches to variable selection and dimension reduction extremely important. To reduce dimensionality, commonly used two-step methods first cluster the data in some way, then build models using cluster summaries to predict the phenotype.

It is known that important exposure variables can alter correlation patterns between clusters of high-dimensional variables, i.e., alter network properties of the variables. However, it is not well understood whether such altered clustering is informative in prediction. Here, assuming there is a binary exposure with such network-altering effects, we explore whether use of exposure-dependent clustering relationships in dimension reduction can improve predictive modelling in a two-step framework. Hence, we propose a modelling framework called ECLUST to test this hypothesis, and evaluate its performance through extensive simulations.

With ECLUST, we found improved prediction and variable selection performance compared to methods that do not consider the environment in the clustering step, or to methods that use the original data as features. We further illustrate this modelling framework through the analysis of three data sets from very different fields, each with high-dimensional data, a binary exposure, and a phenotype of interest. Our method is available in the eclust CRAN package.
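The two-step idea in this abstract can be sketched concretely: cluster the variables using an exposure-dependent similarity, summarize each cluster, and fit a penalized model on the summaries. The authors' actual implementation is the R package eclust; the Python below is only an illustrative approximation under simplifying assumptions (absolute correlation differences stand in for the paper's network measures, cluster averages for its summaries, and helper names such as exposure_dependent_clusters are hypothetical).

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)

# Toy data: n samples, p variables, binary exposure e, continuous phenotype y.
n, p = 200, 400
e = rng.integers(0, 2, size=n)
X = rng.standard_normal((n, p))
y = X[:, :5].sum(axis=1) * (1 + e) + rng.standard_normal(n)

def exposure_dependent_clusters(X, e, n_clusters=20):
    # Cluster variables on how much their pairwise correlation CHANGES
    # between the two exposure groups (exposure-altered network structure).
    diff = np.abs(np.corrcoef(X[e == 1].T) - np.corrcoef(X[e == 0].T))
    dist = 1.0 - diff / diff.max()       # large change -> small distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

labels = exposure_dependent_clusters(X, e)

# Step 1 summary: represent each cluster by the average of its variables.
S = np.column_stack([X[:, labels == k].mean(axis=1) for k in np.unique(labels)])

# Step 2: sparse regression on summaries, exposure, and their interactions.
features = np.column_stack([S, e[:, None] * S, e])
fit = LassoCV(cv=5).fit(features, y)
print("selected features:", np.flatnonzero(fit.coef_))

In the actual ECLUST method the cluster summaries may also be first principal components and the similarity measure is network-based; the sketch only conveys the shape of the two steps.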


2013 ◽  
Vol 303-306 ◽  
pp. 1101-1104 ◽  
Author(s):  
Yong De Hu ◽  
Jing Chang Pan ◽  
Xin Tan

Kernel entropy component analysis (KECA) reveals the structure of the original data through the kernel matrix. This structure is related to the Rényi entropy of the data, and KECA preserves it by keeping the data's Rényi entropy unchanged. This paper describes the original data by a small number of components for the purpose of dimension reduction. KECA is then applied to celestial spectra reduction and compared experimentally with Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA). Experimental results show that KECA is an effective method for high-dimensional data reduction.
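The entropy-preserving selection rule that distinguishes KECA from KPCA can be made concrete. With a Parzen/Gaussian kernel matrix K = E D E^T, the Rényi quadratic entropy estimate decomposes as V = (1/N^2) sum_i (sqrt(lambda_i) e_i^T 1)^2, so KECA keeps the components with the largest entropy contribution lambda_i (e_i^T 1)^2, whereas KPCA keeps the largest eigenvalues. A minimal NumPy sketch of this rule follows; the RBF kernel and the width sigma are illustrative choices, not taken from the paper.

import numpy as np

def rbf_kernel(X, sigma):
    # Parzen/Gaussian kernel: K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def keca(X, n_components=2, sigma=1.0):
    # Rank eigenpairs of K by their contribution lambda_i * (e_i^T 1)^2 to the
    # Renyi entropy estimate, not by eigenvalue alone (that would be KPCA).
    K = rbf_kernel(X, sigma)
    eigvals, eigvecs = np.linalg.eigh(K)                 # ascending eigenvalues
    contrib = eigvals * eigvecs.sum(axis=0) ** 2
    top = np.argsort(contrib)[::-1][:n_components]
    # Embedding of sample j on component i is sqrt(lambda_i) * e_i[j].
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

# Example: reduce 50-dimensional points to 2 entropy-preserving components.
rng = np.random.default_rng(0)
Z = keca(rng.standard_normal((100, 50)), n_components=2, sigma=5.0)
print(Z.shape)  # (100, 2)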


2018 ◽  
Vol 11 (2) ◽  
pp. 385-395 ◽  
Author(s):  
Aijun Yang ◽  
Heng Lian ◽  
Xuejun Jiang ◽  
Pengfei Liu
