Sparse sufficient dimension reduction with heteroscedasticity

Author(s):  
Haoyang Cheng ◽  
Wenquan Cui

Heteroscedasticity often appears in the high-dimensional data analysis. In order to achieve a sparse dimension reduction direction for high-dimensional data with heteroscedasticity, we propose a new sparse sufficient dimension reduction method, called Lasso-PQR. From the candidate matrix derived from the principal quantile regression (PQR) method, we construct a new artificial response variable which is made up from top eigenvectors of the candidate matrix. Then we apply a Lasso regression to obtain sparse dimension reduction directions. While for the “large [Formula: see text] small [Formula: see text]” case that [Formula: see text], we use principal projection to solve the dimension reduction problem in a lower-dimensional subspace and projection back to the original dimension reduction problem. Theoretical properties of the methodology are established. Compared with several existing methods in the simulations and real data analysis, we demonstrate the advantages of our method in the high dimension data with heteroscedasticity.

Stats ◽  
2021 ◽  
Vol 4 (1) ◽  
pp. 138-145
Author(s):  
Stephen Babos ◽  
Andreas Artemiou

In this paper, we present the Cumulative Median Estimation (CUMed) algorithm for robust sufficient dimension reduction. Compared with non-robust competitors, this algorithm performs better when there are outliers present in the data and comparably when outliers are not present. This is demonstrated in simulated and real data experiments.


2018 ◽  
Vol 8 (2) ◽  
pp. 377-406
Author(s):  
Almog Lahav ◽  
Ronen Talmon ◽  
Yuval Kluger

Abstract A fundamental question in data analysis, machine learning and signal processing is how to compare between data points. The choice of the distance metric is specifically challenging for high-dimensional data sets, where the problem of meaningfulness is more prominent (e.g. the Euclidean distance between images). In this paper, we propose to exploit a property of high-dimensional data that is usually ignored, which is the structure stemming from the relationships between the coordinates. Specifically, we show that organizing similar coordinates in clusters can be exploited for the construction of the Mahalanobis distance between samples. When the observable samples are generated by a nonlinear transformation of hidden variables, the Mahalanobis distance allows the recovery of the Euclidean distances in the hidden space. We illustrate the advantage of our approach on a synthetic example where the discovery of clusters of correlated coordinates improves the estimation of the principal directions of the samples. Our method was applied to real data of gene expression for lung adenocarcinomas (lung cancer). By using the proposed metric we found a partition of subjects to risk groups with a good separation between their Kaplan–Meier survival plot.


Biometrika ◽  
2021 ◽  
Author(s):  
Pixu Shi ◽  
Yuchen Zhou ◽  
Anru R Zhang

Abstract In microbiome and genomic studies, the regression of compositional data has been a crucial tool for identifying microbial taxa or genes that are associated with clinical phenotypes. To account for the variation in sequencing depth, the classic log-contrast model is often used where read counts are normalized into compositions. However, zero read counts and the randomness in covariates remain critical issues. In this article, we introduce a surprisingly simple, interpretable, and efficient method for the estimation of compositional data regression through the lens of a novel high-dimensional log-error-in-variable regression model. The proposed method provides both corrections on sequencing data with possible overdispersion and simultaneously avoids any subjective imputation of zero read counts. We provide theoretical justifications with matching upper and lower bounds for the estimation error. The merit of the procedure is illustrated through real data analysis and simulation studies.


Sign in / Sign up

Export Citation Format

Share Document