Generalized kernel-based inverse regression methods for sufficient dimension reduction

2020 ◽ Vol 150 ◽ pp. 106995
Author(s): Chuanlong Xie ◽ Lixing Zhu
Biostatistics ◽ 2019
Author(s): Diego Tomassi ◽ Liliana Forzani ◽ Sabrina Duarte ◽ Ruth M Pfeiffer

Summary Recent efforts to characterize the human microbiome and its relation to chronic diseases have led to a surge in statistical development for compositional data. We develop likelihood-based sufficient dimension reduction (SDR) methods to find linear combinations that contain all the information in the compositional data on an outcome variable, i.e., that are sufficient for modeling and prediction of the outcome. We consider several models for the inverse regression of the compositional vector, or transformations of it, as a function of the outcome. They include normal, multinomial, and Poisson graphical models that allow for complex dependencies among the observed counts. These methods yield efficient estimators of the reduction and can be applied to continuous or categorical outcomes. We incorporate variable selection into the estimation via penalties and address important invariance issues arising from the compositional nature of the data. We illustrate and compare our methods and some established methods for analyzing microbiome data in simulations and using data from the Human Microbiome Project. Displaying the data in the coordinate system of the SDR linear combinations allows visual inspection and facilitates comparisons across studies.
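Below is a minimal sketch of the inverse-regression idea behind such methods: a centered log-ratio (clr) transform of the compositional counts followed by a sliced-inverse-regression step. It is not the authors' likelihood-based estimator (which fits normal, multinomial, or Poisson graphical inverse models with penalties); the function names, the eps guard, and the ridge term handling the clr-induced singularity are illustrative assumptions.

```python
import numpy as np

def clr(counts, eps=1e-6):
    """Centered log-ratio transform for compositional rows (illustrative)."""
    Z = np.log(counts + eps)                  # eps guards against zero counts
    return Z - Z.mean(axis=1, keepdims=True)  # remove row log-geometric mean

def inverse_regression_directions(Z, y, n_slices=5, d=2, ridge=1e-8):
    """SIR-style estimate of a sufficient reduction on clr-scale data."""
    n, p = Z.shape
    Zc = Z - Z.mean(axis=0)
    # ridge keeps Sigma invertible: clr vectors lie in a (p-1)-dim subspace
    Sigma = Zc.T @ Zc / n + ridge * np.eye(p)
    order = np.argsort(y)
    M = np.zeros((p, p))
    for s in np.array_split(order, n_slices):  # slice on the outcome
        mu = Zc[s].mean(axis=0)
        M += (len(s) / n) * np.outer(mu, mu)   # between-slice covariance
    w, V = np.linalg.eigh(Sigma)
    S = V @ np.diag(w ** -0.5) @ V.T           # Sigma^{-1/2}
    evals, evecs = np.linalg.eigh(S @ M @ S)   # symmetric eigenproblem
    return S @ evecs[:, np.argsort(evals)[::-1][:d]]  # top-d directions
```

Projecting the clr-transformed data onto the returned directions gives the low-dimensional coordinate system in which the data can be displayed and compared across studies.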


2019 ◽ Vol 9 (1)
Author(s): Jae Keun Yoo

Abstract Sufficient dimension reduction (SDR) for a regression pursues replacing the original p-dimensional predictors with a lower-dimensional linear projection of them. The so-called sliced inverse regression (SIR; [5]) arguably has the longest history among SDR methodologies, yet it remains one of the most popular. SIR is known to be sensitive to the choice of the number of slices, which is one of its critical deficits. Recently, a fused approach to SIR was proposed to relieve this weakness; it fuses the kernel matrices computed by applying SIR with various numbers of slices. In this paper, the fused SIR is applied to a large-p-small-n regression of high-dimensional, right-censored microarray data to show its practical advantage over the usual SIR. Through model validation, it is confirmed that the fused SIR outperforms SIR with any single number of slices under consideration.
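A minimal sketch of the fusing idea follows, assuming the fused kernel is simply the sum of the SIR candidate matrices over a grid of slice counts; the slice grid, the ridge term (useful in the large-p-small-n setting), and the function names are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def sir_kernel(X, y, n_slices):
    """SIR candidate matrix: weighted outer products of slice means."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    order = np.argsort(y)                      # slice on the sorted response
    M = np.zeros((p, p))
    for s in np.array_split(order, n_slices):
        m = Xc[s].mean(axis=0)
        M += (len(s) / n) * np.outer(m, m)
    return M

def fused_sir(X, y, slice_grid=(2, 3, 4, 5, 10), d=1, ridge=1e-8):
    """Fuse SIR kernels over several slice counts, then eigen-decompose."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # ridge keeps the covariance invertible when p is close to (or above) n
    Sigma = Xc.T @ Xc / n + ridge * np.eye(p)
    M = sum(sir_kernel(X, y, H) for H in slice_grid)   # the fusing step
    w, V = np.linalg.eigh(Sigma)
    S = V @ np.diag(w ** -0.5) @ V.T                   # Sigma^{-1/2}
    evals, evecs = np.linalg.eigh(S @ M @ S)           # symmetric problem
    return S @ evecs[:, np.argsort(evals)[::-1][:d]]   # top-d directions
```

Summing the kernels pools estimating information across slice choices, so no single number of slices has to be picked in advance.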


2013 ◽ Vol 45 (3) ◽ pp. 626-644
Author(s): Ondřej Šedivý ◽ Jakub Stanek ◽ Blažena Kratochvílová ◽ Viktor Beneš

Dimension reduction of multivariate data was developed by Y. Guan for point processes with Gaussian random fields as covariates. The generalization to fibre and surface processes is straightforward. For inverse regression methods, we suggest slicing based on geometrical marks. An investigation of the properties of this method is presented in simulation studies of random marked sets. In a refined model for dimension reduction, the second-order central subspace is analyzed in detail. A real data pattern is tested for independence from a covariate.
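As a rough illustration of mark-based slicing, the sketch below assigns the points of a marked pattern to slices by quantiles of a scalar geometrical mark (e.g., fibre length) and computes within-slice means of the covariate values observed at the points; the function names and the quantile rule are assumptions for illustration, not the authors' procedure.

```python
import numpy as np

def mark_slices(marks, n_slices=4):
    """Assign each point of a marked pattern to a slice by mark quantiles."""
    cuts = np.quantile(marks, np.linspace(0, 1, n_slices + 1)[1:-1])
    return np.searchsorted(cuts, marks)          # labels in {0,...,n_slices-1}

def sliced_covariate_means(covariates, marks, n_slices=4):
    """Within-slice means of the covariate field observed at the points."""
    labels = mark_slices(marks, n_slices)
    return np.array([covariates[labels == h].mean(axis=0)
                     for h in range(n_slices)])  # assumes non-empty slices
```

Here `covariates` is an (n, q) array of covariate values at the n points and `marks` is the corresponding vector of scalar marks; the slice means then feed an inverse-regression step as in SIR.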


Stats ◽ 2021 ◽ Vol 4 (1) ◽ pp. 138-145
Author(s): Stephen Babos ◽ Andreas Artemiou

In this paper, we present the Cumulative Median Estimation (CUMed) algorithm for robust sufficient dimension reduction. Compared with non-robust competitors, this algorithm performs better when outliers are present in the data and comparably when they are not. This is demonstrated in simulated and real data experiments.
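Below is a minimal sketch of one plausible reading of the cumulative-median idea, assuming it follows cumulative slicing estimation but replaces within-slice means with coordinatewise medians for robustness; this is an assumption for illustration, not the authors' exact algorithm.

```python
import numpy as np

def cumed_directions(X, y, d=1, ridge=1e-8):
    """Cumulative-median sketch: coordinatewise medians of standardized
    predictors over the nested slices {y <= y_(i)}, accumulated as outer
    products, then eigen-decomposed."""
    n, p = X.shape
    Sigma = np.cov(X, rowvar=False) + ridge * np.eye(p)
    w, V = np.linalg.eigh(Sigma)
    S = V @ np.diag(w ** -0.5) @ V.T             # Sigma^{-1/2}
    Z = (X - np.median(X, axis=0)) @ S           # robustly centered, whitened
    order = np.argsort(y)
    M = np.zeros((p, p))
    for i in range(1, n):                        # nested (cumulative) slices
        med = np.median(Z[order[:i + 1]], axis=0)
        M += np.outer(med, med) / n
    evals, evecs = np.linalg.eigh(M)
    return S @ evecs[:, np.argsort(evals)[::-1][:d]]   # top-d directions
```

Because medians, unlike means, are insensitive to a few extreme observations, each cumulative slice statistic stays stable under contamination, which is the source of the robustness claimed for this family of estimators.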

