Envelopes in multivariate regression models with nonlinearity and heteroscedasticity

Biometrika ◽  
2020 ◽  
Vol 107 (4) ◽  
pp. 965-981
Author(s):  
X Zhang ◽  
C E Lee ◽  
X Shao

Summary Envelopes have been proposed in recent years as a nascent methodology for sufficient dimension reduction and efficient parameter estimation in multivariate linear models. We extend the classical definition of envelopes in Cook et al. (2010) to incorporate a nonlinear conditional mean function and a heteroscedastic error. Given any two random vectors ${X}\in\mathbb{R}^{p}$ and ${Y}\in\mathbb{R}^{r}$, we propose two new model-free envelopes, called the martingale difference divergence envelope and the central mean envelope, and study their relationships to the standard envelope in the context of response reduction in multivariate linear models. The martingale difference divergence envelope effectively captures the nonlinearity in the conditional mean without imposing any parametric structure or requiring any tuning in estimation. Heteroscedasticity, or nonconstant conditional covariance of ${Y}\mid{X}$, is further detected by the central mean envelope based on a slicing scheme for the data. We reveal the nested structure of different envelopes: (i) the central mean envelope contains the martingale difference divergence envelope, with equality when ${Y}\mid{X}$ has a constant conditional covariance; and (ii) the martingale difference divergence envelope contains the standard envelope, with equality when ${Y}\mid{X}$ has a linear conditional mean. We develop an estimation procedure that first obtains the martingale difference divergence envelope and then estimates the additional envelope components in the central mean envelope. We establish consistency in envelope estimation of the martingale difference divergence envelope and central mean envelope without stringent model assumptions. Simulations and real-data analysis demonstrate the advantages of the martingale difference divergence envelope and the central mean envelope over the standard envelope in dimension reduction.
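To make the tuning-free claim concrete, the following is a minimal sketch of a sample martingale difference divergence matrix. It assumes the usual sample analogue of $-E[(Y-EY)(Y'-EY')^{T}\|X-X'\|]$, where $(X',Y')$ is an independent copy of $(X,Y)$. The function name `sample_mddm` and the toy data are illustrative and not taken from the paper, and this is only the building-block statistic, not the authors' envelope estimator.

```python
import math


def sample_mddm(X, Y):
    """Sample martingale difference divergence matrix of Y given X (sketch).

    X: list of n predictor vectors (lists of floats, length p).
    Y: list of n response vectors (lists of floats, length r).
    Returns an r x r nested list equal to
    -(1/n^2) * sum_{k,l} (Y_k - Ybar)(Y_l - Ybar)^T * ||X_k - X_l||.
    No bandwidth or other tuning parameter is involved.
    """
    n = len(X)
    r = len(Y[0])
    # Centre each response coordinate at its sample mean.
    y_bar = [sum(y[j] for y in Y) / n for j in range(r)]
    Yc = [[y[j] - y_bar[j] for j in range(r)] for y in Y]

    M = [[0.0] * r for _ in range(r)]
    for k in range(n):
        for l in range(n):
            d = math.dist(X[k], X[l])  # Euclidean distance between predictors
            for i in range(r):
                for j in range(r):
                    M[i][j] -= Yc[k][i] * Yc[l][j] * d
    return [[m / n**2 for m in row] for row in M]


# Hypothetical toy data: Y1 = X^2 (nonlinear mean), Y2 constant.
X = [[-2.0], [-1.0], [0.0], [1.0], [2.0]]
Y = [[x[0] ** 2, 3.0] for x in X]
M = sample_mddm(X, Y)
```

In this toy data the first response is a quadratic function of a symmetric predictor, so its sample covariance with X is zero while the corresponding diagonal entry of `M` is positive. The constant second response gives a zero diagonal entry.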

Biometrika ◽  
2020 ◽  
Author(s):  
C E Lee ◽  
X Zhang ◽  
X Shao

Summary We propose a new nonparametric conditional mean independence test for a response variable $Y$ and a predictor variable $X$ where either or both can be function-valued. Our test is built on a new metric, the so-called functional martingale difference divergence, which fully characterizes the conditional mean dependence of $Y$ given $X$ and extends the martingale difference divergence proposed by Shao & Zhang (2014). We define an unbiased estimator of functional martingale difference divergence by using a $\mathcal{U}$-centring approach, and we obtain its limiting null distribution under mild assumptions. Since the limiting null distribution is not pivotal, we use the wild bootstrap method to estimate the critical value and show the consistency of the bootstrap test. Our test can detect the local alternative which approaches the null at the rate of $n^{-1/2}$ with a nontrivial power, where $n$ is the sample size. Unlike the three tests developed by Kokoszka et al. (2008), Lei (2014) and Patilea et al. (2016), our test does not require a finite-dimensional projection or assume a linear model, and it does not involve any tuning parameters. Promising finite-sample performance is demonstrated via simulations, and a real-data illustration is used to compare our test with existing ones.


Genetics ◽  
2003 ◽  
Vol 165 (3) ◽  
pp. 1599-1605
Author(s):  
Fei Zou ◽  
Brian S Yandell ◽  
Jason P Fine

Abstract This article addresses the identification of genetic loci (QTL and elsewhere) that influence nonnormal quantitative traits with focus on experimental crosses. QTL mapping is typically based on the assumption that the traits follow normal distributions, which may not be true in practice. Model-free tests have been proposed. However, nonparametric estimation of genetic effects has not been studied. We propose an estimation procedure based on the linear rank test statistics. The properties of the new procedure are compared with those of traditional likelihood-based interval mapping and regression interval mapping via simulations and a real data example. The results indicate that the nonparametric method is a competitive alternative to the existing parametric methodologies.


Stats ◽  
2021 ◽  
Vol 4 (1) ◽  
pp. 138-145
Author(s):  
Stephen Babos ◽  
Andreas Artemiou

In this paper, we present the Cumulative Median Estimation (CUMed) algorithm for robust sufficient dimension reduction. Compared with non-robust competitors, this algorithm performs better when there are outliers present in the data and comparably when outliers are not present. This is demonstrated in simulated and real data experiments.


1985 ◽  
Vol 17 (2) ◽  
pp. 168-184 ◽  
Author(s):  
Julio Motta Singer ◽  
Pranab Kumar Sen

Author(s):  
Fiorella Pia Salvatore ◽  
Alessia Spada ◽  
Francesca Fortunato ◽  
Demetris Vrontis ◽  
Mariantonietta Fiore

The purpose of this paper is to investigate the determinants of the costs of cardiovascular disease (CVD) in the regional health service in Italy’s Apulia region from 2014 to 2016. Data for patients with acute myocardial infarction (AMI), heart failure (HF), and atrial fibrillation (AF) were collected from the hospital discharge registry. Generalized linear models (GLM) and generalized linear mixed models (GLMM) were used to identify the role of random effects in improving model performance. The study was based on socio-demographic variables and disease-specific variables (diagnosis-related group, hospitalization type, hospital stay, surgery, and economic burden of the hospital discharge form). First, both models indicated an increase in health costs in 2016 and lower spending for women (p < 0.001). The GLMM indicated a significant increase in health expenditure with increasing age (p < 0.001). Day-hospital had the lowest cost, surgery increased the cost, and AMI was the most expensive pathology, in contrast to AF (p < 0.001). Second, AIC and BIC took their lowest values for the GLMM, indicating the relevance of random effects in improving model performance. This study is the first that considers real data to estimate the economic burden of CVD from the regional health service’s perspective. It appears significant for its ability to provide a large set of estimates of the economic burden of CVD, providing information to managers for health management and planning.
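For reference, the two information criteria used in this model comparison have the following standard definitions. The symbols are not from the abstract: $\hat{L}$ denotes the maximized likelihood, $k$ the number of estimated parameters, and $n$ the number of observations.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

Lower values indicate a better trade-off between fit and complexity, which is why lower AIC and BIC for the GLMM are read as evidence in favor of including random effects.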


2021 ◽  
Author(s):  
Likai Chen ◽  
Ekaterina Smetanina ◽  
Wei Biao Wu

Abstract This paper presents a multiplicative nonstationary nonparametric regression model which allows for a broad class of nonstationary processes. We propose a three-step estimation procedure to uncover the conditional mean function and establish uniform convergence rates and asymptotic normality of our estimators. The new model can also be seen as a dimension-reduction technique for a general two-dimensional time-varying nonparametric regression model, which is especially useful in small samples and for estimating explicitly multiplicative structural models. We consider two applications: estimating a pricing equation for the US aggregate economy to model consumption growth, and estimating the shape of the monthly risk premium for S&P 500 Index data.


2020 ◽  
Vol 36 (11) ◽  
pp. 3431-3438
Author(s):  
Ziyi Li ◽  
Zhenxing Guo ◽  
Ying Cheng ◽  
Peng Jin ◽  
Hao Wu

Abstract Motivation: In the analysis of high-throughput omics data from tissue samples, estimating and accounting for cell composition have been recognized as important steps. High cost, intensive labor requirements and technical limitations hinder the cell composition quantification using cell-sorting or single-cell technologies. Computational methods for cell composition estimation are available, but they are either limited by the availability of a reference panel or suffer from low accuracy. Results: We introduce TOols for the Analysis of heterogeneouS Tissues TOAST/-P and TOAST/+P, two partial reference-free algorithms for estimating cell composition of heterogeneous tissues based on their gene expression profiles. TOAST/-P and TOAST/+P incorporate additional biological information, including cell-type-specific markers and prior knowledge of compositions, in the estimation procedure. Extensive simulation studies and real data analyses demonstrate that the proposed methods provide more accurate and robust cell composition estimation than existing methods. Availability and implementation: The proposed methods TOAST/-P and TOAST/+P are implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST. Supplementary information: Supplementary data are available at Bioinformatics online.

