Biometrika
Latest Publications


TOTAL DOCUMENTS

10638
(FIVE YEARS 270)

H-INDEX

231
(FIVE YEARS 7)

Published By Oxford University Press

1464-3510, 0006-3444

Biometrika ◽  
2021 ◽  
Author(s):  
Y Cui ◽  
H Michael ◽  
F Tanser ◽  
E Tchetgen Tchetgen

Summary Robins (1998) introduced marginal structural models, a general class of counterfactual models for the joint effects of time-varying treatments in complex longitudinal studies subject to time-varying confounding. Robins (1998) established the identification of marginal structural model parameters under a sequential randomization assumption, which rules out unmeasured confounding of treatment assignment over time. The marginal structural Cox model is one of the most popular marginal structural models to evaluate the causal effect of time-varying treatments on a censored failure time outcome. In this paper, we establish sufficient conditions for identification of marginal structural Cox model parameters with the aid of a time-varying instrumental variable, when sequential randomization fails to hold due to unmeasured confounding. Our instrumental variable identification condition rules out any interaction between an unmeasured confounder and the instrumental variable in its additive effects on the treatment process, the longitudinal generalization of the identifying condition of Wang & Tchetgen Tchetgen (2018). We describe a large class of weighted estimating equations that give rise to consistent and asymptotically normal estimators of the marginal structural Cox model, thereby extending the standard inverse probability of treatment weighted estimation of marginal structural models to the instrumental variable setting. Our approach is illustrated via extensive simulation studies and an application to estimate the effect of community antiretroviral therapy coverage on HIV incidence.


Biometrika ◽  
2021 ◽  
Author(s):  
J Zapata ◽  
S Y Oh ◽  
A Petersen

Abstract The covariance structure of multivariate functional data can be highly complex, especially if the multivariate dimension is large, making extensions of statistical methods for standard multivariate data to the functional data setting challenging. For example, Gaussian graphical models have recently been extended to the setting of multivariate functional data by applying multivariate methods to the coefficients of truncated basis expansions. However, a key difficulty compared to multivariate data is that the covariance operator is compact, and thus not invertible. The methodology in this paper addresses the general problem of covariance modelling for multivariate functional data, and functional Gaussian graphical models in particular. As a first step, a new notion of separability for the covariance operator of multivariate functional data is proposed, termed partial separability, leading to a novel Karhunen–Loève-type expansion for such data. Next, the partial separability structure is shown to be particularly useful in order to provide a well-defined functional Gaussian graphical model that can be identified with a sequence of finite-dimensional graphical models, each of identical fixed dimension. This motivates a simple and efficient estimation procedure through application of the joint graphical lasso. Empirical performance of the method for graphical model estimation is assessed through simulation and analysis of functional brain connectivity during a motor task.


Biometrika ◽  
2021 ◽  
Author(s):  
Yuqian Zhang ◽  
Jelena Bradic

Abstract A fundamental challenge in semi-supervised learning lies in the observed data’s disproportional size when compared with the size of the data collected with missing outcomes. An implicit understanding is that the dataset with missing outcomes, being significantly larger, ought to improve estimation and inference. However, it is unclear to what extent this is correct. We illustrate one clear benefit: root-n inference of the outcome’s mean is possible while only requiring a consistent estimation of the outcome, possibly at a rate slower than root-n. This is achieved by a novel k-fold cross-fitted, doubly robust estimator. We discuss both linear and nonlinear outcomes. Such an estimator is particularly suited for models that naturally do not admit root-n consistency, such as high-dimensional, nonparametric, or semiparametric models. We apply our methods to the estimation of heterogeneous treatment effects.
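The cross-fitting idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' estimator. It assumes that labels are missing completely at random and uses ordinary least squares as a stand-in outcome model. The name cross_fitted_mean and its parameters are hypothetical. Each fold fits the outcome model on the other labelled folds, averages its predictions over all covariates, and adds the mean held-out residual as a bias correction.

```python
import numpy as np

def cross_fitted_mean(x_lab, y_lab, x_unlab, n_folds=5, seed=0):
    """Generic cross-fitted estimate of E[Y] from labelled and unlabelled data.

    Assumptions (not taken from the paper): labels are missing completely
    at random, and the working outcome model is ordinary least squares.
    """
    y_lab = np.asarray(y_lab, dtype=float)
    n = y_lab.size
    if n < n_folds:
        raise ValueError("need at least one labelled point per fold")
    x_lab = np.asarray(x_lab, dtype=float).reshape(n, -1)
    x_unlab = np.asarray(x_unlab, dtype=float).reshape(-1, x_lab.shape[1])

    # Design matrices with an intercept column; labelled rows come first.
    x_all = np.vstack([x_lab, x_unlab])
    design_all = np.column_stack([np.ones(len(x_all)), x_all])
    design_lab = design_all[:n]

    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    fold_estimates = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        # Fit the outcome model without the held-out labelled points.
        coef, *_ = np.linalg.lstsq(design_lab[train], y_lab[train], rcond=None)
        # Plug-in mean of the predictions over all covariates.
        plug_in = (design_all @ coef).mean()
        # Bias correction: mean residual on the held-out labelled points.
        correction = (y_lab[held_out] - design_lab[held_out] @ coef).mean()
        fold_estimates.append(plug_in + correction)
    return float(np.mean(fold_estimates))
```

When the working model is exactly right the residual correction vanishes and the estimate reduces to the average prediction over all covariates. When it is wrong, the held-out residuals offset part of the resulting bias.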


Biometrika ◽  
2021 ◽  
Author(s):  
J H Loper ◽  
L Lei ◽  
W Fithian ◽  
W Tansey

Summary We consider the problem of multiple hypothesis testing when there is a logical nested structure to the hypotheses. When one hypothesis is nested inside another, the outer hypothesis must be false if the inner hypothesis is false. We model the nested structure as a directed acyclic graph, including chain and tree graphs as special cases. Each node in the graph is a hypothesis and rejecting a node requires also rejecting all of its ancestors. We propose a general framework for adjusting node-level test statistics using the known logical constraints. Within this framework, we study a smoothing procedure that combines each node with all of its descendants to form a more powerful statistic. We prove a broad class of smoothing strategies can be used with existing selection procedures to control the familywise error rate, false discovery exceedance rate, or false discovery rate, so long as the original test statistics are independent under the null. When the null statistics are not independent but are derived from positively-correlated normal observations, we prove control for all three error rates when the smoothing method is arithmetic averaging of the observations. Simulations and an application to a real biology dataset demonstrate that smoothing leads to substantial power gains.


Biometrika ◽  
2021 ◽  
Author(s):  
H Shi ◽  
M Drton ◽  
F Han

Abstract Chatterjee (2021+) introduced a simple new rank correlation coefficient that has attracted much recent attention. The coefficient has the unusual appeal that it not only estimates a population quantity first proposed by Dette et al. (2013) that is zero if and only if the underlying pair of random variables is independent, but also is asymptotically normal under independence. This paper compares Chatterjee’s new correlation coefficient to three established rank correlations that also facilitate consistent tests of independence, namely, Hoeffding’s D, Blum–Kiefer–Rosenblatt’s R, and Bergsma–Dassios–Yanagimoto’s τ*. We contrast their computational efficiency in light of recent advances, and investigate their power against local rotation and mixture alternatives. Our main results show that Chatterjee’s coefficient is unfortunately rate sub-optimal compared to D, R, and τ*. The situation is more subtle for a related earlier estimator of Dette et al. (2013). These results favor D, R, and τ* over Chatterjee’s new correlation coefficient for the purpose of testing independence.
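For readers unfamiliar with the coefficient discussed in this abstract, the following is a minimal Python sketch of Chatterjee's rank correlation in the no-ties case. Sort the pairs by x, rank the y-values in that order, and compute 1 − 3 Σ|r_{i+1} − r_i| / (n² − 1). The function name chatterjee_xi is illustrative. The sketch does not handle ties and does not implement the independence tests or power comparisons studied in the paper.

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n, assuming no ties in x or y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n != y.size or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    # Reorder the y-values so that the x-values are increasing.
    y_by_x = y[np.argsort(x, kind="stable")]
    # Ranks (1..n) of the reordered y-values; valid when there are no ties.
    ranks = np.argsort(np.argsort(y_by_x)) + 1
    # One minus the scaled sum of absolute differences of successive ranks.
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)
```

For a strictly monotone relationship, successive ranks differ by exactly one, so the value is 1 − 3/(n + 1), which approaches 1 as n grows. Under independence the value is close to 0.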


Biometrika ◽  
2021 ◽  
Author(s):  
F Ferraty ◽  
S Nagy

Abstract It is common to want to regress a scalar response on a random function. This paper presents results that advocate local linear regression based on a projection as a nonparametric approach to this problem. Our asymptotic results demonstrate that functional local linear regression outperforms its functional local constant counterpart. Beyond the estimation of the regression operator itself, local linear regression is also a useful tool for predicting the functional derivative of the regression operator, a promising mathematical object on its own. The local linear estimator of the functional derivative is shown to be consistent. For both the estimator of the regression functional and the estimator of its derivative, theoretical properties are detailed. On simulated datasets we illustrate good finite sample properties of the proposed methods. On a real data example of a single-functional index model we indicate how the functional derivative of the regression operator provides an original, fast, and widely applicable estimation method.


Biometrika ◽  
2021 ◽  
Author(s):  
Pixu Shi ◽  
Yuchen Zhou ◽  
Anru R Zhang

Abstract In microbiome and genomic studies, the regression of compositional data has been a crucial tool for identifying microbial taxa or genes that are associated with clinical phenotypes. To account for the variation in sequencing depth, the classic log-contrast model is often used where read counts are normalized into compositions. However, zero read counts and the randomness in covariates remain critical issues. In this article, we introduce a surprisingly simple, interpretable, and efficient method for the estimation of compositional data regression through the lens of a novel high-dimensional log-error-in-variable regression model. The proposed method both corrects sequencing data with possible overdispersion and avoids any subjective imputation of zero read counts. We provide theoretical justifications with matching upper and lower bounds for the estimation error. The merit of the procedure is illustrated through real data analysis and simulation studies.
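As background for the classic log-contrast model mentioned in this abstract, its standard linear form can be written as follows. The notation is an assumption made here for illustration and is not taken from the paper: y_i is the response for sample i, x_{ij} is the proportion of component j in sample i, and p is the number of components.

```latex
y_i = \sum_{j=1}^{p} \beta_j \log x_{ij} + \varepsilon_i,
\qquad \text{subject to} \qquad \sum_{j=1}^{p} \beta_j = 0 .
```

The zero-sum constraint makes the fit invariant to rescaling a sample's counts, since multiplying every x_{ij} by a constant c adds (log c) Σ_j β_j = 0 to the linear predictor. The logarithm is undefined at zero, which is why zero read counts are a difficulty for this model.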


Biometrika ◽  
2021 ◽  
Author(s):  
Joseph Guinness

Abstract We conduct a study of the aliased spectral densities of Matérn covariance functions on a regular grid of points, providing clarity on the properties of a popular approximation based on stochastic partial differential equations. While others have shown that it can approximate the covariance function well, we find that it assigns too much power at high frequencies and does not provide increasingly accurate approximations to the inverse as the grid spacing goes to zero, except in the one-dimensional exponential covariance case.

