Compositional knockoff filter for high-dimensional regression analysis of microbiome data

2019
Author(s):
Arun Srinivasan
Lingzhou Xue
Xiang Zhan

Summary: A critical task in microbiome data analysis is to explore the association between a scalar response of interest and a large number of microbial taxa that are summarized as compositional data at different taxonomic levels. Motivated by fine-mapping of the microbiome, we propose a two-step compositional knockoff filter (CKF) to provide effective finite-sample false discovery rate (FDR) control in high-dimensional linear log-contrast regression analysis of microbiome compositional data. In the first step, we employ the compositional screening procedure to remove insignificant microbial taxa while retaining the essential sum-to-zero constraint. In the second step, we extend the knockoff filter to identify the significant microbial taxa in the sparse regression model for compositional data. Thereby, a subset of the microbes is selected from the high-dimensional microbial taxa as related to the response using a pre-specified FDR threshold. We study the asymptotic properties of the proposed two-step procedure, including both sure screening and effective false discovery control. We demonstrate the finite-sample properties in simulation studies, which show the gain in empirical power while controlling the nominal FDR. The potential usefulness of the proposed method is also illustrated with an application to an inflammatory bowel disease dataset to identify microbial taxa that influence host gene expression.
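The selection step of a knockoff filter can be illustrated with a minimal sketch. It assumes the feature statistics W_j have already been computed, with large positive values favouring the original variable over its knockoff copy, and it applies the standard knockoff+ threshold rule. It is not the CKF procedure itself: the screening step and the sum-to-zero constraint are not shown, and the function names `knockoff_plus_threshold` and `knockoff_select` are hypothetical.

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Return the knockoff+ threshold for statistics W at target FDR q.

    The threshold is the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    Returns np.inf when no candidate satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds: distinct nonzero magnitudes, in increasing order.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return float(t)
    return np.inf


def knockoff_select(W, q):
    """Indices of variables whose statistic reaches the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)


# Illustrative statistics (made-up numbers, not taken from any dataset).
W_example = [5, 4, 3, 2.5, -1, 0.5, 2, -0.2, 6, 3.5]
selected = knockoff_select(W_example, q=0.2)
```

When no threshold meets the bound, the sketch returns an infinite threshold and therefore selects nothing, which is the conservative behaviour.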

2019
Vol 7 (1)
pp. 394-417
Author(s):
Aboubacrène Ag Ahmad
El Hadji Deme
Aliou Diop
Stéphane Girard

Abstract: We introduce a location-scale model for conditional heavy-tailed distributions when the covariate is deterministic. First, nonparametric estimators of the location and scale functions are introduced. Second, an estimator of the conditional extreme-value index is derived. The asymptotic properties of the estimators are established under mild assumptions and their finite sample properties are illustrated both on simulated and real data.


1998
Vol 14 (2)
pp. 161-186
Author(s):
Laurence Broze
Olivier Scaillet
Jean-Michel Zakoïan

We discuss an estimation procedure for continuous-time models based on discrete sampled data with a fixed unit of time between two consecutive observations. Because in general the conditional likelihood of the model cannot be derived, an indirect inference procedure following Gouriéroux, Monfort, and Renault (1993, Journal of Applied Econometrics 8, 85–118) is developed. It is based on simulations of a discretized model. We study the asymptotic properties of this “quasi”-indirect estimator and examine some particular cases. Because this method critically depends on simulations, we pay particular attention to the appropriate choice of the simulation step. Finally, finite-sample properties are studied through Monte Carlo experiments.


2021
Vol 2021 (026)
pp. 1-52
Author(s):
Dong Hwan Oh
Andrew J. Patton

This paper proposes a dynamic multi-factor copula for use in high dimensional time series applications. A novel feature of our model is that the assignment of individual variables to groups is estimated from the data, rather than being pre-assigned using SIC industry codes, market capitalization ranks, or other ad hoc methods. We adapt the k-means clustering algorithm for use in our application and show that it has excellent finite-sample properties. Applying the new model to returns on 110 US equities, we find around 20 clusters to be optimal. In out-of-sample forecasts, we find that a model with as few as five estimated clusters significantly outperforms an otherwise identical model with 21 clusters formed using two-digit SIC codes.


2013
Vol 5 (2)
pp. 133-162
Author(s):
Eric Hillebrand
Marcelo C. Medeiros
Junyue Xu

Abstract: We derive asymptotic properties of the quasi-maximum likelihood estimator of smooth transition regressions when time is the transition variable. The consistency of the estimator and its asymptotic distribution are examined. It is shown that the estimator converges at the usual √T-rate and has an asymptotically normal distribution. Finite sample properties of the estimator are explored in simulations. We illustrate with an application to US inflation and output data.


2017
Vol 47 (1)
pp. 182-211
Author(s):
Arvid Sjölander

A popular way to reduce confounding in observational studies is to use each study participant as his or her own control. This is possible when both the exposure and the outcome are time varying and have been measured at several time points for each individual. The case-time-control method is a special case, which, under certain assumptions, allows the analyst to control for confounding by time-varying covariates, while controlling for all time-stationary characteristics of the study participants. There are two formulations of the case-time-control method. One formulation requires that the exposure be binary, and the other requires that there be no more than two time points per individual. In this article the author proposes a generalization of the case-time-control method for nonbinary exposures and an arbitrary number of time points. The author derives the asymptotic properties of the resulting estimator and assesses its finite sample properties in a simulation study.


2001
Vol 9 (4)
pp. 379-384
Author(s):
Ethan Katz

Fixed-effects logit models can be useful in panel data analysis, when N units have been observed for T time periods. There are two main estimators for such models: unconditional maximum likelihood and conditional maximum likelihood. Judged on asymptotic properties, the conditional estimator is superior. However, the unconditional estimator holds several practical advantages, and therefore I sought to determine whether its use could be justified on the basis of finite-sample properties. In a series of Monte Carlo experiments for T < 20, I found a negligible amount of bias in both estimators when T ≥ 16, suggesting that a researcher can safely use either estimator under such conditions. When T < 16, the conditional estimator continued to have a very small amount of bias, but the unconditional estimator developed more bias as T decreased.


2009
Vol 25 (1)
pp. 117-161
Author(s):
Marcelo C. Medeiros
Alvaro Veiga

In this paper a flexible multiple regime GARCH(1,1)-type model is developed to describe the sign and size asymmetries and intermittent dynamics in financial volatility. The results of the paper are important to other nonlinear GARCH models. The proposed model nests some of the previous specifications found in the literature and has the following advantages. First, contrary to most of the previous models, more than two limiting regimes are possible, and the number of regimes is determined by a simple sequence of tests that circumvents identification problems that are usually found in nonlinear time series models. The second advantage is that the novel stationarity restriction on the parameters is relatively weak, thereby allowing for rich dynamics. It is shown that the model may have explosive regimes but can still be strictly stationary and ergodic. A simulation experiment shows that the proposed model can generate series with high kurtosis and low first-order autocorrelation of the squared observations and exhibit the so-called Taylor effect, even with Gaussian errors. Estimation of the parameters is addressed, and the asymptotic properties of the quasi-maximum likelihood estimator are derived under weak conditions. A Monte-Carlo experiment is designed to evaluate the finite-sample properties of the sequence of tests. Empirical examples are also considered.


Author(s):  
Sudipta Das ◽  
Anup Dewanji ◽  
Subrata Kundu

The process of software testing usually involves the correction of a detected bug immediately upon detection. In this article, in contrast, we discuss continuous-time testing of software with periodic debugging, in which bugs are corrected not at the instants of their detection but at some pre-specified time points. Under the assumption of a renewal distribution for the time between successive occurrences of a bug, maximum-likelihood estimation of the initial number of bugs in the software is considered, when the renewal distribution belongs to any general parametric family or is arbitrary. The asymptotic properties of the estimated model parameters are also discussed. Finally, we investigate the finite sample properties of the estimators, especially that of the initial number of bugs, through simulation.


Entropy
2021
Vol 23 (2)
pp. 230
Author(s):
Fang Xie
Johannes Lederer

Recent discoveries suggest that our gut microbiome plays an important role in our health and wellbeing. However, the gut microbiome data are intricate; for example, the microbial diversity in the gut makes the data high-dimensional. While there are dedicated high-dimensional methods, such as the lasso estimator, they always come with the risk of false discoveries. Knockoffs are a recent approach to control the number of false discoveries. In this paper, we show that knockoffs can be aggregated to increase power while retaining sharp control over the false discoveries. We support our method both in theory and simulations, and we show that it can lead to new discoveries on microbiome data from the American Gut Project. In particular, our results indicate that several phyla that have been overlooked so far are associated with obesity.

