Nonparametric bootstrap inference for the targeted highly adaptive least absolute shrinkage and selection operator (LASSO) estimator

2020, Vol 0 (0)
Author(s): Weixin Cai, Mark van der Laan

Abstract
The Highly Adaptive least absolute shrinkage and selection operator (LASSO) Targeted Minimum Loss Estimator (HAL-TMLE) is an efficient plug-in estimator of a pathwise differentiable parameter in a statistical model that minimally (and possibly only) assumes that the sectional variation norms of the true nuisance functions (i.e., the relevant parts of the data distribution) are finite. It relies on an initial estimator (HAL-MLE) of the nuisance functions obtained by minimizing the empirical risk over the parameter space under the constraint that the sectional variation norm of the candidate functions is bounded by a constant, where this constant can be selected with cross-validation. In this article we establish that the nonparametric bootstrap for the HAL-TMLE, with the sectional variation norm fixed at a value larger than or equal to the cross-validation selector, provides a consistent method for estimating the normal limit distribution of the HAL-TMLE. To optimize the finite-sample coverage of the nonparametric bootstrap confidence intervals, we propose a selection method for this sectional variation norm: run the nonparametric bootstrap for all values of the sectional variation norm larger than the one selected by cross-validation, and then determine a value at which the width of the resulting confidence intervals reaches a plateau. We demonstrate our method for 1) nonparametric estimation of the average treatment effect when observing a covariate vector, binary treatment, and outcome, and 2) nonparametric estimation of the integral of the square of the multivariate density of the data distribution. In addition, we present simulation results for these two examples demonstrating the excellent finite-sample coverage of the bootstrap-based confidence intervals.
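The plateau rule for picking the variation-norm bound can be sketched with a generic percentile bootstrap. The following is a minimal illustration in Python, assuming a simple scalar estimator in place of the HAL-TMLE; the function names and the relative-change tolerance rule are hypothetical stand-ins, not the authors' implementation:

```python
import numpy as np

def percentile_ci(samples, alpha=0.05):
    """Percentile bootstrap confidence interval from bootstrap replicates."""
    lo, hi = np.percentile(samples, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

def bootstrap_ci_width(data, estimator, n_boot=500, rng=None):
    """Nonparametric bootstrap: resample with replacement, re-estimate,
    and return the width of the percentile confidence interval."""
    rng = np.random.default_rng(rng)
    n = len(data)
    reps = np.array([estimator(data[rng.integers(0, n, n)]) for _ in range(n_boot)])
    lo, hi = percentile_ci(reps)
    return hi - lo

def plateau_select(norm_grid, widths, tol=0.02):
    """Pick the smallest norm bound at which the CI width stops changing:
    the first grid point whose width differs from the next by < tol (relative)."""
    for i in range(len(widths) - 1):
        if abs(widths[i + 1] - widths[i]) < tol * abs(widths[i]):
            return norm_grid[i]
    return norm_grid[-1]
```

In practice one would compute `bootstrap_ci_width` at each candidate norm bound above the cross-validation selector and feed the resulting widths into `plateau_select`.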

Genetics, 1998, Vol 148 (1), pp. 525-535
Author(s): Claude M Lebreton, Peter M Visscher

Abstract
Several nonparametric bootstrap methods are tested to obtain better confidence intervals for quantitative trait loci (QTL) positions, i.e., with minimal width and unbiased coverage probability. Two selective resampling schemes are proposed as a means of conditioning the bootstrap on the number of genetic factors in our model inferred from the original data. The selection is based on criteria related to the estimated number of genetic factors, and only the retained bootstrap samples contribute a value to the empirically estimated distribution of the QTL position estimate. These schemes are compared with a nonselective scheme across a range of simple configurations of one QTL on a one-chromosome genome. In particular, the effects of chromosome length and of the relative position of the QTL are examined for a given experimental power, which determines the confidence interval size. With the test protocol used, the selective resampling schemes are either unbiased or least biased when the QTL is situated near the middle of the chromosome. When the QTL is closer to one end, the likelihood curve of its position along the chromosome becomes truncated, and the nonselective scheme then performs better, in that the percentage of estimated confidence intervals that actually contain the real QTL position is closer to expectation. The nonselective method, however, produces larger confidence intervals. Hence, we advocate use of the selective methods, regardless of the QTL position along the chromosome (to reduce confidence interval sizes), but we leave open the problem of how the method should be altered to take into account the bias of the original estimate of the QTL position.
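The selective resampling idea (keep only bootstrap samples whose inferred number of genetic factors matches the original data) can be sketched generically. A hedged Python sketch, where `estimate_position` and `infer_n_factors` are hypothetical placeholders for the QTL mapping and model-size criterion of the paper:

```python
import numpy as np

def selective_bootstrap(data, estimate_position, infer_n_factors,
                        n_keep=200, rng=None):
    """Resample with replacement until n_keep bootstrap samples pass the
    selection criterion: the inferred number of genetic factors must match
    that of the original data. Returns a percentile 95% CI of the position."""
    rng = np.random.default_rng(rng)
    target = infer_n_factors(data)
    n = len(data)
    kept = []
    while len(kept) < n_keep:
        boot = data[rng.integers(0, n, n)]
        if infer_n_factors(boot) == target:  # condition on model size
            kept.append(estimate_position(boot))
    return np.percentile(kept, [2.5, 97.5])
```

Setting `infer_n_factors` to a constant recovers the nonselective scheme, since every bootstrap sample is then retained.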


2020, Vol 9 (1)
Author(s): Shirley X. Liao, Lucas Henneman, Cory Zigler

Abstract
Marginal structural models (MSM) with inverse probability weighting (IPW) are used to estimate causal effects of time-varying treatments, but they can exhibit erratic finite-sample performance when there is low overlap in covariate distributions across different treatment patterns. Modifications to IPW that target the average treatment effect (ATE) estimand either introduce bias or rely on unverifiable parametric assumptions and extrapolation. This paper extends an alternate estimand, the ATE on the overlap population (ATO), which is estimated on the sub-population with a reasonable probability of receiving alternate treatment patterns, to time-varying treatment settings. To estimate the ATO within an MSM framework, this paper extends a stochastic pruning method based on the posterior predictive treatment assignment (PPTA) (Zigler, C. M., and M. Cefalu. 2017. "Posterior Predictive Treatment Assignment for Estimating Causal Effects with Limited Overlap." eprint arXiv:1710.08749.) as well as a weighting analog (Li, F., K. L. Morgan, and A. M. Zaslavsky. 2018. "Balancing Covariates via Propensity Score Weighting." Journal of the American Statistical Association 113: 390–400, https://doi.org/10.1080/01621459.2016.1260466.) to the time-varying treatment setting. Simulations demonstrate the performance of these extensions compared against IPW and stabilized weighting with regard to bias, efficiency, and coverage. Finally, an analysis using these methods is performed on Medicare beneficiaries residing across 18,480 ZIP codes in the U.S. to evaluate the effect of coal-fired power plant emissions exposure on ischemic heart disease (IHD) hospitalization, accounting for seasonal patterns that lead to changes in treatment over time.
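The single-time-point overlap (ATO) weights of Li et al. (2018) are simple to compute: treated units get 1 − e(x) and controls get e(x), down-weighting units with extreme propensities. Below is a minimal Python sketch, plus a naive product-over-time extension shown purely for illustration; this is one hedged reading of extending ATO weighting to time-varying treatments, not the paper's PPTA-based method:

```python
import numpy as np

def overlap_weights(treatment, propensity):
    """ATO ('overlap') weights at a single time point: treated units are
    weighted by 1 - e(x), controls by e(x)."""
    treatment = np.asarray(treatment)
    propensity = np.asarray(propensity)
    return np.where(treatment == 1, 1.0 - propensity, propensity)

def time_varying_overlap_weights(treat_matrix, prop_matrix):
    """Naive illustrative extension: multiply per-time-point overlap weights
    across time (rows = units, columns = time points)."""
    w = np.ones(treat_matrix.shape[0])
    for t in range(treat_matrix.shape[1]):
        w *= overlap_weights(treat_matrix[:, t], prop_matrix[:, t])
    return w
```

Note how a unit with propensity 0.9 that was actually treated receives weight 0.1: units whose observed pattern was nearly deterministic contribute little, which is what concentrates estimation on the overlap population.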


2013, Vol 805-806, pp. 1948-1951
Author(s): Tian Jin

The non-homogeneous Poisson model has been applied to various situations, including air pollution data. In this paper, we propose a kernel-based nonparametric estimator for fitting non-homogeneous Poisson process data. We show that our proposed estimator is consistent and asymptotically normally distributed, and we examine its finite-sample properties in a simulation study.
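A standard kernel estimator of a non-homogeneous Poisson intensity places a scaled kernel at each observed event time. A minimal Python sketch with a Gaussian kernel (the abstract does not specify the kernel or bandwidth choice, so both are assumptions here):

```python
import numpy as np

def kernel_intensity(event_times, t_grid, bandwidth):
    """Kernel intensity estimate for a non-homogeneous Poisson process:
    lambda_hat(t) = (1/h) * sum_i K((t - t_i) / h), Gaussian kernel K."""
    event_times = np.asarray(event_times)
    u = (t_grid[:, None] - event_times[None, :]) / bandwidth
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return K.sum(axis=1) / bandwidth
```

Since each kernel integrates to one, the estimated intensity integrates (approximately) to the observed number of events, which is a quick sanity check on any implementation.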


1997, Vol 47 (11), pp. 1197-1203
Author(s): P. Steven Porter, S. Trivikrama Rao, Jia-Yeong Ku, Richard L. Poirot, Maxine Dakins

Hydrology, 2020, Vol 7 (3), pp. 65
Author(s): Evangelos Rozos, Panayiotis Dimitriadis, Katerina Mazi, Spyridon Lykoudis, Antonis Koussis

Image velocimetry is a popular remote sensing method, mainly because of the very modest cost of the necessary equipment. However, image velocimetry methods employ parameters whose appropriate values require high expertise to select in order to obtain accurate surface flow velocity estimates. This raises concerns about the subjectivity introduced by the definition of the parameter values and its impact on the estimated surface velocity. Alternatively, a statistical approach can be employed instead of directly selecting a value for each image velocimetry parameter: first, a probability distribution is defined for each model parameter, and then Monte Carlo simulations are employed. In this paper, we demonstrate how this statistical approach can be used to simultaneously produce confidence intervals for the estimated surface velocity, reduce the uncertainty of some parameters (more specifically, the size of the interrogation area), and reduce the subjectivity. Since image velocimetry algorithms are CPU-intensive, an alternative random number generator that yields the confidence intervals within a limited number of iterations is suggested. The case study indicated that, if the statistical approach is applied diligently, one can achieve this threefold objective.
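The Monte Carlo procedure described above (sample each uncertain parameter from a distribution, re-run the estimator, summarize the spread) can be sketched generically. In this hedged Python sketch the velocity estimator and the parameter distributions are hypothetical placeholders, and a standard pseudo-random generator is used rather than the paper's alternative generator:

```python
import numpy as np

def velocity_ci(estimate_velocity, param_dists, n_sim=1000, alpha=0.05, rng=None):
    """Monte Carlo propagation of parameter uncertainty: draw each image
    velocimetry parameter from its distribution, re-run the (hypothetical)
    velocity estimator, and report a percentile confidence interval."""
    rng = np.random.default_rng(rng)
    draws = {name: dist(rng, n_sim) for name, dist in param_dists.items()}
    v = np.array([estimate_velocity(**{k: draws[k][i] for k in draws})
                  for i in range(n_sim)])
    return np.percentile(v, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```

For example, treating the interrogation-area size as uniformly distributed over a plausible range and passing it through the estimator yields a velocity interval rather than a single subjective point value.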


2012, Vol 11, pp. CIN.S9048
Author(s): Shuhei Kaneko, Akihiro Hirakawa, Chikuma Hamada

Mining gene expression data to identify genes associated with patient survival is an ongoing problem in cancer prognostic studies using microarrays, with the aim of using such genes to achieve more accurate prognoses. The least absolute shrinkage and selection operator (lasso) is often used for gene selection and parameter estimation in high-dimensional microarray data. The lasso shrinks some of the coefficients to zero, and the amount of shrinkage is determined by a tuning parameter, often chosen by cross-validation. The model determined by this cross-validation contains many false positives, i.e., selected genes whose true coefficients are zero. We propose a method for estimating the false positive rate (FPR) of lasso estimates in a high-dimensional Cox model. We performed a simulation study to examine the precision of the FPR estimate obtained by the proposed method, and we applied the proposed method to real data to illustrate the identification of false-positive genes.


2019, Vol 29 (7), pp. 1913-1934
Author(s): Jenny Jeyarajah, Guanhao Wei, Gengsheng Qin

In this paper, we propose empirical likelihood methods based on influence function and jackknife techniques to construct confidence intervals for quantile medical costs with censored data. We show that the influence-function-based empirical log-likelihood ratio statistic for the quantile medical cost has a standard chi-square distribution as its asymptotic distribution. Simulation studies are conducted to compare the coverage probabilities and interval lengths of the proposed empirical likelihood confidence intervals with those of existing normal-approximation-based confidence intervals for quantile medical costs. The proposed methods are observed to have better finite-sample performance than existing methods. The new methods are also illustrated through a real example.
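The jackknife ingredient of such methods replaces the data with pseudo-values, n·T(all) − (n−1)·T(leave-one-out), which can then be treated as approximately i.i.d. inputs to an empirical likelihood or variance calculation. A minimal Python sketch of the pseudo-value step only (the full empirical likelihood ratio for censored quantile costs is not reproduced here):

```python
import numpy as np

def jackknife_pseudovalues(data, statistic):
    """Jackknife pseudo-values: n * T(all) - (n - 1) * T(leave-one-out).
    For the sample mean, the pseudo-values equal the observations themselves."""
    data = np.asarray(data)
    n = len(data)
    theta_full = statistic(data)
    loo = np.array([statistic(np.delete(data, i)) for i in range(n)])
    return n * theta_full - (n - 1) * loo
```

The mean-statistic identity noted in the docstring is a convenient correctness check before applying the transform to less tractable statistics such as quantiles of censored costs.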

