naive estimator
Recently Published Documents


TOTAL DOCUMENTS

15
(FIVE YEARS 4)

H-INDEX

4
(FIVE YEARS 0)

Author(s):  
QINGYUAN ZHAO ◽  
LUKE J KEELE ◽  
DYLAN S SMALL ◽  
MARSHALL M JOFFE

We discuss some causal estimands that are used to study racial discrimination in policing. A central challenge is that not all police–civilian encounters are recorded in administrative datasets and available to researchers. One possible solution is to consider the average causal effect of race conditional on the civilian already being detained by the police. We find that such an estimand can be quite different from the more familiar ones in causal inference and needs to be interpreted with caution. We propose using an estimand that is new for this context—the causal risk ratio, which has more transparent interpretation and requires weaker identification assumptions. We demonstrate this through a reanalysis of the NYPD Stop-and-Frisk dataset. Our reanalysis shows that the naive estimator that ignores the posttreatment selection in administrative records may severely underestimate the disparity in police violence between minorities and whites in these and similar data.


2021 ◽  
Vol 5 (1) ◽  
pp. 36
Author(s):  
Gian Marco Paldino ◽  
Jacopo De Stefani ◽  
Fabrizio De Caro ◽  
Gianluca Bontempi

The availability of massive amounts of temporal data opens new perspectives of knowledge extraction and automated decision making for companies and practitioners. However, learning forecasting models from data requires a knowledgeable data science or machine learning (ML) background and expertise, which is not always available to end-users. This gap fosters a growing demand for frameworks automating the ML pipeline and ensuring broader access to the general public. Automatic machine learning (AutoML) provides solutions to build and validate machine learning pipelines minimizing the user intervention. Most of those pipelines have been validated in static supervised learning settings, while an extensive validation in time series prediction is still missing. This issue is particularly important in the forecasting community, where the relevance of machine learning approaches is still under debate. This paper assesses four existing AutoML frameworks (AutoGluon, H2O, TPOT, Auto-sklearn) on a number of forecasting challenges (univariate and multivariate, single-step and multi-step ahead) by benchmarking them against simple and conventional forecasting strategies (e.g., naive and exponential smoothing). The obtained results highlight that AutoML approaches are not yet mature enough to address generic forecasting tasks once compared with faster yet more basic statistical forecasters. In particular, the tested AutoML configurations, on average, do not significantly outperform a Naive estimator. Those results, yet preliminary, should not be interpreted as a rejection of AutoML solutions in forecasting but as an encouragement to a more rigorous validation of their limits and perspectives.


2021 ◽  
Vol 31 (4) ◽  
pp. 1-36
Author(s):  
Ran Yang ◽  
David Kent ◽  
Daniel W. Apley ◽  
Jeremy Staum ◽  
David Ruppert

Many two-level nested simulation applications involve the conditional expectation of some response variable, where the expected response is the quantity of interest, and the expectation is with respect to the inner-level random variables, conditioned on the outer-level random variables. The latter typically represent random risk factors, and risk can be quantified by estimating the probability density function (pdf) or cumulative distribution function (cdf) of the conditional expectation. Much prior work has considered a naïve estimator that uses the empirical distribution of the sample averages across the inner-level replicates. This results in a biased estimator, because the distribution of the sample averages is over-dispersed relative to the distribution of the conditional expectation when the number of inner-level replicates is finite. Whereas most prior work has focused on allocating the numbers of outer- and inner-level replicates to balance the bias/variance tradeoff, we develop a bias-corrected pdf estimator. Our approach is based on the concept of density deconvolution, which is widely used to estimate densities with noisy observations but has not previously been considered for nested simulation problems. For a fixed computational budget, the bias-corrected deconvolution estimator allows more outer-level and fewer inner-level replicates to be used, which substantially improves the efficiency of the nested simulation.


2020 ◽  
Vol 8 ◽  
pp. 409-422
Author(s):  
Alexander Clark ◽  
Nathanaël Fijalkow

Learning probabilistic context-free grammars (PCFGs) from strings is a classic problem in computational linguistics since Horning ( 1969 ). Here we present an algorithm based on distributional learning that is a consistent estimator for a large class of PCFGs that satisfy certain natural conditions including being anchored (Stratos et al., 2016 ). We proceed via a reparameterization of (top–down) PCFGs that we call a bottom–up weighted context-free grammar. We show that if the grammar is anchored and satisfies additional restrictions on its ambiguity, then the parameters can be directly related to distributional properties of the anchoring strings; we show the asymptotic correctness of a naive estimator and present some simulations using synthetic data that show that algorithms based on this approach have good finite sample behavior.


2016 ◽  
Vol 113 (50) ◽  
pp. 14277-14282 ◽  
Author(s):  
Adeline Lo ◽  
Herman Chernoff ◽  
Tian Zheng ◽  
Shaw-Hwa Lo

We propose approaching prediction from a framework grounded in the theoretical correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity that enables assessing variable sets for, preferably high, predictivity. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, due to its inflated bias for moderate sample size and its sensitivity to noisy useless variables. We demonstrate that the I-score of the PR method of VS yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound to the parameter of interest. Thus, the PR method using the I-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the I-score on real data to demonstrate the statistic’s predictive performance on sample data. We conjecture that using the partition retention and I-score can aid in finding variable sets with promising prediction rates; however, further research in the avenue of sample-based measures of predictivity is much desired.


1999 ◽  
Vol 56 (7) ◽  
pp. 1234-1240
Author(s):  
W R Gould ◽  
L A Stefanski ◽  
K H Pollock

All catch-effort estimation methods implicitly assume catch and effort are known quantities, whereas in many cases, they have been estimated and are subject to error. We evaluate the application of a simulation-based estimation procedure for measurement error models (J.R. Cook and L.A. Stefanski. 1994. J. Am. Stat. Assoc. 89: 1314-1328) in catch-effort studies. The technique involves a simulation component and an extrapolation step, hence the name SIMEX estimation. We describe SIMEX estimation in general terms and illustrate its use with applications to real and simulated catch and effort data. Correcting for measurement error with SIMEX estimation resulted in population size and catchability coefficient estimates that were substantially less than naive estimates, which ignored measurement errors in some cases. In a simulation of the procedure, we compared estimators from SIMEX with "naive" estimators that ignore measurement errors in catch and effort to determine the ability of SIMEX to produce bias-corrected estimates. The SIMEX estimators were less biased than the naive estimators but in some cases were also more variable. Despite the bias reduction, the SIMEX estimator had a larger mean squared error than the naive estimator for one of two artificial populations studied. However, our results suggest the SIMEX estimator may outperform the naive estimator in terms of bias and precision for larger populations.


1996 ◽  
Vol 10 (2) ◽  
pp. 165-186 ◽  
Author(s):  
Peter W. Glynn ◽  
Eugene W. Wong

This paper is concerned with how coupling can be used to enhance the efficiency of a certain class of terminating simulations, in Markov process settings in which the stationary distribution is known. We are able to theoretically establish that our coupling-based estimator is often more efficient than the naive estimator. In addition, we discuss extensions of our methodology to Markov process settings in which conventional coupling fails and show (for Doeblin chains) that knowledge of the stationary distribution is sometimes unnecessary.


Metrika ◽  
1994 ◽  
Vol 41 (1) ◽  
pp. 55-64 ◽  
Author(s):  
Erkii P. Liski ◽  
Song-Gui Wang

Sign in / Sign up

Export Citation Format

Share Document