Uncertainty and Inference for Verification Measures

2007 ◽  
Vol 22 (3) ◽  
pp. 637-650 ◽  
Author(s):  
Ian T. Jolliffe

Abstract When a forecast is assessed, a single value for a verification measure is often quoted. This is of limited use, as it needs to be complemented by some idea of the uncertainty associated with the value. If this uncertainty can be quantified, it is then possible to make statistical inferences based on the value observed. There are two main types of inference: confidence intervals can be constructed for an underlying “population” value of the measure, or hypotheses can be tested regarding the underlying value. This paper will review the main ideas of confidence intervals and hypothesis tests, together with the less well known “prediction intervals,” concentrating on aspects that are often poorly understood. Comparisons will be made between different methods of constructing confidence intervals—exact, asymptotic, bootstrap, and Bayesian—and the difference between prediction intervals and confidence intervals will be explained. For hypothesis testing, multiple testing will be briefly discussed, together with connections between hypothesis testing, prediction intervals, and confidence intervals.
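To make the bootstrap approach concrete, here is a minimal Python sketch (not from the paper; the data and the choice of proportion correct as the verification measure are illustrative assumptions) of a percentile bootstrap confidence interval for a measure computed from forecast-observation pairs:

```python
import numpy as np

rng = np.random.default_rng(42)

def proportion_correct(forecasts, observations):
    """Verification measure: fraction of binary forecasts matching observations."""
    return np.mean(forecasts == observations)

# Illustrative binary forecast/observation data (assumed, not from the paper).
forecasts = rng.integers(0, 2, size=200)
observations = (forecasts + (rng.random(200) < 0.3)) % 2  # ~70% agreement

# Percentile bootstrap: resample pairs with replacement, recompute the measure.
B = 10_000
n = len(forecasts)
stats = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    stats[b] = proportion_correct(forecasts[idx], observations[idx])

point = proportion_correct(forecasts, observations)
lo, hi = np.percentile(stats, [2.5, 97.5])
print(f"proportion correct = {point:.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```

Resampling forecast-observation pairs (rather than forecasts and observations separately) preserves their joint dependence, which is what the verification measure summarizes.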

Author(s):  
Rand R. Wilcox

Hypothesis testing is an approach to statistical inference that is routinely taught and used. It is based on a simple idea: develop some relevant speculation about the population of individuals or things under study and determine whether the data provide reasonably strong empirical evidence that the hypothesis is wrong. Consider, for example, two approaches to advertising a product. A study might be conducted to determine whether it is reasonable to assume that both approaches are equally effective. A Type I error is rejecting this speculation when in fact it is true; a Type II error is failing to reject it when the speculation is false. A common practice is to test hypotheses with the Type I error probability set to 0.05 and to declare a statistically significant result if the hypothesis is rejected.

There are various concerns about, limitations to, and criticisms of this approach. One criticism is the use of the term significant. Consider the goal of comparing the means of two populations of individuals. Saying that a result is significant suggests that the difference between the means is large and important, but in the context of hypothesis testing it merely means that there is empirical evidence that the means are not equal. Situations can and do arise where a result is declared significant but the difference between the means is trivial and unimportant. Indeed, the goal of testing the hypothesis that two means are equal has been criticized on the grounds that surely the means differ at some decimal place. A simple way of dealing with this issue is to reformulate the goal: rather than testing for equality, determine whether it is reasonable to make a decision about which group has the larger mean. The components of hypothesis-testing techniques can be used to address this issue, with the understanding that the goal of testing some hypothesis has been replaced by the goal of determining whether a decision can be made about which group has the larger mean.

Another aspect of hypothesis testing that has seen considerable criticism is the notion of a p-value. Suppose some hypothesis is rejected with the Type I error probability set to 0.05. This leaves open the question of whether the hypothesis would be rejected with the Type I error probability set to 0.025 or 0.01. A p-value is the smallest Type I error probability for which the hypothesis is rejected. When comparing means, a p-value reflects the strength of the empirical evidence that a decision can be made about which group has the larger mean. A concern about p-values is that they are often misinterpreted. For example, a small p-value does not necessarily mean that a large or important difference exists. Another common mistake is to conclude that if the p-value is close to zero, there is a high probability of rejecting the hypothesis again if the study is replicated. The probability of rejecting again is a function of the extent to which the hypothesis is not true, among other things. Because a p-value does not directly reflect the extent to which the hypothesis is false, it does not provide a good indication of whether a second study would provide evidence to reject it.

Confidence intervals are closely related to hypothesis-testing methods. Basically, they are intervals that contain unknown quantities with some specified probability; for example, a goal might be to compute an interval that contains the difference between two population means with probability 0.95. Confidence intervals can be used to determine whether some hypothesis should be rejected. Clearly, confidence intervals provide useful information not provided by testing hypotheses and computing a p-value. But an argument for a p-value is that it provides a perspective on the strength of the empirical evidence that a decision can be made about the relative magnitude of the parameters of interest: for example, to what extent is it reasonable to decide whether the first of two groups has the larger mean? Even if a compelling argument can be made that p-values should be completely abandoned in favor of confidence intervals, there are situations where p-values provide a convenient way of developing reasonably accurate confidence intervals. Another argument against p-values is that because they are misinterpreted by some, they should not be used; but if this argument is accepted, it follows that confidence intervals should be abandoned as well, because they too are often misinterpreted.

Classic hypothesis-testing methods for comparing means and studying associations assume sampling is from a normal distribution. A fundamental issue is whether nonnormality can be a source of practical concern. Based on hundreds of papers published during the last 50 years, the answer is an unequivocal yes. Granted, there are situations where nonnormality is not a practical concern, but nonnormality can have a substantial negative impact on both Type I and Type II errors. Fortunately, there is a vast literature describing how to deal with known concerns, and results based on hypothesis-testing approaches have clear implications for methods aimed at computing confidence intervals. Nonnormal distributions that tend to generate outliers are one source of concern. There are effective methods for dealing with outliers, but technically sound techniques are not obvious based on standard training. Skewed distributions are another concern. The combination of bootstrap methods and robust estimators provides techniques that are particularly effective for dealing with nonnormality and outliers, as sketched below.

Classic methods for comparing means and studying associations also assume homoscedasticity: when comparing means, groups are assumed to have the same variance even when their means differ. Violating this assumption can have serious negative consequences for both Type I and Type II errors, particularly when the normality assumption is violated as well. There is a vast literature describing how to deal with this issue in a technically sound manner.
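As a concrete illustration of the bootstrap-plus-robust-estimator approach the abstract recommends, the following sketch (with invented data; the 20% trimming proportion and group sizes are illustrative assumptions) computes a percentile bootstrap confidence interval for the difference between two 20% trimmed means:

```python
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(0)

# Illustrative skewed samples prone to outliers (assumed, not real data).
group1 = rng.lognormal(mean=0.0, sigma=1.0, size=40)
group2 = rng.lognormal(mean=0.3, sigma=1.0, size=40)

def trimmed_diff(x, y, cut=0.2):
    """Difference of 20% trimmed means: a location contrast robust to outliers."""
    return trim_mean(x, cut) - trim_mean(y, cut)

# Percentile bootstrap: resample each group independently, with replacement.
B = 5_000
boot = np.empty(B)
for b in range(B):
    bx = rng.choice(group1, size=group1.size, replace=True)
    by = rng.choice(group2, size=group2.size, replace=True)
    boot[b] = trimmed_diff(bx, by)

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"estimate = {trimmed_diff(group1, group2):.3f}, "
      f"95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
# If the interval excludes zero, a decision can be made about which group
# has the larger (trimmed) location, in the decision-making spirit above.
```

Nothing here assumes normality or homoscedasticity, which is why this combination handles the skewness and outlier problems the abstract describes.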


2021 ◽  
pp. 71-84
Author(s):  
Andy Hector

This chapter extends the use of linear models to relationships with continuous explanatory variables, in other words, linear regression. The goal of the worked example (on timber hardness data) given in detail in this chapter is prediction, not hypothesis testing. Confidence intervals and prediction intervals are explained. Graphical approaches to checking the assumptions of linear-model analysis are explored in further detail. The effects of transformations on linearity, normality, and equality of variance are investigated.
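The distinction between confidence and prediction intervals is easy to see numerically. Here is a minimal sketch using statsmodels, with synthetic data standing in for the chapter's timber hardness example (which is not reproduced here):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Synthetic data: a linear trend with normal noise (illustrative only).
x = np.linspace(20, 60, 36)
y = 5.0 + 2.0 * x + rng.normal(scale=8.0, size=x.size)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Predictions at new x values, with both interval types.
x_new = sm.add_constant(np.array([30.0, 50.0]))
pred = fit.get_prediction(x_new).summary_frame(alpha=0.05)

# mean_ci_* bounds the fitted mean response (confidence interval);
# obs_ci_* bounds a single new observation (prediction interval), so it
# also includes the residual scatter and is always wider.
print(pred[["mean", "mean_ci_lower", "mean_ci_upper",
            "obs_ci_lower", "obs_ci_upper"]])
```

The confidence interval narrows as the sample grows, while the prediction interval cannot shrink below the spread of individual observations around the line.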


2017 ◽  
Vol 23 (2) ◽  
pp. 33
Author(s):  
José W. Camero Jiménez ◽  
Jahaziel G. Ponce Sánchez

Current methods for estimating the mean are based on the confidence interval of the average, or sample mean. This paper aims to help the analyst choose which estimator (average or median) to use depending on the sample size. To this end, normally distributed samples and confidence intervals for both estimators were generated via simulation in Excel, and hypothesis tests for the difference of proportions are used to show which method is better depending on the sample size. Keywords: sample size, confidence interval, average, median.
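A small simulation in the spirit of the paper's Excel experiment, re-expressed here as an illustrative Python sketch (the sample sizes and replication count are our assumptions), compares the sampling variability of the mean and the median for normal data:

```python
import numpy as np

rng = np.random.default_rng(7)

# For each sample size, estimate the Monte Carlo standard error of the
# sample mean and the sample median under a standard normal distribution.
for n in (10, 30, 100, 500):
    reps = 20_000
    samples = rng.normal(size=(reps, n))
    se_mean = samples.mean(axis=1).std()
    se_median = np.median(samples, axis=1).std()
    print(f"n={n:4d}  SE(mean)={se_mean:.4f}  SE(median)={se_median:.4f}  "
          f"ratio={se_median / se_mean:.2f}")

# Under normality the ratio approaches sqrt(pi/2) ~ 1.25, so intervals based
# on the mean are narrower; with heavy-tailed data the ordering can flip.
```

A narrower sampling distribution translates directly into shorter confidence intervals, which is the comparison the paper carries out.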


2019 ◽  
Author(s):  
Amanda Kay Montoya ◽  
Andrew F. Hayes

Researchers interested in testing mediation often use designs where participants are measured on a dependent variable Y and a mediator M in both of two different circumstances. The dominant approach to assessing mediation in such a design, proposed by Judd, Kenny, and McClelland (2001), relies on a series of hypothesis tests about components of the mediation model and is not based on an estimate of or formal inference about the indirect effect. In this paper we recast Judd et al.'s approach in the path-analytic framework that is now commonly used in between-participant mediation analysis. Doing so makes it apparent how to estimate the indirect effect of a within-participant manipulation on some outcome through a mediator as the product of paths of influence. This path-analytic approach eliminates the need for discrete hypothesis tests about components of the model to support a claim of mediation, as Judd et al.'s method requires, because it relies only on an inference about the product of paths, the indirect effect. We generalize methods of inference for the indirect effect widely used in between-participant designs to this within-participant version of mediation analysis, including bootstrap confidence intervals and Monte Carlo confidence intervals. Using this path-analytic approach, we extend the method to models with multiple mediators operating in parallel and serially and discuss the comparison of indirect effects in these more complex models. We offer macros and code for SPSS, SAS, and Mplus that conduct these analyses.
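The Monte Carlo confidence interval for an indirect effect is simple to sketch. Assuming path estimates a and b with their standard errors already in hand (the numbers below are invented for illustration; real analyses would use the authors' SPSS/SAS/Mplus macros), one simulates the product ab:

```python
import numpy as np

rng = np.random.default_rng(123)

# Illustrative path estimates and standard errors (invented numbers).
a, se_a = 0.40, 0.10   # effect of the manipulation on the mediator M
b, se_b = 0.35, 0.12   # effect of M on the outcome Y

# Monte Carlo CI: draw each path from its estimated sampling distribution
# and take percentiles of the simulated products a*b (the indirect effect).
# Here a and b are drawn independently; if their estimates covary, a joint
# multivariate normal draw with the estimated covariance should be used.
draws = 100_000
ab = rng.normal(a, se_a, draws) * rng.normal(b, se_b, draws)
lo, hi = np.percentile(ab, [2.5, 97.5])
print(f"indirect effect = {a * b:.3f}, "
      f"95% Monte Carlo CI = ({lo:.3f}, {hi:.3f})")
```

Like the bootstrap, this respects the nonnormal sampling distribution of a product of estimates, which is why both are preferred to normal-theory intervals for indirect effects.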


1992 ◽  
Vol 6 (4) ◽  
pp. 119-144 ◽  
Author(s):  
Lars E. O Svensson

How do exchange rate bands work compared to completely fixed rates (between realignments); or, more precisely, what are the dynamics of exchange rates, interest rates, and central bank interventions within exchange rate bands? Does the difference between bands and completely fixed exchange rates matter, and if so, which of the two arrangements is better; or, more precisely, what are the tradeoffs that determine the optimal bandwidth? This article presents an interpretation of selected recent theoretical and empirical research on exchange rate target zones, with emphasis on main ideas and results and without technical detail.


2014 ◽  
Vol 26 (2) ◽  
pp. 598-614 ◽  
Author(s):  
Julia Poirier ◽  
GY Zou ◽  
John Koval

Cluster randomization trials, in which intact social units are randomized to different interventions, have become popular in the last 25 years. Outcomes from these trials are in many cases positively skewed, following approximately lognormal distributions. When inference focuses on the difference between treatment-arm arithmetic means, existing confidence interval procedures either make restrictive assumptions or are complex to implement. We approach this problem by assuming that log-transformed outcomes from each treatment arm follow a one-way random effects model. The treatment-arm means are functions of multiple parameters for which separate confidence intervals are readily available, suggesting that the method of variance estimates recovery may be applied to obtain closed-form confidence intervals. A simulation study showed that this simple approach performs well with small sample sizes in terms of empirical coverage, relatively balanced tail errors, and interval widths, as compared with existing methods. The methods are illustrated using data arising from a cluster randomization trial investigating a critical pathway for the treatment of community-acquired pneumonia.
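To give a flavor of the method of variance estimates recovery (MOVER), here is an illustrative sketch for a single lognormal sample, ignoring the clustering handled in the paper. The arithmetic mean of a lognormal variable is exp(mu + sigma^2/2), so MOVER combines separate confidence intervals for mu and for sigma^2/2 into one interval for their sum, which is then exponentiated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(99)

# Illustrative lognormal outcomes (no clustering; the paper's setting adds
# a one-way random effects model on the log scale).
y = rng.lognormal(mean=1.0, sigma=0.8, size=50)
logy = np.log(y)
n = logy.size
xbar, s2 = logy.mean(), logy.var(ddof=1)

alpha = 0.05
# CI for mu: normal-theory t interval on the log scale.
half = stats.t.ppf(1 - alpha / 2, n - 1) * np.sqrt(s2 / n)
l1, u1 = xbar - half, xbar + half
# CI for sigma^2 / 2: chi-square interval for the variance, halved.
l2 = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, n - 1) / 2
u2 = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, n - 1) / 2

# MOVER: recover an interval for theta = mu + sigma^2/2 from the two pieces.
theta = xbar + s2 / 2
L = theta - np.sqrt((xbar - l1) ** 2 + (s2 / 2 - l2) ** 2)
U = theta + np.sqrt((u1 - xbar) ** 2 + (u2 - s2 / 2) ** 2)

# Exponentiate to get a closed-form CI for the arithmetic mean.
print(f"mean estimate = {np.exp(theta):.3f}, "
      f"95% MOVER CI = ({np.exp(L):.3f}, {np.exp(U):.3f})")
```

The appeal is that each component interval can use whatever exact or approximate theory suits that parameter, and the recovery step assembles them without simulation.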

