A Critical Evaluation of the FBST ev for Bayesian Hypothesis Testing

2021
Author(s):
Alexander Ly
Eric-Jan Wagenmakers

The “Full Bayesian Significance Test e-value”, henceforth FBST ev, has received increasing attention across a range of disciplines including psychology. We show that the FBST ev leads to four problems: (1) the FBST ev cannot quantify evidence in favor of a null hypothesis and therefore also cannot discriminate “evidence of absence” from “absence of evidence”; (2) the FBST ev is susceptible to sampling to a foregone conclusion; (3) the FBST ev violates the principle of predictive irrelevance, such that it is affected by data that are equally likely to occur under the null hypothesis and the alternative hypothesis; (4) the FBST ev suffers from the Jeffreys-Lindley paradox in that it does not include a correction for selection. These problems also plague the frequentist p-value. We conclude that although the FBST ev may be an improvement over the p-value, it does not provide a reasonable measure of evidence against the null hypothesis.
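To see what “sampling to a foregone conclusion” means in practice, here is a minimal simulation sketch (plain NumPy/SciPy; the peeking schedule and sample sizes are illustrative assumptions, not taken from the paper). It demonstrates the phenomenon with the p-value, which the authors argue shares this defect:

```python
# Sketch: an analyst who tests after every batch of observations and
# stops as soon as p < .05 will eventually "succeed" under H0 far more
# often than the nominal 5% rate suggests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_max, alpha, n_sims = 1000, 0.05, 2000
false_rejections = 0

for _ in range(n_sims):
    x = rng.normal(0.0, 1.0, n_max)      # data generated under H0
    for n in range(10, n_max + 1, 10):   # peek every 10 observations
        if stats.ttest_1samp(x[:n], 0.0).pvalue < alpha:
            false_rejections += 1        # "foregone conclusion" reached
            break

print(f"Rejection rate under H0 with optional stopping: "
      f"{false_rejections / n_sims:.2f} (nominal level: {alpha})")
```

The realized rate is several times the nominal level, which is the sense in which a measure susceptible to optional stopping cannot be trusted as evidence.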


Author(s):  
Patrick W. Kraft
Ellen M. Key
Matthew J. Lebo

Abstract Grant and Lebo (2016) and Keele et al. (2016) clarify the conditions under which the popular general error correction model (GECM) can be used and interpreted easily: In a bivariate GECM the data must be integrated in order to rely on the error correction coefficient, $\alpha_1^\ast$, to test cointegration and measure the rate of error correction between a single exogenous x and a dependent variable, y. Here we demonstrate that even if the data are all integrated, the test on $\alpha_1^\ast$ is misunderstood when there is more than a single independent variable. The null hypothesis is that there is no cointegration between y and any x but the correct alternative hypothesis is that y is cointegrated with at least one—but not necessarily more than one—of the x's. A significant $\alpha_1^\ast$ can occur when some I(1) regressors are not cointegrated and the equation is not balanced. Thus, the correct limiting distributions of the right-hand-side long-run coefficients may be unknown. We use simulations to demonstrate the problem and then discuss implications for applied examples.
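A minimal sketch of the interpretive problem, under an assumed data-generating process (not the authors' simulation design): y is cointegrated with x1 only, yet the GECM also includes an unrelated I(1) regressor x2, and the t-test on $\alpha_1^\ast$ still fires.

```python
# Sketch: GECM with two I(1) regressors, where y shares a long-run
# relation with x1 only. Illustrative DGP, not the paper's design.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T = 200
x1 = np.cumsum(rng.normal(size=T))   # I(1), cointegrated with y
x2 = np.cumsum(rng.normal(size=T))   # I(1), unrelated to y
y = 0.8 * x1 + rng.normal(size=T)    # long-run relation with x1 only

# GECM: dy_t = a0 + alpha_1* y_{t-1} + b1 x1_{t-1} + b2 x2_{t-1}
#              + g1 dx1_t + g2 dx2_t + e_t
dy = np.diff(y)
X = sm.add_constant(np.column_stack([
    y[:-1], x1[:-1], x2[:-1], np.diff(x1), np.diff(x2)]))
fit = sm.OLS(dy, X).fit()

# A "significant" alpha_1* says only that y is cointegrated with at
# least one of the regressors; it says nothing about x2.
print(f"alpha_1* = {fit.params[1]:.3f}, t = {fit.tvalues[1]:.2f}")
```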


2021
Vol 111 (4)
Author(s):
Gergely Bunth
Péter Vrana

Abstract Pairs of states, or “boxes”, are the basic objects in the resource theory of asymmetric distinguishability (Wang and Wilde in Phys Rev Res 1(3):033170, 2019. 10.1103/PhysRevResearch.1.033170), where free operations are arbitrary quantum channels that are applied to both states. From this point of view, hypothesis testing is seen as a process by which a standard form of distinguishability is distilled. Motivated by the more general problem of quantum state discrimination, we consider boxes of a fixed finite number of states and study an extension of the relative submajorization preorder to such objects. In this relation, a tuple of positive operators is greater than another if there is a completely positive trace-nonincreasing map under which the image of the first tuple satisfies certain semidefinite constraints relative to the other one. This preorder characterizes error probabilities in the case of testing a composite null hypothesis against a simple alternative hypothesis, as well as certain error probabilities in state discrimination. We present a sufficient condition for the existence of catalytic transformations between boxes, and a characterization of an associated asymptotic preorder, both expressed in terms of sandwiched Rényi divergences. This characterization of the asymptotic preorder directly shows that the strong converse exponent for a composite null hypothesis is equal to the maximum of the corresponding exponents for the pairwise simple hypothesis testing tasks.
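For reference, the sandwiched Rényi divergence appearing in this characterization is standardly defined, for density operators $\rho$ and $\sigma$ and $\alpha \in (1, \infty)$ (the range relevant to strong converse exponents), as

$$\widetilde{D}_{\alpha}(\rho \,\|\, \sigma) \;=\; \frac{1}{\alpha - 1}\,\log \operatorname{Tr}\!\left[\left(\sigma^{\frac{1-\alpha}{2\alpha}}\, \rho\, \sigma^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right].
$$

How this definition extends to the tuples of positive operators considered in the paper is part of the paper's contribution and is not reproduced here.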


Author(s):  
Vlad I. Roșca

Abstract Widespread belief posits that a relationship exists between results obtained in European football competitions and live attendances at domestic league games. As part of the Europeanization process, international tournaments increasingly attract fans’ attention, often at the expense of national competitions, yet research to date has focused on a wide array of explanatory variables for game attendance (spectator demand) and less on how domestic teams perform in Europe. This article aims to fill that gap by asking whether match attendances in national leagues can be predicted from the results obtained by domestic club teams in international competitions. UEFA team coefficients and domestic attendance figures for 74 European cup participations of Romanian teams, spread over seventeen seasons from 2000/2001 to 2016/2017, serve as input data for a regression model with an F-test and a p-value test. The null hypothesis states that no relationship exists between the variables; the results reject it in favor of the alternative hypothesis. The Discussion section presents the effects that winning or losing in European cups can have on fans’ motivation to attend matches in the national league.
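A sketch of the kind of model described, on synthetic data (the variable names, effect size, and noise level are illustrative assumptions, not the study's data):

```python
# Sketch: regress league attendance on a UEFA coefficient and read off
# the regression F-statistic and p-value, as in the study's design.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 74  # mirrors the 74 cup participations in the study
uefa_coeff = rng.uniform(0.5, 30.0, n)                    # hypothetical scores
attendance = 2000 + 250 * uefa_coeff + rng.normal(0, 2000, n)

model = sm.OLS(attendance, sm.add_constant(uefa_coeff)).fit()
print(f"F = {model.fvalue:.1f}, p = {model.f_pvalue:.4f}, "
      f"slope = {model.params[1]:.1f}")
```

A small p-value for the F-test is what "invalidating the null hypothesis in favor of the alternative" amounts to in this setup.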


Author(s):  
Richard McCleary
David McDowall
Bradley J. Bartos

Chapter 6 addresses the sub-category of internal validity defined by Shadish et al. as statistical conclusion validity, or “validity of inferences about the correlation (covariance) between treatment and outcome.” The common threats to statistical conclusion validity can arise, or become plausible, through either model misspecification or hypothesis testing. The risk of a serious model misspecification, for example, is inversely proportional to the length of the time series, and so is the risk of misstating the Type I and Type II error rates. Threats to statistical conclusion validity arise from both the classical and the modern hybrid significance testing structures; serious threats that weigh heavily in p-value tests are shown to be undefined in Bayesian tests. While the particularly vexing threats raised by modern null hypothesis testing could be resolved by eliminating the modern null hypothesis test itself, other threats to statistical conclusion validity would inevitably persist and new ones would arise.
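A brief illustration of the misspecification point, under an assumed AR(1) process (an illustrative simulation, not the chapter's own example): parameter estimates from short series are badly biased, which cascades into misstated Type I and Type II error rates.

```python
# Sketch: the OLS estimate of an AR(1) coefficient is biased downward
# in short series; the bias shrinks as the series lengthens.
import numpy as np

rng = np.random.default_rng(2)
phi = 0.8  # true autoregressive parameter

for T in (25, 100, 400):
    estimates = []
    for _ in range(2000):
        y = np.zeros(T)
        for t in range(1, T):
            y[t] = phi * y[t - 1] + rng.normal()
        # slope of y_t on y_{t-1} estimates phi
        estimates.append(np.polyfit(y[:-1], y[1:], 1)[0])
    print(f"T={T}: mean estimate of phi = {np.mean(estimates):.3f}")
```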


Author(s):  
M. D. Edge

Interval estimation is the attempt to define intervals that quantify the degree of uncertainty in an estimate. The standard deviation of an estimate is called a standard error. Confidence intervals are designed to cover the true value of an estimand with a specified probability. Hypothesis testing is the attempt to assess the degree of evidence for or against a specific hypothesis. One tool for frequentist hypothesis testing is the p value, or the probability that if the null hypothesis is in fact true, the data would depart as extremely or more extremely from expectations under the null hypothesis than they were observed to do. In Neyman–Pearson hypothesis testing, the null hypothesis is rejected if p is less than a pre-specified value, often chosen to be 0.05. A test’s power function gives the probability that the null hypothesis is rejected given the significance level γ, a sample size n, and a specified alternative hypothesis. This chapter discusses some limitations of hypothesis testing as commonly practiced in the research literature.
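As a concrete illustration, the power function of a two-sided one-sample z-test can be computed directly (a standard textbook formula, not code from the chapter):

```python
# Sketch: power of a two-sided one-sample z-test as a function of the
# significance level (gamma), sample size n, and effect size delta.
from scipy.stats import norm

def power(delta, n, gamma=0.05, sigma=1.0):
    """P(reject H0 | true mean offset = delta)."""
    z_crit = norm.ppf(1 - gamma / 2)
    shift = delta * n**0.5 / sigma
    return norm.cdf(-z_crit + shift) + norm.cdf(-z_crit - shift)

for n in (10, 50, 200):
    print(n, round(power(delta=0.3, n=n), 3))
# Power grows with n: roughly 0.16, 0.56, 0.99 for delta = 0.3.
```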


2014
Vol 8 (4)
Author(s):
Yin Zhang
Ingo Neumann

Abstract Deformation monitoring usually focuses on detecting whether the monitored objects satisfy given properties (e.g. being stable or not) and on making further decisions that minimise the risks, for example the consequences and costs should artificial objects collapse or natural hazards occur. With this intention, a methodology relying on hypothesis testing and utility theory is reviewed in this paper. The main idea of utility theory is to judge each possible outcome with a utility value. The presented methodology makes it possible to minimise the risk of an individual monitoring project by considering the costs and consequences of all possible situations within the decision process. It is not the danger that the monitored object may collapse that is reduced; rather, the risk (the utility values multiplied by the danger) can be described more appropriately, and therefore more valuable decisions can be made. In particular, the opportunity to minimise risk through the design of the measurement process is a key issue. The application of the methodology to two classical cases in hypothesis testing is discussed in detail: 1) both probability density functions (pdfs) of the tested objects, under the null and the alternative hypothesis, are known; 2) only the pdf under the null hypothesis is known and the alternative hypothesis is treated as its pure negation. Afterwards, a practical example in deformation monitoring is introduced and analysed. Finally, the way in which the magnitudes of the utility values (the consequences of a decision) influence the decision is considered.
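A minimal sketch of case 1 (both pdfs known), with illustrative priors, pdfs, and utility values (all numbers are assumptions, not from the paper): the decision maximizes expected utility given the measurement, so a costly miss can trigger an alarm even when the “stable” hypothesis is more probable.

```python
# Sketch: expected-utility decision between "alarm" and "no alarm"
# given a displacement measurement x, with both pdfs known.
from scipy.stats import norm

prior_h0 = 0.95              # P(object stable)
pdf_h0 = norm(0.0, 1.0)      # displacement under H0 (stable)
pdf_h1 = norm(5.0, 1.0)      # displacement under H1 (deforming)

# utility[decision][true state]: a false alarm is cheap,
# missing a real deformation is very costly.
utility = {"no alarm": {"H0": 0.0,   "H1": -1000.0},
           "alarm":    {"H0": -10.0, "H1": -1.0}}

def decide(x):
    # posterior over the two states given the measurement x
    w0 = prior_h0 * pdf_h0.pdf(x)
    w1 = (1 - prior_h0) * pdf_h1.pdf(x)
    p0 = w0 / (w0 + w1)
    eu = {d: p0 * u["H0"] + (1 - p0) * u["H1"]
          for d, u in utility.items()}
    return max(eu, key=eu.get)

for x in (0.5, 2.0, 3.0):
    print(x, "->", decide(x))
```

At x = 3.0 the stable state is still the more probable one, yet the rule alarms, because the utility matrix weights the consequences of a miss far more heavily; this is the sense in which risk, not danger, drives the decision.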


2021
Vol 70 (2)
pp. 123-133
Author(s):
Norbert Hirschauer
Sven Grüner
Oliver Mußhoff
Claudia Becker

It has often been noted that the “null-hypothesis-significance-testing” (NHST) framework is an inconsistent hybrid of Neyman-Pearson’s “hypothesis testing” and Fisher’s “significance testing” that almost inevitably causes misinterpretations. To facilitate a realistic assessment of the potential and the limits of statistical inference, we briefly recall widespread inferential errors and outline the two original approaches of these famous statisticians. Based on the understanding of their irreconcilable perspectives, we propose “going back to the roots” and using the initial evidence in the data in terms of the size and the uncertainty of the estimate for the purpose of statistical inference. Finally, we make six propositions that hopefully contribute to improving the quality of inferences in future research.
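A minimal sketch of that recommendation on synthetic two-group data (the numbers are illustrative): report the size of the estimate together with its uncertainty, and leave the significance verdict aside.

```python
# Sketch: estimate and uncertainty instead of an NHST verdict.
import numpy as np

rng = np.random.default_rng(3)
a = rng.normal(10.0, 2.0, 40)
b = rng.normal(11.0, 2.0, 40)

diff = b.mean() - a.mean()
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
ci = (diff - 1.96 * se, diff + 1.96 * se)  # approximate 95% interval
print(f"estimated difference = {diff:.2f}, SE = {se:.2f}, "
      f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```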


Author(s):  
D. Brynn Hibbert
J. Justin Gooding

• To understand the concept of the null hypothesis and the role of Type I and Type II errors. • To test that data are normally distributed and whether a datum is an outlier. • To determine whether there is systematic error in the mean of measurement results. • To perform tests to compare the means of two sets of data.… One of the uses to which data analysis is put is to answer questions about the data, or about the system that the data describes. In the former category are “is the data normally distributed?” and “are there any outliers in the data?” (see the discussions in chapter 1). Questions about the system might be “is the level of alcohol in the suspect’s blood greater than 0.05 g/100 mL?” or “does the new sensor give the same results as the traditional method?” In answering these questions we determine the probability of finding the data given the truth of a stated hypothesis, hence “hypothesis testing.” A hypothesis is a statement that might, or might not, be true. Usually the hypothesis is set up in such a way that it is possible to calculate the probability (P) of the data (or the test statistic calculated from the data) given the hypothesis, and then to make a decision about whether the hypothesis is to be accepted (high P) or rejected (low P). A particular case of a hypothesis test is one that determines whether or not the difference between two values is significant: a significance test. For this case we actually put forward the hypothesis that there is no real difference and that the observed difference arises from random effects: it is called the null hypothesis (H₀). If the probability that the data are consistent with the null hypothesis falls below a predetermined low value (say 0.05 or 0.01), then the hypothesis is rejected at that probability. Therefore, p < 0.05 means that if the null hypothesis were true we would find the observed data (or, more accurately, the value of the statistic calculated from the data, or greater) in less than 5% of repeated experiments.
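A minimal sketch of such a significance test, using the sensor-vs-traditional-method scenario from the passage with made-up readings:

```python
# Sketch: H0 says the two methods have the same mean; a low p-value
# leads us to reject H0 at the chosen level.
from scipy import stats

traditional = [0.048, 0.051, 0.049, 0.050, 0.052]
new_sensor  = [0.053, 0.055, 0.052, 0.054, 0.056]

t, p = stats.ttest_ind(traditional, new_sensor)
print(f"t = {t:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Reject H0 at the 0.05 level: the means differ.")
else:
    print("Fail to reject H0: no significant difference detected.")
```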


2018
Vol 10 (4)
pp. 53
Author(s):
Héctor A. Cepeda-Freyre
Gregorio Garcia-Aguilar
J. Jacobo Oliveros-Oliveros

In cognitive science, working memory is a core cognitive ability that might be functionally related to other capacities, such as perceptual processes, inhibitory control, memory and attention processes, and executive functions. The mathematical study of working memory has been explored before. However, there is not enough research studying the relationship between working memory and inhibitory control, which is the objective of the present report. Bayesian hypothesis testing is often more robust than traditional p-value null hypothesis testing, yet the number of studies using this approach is still limited. A secondary objective of this paper is to contribute to filling that gap and to provide an empirical application of Bayesian hypothesis testing using cognitive and behavioral data. A within-subjects design was used to measure working memory function for three types of visual stimuli that varied in the degree of attentional interference they were designed to elicit. The data collected were contrasted with measurements of inhibitory control and analyzed using Bayes’ theorem. Our results provide evidence against the theoretical relationship of working memory and inhibitory control. This outcome is analyzed in light of related cognitive research.
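A generic sketch of a Bayes-factor analysis of this kind, using the common BIC approximation BF01 ≈ exp((BIC1 − BIC0)/2) on synthetic data (the variable names are hypothetical and this is not the paper's analysis):

```python
# Sketch: compare a null model (no relationship) against a model that
# adds the predictor, via the BIC approximation to the Bayes factor.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 60
inhibitory_control = rng.normal(size=n)   # hypothetical predictor
working_memory = rng.normal(size=n)       # unrelated under H0

null_model = sm.OLS(working_memory, np.ones(n)).fit()          # intercept only
alt_model = sm.OLS(working_memory,
                   sm.add_constant(inhibitory_control)).fit()  # adds predictor

bf01 = np.exp((alt_model.bic - null_model.bic) / 2)
print(f"BF01 = {bf01:.1f}  (BF01 > 1 favors the null)")
```

Unlike a nonsignificant p-value, a BF01 well above 1 quantifies evidence *for* the null, which is the kind of conclusion the study draws.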

