On Two Measure-Theoretic Aspects of the Full Bayesian Significance Test for Precise Bayesian Hypothesis Testing

2021 ◽  
Vol 3 (1) ◽  
pp. 10
Author(s):  
Riko Kelter

The Full Bayesian Significance Test (FBST) has been proposed as a convenient method to replace frequentist p-values for testing a precise hypothesis. Although the FBST enjoys various appealing properties, the purpose of this paper is to investigate two aspects of the FBST that are sometimes regarded as measure-theoretic inconsistencies of the procedure and have not been discussed rigorously in the literature. First, the FBST uses the posterior density as a reference for judging the Bayesian statistical evidence against a precise hypothesis. However, under absolutely continuous prior distributions, the posterior density is defined only up to Lebesgue null sets, which renders the reference criterion arbitrary. Second, the FBST statistical evidence seems to have no valid prior probability. It is shown that the former aspect can be circumvented by fixing a version of the posterior density before using the FBST, and that the latter aspect is based on its measure-theoretic premises. An illustrative example demonstrates the two aspects and their solution. Together, the results in this paper show that neither of the two aspects sometimes regarded as measure-theoretic inconsistencies of the FBST is tenable. The FBST thus provides a measure-theoretically coherent Bayesian alternative for testing a precise hypothesis.
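As a rough illustration of the procedure this abstract discusses, the sketch below approximates the FBST evidence against a point hypothesis in a binomial model. It is not taken from the paper. The data (14 successes in 20 trials), the uniform Beta(1, 1) prior, the flat reference function, and the function names are all assumptions made for the example. The code uses the continuous version of the Beta posterior density throughout, which is one way of fixing a version of the density before applying the test.

```python
import math


def beta_pdf(theta, a, b):
    """Continuous version of the Beta(a, b) density at theta in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(
        log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    )


def fbst_evidence_against(theta0, a, b, grid_size=100_000):
    """Approximate the posterior mass of {theta : p(theta | x) > p(theta0 | x)}.

    This is the tangential set for the point hypothesis theta = theta0 under
    a flat reference function (an assumption of this sketch). The integral is
    approximated with a midpoint rule on (0, 1).
    """
    threshold = beta_pdf(theta0, a, b)
    width = 1.0 / grid_size
    mass = 0.0
    for i in range(grid_size):
        theta = (i + 0.5) * width  # midpoint of the i-th grid cell
        density = beta_pdf(theta, a, b)
        if density > threshold:
            mass += density * width
    return mass


# Hypothetical data: 14 successes in 20 trials with a uniform Beta(1, 1) prior,
# so the posterior is Beta(15, 7).
successes, trials = 14, 20
a_post, b_post = 1 + successes, 1 + trials - successes

ev_against = fbst_evidence_against(0.5, a_post, b_post)  # evidence against theta = 0.5
ev_for = 1.0 - ev_against  # e-value in support of the hypothesis

# When theta0 is the unique posterior mode, the tangential set is empty.
ev_at_mode = fbst_evidence_against(0.5, 5, 5)
```

Changing the density on a Lebesgue null set (for example, redefining its value at `theta0` alone) would change `threshold` without changing the posterior distribution, which is the arbitrariness the abstract refers to.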

2020 ◽  
pp. 109634802094732
Author(s):  
A. George Assaf ◽  
Mike Tsionas

In hospitality and tourism research, p-values continue to be the most common approach to hypothesis testing. In this article, we elaborate on some of the misconceptions associated with p-values. We discuss the advantages of the Bayesian approach and provide several important practical recommendations and considerations for Bayesian hypothesis testing. With the main challenge of Bayesian hypothesis testing being the sensitivity of the results to prior distributions, we present in this article several priors that can be used for that purpose and illustrate their performance in a regression context.


2019 ◽  
Author(s):  
Don van Ravenzwaaij ◽  
Eric-Jan Wagenmakers

Tendeiro and Kiers (2019) provide a detailed and scholarly critique of Null Hypothesis Bayesian Testing (NHBT) and its central component, the Bayes factor, which allows researchers to update knowledge and quantify statistical evidence. Tendeiro and Kiers conclude that NHBT constitutes an improvement over frequentist p-values, but primarily elaborate on a list of eleven ‘issues’ of NHBT. In this commentary, we provide context to each issue and conclude that many issues may in fact be conceived as pronounced advantages of NHBT.


Author(s):  
Alexander Ly ◽  
Eric-Jan Wagenmakers

The “Full Bayesian Significance Test e-value”, henceforth FBST ev, has received increasing attention across a range of disciplines including psychology. We show that the FBST ev leads to four problems: (1) the FBST ev cannot quantify evidence in favor of a null hypothesis and therefore also cannot discriminate “evidence of absence” from “absence of evidence”; (2) the FBST ev is susceptible to sampling to a foregone conclusion; (3) the FBST ev violates the principle of predictive irrelevance, such that it is affected by data that are equally likely to occur under the null hypothesis and the alternative hypothesis; (4) the FBST ev suffers from the Jeffreys-Lindley paradox in that it does not include a correction for selection. These problems also plague the frequentist p-value. We conclude that although the FBST ev may be an improvement over the p-value, it does not provide a reasonable measure of evidence against the null hypothesis.


Author(s):  
Monica Kristiansen ◽  
Rune Winther ◽  
Bent Natvig

Predicting the reliability of software systems based on a component-based approach is inherently difficult, in particular due to failure dependencies between software components. One possible way to assess and include dependency aspects in software reliability models is to find upper bounds for probabilities that software components fail simultaneously and then include these into the reliability models. In earlier research, it has been shown that including partial dependency information may give substantial improvements in predicting the reliability of compound software compared to assuming independence between all software components. Furthermore, it has been shown that including dependencies between pairs of data-parallel components may give predictions close to the system's true reliability. In this paper, a Bayesian hypothesis testing approach for finding upper bounds for probabilities that pairs of software components fail simultaneously is described. This approach consists of two main steps: (1) establishing prior probability distributions for probabilities that pairs of software components fail simultaneously and (2) updating these prior probability distributions by performing statistical testing. In this paper, the focus is on the first step in the Bayesian hypothesis testing approach, and two possible procedures for establishing a prior probability distribution for the probability that a pair of software components fails simultaneously are proposed.


2021 ◽  
Author(s):  
Alexander Ly ◽  
Eric-Jan Wagenmakers

The “Full Bayesian Significance Test e-value”, henceforth FBST ev, has received increasing attention across a range of disciplines including psychology. We show that the FBST ev leads to four problems: (1) the FBST ev cannot quantify evidence in favor of a null hypothesis and therefore also cannot discriminate “evidence of absence” from “absence of evidence”; (2) the FBST ev is susceptible to sampling to a foregone conclusion; (3) the FBST ev violates the principle of predictive irrelevance, such that it is affected by data that are equally likely to occur under the null hypothesis and the alternative hypothesis; (4) the FBST ev suffers from the Jeffreys-Lindley paradox in that it does not include a correction for selection. These problems also plague the frequentist p-value. We conclude that although the FBST ev may be an improvement over the p-value, it does not provide a reasonable measure of evidence against the null hypothesis.


Author(s):  
Helena Brentani ◽  
Eduardo Y Nakano ◽  
Camila B Martins ◽  
Rafael Izbicki ◽  
Carlos Alberto Pereira

Hardy-Weinberg Equilibrium (HWE) is an important genetic property that populations should have whenever they are not subject to adverse conditions such as a complete lack of panmixia, excess of mutations, or excess of selection pressure. HWE has been evaluated for decades; both frequentist and Bayesian methods are in use today. While historically the HWE formula was developed to examine the transmission of alleles in a population from one generation to the next, use of HWE concepts has expanded in human disease studies to detect genotyping error and disease susceptibility (association) (Ryckman and Williams, 2008). Most analyses focus on whether a population is in HWE; they do not try to quantify how far from equilibrium the population is. In this paper, we propose the use of a simple disequilibrium coefficient for a locus with two alleles. Based on the posterior density of this disequilibrium coefficient, we show how one can conduct a Bayesian analysis to assess how far from HWE a population is. Other coefficients have been introduced in the literature; the advantage of the one introduced in this paper is that, just like the standard correlation coefficients, its range is bounded and it is symmetric around zero (equilibrium) when comparing positive and negative values. To test the hypothesis of equilibrium, we use a simple Bayesian significance test, the Full Bayesian Significance Test (FBST); see Pereira, Stern and Wechsler (2008) for a complete review. The proposed disequilibrium coefficient provides an easy and efficient way to carry out the analyses, especially if one uses Bayesian statistics. An R routine (R Development Core Team, 2009) that implements the calculations is provided for readers.


Dose-Response ◽  
2017 ◽  
Vol 15 (2) ◽  
pp. 155932581771531
Author(s):  
Steven B. Kim ◽  
Nathan Sanders

For many dose–response studies, large samples are not available. In particular, when the outcome of interest is binary rather than continuous, a large sample size is required to provide evidence for hormesis at low doses. In a small or moderate sample, we can gain statistical power by using a parametric model. This is an efficient approach when the model is correctly specified, but it can be misleading otherwise. This research is motivated by the fact that data points at high experimental doses contribute too much to the hypothesis test when a parametric model is misspecified. In dose–response analyses, averaging multiple models to account for model uncertainty and to reduce the impact of model misspecification has been widely discussed in the literature. In this article, we propose averaging semiparametric models when testing for hormesis at low doses. We show the different characteristics of averaging parametric models and averaging semiparametric models by simulation. We apply the proposed method to real data, and we show that P values from averaged semiparametric models are more credible than P values from averaged parametric methods. When the true dose–response relationship does not follow a parametric assumption, the proposed method can be a robust alternative approach.

