The Principle of Predictive Irrelevance, or Why Intervals Should Not be Used for Model Comparison Featuring a Point Null Hypothesis

Mapping Intimacies ◽

10.31234/osf.io/rqnu5 ◽

2019 ◽

Author(s):

Eric-Jan Wagenmakers ◽

Michael David Lee ◽

Jeffrey N. Rouder ◽

Richard Donald Morey

Keyword(s):

Null Hypothesis ◽

Model Comparison ◽

Broad Class ◽

Interval Estimation ◽

Credible Interval ◽

Estimation Methods ◽

Data Set ◽

Credible Intervals ◽

Point Null Hypothesis ◽

Normal Mean

The principle of predictive irrelevance states that when two competing models predict a data set equally well, that data set cannot be used to discriminate the models and --for that specific purpose-- the data set is evidentially irrelevant. To highlight the ramifications of the principle, we first show how a single binomial observation can be irrelevant in the sense that it carries no evidential value for discriminating the null hypothesis $\theta = 1/2$ from a broad class of alternative hypotheses that allow $\theta$ to be between 0 and 1. In contrast, the Bayesian credible interval suggests that a single binomial observation does provide some evidence against the null hypothesis. We then generalize this paradoxical result to infinitely long data sequences that are predictively irrelevant throughout. Examples feature a test of a binomial rate and a test of a normal mean. These maximally uninformative data (MUD) sequences yield credible intervals and confidence intervals that are certain to exclude the point of test as the sequence lengthens. The resolution of this paradox requires the insight that interval estimation methods --and, consequently, p values-- may not be used for model comparison involving a point null hypothesis.
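
The single-observation case in this abstract can be illustrated with a minimal sketch. It is not code from the paper. It assumes a symmetric Beta(a, a) prior on $\theta$ as one instance of the "broad class" of alternatives, and it reports a central credible interval only for the uniform prior, where the quantiles have closed forms. The function names are hypothetical.

```python
from fractions import Fraction
from math import sqrt


def marginal_likelihood_h1(y, a, b):
    """P(y | H1) for one Bernoulli observation y under a Beta(a, b) prior.

    For a single observation this is just the prior mean of theta (for y = 1)
    or one minus the prior mean (for y = 0).
    """
    a, b = Fraction(a), Fraction(b)
    prior_mean = a / (a + b)
    return prior_mean if y == 1 else 1 - prior_mean


def bf01_single_bernoulli(y, a, b):
    """Bayes factor for H0: theta = 1/2 against H1: theta ~ Beta(a, b)."""
    return Fraction(1, 2) / marginal_likelihood_h1(y, a, b)


def central_interval_uniform_prior(y, mass=0.95):
    """Central credible interval after one observation, uniform prior only.

    The posterior is Beta(2, 1) for y = 1 (CDF x**2) and Beta(1, 2) for y = 0
    (CDF 1 - (1 - x)**2), so the quantiles are square roots.
    """
    tail = (1 - mass) / 2
    lo, hi = sqrt(tail), sqrt(1 - tail)  # quantiles of Beta(2, 1)
    return (lo, hi) if y == 1 else (1 - hi, 1 - lo)


if __name__ == "__main__":
    for a in (Fraction(1, 2), 1, 5):
        print("a =", a, "BF01 =", bf01_single_bernoulli(1, a, a))
    print(central_interval_uniform_prior(1))
```

Under any symmetric prior both models assign probability 1/2 to the observation, so the Bayes factor is exactly 1, while the posterior interval under the alternative is no longer symmetric about 1/2.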

Conflicts in Bayesian Statistics Between Inference Based on Credible Intervals and Bayes Factors

Journal of Modern Applied Statistical Methods ◽

10.22237/jmasm/1556670540 ◽

2020 ◽

Vol 18 (1) ◽

pp. 2-27

Author(s):

Miodrag M. Lovric

Keyword(s):

Hypothesis Testing ◽

Null Hypothesis ◽

Bayes Factor ◽

Credible Interval ◽

Test Point ◽

Type I ◽

Null Hypothesis Testing ◽

Frequentist Statistics ◽

Credible Intervals ◽

Point Null Hypothesis

In frequentist statistics, point-null hypothesis testing based on significance tests and on confidence intervals are harmonious procedures that lead to the same conclusion. This is not the case in the Bayesian framework. An inference about the point-null hypothesis made using the Bayes factor may lead to the opposite conclusion when it is instead based on the Bayesian credible interval. Bayesian suggestions to test point-nulls using credible intervals are misleading and should be dismissed. A null hypothesized value may lie outside a credible interval but be supported by the Bayes factor (a Type I conflict) or, conversely, the null value may lie inside a credible interval but not be supported by the Bayes factor (a Type II conflict). Two computer programs in R have been developed that confirm the existence of a countably infinite number of cases for which Bayesian credible intervals are not compatible with Bayesian hypothesis testing.
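
A Type I conflict of the kind described here can be reproduced in a small sketch. This is not the author's R code. It assumes a normal mean with known standard deviation `sigma`, a point null at 0, and a N(0, `tau`^2) prior on the mean under the alternative; all names and the numbers in the usage lines are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist


def bf01_normal_mean(xbar, n, sigma=1.0, tau=1.0):
    """Bayes factor for H0: mu = 0 against H1: mu ~ N(0, tau**2).

    Computed as the ratio of the marginal densities of the sample mean:
    N(0, sigma**2 / n) under H0 and N(0, tau**2 + sigma**2 / n) under H1.
    """
    se = sigma / sqrt(n)
    m0 = NormalDist(0.0, se).pdf(xbar)
    m1 = NormalDist(0.0, sqrt(tau**2 + se**2)).pdf(xbar)
    return m0 / m1


def credible_interval_normal_mean(xbar, n, sigma=1.0, tau=1.0, mass=0.95):
    """Central posterior interval for mu under H1 (conjugate normal update)."""
    precision = n / sigma**2 + 1 / tau**2
    post_mean = (n / sigma**2) * xbar / precision
    post_sd = sqrt(1 / precision)
    z = NormalDist().inv_cdf(0.5 + mass / 2)
    return post_mean - z * post_sd, post_mean + z * post_sd


if __name__ == "__main__":
    # Assumed illustration: a large sample whose mean sits two standard errors from 0.
    n, xbar = 10_000, 0.02
    print("BF01:", bf01_normal_mean(xbar, n))
    print("95% interval:", credible_interval_normal_mean(xbar, n))
```

With these assumed inputs the Bayes factor favours the null while the 95% credible interval lies entirely above 0, which is the pattern the abstract labels a Type I conflict.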

Estimation of Multicomponent Reliability Based on Progressively Type II Censored Data from Unit Weibull Distribution

WSEAS TRANSACTIONS ON MATHEMATICS ◽

10.37394/23206.2021.20.30 ◽

2021 ◽

Vol 20 ◽

pp. 288-299

Author(s):

Refah Mohammed Alotaibi ◽

Yogesh Mani Tripathi ◽

Sanku Dey ◽

Hoda Ragab Rezk

Keyword(s):

Information Matrix ◽

Credible Interval ◽

Estimation Methods ◽

Type II ◽

Parametric Function ◽

Data Set ◽

Common Parameter ◽

Bayesian Procedures ◽

MH Algorithm ◽

Type II Censored Data

In this paper, inference on stress-strength reliability is considered for unit-Weibull distributions with a common parameter, under the assumption that data are observed using progressive type II censoring. We obtain different estimators of system reliability using classical and Bayesian procedures. An asymptotic interval is constructed based on the Fisher information matrix. In addition, boot-p and boot-t intervals are obtained. We evaluate Bayes estimates using Lindley's technique and the Metropolis-Hastings (MH) algorithm. The Bayes credible interval is evaluated using the MH method. An unbiased estimator of this parametric function is also obtained for the case of a known common parameter. Numerical simulations are performed to compare estimation methods. Finally, a data set is studied for illustration purposes.

The Principle of Predictive Irrelevance or Why Intervals Should Not be Used for Model Comparison Featuring a Point Null Hypothesis

The Theory of Statistics in Psychology ◽

10.1007/978-3-030-48043-1_8 ◽

2020 ◽

pp. 111-129

Author(s):

Eric-Jan Wagenmakers ◽

Michael D. Lee ◽

Jeffrey N. Rouder ◽

Richard D. Morey

Keyword(s):

Null Hypothesis ◽

Model Comparison ◽

Point Null Hypothesis

Some Remarks on Rao and Lovric's 'Testing the Point Null Hypothesis of a Normal Mean and the Truth: 21st Century Perspective'

Journal of Modern Applied Statistical Methods ◽

10.22237/jmasm/1478001780 ◽

2016 ◽

Vol 15 (2) ◽

pp. 33-40 ◽

Cited By ~ 2

Author(s):

Bruce D. Zumbo ◽

Edward Kroc

Keyword(s):

21st Century ◽

Null Hypothesis ◽

Point Null Hypothesis ◽

Normal Mean

Testing Point Null Hypothesis of a Normal Mean and the Truth: 21st Century Perspective

Journal of Modern Applied Statistical Methods ◽

10.22237/jmasm/1478001660 ◽

2016 ◽

Vol 15 (2) ◽

pp. 2-21 ◽

Cited By ~ 9

Author(s):

Calyampudi Radhakrishna Rao ◽

Miodrag M. Lovric

Keyword(s):

21st Century ◽

Null Hypothesis ◽

Point Null Hypothesis ◽

Normal Mean

Evaluation for estimating of the PDF and the CDF of Generalized Inverted Exponential Distribution with Application in Industry

Advances in Mathematics: Scientific Journal ◽

10.37418/amsj.9.1.39 ◽

2020 ◽

pp. 507-522

Author(s):

Parisa Torkaman

Keyword(s):

Least Squares ◽

Exponential Distribution ◽

Mean Squared Error ◽

Weighted Least Squares ◽

Real Data ◽

Minimum Variance ◽

Cumulative Distribution ◽

Estimation Methods ◽

Data Set ◽

Better Than

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. In this paper, the estimation of its probability density function and cumulative distribution function is considered with five different estimation methods: uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS) and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulation, based on the mean squared error (MSE). Simulation studies show that the UMVU estimator performs better than the others and that, when the sample size is large enough, the ML and UMVU estimators are almost equivalent and more efficient than LS, WLS and PC. Finally, the results for a real data set are analyzed.

Detecting Heterogeneity of Substitution Along DNA and Protein Sequences

Genetics ◽

10.1093/genetics/143.1.589 ◽

1996 ◽

Vol 143 (1) ◽

pp. 589-602 ◽

Cited By ~ 2

Author(s):

Peter J E Goss ◽

R C Lewontin

Keyword(s):

Null Hypothesis ◽

Nonuniform Distribution ◽

Critical Values ◽

Third Order ◽

Variance Test ◽

Data Set ◽

Significance Level ◽

Uniformly Most Powerful Test ◽

Interval Lengths ◽

Uniformly Most Powerful

Regions of differing constraint, mutation rate or recombination along a sequence of DNA or amino acids lead to a nonuniform distribution of polymorphism within species or fixed differences between species. The power of five tests to reject the null hypothesis of a uniform distribution is studied for four classes of alternate hypothesis. The tests explored are the variance of interval lengths; a modified variance test, which includes covariance between neighboring intervals; the length of the longest interval; the length of the shortest third-order interval; and a composite test. Although there is no uniformly most powerful test over the range of alternate hypotheses tested, the variance and modified variance tests usually have the highest power. Therefore, we recommend that one of these two tests be used to test departure from uniformity in all circumstances. Tables of critical values for the variance and modified variance tests are given. The critical values depend both on the number of events and the number of positions in the sequence. A computer program is available on request that calculates both the critical values for a specified number of events and number of positions as well as the significance level of a given data set.
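
The idea of a variance-of-interval-lengths test can be sketched as follows. This is a rough illustration only, not the authors' program, and it does not reproduce their statistic or their tables of critical values. It assumes that intervals are the gaps between consecutive events including both sequence ends, that the statistic is the population variance of those gaps, and that the significance level is approximated by Monte Carlo simulation of uniformly placed events. All function names are hypothetical.

```python
import random
from statistics import pvariance


def interval_lengths(events, n_positions):
    """Gaps between consecutive events, including both sequence ends (an assumption)."""
    points = [0] + sorted(events) + [n_positions]
    return [b - a for a, b in zip(points, points[1:])]


def variance_statistic(events, n_positions):
    """Population variance of the interval lengths."""
    return pvariance(interval_lengths(events, n_positions))


def monte_carlo_p_value(events, n_positions, n_sim=2000, seed=0):
    """Share of uniform placements whose variance is at least the observed one."""
    rng = random.Random(seed)
    observed = variance_statistic(events, n_positions)
    k = len(events)
    hits = 0
    for _ in range(n_sim):
        # Place k events uniformly at distinct interior positions.
        simulated = rng.sample(range(1, n_positions), k)
        if variance_statistic(simulated, n_positions) >= observed:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    print(monte_carlo_p_value([48, 49, 50, 51], 100))  # clustered events
    print(monte_carlo_p_value([20, 40, 60, 80], 100))  # evenly spaced events
```

As the abstract notes for the tabulated critical values, the simulated reference distribution here depends on both the number of events and the number of positions.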

A complete search for redshift z ≳ 6.5 quasars in the VIKING survey

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/staa3808 ◽

2020 ◽

Vol 501 (2) ◽

pp. 1663-1676

Author(s):

R Barnett ◽

S J Warren ◽

N J G Cross ◽

D J Mortlock ◽

X Fan ◽

...

Keyword(s):

Model Comparison ◽

High Redshift ◽

Detailed Comparison ◽

Selection Function ◽

Data Set ◽

Bayesian Model Comparison ◽

Fitting Method ◽

Complete Search ◽

Potential Contaminant ◽

Ranked List

We present the results of a new, deeper, and complete search for high-redshift 6.5 < z < 9.3 quasars over 977 deg² of the VISTA Kilo-Degree Infrared Galaxy (VIKING) survey. This exploits a new list-driven data set providing photometry in all bands Z, Y, J, H, Ks, for all sources detected by VIKING in J. We use the Bayesian model comparison (BMC) selection method of Mortlock et al., producing a ranked list of just 21 candidates. The sources ranked 1, 2, 3, and 5 are the four known z > 6.5 quasars in this field. Additional observations of the other 17 candidates, primarily DESI Legacy Survey photometry and ESO FORS2 spectroscopy, confirm that none is a quasar. This is the first complete sample from the VIKING survey, and we provide the computed selection function. We include a detailed comparison of the BMC method against two other selection methods: colour cuts and minimum-χ² SED fitting. We find that: (i) BMC produces eight times fewer false positives than colour cuts, while also reaching 0.3 mag deeper, (ii) the minimum-χ² SED-fitting method is extremely efficient but reaches 0.7 mag less deep than the BMC method, and selects only one of the four known quasars. We show that BMC candidates, rejected because their photometric SEDs have high χ² values, include bright examples of galaxies with very strong [O iii] λλ4959,5007 emission in the Y band, identified in fainter surveys by Matsuoka et al. This is a potential contaminant population in Euclid searches for faint z > 7 quasars, not previously accounted for, and that requires better characterization.

Information-Theoretic Generalization Bounds for Meta-Learning and Applications

Entropy ◽

10.3390/e23010126 ◽

2021 ◽

Vol 23 (1) ◽

pp. 126

Author(s):

Sharu Theresa Jose ◽

Osvaldo Simeone

Keyword(s):

Learning Algorithm ◽

Broad Class ◽

Performance Measure ◽

Training Data ◽

Learning To Learn ◽

Data Set ◽

Information Theoretic ◽

Meta Learning ◽

Task Training ◽

Test Sets

Meta-learning, or “learning to learn”, refers to techniques that infer an inductive bias from data corresponding to multiple related tasks with the goal of improving the sample efficiency for new, previously unobserved, tasks. A key performance measure for meta-learning is the meta-generalization gap, that is, the difference between the average loss measured on the meta-training data and on a new, randomly selected task. This paper presents novel information-theoretic upper bounds on the meta-generalization gap. Two broad classes of meta-learning algorithms are considered that use either separate within-task training and test sets, like model agnostic meta-learning (MAML), or joint within-task training and test sets, like reptile. Extending the existing work for conventional learning, an upper bound on the meta-generalization gap is derived for the former class that depends on the mutual information (MI) between the output of the meta-learning algorithm and its input meta-training data. For the latter, the derived bound includes an additional MI between the output of the per-task learning procedure and corresponding data set to capture within-task uncertainty. Tighter bounds are then developed for the two classes via novel individual task MI (ITMI) bounds. Applications of the derived bounds are finally discussed, including a broad class of noisy iterative algorithms for meta-learning.

Properties and methods of estimation for a bivariate exponentiated Fréchet distribution

Mathematica Slovaca ◽

10.1515/ms-2017-0426 ◽

2020 ◽

Vol 70 (5) ◽

pp. 1211-1230

Author(s):

Abdus Saboor ◽

Hassan S. Bakouch ◽

Fernando A. Moala ◽

Sheraz Hussain

Keyword(s):

Maximum Likelihood ◽

Probability Density Function ◽

Probability Density ◽

Density Function ◽

Superior Performance ◽

Estimation Methods ◽

Conditional Probability Density ◽

Data Set ◽

Fréchet Distribution

In this paper, a bivariate extension of the exponentiated Fréchet distribution is introduced, namely a bivariate exponentiated Fréchet (BvEF) distribution whose marginals are univariate exponentiated Fréchet distributions. Several properties of the proposed distribution are discussed, such as the joint survival function, joint probability density function, marginal probability density function, conditional probability density function, moments, and marginal and bivariate moment generating functions. Moreover, the proposed distribution is obtained by the Marshall-Olkin survival copula. Estimation of the parameters is investigated by maximum likelihood with the observed information matrix. In addition to the maximum likelihood estimation method, we consider Bayesian inference and least squares estimation and compare these three methodologies for the BvEF. A simulation study is carried out to compare the performance of the estimators obtained by the presented estimation methods. The proposed bivariate distribution and other related bivariate distributions are fitted to a real-life paired data set. It is shown that the BvEF distribution has superior performance among the compared distributions using several tests of goodness-of-fit.
