Hypothesis Testing in Business Administration

The Difference

Hypothesis testing is an approach to statistical inference that is routinely taught and used. It is based on a simple idea: develop some relevant speculation about the population of individuals or things under study and determine whether data provide reasonably strong empirical evidence that the hypothesis is wrong. Consider, for example, two approaches to advertising a product. A study might be conducted to determine whether it is reasonable to assume that both approaches are equally effective. A Type I error is rejecting this speculation when in fact it is true. A Type II error is failing to reject when the speculation is false. A common practice is to test hypotheses with the Type I error probability set to 0.05 and to declare that there is a statistically significant result if the hypothesis is rejected.

There are various concerns about, limitations to, and criticisms of this approach. One criticism is the use of the term significant. Consider the goal of comparing the means of two populations of individuals. Saying that a result is significant suggests that the difference between the means is large and important. But in the context of hypothesis testing it merely means that there is empirical evidence that the means are not equal. Situations can and do arise where a result is declared significant, but the difference between the means is trivial and unimportant. Indeed, the goal of testing the hypothesis that two means are equal has been criticized based on the argument that surely the means differ at some decimal place. A simple way of dealing with this issue is to reformulate the goal. Rather than testing for equality, determine whether it is reasonable to make a decision about which group has the larger mean. The components of hypothesis-testing techniques can be used to address this issue with the understanding that the goal of testing some hypothesis has been replaced by the goal of determining whether a decision can be made about which group has the larger mean.

Another aspect of hypothesis testing that has seen considerable criticism is the notion of a p-value. Suppose some hypothesis is rejected with the Type I error probability set to 0.05. This leaves open the issue of whether the hypothesis would be rejected with Type I error probability set to 0.025 or 0.01. A p-value is the smallest Type I error probability for which the hypothesis is rejected. When comparing means, a p-value reflects the strength of the empirical evidence that a decision can be made about which has the larger mean. A concern about p-values is that they are often misinterpreted. For example, a small p-value does not necessarily mean that a large or important difference exists. Another common mistake is to conclude that if the p-value is close to zero, there is a high probability of rejecting the hypothesis again if the study is replicated. The probability of rejecting again is a function of the extent to which the hypothesis is not true, among other things. Because a p-value does not directly reflect the extent to which the hypothesis is false, it does not provide a good indication of whether a second study will provide evidence to reject it.

Confidence intervals are closely related to hypothesis-testing methods. Basically, they are intervals that contain unknown quantities with some specified probability. For example, a goal might be to compute an interval that contains the difference between two population means with probability 0.95. Confidence intervals can be used to determine whether some hypothesis should be rejected. Clearly, confidence intervals provide useful information not provided by testing hypotheses and computing a p-value. But an argument for a p-value is that it provides a perspective on the strength of the empirical evidence that a decision can be made about the relative magnitude of the parameters of interest. For example, to what extent is it reasonable to decide whether the first of two groups has the larger mean? Even if a compelling argument can be made that p-values should be completely abandoned in favor of confidence intervals, there are situations where p-values provide a convenient way of developing reasonably accurate confidence intervals. Another argument against p-values is that because they are misinterpreted by some, they should not be used. But if this argument is accepted, it follows that confidence intervals should be abandoned because they are often misinterpreted as well.

Classic hypothesis-testing methods for comparing means and studying associations assume sampling is from a normal distribution. A fundamental issue is whether nonnormality can be a source of practical concern. Based on hundreds of papers published during the last 50 years, the answer is an unequivocal yes. Granted, there are situations where nonnormality is not a practical concern, but nonnormality can have a substantial negative impact on both Type I and Type II errors. Fortunately, there is a vast literature describing how to deal with known concerns. Results based solely on some hypothesis-testing approach have clear implications about methods aimed at computing confidence intervals. Nonnormal distributions that tend to generate outliers are one source for concern. There are effective methods for dealing with outliers, but technically sound techniques are not obvious based on standard training. Skewed distributions are another concern. The combination of what are called bootstrap methods and robust estimators provides techniques that are particularly effective for dealing with nonnormality and outliers.

Classic methods for comparing means and studying associations also assume homoscedasticity. When comparing means, this means that groups are assumed to have the same amount of variance even when the means of the groups differ. Violating this assumption can have serious negative consequences in terms of both Type I and Type II errors, particularly when the normality assumption is violated as well. There is a vast literature describing how to deal with this issue in a technically sound manner.
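
The point that a significant result need not reflect an important difference can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the sales figures, group labels, and effect size are hypothetical, and the helper names `mean_and_variance` and `compare_means` are invented for this example. It uses a large-sample normal approximation that does not assume equal variances, and it reports a confidence interval alongside the p-value so the size of the difference stays visible.

```python
import random
from math import erfc, fsum, sqrt
from statistics import NormalDist


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = fsum(xs) / n
    v = fsum((x - m) ** 2 for x in xs) / (n - 1)
    return m, v


def compare_means(x, y, level=0.95):
    """Large-sample comparison of two means that allows unequal variances.

    Returns (difference, two-sided p-value, confidence interval).
    A normal approximation is used, so this is only reasonable for
    large samples.
    """
    mx, vx = mean_and_variance(x)
    my, vy = mean_and_variance(y)
    diff = mx - my
    se = sqrt(vx / len(x) + vy / len(y))
    z = diff / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, p_value, (diff - crit * se, diff + crit * se)


# Hypothetical data: approach B is better by 0.5 on a scale whose
# standard deviation is 10, i.e. a very small standardized difference.
rng = random.Random(1)
n = 100_000
sales_a = [rng.gauss(100.0, 10.0) for _ in range(n)]
sales_b = [rng.gauss(100.5, 10.0) for _ in range(n)]

diff, p_value, ci = compare_means(sales_b, sales_a)
print(f"difference = {diff:.3f}, p-value = {p_value:.2e}")
print(f"{95}% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

With samples this large the p-value is tiny, which supports a decision about which group has the larger mean, while the interval shows that the difference itself is small relative to the spread of the data.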

Privacy-Aware Distributed Hypothesis Testing

Entropy ◽ 10.3390/e22060665 ◽ 2020 ◽ Vol 22 (6) ◽ pp. 665 ◽ Cited By ~ 1

Author(s): Sreejith Sreekumar ◽ Asaf Cohen ◽ Deniz Gündüz

Keyword(s): Error Probability ◽ Type I Error ◽ Single Letter ◽ Type I ◽ Type II ◽ Type II Error ◽ Error Exponent ◽ Trade Offs ◽ Binary Hypothesis

A distributed binary hypothesis testing (HT) problem involving two parties, a remote observer and a detector, is studied. The remote observer has access to a discrete memoryless source, and communicates its observations to the detector via a rate-limited noiseless channel. The detector observes another discrete memoryless source, and performs a binary hypothesis test on the joint distribution of its own observations with those of the observer. While the goal of the observer is to maximize the type II error exponent of the test for a given type I error probability constraint, it also wants to keep a private part of its observations as oblivious to the detector as possible. Considering both equivocation and average distortion under a causal disclosure assumption as possible measures of privacy, the trade-off between the communication rate from the observer to the detector, the type II error exponent, and privacy is studied. For the general HT problem, we establish single-letter inner bounds on both the rate-error exponent-equivocation and rate-error exponent-distortion trade-offs. Subsequently, single-letter characterizations for both trade-offs are obtained (i) for testing against conditional independence of the observer’s observations from those of the detector, given some additional side information at the detector; and (ii) when the communication rate constraint over the channel is zero. Finally, we show by a counter-example that the strong converse, which holds for distributed HT without a privacy constraint, does not hold when a privacy constraint is imposed. This implies that in general, the rate-error exponent-equivocation and rate-error exponent-distortion trade-offs are not independent of the type I error probability constraint.

On the Performances of Trend and Change-Point Detection Methods for Remote Sensing Data

Remote Sensing ◽ 10.3390/rs12061008 ◽ 2020 ◽ Vol 12 (6) ◽ pp. 1008 ◽ Cited By ~ 2

Author(s): Ana Militino ◽ Mehdi Moradi ◽ M. Ugarte

Keyword(s): Error Probability ◽ Type I Error ◽ Remote Sensing Data ◽ Change Point Detection ◽ Change Points ◽ Detection Methods ◽ Type I ◽ Power Of The Test ◽ Point Detection

Detecting change-points and trends are common tasks in the analysis of remote sensing data. Over the years, many different methods have been proposed for those purposes, including (modified) Mann–Kendall and Cox–Stuart tests for detecting trends; and Pettitt, Buishand range, Buishand U, standard normal homogeneity (Snh), Meanvar, structure change (Strucchange), breaks for additive season and trend (BFAST), and hierarchical divisive (E.divisive) for detecting change-points. In this paper, we describe a simulation study based on including different artificial, abrupt changes at different time-periods of image time series to assess the performances of such methods. The power of the test, type I error probability, and mean absolute error (MAE) were used as performance criteria, although MAE was only calculated for change-point detection methods. The study reveals that if the magnitude of change (or trend slope) is high, and/or the change does not occur in the first or last time-periods, the methods generally have a high power and a low MAE. However, in the presence of temporal autocorrelation, MAE rises, and the probability of introducing false positives increases noticeably. The modified versions of the Mann–Kendall method for autocorrelated data reduce its type I error probability, but this reduction comes with a substantial loss of power. Weighing the trade-off between the power of the test and type I error probability, we conclude that the original Mann–Kendall test is generally the preferable choice. Although Mann–Kendall is not able to identify the time-period of abrupt changes, it is more reliable than other methods when detecting the existence of such changes. Finally, we look for trend/change-points in land surface temperature (LST), day and night, via monthly MODIS images in Navarre, Spain, from January 2001 to December 2018.
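
For readers unfamiliar with the original Mann–Kendall test mentioned above, the following is a minimal sketch of the textbook version, not the code used in the study. It assumes no tied values (the tie correction to the variance is omitted) and makes no adjustment for autocorrelation, so it corresponds to the unmodified test. The function name `mann_kendall` and the example series are invented for illustration.

```python
from math import erfc, sqrt


def mann_kendall(series):
    """Original Mann-Kendall trend test (no tie or autocorrelation correction).

    Returns (S, z, two-sided p-value) using the normal approximation.
    """
    n = len(series)
    # S counts concordant minus discordant pairs in time order.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = series[j] - series[i]
            s += (d > 0) - (d < 0)
    # Variance of S under the null hypothesis of no trend, assuming no ties.
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity-corrected standardized statistic.
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    p_value = erfc(abs(z) / sqrt(2))
    return s, z, p_value


# Hypothetical monthly values with an upward drift.
values = [12.1, 12.4, 11.9, 12.8, 13.0, 12.7, 13.5, 13.9, 13.6, 14.2]
s, z, p = mann_kendall(values)
print(f"S = {s}, z = {z:.2f}, p-value = {p:.4f}")
```

Because the null variance assumes independent observations, positively autocorrelated series would be expected to produce more rejections than the nominal level, which is the behaviour the modified versions try to control.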

Group sequential designs using a family of type I error probability spending functions

Statistics in Medicine ◽ 10.1002/sim.4780091207 ◽ 1990 ◽ Vol 9 (12) ◽ pp. 1439-1445 ◽ Cited By ~ 143

Author(s): Irving K. Hwang ◽ Weichung J. Shih ◽ John S. De Cani

Keyword(s): Error Probability ◽ Type I Error ◽ Type I ◽ Sequential Designs ◽ Group Sequential ◽ Group Sequential Designs

Locally adaptive change-point detection (LACPD) with applications to environmental changes

Stochastic Environmental Research and Risk Assessment ◽ 10.1007/s00477-021-02083-0 ◽ 2021

Author(s): Mehdi Moradi ◽ Manuel Montesino-SanMartin ◽ M. Dolores Ugarte ◽ Ana F. Militino

Keyword(s): Time Series ◽ Change Point ◽ Error Probability ◽ Type I Error ◽ Change Point Detection ◽ Alternative Methods ◽ Change Points ◽ Type I ◽ Point Detection

We propose an adaptive-sliding-window approach (LACPD) for the problem of change-point detection in a set of time-ordered observations. The proposed method is combined with sub-sampling techniques to compensate for the lack of enough data near the time series’ tails. Through a simulation study, we analyse its behaviour in the presence of an early/middle/late change-point in the mean, and compare its performance with some of the frequently used and recently developed change-point detection methods in terms of power, type I error probability, area under the ROC curves (AUC), absolute bias, variance, and root-mean-square error (RMSE). We conclude that LACPD outperforms other methods by maintaining a low type I error probability. Unlike some other methods, the performance of LACPD does not depend on the time index of change-points, and it generally has lower bias than other alternative methods. Moreover, in terms of variance and RMSE, it outperforms other methods when change-points are close to the time series’ tails, whereas it shows a similar (sometimes slightly poorer) performance as other methods when change-points are close to the middle of time series. Finally, we apply our proposal to two sets of real data: the well-known example of annual flow of the Nile river in Aswan, Egypt, from 1871 to 1970, and a novel remote sensing data application consisting of a 34-year time-series of satellite images of the Normalised Difference Vegetation Index in Wadi As-Sirham valley, Saudi Arabia, from 1986 to 2019. We conclude that LACPD shows a good performance in detecting the presence of a change as well as the time and magnitude of change in real conditions.

Fixed sized samples for type-specific surveillance of human papillomavirus in genital warts

Sexual Health ◽ 10.1071/sh12056 ◽ 2013 ◽ Vol 10 (1) ◽ pp. 95

Author(s): Edward K. Waters ◽ Andrew J. Hamilton ◽ Anthony M. A. Smith ◽ David J. Philp ◽ Basil Donovan ◽ ...

Keyword(s): Human Papillomavirus ◽ Sample Size ◽ Error Probability ◽ Type I Error ◽ HPV Vaccination ◽ Genital Warts ◽ Type I ◽ Post Vaccination ◽ HPV Types

Surveillance data suggest that human papillomavirus (HPV) vaccination in Australia is reducing the incidence of genital warts. However, existing surveillance measures do not assess the proportion of the remaining cases of warts that are caused by HPV types other than 6 or 11, against which the vaccine has no demonstrated effectiveness. Using computer simulation rather than sample size formulae, we established that genotyping at least 60 warts can accurately test whether the proportion of warts due to HPV types not targeted by the vaccine has increased (Type I error probability ≤0.05, Type II error probability <0.07). Standard formulae for calculating sample size, in contrast, suggest that a sample size of more than 130 would be required for this task, but using these formulae entails making several strong assumptions. Our methods require fewer assumptions and demonstrate that a smaller sample size than anticipated could be used to address the question of what proportion of post-vaccination cases of warts are due to nonvaccine types. In conjunction with indications of incidence and prevalence provided by existing surveillance measures, this could indicate the number of cases of post-vaccination warts due to nonvaccine types and hence whether type replacement is occurring.
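
The general idea of choosing a sample size by simulation rather than by formula can be sketched as follows. This is not the authors' procedure, and the proportions used here are hypothetical: `p0` stands for an assumed baseline proportion of warts due to nonvaccine types and `p1` for an assumed increased proportion. The sketch fixes the rejection threshold of a one-sided exact binomial test so that the Type I error probability stays at or below the chosen level, then estimates power by simulating studies of a given size.

```python
import random
from math import comb


def binom_upper_tail(n, c, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def critical_count(n, p0, alpha):
    """Smallest count c such that P(X >= c | p0) <= alpha."""
    for c in range(n + 2):
        if binom_upper_tail(n, c, p0) <= alpha:
            return c


def simulated_power(n, p0, p1, alpha, reps, rng):
    """Share of simulated studies of size n that reject when the truth is p1."""
    c = critical_count(n, p0, alpha)
    rejections = 0
    for _ in range(reps):
        count = sum(rng.random() < p1 for _ in range(n))
        rejections += count >= c
    return rejections / reps


# Hypothetical proportions, chosen only to make the example concrete.
rng = random.Random(0)
for n in (20, 40, 60):
    power = simulated_power(n, p0=0.10, p1=0.30, alpha=0.05, reps=2000, rng=rng)
    print(f"n = {n}: estimated power = {power:.3f}")
```

One would then pick the smallest sample size whose estimated Type II error probability (one minus the power) is acceptable for the surveillance question at hand.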

Effects of Differential Genotyping Error Rate on the Type I Error Probability of Case-Control Studies

Human Heredity ◽ 10.1159/000092553 ◽ 2006 ◽ Vol 61 (1) ◽ pp. 55-64 ◽ Cited By ~ 60

Author(s): Valentina Moskvina ◽ Nick Craddock ◽ Peter Holmans ◽ Michael J. Owen ◽ Michael C. O’Donovan

Keyword(s): Error Rate ◽ Error Probability ◽ Type I Error ◽ Case Control ◽ Type I ◽ Genotyping Error ◽ Case Control Studies ◽ Genotyping Error Rate

Effects of Number of Genetic Marker Alleles, Marker Interval, Degree of Substitution Effect and Paternal Half-sib Family Size on Type I Error Probability in the Test of the Presence of a QTL

Nihon Chikusan Gakkaiho ◽ 10.2508/chikusan.68.911 ◽ 1997 ◽ Vol 68 (10) ◽ pp. 911-916

Author(s): Kenji TOGASHI ◽ Naoyuki YAMAMOTO ◽ Osamu SASAKI

Keyword(s): Genetic Marker ◽ Family Size ◽ Error Probability ◽ Type I Error ◽ Substitution Effect ◽ Degree Of Substitution ◽ Type I ◽ Marker Interval ◽ Marker Alleles

Type I Error Probability Spending for Post-Market Drug and Vaccine Safety Surveillance With Poisson Data

Methodology And Computing In Applied Probability ◽ 10.1007/s11009-017-9586-z ◽ 2017 ◽ Vol 20 (2) ◽ pp. 739-750 ◽ Cited By ~ 3

Author(s): Ivair R. Silva

Keyword(s): Error Probability ◽ Type I Error ◽ Vaccine Safety ◽ Type I ◽ Safety Surveillance ◽ Post Market ◽ Poisson Data

Statistical Inference and Decision Making in Conservation Biology

Israel Journal of Ecology and Evolution ◽ 10.1560/ijee.57.4.309 ◽ 2010 ◽ Vol 57 (4) ◽ pp. 309-317 ◽ Cited By ~ 5

Author(s): DAVID SALTZ

Keyword(s): Decision Making ◽ Hypothesis Testing ◽ Conservation Biology ◽ Type I Error ◽ Alternative Hypothesis ◽ Type I ◽ Type II ◽ Type II Error ◽ Information Theoretic ◽ Model Inference

Since the formulation of hypothesis testing by Neyman and Pearson in 1933, the approach has been subject to continuous criticism. Yet, until recently this criticism, for the most part, has gone unheeded. The negative appraisal focuses mainly on the fact that P-values provide no evidential support for either the null hypothesis (H0) or the alternative hypothesis (Ha). Although hypothesis testing done under tightly controlled conditions can provide some insight regarding the alternative hypothesis based on the uncertainty of H0, strictly speaking, this does not constitute evidence. More importantly, well controlled research environments rarely exist in field-centered sciences such as ecology. These problems are manifestly more acute in applied field sciences, such as conservation biology, that are expected to support decision making, often under crisis conditions. In conservation biology, the consequences of a Type II error are often far worse than a Type I error. The "advantage" afforded to H0 by setting the probability of committing a Type I error (α) to a low value (0.05), in effect, increases the probability of committing a Type II error, which can lead to disastrous practical consequences. In the past decade, multi-model inference using information-theoretic or Bayesian approaches has been offered as a better alternative. These techniques allow comparing a series of models on equal grounds. Using these approaches, it is unnecessary to select a single "best" model. Rather, the parameters needed for decision making can be averaged across all models, weighted according to the support accorded each model. Here, I present a hypothetical example of animal counts that suggest a possible population decline, and analyze the data using hypothesis testing and an information-theoretic approach. A comparison between the two approaches highlights the shortcomings of hypothesis testing and advantages of multi-model inference.
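
The weighting step described above is commonly carried out with Akaike weights, and the sketch below shows that calculation. It is an illustration only: the abstract does not say which weights the author uses, and the AIC values and growth-rate estimates are hypothetical numbers for three imagined population models. The function names are also invented for this example.

```python
from math import exp


def akaike_weights(aic_values):
    """Convert AIC values into weights that sum to 1.

    Each weight is proportional to exp(-delta / 2), where delta is the
    difference between a model's AIC and the smallest AIC in the set.
    """
    best = min(aic_values)
    relative = [exp(-(aic - best) / 2) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def model_average(estimates, weights):
    """Weighted average of one parameter across all candidate models."""
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical AIC values and annual growth-rate estimates for three models
# (for example: stable population, linear decline, accelerating decline).
aic = [212.4, 210.1, 211.0]
growth_rate = [0.00, -0.04, -0.06]

weights = akaike_weights(aic)
averaged = model_average(growth_rate, weights)
print("weights:", [round(w, 3) for w in weights])
print(f"model-averaged growth rate: {averaged:.4f}")
```

No single model is declared the winner here. Every model contributes to the averaged estimate in proportion to its support, which is the contrast with reject-or-retain testing that the abstract draws.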

Teacher’s Corner: A Note on Interpretation of the Paired-Samples t Test

Journal of Educational and Behavioral Statistics ◽ 10.3102/10769986022003349 ◽ 1997 ◽ Vol 22 (3) ◽ pp. 349-360 ◽ Cited By ~ 24

Author(s): Donald W. Zimmerman

Keyword(s): Error Probability ◽ Degrees Of Freedom ◽ Type I Error ◽ T Test ◽ Experimental Designs ◽ Type I ◽ Significance Level ◽ Advantages And Disadvantages ◽ Paired Samples

Explanations of advantages and disadvantages of paired-samples experimental designs in textbooks in education and psychology frequently overlook the change in the Type I error probability which occurs when an independent-samples t test is performed on correlated observations. This alteration of the significance level can be extreme even if the correlation is small. By comparison, the loss of power of the paired-samples t test on difference scores due to reduction of degrees of freedom, which typically is emphasized, is relatively slight. Although paired-samples designs are appropriate and widely used when there is a natural correspondence or pairing of scores, researchers have not often considered the implications of undetected correlation between supposedly independent samples in the absence of explicit pairing.
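
The change in the Type I error probability can be explored with a short simulation. The sketch below is an illustration under stated assumptions rather than a reproduction of the article's analysis: both variables are standard normal with equal means, so every rejection is a Type I error; there are 25 pairs per simulated study; and the critical values 2.011 (48 degrees of freedom) and 2.064 (24 degrees of freedom) are rounded table values for a two-sided test at the 0.05 level. The function name `rejection_rates` is invented for this example.

```python
import random
from math import sqrt
from statistics import mean, variance

N_PAIRS = 25
T_CRIT_INDEP = 2.011   # rounded two-sided 5% critical value, df = 2 * 25 - 2
T_CRIT_PAIRED = 2.064  # rounded two-sided 5% critical value, df = 25 - 1


def rejection_rates(rho, reps=4000, seed=0):
    """Estimate Type I error rates of both t tests on correlated pairs.

    Returns (independent-samples rate, paired-samples rate).
    """
    rng = random.Random(seed)
    indep = paired = 0
    for _ in range(reps):
        x, y = [], []
        for _ in range(N_PAIRS):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            x.append(z1)
            # y has unit variance and correlation rho with x.
            y.append(rho * z1 + sqrt(1 - rho**2) * z2)
        diff = mean(x) - mean(y)
        # Independent-samples t statistic with pooled variance (equal n).
        pooled_var = (variance(x) + variance(y)) / 2
        t_indep = diff / sqrt(pooled_var * 2 / N_PAIRS)
        # Paired-samples t statistic on the difference scores.
        d = [a - b for a, b in zip(x, y)]
        t_paired = diff / sqrt(variance(d) / N_PAIRS)
        indep += abs(t_indep) > T_CRIT_INDEP
        paired += abs(t_paired) > T_CRIT_PAIRED
    return indep / reps, paired / reps


for rho in (-0.3, 0.0, 0.3):
    rate_indep, rate_paired = rejection_rates(rho)
    print(f"rho = {rho:+.1f}: independent = {rate_indep:.3f}, "
          f"paired = {rate_paired:.3f}")
```

In this setup the paired test should stay near the nominal 0.05 for any correlation, while the independent-samples test would be expected to reject too rarely when the correlation is positive and too often when it is negative, because its standard error ignores the covariance between the two samples.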