Statistical power and design requirements for environmental monitoring

1991 ◽  
Vol 42 (5) ◽  
pp. 555 ◽  
Author(s):  
PG Fairweather

This paper discusses, from a philosophical perspective, the reasons for considering the power of any statistical test used in environmental biomonitoring. Power is inversely related to the probability of making a Type II error (i.e. low power implies a high probability of Type II error). In the context of environmental monitoring, a Type II error is made when it is concluded that no environmental impact has occurred even though one has. Type II errors have been neglected relative to Type I errors (the mistake of concluding that there is an impact when one has not occurred), whose rates are stipulated by the α values of the tests. Power depends on the value of α, the sample size used in the test, the effect size to be detected, and the variability inherent in the data. Although the ideas behind statistical power have been known for years, only recently have they attracted the attention of ecologists, and only recently have methods become available for calculating power easily. Understanding statistical power gives three ways to improve environmental monitoring and to inform decisions about actions arising from monitoring. First, it allows the most sensitive tests to be chosen from among those applicable to the data. Second, preliminary power analysis can be used to indicate the sample sizes necessary to detect an environmental change. Third, power analysis should be used after any nonsignificant result is obtained, to judge whether that result can be interpreted with confidence or whether the test was too weak to examine the null hypothesis properly. Power procedures are concerned with the statistical significance of tests of the null hypothesis and, on their own, lend little insight into the workings of nature. They are, however, essential to designing sensitive tests and correctly interpreting their results. The biological or environmental significance of any result, including whether the impact is beneficial or harmful, is a separate issue. 
The most compelling reason for considering power is that Type II errors can be more costly than Type I errors for environmental management. This is because the commitment of time, energy and people to fighting a false alarm (a Type I error) may continue only in the short term until the mistake is discovered. In contrast, the cost of not doing something when in fact it should be done (a Type II error) will have both short- and long-term costs (e.g. ensuing environmental degradation and the eventual cost of its rectification). Low power can be disastrous for environmental monitoring programmes.
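The dependence of power on α, sample size, effect size and variability described above can be sketched numerically. The following is a minimal illustration using a standard normal approximation for a two-sided, two-sample comparison of means (a textbook formula, not a method from this paper):

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    d            -- standardized effect size (mean difference in SD units)
    n_per_group  -- observations per group
    alpha        -- Type I error rate

    Normal approximation: power ~ Phi(d * sqrt(n/2) - z_{1-alpha/2}),
    ignoring the negligible far-tail rejection region.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(d * (n_per_group / 2) ** 0.5 - z_crit)

# Power rises with sample size and effect size, and falls as alpha shrinks:
print(round(approx_power(0.5, 64), 2))              # → 0.81 (near the usual 0.8 target)
print(round(approx_power(0.5, 20), 2))              # → 0.35 (badly underpowered)
print(round(approx_power(0.5, 64, alpha=0.01), 2))  # → 0.6  (stricter alpha lowers power)
```

The second line shows the trade-off the paper emphasizes: with the same effect size and variability, a small sample gives a high probability of a Type II error.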

2018 ◽  
Vol 108 (1) ◽  
pp. 15-22 ◽  
Author(s):  
David H. Gent ◽  
Paul D. Esker ◽  
Alissa B. Kriss

In null hypothesis testing, failure to reject a null hypothesis may have two potential interpretations. One interpretation is that the treatments being evaluated do not have a significant effect, and a correct conclusion was reached in the analysis. Alternatively, a treatment effect may have existed but the conclusion of the study was that there was none. This is termed a Type II error, which is most likely to occur when studies lack sufficient statistical power to detect a treatment effect. In basic terms, the power of a study is its ability to identify a true effect through a statistical test. The power of a statistical test is 1 − β, where β is the probability of a Type II error, and depends on the size of the treatment effect (termed the effect size), variance, sample size, and significance criterion (the probability of a Type I error, α). Low statistical power is prevalent in the scientific literature in general, including plant pathology. However, power is rarely reported, creating uncertainty in the interpretation of nonsignificant results and potentially underestimating small, yet biologically significant relationships. The appropriate level of power for a study depends on the impact of Type I versus Type II errors, and no single level of power is acceptable for all purposes. Nonetheless, by convention 0.8 is often considered an acceptable threshold, and studies with power less than 0.5 generally should not be conducted if the results are to be conclusive. The emphasis on power analysis should be in the planning stages of an experiment. Commonly employed strategies to increase power include increasing sample sizes, selecting a less stringent threshold probability for Type I errors, increasing the hypothesized or detectable effect size, including as few treatment groups as possible, reducing measurement variability, and including relevant covariates in analyses. 
Power analysis will lead to more efficient use of resources and more precisely structured hypotheses, and may even indicate that some studies should not be undertaken. Moreover, adequately powered studies are less prone to erroneous conclusions and inflated estimates of treatment effectiveness, especially when effect sizes are small.
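A planning-stage power analysis of the kind recommended above can be sketched with a closed-form normal approximation for the per-group sample size of a two-group comparison (a standard textbook formula, not taken from this paper):

```python
import math
from statistics import NormalDist

def n_per_group(d, power=0.8, alpha=0.05):
    """Per-group sample size for a two-sided, two-group comparison of means.

    Normal approximation: n ~ 2 * ((z_{1-alpha/2} + z_power) / d)^2,
    where d is the standardized effect size (difference in SD units).
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_power = nd.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

# Halving the detectable effect roughly quadruples the required sample:
print(n_per_group(0.5))   # → 63
print(n_per_group(0.25))  # → 252
```

This illustrates two of the strategies listed in the abstract: a larger hypothesized effect size or a less stringent α both shrink the required n.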


1996 ◽  
Vol 1 (1) ◽  
pp. 25-28 ◽  
Author(s):  
Martin A. Weinstock

Background: Accurate understanding of certain basic statistical terms and principles is key to critical appraisal of published literature. Objective: This review describes type I error, type II error, null hypothesis, p value, statistical significance, α, two-tailed and one-tailed tests, effect size, alternate hypothesis, statistical power, β, publication bias, confidence interval, standard error, and standard deviation, while including examples from reports of dermatologic studies. Conclusion: The application of the results of published studies to individual patients should be informed by an understanding of certain basic statistical concepts.
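Two of the terms this review distinguishes, standard deviation and standard error, are often confused; a minimal illustrative computation (using a normal-approximation 95% confidence interval, not an example from the review itself) makes the difference concrete:

```python
from statistics import fmean, stdev

def summary_95ci(data):
    """Mean, SD, SE, and a normal-approximation 95% CI for the mean.

    SD describes the spread of individual observations; SE = SD / sqrt(n)
    describes the precision of the estimated mean, so SE shrinks as n grows
    while SD does not.
    """
    data = list(data)
    m = fmean(data)
    sd = stdev(data)                # variability among observations
    se = sd / len(data) ** 0.5      # uncertainty in the mean
    return m, sd, se, (m - 1.96 * se, m + 1.96 * se)

m, sd, se, ci = summary_95ci(range(1, 10))
print(m)   # → 5.0
```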


1990 ◽  
Vol 15 (3) ◽  
pp. 237-247 ◽  
Author(s):  
Rand R. Wilcox

Let X and Y be dependent random variables with variances σx² and σy². Recently, McCulloch (1987) suggested a modification of the Morgan-Pitman test of H0: σx² = σy². But, as this paper describes, there are situations where McCulloch’s procedure is not robust. A subsample approach, similar to the Box-Scheffé test, is also considered and found to give conservative results, in terms of Type I errors, for all situations considered, but it yields relatively low power. New results on the Sandvik-Olsson procedure are also described, but the procedure is found to be nonrobust in situations not previously considered, and its power can be low relative to the two other techniques considered here. A modification of the Morgan-Pitman test based on the modified maximum likelihood estimate of a correlation is also considered. This last procedure appears to be robust in situations where the Sandvik-Olsson (1982) and McCulloch procedures are robust, and it can have more power than the Sandvik-Olsson procedure. But it too gives unsatisfactory results in certain situations. Thus, in terms of power, McCulloch’s procedure is found to be best, with the advantage of being simple to use. But it is concluded that, in terms of controlling both Type I and Type II errors, a satisfactory solution does not yet exist.
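The Morgan-Pitman test that these procedures modify reduces to testing whether X+Y and X−Y are uncorrelated, since Cov(X+Y, X−Y) = Var(X) − Var(Y). The following sketches only the unmodified test statistic; the robust modifications studied in the paper are not reproduced here:

```python
import math

def pitman_morgan_t(x, y):
    """Morgan-Pitman test of H0: Var(X) = Var(Y) for paired data.

    Computes the Pearson correlation r between the sums x+y and the
    differences x-y, then returns the usual t statistic with n-2 df.
    Under H0 the correlation is zero, because Cov(X+Y, X-Y) = Var(X) - Var(Y).
    """
    n = len(x)
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    ms, md = sum(s) / n, sum(d) / n
    cov = sum((a - ms) * (b - md) for a, b in zip(s, d))
    ss_s = sum((a - ms) ** 2 for a in s)
    ss_d = sum((b - md) ** 2 for b in d)
    r = cov / math.sqrt(ss_s * ss_d)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Equal variances (y is a permutation of x): t is exactly zero here.
print(pitman_morgan_t([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]))   # → 0.0
# Var(Y) much larger than Var(X): strongly negative t.
print(pitman_morgan_t([1, 2, 3, 4, 5, 6], [2, 4, 5, 8, 10, 13]))
```

The paper's point is that this classical statistic, and several of its modifications, can misbehave under nonnormality; the sketch only shows the mechanics.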


2019 ◽  
Vol 8 (4) ◽  
pp. 1849-1853

Nowadays many people seek bank loans for their needs, but banks cannot lend to everyone, so they use screening measures to identify eligible customers. Sensitivity and specificity are widely used to measure the performance of categorical classifiers in medicine and, tangentially, in econometrics. Even after such screening, a bank may lend to the wrong customers, who cannot repay, or refuse customers who could repay; these mistakes lead to Type I and Type II errors. To minimize these errors, this study explains, first, how to judge whether sensitivity is large or small and, second, how to set benchmarks for the forecasting model through fuzzy analysis based on fuzzy weights, which is then compared with the sensitivity analysis.
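In the loan-screening setting described, sensitivity and specificity follow directly from a confusion matrix. A minimal illustration, with invented counts (the fuzzy-weight benchmarking studied in the paper is not reproduced here):

```python
def sensitivity(tp, fn):
    """True positive rate: share of creditworthy applicants correctly approved."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: share of non-creditworthy applicants correctly refused."""
    return tn / (tn + fp)

# Hypothetical screening outcome for 200 applicants:
#   80 good payers approved (TP),   20 good payers refused (FN),
#   90 likely defaulters refused (TN), 10 likely defaulters approved (FP).
print(sensitivity(80, 20))  # → 0.8  (1 − 0.8 is the miss rate on good payers)
print(specificity(90, 10))  # → 0.9  (1 − 0.9 is the rate of approving defaulters)
```

The two complementary error rates here are exactly the two mistakes the abstract names: refusing a customer who can repay, and lending to one who cannot.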


2019 ◽  
Vol 100 (10) ◽  
pp. 1987-2007 ◽  
Author(s):  
Thomas Knutson ◽  
Suzana J. Camargo ◽  
Johnny C. L. Chan ◽  
Kerry Emanuel ◽  
Chang-Hoi Ho ◽  
...  

Abstract: An assessment was made of whether detectable changes in tropical cyclone (TC) activity are identifiable in observations and whether any changes can be attributed to anthropogenic climate change. Overall, historical data suggest detectable TC activity changes in some regions associated with TC track changes, while data quality and quantity issues create greater challenges for analyses based on TC intensity and frequency. A number of specific published conclusions (case studies) about possible detectable anthropogenic influence on TCs were assessed using the conventional approach of preferentially avoiding type I errors (i.e., overstating anthropogenic influence or detection). We conclude there is at least low to medium confidence that the observed poleward migration of the latitude of maximum intensity in the western North Pacific is detectable, or highly unusual compared to expected natural variability. Opinion among the author team was divided on whether any observed TC changes demonstrate discernible anthropogenic influence, or whether any other observed changes represent detectable changes. The issue was then reframed by assessing evidence for detectable anthropogenic influence while seeking to reduce the chance of type II errors (i.e., missing or understating anthropogenic influence or detection). For this purpose, we used a much weaker “balance of evidence” criterion for assessment. This leads to a number of more speculative TC detection and/or attribution statements, which we recognize have substantial potential for being false alarms (i.e., overstating anthropogenic influence or detection) but which may be useful for risk assessment. Several examples of these alternative statements, derived using this approach, are presented in the report.


2016 ◽  
Vol 105 (6) ◽  
pp. 605-609 ◽  
Author(s):  
Anthony K. Akobeng

1993 ◽  
Vol 76 (2) ◽  
pp. 407-412 ◽  
Author(s):  
Donald W. Zimmerman

This study investigated violations of random sampling and random assignment in data analyzed by nonparametric significance tests. A computer program induced correlations within groups, as well as between groups, and performed one-sample and two-sample versions of the Mann-Whitney-Wilcoxon test on the resulting scores. Nonindependence of observations within groups spuriously inflated the probability of Type I errors and depressed the probability of Type II errors, and nonindependence between groups had the reverse effect. This outcome, which parallels the influence of nonindependence on parametric tests, can be explained by the equivalence of the Mann-Whitney-Wilcoxon test and the Student t test performed on ranks replacing the initial scores.
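The inflation of Type I error rates by within-group dependence can be reproduced with a small simulation. This sketch uses a simple z-test on group means rather than the Mann-Whitney-Wilcoxon test itself (the abstract notes the effect parallels parametric tests), with a shared group-level random effect inducing the within-group correlation:

```python
import random
from statistics import fmean

def type1_rate(reps=2000, n=10, sigma_b=1.0, seed=42):
    """Empirical Type I error rate of a naive two-sample z-test when a shared
    group-level effect (sd sigma_b) makes observations within each group
    correlated. The test wrongly assumes independent observations with total
    per-observation variance sigma_b**2 + 1; there is no true group difference.
    """
    rng = random.Random(seed)
    se_assumed = (2 * (sigma_b ** 2 + 1) / n) ** 0.5
    rejections = 0
    for _ in range(reps):
        means = []
        for _group in range(2):
            shared = rng.gauss(0, sigma_b)   # induces within-group correlation
            means.append(fmean(shared + rng.gauss(0, 1) for _ in range(n)))
        z = (means[0] - means[1]) / se_assumed
        rejections += abs(z) > 1.96
    return rejections / reps

print(type1_rate(sigma_b=0.0))  # independent data: close to the nominal 0.05
print(type1_rate(sigma_b=1.0))  # correlated data: far above 0.05
```

With a group-level effect as large as the residual noise, the nominal 5% test rejects a true null far more often than advertised, which is the direction of bias the abstract reports for within-group nonindependence.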


2021 ◽  
Author(s):  
Antonia Vehlen ◽  
William Standard ◽  
Gregor Domes

Advances in eye tracking technology have enabled the development of interactive experimental setups to study social attention. Since these setups differ substantially from the eye tracker manufacturer’s test conditions, validation is essential with regard to data quality and other factors potentially threatening data validity. In this study, we evaluated the impact of data accuracy and areas of interest (AOIs) size on the classification of simulated gaze data. We defined AOIs of different sizes using the Limited-Radius Voronoi-Tessellation (LRVT) method, and simulated gaze data for facial target points with varying data accuracy. As hypothesized, we found that data accuracy and AOI size had strong effects on gaze classification. In addition, these effects were not independent and differed for falsely classified gaze inside AOIs (Type I errors) and falsely classified gaze outside the predefined AOIs (Type II errors). The results indicate that smaller AOIs generally minimize false classifications as long as data accuracy is good enough. For studies with lower data accuracy, Type II errors can still be compensated to some extent by using larger AOIs, but at the cost of an increased probability of Type I errors. Proper estimation of data accuracy is therefore essential for making informed decisions regarding the size of AOIs.
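The LRVT classification evaluated here amounts to assigning a gaze sample to its nearest AOI center, capped at a radius. A minimal sketch of that rule, with invented facial coordinates for illustration:

```python
import math

def classify_gaze(point, aoi_centers, radius):
    """Limited-Radius Voronoi-Tessellation classification: assign a gaze
    sample to its nearest AOI center, but only within the radius cap;
    otherwise return None (gaze falls outside all AOIs)."""
    name, center = min(
        aoi_centers.items(),
        key=lambda item: math.dist(point, item[1]),
    )
    return name if math.dist(point, center) <= radius else None

# Hypothetical facial AOI centers (arbitrary screen units):
aois = {"left_eye": (0.0, 0.0), "mouth": (0.0, -5.0)}

# Slightly inaccurate gaze near the eye: correct with a generous radius.
print(classify_gaze((0.5, 0.0), aois, radius=1.0))   # → left_eye
# Same gaze with a tiny AOI: falsely classified as outside (a Type II error).
print(classify_gaze((0.5, 0.0), aois, radius=0.3))   # → None
# Gaze truly between AOIs: a large radius pulls it inside one (a Type I error).
print(classify_gaze((0.0, -2.4), aois, radius=3.0))  # → left_eye
```

The three calls mirror the study's trade-off: shrinking the radius suppresses Type I errors but, once it approaches the data's accuracy, produces Type II errors instead.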


Author(s):  
Zsuzsanna Győri

Starting from the failures of the market and of government, the author identifies the failures of a third system aimed at the common good: ethical responsibility. Using a statistical analogy, she identifies as a Type I error the case in which ethics is not taken into account even though it is needed (the null hypothesis is rejected although it is true). She treats as a Type II error the use of ethics merely to increase profit, which misleads stakeholders and opens an even wider path for opportunistic business behaviour (the null hypothesis is accepted although it is false). In her view, the three systems (the market, the government, and ethical management) not only complement but also mutually correct one another. This correction is more straightforward for Type I failures; solving Type II failures, however, requires reformulating the foundations of economic life: a new, more holistic economics in place of self-interest and one-dimensional performance evaluation.


1997 ◽  
Vol 273 (1) ◽  
pp. H487-H493 ◽  
Author(s):  
J. L. Williams ◽  
C. A. Hathaway ◽  
K. L. Kloster ◽  
B. H. Layne

Frequently in the biomedical literature, measurements are considered “not statistically different” if a statistical test fails to achieve a P value ≤ 0.05. This conclusion may be misleading if group sizes are too small or variability is large, so that a type II error (false negative) is committed. In this study, we examined the probabilities of detecting a real difference (power) and of type II errors in unpaired t-tests in Volumes 246 and 266 of the American Journal of Physiology: Heart and Circulatory Physiology. In addition, we examined all articles for other statistical errors. The median power of the t-tests was similar in these volumes (approximately 0.55 and approximately 0.92 to detect a 20% and a 50% change, respectively). In both volumes, approximately 80% of the studies with nonsignificant unpaired t-tests contained at least one t-test with a type II error probability &gt; 0.30. Our findings suggest that low power and a high incidence of type II errors are common problems in this journal. In addition, the presentation of statistics was often vague, t-tests were misused frequently, and assumptions for inferential statistics usually were not mentioned or examined.
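Power figures like those surveyed here can be approximated from a group's coefficient of variation and size. A rough normal-approximation sketch (not the authors' exact method; the example numbers are invented):

```python
from statistics import NormalDist

def power_for_pct_change(pct_change, cv, n_per_group, alpha=0.05):
    """Approximate power of a two-sided unpaired comparison to detect a given
    fractional change in the mean, where cv = SD / mean within each group.

    The standardized effect size is d = pct_change / cv; the normal
    approximation gives power ~ Phi(d * sqrt(n/2) - z_{1-alpha/2}).
    """
    d = pct_change / cv
    nd = NormalDist()
    return nd.cdf(d * (n_per_group / 2) ** 0.5 - nd.inv_cdf(1 - alpha / 2))

# With 8 subjects per group and a 25% coefficient of variation, a 50% change
# is detected reliably but a 20% change usually is not:
print(round(power_for_pct_change(0.20, 0.25, 8), 2))  # → 0.36
print(round(power_for_pct_change(0.50, 0.25, 8), 2))  # → 0.98
```

This reproduces the qualitative pattern the survey reports: adequate power for large (50%) changes alongside a high type II error probability for moderate (20%) ones.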

