An Analysis of Statistical Power in Behavioral Accounting Research

2001 ◽  
Vol 13 (1) ◽  
pp. 63-84 ◽  
Author(s):  
Susan C. Borkowski ◽  
Mary Jeanne Welsh ◽  
Qinke (Michael) Zhang

Attention to statistical power and effect size can improve the design and the reporting of behavioral accounting research. Three accounting journals representative of current empirical behavioral accounting research are analyzed for their power (1−β), or control of Type II errors (β), and compared with research in other disciplines. Given this study's findings, additional attention should be directed to the adequacy of sample sizes and study design to ensure sufficient power when Type I error is controlled at α = .05 as a baseline. We do not suggest replacing traditional significance testing, but rather augmenting it with the reporting of β to complement and interpret the relevance of a reported α in any given study. In addition, the presentation of results in alternative formats, such as those suggested in this study, will enhance the current reporting of significance tests. In turn, this will allow the reader a richer understanding of, and an increased trust in, a study's results and implications.
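
The reporting practice advocated here, stating power and β alongside α, can be made concrete with a short calculation. A minimal sketch in Python (the effect size, group size, and use of the statsmodels power routines are illustrative assumptions, not values from the study):

```python
# Hypothetical illustration: achieved power of a two-sample t test
# at alpha = .05, reported alongside the significance level.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Assumed values: a medium standardized effect (Cohen's d = 0.5)
# and 30 subjects per group, plausible for a behavioral experiment.
power = analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.05)
print(f"power (1 - beta) = {power:.3f}, beta = {1 - power:.3f}")
# Under these assumptions power is roughly .47: a true medium-sized
# effect would be missed more often than detected, which is exactly
# the information that reporting beta next to alpha conveys.
```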

1996 ◽  
Vol 1 (1) ◽  
pp. 25-28 ◽  
Author(s):  
Martin A. Weinstock

Background: Accurate understanding of certain basic statistical terms and principles is key to critical appraisal of published literature. Objective: This review describes type I error, type II error, null hypothesis, p value, statistical significance, α, two-tailed and one-tailed tests, effect size, alternate hypothesis, statistical power, β, publication bias, confidence interval, standard error, and standard deviation, while including examples from reports of dermatologic studies. Conclusion: The application of the results of published studies to individual patients should be informed by an understanding of certain basic statistical concepts.
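
Several of the terms the review defines can be seen together in one worked example. A sketch in Python using simulated data (the sample size, means, and choice of test are assumptions made for illustration, not data from any dermatologic study):

```python
# Illustrative only: standard deviation vs. standard error, a two-tailed
# test of a null hypothesis, the p value, and a 95% confidence interval.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores = rng.normal(loc=52.0, scale=10.0, size=40)  # hypothetical n = 40

sd = scores.std(ddof=1)           # standard deviation: spread across patients
se = sd / np.sqrt(len(scores))    # standard error: precision of the mean
# Two-tailed one-sample t test of the null hypothesis "true mean = 50"
t_stat, p_value = stats.ttest_1samp(scores, popmean=50.0)
ci = stats.t.interval(0.95, df=len(scores) - 1, loc=scores.mean(), scale=se)
print(f"SD = {sd:.2f}, SE = {se:.2f}, p = {p_value:.3f}, "
      f"95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
# Statistical significance means only that p < alpha (conventionally .05);
# the confidence interval adds information about the size of the effect.
```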


1993 ◽  
Vol 76 (2) ◽  
pp. 407-412 ◽  
Author(s):  
Donald W. Zimmerman

This study investigated violations of random sampling and random assignment in data analyzed by nonparametric significance tests. A computer program induced correlations within groups, as well as between groups, and performed one-sample and two-sample versions of the Mann-Whitney-Wilcoxon test on the resulting scores. Nonindependence of observations within groups spuriously inflated the probability of Type I errors and depressed the probability of Type II errors, and nonindependence between groups had the reverse effect. This outcome, which parallels the influence of nonindependence on parametric tests, can be explained by the equivalence of the Mann-Whitney-Wilcoxon test and the Student t test performed on ranks replacing the initial scores.
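
The design of the simulation is easy to reproduce in outline. A minimal re-creation in Python (the correlation, sample size, and equicorrelated construction are my assumptions; the original study used a dedicated program and several conditions):

```python
# Sketch: induce correlation within groups and watch the Type I error
# rate of the Mann-Whitney-Wilcoxon test drift from the nominal .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, rho = 15, 5000, 0.3
rejections = 0
for _ in range(reps):
    # Equicorrelated scores within each group via a shared component;
    # the null hypothesis of identical distributions is true throughout.
    s1, s2 = rng.normal(), rng.normal()
    g1 = np.sqrt(rho) * s1 + np.sqrt(1 - rho) * rng.normal(size=n)
    g2 = np.sqrt(rho) * s2 + np.sqrt(1 - rho) * rng.normal(size=n)
    p = stats.mannwhitneyu(g1, g2, alternative='two-sided').pvalue
    rejections += p < 0.05
print(f"empirical Type I rate = {rejections / reps:.3f} (nominal .05)")
# Within-group nonindependence inflates the rate far above .05, the same
# spurious inflation the study reports; correlation between groups would
# instead depress it.
```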


2013 ◽  
Vol 27 (4) ◽  
pp. 693-710 ◽  
Author(s):  
Adrian Valencia ◽  
Thomas J. Smith ◽  
James Ang

SYNOPSIS Fair value accounting has been a hotly debated topic during the recent financial crisis. Supporters argue that fair values are more relevant to investors, while detractors point to the measurement error in the estimation of the reported fair values to attack its reliability. This study examines how noise in reported fair values impacts bank capital adequacy ratios. If measurement error causes reported capital levels to deviate from fundamental levels, then regulators could misidentify a financially healthy bank as troubled (type I error) or a financially troubled bank as safe (type II error), leading to suboptimal resource allocations for banks, regulators, and investors. We use a Monte Carlo simulation to generate our data, and find that while noise leads to both type I and type II errors around key Federal Deposit Insurance Corporation (FDIC) capital adequacy benchmarks, the type I error dominates. Specifically, noise is associated with 2.58 (2.60) [1.092], 5.67 (6.44) [1.94], and 10.60 (26.83) [3.423] times more type I errors than type II errors around the Tier 1 (Total) [Leverage] well-capitalized, adequately capitalized, and significantly undercapitalized benchmarks, respectively. Economically, our results suggest that noise can lead to inefficient allocation of resources on the part of regulators (increased monitoring costs) and banks (increased compliance costs). JEL Classifications: D52; M41; C15; G21.
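
The mechanism behind the asymmetry can be sketched quickly. A stylized Python version of the simulation (the distributions, the noise scale, and the 6 percent cutoff are assumptions for illustration; the paper's calibration and the actual FDIC benchmarks differ):

```python
# Stylized Monte Carlo: measurement noise in reported fair values perturbs
# capital ratios and misclassifies banks around a regulatory benchmark.
import numpy as np

rng = np.random.default_rng(2)
n_banks = 100_000
benchmark = 0.06                                   # illustrative cutoff
fundamental = rng.normal(0.08, 0.02, n_banks)      # true capital ratios
reported = fundamental + rng.normal(0.0, 0.01, n_banks)  # add noise

healthy = fundamental >= benchmark
flagged = reported < benchmark
type_i = np.mean(healthy & flagged)    # healthy bank looks troubled
type_ii = np.mean(~healthy & ~flagged) # troubled bank looks safe
print(f"Type I: {type_i:.4f}  Type II: {type_ii:.4f}  "
      f"ratio: {type_i / type_ii:.2f}")
# Most banks sit above the cutoff, so more banks are positioned to be
# pushed below it by noise than the reverse: Type I errors dominate,
# the same asymmetry the paper quantifies around the FDIC benchmarks.
```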


2018 ◽  
Vol 108 (1) ◽  
pp. 15-22 ◽  
Author(s):  
David H. Gent ◽  
Paul D. Esker ◽  
Alissa B. Kriss

In null hypothesis testing, failure to reject a null hypothesis may have two potential interpretations. One interpretation is that the treatments being evaluated do not have a significant effect, and a correct conclusion was reached in the analysis. Alternatively, a treatment effect may have existed but the conclusion of the study was that there was none. This is termed a Type II error, which is most likely to occur when studies lack sufficient statistical power to detect a treatment effect. In basic terms, the power of a study is its ability to identify a true effect through a statistical test. The power of a statistical test is 1 − β, where β is the probability of a Type II error, and depends on the size of the treatment effect (termed the effect size), the variance, the sample size, and the significance criterion (the probability of a Type I error, α). Low statistical power is prevalent in the scientific literature in general, including plant pathology. However, power is rarely reported, creating uncertainty in the interpretation of nonsignificant results and potentially obscuring small, yet biologically significant, relationships. The appropriate level of power for a study depends on the impact of Type I versus Type II errors, and no single level of power is acceptable for all purposes. Nonetheless, by convention 0.8 is often considered an acceptable threshold, and studies with power less than 0.5 generally should not be conducted if the results are to be conclusive. The emphasis on power analysis should be in the planning stages of an experiment. Commonly employed strategies to increase power include increasing sample sizes, selecting a less stringent threshold probability for Type I errors, increasing the hypothesized or detectable effect size, including as few treatment groups as possible, reducing measurement variability, and including relevant covariates in analyses. Power analysis will lead to more efficient use of resources and more precisely structured hypotheses, and may even indicate that some studies should not be undertaken. However, adequately powered studies are less prone to erroneous conclusions and inflated estimates of treatment effectiveness, especially when effect sizes are small.
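
The planning-stage analysis the authors emphasize amounts to solving the power equation for sample size before data are collected. A minimal sketch in Python (the effect size and targets are assumptions; statsmodels is one of several tools that perform the calculation):

```python
# Prospective power analysis for a two-group comparison at alpha = .05.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Replicates per treatment needed to detect an assumed standardized
# effect of d = 0.6 with the conventional target power of 0.8:
n_needed = analysis.solve_power(effect_size=0.6, power=0.8, alpha=0.05)
print(f"n per group for power 0.8: {n_needed:.1f}")   # about 45

# The converse check: the power actually achieved with 10 per group
achieved = analysis.solve_power(effect_size=0.6, nobs1=10, alpha=0.05)
print(f"power with n = 10: {achieved:.2f}")  # ~0.25, below the 0.5 floor
```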


Author(s):  
Narayan Prasad Nagendra ◽  
Gopalakrishnan Narayanamurthy ◽  
Roger Moser

Abstract Farmers submit claims to insurance providers when affected by sowing/planting risk, standing crop risk, post-harvest risk, and localized calamity risk. Decision making for the settlement of claims submitted by farmers has been observed to involve both type-I and type-II errors. These errors reduce confidence in agri-insurance providers, and in government in general, because the system fails to serve needy farmers (type-I error) and sometimes serves ineligible farmers (type-II error). Gaps in the underlying data, methods, and timelines currently used, including anomalies in the locational data used in crop sampling, the inclusion of invalid data points in computations, and weaknesses in estimating crop yield and determining the total sown area, create barriers to executing indemnity payments for small and marginal farmers in India. In this paper, we present a case study of satellite big data analytics in a region of India and explain how anomalies in the legacy processes were addressed to minimize type-I and type-II errors and thereby make ethical decisions when approving farmer claims. Our study demonstrates what big data analytics can offer to increase the ethicality of such decisions and the confidence with which they are made, especially when the beneficiaries are poor and powerless.


2020 ◽  
Vol 16 (11) ◽  
pp. e1008286
Author(s):  
Howard Bowman ◽  
Joseph L. Brooks ◽  
Omid Hajilou ◽  
Alexia Zoumpoulaki ◽  
Vladimir Litvak

There has been considerable debate and concern as to whether there is a replication crisis in the scientific literature. A likely cause of poor replication is the multiple comparisons problem. An important way in which this problem can manifest in the M/EEG context is through post hoc tailoring of analysis windows (a.k.a. regions-of-interest, ROIs) to landmarks in the collected data. Post hoc tailoring of ROIs is used because it allows researchers to adapt to inter-experiment variability and discover novel differences that fall outside of windows defined by prior precedent, thereby reducing Type II errors. However, this approach can dramatically inflate Type I error rates. One way to avoid this problem is to tailor windows according to a contrast that is orthogonal (strictly parametrically orthogonal) to the contrast being tested. A key approach of this kind is to identify windows on a fully flattened average. On the basis of simulations, this approach has been argued to be safe for post hoc tailoring of analysis windows under many conditions. Here, we present further simulations and mathematical proofs to show exactly why the Fully Flattened Average approach is unbiased, providing a formal grounding to the approach, clarifying the limits of its applicability and resolving published misconceptions about the method. We also provide a statistical power analysis, which shows that, in specific contexts, the fully flattened average approach provides higher statistical power than Fieldtrip cluster inference. This suggests that the Fully Flattened Average approach will enable researchers to identify more effects from their data without incurring an inflation of the false positive rate.
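
Why selection on the flattened average is unbiased can be demonstrated in a few lines: the selection statistic (the sum of conditions) is orthogonal to the tested statistic (their difference). A toy simulation in Python (my construction for illustration, not the paper's code or its M/EEG pipeline):

```python
# Toy demonstration: choose an analysis window from the aggregate of two
# conditions, then test their difference; the false positive rate stays
# at the nominal level because the two contrasts are orthogonal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_subj, n_time, reps = 20, 100, 2000
signal = np.exp(-0.5 * ((np.arange(n_time) - 50) / 8.0) ** 2)  # shared ERP
rejections = 0
for _ in range(reps):
    a = signal + rng.normal(size=(n_subj, n_time))  # condition A
    b = signal + rng.normal(size=(n_subj, n_time))  # condition B (null true)
    # Window tailored post hoc, but to the aggregate (a + b), not to a - b
    grand = (a + b).mean(axis=0) / 2.0
    peak = int(grand.argmax())
    window = slice(max(peak - 5, 0), min(peak + 5, n_time))
    diff = a[:, window].mean(axis=1) - b[:, window].mean(axis=1)
    rejections += stats.ttest_1samp(diff, 0.0).pvalue < 0.05
print(f"false positive rate = {rejections / reps:.3f} (nominal .05)")
# Tailoring the window to the difference wave itself would inflate this
# rate dramatically; aggregate-based selection does not.
```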


2015 ◽  
Vol 15 (2) ◽  
Author(s):  
Kong-Pin Chen ◽  
Tsung-Sheng Tsai

Abstract Judicial torture to extract information or to elicit a confession was a common practice in pre-modern societies, both in the East and the West. This paper proposes a positive theory of judicial torture. It is shown that torture reflects the magistrate’s attempt to balance type I and type II errors in decision-making: by forcing the guilty to confess with a higher probability than the innocent, torture decreases the type I error at the cost of the type II error. Moreover, there is a non-monotonic relationship between the superiority of torture and the informativeness of investigation: when investigation is relatively uninformative, an improvement in investigative technology actually lends an advantage to torture, making it even more attractive to magistrates; however, once technological progress reaches a certain threshold, the advantage of torture is weakened, and a judicial system based on torture becomes inferior to one based on evidence. This result can explain the historical development of judicial systems.


1991 ◽  
Vol 42 (5) ◽  
pp. 555 ◽  
Author(s):  
PG Fairweather

This paper discusses, from a philosophical perspective, the reasons for considering the power of any statistical test used in environmental biomonitoring. Power is inversely related to the probability of making a Type II error (i.e. low power indicates a high probability of Type II error). In the context of environmental monitoring, a Type II error is made when it is concluded that no environmental impact has occurred even though one has. Type II errors have been ignored relative to Type I errors (the mistake of concluding that there is an impact when one has not occurred), the rates of which are stipulated by the α values of the test. In contrast, power depends on the value of α, the sample size used in the test, the effect size to be detected, and the variability inherent in the data. Although power ideas have been known for years, only recently have these issues attracted the attention of ecologists, and only recently have methods for calculating power easily become available. Understanding statistical power gives three ways to improve environmental monitoring and to inform decisions about actions arising from monitoring. First, it allows the most sensitive tests to be chosen from among those applicable to the data. Second, preliminary power analysis can be used to indicate the sample sizes necessary to detect an environmental change. Third, power analysis should be used after any nonsignificant result is obtained, in order to judge whether that result can be interpreted with confidence or whether the test was too weak to examine the null hypothesis properly. Power procedures are concerned with the statistical significance of tests of the null hypothesis, and they lend little insight, on their own, into the workings of nature. Power analyses are, however, essential to designing sensitive tests and correctly interpreting their results. The biological or environmental significance of any result, including whether the impact is beneficial or harmful, is a separate issue. The most compelling reason for considering power is that Type II errors can be more costly than Type I errors for environmental management. This is because the commitment of time, energy, and people to fighting a false alarm (a Type I error) may continue only in the short term, until the mistake is discovered. In contrast, the cost of not doing something when in fact it should be done (a Type II error) will have both short- and long-term costs (e.g. ensuing environmental degradation and the eventual cost of its rectification). Low power can be disastrous for environmental monitoring programmes.
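
The third use of power described above, judging whether a nonsignificant result is conclusive, can be illustrated by asking what effect a monitoring design could actually have detected. A sketch in Python (the site counts and targets are assumptions for illustration):

```python
# After a nonsignificant impact-vs-control comparison, compute the
# minimum detectable standardized effect for the design that was used.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Suppose 8 impact and 8 control sites were compared at alpha = .05.
mde = analysis.solve_power(nobs1=8, power=0.8, alpha=0.05)
print(f"minimum detectable effect (d) at power 0.8: {mde:.2f}")  # ~1.5
# Only a very gross impact (d around 1.5) was reliably detectable: the
# nonsignificant result is weak evidence that no impact occurred,
# precisely the Type II risk the paper warns about.
```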

