Analyzing Nested Experimental Designs - A User-Friendly Resampling Method to Determine Experimental Significance

2021 ◽  
Author(s):  
Rishikesh U Kulkarni ◽  
Catherine L Wang ◽  
Carolyn R Bertozzi

While hierarchical experimental designs are near-ubiquitous in neuroscience and biomedical research, researchers often do not take the structure of their datasets into account while performing statistical hypothesis tests. Resampling-based methods are a flexible strategy for performing these analyses, but they are difficult to implement in practice because of the lack of open-source software to automate test construction and execution. To address this, we report Hierarch, a Python package for performing hypothesis tests and computing confidence intervals on hierarchical experimental designs. Using a combination of permutation resampling and bootstrap aggregation, Hierarch can be used to perform hypothesis tests that maintain nominal Type I error rates and generate confidence intervals that maintain the nominal coverage probability without making distributional assumptions about the dataset of interest. Hierarch makes use of the Numba JIT compiler to reduce p-value computation times to under one second for typical datasets in biomedical research. Hierarch also enables researchers to construct user-defined resampling plans that take advantage of Hierarch's Numba-accelerated functions. Hierarch is freely available as a Python package at https://github.com/rishi-kulkarni/hierarch.
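
The core resampling idea (aggregate technical replicates within each biological sample, then permute treatment labels at the sample level) can be sketched in a few lines. The function below is a hypothetical illustration written for this listing, not the Hierarch API; consult the repository above for the package's actual interface.

```python
# Minimal sketch of a two-level permutation test (illustration only;
# this is NOT the Hierarch API).
import numpy as np

def cluster_permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Each argument is a list of clusters (biological samples), and each
    cluster is a list of technical replicates. Replicates are aggregated to
    per-cluster means, then treatment labels are permuted across clusters."""
    rng = np.random.default_rng(seed)
    means_a = np.array([np.mean(c) for c in group_a])
    means_b = np.array([np.mean(c) for c in group_b])
    pooled = np.concatenate([means_a, means_b])
    n_a = means_a.size

    observed = means_a.mean() - means_b.mean()
    hits = 0
    for _ in range(n_permutations):
        perm = rng.permutation(pooled)
        if abs(perm[:n_a].mean() - perm[n_a:].mean()) >= abs(observed):
            hits += 1
    return (hits + 1) / (n_permutations + 1)   # permutation p-value

# Example: three samples per treatment, four replicates per sample.
treated = [[1.2, 1.4, 1.1, 1.3], [0.9, 1.0, 1.1, 1.0], [1.5, 1.6, 1.4, 1.5]]
control = [[0.8, 0.7, 0.9, 0.8], [1.0, 0.9, 1.1, 1.0], [0.7, 0.8, 0.6, 0.7]]
print(cluster_permutation_test(treated, control))
```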

2018 ◽  
Author(s):  
Daniel Lakens ◽  
Marie Delacre

To move beyond the limitations of null-hypothesis tests, statistical approaches have been developed where the observed data are compared against a range of values that are equivalent to the absence of a meaningful effect. Specifying a range of values around zero allows researchers to statistically reject the presence of effects large enough to matter, and prevents practically insignificant effects from being interpreted as a statistically significant difference. We compare the behavior of the recently proposed second generation p-value (Blume, D’Agostino McGowan, Dupont, & Greevy, 2018) with the more established Two One-Sided Tests (TOST) equivalence testing procedure (Schuirmann, 1987). We show that the two approaches yield almost identical results under optimal conditions. Under suboptimal conditions (e.g., when the confidence interval is wider than the equivalence range, or when confidence intervals are asymmetric) the second generation p-value becomes difficult to interpret as a descriptive statistic. The second generation p-value is interpretable in a dichotomous manner (i.e., when the SGPV equals 0 or 1 because the confidence interval lies completely within or outside of the equivalence range), but this dichotomous interpretation does not require calculations. We conclude that equivalence tests yield more consistent p-values, distinguish between datasets that yield the same second generation p-value, and allow for easier control of Type I and Type II error rates.
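
For reference, the TOST procedure compared here reduces to two one-sided tests against the lower and upper equivalence bounds, reporting the larger of the two p-values. Below is a minimal sketch for two independent samples, assuming SciPy >= 1.6 (for the `alternative` argument of `ttest_ind`); the bounds `low` and `high` are analyst-chosen equivalence bounds on the raw mean difference.

```python
# Minimal sketch of the Two One-Sided Tests (TOST) equivalence procedure
# for two independent samples (Welch's t). Assumes SciPy >= 1.6.
import numpy as np
from scipy import stats

def tost_ind(a, b, low, high):
    """Return the TOST p-value: the larger of the two one-sided p-values.
    Equivalence is claimed at level alpha if this value is below alpha."""
    b = np.asarray(b, dtype=float)
    # H0: mean(a) - mean(b) <= low   vs  H1: difference > low
    p_lower = stats.ttest_ind(a, b + low, equal_var=False,
                              alternative="greater").pvalue
    # H0: mean(a) - mean(b) >= high  vs  H1: difference < high
    p_upper = stats.ttest_ind(a, b + high, equal_var=False,
                              alternative="less").pvalue
    return max(p_lower, p_upper)

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 50)
b = rng.normal(0.1, 1.0, 50)
print(tost_ind(a, b, low=-0.5, high=0.5))
```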


2020 ◽  
Vol 4 ◽  
Author(s):  
Daniël Lakens ◽  
Marie Delacre

To move beyond the limitations of null-hypothesis tests, statistical approaches have been developed where the observed data are compared against a range of values that are equivalent to the absence of a meaningful effect. Specifying a range of values around zero allows researchers to statistically reject the presence of effects large enough to matter, and prevents practically insignificant effects from being interpreted as a statistically significant difference. We compare the behavior of the recently proposed second generation p-value (Blume, D’Agostino McGowan, Dupont, & Greevy, 2018) with the more established Two One-Sided Tests (TOST) equivalence testing procedure (Schuirmann, 1987). We show that the two approaches yield almost identical results under optimal conditions. Under suboptimal conditions (e.g., when the confidence interval is wider than the equivalence range, or when confidence intervals are asymmetric) the second generation p-value becomes difficult to interpret. The second generation p-value is interpretable in a dichotomous manner (i.e., when the SGPV equals 0 or 1 because the confidence interval lies completely within or outside of the equivalence range), but this dichotomous interpretation does not require calculations. We conclude that equivalence tests yield more consistent p-values, distinguish between datasets that yield the same second generation p-value, and allow for easier control of Type I and Type II error rates.
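
By contrast, the second generation p-value is an interval-overlap quantity: the fraction of the interval estimate that lies inside the equivalence ("null") range, with a correction when the interval estimate is more than twice as wide as that range (Blume et al., 2018). A minimal sketch, using a hypothetical helper written for this listing:

```python
# Minimal sketch of the second generation p-value (SGPV) from interval
# endpoints; `ci_low`/`ci_high` are the interval estimate and
# `null_low`/`null_high` the equivalence range.
def sgpv(ci_low, ci_high, null_low, null_high):
    ci_len = ci_high - ci_low
    null_len = null_high - null_low
    overlap = max(0.0, min(ci_high, null_high) - max(ci_low, null_low))
    if ci_len > 2 * null_len:
        # Small-sample correction for very wide interval estimates.
        return overlap / (2 * null_len)
    return overlap / ci_len

# A 95% CI of [-0.15, 0.40] against an equivalence range of [-0.30, 0.30]:
print(sgpv(-0.15, 0.40, -0.30, 0.30))   # SGPV strictly between 0 and 1
```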


2012 ◽  
Vol 17 (2) ◽  
pp. 203
Author(s):  
Pedro Monterrey Gutiérrez

Hypothesis testing is a well-known data-analysis procedure that is widely used in scientific papers but is, at the same time, strongly criticized, with its use questioned and in some cases restricted because of inconsistencies observed in its application. This paper analyzes the issue from the fundamentals of statistical methodology and the different approaches that have been developed historically to address statistical hypothesis testing, highlighting a point that is not well known: the P value is a random variable. The fundamentals of Fisher's, Neyman-Pearson's and Bayesian solutions are analyzed, and on that basis the paper discusses the inconsistency of the procedure commonly used in practice: computing a P value, comparing it with a type I error level (usually 0.05), and basing the conclusions of the analysis on that comparison. Recommendations are also presented on the best way to proceed when solving a problem, together with the methodological and teaching challenges to be faced in analyzing data correctly and determining the validity of the hypotheses.
Key words: Neyman-Pearson's hypothesis tests, Fisher's significance tests, Bayesian hypothesis tests, Vancouver norms, P-value, null hypothesis.
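
The central observation that the P value is itself a random variable is easy to demonstrate by simulation: when the null hypothesis is true, the p-value is (approximately) uniformly distributed on [0, 1], so roughly 5% of tests fall below 0.05 even though no effect exists. A short sketch of that check:

```python
# Simulate many two-sample t-tests under a true null hypothesis and show
# that the resulting p-values behave as a Uniform(0, 1) random variable.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
pvals = np.array([
    stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
    for _ in range(5_000)
])
print(np.mean(pvals < 0.05))   # close to 0.05, the nominal type I error rate
```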


2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Moritz Mercker ◽  
Philipp Schwemmer ◽  
Verena Peschko ◽  
Leonie Enners ◽  
Stefan Garthe

Background: New wildlife telemetry and tracking technologies have become available in the last decade, leading to a large increase in the volume and resolution of animal tracking data. These technical developments have been accompanied by various statistical tools aimed at analysing the data obtained by these methods. Methods: We used simulated habitat and tracking data to compare some of the different statistical methods frequently used to infer local resource selection and large-scale attraction/avoidance from tracking data. Notably, we compared spatial logistic regression models (SLRMs), spatio-temporal point process models (ST-PPMs), step selection models (SSMs), and integrated step selection models (iSSMs) and their interplay with habitat and animal movement properties in terms of statistical hypothesis testing. Results: We demonstrated that only iSSMs and ST-PPMs showed nominal type I error rates in all studied cases, whereas SSMs may slightly, and SLRMs may frequently and strongly, exceed these levels. iSSMs appeared to have, on average, more robust and higher statistical power than ST-PPMs. Conclusions: Based on our results, we recommend the use of iSSMs to infer habitat selection or large-scale attraction/avoidance from animal tracking data. Further advantages over other approaches include short computation times, predictive capacity, and the possibility of deriving mechanistic movement models.


2021 ◽  
Author(s):  
Megha Joshi ◽  
James E Pustejovsky ◽  
S. Natasha Beretvas

The most common and well-known meta-regression models work under the assumption that there is only one effect size estimate per study and that the estimates are independent. However, meta-analytic reviews of social science research often include multiple effect size estimates per primary study, leading to dependence in the estimates. Some meta-analyses also include multiple studies conducted by the same lab or investigator, creating another potential source of dependence. An increasingly popular method to handle dependence is robust variance estimation (RVE), but this method can result in inflated Type I error rates when the number of studies is small. Small-sample correction methods for RVE have been shown to control Type I error rates adequately but may be overly conservative, especially for tests of multiple-contrast hypotheses. We evaluated an alternative method for handling dependence, cluster wild bootstrapping, which has been examined in the econometrics literature but not in the context of meta-analysis. Results from two simulation studies indicate that cluster wild bootstrapping maintains adequate Type I error rates and provides more power than extant small sample correction methods, particularly for multiple-contrast hypothesis tests. We recommend using cluster wild bootstrapping to conduct hypothesis tests for meta-analyses with a small number of studies. We have also created an R package that implements such tests.
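
Cluster wild bootstrapping resamples at the study (cluster) level: the residuals from the null-restricted model are multiplied by a random sign (Rademacher weight) drawn once per cluster, the outcome is rebuilt, and the model is re-estimated. The sketch below is a generic illustration of that technique with hypothetical helper names, not the API of the authors' R package; in practice the bootstrap statistic would usually be studentized with a cluster-robust standard error.

```python
# Minimal sketch of a cluster wild bootstrap test for one regression
# coefficient, with Rademacher weights drawn per cluster (study).
import numpy as np

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def cluster_wild_boot_p(X, y, cluster, test_col, n_boot=1_999, seed=7):
    rng = np.random.default_rng(seed)
    stat_obs = abs(ols(X, y)[test_col])

    # Restricted (null) fit with the tested coefficient forced to zero.
    X_null = np.delete(X, test_col, axis=1)
    resid_null = y - X_null @ ols(X_null, y)
    fitted_null = y - resid_null

    clusters = np.unique(cluster)
    exceed = 0
    for _ in range(n_boot):
        signs = rng.choice([-1.0, 1.0], size=clusters.size)  # Rademacher weights
        y_star = fitted_null + signs[np.searchsorted(clusters, cluster)] * resid_null
        if abs(ols(X, y_star)[test_col]) >= stat_obs:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)

# Toy example: 8 "studies", 3 effect sizes each, one moderator with no true effect.
rng = np.random.default_rng(0)
cluster = np.repeat(np.arange(8), 3)
x = rng.normal(size=cluster.size)
X = np.column_stack([np.ones_like(x), x])
y = 0.2 + rng.normal(size=x.size)
print(cluster_wild_boot_p(X, y, cluster, test_col=1))
```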


2019 ◽  
Author(s):  
Xiaokang Lyu ◽  
Yuepei Xu ◽  
Xiaofan Zhao ◽  
Xi-Nian Zuo ◽  
Hu Chuan-Peng

P-values and confidence intervals (CIs) are the most widely used statistical indices in the scientific literature. Several surveys have revealed that these two indices are generally misunderstood. However, existing surveys on this subject are largely confined to psychology and biomedical research, and data from other disciplines are rare. Moreover, how confident researchers are in their judgments remains unclear. To fill this research gap, we surveyed 1,479 researchers and students from different fields in China. Results reveal that for significant (p < .05, CI does not include 0) and non-significant (p > .05, CI includes 0) conditions, most respondents, regardless of academic degree, research field, and career stage, could not interpret p-values and CIs accurately. Moreover, the majority of them were confident about their (inaccurate) judgments (see osf.io/mcu9q/ for raw data, materials, and supplementary analyses). Misinterpretations of p-values and CIs therefore prevail across the scientific community, underscoring the need for statistical training in science.


Author(s):  
Richard McCleary ◽  
David McDowall ◽  
Bradley J. Bartos

Chapter 6 addresses the sub-category of internal validity defined by Shadish et al. as statistical conclusion validity, or "validity of inferences about the correlation (covariance) between treatment and outcome." The common threats to statistical conclusion validity can arise, or become plausible, through either model misspecification or hypothesis testing. The risk of a serious model misspecification is inversely proportional to the length of the time series, for example, and so is the risk of misstating the Type I and Type II error rates. Threats to statistical conclusion validity arise from both the classical and the modern hybrid significance testing structures; the serious threats that weigh heavily in p-value tests are shown to be undefined in Bayesian tests. While the particularly vexing threats raised by modern null hypothesis testing are resolved by eliminating the modern null hypothesis test, threats to statistical conclusion validity would inevitably persist and new threats would arise.


2015 ◽  
Vol 9 (12) ◽  
pp. 1
Author(s):  
Tobi Kingsley Ochuko ◽  
Suhaida Abdullah ◽  
Zakiyah Binti Zain ◽  
Sharipah Syed Soaad Yahaya

This study examines independent-group tests for comparing two or more means using a parametric method, the Alexander-Govern (AG) test. The Alexander-Govern test is used for comparing two or more groups and is a better alternative to the James test, the Welch test and ANOVA. It controls Type I error rates well and has high power under variance heterogeneity for normal data, but it is not robust for non-normal data. Consequently, the trimmed mean was applied to the test under non-normal data in the two-group case, but the resulting test could not control Type I error rates when the number of groups exceeded two. The MOM estimator was then introduced as the test's measure of central tendency, since it is not influenced by the number of groups, but this estimator fails to control Type I error rates under skewed, heavy-tailed distributions. In this study, the AGWMOM test, which uses the Winsorized MOM estimator as the central tendency measure within the Alexander-Govern test, was applied. To evaluate the capacity of the test, a real-life dataset was used. Descriptive statistics, tests of normality and boxplots were used to assess the normality of the independent groups. The results show that only the middle group is not normally distributed, owing to an extreme value in the data. The test statistics show that the AGWMOM test yields a p-value of 0.0000002869, which is below 0.05, whereas the AG test yields a p-value of 0.06982, which is above 0.05. The AGWMOM test is therefore significant in this dataset, whereas the AG test is not.
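
For readers who want to reproduce the baseline comparison, recent versions of SciPy (>= 1.7) provide an implementation of the standard Alexander-Govern test; the AGWMOM variant is not part of SciPy and is not shown here. A minimal sketch on hypothetical group data:

```python
# Run the standard Alexander-Govern test on three independent groups.
# Assumes SciPy >= 1.7, which provides scipy.stats.alexandergovern.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
low = rng.normal(10.0, 1.0, size=15)
middle = rng.normal(10.5, 3.0, size=15)   # more variable, heteroscedastic group
high = rng.normal(12.0, 1.5, size=15)

result = stats.alexandergovern(low, middle, high)
print(result.statistic, result.pvalue)
```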

