Comparing two circular distributions: advice for effective implementation of statistical procedures in biology

2021 ◽  
Author(s):  
Lukas Landler ◽  
Graeme D Ruxton ◽  
Erich Pascal Malkemper

Many biological variables, often involving timings of events or directions, are recorded on a circular rather than linear scale, and need different statistical treatment for that reason. A common question that is asked of such circular data involves comparison between two groups or treatments: are the populations from which the two samples are drawn differently distributed around the circle? For example, we might ask whether the distribution of directions from which a stalking predator approaches its prey differs between sunny and cloudy conditions, or whether the time of day of mating attempts differs between lab mice subject to one of two hormone treatments. An array of statistical approaches to these questions has been developed. We compared 18 of these (by simulation) in terms of both their ability to control the Type I error rate near the nominal value and their statistical power. We found that only eight tests offered good control of Type I error in all our test situations. Of these eight, we were able to identify Watson's U^2 test and MANOVA based on trigonometric functions of the data as offering the best power in the overwhelming majority of our test circumstances. There was often little to choose between these tests in terms of power, and no situation where any of the remaining six tests offered substantially better power than either of these. Hence, we recommend the routine use of either Watson's U^2 test or the MANOVA approach when comparing two samples of circular data.
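As a concrete illustration of the MANOVA approach mentioned above, the sketch below represents each angle by its cosine and sine and compares the two groups with a standard one-way MANOVA. It is a minimal example assuming angles in radians and the availability of numpy, pandas and statsmodels; the simulated von Mises samples and group labels are placeholders, not the data used in the paper.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(1)
# Two hypothetical samples of directions (radians), e.g. sunny vs. cloudy trials.
sunny = rng.vonmises(mu=0.0, kappa=2.0, size=40)
cloudy = rng.vonmises(mu=np.pi / 3, kappa=2.0, size=40)

df = pd.DataFrame({
    "theta": np.concatenate([sunny, cloudy]),
    "group": ["sunny"] * len(sunny) + ["cloudy"] * len(cloudy),
})
# Represent each angle by its cosine and sine; these become the two
# response variables in a one-way MANOVA with group as the factor.
df["c"] = np.cos(df["theta"])
df["s"] = np.sin(df["theta"])

fit = MANOVA.from_formula("c + s ~ group", data=df)
print(fit.mv_test())  # Pillai's trace etc.; a small p-value indicates the groups differ
```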

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Lukas Landler ◽  
Graeme D. Ruxton ◽  
E. Pascal Malkemper

Abstract. Many biological variables are recorded on a circular scale and therefore need different statistical treatment. A common question that is asked of such circular data involves comparison between two groups: are the populations from which the two samples are drawn differently distributed around the circle? We compared 18 tests for such situations (by simulation) in terms of both their ability to control the Type I error rate near the nominal value and their statistical power. We found that only eight tests offered good control of Type I error in all our simulated situations. Of these eight, we were able to identify Watson's U2 test and a MANOVA approach, based on trigonometric functions of the data, as offering the best power in the overwhelming majority of our test circumstances. There was often little to choose between these tests in terms of power, and no situation where any of the remaining six tests offered substantially better power than either of these. Hence, we recommend the routine use of either Watson's U2 test or the MANOVA approach when comparing two samples of circular data.
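The other recommended procedure, Watson's U2 two-sample test, can also be sketched compactly. The version below computes the U2 statistic from the pooled empirical distribution functions and obtains a p-value by permutation; it is an illustrative implementation assuming angles in radians, not the specific routine (such as watson.two.test in R's circular package) benchmarked in the paper.

```python
import numpy as np

def watson_u2(a, b):
    """Watson's two-sample U^2 statistic for circular data (angles in radians)."""
    n, m = len(a), len(b)
    N = n + m
    pooled = np.concatenate([a, b]) % (2 * np.pi)
    order = np.argsort(pooled)
    from_a = (order < n).astype(float)   # which sorted pooled values came from sample a
    F1 = np.cumsum(from_a) / n           # empirical CDF of sample a at pooled points
    F2 = np.cumsum(1.0 - from_a) / m     # empirical CDF of sample b at pooled points
    d = F1 - F2
    return (n * m / N**2) * np.sum((d - d.mean()) ** 2)

def watson_u2_perm_test(a, b, n_perm=9999, seed=0):
    """Permutation p-value for the null that both samples share one distribution."""
    rng = np.random.default_rng(seed)
    observed = watson_u2(a, b)
    pooled = np.concatenate([a, b])
    n, exceed = len(a), 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        exceed += watson_u2(pooled[:n], pooled[n:]) >= observed
    return observed, (exceed + 1) / (n_perm + 1)
```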


Biology Open ◽  
2020 ◽  
Vol 9 (6) ◽  
pp. bio049866
Author(s):  
Lukas Landler ◽  
Graeme D. Ruxton ◽  
E. Pascal Malkemper

Abstract. Many studies in biology involve data measured on a circular scale. Such data require different statistical treatment from those measured on linear scales. The most common statistical exploration of circular data involves testing the null hypothesis that the data show no aggregation and are instead uniformly distributed over the whole circle. The most common means of performing this type of investigation is the Rayleigh test. An alternative might be to compare the fit of the uniform distribution model to alternative models. Such model-fitting approaches have become a standard technique with linear data, and their greater application to circular data has recently been advocated. Here we present simulation data demonstrating that such model-based inference can offer very similar performance to the best traditional tests, but only if an adjustment is made to control the Type I error rate.
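For readers unfamiliar with the Rayleigh test referred to above, the sketch below computes Rayleigh's Z from the mean resultant length together with an approximate p-value. The p-value uses the commonly cited series approximation (as given, for example, in Zar's Biostatistical Analysis), so treat the snippet as illustrative rather than the exact routine used in the simulations.

```python
import numpy as np

def rayleigh_test(theta):
    """Rayleigh test of circular uniformity; theta in radians."""
    n = len(theta)
    C, S = np.cos(theta).sum(), np.sin(theta).sum()
    r_bar = np.hypot(C, S) / n      # mean resultant length
    Z = n * r_bar**2                # Rayleigh's Z
    # Series approximation to the p-value; exp(-Z) is the leading term.
    p = np.exp(-Z) * (1 + (2 * Z - Z**2) / (4 * n)
                      - (24 * Z - 132 * Z**2 + 76 * Z**3 - 9 * Z**4) / (288 * n**2))
    return Z, float(np.clip(p, 0.0, 1.0))

rng = np.random.default_rng(2)
print(rayleigh_test(rng.vonmises(0.0, 1.5, size=50)))      # aggregated sample: small p expected
print(rayleigh_test(rng.uniform(0, 2 * np.pi, size=50)))   # uniform sample: large p expected
```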


2019 ◽  
Vol 227 (4) ◽  
pp. 261-279 ◽  
Author(s):  
Frank Renkewitz ◽  
Melanie Keiner

Abstract. Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.
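To make the simulation set-up concrete, the sketch below generates a small hypothetical literature in which non-significant studies are published with low probability and then applies one widely used asymmetry test (Egger's regression). All parameter values are arbitrary assumptions for illustration, and the six detection methods evaluated in the article are not reproduced here.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)

def simulate_literature(true_d=0.2, k=40, p_publish_nonsig=0.2):
    """Simulate k published two-group studies under publication bias."""
    effects, ses = [], []
    while len(effects) < k:
        n = rng.integers(20, 100)                    # per-group sample size
        diff = rng.normal(true_d, 1.0, n).mean() - rng.normal(0.0, 1.0, n).mean()
        se = np.sqrt(2 / n)                          # SE of the mean difference (SD = 1)
        p = 2 * stats.norm.sf(abs(diff / se))
        # Non-significant studies reach the literature with low probability.
        if p < 0.05 or rng.random() < p_publish_nonsig:
            effects.append(diff)
            ses.append(se)
    return np.array(effects), np.array(ses)

d, se = simulate_literature()
# Egger's test: regress the standard normal deviate on precision;
# an intercept far from zero suggests small-study / publication bias.
egger = sm.OLS(d / se, sm.add_constant(1.0 / se)).fit()
print(egger.params[0], egger.pvalues[0])
```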


2019 ◽  
Author(s):  
Rob Cribbie ◽  
Nataly Beribisky ◽  
Udi Alter

Many bodies recommend that a sample planning procedure, such as a traditional NHST a priori power analysis, be conducted during the planning stages of a study. Power analysis allows the researcher to estimate how many participants are required in order to detect a minimally meaningful effect size at a specific level of power and Type I error rate. However, there are several drawbacks to the procedure that render it “a mess.” Specifically, identifying the minimally meaningful effect size is often difficult but is unavoidable if the procedure is to be conducted properly; the procedure is not precision oriented; and it does not guide the researcher to collect as many participants as feasibly possible. In this study, we explore how these three theoretical issues are reflected in applied psychological research in order to better understand whether they are concerns in practice. To investigate how power analysis is currently used, we reviewed the reporting of 443 power analyses in high-impact psychology journals in 2016 and 2017. We found that researchers rarely use the minimally meaningful effect size as the rationale for the effect size chosen in a power analysis. Further, precision-based approaches and collecting the maximum feasible sample size are almost never used in tandem with power analyses. In light of these findings, we suggest that researchers focus on tools beyond traditional power analysis when planning sample sizes, such as collecting the maximum sample size that is feasible.
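For context, the kind of a priori power analysis discussed above takes only a few lines to run; the sketch below solves for the per-group sample size of an independent-samples t-test, assuming (purely for illustration) that d = 0.30 is the minimally meaningful effect, with alpha = 0.05 and 80% power.

```python
from statsmodels.stats.power import TTestIndPower

# Solve for the per-group n needed to detect d = 0.30 with alpha = .05 and 80% power.
n_per_group = TTestIndPower().solve_power(effect_size=0.30, alpha=0.05,
                                          power=0.80, alternative="two-sided")
print(round(n_per_group))  # required participants per group
```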


2010 ◽  
Vol 23 (2) ◽  
pp. 200-229 ◽  
Author(s):  
Anna L. Macready ◽  
Laurie T. Butler ◽  
Orla B. Kennedy ◽  
Judi A. Ellis ◽  
Claire M. Williams ◽  
...  

In recent years there has been a rapid growth of interest in exploring the relationship between nutritional therapies and the maintenance of cognitive function in adulthood. Emerging evidence reveals an increasingly complex picture with respect to the benefits of various food constituents on learning, memory and psychomotor function in adults. However, to date, there has been little consensus in human studies on the range of cognitive domains to be tested or the particular tests to be employed. To illustrate the potential difficulties that this poses, we conducted a systematic review of existing human adult randomised controlled trial (RCT) studies that have investigated the effects of 24 days to 36 months of supplementation with flavonoids and micronutrients on cognitive performance. There were thirty-nine studies employing a total of 121 different cognitive tasks that met the criteria for inclusion. Results showed that less than half of these studies reported positive effects of treatment, with some important cognitive domains either under-represented or not explored at all. Although there was some evidence of sensitivity to nutritional supplementation in a number of domains (for example, executive function, spatial working memory), interpretation is currently difficult given the prevailing ‘scattergun approach’ for selecting cognitive tests. Specifically, the practice means that it is often difficult to distinguish between a boundary condition for a particular nutrient and a lack of task sensitivity. We argue that for significant future progress to be made, researchers need to pay much closer attention to existing human RCT and animal data, as well as to more basic issues surrounding task sensitivity, statistical power and Type I error.


2020 ◽  
Vol 6 (2) ◽  
pp. 106-113
Author(s):  
A. M. Grjibovski ◽  
M. A. Gorbatova ◽  
A. N. Narkevich ◽  
K. A. Vinogradov

Sample size calculation at the planning stage is still uncommon in Russian research practice. This situation threatens the validity of conclusions and may lead to Type II errors, in which a false null hypothesis is accepted owing to a lack of statistical power to detect an existing difference between the means. Comparing two means using an unpaired Student's t-test is the most common statistical procedure in the Russian biomedical literature. However, calculation of the minimal required sample size, or retrospective calculation of statistical power, is reported in only very few publications. In this paper we demonstrate how to calculate the required sample size for comparing means in unpaired samples using the WinPepi and Stata software. In addition, we produced tables of the minimal required sample sizes for studies in which two means are compared and body mass index or blood pressure is the variable of interest. The tables were constructed for unpaired samples, for different levels of statistical power, using standard deviations obtained from the literature.
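The calculation that these packages perform reduces to the standard normal-approximation formula for two independent means; a minimal sketch is given below. The example inputs (a 2 kg/m^2 difference in mean BMI with SD 4 kg/m^2) are illustrative assumptions, not the values tabulated in the paper.

```python
import math
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """n per group = 2 * sigma^2 * (z_(1-alpha/2) + z_(1-beta))^2 / delta^2."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return math.ceil(2 * (sigma * (z_a + z_b) / delta) ** 2)

# Example: detect a 2 kg/m^2 difference in mean BMI assuming SD = 4 kg/m^2.
print(n_per_group(delta=2.0, sigma=4.0))  # about 63 per group
```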


Author(s):  
Shengjie Liu ◽  
Jun Gao ◽  
Yuling Zheng ◽  
Lei Huang ◽  
Fangrong Yan

Abstract. Bioequivalence (BE) studies are an integral component of the new drug development process and play an important role in the approval and marketing of generic drug products. However, existing design and evaluation methods sit largely within the frequentist framework, and few implement Bayesian ideas. Based on a bioequivalence predictive probability model and a sample-size re-estimation strategy, we propose a new Bayesian two-stage adaptive design and explore its application in bioequivalence testing. The new design differs from existing two-stage designs (such as Potvin's methods B and C) in the following aspects. First, it not only incorporates historical and expert information but also combines the accumulating experimental data flexibly to aid decision-making. Second, its sample-size re-estimation strategy is based on the ratio of the information at the interim analysis to the total information, which is simpler to calculate than Potvin's method. Simulation results showed that the two-stage design can be combined with various stopping-boundary functions, with differing results. Moreover, the proposed method saves sample size compared with Potvin's method while keeping the Type I error rate below 0.05 and achieving at least 80% statistical power.
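The central quantity in such a design, the predictive probability of eventually demonstrating BE, can be approximated by Monte Carlo. The sketch below assumes a normal model for the mean log-ratio with known variability and a flat prior; it illustrates the idea only and is not the authors' two-stage procedure or its stopping boundaries.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
BE_LIMITS = (np.log(0.8), np.log(1.25))   # standard BE limits on the log scale

def predictive_prob_be(interim_diff, n1, n_final, sd, n_sim=20000):
    """P(the final 90% CI for the mean log-ratio lies inside the BE limits),
    averaged over the posterior (flat prior, known SD) given the interim data."""
    se1 = sd * np.sqrt(2 / n1)
    n2 = n_final - n1
    inside = 0
    for _ in range(n_sim):
        theta = rng.normal(interim_diff, se1)                   # posterior draw
        future_mean = rng.normal(theta, sd * np.sqrt(2 / n2))   # summary of future data
        pooled = (n1 * interim_diff + n2 * future_mean) / n_final
        half_width = norm.ppf(0.95) * sd * np.sqrt(2 / n_final)
        inside += (BE_LIMITS[0] < pooled - half_width) and (pooled + half_width < BE_LIMITS[1])
    return inside / n_sim

print(predictive_prob_be(interim_diff=0.05, n1=20, n_final=40, sd=0.25))
```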


2019 ◽  
Vol 21 (3) ◽  
pp. 753-761 ◽  
Author(s):  
Regina Brinster ◽  
Dominique Scherer ◽  
Justo Lorenzo Bermejo

Abstract. Population stratification is usually corrected for by relying on principal component analysis (PCA) of genome-wide genotype data, even in populations considered genetically homogeneous, such as Europeans. The need to genotype only a small number of genetic variants that show large differences in allele frequency among subpopulations (so-called ancestry-informative markers, AIMs), instead of the whole genome, for stratification adjustment could represent an advantage for replication studies and candidate gene/pathway studies. Here we compare the correction performance of classical and robust principal components (PCs) with the use of AIMs selected according to four different methods: the informativeness for assignment measure (In-AIMs), the combination of PCA and F-statistics, PCA-correlated measurement and the PCA weighted loadings for each genetic variant. We used real genotype data from the Population Reference Sample and The Cancer Genome Atlas to simulate European genetic association studies and to quantify Type I error rate and statistical power in different case-control settings. In studies with the same numbers of cases and controls per country and control-to-case ratios reflecting actual rates of disease prevalence, no adjustment for population stratification was required. The unnecessary inclusion of the country of origin, PCs or AIMs as covariates in the regression models translated into increasing Type I error rates. In studies with cases and controls from separate countries, no investigated method was able to adequately correct for population stratification. The first classical and the first two robust PCs achieved the lowest (although inflated) Type I error, followed at some distance by the first eight In-AIMs.
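As background on the PC adjustment being evaluated, the sketch below standardizes a genotype matrix, extracts the leading principal components via SVD and includes them as covariates in a per-SNP logistic regression. The data are random placeholders, not the Population Reference Sample or TCGA genotypes.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n_ind, n_snp = 200, 500
G = rng.integers(0, 3, size=(n_ind, n_snp)).astype(float)   # 0/1/2 genotype counts
y = rng.integers(0, 2, size=n_ind)                          # case/control status

# Standardize each SNP, then take the leading principal components (SVD scores).
Gs = (G - G.mean(axis=0)) / (G.std(axis=0) + 1e-12)
U, S, _ = np.linalg.svd(Gs, full_matrices=False)
pcs = U[:, :2] * S[:2]                                      # first two PCs

# Per-SNP logistic regression adjusted for the leading PCs.
snp = Gs[:, 0]
X = sm.add_constant(np.column_stack([snp, pcs]))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.pvalues[1])                                       # p-value for the tested SNP
```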


Biometrika ◽  
2020 ◽  
Author(s):  
Rong Ma ◽  
Ian Barnett

Summary. Modularity is a popular metric for quantifying the degree of community structure within a network. The distribution of the largest eigenvalue of a network’s edge weight or adjacency matrix is well studied and is frequently used as a substitute for modularity when performing statistical inference. However, we show that the largest eigenvalue and modularity are asymptotically uncorrelated, which suggests the need for inference directly on modularity itself when the network is large. To this end, we derive the asymptotic distribution of modularity in the case where the network’s edge weight matrix belongs to the Gaussian orthogonal ensemble, and study the statistical power of the corresponding test for community structure under some alternative models. We empirically explore universality extensions of the limiting distribution and demonstrate the accuracy of these asymptotic distributions through Type I error simulations. We also compare the empirical powers of the modularity-based tests and some existing methods. Our method is then used to test for the presence of community structure in two real data applications.
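For reference, Newman's modularity for a fixed partition can be computed directly from an edge-weight or adjacency matrix; the short sketch below does so for a toy graph. It only illustrates the quantity being tested and does not reproduce the Gaussian orthogonal ensemble null distribution or the tests studied in the paper.

```python
import numpy as np

def modularity(A, labels):
    """Newman's Q = (1/2m) * sum_ij (A_ij - k_i*k_j/2m) * [c_i == c_j]."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                    # (weighted) degrees
    two_m = k.sum()
    same = np.equal.outer(labels, labels)
    B = A - np.outer(k, k) / two_m       # modularity matrix
    return B[same].sum() / two_m

# Toy example: two 4-node cliques joined by a single edge.
A = np.zeros((8, 8))
A[:4, :4] = 1
A[4:, 4:] = 1
np.fill_diagonal(A, 0)
A[3, 4] = A[4, 3] = 1
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(modularity(A, labels))   # about 0.42 for this split
```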


2020 ◽  
Vol 14 ◽  
Author(s):  
Aline da Silva Frost ◽  
Alison Ledgerwood

Abstract. This article provides an accessible tutorial with concrete guidance for how to start improving research methods and practices in your lab. Following recent calls to improve research methods and practices within and beyond the borders of psychological science, resources have proliferated across book chapters, journal articles, and online media. Many researchers are interested in learning more about cutting-edge methods and practices but are unsure where to begin. In this tutorial, we describe specific tools that help researchers calibrate their confidence in a given set of findings. In Part I, we describe strategies for assessing the likely statistical power of a study, including when and how to conduct different types of power calculations, how to estimate effect sizes, and how to think about power for detecting interactions. In Part II, we provide strategies for assessing the likely Type I error rate of a study, including distinguishing clearly between data-independent (“confirmatory”) and data-dependent (“exploratory”) analyses and thinking carefully about different forms and functions of preregistration.
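As a companion to the power-related strategies mentioned in Part I, the sketch below estimates power for detecting a 2x2 interaction by simulation, an approach that generalizes easily when analytic formulas are awkward. The cell sizes and effect sizes are arbitrary illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)

def interaction_power(n_per_cell=50, b_inter=0.4, n_sim=2000, alpha=0.05):
    """Estimate power for the A x B interaction in a 2x2 between-subjects design."""
    hits = 0
    for _ in range(n_sim):
        a = np.repeat([0, 0, 1, 1], n_per_cell)
        b = np.repeat([0, 1, 0, 1], n_per_cell)
        y = 0.3 * a + 0.3 * b + b_inter * a * b + rng.normal(size=a.size)
        X = sm.add_constant(np.column_stack([a, b, a * b]))
        hits += sm.OLS(y, X).fit().pvalues[3] < alpha   # index 3 = interaction term
    return hits / n_sim

print(interaction_power())   # estimated power under these assumed effects
```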

