An Improved Rank Correlation Effect Size Statistic for Single-Case Designs: Baseline Corrected Tau

2016 ◽  
Vol 41 (4) ◽  
pp. 427-467 ◽  
Author(s):  
Kevin R. Tarlow

Measuring treatment effects when an individual’s pretreatment performance is improving poses a challenge for single-case experimental designs. It may be difficult to determine whether improvement is due to the treatment or due to the preexisting baseline trend. Tau-U is a popular single-case effect size statistic that purports to control for baseline trend. However, despite its strengths, Tau-U has substantial limitations: Its values are inflated and not bound between −1 and +1, it cannot be visually graphed, and its relatively weak method of trend control leads to unacceptable levels of Type I error wherein ineffective treatments appear effective. An improved effect size statistic based on rank correlation and robust regression, Baseline Corrected Tau, is proposed and field-tested with both published and simulated single-case time series. A web-based calculator for Baseline Corrected Tau is also introduced for use by single-case investigators.
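The abstract describes the general recipe: a robust regression fit to the baseline, followed by a rank correlation on the trend-corrected series. The sketch below illustrates that idea in Python with a Theil-Sen baseline fit and Kendall's tau on a phase dummy; it is an assumption-laden illustration, not Tarlow's published implementation or calculator.

```python
# A rough sketch of a baseline-corrected rank correlation, assuming the general
# procedure described above: detrend both phases using a robust (Theil-Sen)
# regression fitted to the baseline, then correlate phase membership with the
# corrected scores using Kendall's tau. Illustrative only.
import numpy as np
from scipy import stats

def baseline_corrected_tau(baseline, treatment):
    """Return a Tau-like effect size after removing baseline trend."""
    baseline = np.asarray(baseline, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    t_base = np.arange(len(baseline))
    t_all = np.arange(len(baseline) + len(treatment))
    y_all = np.concatenate([baseline, treatment])

    # Robust trend estimate from the baseline phase only.
    slope, intercept, _, _ = stats.theilslopes(baseline, t_base)

    # Remove the projected baseline trend from the full series.
    corrected = y_all - (intercept + slope * t_all)

    # Dummy-code phase (0 = baseline, 1 = treatment) and correlate with scores.
    phase = np.concatenate([np.zeros(len(baseline)), np.ones(len(treatment))])
    tau, p_value = stats.kendalltau(phase, corrected)
    return tau, p_value

# Example: an improving baseline followed by a modestly better treatment phase.
tau, p = baseline_corrected_tau([1, 2, 3, 4, 5], [7, 8, 8, 9, 10])
```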

2019 ◽  
pp. 014544551986021 ◽  
Author(s):  
Antonia R. Giannakakos ◽  
Marc J. Lanovaz

Single-case experimental designs often require extended baselines or the withdrawal of treatment, which may not be feasible or ethical in some practical settings. The quasi-experimental AB design is a potential alternative, but more research is needed on its validity. The purpose of our study was to examine the validity of using nonoverlap measures of effect size to detect changes in AB designs using simulated data. In our analyses, we determined thresholds for three effect size measures beyond which the Type I error rate would remain below 0.05 and then examined whether using these thresholds would provide sufficient power. Overall, our analyses show that some effect size measures may provide adequate control over the Type I error rate and sufficient power when analyzing data from AB designs. In sum, our results suggest that practitioners may use quasi-experimental AB designs in combination with effect size measures to rigorously assess progress in practice.
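As a rough illustration of the threshold-finding step, the sketch below simulates null AB series, scores them with one nonoverlap measure (nonoverlap of all pairs, NAP, chosen here for convenience), and takes the 95th percentile as the cutoff that keeps the Type I error rate near .05. The series lengths and noise model are illustrative assumptions, not the authors' simulation settings.

```python
# A minimal sketch of the threshold-finding logic: simulate many AB series with
# no true treatment effect, compute a nonoverlap measure (here, NAP), and take
# the 95th percentile as a decision threshold.
import numpy as np

rng = np.random.default_rng(1)

def nap(baseline, treatment):
    """Nonoverlap of all pairs: share of (baseline, treatment) pairs showing improvement."""
    pairs = [(b, t) for b in baseline for t in treatment]
    wins = sum(t > b for b, t in pairs) + 0.5 * sum(t == b for b, t in pairs)
    return wins / len(pairs)

def simulate_null_nap(n_base=5, n_treat=5, n_sims=10_000):
    scores = []
    for _ in range(n_sims):
        series = rng.normal(size=n_base + n_treat)  # no effect, no trend
        scores.append(nap(series[:n_base], series[n_base:]))
    return np.array(scores)

null_scores = simulate_null_nap()
threshold = np.quantile(null_scores, 0.95)  # declare an effect only above this value
print(f"NAP threshold for alpha ~= .05: {threshold:.2f}")
```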


2021 ◽  
Author(s):  
Marc J Lanovaz ◽  
Rachel Primiani

Researchers and practitioners often use single-case designs (SCDs), or n-of-1 trials, to develop and validate novel treatments. Standards and guidelines have been published to provide guidance as to how to implement SCDs, but many of their recommendations are not derived from the research literature. For example, one of these recommendations suggests that researchers and practitioners should wait for baseline stability prior to introducing an independent variable. However, this recommendation is not strongly supported by empirical evidence. To address this issue, we used a Monte Carlo simulation to generate a total of 480,000 AB graphs with fixed, response-guided, and random baseline lengths. Then, our analyses compared the Type I error rate and power produced by two methods of analysis: the conservative dual-criteria method (a structured visual aid) and a support vector classifier (a model derived from machine learning). The conservative dual-criteria method produced more power when using response-guided decision-making (i.e., waiting for stability) with negligible effects on Type I error rate. In contrast, waiting for stability did not reduce decision-making errors with the support vector classifier. Our findings question the necessity of waiting for baseline stability when using SCDs with machine learning, but the study must be replicated with other designs to support our results.
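For readers unfamiliar with the conservative dual-criteria (CDC) method, the sketch below shows one common formulation, assuming an expected increase in behavior: project the baseline mean and trend lines into the treatment phase, shift both up by 0.25 baseline standard deviations, and compare the count of treatment points above both lines with a binomial cutoff. The cutoff computed here is an illustrative stand-in for the published criterion tables, and the code is not the authors' simulation.

```python
# A rough sketch of a conservative dual-criteria (CDC) style decision rule for
# an expected increase in behavior. Illustrative assumptions throughout.
import numpy as np
from scipy import stats

def cdc_decision(baseline, treatment, alpha=0.05):
    baseline = np.asarray(baseline, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    t_base = np.arange(len(baseline))
    t_treat = np.arange(len(baseline), len(baseline) + len(treatment))
    shift = 0.25 * baseline.std(ddof=1)

    # Criterion 1: baseline mean line, shifted upward.
    mean_line = baseline.mean() + shift

    # Criterion 2: baseline OLS trend line, projected into treatment and shifted upward.
    slope, intercept, *_ = stats.linregress(t_base, baseline)
    trend_line = intercept + slope * t_treat + shift

    above_both = np.sum((treatment > mean_line) & (treatment > trend_line))

    # Smallest count that would be unlikely (p < alpha) if each treatment point
    # had a 50/50 chance of landing above both lines.
    n = len(treatment)
    cutoff = next(k for k in range(n + 1)
                  if stats.binom.sf(k - 1, n, 0.5) < alpha)
    return above_both >= cutoff

print(cdc_decision([3, 4, 3, 5, 4], [6, 7, 7, 8, 6, 9, 8, 7]))
```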


2021 ◽  
Author(s):  
Marc J Lanovaz ◽  
Kieva Hranchuk

Behavior analysts commonly use visual inspection to analyze single-case graphs, but studies on its reliability have produced mixed results. To examine this issue, we compared the Type I error rate and power of visual inspection with a novel approach, machine learning. Five expert visual raters analyzed 1,024 simulated AB graphs, which differed in the number of points per phase, autocorrelation, trend, variability, and effect size. The ratings were compared to those obtained by the conservative dual-criteria method and two models derived from machine learning. On average, visual raters agreed with each other on only 73% of graphs. In contrast, both models derived from machine learning showed the best balance between Type I error rate and power while producing more consistent results across different graph characteristics. The results suggest that machine learning may support researchers and practitioners in making fewer errors when analyzing single-case graphs, but further replications remain necessary.
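The sketch below gives a minimal, assumption-laden illustration of the machine-learning approach: simulate labeled AB graphs, extract a few summary features (level change, baseline slope, pairwise nonoverlap), and cross-validate a support vector classifier. The feature set and data-generating model are placeholders, not the models evaluated in the study.

```python
# Illustrative only: train a support vector classifier to label simulated AB
# graphs as showing an effect or not, using hand-picked summary features.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def make_graph(effect):
    baseline = rng.normal(0, 1, size=5)
    treatment = rng.normal(effect, 1, size=5)
    return baseline, treatment

def features(baseline, treatment):
    return [
        treatment.mean() - baseline.mean(),               # level change
        np.polyfit(range(5), baseline, 1)[0],             # baseline slope
        np.mean(treatment[:, None] > baseline[None, :]),  # pairwise nonoverlap
    ]

X, y = [], []
for _ in range(2000):
    effect = rng.choice([0.0, 1.5])        # null vs. moderate effect
    b, t = make_graph(effect)
    X.append(features(b, t))
    y.append(int(effect > 0))

clf = SVC(kernel="rbf", C=1.0)
print(cross_val_score(clf, np.array(X), np.array(y), cv=5).mean())
```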


2017 ◽  
Vol 43 (1) ◽  
pp. 115-131 ◽  
Author(s):  
Marc J. Lanovaz ◽  
Patrick Cardinal ◽  
Mary Francis

Although visual inspection remains common in the analysis of single-case designs, the lack of agreement between raters is an issue that may seriously compromise its validity. Thus, the purpose of our study was to develop and examine the properties of a simple structured criterion to supplement the visual analysis of alternating-treatment designs. To this end, we generated simulated data sets with varying numbers of points, numbers of conditions, effect sizes, and autocorrelations, and then measured Type I error rates and power produced by the visual structured criterion (VSC) and permutation analyses. We also validated the results for Type I error rates using nonsimulated data. Overall, our results indicate that using the VSC as a supplement for the analysis of systematically alternating-treatment designs with at least five points per condition generally provides adequate control over Type I error rates and sufficient power to detect most behavior changes.
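For context, the sketch below shows a generic permutation test of the kind used as the comparison condition: the condition labels are re-randomized many times to build a null distribution for the mean difference. It ignores the systematic-alternation constraint for brevity, so it is illustrative rather than a reproduction of the authors' permutation analyses.

```python
# A minimal label-shuffling permutation test for a two-condition
# alternating-treatments data set. Illustrative only.
import numpy as np

rng = np.random.default_rng(42)

def permutation_p(values_a, values_b, n_perm=10_000):
    values_a = np.asarray(values_a, dtype=float)
    values_b = np.asarray(values_b, dtype=float)
    observed = values_b.mean() - values_a.mean()
    pooled = np.concatenate([values_a, values_b])
    n_a = len(values_a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # re-randomize condition labels
        diff = pooled[n_a:].mean() - pooled[:n_a].mean()
        if diff >= observed:
            count += 1
    return count / n_perm

# Five points per condition, matching the design recommendation above.
print(permutation_p([2, 3, 2, 3, 2], [5, 6, 5, 6, 5]))
```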


2021 ◽  
Author(s):  
Marc J Lanovaz

Although single-case designs are a cornerstone of the science of behavior analysis, researchers and practitioners often rely on tradition and consensus-based guidelines, rather than empirical evidence, to make decisions about these designs. One approach to developing empirically based guidelines is to use Monte Carlo simulations for validation, but behavior analysts are not necessarily trained to apply this type of methodology. Therefore, the purpose of our technical article is to walk the reader through conducting Monte Carlo simulations to examine the accuracy, Type I error rate, and power of a visual aid for AB graphs using R code. Additionally, the tutorial provides code to replicate the procedures with single-case experimental designs as well as with the Python programming language. Overall, a broader adoption of Monte Carlo simulations to validate guidelines should lead to an improvement in how researchers and practitioners use single-case designs.
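A bare-bones version of the Monte Carlo logic the tutorial walks through might look like the following, where a simple mean-shift rule stands in for the visual aid being evaluated (the rule, sample sizes, and data model are all placeholder assumptions):

```python
# Estimate Type I error (false alarms under no effect) and power (detections
# under a true effect) for a decision rule by repeated simulation.
import numpy as np

rng = np.random.default_rng(7)

def decision_rule(baseline, treatment):
    """Placeholder rule: flag a change if the treatment mean clearly exceeds baseline."""
    return treatment.mean() > baseline.mean() + 1.5 * baseline.std(ddof=1)

def detection_rate(effect, n_sims=20_000, n_points=5):
    detections = 0
    for _ in range(n_sims):
        baseline = rng.normal(0, 1, n_points)
        treatment = rng.normal(effect, 1, n_points)
        detections += decision_rule(baseline, treatment)
    return detections / n_sims

print("Type I error:", detection_rate(effect=0.0))
print("Power:       ", detection_rate(effect=2.0))
```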


2019 ◽  
Vol 227 (4) ◽  
pp. 261-279 ◽  
Author(s):  
Frank Renkewitz ◽  
Melanie Keiner

Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.
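As a concrete, if simplified, illustration of the kind of evaluation described, the sketch below simulates a selectively published literature and applies one well-known detection tool, Egger's regression test; the censoring rule and simulation settings are assumptions for demonstration only.

```python
# Simulate a censored literature, then test for small-study effects with an
# Egger-style regression (standardized effect regressed on precision).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def simulate_published_studies(true_d=0.0, n_studies=200, publish_null_prob=0.2):
    effects, ses = [], []
    while len(effects) < n_studies:
        n = rng.integers(20, 200)                 # per-group sample size
        se = np.sqrt(2 / n)                       # rough SE of Cohen's d
        d = rng.normal(true_d, se)
        significant = abs(d / se) > 1.96
        if significant or rng.random() < publish_null_prob:
            effects.append(d)
            ses.append(se)
    return np.array(effects), np.array(ses)

def egger_test(effects, ses):
    """A nonzero intercept in this regression suggests publication bias."""
    snd, precision = effects / ses, 1 / ses
    res = stats.linregress(precision, snd)
    t = res.intercept / res.intercept_stderr
    p = 2 * stats.t.sf(abs(t), len(effects) - 2)
    return res.intercept, p

effects, ses = simulate_published_studies()
print(egger_test(effects, ses))
```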


2021 ◽  
Author(s):  
Megha Joshi ◽  
James E Pustejovsky ◽  
S. Natasha Beretvas

The most common and well-known meta-regression models work under the assumption that there is only one effect size estimate per study and that the estimates are independent. However, meta-analytic reviews of social science research often include multiple effect size estimates per primary study, leading to dependence in the estimates. Some meta-analyses also include multiple studies conducted by the same lab or investigator, creating another potential source of dependence. An increasingly popular method to handle dependence is robust variance estimation (RVE), but this method can result in inflated Type I error rates when the number of studies is small. Small-sample correction methods for RVE have been shown to control Type I error rates adequately but may be overly conservative, especially for tests of multiple-contrast hypotheses. We evaluated an alternative method for handling dependence, cluster wild bootstrapping, which has been examined in the econometrics literature but not in the context of meta-analysis. Results from two simulation studies indicate that cluster wild bootstrapping maintains adequate Type I error rates and provides more power than extant small-sample correction methods, particularly for multiple-contrast hypothesis tests. We recommend using cluster wild bootstrapping to conduct hypothesis tests for meta-analyses with a small number of studies. We have also created an R package that implements such tests.
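The sketch below outlines the core of a cluster wild bootstrap test for a single moderator: residuals from the null model are re-signed with study-level Rademacher weights to generate bootstrap samples. It is a simplified illustration (for example, it uses the raw coefficient rather than a studentized statistic with robust standard errors) and is not the authors' R package.

```python
# Simplified cluster wild bootstrap for one moderator in a weighted
# meta-regression with multiple effect sizes per study.
import numpy as np

rng = np.random.default_rng(11)

def wls_fit(X, y, w):
    """Weighted least squares coefficients."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

def cluster_wild_bootstrap_p(y, x, study_id, w, n_boot=1999):
    X_full = np.column_stack([np.ones_like(x), x])
    X_null = X_full[:, :1]

    beta_full = wls_fit(X_full, y, w)
    observed = abs(beta_full[1])

    # Residuals and fitted values under the null (moderator effect set to zero).
    beta_null = wls_fit(X_null, y, w)
    fitted = X_null @ beta_null
    resid = y - fitted

    studies = np.unique(study_id)
    count = 0
    for _ in range(n_boot):
        eta = rng.choice([-1.0, 1.0], size=len(studies))  # Rademacher weights per study
        y_boot = fitted + resid * eta[np.searchsorted(studies, study_id)]
        beta_boot = wls_fit(X_full, y_boot, w)
        if abs(beta_boot[1]) >= observed:
            count += 1
    return (count + 1) / (n_boot + 1)

# Toy data: 8 studies contributing 3 effect sizes each; the null is true here.
study = np.repeat(np.arange(8), 3)
moderator = rng.normal(size=24)
effect = 0.2 + rng.normal(0, 0.3, size=24)
weights = np.ones(24)
print(cluster_wild_bootstrap_p(effect, moderator, study, weights))
```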


2019 ◽  
Author(s):  
Rob Cribbie ◽  
Nataly Beribisky ◽  
Udi Alter

Many bodies recommend that a sample planning procedure, such as traditional NHST a priori power analysis, be conducted during the planning stages of a study. Power analysis allows the researcher to estimate how many participants are required in order to detect a minimally meaningful effect size at a specific level of power and Type I error rate. However, there are several drawbacks to the procedure that render it “a mess.” Specifically, the identification of the minimally meaningful effect size is often difficult but unavoidable for conducting the procedure properly, the procedure is not precision oriented, and it does not guide the researcher to collect as many participants as feasibly possible. In this study, we explore how these three theoretical issues are reflected in applied psychological research in order to better understand whether these issues are concerns in practice. To investigate how power analysis is currently used, this study reviewed the reporting of 443 power analyses in high-impact psychology journals in 2016 and 2017. It was found that researchers rarely use the minimally meaningful effect size as a rationale for the chosen effect in a power analysis. Further, precision-based approaches and collecting the maximum sample size feasible are almost never used in tandem with power analyses. In light of these findings, we propose that researchers focus on tools beyond traditional power analysis when sample planning, such as collecting the maximum sample size feasible.
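For reference, the traditional a priori power analysis the abstract critiques can be run in a few lines; the effect size, alpha, and power values below are illustrative assumptions:

```python
# Solve for the per-group sample size needed to detect an assumed minimally
# meaningful effect with a two-sample t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.3,   # assumed minimally meaningful Cohen's d
                                    alpha=0.05,
                                    power=0.80,
                                    alternative="two-sided")
print(f"Required sample size per group: {n_per_group:.0f}")
```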

