Rethinking Robust Statistics with Modern Bayesian Methods

2017 · Author(s): Donald Ray Williams, Stephen Ross Martin

Developing robust statistical methods is an important goal for psychological science. Whereas classical methods (i.e., sampling distributions, p-values, etc.) have been thoroughly characterized, Bayesian robust methods remain relatively uncommon in practice and in the methodological literature. Here we propose a robust Bayesian model (BHSt) that accommodates heterogeneous (H) variances by predicting the scale parameter on the log scale and tail heaviness with a Student-t likelihood (St). Through simulations with normative and contaminated (i.e., heavy-tailed) data, we demonstrate that BHSt has consistent frequentist properties in terms of Type I error, power, and mean squared error compared to three classical robust methods. With a motivating example, we illustrate Bayesian inferential methods such as approximate leave-one-out cross-validation and posterior predictive checks. We end by suggesting areas of improvement for BHSt and discussing Bayesian robust methods in practice.
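A hypothetical sketch of the likelihood structure described in this abstract: a Student-t likelihood ("St") whose scale is predicted on the log scale per group ("H"). The paper fits this model in a Bayesian framework; the sketch below simply maximizes the same likelihood with scipy so the structure is concrete. All variable names and parameter values are illustrative, not taken from the paper.

```python
# Maximum-likelihood sketch of a heteroscedastic Student-t model:
# group-specific location, scale modeled on the log scale, estimated df.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
y = np.concatenate([rng.standard_t(df=3, size=50) * 1.0,
                    rng.standard_t(df=3, size=50) * 2.0 + 0.5])
group = np.repeat([0, 1], 50)  # 0/1 group indicator

def negloglik(theta):
    b0, b1, g0, g1, log_nu = theta          # location, log-scale, log-df parameters
    mu = b0 + b1 * group                    # group-specific location
    sigma = np.exp(g0 + g1 * group)         # scale predicted on the log scale
    nu = np.exp(log_nu) + 1.0               # degrees of freedom > 1
    return -np.sum(stats.t.logpdf(y, df=nu, loc=mu, scale=sigma))

fit = optimize.minimize(negloglik, x0=np.zeros(5), method="Nelder-Mead",
                        options={"maxiter": 5000})
b0, b1, g0, g1, log_nu = fit.x
print("group difference in location:", b1)
print("scale ratio (group 1 / group 0):", np.exp(g1))
print("estimated degrees of freedom:", np.exp(log_nu) + 1.0)
```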

2013 · Vol 52 (04) · pp. 351-359 · Author(s): M. O. Scheinhardt, A. Ziegler

Summary Background: Gene, protein, or metabolite expression levels are often non-normally distributed, heavy-tailed, and contain outliers. Standard statistical approaches may fail as location tests in this situation. Objectives: In three Monte-Carlo simulation studies, we aimed at comparing the type I error levels and empirical power of standard location tests and three adaptive tests [O’Gorman, Can J Stat 1997; 25: 269–279; Keselman et al., Brit J Math Stat Psychol 2007; 60: 267–293; Szymczak et al., Stat Med 2013; 32: 524–537] for a wide range of distributions. Methods: We simulated two-sample scenarios using the g-and-k distribution family to systematically vary tail length and skewness, with identical and varying variability between groups. Results: All tests kept the type I error level when groups did not vary in their variability. The standard non-parametric U-test performed well in all simulated scenarios. It was outperformed by the two non-parametric adaptive methods in the case of heavy tails or large skewness. Most tests did not keep the type I error level for skewed data in the case of heterogeneous variances. Conclusions: The standard U-test was a powerful and robust location test for most of the simulated scenarios, except for very heavy-tailed or heavily skewed data, and it is thus to be recommended except in these cases. The non-parametric adaptive tests were powerful for both normal and non-normal distributions under sample variance homogeneity, but when sample variances differed, they did not keep the type I error level. The parametric adaptive test lacks power for skewed and heavy-tailed distributions.
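A minimal sketch of the kind of simulation described in this abstract: draw two identically distributed samples from a g-and-k distribution (using one common parameterization of its quantile function, with c = 0.8), apply the Mann-Whitney U-test, and estimate the empirical Type I error. The parameter values are illustrative only, not those used in the study.

```python
# Monte-Carlo estimate of the U-test's Type I error under g-and-k data.
import numpy as np
from scipy import stats

def g_and_k_sample(n, a=0.0, b=1.0, g=0.0, k=0.0, c=0.8, rng=None):
    """Sample by plugging standard-normal draws into the g-and-k quantile function."""
    z = rng.standard_normal(n)
    skew = 1 + c * (1 - np.exp(-g * z)) / (1 + np.exp(-g * z))
    return a + b * skew * (1 + z ** 2) ** k * z

rng = np.random.default_rng(42)
alpha, n_rep, n = 0.05, 2000, 25
rejections = 0
for _ in range(n_rep):
    # g > 0 adds skewness, k > 0 lengthens the tails; both groups identical here
    x = g_and_k_sample(n, g=0.5, k=0.3, rng=rng)
    y = g_and_k_sample(n, g=0.5, k=0.3, rng=rng)
    if stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha:
        rejections += 1
print("empirical Type I error of the U-test:", rejections / n_rep)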


1987 · Vol 12 (1) · pp. 45-61 · Author(s): Stephen F. Olejnik, James Algina

Estimated Type I error rates and power are reported for the Brown-Forsythe, O’Brien, Klotz, and Siegel-Tukey procedures. The effect of aligning the data, by using deviations from group means or group medians, is investigated for the latter two tests. Normal and non-normal distributions, equal and unequal sample-size combinations, and equal and unequal means are investigated for a two-group design. No test is both robust and most powerful for all distributions; however, using O’Brien’s procedure avoids the possibility of a liberal test and provides power almost as large as would be obtained by choosing the most powerful test for each distribution type. Using the Brown-Forsythe procedure with heavy-tailed distributions and O’Brien’s procedure for other distributions will increase power modestly and maintain robustness. Using the mean-aligned Klotz test or the unaligned Klotz test with appropriate distributions can increase power, but only at the risk of increased Type I error rates if the tests are not accurately matched to the distribution type.
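To make two of the compared variance tests concrete: the Brown-Forsythe test is Levene's test with median centering, available directly in scipy, while O'Brien's procedure transforms the observations and runs a one-way ANOVA on the transformed scores. The transformation below uses the commonly cited form with weight w = 0.5; treat that formula as an assumption to check against the original source.

```python
# Brown-Forsythe (median-centered Levene) vs. O'Brien's procedure on toy data.
import numpy as np
from scipy import stats

def obrien_transform(x, w=0.5):
    x = np.asarray(x, dtype=float)
    n, m, v = x.size, x.mean(), x.var(ddof=1)
    return ((w + n - 2) * n * (x - m) ** 2 - w * v * (n - 1)) / ((n - 1) * (n - 2))

rng = np.random.default_rng(0)
g1 = rng.normal(0.0, 1.0, size=30)   # group with unit variance
g2 = rng.normal(0.0, 2.0, size=20)   # group with inflated variance

bf = stats.levene(g1, g2, center="median")                        # Brown-Forsythe
ob = stats.f_oneway(obrien_transform(g1), obrien_transform(g2))   # O'Brien
print("Brown-Forsythe p-value:", bf.pvalue)
print("O'Brien p-value:       ", ob.pvalue)
```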


2017 · Author(s): Rand R. Wilcox, Guillaume A. Rousselet

Abstract: There is a vast array of new and improved methods for comparing groups and studying associations that offer the potential for substantially increasing power, providing improved control over the probability of a Type I error, and yielding a deeper and more nuanced understanding of neuroscience data. These new techniques effectively deal with four insights into when and why conventional methods can be unsatisfactory. But for the non-statistician, this array of techniques can seem daunting simply because so many new methods are now available. The paper briefly reviews when and why conventional methods can have relatively low power and yield misleading results. The main goal is to suggest some general guidelines regarding when, how, and why certain modern techniques might be used.
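One concrete example of the kind of modern technique alluded to here: comparing 20% trimmed means rather than means. Recent versions of SciPy expose this as Yuen's trimmed t-test through the `trim` argument of `ttest_ind`; the sketch is meant only as an illustration of the general approach, not a summary of the paper's guidelines.

```python
# Classic t-test vs. Yuen's 20% trimmed-mean test on heavy-tailed samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# heavy-tailed samples: outliers inflate the classic t-test's variance estimate
x = rng.standard_t(df=2, size=40)
y = rng.standard_t(df=2, size=40) + 0.8

classic = stats.ttest_ind(x, y)              # Student's t on means
robust = stats.ttest_ind(x, y, trim=0.2)     # Yuen's test on 20% trimmed means
print("classic t-test p-value:", classic.pvalue)
print("trimmed-mean (Yuen) p-value:", robust.pvalue)
```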


2021 · Vol 21 (1) · Author(s): Beth Ann Griffin, Megan S. Schuler, Elizabeth A. Stuart, Stephen Patrick, Elizabeth McNeer, ...

Abstract Background: Reliable evaluations of state-level policies are essential for identifying effective policies and informing policymakers’ decisions. State-level policy evaluations commonly use a difference-in-differences (DID) study design; yet within this framework, statistical model specification varies notably across studies. More guidance is needed about which statistical models perform best when estimating how state-level policies affect outcomes. Methods: Motivated by applied state-level opioid policy evaluations, we implemented an extensive simulation study to compare the statistical performance of multiple variations of the two-way fixed effect models traditionally used for DID under a range of simulation conditions. We also explored the performance of autoregressive (AR) and GEE models. We simulated policy effects on annual state-level opioid mortality rates and assessed statistical performance using various metrics, including directional bias, magnitude bias, and root mean squared error. We also reported Type I error rates and the rate of correctly rejecting the null hypothesis (i.e., power), given the prevalence of frequentist null hypothesis significance testing in the applied literature. Results: Most linear models resulted in minimal bias. However, non-linear models and population-weighted versions of the classic linear two-way fixed effect and linear GEE models yielded considerable bias (60 to 160%). Further, root mean squared error was minimized by linear AR models when we examined crude mortality rates and by negative binomial models when we examined raw death counts. In the context of frequentist hypothesis testing, many models yielded high Type I error rates and very low rates of correctly rejecting the null hypothesis (< 10%), raising concerns of spurious conclusions about policy effectiveness in the opioid literature. When considering performance across models, the linear AR models were optimal in terms of directional bias, root mean squared error, Type I error, and correct rejection rates. Conclusions: The findings highlight notable limitations of commonly used statistical models for DID designs, which are widely used in opioid policy studies and in state policy evaluations more broadly. In contrast, the optimal model we identified, the AR model, is rarely used in state policy evaluation. We urge applied researchers to move beyond the classic DID paradigm and adopt use of AR models.
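A schematic of the two model families compared in this abstract, fit to a toy simulated state-year panel with statsmodels. The exact specifications in the paper (covariates, weighting, lag structure) differ; the sketch only contrasts a classic two-way fixed-effect DID model with an autoregressive variant that conditions on the lagged outcome instead of state fixed effects.

```python
# Two-way fixed-effect DID vs. an AR specification on a simulated state-year panel.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
rows = []
for s in range(30):
    level = rng.normal(10, 2)                  # state-specific baseline rate
    treated = s < 10                           # first 10 states adopt the policy
    for t in range(2000, 2016):
        policy = int(treated and t >= 2008)    # policy switches on in 2008
        y = level + 0.15 * (t - 2000) - 0.8 * policy + rng.normal(0, 1)
        rows.append({"state": s, "year": t, "policy": policy, "y": y})
df = pd.DataFrame(rows).sort_values(["state", "year"])
df["y_lag1"] = df.groupby("state")["y"].shift(1)

# Classic two-way fixed-effect DID, with state-clustered standard errors
twfe = smf.ols("y ~ policy + C(state) + C(year)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["state"]})

# Autoregressive specification: lagged outcome replaces the state fixed effects
ar_data = df.dropna()
ar = smf.ols("y ~ policy + y_lag1 + C(year)", data=ar_data).fit(
    cov_type="cluster", cov_kwds={"groups": ar_data["state"]})

print("TWFE policy effect:", twfe.params["policy"])
print("AR   policy effect:", ar.params["policy"])
```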


2016 · Vol 33 (1) · pp. 105-157 · Author(s): David M. Kaplan, Yixiao Sun

The moment conditions or estimating equations for instrumental variables quantile regression involve the discontinuous indicator function. We instead use smoothed estimating equations (SEE), with bandwidth h. We show that the mean squared error (MSE) of the vector of the SEE is minimized for some h > 0, leading to smaller asymptotic MSE of the estimating equations and associated parameter estimators. The same MSE-optimal h also minimizes the higher-order type I error of a SEE-based χ2 test and increases size-adjusted power in large samples. Computation of the SEE estimator also becomes simpler and more reliable, especially with (more) endogenous regressors. Monte Carlo simulations demonstrate all of these superior properties in finite samples, and we apply our estimator to JTPA data. Smoothing the estimating equations is not just a technical operation for establishing Edgeworth expansions and bootstrap refinements; it also brings the real benefits of having more precise estimators and more powerful tests.
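A minimal sketch of the smoothing idea described here: the discontinuous indicator in the IV quantile-regression moment condition is replaced by a smooth, CDF-type kernel evaluated at a bandwidth h. The paper uses a particular higher-order kernel and an MSE-optimal h; the sketch below uses a standard normal CDF and a fixed h purely for illustration, with exogenous regressors standing in for instruments.

```python
# Smoothed estimating equations (SEE) for median regression on toy data.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(5)
n, tau, h = 500, 0.5, 0.3
z = np.column_stack([np.ones(n), rng.normal(size=n)])   # instruments (exogenous here)
x = z.copy()                                            # regressors
y = x @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=n)

def see(beta):
    """Smoothed estimating equations: 1{y - x'b <= 0} replaced by G((x'b - y)/h)."""
    u = (x @ beta - y) / h
    return z.T @ (tau - stats.norm.cdf(u)) / n

def unsmoothed(beta):
    return z.T @ (tau - (y - x @ beta <= 0).astype(float)) / n

beta_hat = optimize.root(see, x0=np.zeros(2)).x   # smooth, so standard root-finders work
print("SEE estimate:", beta_hat)
print("unsmoothed moments at the SEE solution:", unsmoothed(beta_hat))
```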


2014 · Vol 53 (06) · pp. 501-510 · Author(s): R.-D. Hilgers, M. Tamm

Summary Background: In clinical trials, patients are commonly recruited sequentially over time, incurring the risk of chronological bias due to (unobserved) time trends. To minimize the risk of chronological bias, a suitable randomization procedure should be chosen. Objectives: Considering different time trend scenarios, we aim at a detailed evaluation of the extent of chronological bias under permuted block randomization, in order to provide recommendations regarding the choice of randomization at the design stage of a clinical trial and to assess the maximum extent of bias for a realized sequence at the analysis stage. Methods: For the assessment of chronological bias, we consider linear, logarithmic, and stepwise trends illustrating typical changes during recruitment in clinical practice. Bias and variance of the treatment effect estimator, as well as the empirical type I error rate when applying the t-test, are investigated. Different sample sizes, block sizes, and strengths of time trends are considered. Results: With large block sizes, a notable bias exists in the estimate of the treatment effect for specific sequences. This results in a heavily inflated type I error for realized worst-case sequences and an enlarged mean squared error of the treatment effect estimator. Decreasing the block size restricts these effects of time trends. Already applying permuted block randomization with two blocks, instead of the random allocation rule, achieves a good reduction of the mean squared error and of the inflated type I error. Averaged over all sequences, the type I error of the t-test is far below the nominal significance level due to an overestimated variance. Conclusions: Unobserved time trends can induce a strong bias in the treatment effect estimate and in the test decision. Therefore, a suitable randomization procedure should be chosen already at the design stage of a clinical trial. According to our results, small block sizes should be preferred, but medium block sizes are also sufficient to restrict chronological bias to an acceptable extent if other, competing aspects have to be considered (e.g., a serious risk of selection bias). Regardless of the block size, a blocked ANOVA should be used because the t-test is far too conservative, even for weak time trends.
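A toy illustration of the mechanism discussed above: under a linear time trend and no true treatment effect, the treatment-effect estimate for a realized randomization sequence carries a conditional bias, and smaller permuted blocks restrict how large that bias can get. Trend strength, sample size, and block sizes are arbitrary choices for the illustration.

```python
# Conditional bias of the difference-in-means under a linear time trend,
# for permuted block randomization with different block sizes.
import numpy as np

rng = np.random.default_rng(11)
n, trend = 48, 0.05            # patients recruited sequentially, linear drift per patient

def permuted_block_sequence(n, block_size, rng):
    """0/1 allocation sequence from permuted blocks with a 1:1 ratio."""
    blocks = []
    for _ in range(n // block_size):
        block = np.array([0, 1] * (block_size // 2))
        rng.shuffle(block)
        blocks.append(block)
    return np.concatenate(blocks)

t = np.arange(n)
for block_size in (2, 4, 12, 48):   # 48 = one big block, close to the random allocation rule
    biases = []
    for _ in range(5000):
        alloc = permuted_block_sequence(n, block_size, rng)
        # conditional bias of the treatment-effect estimate given this sequence
        biases.append(trend * (t[alloc == 1].mean() - t[alloc == 0].mean()))
    biases = np.asarray(biases)
    print(f"block size {block_size:2d}: max |bias| = {np.abs(biases).max():.3f}, "
          f"mean squared bias = {np.mean(biases**2):.4f}")
```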


2014 · Author(s): Zahayu Md Yusof, Sharipah Soaad Syed Yahaya, Suhaida Abdullah

This monograph presents work on robust procedures for researchers faced with data that appear to violate the assumption of normality and with unbalanced designs. The authors conducted a simulation study to compare the robustness (Type I error) of the method with its parametric and non-parametric counterparts, namely ANOVA, the t-test, the Kruskal-Wallis test, and the Mann-Whitney test. The performance of the methods was further demonstrated on real education data. This monograph offers alternative procedures to researchers (in various fields, especially the experimental sciences) that are not constrained by assumptions such as normality and homogeneity of variances. Researchers can instead work with the original data without having to worry about the shape of the distributions.
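A sketch of the kind of robustness (Type I error) comparison described above: empirical rejection rates of the standard t-test and the Mann-Whitney test when the group means are equal but the data are skewed, the design unbalanced, and the variances unequal. The distributions and sample sizes are illustrative only, not those used in the monograph.

```python
# Empirical rejection rates under equal means with violated assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
alpha, n_rep = 0.05, 4000
n1, n2 = 15, 40                                    # unbalanced design
rej_t = rej_u = 0
for _ in range(n_rep):
    x = rng.exponential(scale=1.0, size=n1)        # skewed, smaller variance, mean 1
    y = rng.exponential(scale=2.0, size=n2) - 1.0  # skewed, larger variance, mean 1
    if stats.ttest_ind(x, y, equal_var=True).pvalue < alpha:
        rej_t += 1
    if stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha:
        rej_u += 1
print("t-test rejection rate:      ", rej_t / n_rep)
print("Mann-Whitney rejection rate:", rej_u / n_rep)
```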


2000 · Vol 14 (1) · pp. 1-10 · Author(s): Joni Kettunen, Niklas Ravaja, Liisa Keltikangas-Järvinen

Abstract We examined the use of smoothing to enhance the detection of response coupling from the activity of different response systems. Three different types of moving average smoothers were applied to both simulated interbeat interval (IBI) and electrodermal activity (EDA) time series and to empirical IBI, EDA, and facial electromyography time series. The results indicated that progressive smoothing increased the efficiency of the detection of response coupling but did not increase the probability of Type I error. The power of the smoothing methods depended on the response characteristics. The benefits and use of the smoothing methods to extract information from psychophysiological time series are discussed.
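A small sketch of the kind of preprocessing studied above: a simple moving average smoother applied to two noisy simulated signals that share a slow common component, with the correlation between them computed before and after smoothing. The window length and signal model are arbitrary choices, not the paper's.

```python
# Moving average smoothing and its effect on detecting response coupling.
import numpy as np

def moving_average(x, window):
    """Centered simple moving average, same length as the input."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

rng = np.random.default_rng(9)
t = np.arange(600)
common = np.sin(2 * np.pi * t / 120)            # shared slow "coupling" component
ibi = common + rng.normal(0, 1.5, t.size)       # interbeat-interval-like series
eda = 0.8 * common + rng.normal(0, 1.5, t.size) # electrodermal-like series

raw_r = np.corrcoef(ibi, eda)[0, 1]
smooth_r = np.corrcoef(moving_average(ibi, 15), moving_average(eda, 15))[0, 1]
print("correlation before smoothing:", round(raw_r, 3))
print("correlation after smoothing: ", round(smooth_r, 3))
```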


Methodology · 2012 · Vol 8 (1) · pp. 23-38 · Author(s): Manuel C. Voelkle, Patrick E. McKnight

The use of latent curve models (LCMs) has increased almost exponentially during the last decade. Oftentimes, researchers regard the LCM as a “new” method to analyze change, with little attention paid to the fact that the technique was originally introduced as an “alternative to standard repeated measures ANOVA and first-order auto-regressive methods” (Meredith & Tisak, 1990, p. 107). In the first part of the paper, this close relationship is reviewed, and it is demonstrated how “traditional” methods, such as repeated measures ANOVA and MANOVA, can be formulated as LCMs. Given that latent curve modeling is essentially a large-sample technique, compared to “traditional” finite-sample approaches, the second part of the paper addresses, by means of a Monte-Carlo simulation, the question of the degree to which the more flexible LCMs can actually replace some of the older tests. In addition, a structural equation modeling alternative to Mauchly’s (1940) test of sphericity is explored. Although “traditional” methods may be expressed as special cases of more general LCMs, we found that the equivalence holds only asymptotically. For practical purposes, however, no approach always outperformed the alternatives in terms of power and Type I error, so the best method depends on the situation. We provide detailed recommendations on when to use which method.
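To make the "traditional" side of this comparison concrete, here is a repeated measures ANOVA on a toy long-format data set, using statsmodels' AnovaRM. The paper's point is that the same design can be written as a latent curve model in an SEM framework; that SEM formulation is not shown here, and the simulated growth parameters are arbitrary.

```python
# Repeated measures ANOVA on simulated longitudinal data (the "traditional" analysis).
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(13)
n_subjects, n_waves = 60, 4
rows = []
for i in range(n_subjects):
    intercept = rng.normal(10, 2)              # subject-specific level
    slope = rng.normal(0.5, 0.3)               # subject-specific linear change
    for w in range(n_waves):
        rows.append({"subject": i, "wave": w,
                     "y": intercept + slope * w + rng.normal(0, 1)})
long = pd.DataFrame(rows)

res = AnovaRM(long, depvar="y", subject="subject", within=["wave"]).fit()
print(res.anova_table)   # F-test for the within-subject time effect
```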


Methodology · 2015 · Vol 11 (1) · pp. 3-12 · Author(s): Jochen Ranger, Jörg-Tobias Kuhn

In this manuscript, a new approach to the analysis of person fit is presented that is based on the information matrix test of White (1982). This test can be interpreted as a test of trait stability during the measurement situation. The test statistic approximately follows a χ2-distribution; in small samples, the approximation can be improved by a higher-order expansion. The performance of the test is explored in a simulation study, which suggests that the test adheres well to the nominal Type I error rate, although it tends to be conservative for very short scales. The power of the test is compared to that of four alternative tests of person fit; this comparison corroborates that the power of the information matrix test is similar to the power of the alternative tests. Advantages and areas of application of the information matrix test are discussed.
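A highly stylized sketch of the principle behind an information-matrix-based person fit check: for one respondent under a 2PL IRT model with known item parameters, the per-item outer-product (score squared) and Hessian contributions cancel in expectation when the model holds, so their sum can be standardized into a rough chi-square statistic. This ignores the estimation error in the trait estimate and the higher-order refinement the paper develops; it illustrates the general idea only, not the authors' statistic.

```python
# Stylized information-matrix-type person-fit statistic for a single respondent.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(21)
J = 40
a = rng.uniform(0.8, 2.0, J)       # item discriminations (assumed known)
b = rng.normal(0.0, 1.0, J)        # item difficulties (assumed known)
theta_true = 0.4
p_true = 1 / (1 + np.exp(-a * (theta_true - b)))
x = (rng.random(J) < p_true).astype(float)   # observed item responses

# maximum likelihood estimate of the trait
def negll(th):
    p = 1 / (1 + np.exp(-a * (th - b)))
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
theta_hat = optimize.minimize_scalar(negll, bounds=(-4, 4), method="bounded").x

p = 1 / (1 + np.exp(-a * (theta_hat - b)))
s = a * (x - p)                     # per-item score contributions
h = -a**2 * p * (1 - p)             # per-item second-derivative contributions
d = s**2 + h                        # indicators with expectation 0 if the model holds
var_d = a**4 * p * (1 - p) * (1 - 2 * p) ** 2   # model-implied variance of each indicator
stat = d.sum() ** 2 / var_d.sum()               # crude chi-square(1) statistic
print("IM-type person-fit statistic:", stat, " p ~", 1 - stats.chi2.cdf(stat, df=1))
```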

