Accuracy of p-values of approximate tests in testing for equality of means under unequal variances

2009 ◽  
Vol 59 (6) ◽  
Author(s):  
Júlia Volaufová

Abstract Seemingly, testing for fixed effects in linear models with variance-covariance components was solved decades ago. However, even in simple situations such as the fixed one-way model with heteroscedastic variances (a multiple-means case of the Behrens-Fisher problem), questions about the statistical properties of various approximate test statistics remain open. Here we present a brief overview of several approaches suggested in the literature, as well as those available in statistical software, accompanied by a simulation study in which the accuracy of p-values is examined. Our interest here is limited to Welch's test, the Satterthwaite-Fai-Cornelius test, the Kenward-Roger test, the simple ANOVA F-test, and the parametric bootstrap test. We conclude that for small sample sizes, regardless of the number of compared means and the heterogeneity of variances, the ANOVA F-test p-value performs best. For larger sample sizes (at least 5 per group), both the parametric bootstrap and the Kenward-Roger test perform well.
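The parametric bootstrap test compared above can be sketched in a few lines: simulate replicate samples under the null of equal means, with per-group variances estimated from the data, and compare a heteroscedasticity-weighted between-group statistic against its bootstrap distribution. A minimal illustration with synthetic data; `welch_stat` is our own helper, not the paper's implementation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic small-sample data: 3 groups of 5 with unequal variances
groups = [rng.normal(0.0, s, size=5) for s in (1.0, 2.0, 4.0)]

def welch_stat(gs):
    """Heteroscedasticity-weighted between-group statistic:
    groups are weighted by n_i / s_i^2, as in Welch's test."""
    ni = np.array([len(g) for g in gs], dtype=float)
    mi = np.array([g.mean() for g in gs])
    wi = ni / np.array([g.var(ddof=1) for g in gs])
    mw = (wi * mi).sum() / wi.sum()          # weighted grand mean
    return float((wi * (mi - mw) ** 2).sum())

t_obs = welch_stat(groups)
p_anova = stats.f_oneway(*groups).pvalue     # classical ANOVA F p-value

# Parametric bootstrap: resample under H0 (common mean, per-group
# standard deviations estimated from the data), recompute the statistic
B = 5000
sds = [g.std(ddof=1) for g in groups]
ns = [len(g) for g in groups]
t_boot = np.array([
    welch_stat([rng.normal(0.0, s, n) for s, n in zip(sds, ns)])
    for _ in range(B)
])
p_boot = float((t_boot >= t_obs).mean())     # bootstrap p-value
print(p_anova, p_boot)
```

The two p-values generally differ in small samples, which is exactly the accuracy question the simulation study addresses.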

1993 ◽  
Vol 18 (1) ◽  
pp. 1-40 ◽  
Author(s):  
Robert J. Boik

This article considers two related issues concerning the analysis of interactions in complex linear models. The first issue concerns the omnibus test for interaction. Apparently, it is not well known that the usual F test for interaction can be replaced, in many applications, by a test that is more powerful against a certain class of alternatives. The competing test is based on the maximal product interaction contrast F statistic and achieves its power advantage by focusing solely on product contrasts. The maximal product interaction F test is reviewed and three new results are reported: (a) an extended table of exact critical values is computed, (b) a table of moment functions useful for approximating the p-value corresponding to an observed maximal F statistic is computed, and (c) a simulation study concerning the null distribution of the maximal F statistic when data are unbalanced or covariates are present is reported. It is conjectured that lack of balance or presence of covariates has no effect on the null distribution. The simulation results support the conjecture. The second issue concerns follow-up tests when the omnibus test is significant. It appears that researchers, in general, do not perform coherent follow-up tests on interactions. To make it easier for researchers to do so, an exposition on the use of product interaction contrasts and partial interactions in complex fixed-effects models is provided. The recommended omnibus and follow-up tests are illustrated on an educational data set analyzed using SAS (SAS Institute, 1988) and SPSS (1990).


2017 ◽  
Author(s):  
Gregory Connor ◽  
Michael O’Neill

Abstract This paper derives the exact finite-sample p-value for univariate regression of a quantitative phenotype on individual genome markers, relying on a mixture distribution for the dependent variable. The p-value estimator conventionally used in existing genome-wide association study (GWAS) regressions assumes a normally distributed dependent variable, or relies on a central limit theorem-based approximation. The central limit theorem approximation is unreliable for GWAS regression p-values, and measured phenotypes often have markedly non-normal distributions. A normal mixture distribution better fits observed phenotypic variables, and we provide exact small-sample p-values for univariate GWAS regressions under this more flexible distributional assumption. We illustrate the adjustment using a years-of-education phenotypic variable.


2019 ◽  
Author(s):  
J.M. Gorriz ◽  
...  

ABSTRACT In the 1970s a novel branch of statistics emerged, focused on selecting, in the pattern recognition problem, a function that satisfies a definite relationship between the quality of the approximation and its complexity. These data-driven approaches are mainly devoted to problems of estimating dependencies from limited sample sizes and comprise the empirical out-of-sample generalization approaches, e.g. cross-validation (CV). Although the latter are not designed for testing competing hypotheses or comparing different models in neuroimaging, a number of theoretical developments within this theory could be employed to derive a Statistical Agnostic (non-parametric) Mapping (SAM) at the voxel or multi-voxel level. Moreover, SAMs could (i) alleviate the instability of CV estimates of the actual risk in limited sample sizes, e.g. large error bars, and (ii) provide an alternative to family-wise error (FWE)-corrected p-value maps in inferential statistics for hypothesis testing. In this sense, we propose a novel framework in neuroimaging based on concentration inequalities, which results in (i) a rigorous development for model validation with a small sample/dimension ratio, and (ii) a less conservative procedure than FWE p-value correction for determining brain significance maps from the inferences made using small upper bounds on the actual risk.


2019 ◽  
Vol 49 (03) ◽  
pp. 763-786 ◽  
Author(s):  
Patrizia Gigante ◽  
Liviana Picech ◽  
Luciano Sigalotti

Abstract Claims reserving models are usually based on data recorded in run-off tables, according to the origin and development years of the payments. Amounts on the same diagonal are paid in the same calendar year and are influenced by common effects, for example claims inflation, that can induce dependence among payments. We introduce hierarchical generalized linear models (HGLMs) with risk parameters related to the origin and calendar years, in order to model the dependence among payments of both the same origin year and the same calendar year. Besides the random effects, the linear predictor also includes fixed effects. All the parameters are estimated within the model by the h-likelihood approach. The prediction for the outstanding claims and an approximate formula to evaluate the mean square error of prediction are obtained. Moreover, a parametric bootstrap procedure is delineated to obtain an estimate of the predictive distribution of the outstanding claims. A Poisson-gamma HGLM with origin and calendar year effects is studied extensively and a numerical example is provided. We find that the estimates of the correlations can be significant for payments in the same calendar year, and that the inclusion of calendar effects can have a remarkable impact on the prediction uncertainty.


1992 ◽  
Vol 17 (4) ◽  
pp. 315-339 ◽  
Author(s):  
Michael R. Harwell ◽  
Elaine N. Rubinstein ◽  
William S. Hayes ◽  
Corley C. Olds

Meta-analytic methods were used to integrate the findings of a sample of Monte Carlo studies of the robustness of the F test in the one- and two-factor fixed effects ANOVA models. Monte Carlo results for the Welch (1947) and Kruskal-Wallis (Kruskal & Wallis, 1952) tests were also analyzed. The meta-analytic results provided strong support for the robustness of the Type I error rate of the F test when certain assumptions were violated. The F test also showed excellent power properties. However, the Type I error rate of the F test was sensitive to unequal variances, even when sample sizes were equal. The error rate of the Welch test was insensitive to unequal variances when the population distribution was normal, but nonnormal distributions tended to inflate its error rate and to depress its power. Meta-analytic and exact statistical theory results were used to summarize the effects of assumption violations for the tests.


Blood ◽  
2015 ◽  
Vol 126 (23) ◽  
pp. 2885-2885
Author(s):  
Jenny N Poynter ◽  
Michaela Richardson ◽  
Erica Langer ◽  
Anthony Hooten ◽  
Michelle A. Roesler ◽  
...  

Abstract

Background: Polymorphisms in mitochondrial DNA can be used to group individuals into haplogroups that reflect human global migration. These mitochondrial variants are associated with differences in mitochondrial function and have been associated with multiple diseases, including cancer. In this analysis, we evaluated the association between mtDNA haplogroup and risk of myelodysplastic syndromes (MDS).

Methods: Cases were identified by rapid case ascertainment through the population-based Minnesota Cancer Surveillance System (MCSS). Participants were recruited to the MDS study if they were diagnosed with MDS between April 1, 2010 and October 31, 2014. Eligibility criteria included residence in Minnesota, age at diagnosis between 20 and 85 years, and the ability to understand English or Spanish. Centralized pathology and cytogenetics review was conducted to confirm diagnoses and classify subtypes. Controls were identified through the Minnesota State driver's license/identification card list. Genomic DNA from cases and controls was collected using Oragene DNA collection kits (DNA Genotek, Ontario, Canada) and extracted on an Autopure LS instrument according to the manufacturer's instructions (Qiagen). We genotyped 15 mtSNPs that capture common European mitochondrial haplogroup variation (Mitchell et al Hum Genet 2014; Raby et al J Allergy Clin Immunol 2007) on the Sequenom iPLEX Gold MassArray platform (Sequenom, Inc., San Diego, CA) in the University of Minnesota Genomics Core. Because haplogroup frequencies vary by race and ethnicity, we restricted analyses to non-Hispanic white cases and controls. All statistical analyses were conducted using SAS v.9.3 (SAS Institute, Cary, NC). Odds ratios (OR) and 95% confidence intervals (CI) were calculated. We also evaluated associations by MDS subtype and IPSS-R risk category.

Results: We classified 215 cases with confirmed MDS and 522 controls into one of the 11 common European haplogroups. The distribution of haplogroups in our control sample was similar to that reported in a previous sample of non-Hispanic white individuals from the United States (Mitchell et al Hum Genet 2014), with the largest share in the H haplogroup (42%). Due to small sample sizes in some subgroups, we combined mt haplogroups into larger bins based on the haplogroup evolutionary tree, including HV (H+V), JT (J+T), IWX (I+W+X), UK (U+K), and Z (van Oven & Kayser Hum Mut 2009) for comparisons of cases and controls. Using haplogroup HV as the reference group, we found a statistically significant association between haplogroup JT and MDS (OR=0.57, 95% CI 0.36, 0.90, p=0.02). No other significant associations were observed in the comparison of cases and controls (Figure). In the analysis stratified by MDS subtype, the association with haplogroup JT reached statistical significance only in MDS cases with the RCMD subtype (OR=0.42, 95% CI 0.18, 0.97), although the association was similar in magnitude for RARS and the p-value for heterogeneity was non-significant (0.76). Similarly, the associations between haplogroup JT and MDS were comparable in the analysis stratified by IPSS-R risk category (p-value for heterogeneity = 0.71).

Conclusions: In this population-based study of MDS, we observed an association between mtDNA haplogroup JT and risk of MDS. Previous studies using cybrid cells have reported functional differences by mtDNA haplogroup and provide biological plausibility for the observed association, including a higher capacity to cope with oxidative stress in haplogroup T (Meuller et al PLoS One 2012) and lower levels of ATP and reactive oxygen species production in haplogroup J (Kenney et al PLoS One 2013). Further studies of the relationship between mtDNA variation and MDS are warranted in larger samples.

Figure 1. Association between mtDNA haplogroup and MDS

Disclosures: No relevant conflicts of interest to declare.
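The odds ratios and confidence intervals reported above follow the standard 2×2-table calculation; a minimal sketch of the usual Woolf (log-scale) interval, with made-up counts that are not the study's data:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) 95% CI for a 2x2 table:
         a = exposed cases,    b = unexposed cases
         c = exposed controls, d = unexposed controls
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts, NOT the study's actual data
or_, lo, hi = odds_ratio_ci(30, 90, 120, 200)
print(f"OR={or_:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

The study's stratified analyses would repeat this within each MDS subtype or IPSS-R risk category (typically via logistic regression rather than raw counts, to allow covariate adjustment).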


1989 ◽  
Vol 14 (3) ◽  
pp. 269-278 ◽  
Author(s):  
Rand R. Wilcox

Numerous papers have shown that the conventional F test is not robust to unequal variances in the one-way fixed effects ANOVA model, and several methods have been proposed for dealing with this problem. Here I describe and compare two methods for handling unequal variances in the two-way fixed effects ANOVA model. One is based on an improvement of Wilcox's (1988) method for the one-way model, extended here to the two-way design; the other is an extension of James's (1951) second-order method.


1977 ◽  
Vol 14 (4) ◽  
pp. 493-498 ◽  
Author(s):  
Joanne C. Rogan ◽  
H. J. Keselman

Numerous investigations have examined the effects of variance heterogeneity on the empirical probability of a Type I error for the analysis of variance (ANOVA) F-test, and the prevailing conclusion has been that when sample sizes are equal, the ANOVA is robust to variance heterogeneity. However, Box (1954) reported a Type I error rate of .12, for a 5% nominal level, when unequal variances were paired with equal sample sizes. The present paper explored this finding, examining varying degrees and patterns of variance heterogeneity for varying sample sizes and numbers of treatment groups. The data indicate that the rate of Type I error varies as a function of the degree of variance heterogeneity and, consequently, it should not be assumed that the ANOVA F-test is always robust to variance heterogeneity when sample sizes are equal.
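This kind of finding is easy to probe with a small Monte Carlo experiment: hold all group means equal (so the null is true), vary only the standard deviations, and count how often the ANOVA F-test rejects at the 5% level. A minimal sketch; the group sizes and variance pattern are illustrative choices, not those of the original study:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def type1_rate(sds, n, reps=3000, alpha=0.05):
    """Empirical Type I error of the ANOVA F-test: all group means
    are 0 (H0 true); only the standard deviations differ."""
    hits = 0
    for _ in range(reps):
        groups = [rng.normal(0.0, s, n) for s in sds]
        if stats.f_oneway(*groups).pvalue < alpha:
            hits += 1
    return hits / reps

rate_equal = type1_rate([1.0, 1.0, 1.0], n=10)    # near the nominal 5%
rate_unequal = type1_rate([1.0, 2.0, 5.0], n=10)  # typically inflated
print(rate_equal, rate_unequal)
```

Varying `sds`, `n`, and the number of groups reproduces the kind of design grid the paper explores.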


2009 ◽  
Vol 40 (4) ◽  
pp. 415-427 ◽  
Author(s):  
Lee-Shen Chen ◽  
Ming-Chung Yang

This article considers the problem of testing marginal homogeneity in $2 \times 2$ contingency tables under the multinomial sampling scheme. From the frequentist perspective, McNemar's exact $p$-value ($p_{ME}$) is the most commonly used $p$-value in practice, but it can be conservative for small to moderate sample sizes. From the Bayesian perspective, one can construct Bayesian $p$-values using proper prior and posterior distributions, called the prior predictive $p$-value ($p_{prior}$) and the posterior predictive $p$-value ($p_{post}$), respectively. Another Bayesian $p$-value, the partial posterior predictive $p$-value ($p_{ppost}$), first proposed by [2], avoids the double use of the data that occurs in $p_{post}$. For the preceding problem, we derive $p_{prior}$, $p_{post}$, and $p_{ppost}$ based on the noninformative uniform prior. Under the criterion of uniformity in the frequentist sense, comparisons among $p_{prior}$, $p_{ME}$, $p_{post}$, and $p_{ppost}$ are given. Numerical results show that $p_{ppost}$ performs best for small to moderately large sample sizes.
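McNemar's exact $p$-value, the frequentist baseline here, has a short closed form: conditional on the total number of discordant pairs, the count of one discordant type is Binomial(n, 1/2) under marginal homogeneity, and the two-sided $p$-value doubles the smaller tail. A minimal sketch with made-up counts:

```python
from scipy import stats

def mcnemar_exact_p(b, c):
    """Exact (conditional) McNemar p-value for a 2x2 table with
    b and c discordant pairs: under marginal homogeneity the
    smaller count is Binomial(b + c, 1/2); double the tail,
    capped at 1."""
    n = b + c
    p = 2.0 * stats.binom.cdf(min(b, c), n, 0.5)
    return min(p, 1.0)

# Hypothetical discordant counts: 5 pairs one way, 15 the other
print(mcnemar_exact_p(5, 15))  # ≈ 0.041
```

The conservatism the article discusses comes from the discreteness of this binomial tail, which is what the Bayesian predictive $p$-values aim to mitigate.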

