On the statistical power of Baarda’s outlier test and some alternative

2017 ◽  
Vol 7 (1) ◽  
Author(s):  
R. Lehmann ◽  
A. Voß-Böhme

Abstract Baarda’s outlier test is one of the best established theories in geodetic practice. The optimal test statistic of the local model test for a single outlier is known as the normalized residual. Other model disturbances can also be detected and identified with this test. It enjoys the property of being a uniformly most powerful invariant (UMPI) test, but is not a uniformly most powerful (UMP) test. In this contribution we will prove that in the class of test statistics following a common central or non-central χ

2019 ◽  
Vol 2019 (3) ◽  
pp. 310-330 ◽  
Author(s):  
Marika Swanberg ◽  
Ira Globus-Harris ◽  
Iris Griffith ◽  
Anna Ritz ◽  
Adam Groce ◽  
...  

Abstract Hypothesis testing is one of the most common types of data analysis and forms the backbone of scientific research in many disciplines. Analysis of variance (ANOVA) in particular is used to detect dependence between a categorical and a numerical variable. Here we show how one can carry out this hypothesis test under the restrictions of differential privacy. We show that the F-statistic, the optimal test statistic in the public setting, is no longer optimal in the private setting, and we develop a new test statistic F1 with much higher statistical power. We show how to rigorously compute a reference distribution for the F1 statistic and give an algorithm that outputs accurate p-values. We implement our test and experimentally optimize several parameters. We then compare our test to the only previous work on private ANOVA testing, using the same effect size as that work. We see an order of magnitude improvement, with our test requiring only 7% as much data to detect the effect.


2021 ◽  
Author(s):  
Meida Wang ◽  
Shuanglin Zhang ◽  
Qiuying Sha

There has been an increasing interest in joint analysis of multiple phenotypes in genome-wide association studies (GWAS) because jointly analyzing multiple phenotypes may increase statistical power to detect genetic variants associated with complex diseases or traits. Recently, many statistical methods have been developed for joint analysis of multiple phenotypes in genetic association studies, including the Clustering Linear Combination (CLC) method. The CLC method works particularly well with phenotypes that have natural groupings, but because the number of clusters for a given data set is unknown, the final test statistic of the CLC method is the minimum p-value among all p-values of the CLC test statistics obtained from each possible number of clusters. Therefore, a simulation procedure must be used to evaluate the p-value of the final test statistic. This makes the CLC method computationally demanding. We develop a new method called computationally efficient CLC (ceCLC) to test the association between multiple phenotypes and a genetic variant. Instead of using the minimum p-value as the test statistic in the CLC method, ceCLC uses the Cauchy combination test to combine all p-values of the CLC test statistics obtained from each possible number of clusters. The test statistic of ceCLC approximately follows a standard Cauchy distribution, so the p-value can be obtained from the cumulative distribution function without the need for the simulation procedure. Through extensive simulation studies and application to the COPDGene data, the results demonstrate that the type I error rates of ceCLC are effectively controlled in different simulation settings and ceCLC either outperforms all other methods or has statistical power that is very close to the most powerful method with which it has been compared.
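
The Cauchy combination step mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes equal weights by default and the standard form of the statistic, T = Σ wᵢ·tan((0.5 − pᵢ)π), whose p-value is read from the standard Cauchy distribution. The function name `cauchy_combination` and the input p-values are hypothetical and are not taken from the ceCLC software.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination test (illustrative sketch).

    Each p-value is mapped to a standard Cauchy variate, the variates are
    averaged with non-negative weights summing to one, and the combined
    p-value is the upper tail probability of the standard Cauchy distribution.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0 / k] * k  # assumption: equal weights
    if len(weights) != k or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must match p_values and sum to 1")

    # Test statistic: weighted sum of tan((0.5 - p) * pi).
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    # Upper tail of the standard Cauchy distribution: 1/2 - arctan(t)/pi.
    return 0.5 - math.atan(t) / math.pi


if __name__ == "__main__":
    # Hypothetical p-values, e.g. one per candidate number of clusters.
    print(cauchy_combination([0.001, 0.5, 0.9]))
```

Because the combined p-value comes from a closed-form tail probability, no simulation is needed, which is the computational saving the abstract describes.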


Author(s):  
Anna L Tyler ◽  
Baha El Kassaby ◽  
Georgi Kolishovski ◽  
Jake Emerson ◽  
Ann E Wells ◽  
...  

Abstract It is well understood that variation in relatedness among individuals, or kinship, can lead to false genetic associations. Multiple methods have been developed to adjust for kinship while maintaining power to detect true associations. However, the effects of kinship on genetic interaction test statistics remain relatively unstudied. Here we performed a survey of kinship effects on studies of six commonly used mouse populations. We measured inflation of main effect test statistics, genetic interaction test statistics, and interaction test statistics reparametrized by the Combined Analysis of Pleiotropy and Epistasis (CAPE). We also performed linear mixed model (LMM) kinship corrections using two types of kinship matrix: an overall kinship matrix calculated from the full set of genotyped markers, and a reduced kinship matrix, which left out markers on the chromosome(s) being tested. We found that test statistic inflation varied across populations and was driven largely by linkage disequilibrium. In contrast, there was no observable inflation in the genetic interaction test statistics. CAPE statistics were inflated at a level in between that of the main effects and the interaction effects. The overall kinship matrix overcorrected the inflation of main effect statistics relative to the reduced kinship matrix. The two types of kinship matrices had similar effects on the interaction statistics and CAPE statistics, although the overall kinship matrix trended toward a more severe correction. In conclusion, we recommend using an LMM kinship correction for both main effects and genetic interactions and further recommend that the kinship matrix be calculated from a reduced set of markers in which the chromosomes being tested are omitted from the calculation. This is particularly important in populations with substantial population structure, such as recombinant inbred lines in which genomic replicates are used.


Author(s):  
Lingtao Kong

The exponential distribution has been widely used in engineering, social and biological sciences. In this paper, we propose a new goodness-of-fit test for fuzzy exponentiality using the α-pessimistic value. The test statistic is based on Kullback-Leibler information. Using the Monte Carlo method, we obtain the empirical critical points of the test statistic at four different significance levels. To evaluate the performance of the proposed test, we compare it with four commonly used tests through simulations. Experimental studies show that the proposed test has higher power than the other tests in most cases. In particular, for the uniform and linear failure rate alternatives, our method has the best performance. A real data example is investigated to show the application of our test.


2021 ◽  
Author(s):  
Ronald J Yurko ◽  
Kathryn Roeder ◽  
Bernie Devlin ◽  
Max G'Sell

In genome-wide association studies (GWAS), it has become commonplace to test millions of SNPs for phenotypic association. Gene-based testing can improve power to detect weak signal by reducing multiple testing and pooling signal strength. While such tests account for linkage disequilibrium (LD) structure of SNP alleles within each gene, current approaches do not capture LD of SNPs falling in different nearby genes, which can induce correlation of gene-based test statistics. We introduce an algorithm to account for this correlation. When a gene's test statistic is independent of others, it is assessed separately; when test statistics for nearby genes are strongly correlated, their SNPs are agglomerated and tested as a locus. To provide insight into SNPs and genes driving association within loci, we develop an interactive visualization tool to explore localized signal. We demonstrate our approach in the context of weakly powered GWAS for autism spectrum disorder, which is contrasted to more highly powered GWAS for schizophrenia and educational attainment. To increase power for these analyses, especially those for autism, we use adaptive p-value thresholding (AdaPT), guided by high-dimensional metadata modeled with gradient boosted trees, highlighting when and how it can be most useful. Notably our workflow is based on summary statistics.


2006 ◽  
Vol 45 (9) ◽  
pp. 1181-1189 ◽  
Author(s):  
D. S. Wilks

Abstract The conventional approach to evaluating the joint statistical significance of multiple hypothesis tests (i.e., “field,” or “global,” significance) in meteorology and climatology is to count the number of individual (or “local”) tests yielding nominally significant results and then to judge the unusualness of this integer value in the context of the distribution of such counts that would occur if all local null hypotheses were true. The sensitivity (i.e., statistical power) of this approach is potentially compromised both by the discrete nature of the test statistic and by the fact that the approach ignores the confidence with which locally significant tests reject their null hypotheses. An alternative global test statistic that has neither of these problems is the minimum p value among all of the local tests. Evaluation of field significance using the minimum local p value as the global test statistic, which is also known as the Walker test, has strong connections to the joint evaluation of multiple tests in a way that controls the “false discovery rate” (FDR, or the expected fraction of local null hypothesis rejections that are incorrect). In particular, using the minimum local p value to evaluate field significance at a level α_global is nearly equivalent to the slightly more powerful global test based on the FDR criterion. An additional advantage shared by Walker’s test and the FDR approach is that both are robust to spatial dependence within the field of tests. The FDR method not only provides a more broadly applicable and generally more powerful field significance test than the conventional counting procedure but also allows better identification of locations with significant differences, because fewer than α_global × 100% (on average) of apparently significant local tests will have resulted from local null hypotheses that are true.
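
The two global tests contrasted in this abstract can be sketched in a few lines of Python. The sketch assumes the Walker criterion in its usual form, rejecting the global null when the smallest of K local p values is at most 1 − (1 − α_global)^(1/K), and assumes the Benjamini-Hochberg step-up rule as the FDR criterion. The function names and the example p values are hypothetical.

```python
def walker_threshold(alpha_global, k):
    """Critical value for the minimum of k local p values (Walker test)."""
    return 1.0 - (1.0 - alpha_global) ** (1.0 / k)


def walker_test(p_values, alpha_global=0.05):
    """Return True if field significance is declared by the Walker test."""
    return min(p_values) <= walker_threshold(alpha_global, len(p_values))


def fdr_rejections(p_values, q=0.05):
    """Benjamini-Hochberg step-up rule (assumed here as the FDR criterion).

    Returns a list of booleans, in the original order, marking the local
    tests whose null hypotheses are rejected at FDR level q.
    """
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    largest_rank = 0
    # Find the largest rank i with p_(i) <= q * i / k.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / k:
            largest_rank = rank
    rejected = [False] * k
    # Reject every test whose sorted rank does not exceed that rank.
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    local_p = [0.5, 0.001, 0.03, 0.02]  # hypothetical local p values
    print(walker_test(local_p), fdr_rejections(local_p))
```

Under this rule, field significance by the FDR criterion corresponds to at least one local rejection, and the returned mask identifies the locations involved.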


2021 ◽  
Vol 20 (2) ◽  
pp. 51-60
Author(s):  
A.O. Abidoye ◽  
W.A. Lamidi ◽  
M.O. Alabi ◽  
J. Popoola

In this paper, we compare the conventional t-test with a proposed t-test for testing equality of means under unequal and equal variances. We propose the harmonic mean of variances as an alternative to the pooled sample variance when variances are heterogeneous. Two sets of secondary data were obtained from the Agricultural Development Project (KWADP) and the Ministry of Agriculture in Ilorin, Kwara State to demonstrate the two test statistics. The results show that the proposed t-test statistic is more appropriate than the conventional t-test statistic when variances are unequal, but the conventional t-test performs better when variances are equal.
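
The contrast between the two variance estimates can be sketched as follows. The abstract does not give the exact form of the proposed statistic, so this sketch assumes that the harmonic mean of the two sample variances simply replaces the pooled variance in the usual two-sample formula. The function names and the data are hypothetical.

```python
import math
from statistics import harmonic_mean, mean, variance


def pooled_t(x, y):
    """Conventional two-sample t statistic with pooled sample variance."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def harmonic_t(x, y):
    """t statistic with the harmonic mean of the two sample variances.

    Assumption: the harmonic mean is substituted for the pooled variance;
    the source abstract does not state the exact formula.
    """
    n1, n2 = len(x), len(y)
    harmonic_var = harmonic_mean([variance(x), variance(y)])
    return (mean(x) - mean(y)) / math.sqrt(harmonic_var * (1 / n1 + 1 / n2))


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]  # hypothetical sample with small variance
    b = [0.0, 4.0, 8.0]  # hypothetical sample with large variance
    print(pooled_t(a, b), harmonic_t(a, b))
```

When the two sample variances are equal, both functions return the same value; they diverge as the variances become more unequal, because the harmonic mean is pulled toward the smaller variance.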


2019 ◽  
Vol 27 (3) ◽  
pp. 281-301 ◽  
Author(s):  
Clayton Webb ◽  
Suzanna Linn ◽  
Matthew Lebo

Pesaran, Shin, and Smith (2001) (PSS) proposed a bounds procedure for testing for the existence of long run cointegrating relationships between a unit root dependent variable ($y_{t}$) and a set of weakly exogenous regressors $\boldsymbol{x}_{t}$ when the analyst does not know whether the independent variables are stationary, unit root, or mutually cointegrated processes. This procedure recognizes the analyst’s uncertainty over the nature of the regressors but not the dependent variable. When the analyst is uncertain whether $y_{t}$ is a stationary or unit root process, the test statistics proposed by PSS are uninformative for inference on the existence of a long run relationship (LRR) between $y_{t}$ and $\boldsymbol{x}_{t}$. We propose the long run multiplier (LRM) test statistic as a means of testing for LRRs without knowing whether the series are stationary or unit roots. Using stochastic simulations, we demonstrate the behavior of the test statistic given uncertainty about the univariate dynamics of both $y_{t}$ and $\boldsymbol{x}_{t}$, illustrate the bounds of the test statistic, and generate small sample and approximate asymptotic critical values for the upper and lower bounds for a range of sample sizes and model specifications. We demonstrate the utility of the bounds framework for testing for LRRs in models of public policy mood and presidential success.


1999 ◽  
Vol 1 (2) ◽  
pp. 83-91 ◽  
Author(s):  
E. M. C. MICHIELS ◽  
E. OUSSOREN ◽  
M. VAN GROENIGEN ◽  
E. PAUWS ◽  
P. M. M. BOSSUYT ◽  
...  

Michiels, E. M. C., E. Oussoren, M. van Groenigen, E. Pauws, P. M. M. Bossuyt, P. A. Voûte, and F. Baas. Genes differentially expressed in medulloblastoma and fetal brain. Physiol. Genomics 1: 83–91, 1999.—Serial analysis of gene expression (SAGE) was used to identify genes that might be involved in the development or growth of medulloblastoma, a childhood brain tumor. Sequence tags from medulloblastoma (10229) and fetal brain (10692) were determined. The distributions of sequence tags in each population were compared, and for each sequence tag, pairwise χ² test statistics were calculated. Northern blot was used to confirm some of the results obtained by SAGE. For 16 tags, the χ² test statistic was associated with a P value < 10⁻⁴. Among those transcripts with a higher expression in medulloblastoma were the genes for ZIC1 protein and the OTX2 gene, both of which are expressed in the cerebellar germinal layers. The high expression of these two genes strongly supports the hypothesis that medulloblastoma arises from the germinal layer of the cerebellum. This analysis shows that SAGE can be used as a rapid differential screening procedure.
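
A per-tag comparison of this kind can be sketched as a Pearson χ² test on a 2×2 table (this tag versus all other tags, in each of the two libraries). The sketch assumes no continuity correction, which the abstract does not specify. The library totals are those quoted above; the tag counts in the example and the function names are hypothetical.

```python
import math


def chi_square_2x2(count_a, total_a, count_b, total_b):
    """Pearson chi-square statistic for one tag in two libraries.

    The 2x2 table is [tag, other tags] by [library A, library B].
    Assumption: no continuity correction is applied.
    """
    a, b = count_a, total_a - count_a
    c, d = count_b, total_b - count_b
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table sums to zero")
    return n * (a * d - b * c) ** 2 / denom


def chi_square_p_value_1df(statistic):
    """Upper tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Library sizes from the abstract; the tag counts 40 and 5 are hypothetical.
    stat = chi_square_2x2(40, 10229, 5, 10692)
    print(stat, chi_square_p_value_1df(stat))
```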


Sankhya B ◽  
2016 ◽  
Vol 79 (1) ◽  
pp. 156-169
Author(s):  
Anupam Kundu ◽  
Nabaneet Das ◽  
Sayantan Chakraborty ◽  
Subir Kumar Bhandari
