The power of the t-test with large sample sizes under different conditions of sample size and significance level for real data, transformed data, and data from the Monte Carlo simulation technique.

Author(s):  
Natcha Mahapoonyanont
Suwichaya Putuptim

The power of a test is the probability that the test rejects the null hypothesis (H0) when a specific alternative hypothesis (H1) is true. Concern with the probability of a type I error is modelled on medical research, such as the testing of new medicines, where type I errors must be avoided: the statistical significance level is set as small as possible, and the probability of a type II error is considered only afterwards. In behavioural and social sciences research, researchers likewise try to avoid type I errors by fixing the level of statistical significance, but there are arguments that this choice can itself affect the validity of the findings. Independent variables may have a real influence on the dependent variables that the researcher fails to detect because the significance level was set too low. Therefore, in some situations, more attention should be paid to the occurrence of type II errors, and less to type I errors; this may yield more realistic and valid results. The objective of this research was to compare the power of the t-test under different conditions of sample size (n = 30, 60, 90), statistical significance level (.001, .01, .05), and type of data (real data, transformed data, and data simulated with the Monte Carlo technique). The findings provide useful information for researchers applying the t-test in further research and can improve the accuracy of research findings.
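
A minimal sketch of how such a comparison can be set up, assuming normally distributed data and an illustrative effect size of 0.5 SD (the abstract does not report the effect sizes or data-generating models actually used):

```python
import numpy as np
from scipy import stats

def ttest_power_mc(n, alpha, effect_size, n_sims=10_000, seed=0):
    """Estimate two-sample t-test power by Monte Carlo simulation:
    the proportion of simulated datasets, generated under H1, in
    which the test rejects H0 at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(loc=0.0, scale=1.0, size=n)
        y = rng.normal(loc=effect_size, scale=1.0, size=n)  # H1: means differ
        _, p = stats.ttest_ind(x, y)
        rejections += p < alpha
    return rejections / n_sims

# Cross the study's conditions: n in {30, 60, 90}, alpha in {.001, .01, .05}
for n in (30, 60, 90):
    for alpha in (0.001, 0.01, 0.05):
        print(f"n={n:3d}  alpha={alpha:<5}  power={ttest_power_mc(n, alpha, 0.5):.3f}")
```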

1996
Vol 1 (1)
pp. 25-28
Author(s):  
Martin A. Weinstock

Background: Accurate understanding of certain basic statistical terms and principles is key to critical appraisal of published literature. Objective: This review describes type I error, type II error, null hypothesis, p value, statistical significance, α, two-tailed and one-tailed tests, effect size, alternative hypothesis, statistical power, β, publication bias, confidence interval, standard error, and standard deviation, while including examples from reports of dermatologic studies. Conclusion: The application of the results of published studies to individual patients should be informed by an understanding of certain basic statistical concepts.
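
As a concrete illustration of several of the reviewed terms, here is a short worked example with made-up measurements (not taken from any of the dermatologic reports the review cites):

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for two groups of six patients each.
treated = np.array([12.1, 9.8, 11.5, 10.9, 13.2, 10.4])
control = np.array([10.2, 9.1, 10.8, 9.5, 10.1, 9.9])

sd = treated.std(ddof=1)                    # standard deviation
se = sd / np.sqrt(len(treated))             # standard error of the mean
t_crit = stats.t.ppf(0.975, df=len(treated) - 1)
ci = (treated.mean() - t_crit * se,         # 95% confidence interval
      treated.mean() + t_crit * se)
t_stat, p = stats.ttest_ind(treated, control)  # two-tailed p value

print(f"SD={sd:.2f}  SE={se:.2f}  95% CI=({ci[0]:.2f}, {ci[1]:.2f})  p={p:.4f}")
# The result is 'statistically significant' at alpha = .05 if p < .05;
# rejecting a true H0 is a type I error, failing to reject a false H0 a type II error.
```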


2011
Vol 50 (03)
pp. 237-243
Author(s):
T. Friede
M. Kieser

Summary Objectives: Analysis of covariance (ANCOVA) is widely applied in practice and its use is recommended by regulatory guidelines. However, the required sample size for ANCOVA depends on parameters that are usually uncertain in the planning phase of a study. Sample size recalculation within the internal pilot study design makes it possible to cope with this problem. From a regulatory viewpoint it is preferable that the treatment group allocation remains masked and that the type I error is controlled at the specified significance level. The characteristics of blinded sample size reassessment for ANCOVA in non-inferiority studies have not yet been investigated. We propose an appropriate method and evaluate its performance. Methods: In a simulation study, the characteristics of the proposed method with respect to type I error rate, power and sample size are investigated. A clinical trial example illustrates how strict control of the significance level can be achieved. Results: A slight excess of the type I error rate beyond the nominal significance level was observed. The extent of the exceedance increases with increasing non-inferiority margin and increasing correlation between outcome and covariate. The procedure assures the desired power over a wide range of scenarios even if nuisance parameters affecting the sample size are initially mis-specified. Conclusions: The proposed blinded sample size recalculation procedure protects against insufficient sample sizes due to incorrect assumptions about nuisance parameters in the planning phase. The original procedure may lead to an elevated type I error rate, but methods are available to control the nominal significance level.
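
A minimal sketch of the blinded recalculation idea for a plain two-group non-inferiority comparison (the paper's actual procedure is for ANCOVA and additionally handles the outcome-covariate correlation; the margin, alpha and power below are illustrative):

```python
import numpy as np
from scipy import stats

def blinded_n_recalc(pilot_values, margin, alpha=0.025, power=0.8):
    """Recalculate the per-group sample size from internal-pilot data
    without unblinding: the variance is estimated from the pooled
    sample (treatment labels unused), then plugged into the usual
    normal-approximation formula for a non-inferiority test assuming
    the true treatment difference is zero."""
    var_blinded = np.var(pilot_values, ddof=1)   # one-sample variance, arms pooled
    z_alpha = stats.norm.ppf(1 - alpha)          # one-sided test at level alpha
    z_beta = stats.norm.ppf(power)
    n_per_group = 2 * var_blinded * (z_alpha + z_beta) ** 2 / margin ** 2
    return int(np.ceil(n_per_group))

# Example: masked pilot data pooled over both arms, margin of 2 units
rng = np.random.default_rng(1)
pilot = rng.normal(loc=10.0, scale=4.0, size=40)
print(blinded_n_recalc(pilot, margin=2.0))
```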


2019
Vol 3 (Supplement_1)
Author(s):
Keisuke Ejima
Andrew Brown
Daniel Smith
Ufuk Beyaztas
David Allison

Abstract Objectives: Rigor, reproducibility and transparency (RRT) awareness has expanded over the last decade. Although RRT can be improved in various respects, we focused on the type I error rates and power of commonly used statistical analyses testing mean differences between two groups, using small (n ≤ 5) to moderate sample sizes. Methods: We compared data from five distinct, homozygous, monogenic, murine models of obesity with non-mutant controls of both sexes. Baseline weight (7–11 weeks old) was the outcome. To examine whether the type I error rate could be affected by the choice of statistical test, we adjusted the empirical distributions of weights to ensure the null hypothesis (i.e., no mean difference) held in two ways: Case 1) center both weight distributions on the same mean weight; Case 2) combine data from the control and mutant groups into one distribution. From these cases, 3 to 20 mice were resampled to create a 'plasmode' dataset. We performed five common tests (Student's t-test, Welch's t-test, Wilcoxon test, permutation test and bootstrap test) on the plasmodes and computed type I error rates. Power was assessed using plasmodes in which the distribution of the control group was shifted by adding a constant value, as in Case 1, but so as to realize nominal effect sizes. Results: Type I error rates were substantially higher than the nominal significance level (type I error rate inflation) for Student's t-test, Welch's t-test and the permutation test, especially when the sample size was small, in Case 1, whereas inflation was observed only for the permutation test in Case 2. Deflation was noted for the bootstrap test with small samples. Increasing the sample size mitigated both inflation and deflation, except for the Wilcoxon test in Case 1, because heterogeneity of the weight distributions between groups violated its assumptions for the purpose of testing mean differences. For power, a departure from the reference value was observed with small samples. Compared with the other tests, the bootstrap test was underpowered with small samples as a tradeoff for maintaining type I error rates. Conclusions: With small samples (n ≤ 5), the bootstrap test avoided type I error rate inflation, but often at the cost of lower power. To avoid type I error rate inflation for the other tests, the sample size should be increased. The Wilcoxon test should be avoided because of the heterogeneity of weight distributions between mutant and control mice. Funding Sources: This study was supported in part by NIH and a Japan Society for the Promotion of Science (JSPS) KAKENHI grant.
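
The null-resampling logic of Case 1 can be sketched as follows; this is a schematic, not the authors' code, stand-in normal data replace the murine weights, and only three of the five tests are shown:

```python
import numpy as np
from scipy import stats

def type1_rates(control, mutant, n=5, n_sims=2000, alpha=0.05, seed=0):
    """Recenter both empirical distributions on a common mean so that
    H0 (no mean difference) holds, resample 'plasmode' datasets of
    size n per group, and record each test's rejection rate."""
    rng = np.random.default_rng(seed)
    grand_mean = np.concatenate([control, mutant]).mean()
    c0 = control - control.mean() + grand_mean
    m0 = mutant - mutant.mean() + grand_mean
    hits = {"student": 0, "welch": 0, "wilcoxon": 0}
    for _ in range(n_sims):
        x = rng.choice(c0, size=n, replace=True)
        y = rng.choice(m0, size=n, replace=True)
        hits["student"] += stats.ttest_ind(x, y).pvalue < alpha
        hits["welch"] += stats.ttest_ind(x, y, equal_var=False).pvalue < alpha
        hits["wilcoxon"] += stats.mannwhitneyu(x, y).pvalue < alpha
    return {k: v / n_sims for k, v in hits.items()}

# Stand-in data: controls and a heavier, more variable mutant group
rng = np.random.default_rng(42)
print(type1_rates(rng.normal(25, 2, 50), rng.normal(45, 6, 50)))
```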


2020
Vol 45 (6)
pp. 667-689
Author(s):
Xi Wang
Yang Liu

In continuous testing programs, some items are repeatedly used across test administrations, and statistical methods are often used to evaluate whether items have become compromised due to examinees' preknowledge. In this study, we proposed a residual method to detect compromised items when a test can be partitioned into two subsets of items: secure items and possibly compromised items. We derived the standard error of the residual statistic by taking the sampling error in both the ability and item parameter estimates into account. The simulation results suggest that the Type I error is close to the nominal level when both sources of error are adjusted for, and that item parameter error can be ignored only when the item calibration sample size is much larger than the evaluation sample size. We also investigated the performance of the residual method when not using information from the secure items, in both simulation and real data analyses.
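
Schematically, and assuming a 2PL item response model (the article's exact formulation, including its standard-error correction for estimation error, is more involved), a residual statistic of this kind compares an examinee's observed score on the possibly compromised items with its model-implied expectation:

```python
import numpy as np

def irt_prob(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def residual_stat(responses_suspect, theta_hat, a, b):
    """Standardized residual on the possibly compromised items, with
    theta_hat estimated from the secure items only. This naive version
    ignores the sampling error in theta_hat and in the item parameters,
    which is exactly what the paper's derived standard error corrects."""
    p = irt_prob(theta_hat, a, b)
    observed = responses_suspect.sum()
    expected = p.sum()
    variance = (p * (1.0 - p)).sum()
    return (observed - expected) / np.sqrt(variance)

# Toy example (all values hypothetical): an examinee of middling ability
# answering four hard, possibly compromised items all correctly.
a = np.array([1.2, 0.9, 1.5, 1.1])   # discriminations
b = np.array([1.5, 2.0, 1.8, 2.2])   # difficulties
resp = np.array([1, 1, 1, 1])
print(residual_stat(resp, theta_hat=0.0, a=a, b=b))  # large positive z is suspicious
```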


1997
Vol 22 (3)
pp. 349-360
Author(s):  
Donald W. Zimmerman

Explanations of advantages and disadvantages of paired-samples experimental designs in textbooks in education and psychology frequently overlook the change in the Type I error probability which occurs when an independent-samples t test is performed on correlated observations. This alteration of the significance level can be extreme even if the correlation is small. By comparison, the loss of power of the paired-samples t test on difference scores due to reduction of degrees of freedom, which typically is emphasized, is relatively slight. Although paired-samples designs are appropriate and widely used when there is a natural correspondence or pairing of scores, researchers have not often considered the implications of undetected correlation between supposedly independent samples in the absence of explicit pairing.
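
A minimal simulation of the phenomenon, assuming bivariate normal data: equal means (H0 true), correlation rho within pairs, and an independent-samples t-test applied as if the samples were independent. Positive correlation pushes the actual significance level below the nominal .05, negative correlation pushes it above:

```python
import numpy as np
from scipy import stats

def empirical_alpha(rho, n=30, n_sims=10_000, alpha=0.05, seed=0):
    """Rejection rate of the independent-samples t-test when the two
    'samples' are actually correlated observations with equal means."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    rejections = 0
    for _ in range(n_sims):
        xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        _, p = stats.ttest_ind(xy[:, 0], xy[:, 1])  # ignores the pairing
        rejections += p < alpha
    return rejections / n_sims

for rho in (-0.3, -0.1, 0.0, 0.1, 0.3):
    print(f"rho={rho:+.1f}: actual alpha = {empirical_alpha(rho):.3f}")
```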


2017
Vol 16 (1)
Author(s):  
Hector Mueses

Summary: Health professionals and others with little background in statistics who become involved in quantitative research are often required to apply statistical techniques to analyse the data they have collected. Generally they state hypotheses, and the subsequent analysis of the information provides evidence for or against those hypotheses. At that point they commonly run into confusion when trying to interpret the p value and the type I error. This paper discusses the concepts of the p value and the significance level and clarifies the difference between them. Key words: Type I error. Type II error. P value. Null hypothesis. Statistical test.


2021
Vol 58 (2)
pp. 133-147
Author(s):
Rownak Jahan Tamanna
M. Iftakhar Alam
Ahmed Hossain
Md Hasinur Rahaman Khan

Summary: Sample size calculation is an integral part of any clinical trial design, and determining the optimal sample size for a study ensures adequate power to detect statistical significance. It is a critical step in designing a planned research protocol, since using too many participants is expensive and exposes more subjects to the procedure, while an underpowered study will be statistically inconclusive and may cause the whole protocol to fail. Amidst the attempt to maximize power and the underlying effort to minimize the budget, the joint optimization of the two has become a significant issue in the determination of sample size for clinical trials in recent decades. Although it is hard to generalize a single method for sample size calculation, this study attempts to offer a basis for resolving the contradictions of sample size determination, using simulation studies under simple random and cluster sampling schemes with different levels of power and type I error. The effective sample size is much higher when the design effect of the sampling method is smaller, particularly when it is less than 1. The sample size for cluster sampling increases as the number of clusters increases.
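
The role of the design effect can be illustrated with the standard textbook formulas (a sketch, not the paper's simulation set-up): the simple-random-sampling sample size for a two-group mean comparison is inflated, or shrunk when DEFF < 1, by DEFF = 1 + (m - 1) * ICC for clusters of size m:

```python
import math
from scipy import stats

def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: DEFF = 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc

def required_n(alpha, power, effect_size, cluster_size=1, icc=0.0):
    """Per-group sample size for detecting a standardized mean
    difference, inflated by the design effect of the sampling scheme."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    n_srs = 2 * (z_a + z_b) ** 2 / effect_size ** 2   # simple random sampling
    return math.ceil(n_srs * design_effect(cluster_size, icc))

print(required_n(alpha=0.05, power=0.8, effect_size=0.5))            # SRS: 63
print(required_n(alpha=0.05, power=0.8, effect_size=0.5,
                 cluster_size=20, icc=0.05))                         # clustered: 123
```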


2018
Author(s):
Alina Peluso
Robert Glen
Timothy M D Ebbels

Abstract Motivation: A key issue in the omics literature is the search for statistically significant relationships between molecular markers and phenotype. The aim is to detect disease-related discriminatory features while controlling false positive associations at adequate power. Metabolome-wide association studies have revealed significant relationships of metabolic phenotypes with disease risk by analysing hundreds to tens of thousands of molecular variables, leading to multivariate data which are highly noisy and collinear. In this context, conventional Bonferroni or Sidak multiple testing corrections are of limited use, as these are only valid for independent tests, while permutation procedures allow the significance level to be estimated from the null distribution without assuming independence among features. Nevertheless, under the permutation approach the distribution of p-values may present systematic deviations from the theoretical null distribution, which leads to overly conservative adjusted threshold estimates, i.e. smaller than a Bonferroni or Sidak correction. Methods: We make use of parametric approximation methods based on a multivariate Normal distribution to derive stable estimates of the metabolome-wide significance level. A univariate approach is applied, based on a permutation procedure which effectively controls the overall type I error rate at the α level. Results: We illustrate the approach for different model parametrizations and distributional features of the outcome measure, using both simulated and real data. We also investigate different levels of correlation within the features and between the features and the outcome. Availability: MWSL is an open-source R software package for the empirical estimation of the metabolome-wide significance level, available at https://github.com/AlinaPeluso/MWSL.
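
The permutation estimation of a metabolome-wide significance level (MWSL) can be sketched as follows; this is the generic min-p approach, not necessarily the MWSL package's exact implementation, and the toy data are made up:

```python
import numpy as np
from scipy import stats

def mwsl_permutation(X, y, n_perm=1000, alpha=0.05, seed=0):
    """Permute the outcome to enforce the global null, record the
    minimum univariate p-value across all features per permutation,
    and take the alpha-quantile of those minima as the per-feature
    threshold controlling the family-wise error rate at alpha,
    without assuming independence among features."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    min_pvals = np.empty(n_perm)
    for i in range(n_perm):
        y_perm = rng.permutation(y)
        pvals = [stats.pearsonr(X[:, j], y_perm)[1] for j in range(n_feat)]
        min_pvals[i] = min(pvals)
    return np.quantile(min_pvals, alpha)

# Toy data: 100 samples, 200 collinear 'metabolite' features
rng = np.random.default_rng(7)
shared = rng.normal(size=(100, 1))              # induces collinearity
X = rng.normal(size=(100, 200)) + 0.5 * shared
y = rng.normal(size=100)
print(mwsl_permutation(X, y, n_perm=200))
```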


2021
pp. 001316442199489
Author(s):
Luyao Peng
Sandip Sinharay

Wollack et al. (2015) suggested the erasure detection index (EDI) for detecting fraudulent erasures for individual examinees. Wollack and Eckerly (2017) and Sinharay (2018) extended the index of Wollack et al. (2015) to suggest three EDIs for detecting fraudulent erasures at the aggregate or group level. This article follows up on the research of Wollack and Eckerly (2017) and Sinharay (2018) and suggests a new aggregate-level EDI by incorporating the empirical best linear unbiased predictor from the literature of linear mixed-effects models (e.g., McCulloch et al., 2008). A simulation study shows that the new EDI has larger power than the indices of Wollack and Eckerly (2017) and Sinharay (2018). In addition, the new index has satisfactory Type I error rates. A real data example is also included.
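
In rough outline, and leaving out the article's key ingredient (the empirical best linear unbiased predictor from a linear mixed-effects model), an aggregate-level erasure detection statistic compares a group's total wrong-to-right erasure count with its model-implied expectation; everything below is a hypothetical schematic, not the authors' index:

```python
import numpy as np

def aggregate_edi(observed_wr, expected_p):
    """Standardized group-level statistic: observed_wr[i] is examinee
    i's wrong-to-right (WR) erasure count; expected_p[i][j] is a model
    probability that erasure j of examinee i is WR. A large positive
    value flags the group for possible fraudulent erasures."""
    exp_total = sum(p.sum() for p in expected_p)
    var_total = sum((p * (1.0 - p)).sum() for p in expected_p)
    obs_total = float(np.sum(observed_wr))
    return (obs_total - exp_total) / np.sqrt(var_total)

# Toy example: three examinees with hypothetical erasure-level probabilities,
# every erasure happening to be wrong-to-right.
expected_p = [np.array([0.2, 0.3]), np.array([0.25]), np.array([0.1, 0.4, 0.3])]
observed_wr = [2, 1, 3]
print(aggregate_edi(observed_wr, expected_p))
```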

