type 1 error
Recently Published Documents

TOTAL DOCUMENTS: 155 (five years: 58)
H-INDEX: 17 (five years: 3)

Author(s):  
Maxime Cordy ◽  
Sami Lazreg ◽  
Mike Papadakis ◽  
Axel Legay

We propose a new Statistical Model Checking (SMC) method to identify bugs in variability-intensive systems (VIS). The state space of such systems is exponential in the number of variants, which makes the verification problem harder than for classical systems. To reduce verification time, we propose to combine SMC with featured transition systems (FTS), a model that jointly represents the state spaces of all variants. Our new methods allow the sampling of executions from one or more (potentially all) variants. We investigate their utility in two complementary use cases. The first considers the problem of finding all variants that violate a given property expressed in Linear-Time Logic (LTL) within a given simulation budget. To achieve this, we perform random walks in the featured transition system seeking accepting lassos. We show that our method finds bugs much faster (up to 16 times faster in our experiments) than exhaustive methods. As with any simulation-based approach, however, the risk of a Type 1 error exists. We provide lower and upper bounds on the number of simulations to perform to achieve the desired level of confidence. Our empirical study, involving 59 properties over three case studies, reveals that our method discovers all variants violating 41 of the properties. This indicates that SMC can act as a coarse-grained analysis method to quickly identify the set of buggy variants. The second use case complements the first. In case the coarse-grained analysis reveals that no variant can guarantee to satisfy an intended property in all of its executions, one should identify the variant that minimizes the probability of violating this property. We therefore propose a fine-grained SMC method that quickly identifies promising variants and accurately estimates their violation probability. We evaluate different selection strategies and show that a genetic algorithm combined with elitist selection yields the best results.
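The abstract refers to lower and upper bounds on the number of simulations needed to reach a desired confidence level. As an illustration only (the paper's own bounds may differ), the classic Okamoto/Chernoff-Hoeffding bound commonly used in statistical model checking can be sketched as:

```python
import math

def smc_sample_size(epsilon: float, delta: float) -> int:
    """Okamoto/Chernoff-Hoeffding bound: number of simulations n such that
    the estimated violation probability lies within +/-epsilon of the true
    value with confidence at least 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

# e.g. a +/-0.01 margin at 99% confidence requires ~26.5k simulated runs
n = smc_sample_size(0.01, 0.01)
```

Note how the bound grows quadratically as the error margin epsilon shrinks, which is why coarse-grained screening with a loose margin can be so much cheaper than precise estimation.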


2021 ◽  
Author(s):  
Shing Wan Choi ◽  
Timothy Shin Heng Mak ◽  
Clive J. Hoggart ◽  
Paul F. O'Reilly

Background: Polygenic risk score (PRS) analyses are now routinely applied in biomedical research, with great hope that they will aid our understanding of disease aetiology and contribute to personalized medicine. The continued growth of multi-cohort genome-wide association studies (GWASs) and large-scale biobank projects has provided researchers with a wealth of GWAS summary statistics and individual-level data suitable for performing PRS analyses. However, as the size of these studies increases, so does the risk of inter-cohort sample overlap and close relatedness. Ideally, sample overlap would be identified and removed directly, but this is typically not possible due to privacy laws or consent agreements. This sample overlap, whether known or not, is a major problem in PRS analyses because it can inflate type 1 error and thus lead to erroneous conclusions in published work. Results: Here, for the first time, we report the scale of the sample-overlap problem for PRS analyses by generating known sample overlap across sub-samples of the UK Biobank data, which we then use to produce GWAS and target data that mimic the effects of inter-cohort sample overlap. We demonstrate that inter-cohort overlap results in a significant and often substantial inflation in the observed PRS-trait association, coefficient of determination (R2), and false-positive rate. This inflation can be high even when the absolute number of overlapping individuals is small, if these individuals make up a notable fraction of the target sample. We develop and introduce EraSOR (Erase Sample Overlap and Relatedness), a software tool for adjusting inflation in PRS prediction and association statistics in the presence of sample overlap or close relatedness between the GWAS and target samples. A key component of the EraSOR approach is inference of the degree of sample overlap from the intercept of a bivariate LD score regression applied to the GWAS and target data, which gives it power in settings where both data sets comprise over 1,000 individuals. Through extensive benchmarking using UK Biobank and HapGen2 simulated genotype-phenotype data, we demonstrate that PRSs calculated using EraSOR-adjusted GWAS summary statistics are robust to inter-cohort overlap in a wide range of realistic scenarios, and are even robust to high levels of residual genetic and environmental stratification. Conclusion: The results of all PRS analyses for which sample overlap cannot be definitively ruled out should be considered with caution, given the high type 1 error observed in the presence of even low overlap between base and target cohorts. Given the strong performance of EraSOR in eliminating inflation caused by sample overlap in PRS studies with large (>5k) target samples, we recommend that EraSOR be used in all such future PRS studies to mitigate the potential effects of inter-cohort overlap and close relatedness.
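As a toy illustration of the inflation described above (not the EraSOR method itself), the following sketch shows how full overlap between the GWAS and target samples inflates the PRS-trait correlation even when no SNP has any true effect. The sample sizes, the all-noise genotypes, and the marginal-effect "GWAS" are all simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 200                      # individuals, independent null SNPs
X = rng.standard_normal((n, m))      # standardized genotypes (toy)
y = rng.standard_normal(n)           # phenotype with NO genetic signal

# "GWAS" per-SNP marginal effect estimates computed on the SAME sample
betas = X.T @ y / n

# PRS evaluated in the fully overlapping target sample: inflated association
prs = X @ betas
r_overlap = np.corrcoef(prs, y)[0, 1]

# Fresh, non-overlapping target sample drawn under the same null: no signal
X2 = rng.standard_normal((n, m))
y2 = rng.standard_normal(n)
r_fresh = np.corrcoef(X2 @ betas, y2)[0, 1]
```

Under full overlap the correlation concentrates near sqrt(m/n) rather than zero, so even a modest overlapping fraction of the target sample can produce a convincingly "significant" PRS association under the null.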


2021 ◽  
Author(s):  
Daniel Lakens

Psychological science would become more efficient if researchers implemented sequential designs where feasible. Miller and Ulrich (2020) propose an independent segments procedure in which data can be analyzed at a prespecified number of equally spaced looks while controlling the Type 1 error rate. Such procedures already exist in the sequential analysis literature, and in this commentary I reflect on whether psychologists should adopt these existing procedures instead. I believe limitations in the independent segments procedure make it relatively unattractive. Being forced to stop for futility based on a bound not chosen to control Type 2 errors, or to reject a smallest effect size of interest in an equivalence test, limits the inferences one can make. Having to use a prespecified number of equally spaced looks is logistically inconvenient. And not having the flexibility to choose α and β spending functions limits the possibility of designing efficient studies based on the goals and limitations of the researcher. Recent software packages such as rpact (Wassmer & Pahlke, 2019) make sequential designs as easy to perform as the independent segments procedure. While learning new statistical methods always takes time, I believe psychological scientists should start on a path that will not limit the flexibility and inferences their statistical procedures provide.
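The inflation that sequential procedures such as those in rpact are designed to control can be illustrated with a toy simulation (a hypothetical setup, not taken from the commentary): analyzing accumulating data at several looks with an unadjusted |z| > 1.96 threshold pushes the Type 1 error rate well above the nominal 5%:

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims, n_per_look, n_looks = 4000, 25, 4
false_pos = 0
for _ in range(n_sims):
    data = rng.standard_normal(n_per_look * n_looks)   # H0 true: mean 0
    for k in range(1, n_looks + 1):
        seg = data[: k * n_per_look]                   # data seen at look k
        z = seg.mean() / (seg.std(ddof=1) / np.sqrt(len(seg)))
        if abs(z) > 1.96:            # naive fixed threshold at every look
            false_pos += 1
            break
rate = false_pos / n_sims            # ~0.13 with 4 looks, not 0.05
```

Alpha spending functions exist precisely to redistribute the 5% across looks so that the overall error rate stays at its nominal level.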


2021 ◽  
Vol 12 ◽  
Author(s):  
Xintong Li ◽  
Lana YH Lai ◽  
Anna Ostropolets ◽  
Faaizah Arshad ◽  
Eng Hooi Tan ◽  
...  

Using real-world data and past vaccination data, we conducted a large-scale experiment to quantify the bias, precision, and timeliness of different study designs that compare historical background (expected) rates of safety events to post-vaccination (observed) rates for several vaccines. We used negative control outcomes (not causally related to vaccination) and positive control outcomes; the latter were synthetically generated true safety signals with incidence rate ratios ranging from 1.5 to 4. Observed vs. expected analysis using within-database historical background rates is a sensitive but nonspecific method for the identification of potential vaccine safety signals. Despite good discrimination, most analyses showed a tendency to overestimate risks, with 20%-100% type 1 error but low (0% to 20%) type 2 error in the large databases included in our study. Efforts to improve the comparability of background and post-vaccine rates, including age-sex adjustment and anchoring background rates around a visit, reduced type 1 error and improved precision, but residual systematic error persisted. Additionally, empirical calibration dramatically reduced type 1 error to nominal levels, but at the cost of increased type 2 error.
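A minimal sketch of the observed-vs-expected analysis discussed above, assuming a simple one-sided Poisson test against the event count implied by a historical background rate (the study's actual designs are considerably more elaborate):

```python
from scipy.stats import poisson

def observed_vs_expected(observed: int, background_rate: float,
                         person_years: float) -> tuple[float, float]:
    """One-sided Poisson test of observed post-vaccination events against
    the count expected under the historical background rate.
    Returns (incidence rate ratio, p-value)."""
    expected = background_rate * person_years
    irr = observed / expected
    p_value = poisson.sf(observed - 1, expected)   # P(X >= observed | H0)
    return irr, p_value

# Hypothetical numbers: 12 observed events vs 5 expected
irr, p = observed_vs_expected(12, 0.001, 5000.0)
```

The fragility described in the abstract comes from the background rate itself: if it is estimated in a population that is not comparable to the vaccinated one, the "expected" count is wrong and the test rejects far too often.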


2021 ◽  
pp. 90-120
Author(s):  
Charles Auerbach

This chapter covers tests of statistical significance that can be used to compare data across phases. These tests are used to determine whether observed outcomes are likely the result of an intervention or, instead, the result of sampling error or chance. The purpose of a statistical test is to determine how likely it is that the analyst is making an incorrect decision by rejecting the null hypothesis, that there is no difference between compared phases, and accepting the alternative, that true differences exist. A number of tests of significance are presented in this chapter: statistical process control charts (SPCs), proportion/frequency, chi-square, the conservative dual criteria (CDC), the robust conservative dual criteria (RCDC), the t test, and analysis of variance (ANOVA). How and when to use each of these is discussed, and examples are provided to illustrate each. The chapter also discusses methods for transforming autocorrelated data and merging data sets, in the context of using transformed data sets to test for Type 1 error.
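A minimal sketch of the SPC-chart idea mentioned above, under the simplifying assumption of a mean ± 2 SD band estimated from the baseline phase (the chapter's procedures, and the handling of autocorrelation, are more detailed):

```python
import statistics

def spc_band_check(baseline, intervention, k=2.0):
    """SPC-style check: build a mean +/- k*SD control band from the
    baseline phase, then count intervention-phase points falling outside.
    Assumes approximately independent observations (autocorrelated data
    should be transformed first, as the chapter discusses)."""
    mean = statistics.fmean(baseline)
    sd = statistics.stdev(baseline)
    lower, upper = mean - k * sd, mean + k * sd
    outside = [x for x in intervention if x < lower or x > upper]
    return (lower, upper), len(outside)

# Hypothetical single-case data: weekly scores before and after intervention
band, n_out = spc_band_check([5, 6, 5, 7, 6, 5, 6], [3, 2, 4, 3, 2])
```

Here every intervention-phase point falls below the baseline band, the kind of pattern these charts are designed to flag.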


2021 ◽  
Author(s):  
Masahiro Kojima

Background: Confirmation of a dose-response is complicated by the need to adjust for multiplicity. We propose a simple and powerful adaptive contrast test with ordinal-constraint contrast coefficients determined by the observed responses. Methods: The adaptive contrast test can be performed using easily calculated contrast coefficients and existing statistical software. We provide sample SAS program code for the analysis and for the power calculation of the adaptive contrast test. When the adaptive contrast test shows a statistically significant dose-response, we consider selecting the best dose-response model from multiple candidate models; based on the best model, we identify a recommended dose. We demonstrate the adaptive contrast test on sample data. In addition, we show the calculation of the coefficients, test statistic, and recommended dose for an actual study. We perform a simulation study with eleven scenarios to evaluate the performance of the adaptive contrast test. Results: We confirmed a statistically significant dose-response for the sample data and the actual study. In the simulation study, we confirmed that the adaptive contrast test has higher power in most scenarios compared with the conventional method. In addition, we confirmed that the type 1 error rate of the adaptive contrast test was maintained at the significance level when there was no difference between the treatment groups. Conclusions: We conclude that the adaptive contrast test can be applied unproblematically to dose-response studies.
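The paper supplies SAS code; the following Python sketch is only a plausible reading of the idea, assuming the ordinal-constraint coefficients come from a centered pool-adjacent-violators (isotonic) fit of the observed group means. The author's exact construction may differ:

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: nondecreasing fit to y (equal weights)."""
    y = list(map(float, y))
    blocks = [[v, 1] for v in y]                # [block mean, block size]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] > blocks[i + 1][0]:     # violation: pool the pair
            total = blocks[i][0] * blocks[i][1] + blocks[i + 1][0] * blocks[i + 1][1]
            size = blocks[i][1] + blocks[i + 1][1]
            blocks[i] = [total / size, size]
            del blocks[i + 1]
            i = max(i - 1, 0)                   # re-check the previous pair
        else:
            i += 1
    out = []
    for mean, size in blocks:
        out.extend([mean] * size)
    return np.array(out)

def adaptive_contrast_stat(groups):
    """Hedged sketch of an adaptive contrast test: ordinal-constrained
    coefficients derived from the observed group means (centered PAVA fit).
    A completely flat fit yields a zero contrast; handle that case
    separately in practice."""
    means = np.array([np.mean(g) for g in groups])
    ns = np.array([len(g) for g in groups])
    c = pava(means)
    c = c - c.mean()                            # contrast coefficients sum to 0
    s2 = np.mean([np.var(g, ddof=1) for g in groups])  # pooled (equal-n toy)
    t = (c @ means) / np.sqrt(s2 * np.sum(c ** 2 / ns))
    return c, t

# Hypothetical three-dose example with an increasing response
c, t = adaptive_contrast_stat([[0.1, -0.2, 0.0], [0.5, 0.7, 0.6], [1.0, 1.2, 1.1]])
```

Because the coefficients are data-driven, the null distribution of t is not an ordinary t distribution; the paper addresses how the significance level is maintained.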


2021 ◽  
Vol 18 (5) ◽  
pp. 521-528
Author(s):  
Eric S Leifer ◽  
James F Troendle ◽  
Alexis Kolecki ◽  
Dean A Follmann

Background/aims: The two-by-two factorial design randomizes participants to receive treatment A alone, treatment B alone, both treatments A and B (AB), or neither treatment (C). When the combined effect of A and B is less than the sum of the individual A and B effects, called a subadditive interaction, there can be low power to detect the A effect using an overall test, that is, a factorial analysis which compares the A and AB groups to the C and B groups. Such an interaction may have occurred in the Action to Control Cardiovascular Risk in Diabetes blood pressure trial (ACCORD BP), which simultaneously randomized participants to receive intensive or standard blood pressure control and intensive or standard glycemic control. For the primary outcome of major cardiovascular events, the overall test for efficacy of intensive blood pressure control was nonsignificant. In such an instance, simple effect tests of A versus C and B versus C may be useful, since they are not affected by a subadditive interaction, but they can have lower power since they use half the participants of the overall trial. We investigate multiple testing procedures which exploit the overall tests’ sample-size advantage and the simple tests’ robustness to a potential interaction. Methods: In the time-to-event setting, we use the stratified and ordinary logrank statistics’ asymptotic means to calculate the power of the overall and simple tests under various scenarios. We consider the A and B research questions to be unrelated and allocate a 0.05 significance level to each. For each question, we investigate three multiple testing procedures which allocate the type 1 error in different proportions across the overall and simple effects as well as the AB effect.
The Equal Allocation 3 procedure allocates equal type 1 error to each of the three effects, the Proportional Allocation 2 procedure allocates 2/3 of the type 1 error to the overall A (respectively, B) effect and the remaining type 1 error to the AB effect, and the Equal Allocation 2 procedure allocates equal amounts to the simple A (respectively, B) and AB effects. These procedures are applied to ACCORD BP. Results: Across various scenarios, Equal Allocation 3 had robust power for detecting a true effect. For ACCORD BP, all three procedures would have detected a benefit of intensive glycemia control. Conclusions: When there is no interaction, Equal Allocation 3 has less power than a factorial analysis. However, Equal Allocation 3 often has greater power when there is an interaction. The R package factorial2x2 can be used to explore the power gain or loss for different scenarios.
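The three allocations described above can be written down as simple Bonferroni-style splits of the 0.05 level per research question (a sketch; the paper's procedures, implemented in the R package factorial2x2, may use sharper, correlation-aware boundaries):

```python
def alpha_allocations(alpha: float = 0.05) -> dict:
    """Per-effect significance thresholds for the three multiple-testing
    procedures described in the abstract, as plain Bonferroni-style splits
    of the per-question type 1 error."""
    return {
        # equal type 1 error to overall, simple, and AB effects
        "Equal Allocation 3": {"overall": alpha / 3,
                               "simple": alpha / 3,
                               "AB": alpha / 3},
        # 2/3 of the type 1 error to the overall effect, rest to AB
        "Proportional Allocation 2": {"overall": 2 * alpha / 3,
                                      "AB": alpha / 3},
        # equal amounts to the simple and AB effects
        "Equal Allocation 2": {"simple": alpha / 2,
                               "AB": alpha / 2},
    }

alloc = alpha_allocations()
```

Each procedure spends the same total 0.05 per question; they differ only in which effects get a share, which is what drives the power trade-offs reported in the Results.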


2021 ◽  
Author(s):  
Christiaan de Leeuw ◽  
Josefin Werme ◽  
Jeanne Savage ◽  
Wouter J Peyrot ◽  
Danielle Posthuma

Transcriptome-wide association studies (TWAS), which aim to detect relationships between gene expression and a phenotype, are commonly used for secondary analysis of genome-wide association study (GWAS) results. Results of TWAS analyses are often interpreted as indicating a genetically mediated relationship between gene expression and the phenotype, but because the traditional TWAS framework does not model the uncertainty in the expression quantitative trait loci (eQTL) effect estimates, this interpretation is not justified. In this study we outline the implications of this issue. Using simulations, we show severely inflated type 1 error rates for TWAS when evaluating the null hypothesis of no genetic relationship between gene expression and the phenotype. Moreover, in our application to real data, only 51% of the TWAS associations were confirmed with local genetic correlation analysis, an approach which correctly evaluates the same null. Our results thus demonstrate that TWAS is unsuitable for investigating genetic relationships between gene expression and a phenotype.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Cleiton G. Taufemback ◽  
Victor Troster ◽  
Muhammad Shahbaz

In this paper, we propose a robust test of monotonicity in asset returns that is valid under a general setting. We develop a test that allows for dependent data and is robust to conditional heteroskedasticity or heavy-tailed distributions of return differentials. Many postulated theories in economics and finance assume monotonic relationships between expected asset returns and certain underlying characteristics of an asset. Existing tests in the literature fail to control the probability of a type 1 error or have low power under heavy-tailed distributions of return differentials. Monte Carlo simulations illustrate that our test statistic has correct empirical size under all data-generating processes, together with power similar to that of other tests. Conversely, alternative tests are nonconservative under conditional heteroskedasticity or heavy-tailed distributions of return differentials. We also present an empirical application on the monotonicity of returns on various portfolio sorts that highlights the usefulness of our approach.
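A simplified sketch of a monotonicity test in the spirit described above, assuming i.i.d. returns and an ordinary bootstrap. The paper's contribution is precisely about relaxing such assumptions (dependence, conditional heteroskedasticity, heavy tails), so this toy version illustrates only the basic min-differential logic:

```python
import numpy as np

def monotonicity_test(returns, n_boot=2000, seed=0):
    """Toy monotonicity test. `returns`: T x K matrix of portfolio returns
    sorted by a characteristic. H0: no monotonic increase across portfolios;
    the statistic is the smallest mean adjacent return differential.
    Uses an i.i.d. bootstrap; dependent data would need a block bootstrap."""
    rng = np.random.default_rng(seed)
    d = np.diff(returns, axis=1)          # T x (K-1) adjacent differentials
    t_obs = d.mean(axis=0).min()          # min of the mean differentials
    T = d.shape[0]
    count = 0
    for _ in range(n_boot):
        idx = rng.integers(0, T, size=T)
        db = d[idx] - d.mean(axis=0)      # recentre to impose H0
        if db.mean(axis=0).min() >= t_obs:
            count += 1
    return t_obs, count / n_boot          # statistic, bootstrap p-value

# Hypothetical strongly monotone sort: 5 portfolios, 200 periods
rng0 = np.random.default_rng(1)
T, K = 200, 5
rets = np.tile(np.linspace(0.0, 0.4, K), (T, 1)) + 0.05 * rng0.standard_normal((T, K))
t_obs, p_val = monotonicity_test(rets)
```

Requiring the *minimum* differential to be large is what distinguishes a monotonicity test from a simple top-minus-bottom spread test: every adjacent step must point the right way.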


2021 ◽  
Author(s):  
Martijn J Schuemie ◽  
Faaizah Arshad ◽  
Nicole Pratt ◽  
Fredrik Nyberg ◽  
Thamir M Alshammari ◽  
...  

Background: Routinely collected healthcare data such as administrative claims and electronic health records (EHR) can complement clinical trials and spontaneous reports in ensuring the safety of vaccines, but uncertainty remains about which epidemiological design to use. Methods: Using three claims databases and one EHR database, we evaluate several variants of the case-control, comparative cohort, historical comparator, and self-controlled designs against historical vaccinations, using real negative control outcomes (outcomes with no evidence to suggest that they could be caused by the vaccines) and simulated positive controls. Results: Most methods show large type 1 error, often identifying false positive signals. The cohort method appears either positively or negatively biased, depending on the choice of comparator index date. Empirical calibration using effect-size estimates for negative control outcomes can restore type 1 error to close to nominal, often at the cost of increasing type 2 error. After calibration, the self-controlled case series (SCCS) design shows the shortest time to detection for small true effect sizes, while the historical comparator performs well for strong effects. Conclusions: When applying any method for vaccine safety surveillance, we recommend considering the potential for systematic error, especially due to confounding, which for many designs appears to be substantial. Adjusting for age and sex alone is likely not sufficient to address the differences between vaccinated and unvaccinated individuals, and for the cohort method the choice of index date plays an important role in the comparability of the groups. Inclusion of negative control outcomes allows both quantification of the systematic error and, if so desired, subsequent empirical calibration to restore type 1 error to its nominal value. In order to detect weaker signals, one may have to accept a higher type 1 error.
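A simplified sketch of empirical calibration with negative control outcomes, as discussed above: fit an empirical null to the negative controls' log rate-ratio estimates and judge a new estimate against that null instead of against RR = 1. The full method (as implemented in OHDSI tooling) also models per-estimate standard errors; all the numbers below are hypothetical:

```python
import numpy as np
from scipy.stats import norm

def calibrated_p(log_rr, nc_log_rrs):
    """Simplified empirical calibration: fit a normal empirical null to
    log rate-ratio estimates from negative-control outcomes, then return
    a two-sided p-value for a new estimate against that null."""
    mu = float(np.mean(nc_log_rrs))
    sigma = float(np.std(nc_log_rrs, ddof=1))
    z = (log_rr - mu) / sigma
    return 2.0 * norm.sf(abs(z))

# Hypothetical negative controls whose estimates are biased upward,
# i.e. the design carries systematic error
ncs = np.log([1.3, 1.5, 1.2, 1.6, 1.4, 1.35, 1.45, 1.25])

# Naive p-value for an observed RR of 1.5 (hypothetical SE of 0.1 on the
# log scale), assuming the null is centred at RR = 1
p_naive = 2.0 * norm.sf(abs(np.log(1.5) / 0.1))

# Calibrated p-value: RR = 1.5 is unremarkable against the empirical null
p_cal = calibrated_p(np.log(1.5), ncs)
```

The naive test declares a strong signal; calibration recognizes that negative controls routinely produce estimates of similar size, so the "signal" is indistinguishable from the design's systematic error.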

