A robust mean and variance test with application to high-dimensional phenotypes

AbstractMost studies of continuous health-related outcomes examine differences in mean levels (location) of the outcome by exposure. However, identifying effects on the variability (scale) of an outcome, and combining tests of mean and variability (location-and-scale), could provide additional insights into biological mechanisms. A joint test could improve power for studies of high-dimensional phenotypes, such as epigenome-wide association studies of DNA methylation at CpG sites. One possible cause of heterogeneity of variance is a variable interacting with exposure in its effect on outcome, so a joint test of mean and variability could help in the identification of effect modifiers. Here, we review a scale test, based on the Brown-Forsythe test, for analysing variability of a continuous outcome with respect to both categorical and continuous exposures, and develop a novel joint location-and-scale score (JLSsc) test. These tests were compared to alternatives in simulations and used to test associations of mean and variability of DNA methylation with gender and gestational age using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES). In simulations, the Brown-Forsythe and JLSsc tests retained correct type I error rates when the outcome was not normally distributed in contrast to the other approaches tested which all had inflated type I error rates. These tests also identified > 7500 CpG sites for which either mean or variability in cord blood methylation differed according to gender or gestational age. The Brown-Forsythe test and JLSsc are robust tests that can be used to detect associations not solely driven by a mean effect.

Download Full-text

The effect of different sets of critical values on type I error rates in tiled regression for genome-wide association studies

International Journal of Data Mining and Bioinformatics ◽

10.1504/ijdmb.2016.080030 ◽

2016 ◽

Vol 16 (2) ◽

pp. 111

Author(s):

Heejong Sung ◽

Jeremy A. Sabourin ◽

Alexa J.M. Sorant ◽

Alexander F. Wilson

Keyword(s):

Type I Error ◽

Association Studies ◽

Error Rates ◽

Critical Values ◽

Genome Wide Association ◽

Type I ◽

Genome Wide Association Studies ◽

Type I Error Rates ◽

Genome Wide

Download Full-text

A comparative analysis of cell-type adjustment methods for epigenome-wide association studies based on simulated and real data sets

Briefings in Bioinformatics ◽

10.1093/bib/bby068 ◽

2018 ◽

Vol 20 (6) ◽

pp. 2055-2065 ◽

Cited By ~ 1

Author(s):

Johannes Brägelmann ◽

Justo Lorenzo Bermejo

Keyword(s):

Statistical Power ◽

Type I Error ◽

Association Studies ◽

Real Data ◽

Error Rates ◽

Data Sets ◽

Type I ◽

Cell Type ◽

Type I Error Rates

Abstract Technological advances and reduced costs of high-density methylation arrays have led to an increasing number of association studies on the possible relationship between human disease and epigenetic variability. DNA samples from peripheral blood or other tissue types are analyzed in epigenome-wide association studies (EWAS) to detect methylation differences related to a particular phenotype. Since information on the cell-type composition of the sample is generally not available and methylation profiles are cell-type specific, statistical methods have been developed for adjustment of cell-type heterogeneity in EWAS. In this study we systematically compared five popular adjustment methods: the factored spectrally transformed linear mixed model (FaST-LMM-EWASher), the sparse principal component analysis algorithm ReFACTor, surrogate variable analysis (SVA), independent SVA (ISVA) and an optimized version of SVA (SmartSVA). We used real data and applied a multilayered simulation framework to assess the type I error rate, the statistical power and the quality of estimated methylation differences according to major study characteristics. While all five adjustment methods improved false-positive rates compared with unadjusted analyses, FaST-LMM-EWASher resulted in the lowest type I error rate at the expense of low statistical power. SVA efficiently corrected for cell-type heterogeneity in EWAS up to 200 cases and 200 controls, but did not control type I error rates in larger studies. Results based on real data sets confirmed simulation findings with the strongest control of type I error rates by FaST-LMM-EWASher and SmartSVA. Overall, ReFACTor, ISVA and SmartSVA showed the best comparable statistical power, quality of estimated methylation differences and runtime.

Download Full-text

Multi-trait genome-wide analyses of the brain imaging phenotypes in UK Biobank

10.1101/758326 ◽

2019 ◽

Cited By ~ 1

Author(s):

Chong Wu

Keyword(s):

Type I Error ◽

Association Studies ◽

Error Rates ◽

Type I ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Type I Error Rates ◽

Trait Association ◽

Genome Wide ◽

Inflation Factor

AbstractMany genetic variants identified in genome-wide association studies (GWAS) are associated with multiple, sometimes seemingly unrelated traits. This motivates multi-trait association analyses, which have successfully identified novel associated loci for many complex diseases. While appealing, most existing methods focus on analyzing a relatively small number of traits and may yield inflated Type I error rates when a large number of traits need to be analyzed jointly. As deep phenotyping data are becoming rapidly available, we develop a novel method, referred to as aMAT (adaptive multi-trait association test), for multi-trait analysis of any number of traits. We applied aMAT to GWAS summary statistics for a set of 58 volumetric imaging derived phenotypes from the UK Biobank. aMAT had a genomic inflation factor of 1.04, indicating the Type I error rates were well controlled. More important, aMAT identified 24 distinct risk loci, 13 of which were ignored by standard GWAS. In comparison, the competing methods either had a suspicious genomic inflation factor or identified much fewer risk loci. Finally, four additional sets of traits have been analyzed and provided similar conclusions.

Download Full-text

The effect of different sets of critical values on type I error rates in tiled regression for genome-wide association studies

International Journal of Data Mining and Bioinformatics ◽

10.1504/ijdmb.2016.10000871 ◽

2016 ◽

Vol 16 (2) ◽

pp. 111

Author(s):

Alexander F. Wilson ◽

Heejong Sung ◽

Jeremy A. Sabourin ◽

Alexa J.M. Sorant

Keyword(s):

Type I Error ◽

Association Studies ◽

Error Rates ◽

Critical Values ◽

Genome Wide Association ◽

Type I ◽

Genome Wide Association Studies ◽

Type I Error Rates ◽

Genome Wide

Download Full-text

Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes

10.1101/2020.10.09.333146 ◽

2020 ◽

Author(s):

Wenjian Bi ◽

Wei Zhou ◽

Rounak Dey ◽

Bhramar Mukherjee ◽

Joshua N Sampson ◽

...

Keyword(s):

Mixed Model ◽

Type I Error ◽

Association Studies ◽

Error Rates ◽

Genome Wide Association ◽

Alternative Methods ◽

Type I ◽

Genome Wide Association Studies ◽

Type I Error Rates ◽

Genome Wide

AbstractIn genome-wide association studies (GWAS), ordinal categorical phenotypes are widely used to measure human behaviors, satisfaction, and preferences. However, due to the lack of analysis tools, methods designed for binary and quantitative traits have often been used inappropriately to analyze categorical phenotypes, which produces inflated type I error rates or is less powerful. To accurately model the dependence of an ordinal categorical phenotype on covariates, we propose an efficient mixed model association test, Proportional Odds Logistic Mixed Model (POLMM). POLMM is demonstrated to be computationally efficient to analyze large datasets with hundreds of thousands of genetic related samples, can control type I error rates at a stringent significance level regardless of the phenotypic distribution, and is more powerful than other alternative methods. We applied POLMM to 258 ordinal categorical phenotypes on array-genotypes and imputed samples from 408,961 individuals in UK Biobank. In total, we identified 5,885 genome-wide significant variants, of which 424 variants (7.2%) are rare variants with MAF < 0.01.

Download Full-text

Type I error rates and power of several versions of scaled chi-square difference tests in investigations of measurement invariance.

Psychological Methods ◽

10.1037/met0000097 ◽

2017 ◽

Vol 22 (3) ◽

pp. 467-485 ◽

Cited By ~ 4

Author(s):

Jordan Campbell Brace ◽

Victoria Savalei

Keyword(s):

Measurement Invariance ◽

Type I Error ◽

Error Rates ◽

Type I ◽

Chi Square ◽

Type I Error Rates

Download Full-text

Type I Error Rates, Coverage of Confidence Intervals, and Variance Estimation in Propensity-Score Matched Analyses

The International Journal of Biostatistics ◽

10.2202/1557-4679.1146 ◽

2009 ◽

Vol 5 (1) ◽

Cited By ~ 65

Author(s):

Peter C Austin

Keyword(s):

Propensity Score ◽

Confidence Intervals ◽

Variance Estimation ◽

Type I Error ◽

Error Rates ◽

Type I ◽

Type I Error Rates

Download Full-text

Type I Error Rates for Parscale’s Fit Index

Educational and Psychological Measurement ◽

10.1177/0013164404264849 ◽

2005 ◽

Vol 65 (1) ◽

pp. 42-50 ◽

Cited By ~ 17

Author(s):

Christine E. Demars

Keyword(s):

Type I Error ◽

Error Rates ◽

Type I ◽

Type I Error Rates ◽

Fit Index

Download Full-text

Control of Type I Error Rates in Bayesian Sequential Designs

Bayesian Analysis ◽

10.1214/18-ba1109 ◽

2019 ◽

Vol 14 (2) ◽

pp. 399-425 ◽

Cited By ~ 4

Author(s):

Haolun Shi ◽

Guosheng Yin

Keyword(s):

Type I Error ◽

Error Rates ◽

Type I ◽

Sequential Designs ◽

Type I Error Rates

Download Full-text

Sisvar: a Guide for its Bootstrap procedures in multiple comparisons

Ciência e Agrotecnologia ◽

10.1590/s1413-70542014000200001 ◽

2014 ◽

Vol 38 (2) ◽

pp. 109-112 ◽

Cited By ~ 299

Author(s):

Daniel Furtado Ferreira

Keyword(s):

Scientific Community ◽

Type I Error ◽

Multiple Comparisons ◽

Error Rates ◽

Type I ◽

Type I Error Rates ◽

Statistical Analysis System ◽

Scientific Results ◽

Analysis System ◽

Multiple Comparison Procedures

Sisvar is a statistical analysis system with a large usage by the scientific community to produce statistical analyses and to produce scientific results and conclusions. The large use of the statistical procedures of Sisvar by the scientific community is due to it being accurate, precise, simple and robust. With many options of analysis, Sisvar has a not so largely used analysis that is the multiple comparison procedures using bootstrap approaches. This paper aims to review this subject and to show some advantages of using Sisvar to perform such analysis to compare treatments means. Tests like Dunnett, Tukey, Student-Newman-Keuls and Scott-Knott are performed alternatively by bootstrap methods and show greater power and better controls of experimentwise type I error rates under non-normal, asymmetric, platykurtic or leptokurtic distributions.

Download Full-text