A robust mean and variance test with application to high-dimensional phenotypes

Author(s):  
James R. Staley ◽  
Frank Windmeijer ◽  
Matthew Suderman ◽  
Matthew S. Lyon ◽  
George Davey Smith ◽  
...  

Abstract Most studies of continuous health-related outcomes examine differences in mean levels (location) of the outcome by exposure. However, identifying effects on the variability (scale) of an outcome, and combining tests of mean and variability (location-and-scale), could provide additional insights into biological mechanisms. A joint test could improve power for studies of high-dimensional phenotypes, such as epigenome-wide association studies of DNA methylation at CpG sites. One possible cause of heterogeneity of variance is a variable interacting with exposure in its effect on outcome, so a joint test of mean and variability could help in the identification of effect modifiers. Here, we review a scale test, based on the Brown-Forsythe test, for analysing variability of a continuous outcome with respect to both categorical and continuous exposures, and develop a novel joint location-and-scale score (JLSsc) test. These tests were compared to alternatives in simulations and used to test associations of mean and variability of DNA methylation with gender and gestational age using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES). In simulations, the Brown-Forsythe and JLSsc tests retained correct type I error rates when the outcome was not normally distributed, in contrast to the other approaches tested, all of which had inflated type I error rates. These tests also identified > 7500 CpG sites for which either mean or variability in cord blood methylation differed according to gender or gestational age. The Brown-Forsythe test and JLSsc are robust tests that can be used to detect associations not solely driven by a mean effect.
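As a rough illustration of the scale test named in this abstract, the classical Brown-Forsythe statistic for a categorical exposure can be sketched as a one-way ANOVA on absolute deviations from group medians. This is a minimal sketch of the textbook test only, not the authors' implementation: the function name `brown_forsythe_statistic` is hypothetical, the continuous-exposure extension and the JLSsc test are not covered, and no p-value is computed.

```python
from statistics import fmean, median


def brown_forsythe_statistic(groups):
    """Return the classical Brown-Forsythe F statistic.

    groups: list of lists of floats, one list per exposure category
    (hypothetical input layout chosen for this sketch).
    """
    # Step 1: absolute deviation of each observation from its group median.
    deviations = []
    for g in groups:
        centre = median(g)
        deviations.append([abs(y - centre) for y in g])

    k = len(deviations)                        # number of groups
    n_total = sum(len(d) for d in deviations)  # total sample size
    group_means = [fmean(d) for d in deviations]
    grand_mean = fmean([z for d in deviations for z in d])

    # Step 2: one-way ANOVA F statistic computed on the deviations.
    between = sum(
        len(d) * (m - grand_mean) ** 2
        for d, m in zip(deviations, group_means)
    ) / (k - 1)
    within = sum(
        (z - m) ** 2
        for d, m in zip(deviations, group_means)
        for z in d
    ) / (n_total - k)
    return between / within


if __name__ == "__main__":
    same_spread = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]
    wider_spread = [[1, 2, 3, 4, 5], [-7, -2, 3, 8, 13]]
    print(brown_forsythe_statistic(same_spread))
    print(brown_forsythe_statistic(wider_spread))
```

Under the null hypothesis of equal variances, the statistic is usually referred to an F distribution with k - 1 and N - k degrees of freedom. Centring on the median rather than the mean is what makes the test less sensitive to non-normal outcomes.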

2018 ◽  
Vol 20 (6) ◽  
pp. 2055-2065 ◽  
Author(s):  
Johannes Brägelmann ◽  
Justo Lorenzo Bermejo

Abstract Technological advances and reduced costs of high-density methylation arrays have led to an increasing number of association studies on the possible relationship between human disease and epigenetic variability. DNA samples from peripheral blood or other tissue types are analyzed in epigenome-wide association studies (EWAS) to detect methylation differences related to a particular phenotype. Since information on the cell-type composition of the sample is generally not available and methylation profiles are cell-type specific, statistical methods have been developed for adjustment of cell-type heterogeneity in EWAS. In this study we systematically compared five popular adjustment methods: the factored spectrally transformed linear mixed model (FaST-LMM-EWASher), the sparse principal component analysis algorithm ReFACTor, surrogate variable analysis (SVA), independent SVA (ISVA) and an optimized version of SVA (SmartSVA). We used real data and applied a multilayered simulation framework to assess the type I error rate, the statistical power and the quality of estimated methylation differences according to major study characteristics. While all five adjustment methods improved false-positive rates compared with unadjusted analyses, FaST-LMM-EWASher resulted in the lowest type I error rate at the expense of low statistical power. SVA efficiently corrected for cell-type heterogeneity in EWAS up to 200 cases and 200 controls, but did not control type I error rates in larger studies. Results based on real data sets confirmed simulation findings with the strongest control of type I error rates by FaST-LMM-EWASher and SmartSVA. Overall, ReFACTor, ISVA and SmartSVA showed the best comparable statistical power, quality of estimated methylation differences and runtime.


2019 ◽  
Author(s):  
Chong Wu

Abstract Many genetic variants identified in genome-wide association studies (GWAS) are associated with multiple, sometimes seemingly unrelated traits. This motivates multi-trait association analyses, which have successfully identified novel associated loci for many complex diseases. While appealing, most existing methods focus on analyzing a relatively small number of traits and may yield inflated Type I error rates when a large number of traits need to be analyzed jointly. As deep phenotyping data are rapidly becoming available, we develop a novel method, referred to as aMAT (adaptive multi-trait association test), for multi-trait analysis of any number of traits. We applied aMAT to GWAS summary statistics for a set of 58 volumetric imaging derived phenotypes from the UK Biobank. aMAT had a genomic inflation factor of 1.04, indicating the Type I error rates were well controlled. More importantly, aMAT identified 24 distinct risk loci, 13 of which were ignored by standard GWAS. In comparison, the competing methods either had a suspicious genomic inflation factor or identified far fewer risk loci. Finally, four additional sets of traits were analyzed and yielded similar conclusions.
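The genomic inflation factor quoted in this abstract is conventionally the median of the observed 1-degree-of-freedom chi-square statistics divided by the median of the chi-square distribution with 1 degree of freedom (about 0.4549). The sketch below assumes per-variant z-scores as input; the function name and input format are hypothetical, and the abstract does not state how the value of 1.04 was actually computed for aMAT.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.454936423


def genomic_inflation_factor(z_scores):
    """Return lambda = median(z^2) / median of chi-square(1).

    z_scores: iterable of per-variant z statistics (assumed input format).
    """
    chi2_stats = [z * z for z in z_scores]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Illustrative z-scores only; not data from the study.
    example = [-1.2, -0.7, -0.3, 0.1, 0.5, 0.9, 1.4]
    print(round(genomic_inflation_factor(example), 3))
```

Values close to 1 suggest that the bulk of test statistics follow the null distribution, while values well above 1 point to inflation.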


2020 ◽  
Author(s):  
Wenjian Bi ◽  
Wei Zhou ◽  
Rounak Dey ◽  
Bhramar Mukherjee ◽  
Joshua N Sampson ◽  
...  

Abstract In genome-wide association studies (GWAS), ordinal categorical phenotypes are widely used to measure human behaviors, satisfaction, and preferences. However, due to the lack of analysis tools, methods designed for binary and quantitative traits have often been used inappropriately to analyze categorical phenotypes, which produces inflated type I error rates or reduces power. To accurately model the dependence of an ordinal categorical phenotype on covariates, we propose an efficient mixed model association test, Proportional Odds Logistic Mixed Model (POLMM). POLMM is demonstrated to be computationally efficient for analyzing large datasets with hundreds of thousands of genetically related samples, to control type I error rates at a stringent significance level regardless of the phenotypic distribution, and to be more powerful than alternative methods. We applied POLMM to 258 ordinal categorical phenotypes on array-genotyped and imputed samples from 408,961 individuals in UK Biobank. In total, we identified 5,885 genome-wide significant variants, of which 424 variants (7.2%) are rare variants with MAF < 0.01.
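To make the proportional odds idea concrete, the sketch below turns a linear predictor into ordinal category probabilities using one common parameterization, logit P(Y <= j) = cutpoint_j - eta. It is an assumption-laden illustration: it omits the random effects that make POLMM a mixed model, the sign convention may differ from the authors', and the function name `category_probabilities` is hypothetical.

```python
import math


def category_probabilities(cutpoints, eta):
    """Return P(Y = j) for each ordinal category under proportional odds.

    cutpoints: ascending thresholds, one fewer than the number of categories.
    eta: linear predictor (covariate and genotype effects combined).
    Assumes logit P(Y <= j) = cutpoint_j - eta, without random effects.
    """
    # Cumulative probabilities P(Y <= j); the last category closes at 1.
    cumulative = [1.0 / (1.0 + math.exp(-(c - eta))) for c in cutpoints]
    cumulative.append(1.0)

    # Differences of successive cumulative probabilities give P(Y = j).
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


if __name__ == "__main__":
    # Three categories with illustrative cutpoints; not values from the study.
    print(category_probabilities([-1.0, 1.0], eta=0.0))
    print(category_probabilities([-1.0, 1.0], eta=1.0))
```

Because the same eta shifts every cumulative logit by the same amount, a larger linear predictor moves probability mass toward higher categories without changing the spacing between cutpoints.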


2019 ◽  
Vol 14 (2) ◽  
pp. 399-425 ◽  
Author(s):  
Haolun Shi ◽  
Guosheng Yin

2014 ◽  
Vol 38 (2) ◽  
pp. 109-112 ◽  
Author(s):  
Daniel Furtado Ferreira

Sisvar is a statistical analysis system widely used by the scientific community to produce statistical analyses and to support scientific results and conclusions. Its statistical procedures are widely used because they are accurate, precise, simple and robust. Among its many analysis options, Sisvar offers one that is not widely used: multiple comparison procedures based on bootstrap approaches. This paper reviews this subject and shows some advantages of using Sisvar to perform such analyses when comparing treatment means. Tests such as Dunnett, Tukey, Student-Newman-Keuls and Scott-Knott can alternatively be performed by bootstrap methods, which show greater power and better control of experimentwise type I error rates under non-normal, asymmetric, platykurtic or leptokurtic distributions.

