Associating Multivariate Traits with Genetic Variants Using Collapsing and Kernel Methods with Pedigree- or Population-Based Studies

2021, Vol 2021, pp. 1-11
Author(s): Li-Chu Chien

In genetic association analysis, several relevant phenotypes or multivariate traits with different types of components are usually collected to study complex or multifactorial diseases. Over the past few years, jointly testing for association between multivariate traits and multiple genetic variants has become more popular because it can increase statistical power to identify causal genes in pedigree- or population-based studies. However, most existing methods focus mainly on testing genetic variants for association with multiple continuous phenotypes. In this investigation, we develop a framework for identifying pleiotropic effects of genetic variants on multivariate traits by using collapsing and kernel methods with pedigree- or population-structured data. The proposed framework is applicable to the burden test, the kernel test, and the omnibus test for autosomes and the X chromosome. The proposed multivariate trait association methods can accommodate continuous or binary phenotypes and can further adjust for covariates. Simulation studies show that the performance of our methods is satisfactory with respect to empirical type I error rates and power in comparison with existing methods.
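
To make the collapsing idea concrete, here is a minimal Python sketch of a generic burden test: rare variants in a region are collapsed into a single per-sample score, which is then tested for association with a phenotype while adjusting for a covariate. The simulated data, variable names, and ordinary-least-squares model are illustrative assumptions only; the paper's pedigree-aware, multivariate-trait machinery is not reproduced here.

```python
# Hedged sketch of the generic burden (collapsing) test this line of work
# builds on -- NOT the authors' pedigree-aware multivariate method.
# All data and names below are simulated/illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, m = 500, 20                             # samples, rare variants in a region
maf = rng.uniform(0.005, 0.05, size=m)     # minor allele frequencies
G = rng.binomial(2, maf, size=(n, m))      # genotype matrix coded 0/1/2
age = rng.normal(50, 10, size=n)           # example covariate
burden = G.sum(axis=1)                     # collapse: rare-allele count per sample
y = 0.3 * burden + 0.05 * age + rng.normal(size=n)  # continuous phenotype

X = sm.add_constant(np.column_stack([burden, age]))
fit = sm.OLS(y, X).fit()
print("burden p-value:", fit.pvalues[1])   # one-df test of the collapsed score
```

A kernel test replaces the single collapsed score with a variance-component test over a genotype similarity matrix, and an omnibus test combines the two; neither is shown above.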

2019, Vol 21 (3), pp. 753-761
Author(s): Regina Brinster, Dominique Scherer, Justo Lorenzo Bermejo

Population stratification is usually corrected for by relying on principal component analysis (PCA) of genome-wide genotype data, even in populations considered genetically homogeneous, such as Europeans. The need to genotype only a small number of genetic variants that show large differences in allele frequency among subpopulations—so-called ancestry-informative markers (AIMs)—instead of the whole genome for stratification adjustment could represent an advantage for replication studies and candidate gene/pathway studies. Here we compare the correction performance of classical and robust principal components (PCs) with the use of AIMs selected according to four different methods: the informativeness for assignment measure ($I_n$-AIMs), the combination of PCA and F-statistics, PCA-correlated measurement and the PCA weighted loadings for each genetic variant. We used real genotype data from the Population Reference Sample and The Cancer Genome Atlas to simulate European genetic association studies and to quantify type I error rate and statistical power in different case–control settings. In studies with the same numbers of cases and controls per country and control-to-case ratios reflecting actual rates of disease prevalence, no adjustment for population stratification was required. The unnecessary inclusion of the country of origin, PCs or AIMs as covariates in the regression models translated into increasing type I error rates. In studies with cases and controls from separate countries, no investigated method was able to adequately correct for population stratification. The first classical and the first two robust PCs achieved the lowest (although still inflated) type I error, followed at some distance by the first eight $I_n$-AIMs.
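
For orientation, a hedged sketch of the baseline adjustment strategy this paper evaluates: derive principal components from a standardized genotype matrix and include the leading PCs as covariates in a case-control logistic regression. The data are simulated, and none of the article's AIM-selection methods are implemented; everything below is an illustrative assumption.

```python
# Hedged sketch of PC adjustment for population stratification: include the
# top genotype PCs as covariates in a case-control logistic regression.
# Simulated data; the article's AIM-selection methods are not implemented.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, m = 400, 1000
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)      # toy genotype matrix
G_std = (G - G.mean(axis=0)) / (G.std(axis=0) + 1e-12)   # standardize variants
U, S, _ = np.linalg.svd(G_std, full_matrices=False)
pcs = U[:, :2] * S[:2]                                   # top-2 principal components

snp = G[:, 0]                                            # candidate variant to test
case = rng.binomial(1, 0.5, size=n)                      # toy case/control labels
X = sm.add_constant(np.column_stack([snp, pcs]))
print(sm.Logit(case, X).fit(disp=0).pvalues[1])          # p-value for the SNP term
```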


2018, Vol 20 (6), pp. 2055-2065
Author(s): Johannes Brägelmann, Justo Lorenzo Bermejo

Technological advances and reduced costs of high-density methylation arrays have led to an increasing number of association studies on the possible relationship between human disease and epigenetic variability. DNA samples from peripheral blood or other tissue types are analyzed in epigenome-wide association studies (EWAS) to detect methylation differences related to a particular phenotype. Since information on the cell-type composition of the sample is generally not available and methylation profiles are cell-type specific, statistical methods have been developed to adjust for cell-type heterogeneity in EWAS. In this study we systematically compared five popular adjustment methods: the factored spectrally transformed linear mixed model (FaST-LMM-EWASher), the sparse principal component analysis algorithm ReFACTor, surrogate variable analysis (SVA), independent SVA (ISVA) and an optimized version of SVA (SmartSVA). We used real data and applied a multilayered simulation framework to assess the type I error rate, the statistical power and the quality of estimated methylation differences according to major study characteristics. While all five adjustment methods improved false-positive rates compared with unadjusted analyses, FaST-LMM-EWASher resulted in the lowest type I error rate at the expense of low statistical power. SVA efficiently corrected for cell-type heterogeneity in EWAS with up to 200 cases and 200 controls, but did not control type I error rates in larger studies. Results based on real data sets confirmed the simulation findings, with the strongest control of type I error rates by FaST-LMM-EWASher and SmartSVA. Overall, ReFACTor, ISVA and SmartSVA showed the best, mutually comparable performance in terms of statistical power, quality of estimated methylation differences and runtime.
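
The following is a deliberately simplified Python sketch of the core idea behind SVA-style adjustment: estimate latent components from the phenotype-residualized methylation matrix (here via a single SVD, whereas real SVA/SmartSVA iterate and weight probes) and include them as covariates in each probe-level test. All data and names are illustrative assumptions.

```python
# Deliberately simplified sketch of SVA-style adjustment: take SVD factors of
# the phenotype-residualized methylation matrix as surrogate variables and
# add them as covariates in each probe-level test. Real SVA/SmartSVA are
# iterative and probe-weighted; data and names here are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, p = 100, 2000                              # samples, CpG probes
pheno = rng.binomial(1, 0.5, size=n).astype(float)
M = rng.normal(size=(n, p))                   # toy methylation matrix

X0 = sm.add_constant(pheno)
H = X0 @ np.linalg.pinv(X0)                   # hat matrix of the phenotype model
R = M - H @ M                                 # residual methylation
U, _, _ = np.linalg.svd(R, full_matrices=False)
sv = U[:, :3]                                 # first 3 surrogate variables

X = sm.add_constant(np.column_stack([pheno, sv]))
pvals = np.array([sm.OLS(M[:, j], X).fit().pvalues[1] for j in range(p)])
print((pvals < 0.05).mean())                  # ~0.05 expected under this null
```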


2017, Vol 284 (1851), pp. 20161850
Author(s): Nick Colegrave, Graeme D. Ruxton

A common approach to the analysis of experimental data across much of the biological sciences is test-qualified pooling. Here, non-significant terms are dropped from a statistical model, effectively pooling the variation associated with each removed term into the error term used to test hypotheses (or estimate effect sizes). The pooling is carried out only if a statistical test of the term in the fuller, more complicated model motivates the simplification; hence the pooling is test-qualified. By pooling, the researcher increases the degrees of freedom of the error term with the aim of increasing statistical power to test the hypotheses of interest. Despite this approach being widely adopted and explicitly recommended by some of the most widely cited statistical textbooks aimed at biologists, we argue that (except in highly specialized circumstances that we identify) the hoped-for improvement in statistical power will be small or non-existent, and the reliability of the statistical procedures is likely to be much reduced, with type I error rates deviating from nominal levels. We therefore call for greatly reduced use of test-qualified pooling across experimental biology, more careful justification of any continued use, and a different philosophy for the initial selection of statistical models in light of this change in procedure.
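
A small Python illustration of the practice being criticized may help: in a two-factor design, the interaction is tested first and, if non-significant, dropped, so that its variance and degrees of freedom are pooled into the error term used for the main-effect tests. The simulated data and model formulas are assumptions for illustration only, not taken from the article.

```python
# Illustrative sketch of test-qualified pooling in a balanced two-factor
# design. The paper argues against this practice; data here are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "a": np.repeat(["a1", "a2"], 20),
    "b": np.tile(np.repeat(["b1", "b2"], 10), 2),
})
df["y"] = rng.normal(size=len(df))

full = smf.ols("y ~ a * b", data=df).fit()
p_int = anova_lm(full).loc["a:b", "PR(>F)"]
# Test-qualified pooling: drop the interaction only if it is non-significant,
# pooling its variance and df into the error term of the simpler model.
model = full if p_int < 0.05 else smf.ols("y ~ a + b", data=df).fit()
print(anova_lm(model))   # main effects now tested against the (pooled) error
```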


2017, Vol 78 (3), pp. 460-481
Author(s): Margarita Olivera-Aguilar, Samuel H. Rikoon, Oscar Gonzalez, Yasemin Kisbu-Sakarya, David P. MacKinnon

When testing a statistical mediation model, it is assumed that factorial measurement invariance holds for the mediating construct across levels of the independent variable X. The consequences of failing to address the violations of measurement invariance in mediation models are largely unknown. The purpose of the present study was to systematically examine the impact of mediator noninvariance on the Type I error rates, statistical power, and relative bias in parameter estimates of the mediated effect in the single mediator model. The results of a large simulation study indicated that, in general, the mediated effect was robust to violations of invariance in loadings. In contrast, most conditions with violations of intercept invariance exhibited severely positively biased mediated effects, Type I error rates above acceptable levels, and statistical power larger than in the invariant conditions. The implications of these results are discussed and recommendations are offered.
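
For reference, a minimal sketch of the single mediator model at the observed-variable level: the mediated effect is the product of the X-to-M path (a) and the M-to-Y path adjusted for X (b). The latent mediating construct, its loadings and intercepts, and the invariance violations studied in the paper are not modelled here; data and coefficients are illustrative assumptions.

```python
# Minimal observed-variable sketch of the single mediator model: the mediated
# effect is a*b, from the X->M regression (a) and the M->Y|X regression (b).
# The paper's latent mediator and invariance violations are not modelled.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
x = rng.binomial(1, 0.5, size=n).astype(float)   # e.g., treatment indicator
m = 0.5 * x + rng.normal(size=n)                 # mediator
y = 0.4 * m + 0.1 * x + rng.normal(size=n)       # outcome

a = sm.OLS(m, sm.add_constant(x)).fit().params[1]                        # X -> M
b = sm.OLS(y, sm.add_constant(np.column_stack([m, x]))).fit().params[1]  # M -> Y | X
print("mediated effect (a*b):", a * b)
```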


1984, Vol 4 (1), pp. 37-50
Author(s): Kenneth Ottenbacher

Research in the behavioral and social sciences, including occupational therapy, has been shown to be associated with low statistical power and a high rate of Type II experimental errors. Three frequently suggested methods of increasing power are increasing sample size, increasing effect size, and raising the significance level. The first two alternatives are often not possible in applied fields such as occupational therapy, and the third is generally considered undesirable because it increases the Type I error rate. A fourth alternative is proposed, which involves partitioning the decision region into three sections. This procedure is based on the Neyman and Pearson (1933) decision-theory approach to significance testing and is particularly applicable to areas of applied and clinical investigation such as occupational therapy. A sample power table is presented along with formulas to compute table values. The argument is made that the procedures described will provide a method of unambiguously interpreting nonsignificant results and will increase the power and sensitivity of occupational therapy research.
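
As a rough companion to the power-table idea, here is a hedged Python sketch of a standard normal-approximation power calculation for a two-sided two-sample test with equal group sizes; the formula is the textbook one, not taken from the article, and the three-section decision procedure itself is not reproduced.

```python
# Hedged sketch of a textbook normal-approximation power calculation; the
# article's three-section decision rule and its table values are NOT shown.
from scipy.stats import norm

def power_two_sample(d, n, alpha=0.05):
    """Approximate power for standardized effect size d, n subjects per group."""
    ncp = d * (n / 2) ** 0.5               # noncentrality of the z statistic
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.sf(z_crit - ncp) + norm.cdf(-z_crit - ncp)

for n in (10, 20, 50, 100):                # a miniature power table for d = 0.5
    print(n, round(power_two_sample(0.5, n), 3))
```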


2017, Vol 88 (4), pp. 769-784
Author(s): Falynn C. Turley, David Redden, Janice L. Case, Charles Katholi, Jeff Szychowski, ...

2016
Author(s): Daijiang Li, Anthony R Ives

1. A growing number of studies incorporate functional trait information to analyse patterns and processes of community assembly. These studies of trait-environment relationships generally ignore phylogenetic relationships among species. When functional traits and the residual variation in species distributions among communities have phylogenetic signal, however, analyses ignoring phylogenetic relationships can decrease estimation accuracy and power, inflate type I error rates, and lead to potentially false conclusions.

2. Using simulations, we compared estimation accuracy, statistical power, and type I error rates of linear mixed models (LMM) and phylogenetic linear mixed models (PLMM) designed to test for trait-environment interactions in the distribution of species abundances among sites. We considered the consequences of both phylogenetic signal in traits and phylogenetic signal in the residual variation of species distributions generated by an unmeasured (latent) trait with phylogenetic signal.

3. When there was phylogenetic signal in the residual variation of species among sites, PLMM provided better estimates (closer to the true value) and greater statistical power for testing whether the trait-environment interaction regression coefficient differed from zero. LMM had unacceptably high type I error rates when there was phylogenetic signal in both traits and the residual variation in species distributions. When there was no phylogenetic signal in the residual variation in species distributions, LMM and PLMM had similar performance.

4. LMMs that ignore phylogenetic relationships can give poor statistical tests of trait-environment relationships when there is phylogenetic signal in the residual variation of species distributions among sites, such as that caused by unmeasured traits. Therefore, phylogenies and PLMMs should be used when studying how functional traits affect species abundances among communities in response to environmental gradients.
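
To illustrate point 4 in miniature, the sketch below contrasts OLS (an analogue of the LMM that ignores phylogeny) with GLS supplied with a species covariance matrix (an analogue of the PLMM), using one observation per species and a toy two-clade covariance structure. The paper's actual models, with species-by-site abundances and trait-by-environment interactions, are richer; everything here is an illustrative assumption.

```python
# Toy contrast of OLS (LMM analogue ignoring phylogeny) vs GLS with a species
# covariance matrix (PLMM analogue), one observation per species. The paper's
# species-by-site abundance models are richer; everything here is illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
nsp = 80
clade = np.repeat([0, 1], nsp // 2)       # two deep clades as a toy "phylogeny"
C = 0.8 * (clade[:, None] == clade[None, :]) + 0.2 * np.eye(nsp)

trait = rng.multivariate_normal(np.zeros(nsp), C)   # trait with phylo signal
resid = rng.multivariate_normal(np.zeros(nsp), C)   # phylogenetic residual
y = resid                                           # null: trait has no effect

X = sm.add_constant(trait)
print("OLS p:", sm.OLS(y, X).fit().pvalues[1])      # prone to anti-conservatism
print("GLS p:", sm.GLS(y, X, sigma=C).fit().pvalues[1])
```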

