Playing Musical Chairs in Big Data to Reveal Variables’ Associations

2016 ◽  
Author(s):  
Hugues Aschard ◽  
Bjarni Vilhjalmsson ◽  
Chirag Patel ◽  
David Skurnik ◽  
Jimmy Yu ◽  
...  

Testing for associations in big data faces the problem of multiple comparisons, with true signals buried inside the noise of all associations queried. This is particularly true in genetic association studies, where a substantial proportion of the variation of human phenotypes is driven by numerous genetic variants of small effect. The current strategy to improve power to identify these weak associations consists of applying standard marginal statistical approaches and increasing study sample sizes. While successful, this approach does not leverage the environmental and genetic factors shared between the multiple phenotypes collected in contemporary cohorts. Here we develop a method that improves the power of detecting associations when a large number of correlated variables have been measured on the same samples. Our analyses of real and simulated data provide direct support that large sets of correlated variables can be leveraged to achieve dramatic increases in statistical power, equivalent to a two- or even three-fold increase in sample size.
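As a rough illustration of why correlated variables can act like extra samples, the sketch below uses a generic linear-regression argument, not the authors' method. It assumes covariates that are independent of the tested predictor and explain a fraction `r2_covariates` of the outcome variance; adjusting for them shrinks the residual variance by (1 − R²), which scales the test's noncentrality roughly like multiplying the sample size by 1 / (1 − R²). The function name and parameter are hypothetical.

```python
def sample_size_fold(r2_covariates: float) -> float:
    """Approximate fold increase in effective sample size obtained by
    adjusting for covariates that explain a fraction r2_covariates of the
    outcome variance (assumption: covariates independent of the predictor)."""
    if not 0.0 <= r2_covariates < 1.0:
        raise ValueError("r2_covariates must lie in [0, 1)")
    return 1.0 / (1.0 - r2_covariates)


# Illustrative values only: how much variance would covariates need to
# explain to mimic a given increase in sample size?
for r2 in (0.0, 0.5, 2.0 / 3.0):
    print(f"R^2 = {r2:.2f} -> about {sample_size_fold(r2):.1f}x effective sample size")
```

Under these assumptions, a two-fold gain corresponds to covariates explaining half of the outcome variance, and a three-fold gain to two thirds.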

2014 ◽  
Vol 2014 ◽  
pp. 1-8
Author(s):  
Qihua Tan ◽  
Jing Hua Zhao ◽  
Torben Kruse ◽  
Kaare Christensen

Statistical power is one of the major concerns in genetic association studies. Related individuals such as twins are valuable samples for genetic studies because of their genetic relatedness. Phenotype similarity in twin pairs provides evidence of genetic control over phenotypic variation in a population. Genetic association studies of human longevity, a complex trait under the control of both genetic and environmental factors, have been confronted by the small sample sizes of long-lived subjects, which limit statistical power. Twin pairs concordant for longevity have an increased probability of carrying beneficial genes and are thus useful samples for gene-longevity association analysis. We conducted a computer simulation to estimate the power of association studies using longevity-concordant twin pairs. We observed remarkable power increases when using singletons from longevity-concordant twin pairs as cases, in comparison with sporadic probands as cases. Achieving similar power would require twice the sample size for fraternal twins as for identical twins concordant for longevity, suggesting that longevity-concordant identical twins are more efficient samples than fraternal twins. We also observed an approximately 2- to 3-fold increase in the sample size needed for a longevity cutoff at age 90 as compared with a cutoff at age 95. Overall, our results show the high value of twins in genetic association studies of human longevity.


2018 ◽  
Author(s):  
Tamar Sofer ◽  
Xiuwen Zheng ◽  
Stephanie M. Gogarten ◽  
Cecelia A. Laurie ◽  
Kelsey Grinde ◽  
...  

Abstract When testing genotype-phenotype associations using linear regression, departure of the trait distribution from normality can impact both Type I error rate control and statistical power, with worse consequences for rarer variants. While it has been shown that applying a rank-normalization transformation to trait values before testing may improve these statistical properties, the factor driving them is not the trait distribution itself, but its residual distribution after regression on both covariates and genotype. Because genotype is expected to have a small effect (if any), investigators now routinely use a two-stage method, in which they first regress the trait on covariates, obtain residuals, and rank-normalize them, and then use the rank-normalized residuals in association analysis with the genotypes. Potential confounding signals are assumed to be removed at the first stage, so in practice no further adjustment is done in the second stage. Here, we show that this widely used approach can lead to tests with undesirable statistical properties, due to a combination of a mis-specified mean-variance relationship and remaining covariate associations between the rank-normalized residuals and genotypes. We demonstrate these properties theoretically, and also in applications to genome-wide and whole-genome sequencing association studies. We further propose and evaluate an alternative fully-adjusted two-stage approach that adjusts for covariates both when residuals are obtained and in the subsequent association test. This method can reduce excess Type I errors and improve statistical power.
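The two-stage procedure described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the rank offset of 0.5, the arbitrary tie handling, and the simulated data are all hypothetical choices. Setting `adjust_stage2=False` gives the commonly used variant; `adjust_stage2=True` re-includes the covariates in the second-stage regression, as in the fully-adjusted approach.

```python
import numpy as np
from statistics import NormalDist


def rank_normalize(x):
    """Map values to standard-normal quantiles by rank.
    Ties are broken by position (a simplification for this sketch)."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1          # ranks 1..n
    nd = NormalDist()
    return np.array([nd.inv_cdf((r - 0.5) / x.size) for r in ranks])


def residualize(y, covariates):
    """Residuals of y after ordinary least squares on covariates plus intercept."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def two_stage_test(trait, covariates, genotype, adjust_stage2=True):
    """Stage 1: residualize the trait on covariates, then rank-normalize.
    Stage 2: regress the result on genotype (and, optionally, covariates).
    Returns the genotype coefficient and its t statistic."""
    z = rank_normalize(residualize(trait, covariates))
    columns = [np.ones(len(z)), genotype]
    if adjust_stage2:
        columns.append(covariates)
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    sigma2 = resid @ resid / (len(z) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], beta[1] / se


# Simulated, skewed trait (illustrative values only).
rng = np.random.default_rng(0)
n = 500
age = rng.normal(50, 10, n)
genotype = rng.binomial(2, 0.3, n).astype(float)
trait = np.exp(0.02 * age + 0.1 * genotype + rng.normal(0, 0.5, n))

for flag in (False, True):
    b, t = two_stage_test(trait, age, genotype, adjust_stage2=flag)
    print(f"adjust_stage2={flag}: beta={b:.3f}, t={t:.2f}")
```

In this simulation the genotype is independent of the covariate, so the two variants should give similar results; the differences the abstract describes arise when covariates and genotypes are associated.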


2019 ◽  
Author(s):  
Sebastian Akle ◽  
Sung Chun ◽  
Athanasios Teodosiadis ◽  
Brian E. Cade ◽  
Heming Wang ◽  
...  

Abstract Genetic association studies of many heritable traits resulting from physiological testing often have modest sample sizes due to the cost and invasiveness of the required phenotyping. This reduces statistical power to discover multiple genetic associations. We present a strategy to leverage pleiotropy between traits both to discover new loci and to provide mechanistic hypotheses of the underlying pathophysiology, using obstructive sleep apnea (OSA) as an exemplar. OSA is a common disorder diagnosed via overnight physiological testing (polysomnography). Here, we leverage pleiotropy with relevant cellular and cardio-metabolic phenotypes and gene expression traits to map new risk loci in an underpowered OSA GWAS. We identify several pleiotropic loci harboring suggestive associations to OSA and genome-wide significant associations to other traits, and show that their OSA association replicates in independent cohorts of diverse ancestries. By investigating pleiotropic loci, our strategy allows proposing new hypotheses about OSA pathobiology across many physiological layers. For example, we find links between OSA, a measure of lung function (FEV1/FVC), and an eQTL of desmoplakin (DSP) in lung tissue. We also link a previously known genome-wide significant peak for OSA in the hexokinase (HK1) locus to hematocrit and other red blood cell related traits. Thus, the analysis of pleiotropic associations has the potential to assemble diverse phenotypes into a chain of mechanistic hypotheses that provide insight into the pathogenesis of complex human diseases.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Nina Van Goethem ◽  
Célestin Danwang ◽  
Nathalie Bossuyt ◽  
Herman Van Oyen ◽  
Nancy H. C. Roosens ◽  
...  

Abstract Background The severity of influenza disease can range from mild symptoms to severe respiratory failure and can partly be explained by host genetic factors that predispose the host to severe influenza. Here, we aimed to summarize the current state of evidence that host genetic variants play a role in the susceptibility to severe influenza infection by conducting a systematic review and performing a meta-analysis for all markers with at least three data entries. Results A total of 34 primary human genetic association studies were identified that investigated a total of 20 different genes. The only significant pooled ORs were retrieved for the rs12252 polymorphism: an overall OR of 1.52 (95% CI [1.06–2.17]) for the rs12252-C allele compared to the rs12252-T allele. A stratified analysis by ethnicity revealed opposite effects in different populations. Conclusion With the exception of the rs12252 polymorphism, we could not identify specific genetic polymorphisms to be associated with severe influenza infection in a pooled meta-analysis. This advocates for the use of large, hypothesis-free, genome-wide association studies that account for the polygenic nature and the interactions with other host, pathogen and environmental factors.
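To relate the reported pooled OR and its confidence interval to a test statistic, the sketch below back-calculates the standard error of the log odds ratio from the interval. It assumes a Wald-type 95% interval that is symmetric on the log scale, which the abstract does not state explicitly; the function names are hypothetical.

```python
import math


def log_or_se_from_ci(lower: float, upper: float, z_crit: float = 1.959964) -> float:
    """Standard error of log(OR) implied by a symmetric Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z_crit)


def wald_z_and_p(odds_ratio: float, lower: float, upper: float):
    """Return (SE of log OR, z statistic, two-sided normal p-value)."""
    se = log_or_se_from_ci(lower, upper)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return se, z, p


# Values reported in the abstract for rs12252-C versus rs12252-T.
se, z, p = wald_z_and_p(1.52, 1.06, 2.17)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, two-sided p = {p:.3f}")

# Consistency check: the geometric midpoint of the interval should sit
# close to the reported OR if the interval is symmetric on the log scale.
print(f"geometric midpoint of CI = {math.sqrt(1.06 * 2.17):.3f}")
```

Under this assumption the implied standard error is roughly 0.18 and z is roughly 2.3, consistent with an interval that narrowly excludes 1.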


2021 ◽  
Author(s):  
Yann Le Guen ◽  
Michael E. Belloy ◽  
Valerio Napolioni ◽  
Sarah J. Eger ◽  
Gabriel Kennedy ◽  
...  

Abstract Introduction Many Alzheimer’s disease (AD) genetic association studies disregard age or incorrectly account for it, hampering variant discovery. Method Using simulated data, we compared the statistical power of several models: logistic regression on AD diagnosis adjusted and not adjusted for age; linear regression on a score integrating case-control status and age; and multivariate Cox regression on age-at-onset. We applied these models to real exome-wide data of 11,127 sequenced individuals (54% cases) and replicated suggestive associations in 21,631 genotype-imputed individuals (51% cases). Results Modelling variable AD risk across age results in 10-20% statistical power gain compared to logistic regression without age adjustment, while incorrect age adjustment leads to critical power loss. Applying our novel AD-age score and/or Cox regression, we discovered and replicated novel variants associated with AD on KIF21B, USH2A, RAB10, RIN3 and TAOK2 genes. Discussion Our AD-age score provides a simple means for statistical power gain and is recommended for future AD studies.


Animals ◽  
2018 ◽  
Vol 8 (12) ◽  
pp. 239 ◽  
Author(s):  
Wengang Zhang ◽  
Xue Gao ◽  
Xinping Shi ◽  
Bo Zhu ◽  
Zezhao Wang ◽  
...  

Principal component analysis (PCA) is a potential approach that can be applied in multiple-trait genome-wide association studies (GWAS) to explore pleiotropy, as well as increase the power of quantitative trait loci (QTL) detection. In this study, the relationship of test single nucleotide polymorphisms (SNPs) was determined between single-trait GWAS and PCA-based GWAS. We found that the estimated pleiotropic quantitative trait nucleotides (QTNs) $\hat{\beta}^{*}$ were in most cases larger than the single-trait model estimations ($\hat{\beta}_1$ and $\hat{\beta}_2$). Analysis using the simulated data showed that PCA-based multiple-trait GWAS has improved statistical power for detecting QTL compared to single-trait GWAS. For the minor allele frequency (MAF), when the MAF of QTNs was greater than 0.2, the PCA-based model had a significant advantage in detecting the pleiotropic QTNs, but when its MAF was reduced from 0.2 to 0, the advantage began to disappear. In addition, as the linkage disequilibrium (LD) of the pleiotropic QTNs decreased, its detection ability declined in the co-localization effect model. Furthermore, on the real data of 1141 Simmental cattle, we applied the PCA model to the multiple-trait GWAS analysis and identified a QTL that was consistent with a candidate gene, MCHR2, which was associated with presoma muscle development in cattle. In summary, PCA-based multiple-trait GWAS is an efficient model for exploring pleiotropic QTNs in quantitative traits.
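The general idea of a PCA-based multiple-trait test can be sketched as follows: standardize the traits, project them onto their principal components, and test each component against the SNP with a simple regression. This is a generic illustration under assumed simulation settings (two traits sharing one SNP effect and a common background factor), not the model used in the paper; all names and effect sizes are hypothetical.

```python
import numpy as np


def trait_pcs(traits):
    """Principal component scores and variances of standardized traits.
    traits: array of shape (n_samples, n_traits)."""
    Z = (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt.T                       # one column per component
    variances = s ** 2 / (len(Z) - 1)       # eigenvalues of the correlation matrix
    return scores, variances


def snp_t_stat(y, g):
    """t statistic for the slope in a simple regression of y on genotype g."""
    g_c = g - g.mean()
    y_c = y - y.mean()
    beta = (g_c @ y_c) / (g_c @ g_c)
    resid = y_c - beta * g_c
    se = np.sqrt((resid @ resid) / (len(y) - 2) / (g_c @ g_c))
    return beta / se


# Simulated pleiotropic SNP (illustrative effect sizes only).
rng = np.random.default_rng(1)
n = 1000
g = rng.binomial(2, 0.3, n).astype(float)
shared = rng.normal(size=n)                 # background shared by both traits
trait1 = 0.15 * g + shared + rng.normal(size=n)
trait2 = 0.15 * g + shared + rng.normal(size=n)
traits = np.column_stack([trait1, trait2])

scores, variances = trait_pcs(traits)
print("single-trait t:", snp_t_stat(trait1, g), snp_t_stat(trait2, g))
print("PC-based t:    ", [snp_t_stat(scores[:, k], g) for k in range(scores.shape[1])])
```

The sign of each component is arbitrary, so only the magnitude of a PC-based statistic is meaningful in this sketch.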


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Yann Le Guen ◽  
Michael E. Belloy ◽  
Valerio Napolioni ◽  
Sarah J. Eger ◽  
...  

Abstract Background Many Alzheimer’s disease (AD) genetic association studies disregard age or incorrectly account for it, hampering variant discovery. Methods Using simulated data, we compared the statistical power of several models: logistic regression on AD diagnosis adjusted and not adjusted for age; linear regression on a score integrating case-control status and age; and multivariate Cox regression on age-at-onset. We applied these models to real exome-wide data of 11,127 sequenced individuals (54% cases) and replicated suggestive associations in 21,631 genotype-imputed individuals (51% cases). Results Modeling variable AD risk across age results in 5–10% statistical power gain compared to logistic regression without age adjustment, while incorrect age adjustment leads to critical power loss. Applying our novel AD-age score and/or Cox regression, we discovered and replicated novel variants associated with AD on KIF21B, USH2A, RAB10, RIN3, and TAOK2 genes. Conclusion Our AD-age score provides a simple means for statistical power gain and is recommended for future AD studies.

