Correcting subtle stratification in summary association statistics

2016
Author(s): Gaurav Bhatia, Nicholas A. Furlotte, Po-Ru Loh, Xuanyao Liu, Hilary K. Finucane, ...

Abstract: Population stratification is a well-documented confounder in GWASes, and is often addressed by including principal component (PC) covariates computed from common SNPs (SNP-PCs). In our analyses of summary statistics from 36 GWASes (mean n=88k), including 20 GWASes using 23andMe data that included SNP-PC covariates, we observed a significantly inflated LD score regression (LDSC) intercept for several traits, suggesting that residual stratification remains a concern even when SNP-PC covariates are included. Here we propose a new method, PC loading regression, to correct for stratification in summary statistics by leveraging SNP loadings for PCs computed in a large reference panel. In addition to SNP-PCs, the method can be applied to haploSNP-PCs, i.e. PCs computed from a larger number of rare haplotype variants that better capture subtle structure. Using simulations based on real genotypes from 54,000 individuals of diverse European ancestry from the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort, we show that PC loading regression effectively corrects for stratification along top PCs. We applied PC loading regression to several traits with inflated LDSC intercepts. Correcting for the top four SNP-PCs in GERA data, we observe a significant reduction in the LDSC intercept for height summary statistics from the Genetic Investigation of ANthropometric Traits (GIANT) consortium, but not for 23andMe summary statistics, which already included SNP-PC covariates. However, when correcting for additional haploSNP-PCs in 23andMe GWASes, inflation in the LDSC intercept was eliminated for eye color, hair color, and skin color and substantially reduced for height (1.41 to 1.16; n=430k). Correcting for haploSNP-PCs in GIANT height summary statistics eliminated inflation in the LDSC intercept (from 1.35 to 1.00; n=250k), eliminating 27 significant association signals including one at the LCT locus, which is highly differentiated among European populations and widely known to produce spurious signals. Overall, our results suggest that uncorrected population stratification is a concern in GWASes of large sample size and that PC loading regression can correct for this stratification.
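
As a rough illustration of the idea behind PC loading regression, the sketch below regresses a vector of per-SNP association z-scores on per-SNP PC loadings and keeps the residuals. This is only one plausible reading of the abstract, not the authors' published procedure. The function names (`dot`, `remove_pc_component`) and the toy numbers are hypothetical, and the sketch ignores LD, regression weights, and standard errors.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def remove_pc_component(z_scores, pc_loadings):
    """Remove from z_scores their projection onto the PC-loading vectors.

    Equivalent to taking residuals from a no-intercept least-squares
    regression of z_scores on the loading vectors (assumed model).
    """
    # Orthonormalize the loading vectors (modified Gram-Schmidt).
    basis = []
    for loading in pc_loadings:
        w = list(loading)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:  # skip loadings that are linearly dependent
            basis.append([wi / norm for wi in w])
    # Subtract the component of the z-scores along each basis vector.
    residual = list(z_scores)
    for b in basis:
        c = dot(residual, b)
        residual = [ri - c * bi for ri, bi in zip(residual, b)]
    return residual


# Toy example (made-up numbers): 6 SNPs, 2 PCs.
loadings = [
    [0.5, -0.5, 0.5, -0.5, 0.0, 0.0],
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
]
signal = [0.3, -1.2, 0.8, 0.1, 2.0, -0.4]
# Contaminate the signal with stratification along both PCs.
z = [s + 3.0 * a - 2.0 * b for s, a, b in zip(signal, loadings[0], loadings[1])]
corrected = remove_pc_component(z, loadings)
```

After the correction, the z-scores are orthogonal to every loading vector, so no linear trend along the supplied PCs remains. Any true signal that happens to align with the loadings is removed as well.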

2020
Author(s): John E. McGeary, Chelsie Benca-Bachman, Victoria Risner, Christopher G Beevers, Brandon Gibb, ...

Twin studies indicate that 30-40% of the disease liability for depression can be attributed to genetic differences. Here, we assess the explanatory ability of polygenic scores (PGS) based on broad- (PGSBD) and clinical- (PGSMDD) depression summary statistics from the UK Biobank using independent cohorts of adults (N=210; 100% European Ancestry) and children (N=728; 70% European Ancestry) who have been extensively phenotyped for depression and related neurocognitive phenotypes. PGS associations with depression severity and diagnosis were generally modest, and larger in adults than in children. Polygenic prediction of depression-related phenotypes was mixed and varied by PGS. In adults, higher PGSBD was associated with a higher likelihood of suicidal ideation, increased brooding and anhedonia, and lower levels of cognitive reappraisal; PGSMDD was positively associated with brooding and negatively related to cognitive reappraisal. Overall, PGS based on both broad and clinical depression phenotypes have modest utility in adult and child samples of depression.


2017
Author(s): Ronald de Vlaming, Magnus Johannesson, Patrik K.E. Magnusson, M. Arfan Ikram, Peter M. Visscher

Abstract: LD-score (LDSC) regression disentangles the contribution of polygenic signal, in terms of SNP-based heritability, and population stratification, in terms of a so-called intercept, to GWAS test statistics. Whereas LDSC regression uses summary statistics, methods like Haseman-Elston (HE) regression and genomic-relatedness-matrix (GRM) restricted maximum likelihood infer parameters such as SNP-based heritability from individual-level data directly. Therefore, these two types of methods are typically considered to be profoundly different. Nevertheless, recent work has revealed that LDSC and HE regression yield near-identical SNP-based heritability estimates when confounding stratification is absent. We now extend the equivalence: under the stratification assumed by LDSC regression, we show that the intercept can be estimated from individual-level data by transforming the coefficients of a regression of the phenotype on the leading principal components from the GRM. Using simulations, considering various degrees and forms of population stratification, we find that intercept estimates obtained from individual-level data are nearly equivalent to estimates from LDSC regression (R² > 99%). An empirical application corroborates these findings. Hence, LDSC regression is not profoundly different from methods using individual-level data; parameters that are identified by LDSC regression are also identified by methods using individual-level data. In addition, our results indicate that, under strong stratification, there is misattribution of stratification to the slope of LDSC regression, inflating estimates of SNP-based heritability from LDSC regression ceteris paribus. Hence, the intercept is not a panacea for population stratification. Consequently, LDSC-regression estimates should be interpreted with caution, especially when the intercept estimate is significantly greater than one.
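
To make the roles of the slope and intercept concrete, here is a minimal sketch of the regression that LDSC is built on: per-SNP chi-square statistics are regressed on LD scores, the fitted intercept captures inflation that does not scale with LD score, and the slope is rescaled to SNP-based heritability. The sketch assumes the standard model E[chi2_j] = N * h2 * l_j / M + intercept and uses plain unweighted least squares. The real LDSC software uses weighted regression and a block jackknife, which are omitted here. The function names and toy numbers are hypothetical.

```python
def ols_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def ldsc_sketch(chi2, ld_scores, n_samples, n_snps):
    """Return (intercept, h2) from an unweighted LD score regression.

    Assumes E[chi2_j] = n_samples * h2 * ld_j / n_snps + intercept.
    """
    intercept, slope = ols_line(ld_scores, chi2)
    h2 = slope * n_snps / n_samples
    return intercept, h2


# Toy, noise-free data generated from the assumed model.
n_samples, n_snps = 100_000, 1_000_000
true_h2, true_intercept = 0.4, 1.1
ld_scores = [10.0, 50.0, 100.0, 200.0, 400.0]
chi2 = [true_intercept + n_samples * true_h2 * l / n_snps for l in ld_scores]

est_intercept, est_h2 = ldsc_sketch(chi2, ld_scores, n_samples, n_snps)
```

On noise-free input the fit recovers the generating intercept and heritability. With real data, an intercept well above one is the signal of confounding that the abstract cautions about.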


2020
Author(s): Sagnik Palmal, Kaustubh Adhikari, Javier Mendoza-Revilla, Macarena Fuentes-Guajardo, Caio C. Silva de Cerqueira, ...

Abstract: We report an evaluation of prediction accuracy for eye, hair and skin pigmentation based on genomic and phenotypic data for over 6,500 admixed Latin Americans (the CANDELA dataset). We examined the impact on prediction accuracy of three main factors: (i) the methods of prediction, including classical statistical methods and machine learning approaches; (ii) the inclusion of non-genetic predictors, continental genetic ancestry and pigmentation SNPs in the prediction models; and (iii) the choice between two sets of pigmentation SNPs: the commonly used HIrisPlex-S set (developed in Europeans) and novel SNP sets we defined here based on genome-wide association results in the CANDELA sample. We find that Random Forest or regression are globally the best performing methods. Although continental genetic ancestry has substantial power for prediction of pigmentation in Latin Americans, the inclusion of pigmentation SNPs increases prediction accuracy considerably, particularly for skin color. For hair and eye color, HIrisPlex-S has a similar performance to the CANDELA-specific prediction SNP sets. However, for skin pigmentation the performance of HIrisPlex-S is markedly lower than that of the SNP set defined here, including for predictions in an independent dataset of Native American data. These results reflect the relatively high variation in hair and eye color among Europeans, for whom HIrisPlex-S was developed, whereas their variation in skin pigmentation is comparatively lower. Furthermore, we show that the dataset used in the training of prediction models strongly affects the portability of these models across Europeans and Native Americans.


2020, Vol 2020, pp. 1-5
Author(s): Sultan Z. Alasmari, Nashwa Eisa, Saeed Mastour Alshahrani, Mohammad Mahtab Alam, Prasanna Rajagopalan, ...

Background. Body mass index (BMI) is a metric widely used to measure the healthy weight of an individual and to predict a person’s risk of developing serious illnesses. Studying the statistical association between genetically transmitted traits and BMI might be of interest. Objectives. The present study was designed to extend the inadequate evidence concerning the influence of some genetically transmitted traits, including ABO blood type, Rh factor, eye color, and hair color, on BMI variation. Methods. A total of 142 undergraduate female students of the Department of Clinical Laboratory Sciences, Faculty of Applied Medical Sciences, King Khalid University, Abha, Saudi Arabia, participated in an investigation of the possible linkage between genetic traits and BMI variations. Height and weight were collected from participants for BMI measurement. ABO blood type and Rh factor were determined by antisera. Results. Out of 142 female students, 48 were categorized in the first tertile (T1: less than 19.8 kg/m²), 50 were categorized in the second tertile (T2: between 19.8 and 23.7 kg/m²), and 44 were categorized in the third tertile (T3: greater than 23.7 kg/m²). Chi-square analysis shows that there were no associations of genetic traits including hair color, eye color, ABO blood type, and Rh blood type with BMI. However, a significant association between hair color and BMI was observed using multinomial logistic regression analysis. Conclusions. Our data provide a more robust prediction of the relative influence of genetic effects such as hair color on BMI. Future studies may contribute to identifying more associations between genes involved in hair pigmentation and BMI variation.
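
The tertile assignment reported in the Results can be sketched as follows. BMI is weight in kilograms divided by the square of height in metres, and the cutoffs 19.8 and 23.7 kg/m² are taken from the abstract. How the study handled values exactly on a cutoff is not stated, so placing them in T2 is an assumption, and the function names are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_tertile(value, lower=19.8, upper=23.7):
    """Assign a BMI value to T1, T2, or T3 using the abstract's cutoffs.

    Values exactly equal to a cutoff go to T2 (an assumption; the
    abstract does not specify boundary handling).
    """
    if value < lower:
        return "T1"
    if value <= upper:
        return "T2"
    return "T3"


# Toy participants (made-up weights in kg and heights in m).
participants = [(50.0, 1.70), (60.0, 1.65), (70.0, 1.60)]
tertiles = [bmi_tertile(bmi(w, h)) for w, h in participants]
```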


PLoS ONE, 2017, Vol 12 (12), pp. e0190238
Author(s): Peter Frost, Karel Kleisner, Jaroslav Flegr

1909, Vol 18 (1), pp. 50-65
Author(s): S. J. Holmes, H. M. Loomis

2010, Vol 86 (6), pp. 904-917
Author(s): Alessandro Biffi, Christopher D. Anderson, Michael A. Nalls, Rosanna Rahman, Akshata Sonni, ...
