Evidence for Recent Polygenic Selection on Educational Attainment and Underlying Cognitive Abilities Inferred from GWAS Hits: A Monte Carlo Simulation Using Random SNPs

Author(s):  
Davide Piffer

Background: The genetic variants identified by three large genome-wide association studies (GWAS) of educational attainment were used to test a polygenic selection model. Methods: Average frequencies of alleles with positive effect (polygenic scores or PS) were compared across populations (N=26) using data from 1000 Genomes. A null model was created using frequencies of random SNPs. Results: Polygenic selection signal of educational attainment GWAS hits is high among a handful of SNPs within genomic regions replicated across GWAS publications. A polygenic score comprising 9 SNPs predicts population IQ (r=0.88), outperforming 99% of the polygenic scores obtained from sets of random SNPs (Monte Carlo p= 0.011). Its predictive power remains unaffected after controlling for spatial autocorrelation (Beta= 0.83). The largest polygenic score (161 SNPs) exhibits similar predictive power (Beta=0.8). Random polygenic scores are moderate predictors of population IQ (thanks to spatial autocorrelation), and their predictive power increases logarithmically with the number of SNPs, indicating an exponential reduction in noise. Conclusion: This study provides guidance for using GWAS hits together with random SNPs for testing polygenic selection using Monte Carlo simulations.
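The null-model comparison described in this abstract can be illustrated with a minimal sketch. It is not the author's pipeline: the function name `monte_carlo_p`, the synthetic allele frequencies, the synthetic trait values, and the add-one p-value convention are all assumptions made for illustration. The idea is to draw many sets of k random SNPs, average their frequencies per population to form a random polygenic score, correlate each score with the population trait, and report the share of random scores that match or exceed the observed correlation.

```python
import numpy as np


def monte_carlo_p(observed_r, null_freqs, trait, k, n_sets=1000, seed=0):
    """Empirical p-value for an observed score-trait correlation.

    null_freqs: array (populations x SNPs) of random-SNP allele frequencies.
    trait:      array (populations,) of the population-level trait.
    k:          number of SNPs per random polygenic score.
    """
    rng = np.random.default_rng(seed)
    n_snps = null_freqs.shape[1]
    null_r = np.empty(n_sets)
    for i in range(n_sets):
        # Draw k distinct random SNPs and average their frequencies
        idx = rng.choice(n_snps, size=k, replace=False)
        score = null_freqs[:, idx].mean(axis=1)
        null_r[i] = np.corrcoef(score, trait)[0, 1]
    # Add-one convention (an assumption here) so that p is never exactly 0
    p = (np.sum(null_r >= observed_r) + 1) / (n_sets + 1)
    return p, null_r


# Synthetic stand-ins only: 26 populations, 2000 random SNPs
rng = np.random.default_rng(42)
null_freqs = rng.uniform(0.05, 0.95, size=(26, 2000))
trait = rng.normal(100, 10, size=26)

p_value, null_r = monte_carlo_p(0.88, null_freqs, trait, k=9, n_sets=500)
print(f"Monte Carlo p = {p_value:.3f}")
```

Because the synthetic frequencies here are independent across populations, the null correlations cluster around zero. With real data, shared ancestry would shift the null distribution upward, which is the effect the abstract attributes to spatial autocorrelation.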

Author(s):  
Davide Piffer

Background: The genetic variants identified by three large genome-wide association studies (GWAS) of educational attainment were used to test a polygenic selection model. Methods: Average frequencies of alleles with positive effect (polygenic scores or PS) were compared across populations (N=26) using data from 1000 Genomes. A null model was created using frequencies of random SNPs. Results: Polygenic selection signal of educational attainment GWAS hits is high among a handful of SNPs within genomic regions replicated across GWAS publications. A polygenic score comprising 9 SNPs predicts population IQ (r=0.9), outperforming 99.9% of the polygenic scores obtained from sets of random SNPs. Its predictive power remains unaffected after controlling for spatial autocorrelation. Even random polygenic scores are moderate predictors of population IQ (thanks to spatial autocorrelation), and their predictive power increases logarithmically with the number of SNPs, indicating an exponential reduction in noise. Conclusion: This study provides guidance for using GWAS hits together with random SNPs for testing polygenic selection.


Psych ◽  
2019 ◽  
Vol 1 (1) ◽  
pp. 55-75 ◽  
Author(s):  
Davide Piffer

Genetic variants identified by three large genome-wide association studies (GWAS) of educational attainment (EA) were used to test a polygenic selection model. Weighted and unweighted polygenic scores (PGS) were calculated and compared across populations using data from the 1000 Genomes (n = 26), HGDP-CEPH (n = 52) and gnomAD (n = 8) datasets. The PGS from the largest EA GWAS was highly correlated to two previously published PGSs (r = 0.96–0.97, N = 26). These factors are both highly predictive of average population IQ (r = 0.9, N = 23) and Learning index (r = 0.8, N = 22) and are robust to tests of spatial autocorrelation. Monte Carlo simulations yielded highly significant p values. In the gnomAD samples, the correlation between PGS and IQ was almost perfect (r = 0.98, N = 8), and ANOVA showed significant population differences in allele frequencies with positive effect. Socioeconomic variables slightly improved the prediction accuracy of the model (from 78–80% to 85–89%), but the PGS explained twice as much of the variance in IQ compared to socioeconomic variables. In both 1000 Genomes and gnomAD, there was a weak trend for lower GWAS significance SNPs to be less predictive of population IQ. Additionally, a subset of SNPs were found in the HGDP-CEPH sample (N = 127). The analysis of this sample yielded a positive correlation with latitude and a low negative correlation with distance from East Africa. This study provides robust results after accounting for spatial autocorrelation with Fst distances and random noise via an empirical Monte Carlo simulation using null SNPs.


2018 ◽  
Author(s):  
Davide Piffer

The genetic variants identified by three large genome-wide association studies (GWAS) of educational attainment and the largest intelligence GWAS were used to test a polygenic selection model. Weighted and unweighted polygenic scores (PGS) were calculated and compared across populations (N=26) using data from the 1000 Genomes and HGDP-CEPH datasets. A set of 9 SNPs within genomic regions replicated across GWAS publications and a polygenic score calculated from the largest GWAS of educational attainment to date are highly correlated to a previously published factor (r=0.96). These factors are both highly predictive of average population IQ (r=0.9), and are robust to tests of spatial autocorrelation. Monte Carlo simulations yielded highly significant p values. A subset of SNPs were found in the HGDP-CEPH sample (N=127). The analysis of this sample yielded a positive correlation with latitude and a low negative correlation with distance from East Africa. This study provides robust results after accounting for spatial autocorrelation with Fst distances and random noise via an empirical Monte Carlo simulation using null SNPs and shows robust reproducibility of results from a previous study.


Author(s):  
Davide Piffer

Background: The genetic variants identified by three large genome-wide association studies (GWAS) of educational attainment and the largest intelligence GWAS were used to test a polygenic selection model. Methods: Average frequencies of alleles with positive effect (polygenic scores or PS) were compared across populations (N=26) using data from 1000 Genomes. Factor analysis was used to extract a signal of polygenic selection. Results: A polygenic selection factor of educational attainment GWAS hits is high among a handful of SNPs within genomic regions replicated across GWAS publications and it is highly correlated to the genetic intelligence factor (r= 0.96). These factors are both highly predictive of average population IQ (r=0.9), and are robust to tests of spatial autocorrelation. Several Monte Carlo simulations yielded highly significant p values. Furthermore, the polygenic selection model shows high replicability, with the EA and intelligence factor scores being virtually identical to those from an older study (r=0.96-0.99). A larger sample of populations (N=53) produced similar results. Conclusion: This study shows robust results after accounting for spatial autocorrelation and Monte Carlo simulation using random SNPs and shows robust reproducibility of results from a previous study.


Author(s):  
Davide Piffer

The majority of polygenic selection signal of educational attainment GWAS hits is confined to a handful of SNPs within genomic regions replicated across GWAS publications. A polygenic score comprising 9 SNPs predicts population IQ (r=0.9), outperforming 99.9% of the polygenic scores obtained from sets of random SNPs. Its predictive power remains unaffected after controlling for spatial autocorrelation. Even random polygenic scores are moderate predictors of population IQ, and their predictive power increases logarithmically with the number of SNPs, indicating an exponential reduction in noise. Thus, the predictive power of polygenic scores has to be scaled in proportion to the number of SNPs composing them.


2018 ◽  
Author(s):  
A.G. Allegrini ◽  
S. Selzam ◽  
K. Rimfeld ◽  
S. von Stumm ◽  
J.B. Pingault ◽  
...  

Recent advances in genomics are producing powerful DNA predictors of complex traits, especially cognitive abilities. Here, we leveraged summary statistics from the most recent genome-wide association studies of intelligence and educational attainment to build prediction models of general cognitive ability and educational achievement. To this end, we compared the performances of multi-trait genomic and polygenic scoring methods. In a representative UK sample of 7,026 children at age 12 and 16, we show that we can now predict up to 11 percent of the variance in intelligence and 16 percent in educational achievement. We also show that predictive power increases from age 12 to age 16 and that genomic predictions do not differ for girls and boys. Multivariate genomic methods were effective in boosting predictive power and, even though prediction accuracy varied across polygenic score approaches, results were similar using different multivariate and polygenic score methods. Polygenic scores for educational attainment and intelligence are the most powerful predictors in the behavioural sciences and exceed predictions that can be made from parental phenotypes such as educational attainment and occupational status.


Author(s):  
Davide Piffer

The genetic variants identified by three large genome-wide association studies (GWAS) of educational attainment were used to test a polygenic selection model. Average frequencies of alleles with positive effect (polygenic scores or PS) were compared across populations (N=26) using data from 1000 Genomes. The PS of 161 GWAS significant SNPs in a recent meta-analysis was highly correlated to population IQ (r=0.863) and to the polygenic score of four alleles independently associated with general cognitive ability. High correlations with PISA scores for a subsample were observed. SNP p value predicted correlation to population IQ and factors from the two previous GWAS (r=-.25). Factor analysis produced similar estimates of selection pressure for educational attainment across the three datasets. Polygenic and factor scores computed using the top 20 significant SNPs showed very high correlation to population IQ (r=0.88; 0.9). Similar findings were obtained using 52 populations from another database (ALFRED). The results together constitute a replication of preliminary findings and provide strong evidence for recent diversifying polygenic selection on educational attainment and underlying cognitive ability.


2021 ◽  
Author(s):  
Hans van Kippersluis ◽  
Pietro Biroli ◽  
Titus J. Galama ◽  
Stephanie von Hinke ◽  
S. Fleur W. Meddens ◽  
...  

Polygenic scores have become the workhorse for empirical analyses in social-science genetics. Because a polygenic score is constructed using the results of finite-sample Genome-Wide Association Studies (GWASs), it is a noisy approximation of the true latent genetic predisposition to a certain trait. The conventional way of boosting the predictive power of polygenic scores is to increase the GWAS sample size by meta-analyzing GWAS results of multiple cohorts. In this paper we challenge this convention. Through simulations, we show that Instrumental Variable (IV) regression using two polygenic scores from independent GWAS samples outperforms the typical Ordinary Least Squares (OLS) model employing a single meta-analysis based polygenic score in terms of bias, root mean squared error, and statistical power. We verify the empirical validity of these simulations by predicting educational attainment (EA) and height in a sample of siblings from the UK Biobank. We show that IV regression between-families approaches the SNP-based heritabilities, while compared to meta-analysis applying IV regression within-families provides a tighter lower bound on the direct genetic effect. IV estimation improves the predictive power of polygenic scores by 12% (height) to 22% (EA). Our findings suggest that measurement error is a key explanation for hidden heritability (i.e., the difference between SNP-based and GWAS-based heritability), and that it can be overcome using IV regression. We derive the practical rule of thumb that IV outperforms OLS when the correlation between the two polygenic scores used in IV regression is larger than √(10 / (N+10)), with N the sample size of the prediction sample.
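The rule of thumb at the end of this abstract is simple to evaluate for a given prediction sample. The short sketch below computes the threshold √(10 / (N+10)); the function names `iv_threshold` and `iv_preferred` are hypothetical and chosen only for illustration, and the example values of N and the score correlation are not taken from the paper.

```python
import math


def iv_threshold(n):
    """Correlation between the two polygenic scores above which the
    abstract's rule of thumb favours IV over OLS, for sample size n."""
    return math.sqrt(10 / (n + 10))


def iv_preferred(score_correlation, n):
    """True when the rule of thumb favours IV regression (hypothetical helper)."""
    return score_correlation > iv_threshold(n)


# Illustrative values only: with N = 990 the threshold is sqrt(10 / 1000) = 0.1
print(iv_threshold(990))
print(iv_preferred(0.25, 990))
```

The threshold falls as N grows, so under this rule larger prediction samples favour IV regression even when the two polygenic scores are only weakly correlated.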

