Incorporating polygenic scores in the twin model to estimate genotype-environment covariance: exploration of statistical power

2019 ◽  
Author(s):  
Conor V. Dolan ◽  
Roel C. A. Huijskens ◽  
Camelia C. Minică ◽  
Michael C. Neale ◽  
Dorret I. Boomsma

Abstract: The assumption in the twin model that genotypic and environmental variables are uncorrelated is primarily made to ensure parameter identification, not because researchers necessarily think that these variables are uncorrelated. Although the biasing effects of such correlations are well understood, it would be useful to be able to estimate these parameters in the twin model. Here we consider the possibility of relaxing this assumption by adding a polygenic score to the (univariate) twin model. We demonstrated numerically and analytically that this extension renders the additive genetic (A)-unshared environmental (E) correlation and the additive genetic (A)-shared environmental (C) correlation simultaneously identified. We studied the statistical power to detect A-C and A-E correlations in the ACE model, and to detect A-E correlation in the AE model. The results showed that the power to detect these covariance terms, given 1000 MZ and 1000 DZ twin pairs (α=0.05), depends greatly on the parameter settings of the model. We show that fixing the estimated percentage of variance in the outcome trait that is due to the polygenic scores greatly increases statistical power.

2021 ◽  
Author(s):  
Conor V. Dolan ◽  
Roel C. A. Huijskens ◽  
Camelia C. Minică ◽  
Michael C. Neale ◽  
Dorret I. Boomsma

Abstract: The assumption in the twin model that genotypic and environmental variables are uncorrelated is primarily made to ensure parameter identification, not because researchers necessarily think that these variables are uncorrelated. Although the biasing effects of such correlations are well understood, a method to estimate these parameters in the twin model would be useful. Here we explore the possibility of relaxing this assumption by adding polygenic scores to the (univariate) twin model. We demonstrate that this extension renders the additive genetic (A)-common environmental (C) covariance (σAC) identified. We study the statistical power to reject σAC = 0 in the ACE model and present the results of simulations.
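The identification problem mentioned in the abstract can be illustrated with a minimal sketch of the expected twin moments. It assumes a textbook ACE parameterization (random mating, DZ additive genetic correlation of 0.5, C fully shared within a pair) with an A-C covariance added. The function name `twin_moments` and its arguments are hypothetical and are not taken from the authors' model specification.

```python
def twin_moments(var_a, var_c, var_e, cov_ac=0.0):
    """Expected phenotypic variance and MZ/DZ covariances.

    Assumed parameterization (illustrative only):
      phenotype = A + C + E, with Cov(A, C) = cov_ac,
      A correlated 1.0 in MZ pairs and 0.5 in DZ pairs,
      C identical for both members of a pair, E unshared.
    """
    var_p = var_a + var_c + var_e + 2 * cov_ac
    cov_mz = var_a + var_c + 2 * cov_ac
    cov_dz = 0.5 * var_a + var_c + 2 * cov_ac
    return var_p, cov_mz, cov_dz


# Two different parameter sets with the same value of var_c + 2 * cov_ac
# give the same three observable moments, so phenotypic twin data alone
# cannot separate var_c from cov_ac.
no_covariance = twin_moments(0.4, 0.3, 0.3, cov_ac=0.0)
with_covariance = twin_moments(0.4, 0.1, 0.3, cov_ac=0.1)
print(no_covariance)
print(with_covariance)
```

Under these assumptions, σC² and σAC only ever appear in the sum σC² + 2σAC, which is why an additional observed variable such as a polygenic score is needed to separate them.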


2017 ◽  
Author(s):  
Camelia C. Minică ◽  
Conor V. Dolan ◽  
Dorret I. Boomsma ◽  
Eco de Geus ◽  
Michael C. Neale

Abstract: Mendelian Randomization (MR) is an important approach to modelling causality in non-experimental settings. MR uses genetic instruments to test causal relationships between exposures and outcomes of interest. Individual genetic variants have small effects, and so, when used as instruments, render MR liable to weak instrument bias. Polygenic scores have the advantage of larger effects, but may be characterized by direct pleiotropy, which violates a central assumption of MR. We developed the MR-DoC twin model by integrating MR with the Direction of Causation twin model. This model allows us to test pleiotropy directly. We considered the issue of parameter identification, and given identification, we conducted extensive power calculations. MR-DoC allows one to test causal hypotheses and to obtain unbiased estimates of the causal effect given pleiotropic instruments (polygenic scores), while controlling for genetic and environmental influences common to the outcome and exposure. Furthermore, MR-DoC in twins has appreciably greater statistical power than a standard MR analysis applied to singletons, if the unshared environmental effects on the exposure and the outcome are uncorrelated. Generally, power increases with: 1) decreasing residual exposure-outcome correlation, and 2) decreasing heritability of the exposure variable. MR-DoC allows one to employ strong instrumental variables (polygenic scores, possibly pleiotropic), guarding against weak instrument bias and increasing the power to detect causal effects. Our approach will enhance and extend MR’s range of applications, and increase the value of the large cohorts collected at twin registries as they correctly detect causation and estimate effect sizes even in the presence of pleiotropy.


2021 ◽  
Author(s):  
Hans van Kippersluis ◽  
Pietro Biroli ◽  
Titus J. Galama ◽  
Stephanie von Hinke ◽  
S. Fleur W. Meddens ◽  
...  

Polygenic scores have become the workhorse for empirical analyses in social-science genetics. Because a polygenic score is constructed using the results of finite-sample Genome-Wide Association Studies (GWASs), it is a noisy approximation of the true latent genetic predisposition to a certain trait. The conventional way of boosting the predictive power of polygenic scores is to increase the GWAS sample size by meta-analyzing GWAS results of multiple cohorts. In this paper we challenge this convention. Through simulations, we show that Instrumental Variable (IV) regression using two polygenic scores from independent GWAS samples outperforms the typical Ordinary Least Squares (OLS) model employing a single meta-analysis-based polygenic score in terms of bias, root mean squared error, and statistical power. We verify the empirical validity of these simulations by predicting educational attainment (EA) and height in a sample of siblings from the UK Biobank. We show that between-family IV regression approaches the SNP-based heritabilities, while within-family IV regression provides a tighter lower bound on the direct genetic effect than meta-analysis does. IV estimation improves the predictive power of polygenic scores by 12% (height) to 22% (EA). Our findings suggest that measurement error is a key explanation for hidden heritability (i.e., the difference between SNP-based and GWAS-based heritability), and that it can be overcome using IV regression. We derive the practical rule of thumb that IV outperforms OLS when the correlation between the two polygenic scores used in IV regression is larger than √(10 / (N+10)), with N the sample size of the prediction sample.
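The rule of thumb stated at the end of the abstract can be written as a small helper. This is a minimal sketch that only encodes the quoted formula √(10 / (N+10)). The function names `iv_threshold` and `iv_preferred` are hypothetical and do not come from the paper or any library.

```python
import math


def iv_threshold(n_prediction):
    """Correlation threshold sqrt(10 / (N + 10)) from the quoted rule of thumb.

    n_prediction is N, the sample size of the prediction sample.
    """
    if n_prediction < 0:
        raise ValueError("n_prediction must be non-negative")
    return math.sqrt(10 / (n_prediction + 10))


def iv_preferred(pgs_correlation, n_prediction):
    """True when the correlation between the two polygenic scores exceeds the threshold."""
    return pgs_correlation > iv_threshold(n_prediction)


# With N = 990 the threshold is sqrt(10 / 1000) = 0.1.
print(iv_threshold(990))
print(iv_preferred(0.2, 990))
```

Because the threshold shrinks as N grows, the rule as quoted favours IV in large prediction samples even when the two scores are only weakly correlated.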


2017 ◽  
Author(s):  
Amit V. Khera ◽  
Mark Chaffin ◽  
Krishna G. Aragam ◽  
Connor A. Emdin ◽  
Derek Klarin ◽  
...  

Abstract: Identification of individuals at increased genetic risk for a complex disorder such as coronary disease can facilitate treatments or enhanced screening strategies. A rare monogenic mutation associated with increased cholesterol is present in ~1:250 carriers and confers an up to 4-fold increase in coronary risk when compared with non-carriers. Although individual common polymorphisms have modest predictive capacity, their cumulative impact can be aggregated into a polygenic score. Here, we develop a new, genome-wide polygenic score that aggregates information from 6.6 million common polymorphisms and show that this score can similarly identify individuals with a 4-fold increased risk for coronary disease. In >400,000 participants from UK Biobank, the score conforms to a normal distribution and those in the top 2.5% of the distribution are at 4-fold increased risk compared to the remaining 97.5%. Similar patterns are observed with genome-wide polygenic scores for two additional diseases – breast cancer and severe obesity.
One Sentence Summary: A genome-wide polygenic score identifies 2.5% of the population born with a 4-fold increased risk for coronary artery disease.
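As a small worked example of the "top 2.5%" cutoff, the sketch below computes the score value above which a given fraction of an exactly normal distribution lies. It assumes a standardized score (mean 0, SD 1) unless other values are passed. The helper name `top_fraction_cutoff` is hypothetical and is not part of the study's pipeline.

```python
from statistics import NormalDist


def top_fraction_cutoff(fraction, mean=0.0, sd=1.0):
    """Score value above which `fraction` of a normal distribution lies."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return NormalDist(mean, sd).inv_cdf(1.0 - fraction)


# For a standardized score, the top 2.5% starts at roughly 1.96 SD above the mean.
print(round(top_fraction_cutoff(0.025), 2))
```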


2019 ◽  
Author(s):  
Saskia Selzam ◽  
Stuart J. Ritchie ◽  
Jean-Baptiste Pingault ◽  
Chandra A. Reynolds ◽  
Paul F. O’Reilly ◽  
...  

Abstract: Polygenic scores are a popular tool for prediction of complex traits. However, prediction estimates in samples of unrelated participants can include effects of population stratification, assortative mating and environmentally mediated parental genetic effects, a form of genotype-environment correlation (rGE). Comparing genome-wide polygenic score (GPS) predictions in unrelated individuals with predictions between siblings in a within-family design is a powerful approach to identify these different sources of prediction. Here, we compared within- to between-family GPS predictions of eight life outcomes (anthropometric, cognitive, personality and health) for eight corresponding GPSs. The outcomes were assessed in up to 2,366 dizygotic (DZ) twin pairs from the Twins Early Development Study from age 12 to age 21. To account for family clustering, we used mixed-effects modelling, simultaneously estimating within- and between-family effects for target- and cross-trait GPS prediction of the outcomes. There were three main findings: (1) DZ twin GPS differences predicted DZ differences in height, BMI, intelligence, educational achievement and ADHD symptoms; (2) target and cross-trait analyses indicated that GPS prediction estimates for cognitive traits (intelligence and educational achievement) were on average 60% greater between families than within families, but this was not the case for non-cognitive traits; and (3) this within- and between-family difference for cognitive traits disappeared after controlling for family socio-economic status (SES), suggesting that SES is a source of between-family prediction through rGE mechanisms. These results provide novel insights into the patterns by which rGE contributes to GPS prediction, while ruling out confounding due to population stratification and assortative mating.
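One common way to set up predictors for simultaneous within- and between-family estimation is to split each twin's GPS into the pair mean and the deviation from that mean. The sketch below shows only this decomposition. It is an assumption about how such predictors are typically built and may differ from the study's exact mixed-effects specification. The function name `split_within_between` is hypothetical.

```python
def split_within_between(pairs):
    """Split GPS values of DZ pairs into between- and within-family parts.

    pairs: iterable of (gps_twin1, gps_twin2).
    Returns a list of (family_mean, deviation_twin1, deviation_twin2), where
    the family mean carries between-family variation and the deviations
    carry within-family variation.
    """
    result = []
    for gps1, gps2 in pairs:
        family_mean = (gps1 + gps2) / 2
        result.append((family_mean, gps1 - family_mean, gps2 - family_mean))
    return result


# One pair with GPS values 1.0 and 3.0: mean 2.0, deviations -1.0 and +1.0.
print(split_within_between([(1.0, 3.0)]))
```

In a regression that includes both parts, the coefficient on the deviations reflects prediction within families, and the coefficient on the family mean reflects prediction between families.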


2020 ◽  
Author(s):  
Yongkang Kim ◽  
Jared V. Balbona ◽  
Matthew C. Keller

Abstract: In a companion paper (Balbona et al., Behav Genet, in press), we introduced a series of causal models that use polygenic scores from transmitted and nontransmitted alleles, the offspring trait, and parental traits to estimate the variation due to the environmental influences the parental trait has on the offspring trait (vertical transmission) as well as additive genetic effects. These models also estimate and account for the gene-gene and gene-environment covariation that arises from assortative mating and vertical transmission respectively. In the current study, we simulated polygenic scores and phenotypes of parents and offspring under genetic and vertical transmission scenarios, assuming two types of assortative mating. We instantiated the models from our companion paper in the OpenMx software, and compared the true values of parameters to maximum likelihood estimates from models fitted on the simulated data to quantify the bias and precision of estimates. We show that parameter estimates from these models are unbiased when assumptions are met, but as expected, they are biased to the degree that assumptions are unmet. Standard errors of the estimated variances due to vertical transmission and to genetic effects decrease with increasing sample sizes and with increasing $r^2$ values of the polygenic score. Even when the polygenic score explains a modest amount of trait variation ($r^2 = .05$), standard errors of these standardized estimates are reasonable ($< .05$) for $n = 16K$ trios, and can even be reasonable for smaller sample sizes (e.g., down to 4K) when the polygenic score is more predictive. These causal models offer a novel approach for understanding how parents influence their offspring, but their use requires polygenic scores on relevant traits that are modestly predictive (e.g., $r^2 > .025$) as well as datasets with genomic and phenotypic information on parents and offspring. The utility of polygenic scores for elucidating parental influences should thus serve as additional motivation for large genomic biobanks to perform GWASs on traits that may be relevant to parenting and to oversample close relatives, particularly parents and offspring.


Author(s):  
Jessica Dennis ◽  
Julia Sealock ◽  
Rebecca T. Levinson ◽  
Eric Farber-Eger ◽  
Jacob Franco ◽  
...  

Abstract: Major depressive disorder (MDD) and loneliness are phenotypically and genetically correlated with coronary artery disease (CAD), but whether these associations are explained by pleiotropic genetic variants or shared comorbidities is unclear. To tease apart these scenarios, we first assessed the medical morbidity pattern associated with genetic risk factors for MDD and loneliness by conducting a phenome-wide association study in 18,385 European-ancestry individuals in the Vanderbilt University Medical Center biobank, BioVU. Polygenic scores for MDD and loneliness were developed for each person using previously published meta-GWAS summary statistics, and were tested for association with 882 clinical diagnoses ascertained via billing codes in electronic health records. We discovered strong associations with heart disease diagnoses, and next embarked on targeted analyses of CAD in 3893 cases and 4197 controls. We found odds ratios of 1.11 (95% CI, 1.04–1.18; P = 8.43 × 10⁻⁴) and 1.13 (95% CI, 1.07–1.20; P = 4.51 × 10⁻⁶) per 1-SD increase in the polygenic scores for MDD and loneliness, respectively. Results were similar in patients without psychiatric symptoms, and the increased risk persisted in females even after adjusting for multiple conventional risk factors and a polygenic score for CAD. In a final sensitivity analysis, we statistically adjusted for the genetic correlation between MDD and loneliness and re-computed polygenic scores. The polygenic score unique to loneliness remained associated with CAD (OR 1.09, 95% CI 1.03–1.15; P = 0.002), while the polygenic score unique to MDD did not (OR 1.00, 95% CI 0.95–1.06; P = 0.97). Our replication sample was the Atherosclerosis Risk in Communities (ARIC) cohort of 7197 European-ancestry participants (1598 incident CAD cases). In ARIC, polygenic scores for MDD and loneliness were associated with hazard ratios of 1.07 (95% CI, 0.99–1.14; P = 0.07) and 1.07 (1.01–1.15; P = 0.03), respectively, and we replicated findings from the BioVU sensitivity analyses. We conclude that genetic risk factors for MDD and loneliness act pleiotropically to increase CAD risk in females.


2019 ◽  
Author(s):  
Jessica Dennis ◽  
Julia Sealock ◽  
Rebecca T Levinson ◽  
Eric Farber-Eger ◽  
Jacob Franco ◽  
...  

Abstract
Importance: Epidemiological evidence indicates that major depressive disorder (MDD) and loneliness both reduce life expectancies, but mechanisms underlying the excess morbidity are unclear. Electronic health records (EHRs) linked to genetic data offer new opportunities to address this knowledge gap.
Objective: To determine the medical morbidity pattern associated with genetic risk factors for MDD and loneliness, two common psychological traits with adverse health outcomes.
Design: Phenome-wide association study using EHRs spanning 1990 to 2017 from the Vanderbilt University Medical Center biobank, BioVU. Top associations with coronary artery disease (CAD) were replicated in the Atherosclerosis Risk in Communities (ARIC) cohort.
Setting: Hospital-based EHR study, with replication in a population-based cohort study.
Participants: 18,385 genotyped adult patients in BioVU. Replication in ARIC included 7,197 genotyped participants. All participants were of European ancestry.
Exposures: Polygenic scores for MDD and loneliness were developed for each individual using previously published meta-GWAS summary statistics.
Main Outcomes and Measures: The phenome-wide association study included 882 clinical diagnoses ascertained via billing codes in the EHR. ARIC included 1598 incident CAD cases.
Results: BioVU patients had a median EHR length of 9.91 years. In the phenome-wide association study, polygenic scores for MDD and loneliness were significantly associated with psychiatric and cardiac phenotypes. Targeted analyses of CAD in 3,893 cases and 4,197 controls in BioVU found odds ratios of 1.11 (95% CI, 1.04-1.18; P=8.43×10⁻⁴) and 1.13 (95% CI, 1.07-1.20; P=4.51×10⁻⁶) per 1-SD increase in the polygenic scores for MDD and loneliness, respectively. Comparable hazard ratios in ARIC were 1.07 (95% CI, 0.99-1.14; P=0.07) and 1.07 (1.01-1.15; P=0.03). Across both studies, the increased risk persisted in women after adjusting for multiple conventional risk factors, a polygenic score for CAD, and psychiatric symptoms (available in BioVU). Controlling for genetic risk factors shared between MDD and loneliness, the polygenic score for loneliness conditioned on MDD remained associated with CAD risk, but the polygenic score for MDD conditioned on loneliness did not.
Conclusions and Relevance: Genetic risk factors for MDD and loneliness act pleiotropically to increase CAD risk in women. Continued research into the biological and clinical connections between the heart and mind is warranted.


2016 ◽  
Author(s):  
Benjamin W. Domingue ◽  
Hexuan Liu ◽  
Aysu Okbay ◽  
Daniel W. Belsky

Abstract: Experience of stressful life events is associated with risk of depression. Yet many exposed individuals do not become depressed. A controversial hypothesis is that genetic factors influence vulnerability to depression following stress. This hypothesis is most commonly tested with a “diathesis-stress” model, in which genes confer excess vulnerability. We tested an alternative model, in which genes may buffer against the depressogenic effects of life stress. We measured the hypothesized genetic buffer using a polygenic score derived from a published genome-wide association study (GWAS) of subjective wellbeing. We tested if married older adults who had higher polygenic scores were less vulnerable to depressive symptoms following the death of their spouse as compared to age-peers who had also lost their spouse and who had lower polygenic scores. We analyzed data from N=9,453 non-Hispanic white adults in the Health and Retirement Study (HRS), a population-representative longitudinal study of older adults in the United States. HRS adults with higher wellbeing polygenic scores experienced fewer depressive symptoms during follow-up. Those who survived death of their spouses during follow-up (n=1,829) experienced a sharp increase in depressive symptoms following the death and returned toward baseline over the following two years. Having a higher polygenic score buffered against increased depressive symptoms following a spouse's death. Effects were small and clinical relevance is uncertain, although polygenic score analyses may provide clues to behavioral pathways that can serve as therapeutic targets. Future studies of gene-environment interplay in depression may benefit from focus on genetics discovered for putative protective factors.


2021 ◽  
Author(s):  
Maryn O. Carlson ◽  
Daniel P. Rice ◽  
Jeremy J. Berg ◽  
Matthias Steinrücken

Abstract: Polygenic scores link the genotypes of ancient individuals to their phenotypes, which are often unobservable, offering a tantalizing opportunity to reconstruct complex trait evolution. In practice, however, interpretation of ancient polygenic scores is subject to numerous assumptions. For one, the genome-wide association (GWA) studies from which polygenic scores are derived can only estimate effect sizes for loci segregating in contemporary populations. Therefore, a GWA study may not correctly identify all loci relevant to trait variation in the ancient population. In addition, the frequencies of trait-associated loci may have changed in the intervening years. Here, we devise a theoretical framework to quantify the effect of this allelic turnover on the statistical properties of polygenic scores as functions of population genetic dynamics, trait architecture, power to detect significant loci, and the age of the ancient sample. We model the allele frequencies of loci underlying trait variation using the Wright-Fisher diffusion, and employ the spectral representation of its transition density to find analytical expressions for several error metrics, including the correlation between an ancient individual’s polygenic score and true phenotype, referred to as polygenic score accuracy. Our theory also applies to a two-population scenario and demonstrates that allelic turnover alone may explain a substantial percentage of the reduced accuracy observed in cross-population predictions, akin to those performed in human genetics. Finally, we use simulations to explore the effects of recent directional selection, a bias-inducing process, on the statistics of interest. We find that even in the presence of bias, weak selection induces minimal deviations from our neutral expectations for the decay of polygenic score accuracy. By quantifying the limitations of polygenic scores in an explicit evolutionary context, our work lays the foundation for the development of more sophisticated statistical procedures to analyze both temporally and geographically resolved polygenic scores.

