Bias and Precision of Parameter Estimates from Models Using Polygenic Scores to Estimate Environmental and Genetic Parental Influences

Abstract. In a companion paper (Balbona et al., Behav Genet, in press), we introduced a series of causal models that use polygenic scores from transmitted and nontransmitted alleles, the offspring trait, and parental traits to estimate the variation due to the environmental influences the parental trait has on the offspring trait (vertical transmission) as well as additive genetic effects. These models also estimate and account for the gene-gene and gene-environment covariation that arises from assortative mating and vertical transmission, respectively. In the current study, we simulated polygenic scores and phenotypes of parents and offspring under genetic and vertical transmission scenarios, assuming two types of assortative mating. We instantiated the models from our companion paper in the OpenMx software, and compared the true values of parameters to maximum likelihood estimates from models fitted on the simulated data to quantify the bias and precision of estimates. We show that parameter estimates from these models are unbiased when assumptions are met, but as expected, they are biased to the degree that assumptions are unmet. Standard errors of the estimated variances due to vertical transmission and to genetic effects decrease with increasing sample sizes and with increasing $r^2$ values of the polygenic score. Even when the polygenic score explains a modest amount of trait variation ($r^2 = .05$), standard errors of these standardized estimates are reasonable ($< .05$) for $n = 16K$ trios, and can even be reasonable for smaller sample sizes (e.g., down to 4K) when the polygenic score is more predictive. These causal models offer a novel approach for understanding how parents influence their offspring, but their use requires polygenic scores on relevant traits that are modestly predictive (e.g., $r^2 > .025$) as well as datasets with genomic and phenotypic information on parents and offspring. The utility of polygenic scores for elucidating parental influences should thus serve as additional motivation for large genomic biobanks to perform GWASs on traits that may be relevant to parenting and to oversample close relatives, particularly parents and offspring.

Bias and precision of parameter estimates from models using polygenic scores to estimate environmental and genetic parental influences

10.1101/2020.08.11.246827 ◽

2020 ◽

Author(s):

Yongkang Kim ◽

Jared V. Balbona ◽

Matthew C. Keller

Keyword(s):

Assortative Mating ◽

Vertical Transmission ◽

Companion Paper ◽

Genetic Effects ◽

Causal Models ◽

Parental Influences ◽

Standard Errors ◽

Parameter Estimates ◽

Polygenic Score ◽

Polygenic Scores

Abstract. In a companion paper (Balbona et al., 2020), we introduced a series of causal models that use polygenic scores from transmitted and nontransmitted alleles, the offspring trait, and parental traits to estimate the variation due to the environmental influences the parental trait has on the offspring trait (vertical transmission) as well as additive genetic effects. These models also estimate and account for the gene-gene and gene-environment covariation that arises from assortative mating and vertical transmission, respectively. In the current study, we simulated polygenic scores and phenotypes of parents and offspring under genetic and vertical transmission scenarios, assuming two types of assortative mating. We instantiated the models from our companion paper in the OpenMx software, and compared the true values of parameters to maximum likelihood estimates from models fitted on the simulated data to quantify the bias and precision of estimates. We show that parameter estimates from these models are unbiased when assumptions are met, but as expected, they are biased to the degree that assumptions are unmet. Standard errors of the estimated variances due to vertical transmission and to genetic effects decrease with increasing sample sizes and with increasing r² values of the polygenic score. Even when the polygenic score explains a modest amount of trait variation (r² = .05), standard errors of these standardized estimates were reasonable (< .05) for n = 16K trios, and for smaller sample sizes (e.g., down to 4K) when the polygenic score is more predictive. These causal models offer a novel approach for understanding how parents influence their offspring, but their use requires polygenic scores on relevant traits that are modestly predictive (e.g., r² > .025) as well as datasets with genomic and phenotypic information on parents and offspring. The utility of polygenic scores for elucidating parental influences should thus serve as additional motivation for large genomic biobanks to perform GWASs on traits that may be relevant to parenting and to oversample close relatives, particularly parents and offspring.

Comparing within- and between-family polygenic score prediction

10.1101/605006 ◽

2019 ◽

Cited By ~ 4

Author(s):

Saskia Selzam ◽

Stuart J. Ritchie ◽

Jean-Baptiste Pingault ◽

Chandra A. Reynolds ◽

Paul F. O’Reilly ◽

...

Keyword(s):

Assortative Mating ◽

Complex Traits ◽

Population Stratification ◽

Educational Achievement ◽

Economic Status ◽

Life Outcomes ◽

Polygenic Score ◽

Family Effects ◽

Polygenic Scores ◽

Cognitive Traits

Abstract. Polygenic scores are a popular tool for prediction of complex traits. However, prediction estimates in samples of unrelated participants can include effects of population stratification, assortative mating and environmentally mediated parental genetic effects, a form of genotype-environment correlation (rGE). Comparing genome-wide polygenic score (GPS) predictions in unrelated individuals with predictions between siblings in a within-family design is a powerful approach to identify these different sources of prediction. Here, we compared within- to between-family GPS predictions of eight life outcomes (anthropometric, cognitive, personality and health) for eight corresponding GPSs. The outcomes were assessed in up to 2,366 dizygotic (DZ) twin pairs from the Twins Early Development Study from age 12 to age 21. To account for family clustering, we used mixed-effects modelling, simultaneously estimating within- and between-family effects for target- and cross-trait GPS prediction of the outcomes. There were three main findings: (1) DZ twin GPS differences predicted DZ differences in height, BMI, intelligence, educational achievement and ADHD symptoms; (2) target and cross-trait analyses indicated that GPS prediction estimates for cognitive traits (intelligence and educational achievement) were on average 60% greater between families than within families, but this was not the case for non-cognitive traits; and (3) this within- and between-family difference for cognitive traits disappeared after controlling for family socio-economic status (SES), suggesting that SES is a source of between-family prediction through rGE mechanisms. These results provide novel insights into the patterns by which rGE contributes to GPS prediction, while ruling out confounding due to population stratification and assortative mating.

Parental influences on offspring education: indirect genetic effects of non-cognitive skills

10.1101/2020.09.15.296236 ◽

2020 ◽

Cited By ~ 1

Author(s):

Perline A. Demange ◽

Jouke Jan Hottenga ◽

Abdel Abdellaoui ◽

Margherita Malanchini ◽

Benjamin W. Domingue ◽

...

Keyword(s):

Home Environment ◽

Cognitive Skills ◽

Educational Achievement ◽

Genetic Effects ◽

Parental Influences ◽

Mediated Effects ◽

Educational Trajectories ◽

Indirect Genetic Effects ◽

Polygenic Scores ◽

Research Goal

Abstract. Understanding how parents shape their children’s educational trajectories is a socially important research goal. Evidence on the effects of parents’ cognitive and non-cognitive skills on offspring education is weakened by poor assessments of non-cognitive skills and inadequate accounting for genetic inheritance. In this preregistered study, we use genetics to assess non-cognitive skills and to index environmental effects of parents, controlling for direct effects of inherited genetic variation. We define the non-cognitive and cognitive heritable contributions to educational attainment using GWAS-by-subtraction, and construct non-cognitive and cognitive skills polygenic scores in three UK and Dutch cohorts. We estimate environmentally mediated effects of polygenic scores (parental indirect genetic effects) on educational achievement and attainment with three designs that include siblings (N=47,459), adoptees (N=6,407), and parent-offspring trios (N=2,534). Heritable non-cognitive and cognitive skills are both involved in parental construction of environments influencing offspring education: indirect genetic effects explain ∼37% of total polygenic score effects. This result holds across countries, outcomes, ages and methods, with two exceptions: indirect genetic effects are null for childhood achievement in the Dutch cohort, and lower when estimated with the adoption method. Overall, our findings stress the importance of both non-cognitive and cognitive aspects of the home environment.

The Four-States Model of Memory Retrieval Experiences

Zeitschrift für Psychologie / Journal of Psychology ◽

10.1027/0044-3409.215.1.61 ◽

2007 ◽

Vol 215 (1) ◽

pp. 61-71 ◽

Cited By ~ 5

Author(s):

Edgar Erdfelder ◽

Lutz Cüpper ◽

Tina-Sarah Auer ◽

Monika Undorf

Keyword(s):

Signal Detection ◽

Memory Retrieval ◽

Measurement Model ◽

Standard Errors ◽

Parameter Estimates ◽

Unequal Variances ◽

One Dimensional ◽

Detection Model ◽

Memory States ◽

The One

Abstract. A memory measurement model is presented that accounts for judgments of remembering, knowing, and guessing in old-new recognition tasks by assuming four disjoint latent memory states: recollection, familiarity, uncertainty, and rejection. This four-states model can be applied to both Tulving's (1985) remember-know procedure (RK version) and Gardiner and coworkers' (Gardiner, Java, & Richardson-Klavehn, 1996; Gardiner, Richardson-Klavehn, & Ramponi, 1997) remember-know-guess procedure (RKG version). It is shown that the RK version of the model fits remember-know data approximately as well as the one-dimensional signal detection model does. In contrast, the RKG version of the four-states model outperforms the corresponding detection model even if unequal variances for old and new items are allowed for. We show empirically that the two versions of the four-states model measure the same state probabilities. However, the RKG version, requiring remember-know-guess judgments, provides parameter estimates with smaller standard errors and is therefore recommended for routine use.

Genome-wide polygenic score to identify a monogenic risk-equivalent for coronary disease

10.1101/218388 ◽

2017 ◽

Cited By ~ 9

Author(s):

Amit V. Khera ◽

Mark Chaffin ◽

Krishna G. Aragam ◽

Connor A. Emdin ◽

Derek Klarin ◽

...

Keyword(s):

Coronary Disease ◽

Fold Increase ◽

Predictive Capacity ◽

Cumulative Impact ◽

Polygenic Score ◽

Genome Wide ◽

A Genome ◽

Increased Risk ◽

Polygenic Scores ◽

Artery Disease

Abstract. Identification of individuals at increased genetic risk for a complex disorder such as coronary disease can facilitate treatments or enhanced screening strategies. A rare monogenic mutation associated with increased cholesterol is present in ~1:250 carriers and confers an up to 4-fold increase in coronary risk when compared with non-carriers. Although individual common polymorphisms have modest predictive capacity, their cumulative impact can be aggregated into a polygenic score. Here, we develop a new, genome-wide polygenic score that aggregates information from 6.6 million common polymorphisms and show that this score can similarly identify individuals with a 4-fold increased risk for coronary disease. In >400,000 participants from UK Biobank, the score conforms to a normal distribution and those in the top 2.5% of the distribution are at 4-fold increased risk compared to the remaining 97.5%. Similar patterns are observed with genome-wide polygenic scores for two additional diseases – breast cancer and severe obesity. One Sentence Summary: A genome-wide polygenic score identifies 2.5% of the population born with a 4-fold increased risk for coronary artery disease.

Some Results on the Behavior of Alternate Covariance Structure Estimation Procedures in the Presence of Non-Normal Data

Journal of Marketing Research ◽

10.1177/002224378902600207 ◽

1989 ◽

Vol 26 (2) ◽

pp. 214-221 ◽

Cited By ~ 47

Author(s):

Subhash Sharma ◽

Srinivas Durvasula ◽

William R. Dillon

Keyword(s):

Monte Carlo Simulation ◽

Monte Carlo ◽

Covariance Structure ◽

Standard Errors ◽

Parameter Estimates ◽

Simulation Experiments ◽

Estimation Techniques ◽

Structure Estimation ◽

Estimation Procedures ◽

Simulation Results

The authors report some results on the behavior of alternative covariance structure estimation procedures in the presence of non-normal data. They conducted Monte Carlo simulation experiments with a factorial design involving three levels of skewness, three levels of kurtosis, and three different sample sizes. For normal data, among all the elliptical estimation techniques, elliptical reweighted least squares (ERLS) was equivalent in performance to ML. However, as expected, for non-normal data parameter estimates were unbiased for ML and the elliptical estimation techniques, whereas the bias in standard errors was substantial for GLS and ML. Among elliptical estimation techniques, ERLS was superior in performance. On the basis of the simulation results, the authors recommend that researchers use ERLS for both normal and non-normal data.

Analysis of algebraic weighted least-squares estimators for enzyme parameters

Biochemical Journal ◽

10.1042/bj2880533 ◽

1992 ◽

Vol 288 (2) ◽

pp. 533-538 ◽

Cited By ~ 3

Author(s):

M E Jones

Keyword(s):

Least Squares ◽

Coefficient Of Variation ◽

Constant Coefficient ◽

Weighted Least Squares ◽

Simulated Data ◽

Standard Errors ◽

Parameter Estimates ◽

Spreadsheet Program ◽

Least Squares Estimators ◽

Constant Coefficient Of Variation

An algorithm for the least-squares estimation of enzyme parameters Km and Vmax is proposed and its performance analysed. The problem is non-linear, but the algorithm is algebraic and does not require initial parameter estimates. On a spreadsheet program such as MINITAB, it may be coded in as few as ten instructions. The algorithm derives an intermediate estimate of Km and Vmax appropriate to data with a constant coefficient of variation and then applies a single reweighting. Its performance using simulated data with a variety of error structures is compared with that of the classical reciprocal transforms and with that of both appropriately and inappropriately weighted direct least-squares estimators. Three approaches to estimating the standard errors of the parameter estimates are discussed, and one suitable for spreadsheet implementation is illustrated.

Approach to analysing correlated contextual factors: an application for studies on violence

Injury Prevention ◽

10.1136/injuryprev-2020-043967 ◽

2021 ◽

pp. injuryprev-2020-043967

Author(s):

Marizen R Ramirez ◽

Javier E Flores ◽

Gang Cheng ◽

Corinne Peek-Asa ◽

Joseph E Cavanaugh

Keyword(s):

Academic Performance ◽

School Violence ◽

Principal Components ◽

Contextual Factors ◽

Epidemiological Studies ◽

Standard Errors ◽

Parameter Estimates ◽

School Crime ◽

School Factors ◽

Health Studies

Background: Numerous public health studies, especially in the area of violence, examine the effects of contextual or group-level factors on health outcomes. Often, these contextual factors exhibit strong pairwise correlations, which pose a challenge when these factors are included as covariates in a statistical model. Such models may be characterised by inflated standard errors and unstable parameter estimates that may fluctuate drastically from sample to sample, where the excessive estimation variability is reflected by inflated standard errors. Methods: We propose a three-stage approach for analysing correlated contextual factors that proceeds as follows: (1) a principal components analysis (PCA) is performed on the original set of correlated variables, (2) the primary generated principal components are included in a multilevel multivariable model and (3) the estimated parameters for these components are transformed into estimates for each of the original contextual factors. Using school violence data, we examined the associations between school crime and correlated contextual school factors (ie, English proficiency, academic performance, pupil to teacher ratio, average class size and children on free and reduced meals). Results: From models ignoring correlations, school crime was not reliably associated with any of the contextual school factors. When models were fit with principal components, school crime was found to be positively associated with a school’s student to teacher ratio, average classroom size and academic performance but negatively associated with the proportion of children who were on free and reduced meals. Conclusion: Our multistep approach is one way to address multicollinearity encountered in social epidemiological studies of violence.
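The three-stage approach described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps the multilevel multivariable model for ordinary least squares to stay self-contained, and the function name `pcr_coefficients`, the simulated data, and the number of retained components are all assumptions made for illustration.

```python
import numpy as np

def pcr_coefficients(X, y, n_components):
    """Hypothetical helper: principal-components regression with
    back-transformation to the original (standardized) covariates."""
    # Standardize the correlated covariates.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Stage 1: PCA via SVD; rows of Vt are the component loadings.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T          # (p, k) loadings of retained components
    scores = Z @ V                   # (n, k) component scores
    # Stage 2: regress the outcome on the retained components
    # (plain OLS here; the abstract uses a multilevel model).
    design = np.column_stack([np.ones(len(y)), scores])
    gamma, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Stage 3: map component coefficients back to the original covariates.
    return V @ gamma[1:]

# Simulated, strongly correlated covariates (illustrative data only).
rng = np.random.default_rng(0)
n = 500
shared = rng.normal(size=n)
X = np.column_stack([shared + 0.1 * rng.normal(size=n) for _ in range(3)])
y = X.sum(axis=1) + rng.normal(size=n)

beta_k1 = pcr_coefficients(X, y, n_components=1)    # keep the primary component
beta_full = pcr_coefficients(X, y, n_components=3)  # keep all components
```

Retaining every component reproduces the ordinary regression on the standardized covariates, so any stabilisation comes from dropping the low-variance components.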

Genetic risk for major depressive disorder and loneliness in sex-specific associations with coronary artery disease

Molecular Psychiatry ◽

10.1038/s41380-019-0614-y ◽

2019 ◽

Cited By ~ 3

Author(s):

Jessica Dennis ◽

Julia Sealock ◽

Rebecca T. Levinson ◽

Eric Farber-Eger ◽

Jacob Franco ◽

...

Keyword(s):

Risk Factors ◽

Coronary Artery Disease ◽

Major Depressive Disorder ◽

Coronary Artery ◽

Genetic Risk ◽

European Ancestry ◽

Major Depressive ◽

Polygenic Score ◽

Polygenic Scores ◽

Artery Disease

Abstract. Major depressive disorder (MDD) and loneliness are phenotypically and genetically correlated with coronary artery disease (CAD), but whether these associations are explained by pleiotropic genetic variants or shared comorbidities is unclear. To tease apart these scenarios, we first assessed the medical morbidity pattern associated with genetic risk factors for MDD and loneliness by conducting a phenome-wide association study in 18,385 European-ancestry individuals in the Vanderbilt University Medical Center biobank, BioVU. Polygenic scores for MDD and loneliness were developed for each person using previously published meta-GWAS summary statistics, and were tested for association with 882 clinical diagnoses ascertained via billing codes in electronic health records. We discovered strong associations with heart disease diagnoses, and next embarked on targeted analyses of CAD in 3893 cases and 4197 controls. We found odds ratios of 1.11 (95% CI, 1.04–1.18; P = 8.43 × 10⁻⁴) and 1.13 (95% CI, 1.07–1.20; P = 4.51 × 10⁻⁶) per 1-SD increase in the polygenic scores for MDD and loneliness, respectively. Results were similar in patients without psychiatric symptoms, and the increased risk persisted in females even after adjusting for multiple conventional risk factors and a polygenic score for CAD. In a final sensitivity analysis, we statistically adjusted for the genetic correlation between MDD and loneliness and re-computed polygenic scores. The polygenic score unique to loneliness remained associated with CAD (OR 1.09, 95% CI 1.03–1.15; P = 0.002), while the polygenic score unique to MDD did not (OR 1.00, 95% CI 0.95–1.06; P = 0.97). Our replication sample was the Atherosclerosis Risk in Communities (ARIC) cohort of 7197 European-ancestry participants (1598 incident CAD cases). In ARIC, polygenic scores for MDD and loneliness were associated with hazard ratios of 1.07 (95% CI, 0.99–1.14; P = 0.07) and 1.07 (1.01–1.15; P = 0.03), respectively, and we replicated findings from the BioVU sensitivity analyses. We conclude that genetic risk factors for MDD and loneliness act pleiotropically to increase CAD risk in females.

Modeling Extended Twin Family Data II: Power Associated With Different Family Structures

Twin Research and Human Genetics ◽

10.1375/twin.12.1.19 ◽

2009 ◽

Vol 12 (1) ◽

pp. 19-25 ◽

Cited By ~ 13

Author(s):

Sarah E. Medland ◽

Matthew C. Keller

Keyword(s):

Assortative Mating ◽

Environmental Effects ◽

Cultural Transmission ◽

Statistical Power ◽

Large Data ◽

Genetic Effects ◽

Data Sets ◽

Family Structures ◽

Nonrandom Mating ◽

Children Of Twins

Abstract. Modeling the data from extended twin pedigrees allows the estimation of increasingly complex covariance relationships in which the effects of cultural transmission, nonrandom mating and genotype x environment covariation can be incorporated. However, the power to detect these effects in existing data sets has not yet been examined. The present study examined the effects that different family structures (i.e., the ratio of MZ to DZ families and the importance of cousins vs. avuncular relatives) have on statistical power. In addition, we examined the power to detect genetic and environmental effects within the context of two large data sets (VA30K and the OZVA60K). We found that power to detect additive genetic and cultural transmission effects was maximized by oversampling MZ families. In terms of ascertainment, there was little difference in power between samples that had focused on recruiting a third generation (the children of twins) versus those that had focused on recruiting the siblings of the twins. In addition, we examined the power to detect additive and dominant genetic effects, cultural transmission and assortative mating in the existing VA30K and OZVA60K samples, under two different models of mating: phenotypic assortment and social homogamy. There was nearly 100% power to detect assortative mating and cultural transmission, against a background of small additive and dominant genetic and familial environmental effects. In addition, the power to detect additive or dominant genetic effects quickly asymptoted, so that there was almost 100% power to detect effects explaining 20% or more of the total variance. These results demonstrate that the Cascade model has sufficient power to detect parameters of interest in existing datasets. Mx scripts are available from www.vipbg.vcu.edu/~sarahme/cascade.
