Genetic instrumental variable regression: Explaining socioeconomic and health outcomes in nonexperimental data

Identifying causal effects in nonexperimental data is an enduring challenge. One proposed solution that recently gained popularity is the idea to use genes as instrumental variables [i.e., Mendelian randomization (MR)]. However, this approach is problematic because many variables of interest are genetically correlated, which implies the possibility that many genes could affect both the exposure and the outcome directly or via unobserved confounding factors. Thus, pleiotropic effects of genes are themselves a source of bias in nonexperimental data that would also undermine the ability of MR to correct for endogeneity bias from nongenetic sources. Here, we propose an alternative approach, genetic instrumental variable (GIV) regression, that provides estimates for the effect of an exposure on an outcome in the presence of pleiotropy. As a valuable byproduct, GIV regression also provides accurate estimates of the chip heritability of the outcome variable. GIV regression uses polygenic scores (PGSs) for the outcome of interest which can be constructed from genome-wide association study (GWAS) results. By splitting the GWAS sample for the outcome into nonoverlapping subsamples, we obtain multiple indicators of the outcome PGSs that can be used as instruments for each other and, in combination with other methods such as sibling fixed effects, can address endogeneity bias from both pleiotropy and the environment. In two empirical applications, we demonstrate that our approach produces reasonable estimates of the chip heritability of educational attainment (EA) and show that standard regression and MR provide upwardly biased estimates of the effect of body height on EA.
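The core idea of using two polygenic scores from non-overlapping GWAS subsamples as instruments for each other can be illustrated with a small simulation. This is only a sketch of the measurement-error correction step, not the authors' full GIV estimator (which also includes the exposure). The function names, sample size, effect size, and noise level are all illustrative assumptions.

```python
import numpy as np


def simulate(n=200_000, beta=0.5, noise_sd=1.0, seed=0):
    """Toy data: one latent genetic score and two noisy PGS indicators of it."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(n)  # latent genetic score (unobserved in practice)
    # Independent noise mimics estimation error from non-overlapping GWAS subsamples.
    pgs1 = g + noise_sd * rng.standard_normal(n)
    pgs2 = g + noise_sd * rng.standard_normal(n)
    y = beta * g + rng.standard_normal(n)  # outcome driven by the latent score
    return y, pgs1, pgs2


def ols_slope(y, x):
    """Simple regression slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def iv_slope(y, x, z):
    """Single-instrument IV (Wald) estimator: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


y, pgs1, pgs2 = simulate()
print("OLS slope on pgs1:", round(ols_slope(y, pgs1), 3))
print("IV slope (pgs2 instruments pgs1):", round(iv_slope(y, pgs1, pgs2), 3))
```

In this toy setup the OLS slope is attenuated toward beta / (1 + noise_sd²), roughly 0.25, while the IV slope should land near the simulated beta of 0.5, because the noise in the two scores is independent.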

Genetic Instrumental Variable (GIV) regression: Explaining socioeconomic and health outcomes in non-experimental data

10.1101/134197 ◽

2017 ◽

Cited By ~ 2

Author(s):

Thomas A. DiPrete ◽

Casper A.P. Burik ◽

Philipp D. Koellinger

Keyword(s):

Experimental Data ◽

Fixed Effects ◽

Genome Wide Association Study ◽

Body Height ◽

Outcome Variable ◽

Endogeneity Bias ◽

Multiple Indicators ◽

Genome Wide ◽

Alternative Approach ◽

Polygenic Scores

Identifying causal effects in non-experimental data is an enduring challenge. One proposed solution that recently gained popularity is the idea to use genes as instrumental variables (i.e. Mendelian Randomization - MR). However, this approach is problematic because many variables of interest are genetically correlated, which implies the possibility that many genes could affect both the exposure and the outcome directly or via unobserved confounding factors. Thus, pleiotropic effects of genes are themselves a source of bias in non-experimental data that would also undermine the ability of MR to correct for endogeneity bias from non-genetic sources. Here, we propose an alternative approach, GIV regression, that provides estimates for the effect of an exposure on an outcome in the presence of pleiotropy. As a valuable byproduct, GIV regression also provides accurate estimates of the chip heritability of the outcome variable. GIV regression uses polygenic scores (PGS) for the outcome of interest which can be constructed from genome-wide association study (GWAS) results. By splitting the GWAS sample for the outcome into non-overlapping subsamples, we obtain multiple indicators of the outcome PGS that can be used as instruments for each other, and, in combination with other methods such as sibling fixed effects, can address endogeneity bias from both pleiotropy and the environment. In two empirical applications, we demonstrate that our approach produces reasonable estimates of the chip heritability of educational attainment (EA) and show that standard regression and MR provide upwardly biased estimates of the effect of body height on EA.

Genetic Nature or Genetic Nurture? Quantifying Bias in Analyses Using Polygenic Scores

10.1101/524850 ◽

2019 ◽

Cited By ~ 9

Author(s):

Sam Trejo ◽

Benjamin W. Domingue

Keyword(s):

Fixed Effects ◽

Regression Models ◽

Genome Wide Association Study ◽

Causal Effect ◽

Genetic Effects ◽

Summary Statistics ◽

Behavioral Traits ◽

Genome Wide ◽

A Genome ◽

Polygenic Scores

Summary statistics from a genome-wide association study (GWAS) can be used to generate a polygenic score (PGS). For complex, behavioral traits, the correlation between an individual’s PGS and their phenotype may contain bias alongside the causal effect of the individual’s genes (due to geographic, ancestral, and/or socioeconomic confounding). We formalize the recent introduction of a different source of bias in regression models using PGSs: the effects of parental genes on offspring outcomes, also known as genetic nurture. GWAS do not discriminate between the various pathways through which genes influence outcomes, meaning existing PGSs capture both direct genetic effects and genetic nurture effects. We construct a theoretical model for genetic effects and show that, unlike other sources of bias in PGSs, the presence of genetic nurture biases PGS coefficients from both naïve OLS (between-family) and family fixed effects (within-family) regressions. This bias is in opposite directions; while naïve OLS estimates are biased upwards, family fixed effects estimates are biased downwards. We quantify this bias for a given trait using two novel parameters that we identify and discuss: (1) the genetic correlation between the direct and nurture effects and (2) the ratio of the SNP heritabilities for the direct and nurture effects.

Genome-wide association study of body height in African Americans: the Women's Health Initiative SNP Health Association Resource (SHARe)

Human Molecular Genetics ◽

10.1093/hmg/ddr489 ◽

2011 ◽

Vol 21 (3) ◽

pp. 711-720 ◽

Cited By ~ 53

Author(s):

Cara L. Carty ◽

Nicholas A. Johnson ◽

Carolyn M. Hutter ◽

Alexander P. Reiner ◽

Ulrike Peters ◽

...

Keyword(s):

African Americans ◽

Association Study ◽

Women's Health ◽

Genome Wide Association Study ◽

Body Height ◽

Genome Wide Association ◽

Health Initiative ◽

Genome Wide ◽

Health Association ◽

The Women's Health Initiative

Assessing causality in associations between cannabis use and schizophrenia risk: a two-sample Mendelian randomization study

Psychological Medicine ◽

10.1017/s0033291716003172 ◽

2016 ◽

Vol 47 (5) ◽

pp. 971-980 ◽

Cited By ~ 82

Author(s):

S. H. Gage ◽

H. J. Jones ◽

S. Burgess ◽

J. Bowden ◽

G. Davey Smith ◽

...

Keyword(s):

Fixed Effects ◽

Genome Wide Association Study ◽

Mendelian Randomization ◽

Causal Effect ◽

Positive Control ◽

Negative Control ◽

Study Data ◽

Nucleotide Polymorphisms ◽

Genome Wide ◽

Genome Wide Data

Background: Observational associations between cannabis and schizophrenia are well documented, but ascertaining causation is more challenging. We used Mendelian randomization (MR), utilizing publicly available data, as a method for ascertaining causation from observational data. Method: We performed bi-directional two-sample MR using summary-level genome-wide data from the International Cannabis Consortium (ICC) and the Psychiatric Genomics Consortium (PGC2). Single nucleotide polymorphisms (SNPs) associated with cannabis initiation (p < 10⁻⁵) and schizophrenia (p < 5 × 10⁻⁸) were combined using an inverse-variance-weighted fixed-effects approach. We also used height and education genome-wide association study data, representing negative and positive control analyses. Results: There was some evidence consistent with a causal effect of cannabis initiation on risk of schizophrenia [odds ratio (OR) 1.04 per doubling of the odds of cannabis initiation, 95% confidence interval (CI) 1.01–1.07, p = 0.019]. There was strong evidence consistent with a causal effect of schizophrenia risk on likelihood of cannabis initiation (OR 1.10 per doubling of the odds of schizophrenia, 95% CI 1.05–1.14, p = 2.64 × 10⁻⁵). Findings were as predicted for the negative control (height: OR 1.00, 95% CI 0.99–1.01, p = 0.90) but weaker than predicted for the positive control (years in education: OR 0.99, 95% CI 0.97–1.00, p = 0.066) analyses. Conclusions: Our results provide some evidence that cannabis initiation increases the risk of schizophrenia, although the size of the causal estimate is small. We find stronger evidence that schizophrenia risk predicts cannabis initiation, possibly because genetic instruments for schizophrenia are stronger than those for cannabis initiation.
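An inverse-variance-weighted fixed-effects combination of per-SNP effects, as named in the Method section above, can be sketched as follows. This uses the common first-order weights (the inverse of the outcome standard error squared); the function name and the input numbers are hypothetical and are not taken from the study.

```python
import math


def ivw_fixed_effects(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance-weighted fixed-effects MR estimate from per-SNP summary data.

    Equivalent to a weighted regression of SNP-outcome effects on SNP-exposure
    effects through the origin, with weights 1 / se_outcome**2.
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / se**2
        numerator += weight * bx * by
        denominator += weight * bx * bx
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error


# Hypothetical summary statistics for three SNPs (illustration only).
bx = [0.1, 0.2, 0.4]    # SNP-exposure effects
by = [0.2, 0.4, 0.8]    # SNP-outcome effects
se = [0.05, 0.05, 0.1]  # standard errors of the SNP-outcome effects
estimate, standard_error = ivw_fixed_effects(bx, by, se)
print(estimate, standard_error)
```

Because every hypothetical SNP-outcome effect here is exactly twice its SNP-exposure effect, the pooled estimate is 2 up to floating-point rounding; real data would show heterogeneity across SNPs.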

Genetic Effects on Longitudinal Cognitive Decline During the Early Stages of Alzheimer's Disease

10.21203/rs.3.rs-149163/v1 ◽

2021 ◽

Author(s):

Atul Kumar ◽

Maryam Shoai ◽

Sebastian Palmqvist ◽

Erik Stomrud ◽

John Hardy ◽

...

Keyword(s):

Alzheimer's Disease ◽

Cognitive Decline ◽

Genome Wide Association Study ◽

Early Stage ◽

Cognitive Change ◽

Preclinical AD ◽

Genome Wide ◽

A Genome ◽

Polygenic Scores

Background: Cognitive decline in early-stage Alzheimer’s disease (AD) may depend on genetic variability. Methods: In the Swedish BioFINDER study, we used polygenic scores (PGS) (for AD, intelligence and educational attainment), and genetic variants (in a genome-wide association study [GWAS]) to predict longitudinal cognitive change (measured by MMSE) over a mean of 4.2 years. We included 555 β-amyloid (Aβ) negative cognitively unimpaired (CU) individuals, 206 Aβ-positive CU (preclinical AD), 110 Aβ-negative mild cognitive impairment (MCI) patients, and 146 Aβ-positive MCI patients (prodromal AD). Results: Polygenic scores for AD (in Aβ-positive individuals) and intelligence (independent of Aβ-status) were associated with cognitive decline. Eight genes were associated with cognitive decline in GWAS (3 independent of Aβ-status). Conclusions: AD risk genes may influence cognitive decline in early AD, while genes related to intelligence may modulate cognitive decline irrespective of disease. Therapies targeting the implicated biological pathways may modulate the clinical course of AD.

66 A genome-wide association study for gestation length in swine

Journal of Animal Science ◽

10.1093/jas/skz122.071 ◽

2019 ◽

Vol 97 (Supplement_2) ◽

pp. 40-40

Author(s):

Garrett See ◽

Melanie Trenhaile-Gannemann ◽

Daniel Ciobanu ◽

Matthew L Spangler ◽

Benny Mote

Keyword(s):

Fixed Effects ◽

Genome Wide Association Study ◽

Group Development ◽

Genome Wide Association ◽

Gestation Length ◽

Genome Wide ◽

A Genome ◽

University of Nebraska Lincoln ◽

Genomic Regions ◽

The University

The objective of the current study was to conduct a genome-wide association study on gestation length (GL) in different parities in swine. Sows (n = 831) belonging to the University of Nebraska – Lincoln resource population (Landrace × Nebraska Index Line) were utilized. GL was defined as the number of days between the final insemination and farrowing. Four traits, GL at parity 1, 2, 3 and 4 (GL1, GL2, GL3 and GL4, respectively), were investigated. Animals that were induced 24 h prior to a farrowing event were removed from the analysis. Sows were genotyped with the Illumina SNP60 BeadArray. A Bayes C model with π = 0.995 was implemented with fixed effects of contemporary group, development pen, diet, linear and quadratic terms for age at puberty (GL1; P < 0.01), and linear and quadratic terms for farrowing age (GL2; P < 0.01). Results are posterior means of 55,000 samples. Single marker association analysis (SMA) was performed in R utilizing a linear model on SNP from 1-Mb windows (n = 10) which explained the largest proportion of genetic variation in GL1. The top 10 (0.5% of all windows) 1-Mb windows accounted for a limited proportion of genetic variance: 7.75, 4.66, 3.45 and 2.05% in GL1, GL2, GL3 and GL4, respectively. Posterior mean heritability estimates (posterior SD) for GL1, GL2, GL3 and GL4 were 0.33 (0.06), 0.34 (0.07), 0.32 (0.08) and 0.20 (0.08), respectively. The top SNP (ASGA0017859, SSC4, 7.8 Mb), located in one of the two top common genomic regions associated with GL1, GL2 and GL3, displayed a difference of 1.1 d in GL1 between alternate homozygotes (P < 0.01). The top SNPs from nine of the ten regions were significant (P < 0.05) in the SMA. Two of these regions were in common with GL2 and GL3, where SNP with potential functional effects were found in ZFAT, MAML2 and CCDC82. Results suggest GL is a largely polygenic trait.

Genome-Wide Association Study of Circulating Interleukin 6 Levels Identifies Novel Loci

Human Molecular Genetics ◽

10.1093/hmg/ddab023 ◽

2021 ◽

Author(s):

Tarunveer S Ahluwalia ◽

Bram P Prins ◽

Mohammadreza Abdollahi ◽

Nicola J Armstrong ◽

Stella Aslibekyan ◽

...

Keyword(s):

Association Study ◽

Interleukin 6 ◽

Fixed Effects ◽

Complex Disease ◽

Genome Wide Association Study ◽

Heritability Estimate ◽

Genome Wide Association ◽

European Ancestry ◽

Genome Wide ◽

Increased Risk

Interleukin-6 (IL-6) is a multifunctional cytokine with both pro- and anti-inflammatory properties and a heritability estimate of up to 61%. The circulating levels of IL-6 in blood have been associated with an increased risk of complex disease pathogenesis. We conducted a two-stage (discovery and replication) meta-analysis of genome-wide association studies (GWAS) of circulating serum IL-6 levels comprising up to 67 428 (n_discovery = 52 654 and n_replication = 14 774) individuals of European ancestry. The inverse-variance fixed-effects discovery meta-analysis, followed by replication, led to the identification of two independent loci, IL1F10/IL1RN rs6734238 on chromosome (Chr) 2q14 (p_combined = 1.8 × 10⁻¹¹) and HLA-DRB1/DRB5 rs660895 on Chr6p21 (p_combined = 1.5 × 10⁻¹⁰), in the combined meta-analyses of all samples. We also replicated the IL6R rs4537545 locus on Chr1q21 (p_combined = 1.2 × 10⁻¹²²). Our study identifies novel loci for circulating IL-6 levels, uncovering new immunological and inflammatory pathways that may influence IL-6 pathobiology.
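An inverse-variance fixed-effects meta-analysis pools per-cohort estimates of the same SNP effect by weighting each with the inverse of its squared standard error. The following is a minimal sketch under a normal approximation; the function name and the two-cohort inputs are hypothetical and do not come from the study.

```python
import math


def fixed_effects_meta(betas, ses):
    """Pool per-cohort effect estimates with inverse-variance weights."""
    weights = [1.0 / se**2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value


# Two hypothetical cohorts with equal precision (illustration only).
pooled, pooled_se, p_value = fixed_effects_meta([0.10, 0.20], [0.1, 0.1])
print(pooled, pooled_se, p_value)
```

With equal standard errors the pooled estimate is the simple average of the cohort estimates, and the pooled standard error shrinks by a factor of √2 relative to a single cohort.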

Evaluation of Genomic Selection for Seven Economic Traits in Yellow Drum (Nibea albiflora)

Marine Biotechnology ◽

10.1007/s10126-019-09925-7 ◽

2019 ◽

Vol 21 (6) ◽

pp. 806-812 ◽

Cited By ~ 2

Author(s):

Guijia Liu ◽

Linsong Dong ◽

Linlin Gu ◽

Zhaofang Han ◽

Wenjing Zhang ◽

...

Keyword(s):

Genomic Selection ◽

Body Length ◽

Genome Wide Association Study ◽

Body Height ◽

Economic Traits ◽

Informative SNPs ◽

Genome Wide ◽

Nibea Albiflora ◽

Selection For ◽

Swimming Bladder

Yellow drum (Nibea albiflora) is an important maricultural fish in China, and genetic improvement is necessary for this species. This research evaluated the application of genomic selection methods to predict the genetic values of seven economic traits for yellow drum. Using genome-wide single-nucleotide polymorphisms (SNPs), we estimated the genetic parameters for seven traits, including body length (BL), swimming bladder index (SBI), swimming bladder weight (SBW), body thickness (BT), body height (BH), body length/body height ratio (LHR), and gonad weight index (GWI). The heritability estimates ranged from 0.309 to 0.843. We evaluated the prediction performance of various statistical methods, and no single method provided the highest predictive ability for all traits. We then compared the use of genome-wide association study (GWAS)-informative SNPs and random SNPs for prediction and found that GWAS-informative SNPs clearly increased predictive ability. Only 5 and 100 informative SNPs were needed for LHR and BT, respectively, to achieve almost the same predictive abilities as using genome-wide SNPs; for BL, SBI, SBW, BH, and GWI, about 1000 to 3000 informative SNPs were needed to achieve whole-genome-level predictive abilities. The results suggest that, for some traits, breeders can use fewer SNPs to reduce the cost of genomic selection.

Genome-wide association study reveals genetic link between diarrhea-associated Entamoeba histolytica infection and inflammatory bowel disease

10.1101/137448 ◽

2017 ◽

Cited By ~ 2

Author(s):

Genevieve L Wojcik ◽

Chelsea Marie ◽

Mayuresh M Abhyankar ◽

Nobuya Yoshida ◽

Koji Watanabe ◽

...

Keyword(s):

Inflammatory Bowel Disease ◽

Association Study ◽

Entamoeba Histolytica ◽

Bowel Disease ◽

Fixed Effects ◽

Genome Wide Association Study ◽

Genome Wide Association ◽

Genome Wide ◽

A Genome ◽

Inflammatory Bowel

Diarrhea is the second leading cause of death for children globally, causing 760,000 deaths each year in children under the age of 5. Amoebic dysentery contributes significantly to this burden, especially in developing countries. We hypothesize that genetic variation contributes to susceptibility to diarrhea-associated Entamoeba histolytica infection in Bangladeshi infants; thus, we conducted a genome-wide association study (GWAS) in two independent birth cohorts of diarrhea-associated E. histolytica infection. Cases were defined as children with at least one diarrheal episode positive for E. histolytica through either PCR or ELISA within the first year of life. Controls were children without any episodes positive for E. histolytica in the same time frame. Meta-analyses under a fixed-effects inverse variance weighting model identified variants in two neighboring genes on chromosome 10, CUL2 (cullin 2) and CREM (cAMP responsive element modulator), associated with E. histolytica infection, with SNP rs58000832 achieving genome-wide significance (P_meta = 4.2 × 10⁻¹⁰). Each additional risk allele (an intergenic insertion between CREM and CCNY) of rs58000832 conferred 2.5-fold increased odds of a diarrhea-associated E. histolytica infection. The most associated SNP within a gene was in an intron of CREM (rs58468685, P_meta = 2.3 × 10⁻⁹), which, with CUL2, has been implicated as a susceptibility locus for Inflammatory Bowel Disease (IBD) and Crohn’s Disease. Gene expression resources suggest these loci are related to the higher expression of CREM, but not CUL2. Increased CREM expression is also observed in early E. histolytica infection. Further, CREM-/- mice were more susceptible to E. histolytica amebic colitis. These genetic associations reinforce the pathological similarities observed in gut inflammation between E. histolytica infection and IBD.

Discovery Of 42 Genome-Wide Significant Loci Associated With Dyslexia

10.1101/2021.08.20.21262334 ◽

2021 ◽

Author(s):

Catherine Doust ◽

Pierre Fontanillas ◽

Else Eising ◽

Scott D Gordon ◽

Zhengjun Wang ◽

...

Keyword(s):

Genome Wide Association Study ◽

Fold Increase ◽

European Ancestry ◽

Modern Life ◽

Genetic Covariance ◽

Hyperactivity Disorder ◽

Genome Wide ◽

A Genome ◽

Polygenic Scores ◽

Study Power

Reading and writing are crucial for many aspects of modern life but up to 1 in 10 children are affected by dyslexia, which can persist into adulthood. Family studies of dyslexia suggest heritability up to 70%, yet no convincing genetic markers have been found due to limited study power. Here, we present a genome-wide association study representing a 20-fold increase in sample size from prior work, with 51,800 adults self-reporting a dyslexia diagnosis and 1,087,070 controls. We identified 42 independent genome-wide significant loci: 17 are in genes linked to or pleiotropic with cognitive ability/educational attainment; 25 are novel and may be more specifically associated with dyslexia. Twenty-three loci (12 novel) were validated in independent cohorts of Chinese and European ancestry. We confirmed a similar genetic aetiology of dyslexia between sexes, and found genetic covariance with many traits, including ambidexterity, but not neuroanatomical measures of language-related circuitry. Causal analyses revealed a directional effect of dyslexia on attention deficit hyperactivity disorder and bidirectional effects on socio-educational traits but these relationships require further investigation. Dyslexia polygenic scores explained up to 6% of variance in reading traits in independent cohorts, and might in future enable earlier identification and remediation of dyslexia.
