A unifying framework for summary statistic imputation

2018 ◽  
Author(s):  
Yue Wu ◽  
Eleazar Eskin ◽  
Sriram Sankararaman

Abstract Imputation has been widely used to aid and interpret the results of Genome-Wide Association Studies (GWAS). Imputation can increase the power to identify associations when the causal variant was not directly observed or typed in the GWAS. There are two broad classes of methods for imputation. The first class imputes the genotypes at the untyped variants given the genotypes at the typed variants and then performs a statistical test of association at the imputed variants. The second class of methods, summary statistic imputation, directly imputes the association statistics at the untyped variants given the association statistics observed at the typed variants. This second class is appealing because it tends to be computationally efficient and requires only the summary statistics from a study, whereas the first class requires access to individual-level data that can be difficult to obtain. The statistical properties of these two classes of imputation methods have not been fully understood. In this paper, we show that the two classes of imputation methods are equivalent, i.e., have identical asymptotic multivariate normal distributions with zero mean and minor variations in the covariance matrix, under some reasonable assumptions. Using this equivalence, we can understand the effect of imputation methods on power. We show that a commonly employed modification of summary statistic imputation that we term summary statistic imputation with variance re-weighting generally leads to a loss in power. On the other hand, our proposed method, summary statistic imputation without performing variance re-weighting, fully accounts for imputation uncertainty while achieving better power.
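The idea of imputing association statistics directly can be illustrated with a minimal sketch. It assumes the standard conditional-mean form under a multivariate normal model, in which z-scores at untyped variants are predicted from typed z-scores and a linkage disequilibrium (LD) correlation matrix taken from a reference panel. The function name `impute_z`, the `ridge` regularisation term, and the toy inputs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def impute_z(z_typed, ld_tt, ld_ut, ridge=1e-3):
    """Conditional-mean imputation of z-scores at untyped variants.

    z_typed : z-scores at the typed variants, shape (t,)
    ld_tt   : LD correlation among typed variants, shape (t, t)
    ld_ut   : LD correlation between untyped and typed variants, shape (u, t)
    ridge   : hypothetical regularisation added to the diagonal of ld_tt
    """
    z_typed = np.asarray(z_typed, dtype=float)
    ld_tt = np.asarray(ld_tt, dtype=float)
    ld_ut = np.asarray(ld_ut, dtype=float)

    # Regularise the typed-variant LD matrix so the linear solve is stable.
    reg = ld_tt + ridge * np.eye(ld_tt.shape[0])

    # weights = ld_ut @ inv(reg); reg is symmetric, so solve and transpose.
    weights = np.linalg.solve(reg, ld_ut.T).T

    # Imputed z-scores are a linear combination of the typed z-scores.
    z_imputed = weights @ z_typed

    # Row-wise diagonal of weights @ ld_ut.T: the share of each untyped
    # variant's variance that the typed variants predict.
    r2_pred = np.einsum("ij,ij->i", weights, ld_ut)
    return z_imputed, r2_pred

# Toy example: one typed variant, one untyped variant, LD correlation 0.8.
z_hat, r2 = impute_z([2.0], [[1.0]], [[0.8]], ridge=0.0)
print(z_hat, r2)
```

In this sketch `r2_pred` is returned separately and the imputed z-scores are left unscaled. Dividing `z_imputed` by `sqrt(r2_pred)` is one plausible form of the variance re-weighting the abstract refers to, but that reading is an assumption made here rather than something the abstract specifies.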

2018 ◽  
Vol 49 (13) ◽  
pp. 2197-2205 ◽  
Author(s):  
Hannah M. Sallis ◽  
George Davey Smith ◽  
Marcus R. Munafò

Abstract Background: Despite the well-documented association between smoking and personality traits such as neuroticism and extraversion, little is known about the potential causal nature of these findings. If it were possible to unpick the association between personality and smoking, it may be possible to develop tailored smoking interventions that could lead to both improved uptake and efficacy. Methods: Recent genome-wide association studies (GWAS) have identified variants robustly associated with both smoking phenotypes and personality traits. Here we use publicly available GWAS summary statistics in addition to individual-level data from UK Biobank to investigate the link between smoking and personality. We first estimate genetic overlap between traits using LD score regression and then use bidirectional Mendelian randomisation methods to unpick the nature of this relationship. Results: We found clear evidence of a modest genetic correlation between smoking behaviours and both neuroticism and extraversion. We found some evidence that personality traits are causally linked to certain smoking phenotypes: among current smokers each additional neuroticism risk allele was associated with smoking an additional 0.07 cigarettes per day (95% CI 0.02–0.12, p = 0.009), and each additional extraversion effect allele was associated with an elevated odds of smoking initiation (OR 1.015, 95% CI 1.01–1.02, p = 9.6 × 10⁻⁷). Conclusion: We found some evidence for specific causal pathways from personality to smoking phenotypes, and weaker evidence of an association from smoking initiation to personality. These findings could be used to inform future smoking interventions or to tailor existing schemes.


2016 ◽  
Author(s):  
Xiang Zhu ◽  
Matthew Stephens

Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously-proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise than, previous results using subsets of these data. We also identify many previously-unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss.


eLife ◽  
2018 ◽  
Vol 7 ◽  
Author(s):  
Gibran Hemani ◽  
Jie Zheng ◽  
Benjamin Elsworth ◽  
Kaitlin H Wade ◽  
Valeriia Haberland ◽  
...  

Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes, using a strategy known as 2-sample Mendelian randomization (2SMR) and bypassing the need for individual-level data. However, 2SMR methods are evolving rapidly and GWAS results are often insufficiently curated, undermining efficient implementation of the approach. We therefore developed MR-Base (http://www.mrbase.org): a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR. The software includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions. The database currently comprises 11 billion single nucleotide polymorphism-trait associations from 1673 GWAS and is updated on a regular basis. Integrating data with software ensures more rigorous application of hypothesis-driven analyses and allows millions of potential causal relationships to be efficiently evaluated in phenome-wide association studies.


2021 ◽  
pp. annrheumdis-2019-216794
Author(s):  
Akari Suzuki ◽  
Matteo Maurizio Guerrini ◽  
Kazuhiko Yamamoto

For more than a decade, genome-wide association studies have been applied to autoimmune diseases and have expanded our understanding of their pathogenesis. Genetic risk factors associated with diseases and traits are essentially causative. However, elucidating the biological mechanism of disease from genetic factors is challenging. In fact, it is difficult to identify the causal variant among multiple variants located on the same haplotype or linkage disequilibrium block, and thus the responsible biological genes remain elusive. Recently, multiple studies have revealed that the majority of risk variants are located in the non-coding region of the genome and are most likely to regulate gene expression, for example as quantitative trait loci. Enhancers, promoters and long non-coding RNAs appear to be the main target mechanisms of the risk variants. In this review, we discuss how functional genetics can address these puzzles.


2020 ◽  
Vol 117 (21) ◽  
pp. 11608-11613 ◽  
Author(s):  
Marcelo Blatt ◽  
Alexander Gusev ◽  
Yuriy Polyakov ◽  
Shafi Goldwasser

Genome-wide association studies (GWASs) seek to identify genetic variants associated with a trait, and have been a powerful approach for understanding complex diseases. A critical challenge for GWASs has been the dependence on individual-level data that typically have strict privacy requirements, creating an urgent need for methods that preserve the individual-level privacy of participants. Here, we present a privacy-preserving framework based on several advances in homomorphic encryption and demonstrate that it can perform an accurate GWAS analysis for a real dataset of more than 25,000 individuals, keeping all individual data encrypted and requiring no user interactions. Our extrapolations show that it can evaluate GWASs of 100,000 individuals and 500,000 single-nucleotide polymorphisms (SNPs) in 5.6 h on a single server node (or in 11 min on 31 server nodes running in parallel). Our performance results are more than one order of magnitude faster than prior state-of-the-art results using secure multiparty computation, which requires continuous user interactions, with the accuracy of both solutions being similar. Our homomorphic encryption advances can also be applied to other domains where large-scale statistical analyses over encrypted data are needed.


2019 ◽  
Vol 28 (19) ◽  
pp. 3244-3254 ◽  
Author(s):  
Sarah Jinn ◽  
Cornelis Blauwendraat ◽  
Dawn Toolan ◽  
Cheryl A Gretzula ◽  
Robert E Drolet ◽  
...  

Abstract Multiple genome-wide association studies (GWAS) in Parkinson disease (PD) have identified a signal at chromosome 4p16.3; however, the causal variant has not been established for this locus. Deep investigation of the region resulted in one identified variant, the rs34311866 missense SNP (p.M393T) in TMEM175, which is 20 orders of magnitude more significant than any other SNP in the region. Because TMEM175 is a lysosomal gene that has been shown to influence α-synuclein phosphorylation and autophagy, the p.M393T variant is an attractive candidate, and we have examined its effect on TMEM175 protein and PD-related biology. After knocking down each of the genes located under the GWAS peak via multiple shRNAs, only TMEM175 was found to consistently influence accumulation of phosphorylated α-synuclein (p-α-syn). Examination of the p.M393T variant showed effects on TMEM175 function that were intermediate between the wild-type (WT) and knockout phenotypes, with reduced regulation of lysosomal pH in response to starvation and minor changes in clearance of autophagy substrates, reduced lysosomal localization, and increased accumulation of p-α-syn. Finally, overexpression of WT TMEM175 protein reduced p-α-syn, while overexpression of the p.M393T variant resulted in no change in α-synuclein phosphorylation. These results suggest that the main signal in the chromosome 4p16.3 PD risk locus is driven by the TMEM175 p.M393T variant. Modulation of TMEM175 may impact α-synuclein biology and therefore may be a rational therapeutic strategy for PD.


2019 ◽  
Author(s):  
Marios Arvanitis ◽  
Yanxiao Zhang ◽  
Wei Wang ◽  
Adam Auton ◽  
Ali Keramati ◽  
...  

Abstract Heart failure is a major medical and economic burden in the healthcare system affecting over 23 million people worldwide. Although recent pedigree studies estimate heart failure heritability around 26%, genome-wide association studies (GWAS) have had limited success in explaining disease pathogenesis. We conducted the largest meta-analysis of heart failure GWAS to date and replicated our findings in a comparably sized cohort to identify one known and two novel variants associated with heart failure. Leveraging heart failure sub-phenotyping and fine-mapping, we reveal a putative causal variant found in a cardiac muscle specific regulatory region that binds to the ACTN2 cardiac sarcolemmal gene and affects left ventricular adverse remodeling and clinical heart failure in response to different initial cardiac muscle insults. Via genetic correlation, we show evidence of broadly shared heritability between heart failure and multiple musculoskeletal traits. Our findings extend our understanding of biological mechanisms underlying heart failure.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Min Zhang ◽  
Jing Chen ◽  
Zhiqun Yin ◽  
Lanbing Wang ◽  
Lihua Peng

Abstract Observational studies suggested a bidirectional correlation between depression and metabolic syndrome (MetS) and its components. However, the causal associations between them remained unclear. We aimed to investigate whether genetically predicted depression is related to the risk of MetS and its components, and vice versa. We performed a bidirectional two-sample Mendelian randomization (MR) study using summary-level data from the most comprehensive genome-wide association studies (GWAS) of depression (n = 2,113,907), MetS (n = 291,107), waist circumference (n = 462,166), hypertension (n = 463,010), fasting blood glucose (FBG, n = 281,416), triglycerides (n = 441,016), and high-density lipoprotein cholesterol (HDL-C, n = 403,943). The random-effects inverse-variance weighted (IVW) method was applied as the primary method. The results identified that genetically predicted depression was significantly positively associated with risk of MetS (OR: 1.224, 95% CI: 1.091–1.374, p = 5.58 × 10⁻⁴), waist circumference (OR: 1.083, 95% CI: 1.027–1.143, p = 0.003), hypertension (OR: 1.028, 95% CI: 1.016–1.039, p = 1.34 × 10⁻⁶) and triglycerides (OR: 1.111, 95% CI: 1.060–1.163, p = 9.35 × 10⁻⁶), and negatively associated with HDL-C (OR: 0.932, 95% CI: 0.885–0.981, p = 0.007), but was not associated with FBG (OR: 1.010, 95% CI: 0.986–1.034, p = 1.34). No causal relationships were identified for MetS and its components on depression risk. The present MR analysis strengthens the evidence that depression is a risk factor for MetS and its components (waist circumference, hypertension, FBG, triglycerides, and HDL-C). Early diagnosis and prevention of depression are crucial in the management of MetS and its components.
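As a rough illustration of the inverse-variance weighted (IVW) estimator named in this abstract, the sketch below combines per-variant effects on the exposure and the outcome into one causal estimate. It assumes the common formulation with first-order weights 1/se_y² and a multiplicative random-effects correction based on Cochran's Q. The function name `ivw_estimate` and the toy numbers are hypothetical and are not taken from the study.

```python
import math

def ivw_estimate(beta_x, beta_y, se_y):
    """IVW causal estimate from per-variant summary statistics.

    beta_x : variant effects on the exposure
    beta_y : variant effects on the outcome
    se_y   : standard errors of the outcome effects
    Returns (estimate, standard error under multiplicative random effects).
    """
    # First-order weights: inverse variance of the outcome associations.
    weights = [1.0 / s ** 2 for s in se_y]

    # Weighted regression of beta_y on beta_x through the origin.
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_x, beta_y))
    den = sum(w * bx * bx for w, bx in zip(weights, beta_x))
    beta = num / den
    se_fixed = math.sqrt(1.0 / den)

    # Cochran's Q measures heterogeneity of the per-variant ratio estimates.
    q = sum(w * (by - beta * bx) ** 2
            for w, bx, by in zip(weights, beta_x, beta_y))
    k = len(beta_x)

    # Multiplicative random effects: inflate the standard error when Q
    # exceeds its expectation (k - 1); never shrink below the fixed-effect SE.
    phi = max(1.0, q / (k - 1)) if k > 1 else 1.0
    return beta, se_fixed * math.sqrt(phi)

# Toy data in which every variant implies the same causal effect of 0.5.
est, se = ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.01, 0.01])
print(est, se)
```

For a binary outcome the estimate is on the log-odds scale, so exponentiating it gives an odds ratio of the kind reported in the abstract.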


2017 ◽  
Author(s):  
Jose A. Lozano ◽  
Farhad Hormozdiari ◽  
Jong Wha (Joanne) Joo ◽  
Buhm Han ◽  
Eleazar Eskin

Abstract Genome-wide association studies (GWAS) have discovered thousands of variants involved in common human diseases. In these studies, frequencies of genetic variants are compared between a cohort of individuals with a disease (cases) and a cohort of healthy individuals (controls). Any variant that has a significantly different frequency between the two cohorts is considered an associated variant. A challenge in the analysis of GWAS is the fact that human population history causes nearby genetic variants in the genome to be correlated with each other. In this review, we demonstrate how to utilize the multivariate normal (MVN) distribution to explicitly take into account the correlation between genetic variants in a comprehensive framework for analysis of GWAS. We show how the MVN framework can be applied to perform association testing, correct for multiple hypothesis testing, estimate statistical power, and perform fine mapping and imputation.
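One of the uses listed here, estimating statistical power, can be sketched under a simple normal model. The sketch assumes that the association z-score at a causal variant is normally distributed with unit variance and mean equal to a non-centrality parameter, and that a tag variant in LD with correlation r has that mean scaled by r. The function names and the default genome-wide threshold of 5e-8 are illustrative assumptions, not taken from the review.

```python
from statistics import NormalDist

def power_two_sided(ncp, alpha=5e-8):
    """Power of a two-sided z-test when the z-score has mean `ncp`."""
    std = NormalDist()
    # Critical value for the chosen significance threshold.
    z_crit = std.inv_cdf(1 - alpha / 2)
    # Probability that the z-score falls in either rejection tail.
    return std.cdf(-z_crit + ncp) + std.cdf(-z_crit - ncp)

def tag_power(ncp_causal, r, alpha=5e-8):
    """Power at a tag variant correlated with the causal variant.

    Under the assumed model the tag's z-score has mean r * ncp_causal.
    """
    return power_two_sided(r * ncp_causal, alpha)

# Power drops as LD between the tag and the causal variant weakens.
for r in (1.0, 0.8, 0.5):
    print(r, tag_power(6.0, r))
```

With a non-centrality parameter of zero the function returns the significance threshold itself, which is a quick sanity check on the tail calculation.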

