RAISS: robust and accurate imputation from summary statistics

2019 ◽  
Vol 35 (22) ◽  
pp. 4837-4839 ◽  
Author(s):  
Hanna Julienne ◽  
Huwenbo Shi ◽  
Bogdan Pasaniuc ◽  
Hugues Aschard

Abstract Motivation Multi-trait analyses using public summary statistics from genome-wide association studies (GWASs) are becoming increasingly popular. A constraint of multi-trait methods is that they require complete summary data for all traits. Although methods for the imputation of summary statistics exist, they lack precision for genetic variants with small effect sizes. This is benign for univariate analyses, where only variants with large effect sizes are selected a posteriori. However, it can lead to strong p-value inflation in multi-trait testing. Here we present a new approach that improves on existing imputation methods and reaches a precision suitable for multi-trait analyses. Results We fine-tuned parameters to obtain very high-accuracy imputation from summary statistics. We demonstrate this accuracy for variants of all effect sizes on real data from 28 GWASs. We implemented the resulting methodology in a Python package specially designed to efficiently impute multiple GWASs in parallel. Availability and implementation The Python package is available at https://gitlab.pasteur.fr/statistical-genetics/raiss; its accompanying documentation is accessible at http://statistical-genetics.pages.pasteur.fr/raiss/. Supplementary information Supplementary data are available at Bioinformatics online.
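As a rough illustration of what imputation from summary statistics involves, the sketch below applies the generic conditional-Gaussian estimator: the z-score of an untyped variant is predicted from the z-scores of nearby typed variants and their linkage disequilibrium (LD) correlations. This is not the RAISS package's API. The function names, the `ridge` term, and its default value are assumptions made for illustration, and the abstract does not say which parameters the authors tuned. The code is kept dependency-free; a practical implementation would more likely use a linear-algebra library.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Build an augmented matrix copy so the inputs are not modified.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; increase the ridge term")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def impute_z(z_typed, ld_typed, ld_cross, ridge=0.1):
    """Impute the z-score of one untyped variant (illustrative only).

    z_typed  : z-scores of the typed variants
    ld_typed : LD correlation matrix among the typed variants
    ld_cross : LD correlation of the untyped variant with each typed variant
    ridge    : hypothetical value added to the diagonal for stability
    """
    n = len(z_typed)
    regularised = [
        [ld_typed[i][j] + (ridge if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    # The weights solve (ld_typed + ridge * I) w = ld_cross.
    weights = solve_linear(regularised, ld_cross)
    z_imputed = sum(w * z for w, z in zip(weights, z_typed))
    # Share of the untyped variant's variance captured by the typed variants.
    r2 = sum(w * c for w, c in zip(weights, ld_cross))
    return z_imputed, r2
```

With `ridge=0.0`, an untyped variant in perfect LD with a typed variant simply inherits that variant's z-score. A positive `ridge` shrinks the imputed value towards zero, which is one common way to stabilise the inversion when the LD matrix is noisy.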

2018 ◽  
Author(s):  
Hanna Julienne ◽  
Huwenbo Shi ◽  
Bogdan Pasaniuc ◽  
Hugues Aschard

Abstract Motivation Multi-trait analyses using public summary statistics from genome-wide association studies (GWAS) are becoming increasingly popular. A constraint of multi-trait methods is that they require complete summary data for all traits. While methods for the imputation of summary statistics exist, they lack precision for genetic variants with small effect sizes. This is benign for univariate analyses, where only variants with large effect sizes are selected a posteriori. However, it can lead to strong p-value inflation in multi-trait testing. Here we present a new approach that improves on existing imputation methods and reaches a precision suitable for multi-trait analyses. Results We fine-tuned parameters to obtain very high-accuracy imputation from summary statistics. We demonstrate this accuracy for variants with small effect sizes on real data from 28 GWAS. We implemented the resulting methodology in a Python package specially designed to efficiently impute multiple GWAS in parallel. Availability The Python package is available at https://gitlab.pasteur.fr/statistical-genetics/raiss; its accompanying documentation is accessible at http://statistical-genetics.pages.pasteur.fr/raiss/.


Author(s):  
Jack W. O’Sullivan ◽  
John P. A. Ioannidis

Abstract With the establishment of large biobanks, the discovery of single nucleotide polymorphisms (SNPs) that are associated with various phenotypes has accelerated. An open question is whether SNPs identified with genome-wide significance in earlier genome-wide association studies (GWAS) are also replicated in later GWAS conducted in biobanks. To address this question, the authors examined a publicly available GWAS database and identified two independent GWAS on the same phenotype (an earlier, “discovery” GWAS and a later, “replication” GWAS done in the UK Biobank). The analysis evaluated 136,318,924 SNPs (of which 6,289 had reached p<5e-8 in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0%, and it was lower for binary than for quantitative phenotypes (58.1% versus 94.8%, respectively). There was an 18.0% decrease in SNP effect size for binary phenotypes, but a 12.0% increase for quantitative phenotypes. Using the discovery SNP effect size, phenotype type (binary or quantitative), and discovery p-value, we built and validated a model that predicted SNP replication with an area under the receiver operating characteristic curve of 0.90. While non-replication may often reflect lack of power rather than genuine false-positive findings, these results provide insights about which discovered associations are likely to be seen again in subsequent GWAS.


2018 ◽  
Author(s):  
Corbin Quick ◽  
Christian Fuchsberger ◽  
Daniel Taliun ◽  
Gonçalo Abecasis ◽  
Michael Boehnke ◽  
...  

Abstract Summary Estimating linkage disequilibrium (LD) is essential for a wide range of summary statistics-based association methods for genome-wide association studies (GWAS). Large genetic data sets, e.g. the TOPMed WGS project and UK Biobank, enable more accurate and comprehensive LD estimates, but increase the computational burden of LD estimation. Here, we describe emeraLD (Efficient Methods for Estimation and Random Access of LD), a computational tool that leverages sparsity and haplotype structure to estimate LD orders of magnitude faster than existing tools. Availability and Implementation emeraLD is implemented in C++, and is open source under GPLv3. Source code, documentation, an R interface, and utilities for analysis of summary statistics are freely available at http://github.com/statgen/… Supplementary information Supplementary data are available at Bioinformatics online.
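To make the role of sparsity concrete, here is a minimal sketch of computing the LD correlation r between two biallelic variants when each variant is stored only as the set of haplotypes that carry its minor allele. It is an illustration of the general idea, not emeraLD's actual C++ implementation or interface; the function name and the data layout are assumptions.

```python
import math


def ld_r(carriers_a, carriers_b, n_haplotypes):
    """Correlation r between two biallelic variants (illustrative only).

    carriers_a, carriers_b : indices of haplotypes carrying the minor allele
    n_haplotypes           : total number of haplotypes in the panel
    """
    a, b = set(carriers_a), set(carriers_b)
    p_a = len(a) / n_haplotypes
    p_b = len(b) / n_haplotypes
    # Only the carrier sets are touched, so rare variants are cheap to compare.
    p_ab = len(a & b) / n_haplotypes
    d = p_ab - p_a * p_b  # classical LD coefficient D
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    if denom == 0:
        return float("nan")  # at least one variant is monomorphic
    return d / denom
```

For example, `ld_r([0, 1], [0, 1], 4)` describes two variants carried by exactly the same haplotypes, and `ld_r([0, 1], [2, 3], 4)` describes two variants that never occur together. The cost depends on the number of minor-allele carriers rather than on the panel size.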


Author(s):  
Lars G. Fritsche ◽  
Snehal Patil ◽  
Lauren J. Beesley ◽  
Peter VandeHaar ◽  
Maxwell Salvatore ◽  
...  

Abstract To facilitate scientific collaboration on polygenic risk score (PRS) research, we created an extensive PRS online repository for 49 common cancer traits integrating freely available genome-wide association study (GWAS) summary statistics from three sources: published GWAS, the NHGRI-EBI GWAS Catalog, and UK Biobank-based GWAS. Our framework condenses these summary statistics into PRS using various approaches such as linkage disequilibrium pruning / p-value thresholding (fixed or data-adaptively optimized thresholds) and penalized, genome-wide effect size weighting. We evaluated the PRS in two biobanks: the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort at Michigan Medicine, and the population-based UK Biobank (UKB). For each PRS construct, we provide measures on predictive performance, calibration, and discrimination. Besides PRS evaluation, the Cancer-PRSweb platform features construct downloads and phenome-wide PRS association study results (PRS-PheWAS) for predictive PRS. We expect this integrated platform to accelerate PRS-related cancer research.
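The p-value thresholding step mentioned above can be sketched as follows: keep the variants whose GWAS p-value falls below a fixed cutoff, then score each person as the effect-size-weighted sum of their allele dosages. This is a simplified illustration, not the Cancer-PRSweb pipeline. The function name and the default threshold are assumptions, and LD pruning and penalized weighting are deliberately left out.

```python
def threshold_prs(dosages, betas, p_values, p_threshold=5e-8):
    """Polygenic risk scores by fixed p-value thresholding (illustrative only).

    dosages     : one list per person with the allele dosage (0..2) per variant
    betas       : GWAS effect size per variant
    p_values    : GWAS p-value per variant
    p_threshold : hypothetical cutoff; variants with p below it are kept
    """
    keep = [j for j, p in enumerate(p_values) if p < p_threshold]
    return [sum(betas[j] * person[j] for j in keep) for person in dosages]


# Two people, three variants; the second variant fails the threshold.
scores = threshold_prs(
    dosages=[[0, 1, 2], [2, 0, 1]],
    betas=[0.1, -0.2, 0.3],
    p_values=[1e-9, 0.5, 1e-10],
)
```

A data-adaptive variant of this idea would evaluate several thresholds on a validation set and keep the best-performing one.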


Author(s):  
Xiaofeng Zhu ◽  
Xiaoyin Li ◽  
Rong Xu ◽  
Tao Wang

Abstract Motivation The overall association evidence of a genetic variant with multiple traits can be evaluated by cross-phenotype association analysis using summary statistics from genome-wide association studies. Further dissecting the association pathways from a variant to multiple traits is important to understand the biological causal relationships among complex traits. Results Here, we introduce a flexible and computationally efficient Iterative Mendelian Randomization and Pleiotropy (IMRP) approach to simultaneously search for horizontal pleiotropic variants and estimate causal effects. Extensive simulations and real data applications suggest that IMRP has similar or better performance than existing Mendelian Randomization methods for both causal effect estimation and pleiotropic variant detection. The developed pleiotropy test is further extended to detect colocalization for multiple variants at a locus. IMRP will greatly facilitate our understanding of causal relationships underlying complex traits, in particular when a large number of genetic instrumental variables are used for evaluating multiple traits. Availability and implementation The software IMRP is available at https://github.com/XiaofengZhuCase/IMRP. The simulation codes can be downloaded at http://hal.case.edu/~xxz10/zhu-web/ under the link: MR Simulations software. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Jiangming Sun ◽  
Yunpeng Wang

Abstract Summary Post-GWAS studies using results from large consortium meta-analyses often need to account correctly for overlapping samples. The gold-standard approach for resolving this issue is to rerun the GWAS or meta-analysis excluding the overlapping participants. However, such an approach is time-consuming and sometimes restricted by the available data. deMeta provides a user-friendly and computationally efficient command-line implementation for removing the effect of a contributing sub-study from a consortium's meta-analysis results. Only the summary statistics of the meta-analysis and of the sub-study to be removed are required. In addition, deMeta can generate contrasting Manhattan and quantile-quantile plots so that users can visualize the impact of the sub-study on the meta-analysis results. Availability and Implementation The Python source code, examples and documentation of deMeta are publicly available at https://github.com/Computational-NeuroGenetics/… Contact J. Sun; Y. Wang. Supplementary information None.
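The abstract does not state which meta-analysis model deMeta inverts. As a minimal sketch, assuming a fixed-effects inverse-variance-weighted (IVW) meta-analysis, a sub-study can be subtracted algebraically from the pooled estimate for a single variant. The function names below are hypothetical and are not deMeta's command-line interface.

```python
import math


def ivw_meta(estimates):
    """Fixed-effects IVW meta-analysis of (beta, se) pairs for one variant."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    return beta, math.sqrt(1.0 / total)


def remove_substudy(beta_meta, se_meta, beta_sub, se_sub):
    """Subtract one sub-study from an IVW meta-analysis result.

    Assumes the pooled estimate was produced by fixed-effects IVW and that
    the sub-study contributed with the given beta and standard error.
    """
    w_meta = 1.0 / se_meta ** 2
    w_sub = 1.0 / se_sub ** 2
    w_rest = w_meta - w_sub
    if w_rest <= 0:
        raise ValueError("sub-study is at least as precise as the meta-analysis")
    beta_rest = (w_meta * beta_meta - w_sub * beta_sub) / w_rest
    return beta_rest, math.sqrt(1.0 / w_rest)


# Pool two studies, then remove the second one to recover the first.
study_a, study_b = (0.2, 0.05), (0.1, 0.1)
beta_meta, se_meta = ivw_meta([study_a, study_b])
beta_rest, se_rest = remove_substudy(beta_meta, se_meta, *study_b)
```

Under these assumptions the subtraction is exact, so `beta_rest` and `se_rest` match the first study's estimate up to floating-point error. Meta-analyses that apply genomic control or random-effects models would need a different correction.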


2020 ◽  
Vol 36 (9) ◽  
pp. 2763-2769
Author(s):  
Jie-Huei Wang ◽  
Yi-Hau Chen

Abstract Motivation In gene expression and genome-wide association studies, the identification of interaction effects is an important and challenging issue owing to its ultrahigh-dimensional nature. In particular, contaminated data and right-censored survival outcomes make the associated feature screening even more challenging. Results In this article, we propose an inverse probability-of-censoring weighted Kendall’s tau statistic to measure the association of a survival trait with biomarkers, as well as a Kendall’s partial correlation statistic to measure the relationship of a survival trait with an interaction variable conditional on the main effects. The Kendall’s partial correlation is then used to conduct interaction screening. Simulation studies under various scenarios are performed to compare the performance of our proposal with some commonly available methods. In the real data application, we use our proposed method to identify epistasis associated with the clinical survival outcomes of non-small-cell lung cancer, diffuse large B-cell lymphoma and lung adenocarcinoma patients. Both simulation and real data studies demonstrate that our method performs well and outperforms existing methods in identifying main and interaction biomarkers. Availability and implementation The R package ‘IPCWK’ is available to implement this method, together with a reference manual describing how to use the package. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Jack W. O’Sullivan ◽  
John P. A. Ioannidis

Abstract With the establishment of large biobanks, the discovery of single nucleotide polymorphisms (SNPs) associated with various phenotypes has accelerated. An open question is whether genome-wide significant SNPs identified in earlier genome-wide association studies (GWAS) are replicated in later GWAS conducted in biobanks. To address this, we examined a publicly available GWAS database and identified two independent GWAS on the same phenotype (an earlier, “discovery” GWAS and a later, “replication” GWAS done in the UK Biobank). The analysis evaluated 136,318,924 SNPs (of which 6,289 reached p<5e-8 in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0%, although it was lower for binary than for quantitative phenotypes (58.1% versus 94.8%, respectively). There was an 18.0% decrease in SNP effect size for binary phenotypes, but a 12.0% increase for quantitative phenotypes. Using the discovery SNP effect size, phenotype type (binary or quantitative), and discovery p-value, we built and validated a model that predicted SNP replication with an area under the receiver operating characteristic curve of 0.90. While non-replication may reflect lack of power rather than genuine false positives, these results provide insights about which discovered associations are likely to be replicated in subsequent GWAS.

