gwasrapidd: an R package to query, download and wrangle GWAS Catalog data

Abstract Motivation The NHGRI Catalog of Published Genome-Wide Association Studies (GWAS) Catalog has collected, curated, and made available data from over 3 900 studies. The recently developed GWAS Catalog REST API is the only method allowing programmatic access to this resource. Results Here, we describe gwasrapidd, an R package that provides a client interface to the GWAS Catalog REST API, representing an important software counterpart to the server-side component. gwasrapidd enables users to quickly retrieve, filter and integrate data with comprehensive bioinformatics analysis tools, which is particularly critical for those looking into functional characterisation of risk loci. Availability gwasrapidd is freely available under an MIT License, and can be accessed from https://github.com/ramiromagno/gwasrapidd.

gwasrapidd: an R package to query, download and wrangle GWAS catalog data

Bioinformatics · 10.1093/bioinformatics/btz605 · 2019 · Cited By ~ 1

Author(s): Ramiro Magno, Ana-Teresa Maia

Abstract Motivation The National Human Genome Research Institute Catalog of Published Genome-Wide Association Studies (GWAS) Catalog has collected, curated and made available data from over 7100 studies. The recently developed GWAS Catalog representational state transfer (REST) application programming interface (API) is the only method allowing programmatic access to this resource. Results Here, we describe gwasrapidd, an R package that provides the first client interface to the GWAS Catalog REST API, representing an important software counterpart to the server-side component. gwasrapidd enables users to quickly retrieve, filter and integrate data with comprehensive bioinformatics analysis tools, which is particularly critical for those looking into functional characterization of risk loci. Availability and implementation gwasrapidd is freely available under an MIT License, and can be accessed from https://github.com/ramiromagno/gwasrapidd. Supplementary information Supplementary data are available at Bioinformatics online.
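
gwasrapidd itself is an R package. As a language-neutral illustration of the kind of request it wraps, the Python sketch below composes GWAS Catalog REST API URLs and fetches JSON. Several details are assumptions to check against the current API documentation: the base URL `https://www.ebi.ac.uk/gwas/rest/api` and resource paths such as `studies/{accession}` and `singleNucleotidePolymorphisms/{rsId}`. The identifiers are examples only, and `build_url` and `fetch_json` are hypothetical helpers, not part of gwasrapidd.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the GWAS Catalog REST API; verify against current documentation.
BASE_URL = "https://www.ebi.ac.uk/gwas/rest/api"


def build_url(resource: str, identifier: str = "", **params: str) -> str:
    """Compose a REST URL such as <BASE_URL>/studies/<accession> (hypothetical helper)."""
    url = f"{BASE_URL}/{resource}"
    if identifier:
        url += "/" + urllib.parse.quote(identifier)
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """GET a URL and decode its JSON body (requires network access)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Example identifiers only: one study accession and one variant rsID.
    print(build_url("studies", "GCST000854"))
    print(build_url("singleNucleotidePolymorphisms", "rs7329174"))
```

Keeping URL construction separate from the network call means the query logic can be checked offline, and `fetch_json` is invoked only when data are actually needed.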

The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog)

Nucleic Acids Research · 10.1093/nar/gkw1133 · 2016 · Vol 45 (D1) · pp. D896-D901 · Cited By ~ 1132

Author(s): Jacqueline MacArthur, Emily Bowler, Maria Cerezo, Laurent Gil, Peggy Hall, ...

bGWAS: an R package to perform Bayesian genome wide association studies

Bioinformatics · 10.1093/bioinformatics/btaa549 · 2020 · Vol 36 (15) · pp. 4374-4376

Author(s): Ninon Mounier, Zoltán Kutalik

Abstract Summary Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs), and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes factors, posterior effects and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation The bGWAS package is freely available under a GPL-2 License and can be accessed, alongside user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information Supplementary data are available at Bioinformatics online.
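
The step of combining a prior effect with an observed effect can be illustrated with a generic normal-normal model. This is a minimal sketch, not bGWAS's actual implementation. It assumes the observed effect follows β̂ ~ N(β, se²) and the prior follows β ~ N(μ, τ²). The Bayes factor compares that prior (H1) against a true effect of exactly zero (H0). The function names are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_and_bf(beta_hat: float, se: float, prior_mean: float, prior_sd: float):
    """Conjugate normal update plus a Bayes factor for H1 (prior) vs H0 (beta = 0).

    Returns (posterior_mean, posterior_sd, bayes_factor).
    """
    obs_var, prior_var = se ** 2, prior_sd ** 2
    # Posterior precision is the sum of prior and data precisions.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    # Posterior mean is the precision-weighted average of prior mean and observed effect.
    post_mean = post_var * (prior_mean / prior_var + beta_hat / obs_var)
    # Marginal density of beta_hat: N(prior_mean, se^2 + tau^2) under H1, N(0, se^2) under H0.
    bf = normal_pdf(beta_hat, prior_mean, obs_var + prior_var) / normal_pdf(beta_hat, 0.0, obs_var)
    return post_mean, math.sqrt(post_var), bf
```

When the prior and the data are equally precise, the posterior mean lies halfway between them. A prior centred near the observed effect raises the Bayes factor relative to a flat null, which is one way an informative prior can increase power.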

Qtlizer: comprehensive QTL annotation of GWAS results

Scientific Reports · 10.1038/s41598-020-75770-7 · 2020 · Vol 10 (1)

Author(s): Matthias Munz, Inken Wohlers, Eric Simon, Tobias Reinberger, Hauke Busch, ...

Abstract Exploration of genetic variant-to-gene relationships by quantitative trait loci such as expression QTLs is a frequently used tool in genome-wide association studies. However, the wide range of public QTL databases and the lack of batch annotation features complicate a comprehensive annotation of GWAS results. In this work, we introduce the tool “Qtlizer” for annotating lists of variants in human with associated changes in gene expression and protein abundance using an integrated database of published QTLs. Features include incorporation of variants in linkage disequilibrium and reverse search by gene names. Analyzing the database for base pair distances between best significant eQTLs and their affected genes suggests that the commonly used cis-distance limit of 1,000,000 base pairs might be too restrictive, implying a substantial number of wrongly classified and as yet undetected eQTLs. We also ranked genes with respect to the maximum number of tissue-specific eQTL studies in which a most significant eQTL signal was consistent. For the top 100 genes we observed the strongest enrichment with housekeeping genes (P = 2 × 10⁻⁶) and with the 10% highest expressed genes (P = 0.005) after grouping eQTLs by r² > 0.95, underlining the relevance of LD information in eQTL analyses. Qtlizer can be accessed via https://genehopper.de/qtlizer or by using the respective Bioconductor R package (https://doi.org/10.18129/B9.bioc.Qtlizer).
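
The cis-distance limit discussed above amounts to a simple rule. An eQTL is called cis when the variant lies on the same chromosome as the gene and within a fixed number of base pairs of it. The minimal sketch below applies that rule, with the 1,000,000 bp default mirroring the limit the abstract questions. Several choices are assumptions for illustration: measuring distance to a transcription start site, the chromosome labels, and the `EQTL`/`classify_eqtl` names. Qtlizer's own cis/trans assignment is not described here.

```python
from dataclasses import dataclass

CIS_LIMIT_BP = 1_000_000  # commonly used cis window, which the abstract suggests may be too restrictive


@dataclass
class EQTL:
    variant_chrom: str
    variant_pos: int
    gene_chrom: str
    gene_tss: int  # assumed reference point on the gene: transcription start site


def classify_eqtl(e: EQTL, limit_bp: int = CIS_LIMIT_BP) -> str:
    """Return 'cis' if the variant is on the gene's chromosome within limit_bp, else 'trans'."""
    if e.variant_chrom != e.gene_chrom:
        return "trans"
    return "cis" if abs(e.variant_pos - e.gene_tss) <= limit_bp else "trans"
```

For example, a variant 1.5 Mb from its gene is labelled trans under the 1 Mb window but cis under a 2 Mb window. This is the kind of reclassification a wider limit would produce.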

Multi-SNP mediation intersection-union test

Bioinformatics · 10.1093/bioinformatics/btz285 · 2019 · Vol 35 (22) · pp. 4724-4729 · Cited By ~ 4

Author(s): Wujuan Zhong, Cassandra N Spracklen, Karen L Mohlke, Xiaojing Zheng, Jason Fine, ...

Abstract Summary Tens of thousands of reproducibly identified GWAS (Genome-Wide Association Studies) variants, with the vast majority falling in non-coding regions resulting in no eventual protein products, call urgently for mechanistic interpretations. Although numerous methods exist, there are few, if any, methods for simultaneously testing the mediation effects of multiple correlated SNPs via some mediator (e.g. the expression of a gene in the neighborhood) on phenotypic outcome. We propose the multi-SNP mediation intersection-union test (SMUT) to fill this methodological gap. Our extensive simulations demonstrate the validity of SMUT as well as substantial power gains, of up to 92%, over alternative methods. In addition, SMUT confirmed known mediators in a real dataset of Finns for plasma adiponectin level, which were missed by many alternative methods. We believe SMUT will become a useful tool to generate mechanistic hypotheses underlying GWAS variants, facilitating functional follow-up. Availability and implementation The R package SMUT is publicly available from CRAN at https://CRAN.R-project.org/package=SMUT. Supplementary information Supplementary data are available at Bioinformatics online.
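
The intersection-union principle behind SMUT can be stated compactly. Mediation requires both a SNPs-to-mediator effect (α) and a mediator-to-outcome effect (β). The null hypothesis is therefore the union "α = 0 or β = 0", and it is rejected only if both component nulls are rejected, so the IUT p-value is the larger of the two component p-values. The sketch below shows only this combining step; SMUT's own multi-SNP component tests are not reproduced, and the function names are hypothetical.

```python
def iut_pvalue(p_snps_to_mediator: float, p_mediator_to_outcome: float) -> float:
    """Intersection-union test p-value for H0: (alpha = 0) OR (beta = 0).

    Rejecting H0 at level a requires both component tests to reject at level a,
    which is equivalent to max(p1, p2) <= a.
    """
    for p in (p_snps_to_mediator, p_mediator_to_outcome):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
    return max(p_snps_to_mediator, p_mediator_to_outcome)


def mediation_detected(p1: float, p2: float, alpha: float = 0.05) -> bool:
    """True when the IUT rejects 'no mediation' at significance level alpha."""
    return iut_pvalue(p1, p2) <= alpha
```

A very strong SNP-to-expression signal therefore cannot compensate for a weak expression-to-phenotype signal. Both links must be supported.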

Controlling the Rate of GWAS False Discoveries

10.1101/058230 · 2016

Author(s): Damian Brzyski, Christine B. Peterson, Piotr Sobczyk, Emmanuel J. Candès, Malgorzata Bogdan, ...

Abstract With the rise of both the number and the complexity of traits of interest, control of the false discovery rate (FDR) in genetic association studies has become an increasingly appealing and accepted target for multiple comparison adjustment. While a number of robust FDR controlling strategies exist, the nature of this error rate is intimately tied to the precise way in which discoveries are counted, and the performance of FDR controlling procedures is satisfactory only if there is a one-to-one correspondence between what scientists describe as unique discoveries and the number of rejected hypotheses. The presence of linkage disequilibrium between markers in genome-wide association studies (GWAS) often leads researchers to consider the signal associated with multiple neighboring SNPs as indicating the existence of a single genomic locus with possible influence on the phenotype. This a posteriori aggregation of rejected hypotheses results in inflation of the relevant FDR. We propose a novel approach to FDR control that is based on pre-screening to identify the level of resolution of distinct hypotheses. We show, with both theoretical results and simulations that mimic the dependence structure to be expected in GWAS, how FDR controlling strategies can be adapted to account for this initial selection. We demonstrate that our approach is versatile and useful when the data are analyzed using either single-marker tests or multivariate regression. We provide an R package that allows practitioners to apply our procedure to standard GWAS-format data, and illustrate its performance on lipid traits in the NFBC66 cohort study.
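
For reference, the standard Benjamini-Hochberg step-up procedure, a common starting point for FDR control, is sketched below. It is not the authors' pre-screening procedure. It treats each p-value as one hypothesis, which is exactly the one-to-one correspondence the abstract warns can break down when neighbouring SNPs in LD are later merged into one locus. The function name is illustrative.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up procedure at FDR level q."""
    m = len(pvalues)
    # Sort hypotheses by p-value, remembering their original positions.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up: find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])
```

Because it is a step-up rule, a small p-value that misses its own threshold is still rejected if a larger p-value further down the ranking passes its threshold. With `[0.02, 0.03, 0.04, 0.045]` and q = 0.05, all four hypotheses are rejected even though the first three individually miss their thresholds.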

SumVg: Total heritability explained by all variants in genome-wide association studies based on summary statistics with standard error estimates

10.1101/016857 · 2015

Author(s): Hon-Cheong So, Pak C. Sham

Genome-wide association studies (GWAS) have become increasingly popular, and one of the key questions is how much heritability could be explained by all variants in GWAS. We have previously proposed an approach to answer this question, based on recovering the "true" z-statistics from a set of observed z-statistics; only summary statistics are required. However, methods for standard error (SE) estimation are not yet available, which limits the interpretation of the results. In this study we developed resampling-based approaches to estimate the SE. We found that the delete-d jackknife and parametric bootstrap approaches provide good estimates of the SE. Methods to compute the sum of heritability explained and the corresponding SE are implemented in the R package SumVg, available at https://sites.google.com/site/honcheongso/software/var-totalvg
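
The delete-d jackknife mentioned above estimates a standard error by recomputing a statistic on every subset that leaves out d observations. The variance estimate is (n − d) / (d · C(n, d)) · Σ(θ_s − θ̄)². The minimal sketch below uses a generic statistic and enumerates all subsets, which is feasible only for small n. In SumVg the statistic would be the total heritability estimate, which is not reproduced here, and `delete_d_jackknife_se` is a hypothetical name.

```python
import itertools
import math
from statistics import mean


def delete_d_jackknife_se(data, statistic, d=1):
    """Delete-d jackknife standard error of statistic(data).

    Uses the variance estimator (n - d) / (d * C(n, d)) * sum((theta_s - theta_bar)^2),
    where theta_s is the statistic recomputed with one subset of d observations removed.
    """
    n = len(data)
    if not 1 <= d < n:
        raise ValueError("d must satisfy 1 <= d < n")
    replicates = []
    # Enumerate every way of deleting d observations (C(n, d) replicates).
    for removed in itertools.combinations(range(n), d):
        removed_set = set(removed)
        kept = [x for i, x in enumerate(data) if i not in removed_set]
        replicates.append(statistic(kept))
    theta_bar = mean(replicates)
    ss = sum((t - theta_bar) ** 2 for t in replicates)
    variance = (n - d) / (d * math.comb(n, d)) * ss
    return math.sqrt(variance)
```

For the sample mean, a linear statistic, every choice of d gives the familiar s/√n. For `[1, 2, 3, 4, 5]` this is √0.5 ≈ 0.707. Larger d matters for non-smooth statistics, where the delete-1 version can be inconsistent.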

CVRMS: Cross-validated Rank-based Marker Selection for Genome-wide Prediction of Low Heritability

10.1101/756130 · 2019

Author(s): Seongmun Jeong, Jae-Yoon Kim, Namshin Kim

Abstract CVRMS is an R package designed to extract marker subsets from repeated rank-based marker datasets generated from genome-wide association studies or marker effects for genome-wide prediction (https://github.com/lovemun/CVRMS). CVRMS provides an optimized genome-wide biomarker set with the best predictability of phenotype by implementing ridge regression using genetic information. Applying our method to human, animal, and plant datasets spanning a wide range of heritability (zero to one), we selected hundreds to thousands of biomarkers for precise prediction.
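
The idea of choosing a marker subset by prediction accuracy can be sketched as follows. Take markers in a given rank order, fit ridge regression on the top k for several candidate values of k, and keep the k with the lowest cross-validated squared error. This is an illustrative sketch under stated assumptions, not CVRMS's implementation. It assumes the markers are already ranked, uses a fixed ridge penalty and simple modulo-based folds, and centres phenotype and markers within each training fold. It is written in pure Python to stay dependency-free, and all function names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge_fit(X, y, lam):
    """Ridge coefficients solving (X^T X + lam * I) beta = X^T y (no intercept)."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear(A, b)


def center(row, means):
    """Subtract per-marker means from one genotype row."""
    return [v - m for v, m in zip(row, means)]


def cv_error(X, y, ranked_markers, k, lam=1.0, folds=5):
    """Mean squared k-fold CV prediction error of ridge on the top-k ranked markers."""
    cols = ranked_markers[:k]
    Xk = [[row[c] for c in cols] for row in X]
    n = len(y)
    sq_err = 0.0
    for f in range(folds):
        test = [i for i in range(n) if i % folds == f]
        train = [i for i in range(n) if i % folds != f]
        # Center phenotype and markers on training-fold means, so no intercept is needed.
        y_mean = sum(y[i] for i in train) / len(train)
        x_means = [sum(Xk[i][j] for i in train) / len(train) for j in range(len(cols))]
        beta = ridge_fit([center(Xk[i], x_means) for i in train],
                         [y[i] - y_mean for i in train], lam)
        for i in test:
            pred = y_mean + sum(b * v for b, v in zip(beta, center(Xk[i], x_means)))
            sq_err += (y[i] - pred) ** 2
    return sq_err / n


def select_marker_count(X, y, ranked_markers, candidate_ks, lam=1.0, folds=5):
    """Return the candidate k with the smallest cross-validated error."""
    return min(candidate_ks, key=lambda k: cv_error(X, y, ranked_markers, k, lam, folds))
```

Cross-validation guards against the optimism of in-sample fit. Adding markers always reduces training error, but it reduces held-out error only if those markers carry real signal.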

Identification of putative effector genes across the GWAS Catalog using molecular quantitative trait loci from 68 tissues and cell types

10.1101/808444 · 2019 · Cited By ~ 2

Author(s): Cong Guo, Karsten B. Sieber, Jorge Esparza-Gordillo, Mark R. Hurle, Kijoung Song, ...

Abstract Identifying the effector genes from genome-wide association studies (GWAS) is a crucial step towards understanding the biological mechanisms underlying complex traits and diseases. Colocalization of expression and protein quantitative trait loci (eQTL and pQTL, hereafter collectively called “xQTL”) can be effective for mapping associations to genes in many loci. However, existing colocalization methods require full single-variant summary statistics which are often not readily available for many published GWAS or xQTL studies. Here, we present PICCOLO, a method that uses minimum SNP p-values within a locus to determine if pairs of genetic associations are colocalized. This method greatly expands the number of GWAS and xQTL datasets that can be tested for colocalization. We applied PICCOLO to 10,759 genome-wide significant associations across the NHGRI-EBI GWAS Catalog with xQTLs from 28 studies. We identified at least one colocalized gene-xQTL in at least one tissue for 30% of associations, and we pursued multiple lines of evidence to demonstrate that these mappings are biologically meaningful. PICCOLO genes are significantly enriched for biologically relevant tissues, and 4.3-fold enriched for targets of approved drugs.

VarGen: an R package for disease-associated variant discovery and annotation

Bioinformatics · 10.1093/bioinformatics/btz930 · 2019 · Vol 36 (8) · pp. 2626-2627

Author(s): Corentin Molitor, Matt Brember, Fady Mohareb

Abstract Summary Over the past decade, there has been an exponential increase in the amount of disease-related genomic data available in public databases. However, this high-quality information is spread across independent sources and researchers often need to access these separately. Hence, there is a growing need for tools that gather and compile this information in an easy and automated manner. Here, we present ‘VarGen’, an easy-to-use, customizable R package that fetches, annotates and ranks variants related to diseases and genetic disorders, using a collection of public databases (viz. Online Mendelian Inheritance in Man, the Functional Annotation of the Mammalian Genome 5, the Genotype-Tissue Expression and the Genome-Wide Association Studies Catalog). This package is also capable of annotating these variants to identify the most impactful ones. We expect that this tool will benefit the research of variant-disease relationships. Availability and implementation VarGen is open-source and freely available via GitHub: https://github.com/MCorentin/VarGen. The software is implemented as an R package and is supported on Linux, MacOS and Windows. Supplementary information Supplementary data are available at Bioinformatics online.
