pgainsim: an R-package to assess the mode of inheritance for quantitative trait loci in GWAS

Bioinformatics ◽

10.1093/bioinformatics/btab150 ◽

2021 ◽

Author(s):

Nora Scherer ◽

Peggy Sekula ◽

Peter Pfaffelhuber ◽

Pascal Schlosser

Keyword(s):

Quantitative Trait ◽

Multiple Testing ◽

Genetic Model ◽

Association Studies ◽

R Package ◽

Critical Values ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Trait Locus ◽

Mode Of Inheritance

Abstract Motivation Genome-wide association studies conventionally use the additive genetic model to test whether a single nucleotide polymorphism (SNP) is associated with a quantitative trait. For variants that do not follow an intermediate mode of inheritance (MOI), however, the recessive or the dominant genetic model can have more power to detect associations; the MOI is also important for downstream analyses and clinical interpretation. When multiple MOIs are modelled, the question arises which model best describes the true underlying MOI. Results We developed an R package that, for the first time, allows study-specific critical values to be determined for deciding when one of the three models is more informative than the others for a quantitative trait locus. The package supports user-friendly simulations to determine these critical values for predefined minor allele frequencies and study sizes. For application scenarios with extensive multiple testing, we integrated an interpolation functionality that determines critical values from a moderate number of random draws. Availability and implementation The R package pgainsim is freely available for download on GitHub at https://github.com/genepi-freiburg/pgainsim. Supplementary information Supplementary data are available at Bioinformatics online.
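The simulation idea in this abstract can be sketched in a few lines of Python. This is an illustration only, not the pgainsim API. It assumes (none of this is stated in the abstract) that the comparison statistic is a "p-gain", taken here as the smallest p-value of the two competing models divided by the p-value of the model of interest, that the null scenario is a trait with no genetic effect, and that a normal approximation to the regression t-test is acceptable for the study sizes considered. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def regression_p_value(x, y):
    """Two-sided p-value for the slope of y ~ x (normal approximation, an assumption)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        # The genotype coding has no variation (e.g. no minor-allele homozygotes).
        return 1.0
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    p = 2 * (1 - NormalDist().cdf(abs(slope / se)))
    return max(p, 1e-300)  # floor avoids division by zero in the ratio below


def simulate_p_gains(n_individuals, maf, n_draws, seed=0):
    """Simulate p-gains for each model under a trait with no genetic effect."""
    rng = random.Random(seed)
    # Hardy-Weinberg probabilities for 0, 1 and 2 copies of the minor allele.
    weights = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]
    gains = {"additive": [], "recessive": [], "dominant": []}
    for _ in range(n_draws):
        genotypes = rng.choices([0, 1, 2], weights=weights, k=n_individuals)
        trait = [rng.gauss(0.0, 1.0) for _ in range(n_individuals)]
        p = {
            "additive": regression_p_value(genotypes, trait),
            "recessive": regression_p_value([int(g == 2) for g in genotypes], trait),
            "dominant": regression_p_value([int(g >= 1) for g in genotypes], trait),
        }
        for model in gains:
            best_other = min(p[m] for m in p if m != model)
            gains[model].append(best_other / p[model])
    return gains


def critical_value(p_gains, alpha=0.05, n_tests=1):
    """Empirical (1 - alpha / n_tests) quantile of the simulated p-gains."""
    ordered = sorted(p_gains)
    level = 1 - alpha / n_tests
    index = min(len(ordered) - 1, max(0, math.ceil(level * len(ordered)) - 1))
    return ordered[index]


gains = simulate_p_gains(n_individuals=200, maf=0.3, n_draws=200, seed=1)
print(critical_value(gains["recessive"], alpha=0.05))
```

With many tests, `alpha / n_tests` becomes so small that the empirical quantile needs a very large number of draws, which is the situation the interpolation functionality mentioned in the abstract is meant to address.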

A Bayesian Framework for Multiple Trait Colocalization from Summary Association Statistics

10.1101/155481 ◽

2017 ◽

Cited By ~ 9

Author(s):

Claudia Giambartolomei ◽

Jimmy Zhenli Liu ◽

Wen Zhang ◽

Mads Hauberg ◽

Huwenbo Shi ◽

...

Keyword(s):

Disease Risk ◽

Association Studies ◽

R Package ◽

Supplementary Information ◽

External Information ◽

Genome Wide Association Studies ◽

Human Brain Tissue ◽

Multiple Trait ◽

Trait Locus ◽

Regulatory Effects

Abstract Motivation Most genetic variants implicated in complex diseases by genome-wide association studies (GWAS) are non-coding, making it challenging to understand the causative genes involved in disease. Integrating external information such as quantitative trait locus (QTL) mapping of molecular traits (e.g., expression, methylation) is a powerful approach to identify the subset of GWAS signals explained by regulatory effects. In particular, expression QTLs (eQTLs) help pinpoint the responsible gene among the GWAS regions that harbor many genes, while methylation QTLs (mQTLs) help identify the epigenetic mechanisms that impact gene expression which in turn affect disease risk. In this work we propose multiple-trait-coloc (moloc), a Bayesian statistical framework that integrates GWAS summary data with multiple molecular QTL data to identify regulatory effects at GWAS risk loci. Results We applied moloc to schizophrenia (SCZ) and eQTL/mQTL data derived from human brain tissue and identified 52 candidate genes that influence SCZ through methylation. Our method can be applied to any GWAS and relevant functional data to help prioritize disease associated genes. Availability moloc is available for download as an R package (https://github.com/clagiamba/moloc). We also developed a web site to visualize the biological findings (icahn.mssm.edu/moloc). The browser allows searches by gene, methylation probe, and scenario of [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

bGWAS: an R package to perform Bayesian genome wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btaa549 ◽

2020 ◽

Vol 36 (15) ◽

pp. 4374-4376

Author(s):

Ninon Mounier ◽

Zoltán Kutalik

Keyword(s):

Mendelian Randomization ◽

Causal Effect ◽

Association Studies ◽

R Package ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Biological Mechanisms ◽

Genome Wide ◽

Related Risk

Abstract Summary Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs), and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation The bGWAS package is freely available under a GPL-2 License, and can be accessed, along with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information Supplementary data are available at Bioinformatics online.
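To make the step "prior effects are combined with the observed effects" concrete, here is a minimal normal-normal sketch in Python. It is not the bGWAS implementation. It assumes effects are expressed as z-scores with unit sampling variance, that the Bayes factor compares a normal prior centred on the prior effect against a point null at zero, that the posterior effect is a precision-weighted average, and that the direct effect is the observed effect minus the prior effect. The function name and arguments are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def combine_prior_and_observed(z_obs, prior_mean, prior_se):
    """Return (Bayes factor, posterior effect, direct effect) for one variant.

    Assumptions (not taken from the abstract): z_obs has unit sampling variance,
    and the prior effect is normal with mean prior_mean and sd prior_se.
    """
    # Marginal likelihood under the prior: variance = sampling variance + prior variance.
    likelihood_prior = normal_pdf(z_obs, prior_mean, math.sqrt(1 + prior_se ** 2))
    likelihood_null = normal_pdf(z_obs, 0.0, 1.0)
    bayes_factor = likelihood_prior / likelihood_null

    # Precision-weighted average of the prior and the observed effect.
    w_prior = 1 / prior_se ** 2
    w_obs = 1.0
    posterior = (w_prior * prior_mean + w_obs * z_obs) / (w_prior + w_obs)

    # Part of the observed effect not explained by the prior.
    direct = z_obs - prior_mean
    return bayes_factor, posterior, direct


bf, posterior, direct = combine_prior_and_observed(z_obs=3.0, prior_mean=2.0, prior_se=1.0)
print(bf, posterior, direct)
```

In this sketch a variant whose observed effect agrees with its prior gets a Bayes factor above 1, while a variant with a strong prior but no observed signal gets a Bayes factor below 1.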

Combinatorial and statistical prediction of gene expression from haplotype sequence

Bioinformatics ◽

10.1093/bioinformatics/btaa318 ◽

2020 ◽

Vol 36 (Supplement_1) ◽

pp. i194-i202

Author(s):

Berk A Alpay ◽

Pinar Demetci ◽

Sorin Istrail ◽

Derek Aguiar

Keyword(s):

Gene Expression ◽

Multiple Testing ◽

Association Studies ◽

Classification Problem ◽

Statistical Prediction ◽

Model Complexity ◽

Supplementary Information ◽

Prediction Methods ◽

Genome Wide Association Studies ◽

Regulatory Effects

Abstract Motivation Genome-wide association studies (GWAS) have discovered thousands of significant genetic effects on disease phenotypes. By considering gene expression as the intermediary between genotype and disease phenotype, expression quantitative trait loci studies have interpreted many of these variants by their regulatory effects on gene expression. However, there remains a considerable gap between genotype-to-gene expression association and genotype-to-gene expression prediction. Accurate prediction of gene expression enables gene-based association studies to be performed post hoc for existing GWAS, reduces multiple testing burden, and can prioritize genes for subsequent experimental investigation. Results In this work, we develop gene expression prediction methods that relax the independence and additivity assumptions between genetic markers. First, we consider gene expression prediction from a regression perspective and develop the HAPLEXR algorithm which combines haplotype clusterings with allelic dosages. Second, we introduce the new gene expression classification problem, which focuses on identifying expression groups rather than continuous measurements; we formalize the selection of an appropriate number of expression groups using the principle of maximum entropy. Third, we develop the HAPLEXD algorithm that models haplotype sharing with a modified suffix tree data structure and computes expression groups by spectral clustering. In both models, we penalize model complexity by prioritizing genetic clusters that indicate significant effects on expression. We compare HAPLEXR and HAPLEXD with three state-of-the-art expression prediction methods and two novel logistic regression approaches across five GTEx v8 tissues. HAPLEXD exhibits significantly higher classification accuracy overall; HAPLEXR shows higher prediction accuracy on approximately half of the genes tested and the largest number of best predicted genes (r² > 0.1) among all methods.
We show that variant and haplotype features selected by HAPLEXR are smaller in size than competing methods (and thus more interpretable) and are significantly enriched in functional annotations related to gene regulation. These results demonstrate the importance of explicitly modeling non-dosage dependent and intragenic epistatic effects when predicting expression. Availability and implementation Source code and binaries are freely available at https://github.com/rapturous/HAPLEX. Supplementary information Supplementary data are available at Bioinformatics online.

Gene-based association tests using GWAS summary statistics

Bioinformatics ◽

10.1093/bioinformatics/btz172 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3701-3708 ◽

Cited By ~ 6

Author(s):

Gulnara R Svishcheva ◽

Nadezhda M Belonogova ◽

Irina V Zorkoltseva ◽

Anatoly V Kirichenko ◽

Tatiana I Axenovich

Keyword(s):

Association Analysis ◽

Association Studies ◽

R Package ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

New Material ◽

Artery Disease ◽

The Many ◽

Functional Linear Regression

Abstract Motivation A huge number of genome-wide association studies (GWAS) summary statistics freely available in databases provide new material for gene-based association analysis aimed at identifying rare genetic variants. Only a few of the many popular gene-based methods developed for individual genotype and phenotype data have been adapted for practical use with GWAS summary statistics as input. Results We analytically prove and numerically illustrate that all popular powerful methods developed for gene-based association analysis of individual phenotype and genotype data can be modified to utilize GWAS summary statistics. We have modified and implemented all of the popular methods, including burden and kernel machine-based tests, multiple and functional linear regression, principal components analysis and others, in the R package sumFREGAT. Using real summary statistics for coronary artery disease, we show that the new package is able to detect genes not found by the existing packages. Availability and implementation The R package sumFREGAT is freely and publicly available at: https://CRAN.R-project.org/package=sumFREGAT. Supplementary information Supplementary data are available at Bioinformatics online.
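As an illustration of how a gene-based test can run on summary statistics alone, the following Python sketch computes a weighted burden statistic from per-variant z-scores and a matrix of correlations between variants (linkage disequilibrium). This is a generic textbook form, offered as a sketch and not as sumFREGAT's interface; the function name, the default equal weights, and the example inputs are assumptions.

```python
import math
from statistics import NormalDist


def burden_test(z_scores, ld_matrix, weights=None):
    """Burden statistic from summary data: sum(w * z) / sqrt(w' R w).

    z_scores  : per-variant association z-scores within one gene
    ld_matrix : correlation matrix R between the variants (list of lists)
    weights   : per-variant weights w (equal weights if omitted, an assumption)
    """
    m = len(z_scores)
    if weights is None:
        weights = [1.0] * m
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    # Variance of the weighted sum under the null, accounting for LD.
    variance = sum(
        weights[i] * ld_matrix[i][j] * weights[j]
        for i in range(m)
        for j in range(m)
    )
    z_burden = numerator / math.sqrt(variance)
    p_value = 2 * (1 - NormalDist().cdf(abs(z_burden)))
    return z_burden, p_value


# Two hypothetical variants with moderate LD between them.
z, p = burden_test([2.1, 1.8], [[1.0, 0.3], [0.3, 1.0]])
print(z, p)
```

The LD matrix is what replaces individual-level genotypes here: it supplies the covariance between the per-variant statistics that would otherwise be computed from the raw data.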

ADDO: a comprehensive toolkit to detect, classify and visualize additive and non-additive quantitative trait loci

Bioinformatics ◽

10.1093/bioinformatics/btz786 ◽

2019 ◽

Vol 36 (5) ◽

pp. 1517-1521

Author(s):

Leilei Cui ◽

Bin Yang ◽

Nikolas Pontikos ◽

Richard Mott ◽

Lusheng Huang

Keyword(s):

Quantitative Trait Loci ◽

Quantitative Trait ◽

Association Studies ◽

Genetic Effects ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Additive Effects ◽

A Genome ◽

Trait Loci ◽

Additive Genetic Effects

Abstract Motivation During the past decade, genome-wide association studies (GWAS) have been used to map quantitative trait loci (QTLs) underlying complex traits. However, most GWAS focus on additive genetic effects while ignoring non-additive effects, on the assumption that most QTL act additively. Consequently, QTLs driven by dominance and other non-additive effects could be overlooked. Results We developed ADDO, a highly efficient tool to detect, classify and visualize QTLs with additive and non-additive effects. ADDO implements a mixed-model transformation to control for population structure and unequal relatedness that accounts for both additive and dominant genetic covariance among individuals, and decomposes single-nucleotide polymorphism effects as either additive, partial dominant, dominant or over-dominant. A matrix multiplication approach is used to accelerate the computation: a genome scan on 13 million markers from 900 individuals takes about 5 h with 10 CPUs. Analysis of simulated data confirms ADDO’s performance on traits with different additive and dominance genetic variance components. We showed two real examples in outbred rat where ADDO identified significant dominant QTL that were not detectable by an additive model. ADDO provides a systematic pipeline to characterize additive and non-additive QTL in whole genome sequence data, which complements current mainstream GWAS software for additive genetic effects. Availability and implementation ADDO is customizable and convenient to install and provides extensive analytics and visualizations. The package is freely available online at https://github.com/LeileiCui/ADDO. Supplementary information Supplementary data are available at Bioinformatics online.
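The four effect classes named in this abstract can be illustrated with the classical dominance ratio d/a, where a is half the difference between the two homozygote means and d is the deviation of the heterozygote mean from their midpoint. The Python sketch below is not ADDO's method (which uses a mixed-model transformation); the cut-offs 0.2, 0.8 and 1.2 and the function name are illustrative assumptions.

```python
from statistics import mean


def classify_dominance(pheno_aa, pheno_ab, pheno_bb,
                       additive_max=0.2, dominant_min=0.8, dominant_max=1.2):
    """Classify a SNP effect from phenotype values grouped by genotype.

    The thresholds on |d / a| are illustrative assumptions, not ADDO defaults.
    """
    mean_aa, mean_ab, mean_bb = mean(pheno_aa), mean(pheno_ab), mean(pheno_bb)
    a = (mean_bb - mean_aa) / 2               # additive effect
    d = mean_ab - (mean_aa + mean_bb) / 2     # dominance deviation
    if a == 0:
        # No difference between homozygotes: any heterozygote shift is over-dominance.
        return "over-dominant" if d != 0 else "no effect"
    ratio = abs(d / a)
    if ratio <= additive_max:
        return "additive"
    if ratio < dominant_min:
        return "partial dominant"
    if ratio <= dominant_max:
        return "dominant"
    return "over-dominant"


print(classify_dominance([1.0, 1.0], [2.0, 2.0], [3.0, 3.0]))
print(classify_dominance([1.0], [5.0], [3.0]))
```

An additive scan only tests a, so a locus with a small a but a large d (the last case above) is the kind of signal the abstract says could be overlooked.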

Multi-SNP mediation intersection-union test

Bioinformatics ◽

10.1093/bioinformatics/btz285 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4724-4729 ◽

Cited By ~ 4

Author(s):

Wujuan Zhong ◽

Cassandra N Spracklen ◽

Karen L Mohlke ◽

Xiaojing Zheng ◽

Jason Fine ◽

...

Keyword(s):

Association Studies ◽

R Package ◽

Alternative Methods ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Mediation Effects ◽

Coding Regions ◽

Genome Wide ◽

Plasma Adiponectin Level ◽

Intersection Union Test

Abstract Summary Tens of thousands of reproducibly identified GWAS (Genome-Wide Association Studies) variants, with the vast majority falling in non-coding regions resulting in no eventual protein products, call urgently for mechanistic interpretations. Although numerous methods exist, there are few, if any, methods for simultaneously testing the mediation effects of multiple correlated SNPs via some mediator (e.g. the expression of a gene in the neighborhood) on phenotypic outcome. We propose multi-SNP mediation intersection-union test (SMUT) to fill this methodological gap. Our extensive simulations demonstrate the validity of SMUT as well as substantial power gains, of up to 92%, over alternative methods. In addition, SMUT confirmed known mediators in a real dataset of Finns for plasma adiponectin level, which were missed by many alternative methods. We believe SMUT will become a useful tool to generate mechanistic hypotheses underlying GWAS variants, facilitating functional follow-up. Availability and implementation The R package SMUT is publicly available from CRAN at https://CRAN.R-project.org/package=SMUT. Supplementary information Supplementary data are available at Bioinformatics online.
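The intersection-union principle behind the method's name can be shown in a few lines of Python. Mediation requires both links to be present (SNPs to mediator, and mediator to outcome), so the combined null hypothesis is rejected only if both component nulls are rejected, which corresponds to taking the larger of the two p-values. The sketch below shows only this combination step; how SMUT obtains the two component p-values for multiple correlated SNPs is not reproduced here, and the function name is hypothetical.

```python
def intersection_union_p_value(p_snps_to_mediator, p_mediator_to_outcome):
    """Combine two component p-values with the intersection-union rule.

    The mediation null ("at least one link is absent") is rejected at level
    alpha only if both component tests reject at level alpha, so the combined
    p-value is the maximum of the two.
    """
    for p in (p_snps_to_mediator, p_mediator_to_outcome):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
    return max(p_snps_to_mediator, p_mediator_to_outcome)


# Strong SNP-to-mediator signal, weaker mediator-to-outcome signal.
print(intersection_union_p_value(0.001, 0.04))
```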

Expression Quantitative Trait Locus (eQTL) analysis in human heart for elucidation of causal genes at loci identified in genome-wide association studies

European Heart Journal ◽

10.1093/eurheartj/eht309.2601 ◽

2013 ◽

Vol 34 (suppl 1) ◽

pp. 2601-2601

Author(s):

M. E. Adriaens ◽

T. T. Koopmann ◽

P. D. Moerland ◽

M. L. Westerveld ◽

R. F. Marsman ◽

...

Keyword(s):

Quantitative Trait Locus ◽

Quantitative Trait ◽

Human Heart ◽

Association Studies ◽

Genome Wide Association ◽

Eqtl Analysis ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Causal Genes ◽

Trait Locus

Barcoded Bulk QTL mapping reveals highly polygenic and epistatic architecture of complex traits in yeast

10.1101/2021.09.08.459513 ◽

2021 ◽

Author(s):

Alex N. Nguyen Ba ◽

Katherine R. Lawrence ◽

Artur Rego-Costa ◽

Shreyas Gopalakrishnan ◽

Daniel Temko ◽

...

Keyword(s):

Quantitative Trait Locus ◽

Qtl Mapping ◽

Quantitative Trait ◽

Complex Traits ◽

Large Scale ◽

Genetic Basis ◽

Association Studies ◽

Model Organisms ◽

Genome Wide Association Studies ◽

Trait Locus

Mapping the genetic basis of complex traits is critical to uncovering the biological mechanisms that underlie disease and other phenotypes. Genome-wide association studies (GWAS) in humans and quantitative trait locus (QTL) mapping in model organisms can now explain much of the observed heritability in many traits, allowing us to predict phenotype from genotype. However, constraints on power due to statistical confounders in large GWAS and smaller sample sizes in QTL studies still limit our ability to resolve numerous small-effect variants, map them to causal genes, identify pleiotropic effects across multiple traits, and infer non-additive interactions between loci (epistasis). Here, we introduce barcoded bulk quantitative trait locus (BB-QTL) mapping, which allows us to construct, genotype, and phenotype 100,000 offspring of a budding yeast cross, two orders of magnitude larger than the previous state of the art. We use this panel to map the genetic basis of eighteen complex traits, finding that the genetic architecture of these traits involves hundreds of small-effect loci densely spaced throughout the genome, many with widespread pleiotropic effects across multiple traits. Epistasis plays a central role, with thousands of interactions that provide insight into genetic networks. By dramatically increasing sample size, BB-QTL mapping demonstrates the potential of natural variants in high-powered QTL studies to reveal the highly polygenic, pleiotropic, and epistatic architecture of complex traits.
Significance statement Understanding the genetic basis of important phenotypes is a central goal of genetics. However, the highly polygenic architectures of complex traits inferred by large-scale genome-wide association studies (GWAS) in humans stand in contrast to the results of quantitative trait locus (QTL) mapping studies in model organisms. Here, we use a barcoding approach to conduct QTL mapping in budding yeast at a scale two orders of magnitude larger than the previous state of the art. The resulting increase in power reveals the polygenic nature of complex traits in yeast, and offers insight into widespread patterns of pleiotropy and epistasis. Our data and analysis methods offer opportunities for future work in systems biology, and have implications for large-scale GWAS in human populations.

MungeSumstats: A Bioconductor package for the standardisation and quality control of many GWAS summary statistics

Bioinformatics ◽

10.1093/bioinformatics/btab665 ◽

2021 ◽

Author(s):

Alan E Murphy ◽

Brian M Schilder ◽

Nathan G Skene

Keyword(s):

Quality Control ◽

Association Studies ◽

Meta Analysis ◽

Genetic Research ◽

Secondary Analysis ◽

R Package ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Summary Statistic

Abstract Motivation Genome-wide association studies (GWAS) summary statistics have popularised and accelerated genetic research. However, a lack of standardisation of the file formats used has proven problematic when running secondary analysis tools or performing meta-analysis studies. Results To address this issue, we have developed MungeSumstats, a Bioconductor R package for the standardisation and quality control of GWAS summary statistics. MungeSumstats can handle the most common summary statistic formats, including variant call format (VCF), producing a reformatted, standardised, tabular summary statistic file, VCF or R native data object. Availability MungeSumstats is available on Bioconductor (v 3.13) and can also be found on GitHub at: https://neurogenomics.github.io/MungeSumstats. Supplementary information The analysis deriving the most common summary statistic formats is available at: https://al-murphy.github.io/SumstatFormats
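The core of header standardisation can be sketched as a lookup from the many column names seen in the wild to one agreed name per field. The Python example below is a toy illustration, not the MungeSumstats API (which is an R package); the synonym table, the target names SNP, CHR, BP and P, and both function names are assumptions chosen for the example.

```python
import csv
import io

# Hypothetical synonym table: standard name -> lower-case variants seen in files.
HEADER_SYNONYMS = {
    "SNP": {"snp", "rsid", "markername", "variant_id"},
    "CHR": {"chr", "chrom", "chromosome"},
    "BP": {"bp", "pos", "position", "base_pair_location"},
    "P": {"p", "pval", "pvalue", "p_value"},
}

# Invert the table once so each synonym maps straight to its standard name.
SYNONYM_LOOKUP = {
    synonym: standard
    for standard, synonyms in HEADER_SYNONYMS.items()
    for synonym in synonyms
}


def standardise_header(columns):
    """Map known column-name variants to standard names; upper-case the rest."""
    return [
        SYNONYM_LOOKUP.get(name.strip().lower(), name.strip().upper())
        for name in columns
    ]


def standardise_table(text, delimiter="\t"):
    """Parse delimited summary statistics and return rows keyed by standard names."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header = standardise_header(rows[0])
    return [dict(zip(header, row)) for row in rows[1:]]


print(standardise_header(["rsID", "Chromosome", "POS", "pval", "beta"]))
```

A real tool also has to handle quality control (missing values, allele flipping, genome build), which this sketch leaves out entirely.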

VarGen: an R package for disease-associated variant discovery and annotation

Bioinformatics ◽

10.1093/bioinformatics/btz930 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2626-2627

Author(s):

Corentin Molitor ◽

Matt Brember ◽

Fady Mohareb

Keyword(s):

Association Studies ◽

Genetic Disorders ◽

R Package ◽

Tissue Expression ◽

Mendelian Inheritance ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Variant Discovery ◽

Genome Wide ◽

High Quality Information

Abstract Summary Over the past decade, there has been an exponential increase in the amount of disease-related genomic data available in public databases. However, this high-quality information is spread across independent sources and researchers often need to access these separately. Hence, there is a growing need for tools that gather and compile this information in an easy and automated manner. Here, we present ‘VarGen’, an easy-to-use, customizable R package that fetches, annotates and ranks variants related to diseases and genetic disorders, using a collection of public databases (viz. Online Mendelian Inheritance in Man, the Functional Annotation of the Mammalian genome 5, the Genotype-Tissue Expression and the Genome Wide Association Studies catalog). This package is also capable of annotating these variants to identify the most impactful ones. We expect that this tool will benefit the research of variant-disease relationships. Availability and implementation VarGen is open-source and freely available via GitHub: https://github.com/MCorentin/VarGen. The software is implemented as an R package and is supported on Linux, MacOS and Windows. Supplementary information Supplementary data are available at Bioinformatics online.
