pgainsim: an R-package to assess the mode of inheritance for quantitative trait loci in GWAS

Author(s):  
Nora Scherer
Peggy Sekula
Peter Pfaffelhuber
Pascal Schlosser

Abstract Motivation In genome-wide association studies (GWAS), the additive genetic model is conventionally used to test whether a single nucleotide polymorphism (SNP) is associated with a quantitative trait. However, for variants that do not follow an intermediate mode of inheritance (MOI), the recessive or the dominant genetic model can have more power to detect associations; furthermore, the MOI is important for downstream analyses and clinical interpretation. When multiple MOIs are modelled, the question arises which of them best describes the true underlying MOI. Results We developed an R-package that, for the first time, allows study-specific critical values to be determined for deciding when one of the three models is more informative than the others for a quantitative trait locus. The package allows user-friendly simulations to determine these critical values for predefined minor allele frequencies and study sizes. For application scenarios with extensive multiple testing, we integrated an interpolation functionality that determines critical values from only a moderate number of random draws. Availability and implementation The R-package pgainsim is freely available for download on GitHub at https://github.com/genepi-freiburg/pgainsim. Supplementary information Supplementary data are available at Bioinformatics online.
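The three genetic models compared above differ only in how a SNP genotype is coded before it is regressed on the trait. The Python sketch below is illustrative and is not pgainsim's API (pgainsim is an R package). It assumes genotypes are counted as copies of the minor allele (0, 1, 2) and that "dominant" and "recessive" refer to the minor allele. The helper names `best_model` and `r_squared` are hypothetical.

```python
# Genotype g counts copies of the minor allele: 0, 1 or 2 (assumed convention).
def additive(g):
    return float(g)

def dominant(g):
    # One or two minor-allele copies are assumed to have the same effect.
    return 1.0 if g >= 1 else 0.0

def recessive(g):
    # Only homozygous minor-allele carriers are assumed to differ.
    return 1.0 if g == 2 else 0.0

def r_squared(x, y):
    """Squared Pearson correlation, i.e. the R^2 of a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

MODELS = {"additive": additive, "dominant": dominant, "recessive": recessive}

def best_model(genotypes, trait):
    """Return the coding with the largest R^2, plus the R^2 of every coding.

    With the same sample and one predictor per model, a larger R^2 means a
    smaller regression p-value, so this ranks models as p-values would.
    """
    scores = {name: r_squared([f(g) for g in genotypes], trait)
              for name, f in MODELS.items()}
    return max(scores, key=scores.get), scores
```

Picking the smallest p-value in this way is not yet a decision rule. A critical value is still needed for how much better a non-additive model must fit, and pgainsim derives that value by simulation, a step not reproduced here.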

2017
Author(s):
Claudia Giambartolomei
Jimmy Zhenli Liu
Wen Zhang
Mads Hauberg
Huwenbo Shi
...  

Abstract Motivation Most genetic variants implicated in complex diseases by genome-wide association studies (GWAS) are non-coding, making it challenging to determine the causative genes involved in disease. Integrating external information such as quantitative trait locus (QTL) mapping of molecular traits (e.g., expression, methylation) is a powerful approach to identify the subset of GWAS signals explained by regulatory effects. In particular, expression QTLs (eQTLs) help pinpoint the responsible gene in GWAS regions that harbor many genes, while methylation QTLs (mQTLs) help identify the epigenetic mechanisms that affect gene expression, which in turn affects disease risk. In this work, we propose multiple-trait-coloc (moloc), a Bayesian statistical framework that integrates GWAS summary data with multiple molecular QTL data to identify regulatory effects at GWAS risk loci. Results We applied moloc to schizophrenia (SCZ) and eQTL/mQTL data derived from human brain tissue and identified 52 candidate genes that influence SCZ through methylation. Our method can be applied to any GWAS and relevant functional data to help prioritize disease-associated genes. Availability moloc is available for download as an R package (https://github.com/clagiamba/moloc). We also developed a website to visualize the biological findings (icahn.mssm.edu/moloc). The browser allows searches by gene, methylation probe, and scenario of […] Supplementary information Supplementary data are available at Bioinformatics online.


2020
Vol 36 (15)
pp. 4374-4376
Author(s):
Ninon Mounier
Zoltán Kutalik

Abstract Summary Increasing sample size is not the only strategy to improve discovery in genome-wide association studies (GWASs); here we propose an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait, obtained using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes factors, posterior effects and direct effects. The approach not only increases power but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation The bGWAS package is freely available under a GPL-2 license and can be accessed, along with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information Supplementary data are available at Bioinformatics online.
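The step "prior effects combined with observed effects to yield Bayes factors and posterior effects" can be illustrated with a generic normal-normal model on the z-score scale. This Python sketch is an assumption-laden illustration and not bGWAS's implementation. It assumes the observed z-score has unit sampling variance, that the prior for the true z-score is normal with a mean and variance that would, in bGWAS, come from the multivariable MR step (not shown), and that the null hypothesis is a true effect of zero. The function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bayes_factor(z_obs, prior_mean, prior_var):
    """BF of 'true z ~ N(prior_mean, prior_var)' versus 'true z = 0'.

    Under the alternative, the marginal distribution of the observed z-score is
    N(prior_mean, prior_var + 1); under the null it is N(0, 1).
    """
    alt = normal_pdf(z_obs, prior_mean, prior_var + 1.0)
    null = normal_pdf(z_obs, 0.0, 1.0)
    return alt / null

def posterior(z_obs, prior_mean, prior_var):
    """Precision-weighted combination of the prior and the observed z-score."""
    post_var = 1.0 / (1.0 / prior_var + 1.0)
    post_mean = post_var * (prior_mean / prior_var + z_obs)
    return post_mean, post_var
```

For example, an observed z-score of 2 combined with a prior of N(1, 1) gives a posterior mean of 1.5 with variance 0.5. The posterior is pulled toward the prior in proportion to the prior's precision.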


2020
Vol 36 (Supplement_1)
pp. i194-i202
Author(s):
Berk A Alpay
Pinar Demetci
Sorin Istrail
Derek Aguiar

Abstract Motivation Genome-wide association studies (GWAS) have discovered thousands of significant genetic effects on disease phenotypes. By considering gene expression as the intermediary between genotype and disease phenotype, expression quantitative trait loci studies have interpreted many of these variants by their regulatory effects on gene expression. However, there remains a considerable gap between genotype-to-gene expression association and genotype-to-gene expression prediction. Accurate prediction of gene expression enables gene-based association studies to be performed post hoc for existing GWAS, reduces the multiple-testing burden, and can prioritize genes for subsequent experimental investigation. Results In this work, we develop gene expression prediction methods that relax the independence and additivity assumptions between genetic markers. First, we consider gene expression prediction from a regression perspective and develop the HAPLEXR algorithm, which combines haplotype clusterings with allelic dosages. Second, we introduce the new gene expression classification problem, which focuses on identifying expression groups rather than continuous measurements; we formalize the selection of an appropriate number of expression groups using the principle of maximum entropy. Third, we develop the HAPLEXD algorithm, which models haplotype sharing with a modified suffix tree data structure and computes expression groups by spectral clustering. In both models, we penalize model complexity by prioritizing genetic clusters that indicate significant effects on expression. We compare HAPLEXR and HAPLEXD with three state-of-the-art expression prediction methods and two novel logistic regression approaches across five GTEx v8 tissues. HAPLEXD exhibits significantly higher classification accuracy overall; HAPLEXR shows higher prediction accuracy on approximately half of the genes tested and the largest number of best-predicted genes (r² > 0.1) among all methods. We show that the variant and haplotype features selected by HAPLEXR are smaller in size than those of competing methods (and thus more interpretable) and are significantly enriched in functional annotations related to gene regulation. These results demonstrate the importance of explicitly modeling non-dosage-dependent and intragenic epistatic effects when predicting expression. Availability and implementation Source code and binaries are freely available at https://github.com/rapturous/HAPLEX. Supplementary information Supplementary data are available at Bioinformatics online.


2019
Vol 35 (19)
pp. 3701-3708
Author(s):
Gulnara R Svishcheva
Nadezhda M Belonogova
Irina V Zorkoltseva
Anatoly V Kirichenko
Tatiana I Axenovich

Abstract Motivation The large number of genome-wide association study (GWAS) summary statistics freely available in databases provides new material for gene-based association analysis aimed at identifying rare genetic variants. However, only a few of the many popular gene-based methods developed for individual genotype and phenotype data have been adapted to use GWAS summary statistics as input. Results We analytically prove and numerically illustrate that all popular, powerful methods developed for gene-based association analysis of individual phenotype and genotype data can be modified to use GWAS summary statistics. We have modified and implemented all of the popular methods, including burden and kernel machine-based tests, multiple and functional linear regression, principal components analysis and others, in the R package sumFREGAT. Using real summary statistics for coronary artery disease, we show that the new package is able to detect genes not found by the existing packages. Availability and implementation The R package sumFREGAT is freely and publicly available at: https://CRAN.R-project.org/package=sumFREGAT. Supplementary information Supplementary data are available at Bioinformatics online.
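The burden test, one of the methods listed above, shows why summary statistics can be enough. If the per-variant z-scores z have correlation matrix R (linkage disequilibrium) under the null, then a weighted sum w'z has variance w'Rw, and T = (w'z)² / (w'Rw) follows a chi-square distribution with 1 degree of freedom. The Python sketch below is the textbook form under those assumptions and is not sumFREGAT's code or API. It further assumes that z-scores are aligned to the same effect allele and that R comes from a suitable reference panel. The function name `burden_test` is hypothetical.

```python
import math

def burden_test(z, weights, ld):
    """Weighted burden test computed only from per-variant z-scores.

    z       -- per-variant association z-scores (same effect-allele orientation)
    weights -- per-variant weights w
    ld      -- LD correlation matrix R as a list of lists
    Returns (T, p) with T = (w'z)^2 / (w'Rw), compared against chi-square(1).
    """
    k = len(weights)
    num = sum(w * zi for w, zi in zip(weights, z))
    var = sum(weights[i] * ld[i][j] * weights[j] for i in range(k) for j in range(k))
    t = num * num / var
    # Upper tail of chi-square with 1 df: P(X > t) = erfc(sqrt(t / 2)).
    p = math.erfc(math.sqrt(t / 2.0))
    return t, p
```

With two independent variants (R = I), unit weights and z = (1, 1), T = 4 / 2 = 2. If the same variants have a correlation of 0.5, the denominator grows to 3 and T drops to 4/3. This shows why ignoring LD would inflate the statistic.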


2019
Vol 36 (5)
pp. 1517-1521
Author(s):
Leilei Cui
Bin Yang
Nikolas Pontikos
Richard Mott
Lusheng Huang

Abstract Motivation During the past decade, genome-wide association studies (GWAS) have been used to map quantitative trait loci (QTLs) underlying complex traits. However, most GWAS focus on additive genetic effects and ignore non-additive effects, on the assumption that most QTLs act additively. Consequently, QTLs driven by dominance and other non-additive effects could be overlooked. Results We developed ADDO, a highly efficient tool to detect, classify and visualize QTLs with additive and non-additive effects. ADDO implements a mixed-model transformation, accounting for both additive and dominant genetic covariance among individuals, to control for population structure and unequal relatedness, and it classifies single-nucleotide polymorphism effects as additive, partially dominant, dominant or over-dominant. A matrix multiplication approach is used to accelerate the computation: a genome scan of 13 million markers in 900 individuals takes about 5 h with 10 CPUs. Analysis of simulated data confirms ADDO’s performance on traits with different additive and dominance genetic variance components. We show two real examples in outbred rats where ADDO identified significant dominant QTLs that were not detectable with an additive model. ADDO provides a systematic pipeline to characterize additive and non-additive QTLs in whole-genome sequence data, complementing current mainstream GWAS software for additive genetic effects. Availability and implementation ADDO is customizable and convenient to install and provides extensive analytics and visualizations. The package is freely available online at https://github.com/LeileiCui/ADDO. Supplementary information Supplementary data are available at Bioinformatics online.
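The effect classes named above (additive, partially dominant, dominant, over-dominant) follow the classical quantitative-genetics decomposition. Here a is half the difference between the two homozygote means, d is the deviation of the heterozygote mean from their midpoint, and the degree of dominance is k = d / a. The Python sketch below works directly from genotype means and ignores the mixed-model machinery. Its cut-offs on |k| are hypothetical illustration values, not ADDO's criteria.

```python
def decompose(mean_aa, mean_ab, mean_bb):
    """Additive (a) and dominance (d) effects from the three genotype means.

    a is half the difference between the homozygotes; d is the deviation of
    the heterozygote mean from the homozygote midpoint.
    """
    midpoint = (mean_aa + mean_bb) / 2.0
    a = (mean_bb - mean_aa) / 2.0
    d = mean_ab - midpoint
    return a, d

def classify(a, d, additive_cut=0.25, dominant_band=(0.75, 1.25)):
    """Label an effect by the degree of dominance |k| = |d / a|.

    additive_cut and dominant_band are hypothetical thresholds for illustration.
    """
    if a == 0:
        return "undefined"  # no homozygote difference: k is not defined
    k = abs(d / a)
    if k < additive_cut:
        return "additive"
    if k < dominant_band[0]:
        return "partially dominant"
    if k <= dominant_band[1]:
        return "dominant"
    return "over-dominant"
```

For example, genotype means of 0, 1 and 2 give d = 0, which is purely additive. Means of 0, 2 and 2 give k = 1 (the heterozygote matches one homozygote), which is dominant. Means of 0, 3 and 2 put the heterozygote outside the homozygote range, which is over-dominant.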


2019
Vol 35 (22)
pp. 4724-4729
Author(s):
Wujuan Zhong
Cassandra N Spracklen
Karen L Mohlke
Xiaojing Zheng
Jason Fine
...  

Abstract Summary Tens of thousands of reproducibly identified GWAS (genome-wide association study) variants, the vast majority of which fall in non-coding regions that yield no protein products, urgently call for mechanistic interpretation. Although numerous methods exist, there are few, if any, methods for simultaneously testing the mediation effects of multiple correlated SNPs via a mediator (e.g. the expression of a nearby gene) on a phenotypic outcome. We propose the multi-SNP mediation intersection-union test (SMUT) to fill this methodological gap. Our extensive simulations demonstrate the validity of SMUT as well as substantial power gains (up to 92%) over alternative methods. In addition, SMUT confirmed known mediators of plasma adiponectin level in a real dataset of Finns, which many alternative methods missed. We believe SMUT will become a useful tool for generating mechanistic hypotheses about GWAS variants, facilitating functional follow-up. Availability and implementation The R package SMUT is publicly available from CRAN at https://CRAN.R-project.org/package=SMUT. Supplementary information Supplementary data are available at Bioinformatics online.
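The intersection-union principle behind SMUT is simple to state. Mediation requires both a SNP-to-mediator effect and a mediator-to-outcome effect, so the null hypothesis of "no mediation" is the union of the two component nulls. That null is rejected only when both component tests reject, and the intersection-union p-value is the larger of the two component p-values. The Python sketch below covers only that combination step. It assumes the two component p-values are already available; how SMUT computes them for multiple correlated SNPs is not shown. The function names are hypothetical.

```python
def iut_pvalue(p_snp_to_mediator, p_mediator_to_outcome):
    """Intersection-union p-value: the larger of the two component p-values."""
    return max(p_snp_to_mediator, p_mediator_to_outcome)

def mediation_detected(p_snp_to_mediator, p_mediator_to_outcome, alpha=0.05):
    """Declare mediation only if both component tests are significant at alpha."""
    return iut_pvalue(p_snp_to_mediator, p_mediator_to_outcome) <= alpha
```

Taking the maximum keeps the test at level alpha without a multiplicity correction, because a type I error requires at least one of the two true-null components to be falsely rejected.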


2021
Author(s):
Alex N. Nguyen Ba
Katherine R. Lawrence
Artur Rego-Costa
Shreyas Gopalakrishnan
Daniel Temko
...  

Mapping the genetic basis of complex traits is critical to uncovering the biological mechanisms that underlie disease and other phenotypes. Genome-wide association studies (GWAS) in humans and quantitative trait locus (QTL) mapping in model organisms can now explain much of the observed heritability in many traits, allowing us to predict phenotype from genotype. However, constraints on power due to statistical confounders in large GWAS and smaller sample sizes in QTL studies still limit our ability to resolve numerous small-effect variants, map them to causal genes, identify pleiotropic effects across multiple traits, and infer non-additive interactions between loci (epistasis). Here, we introduce barcoded bulk quantitative trait locus (BB-QTL) mapping, which allows us to construct, genotype, and phenotype 100,000 offspring of a budding yeast cross, two orders of magnitude larger than the previous state of the art. We use this panel to map the genetic basis of eighteen complex traits, finding that the genetic architecture of these traits involves hundreds of small-effect loci densely spaced throughout the genome, many with widespread pleiotropic effects across multiple traits. Epistasis plays a central role, with thousands of interactions that provide insight into genetic networks. By dramatically increasing sample size, BB-QTL mapping demonstrates the potential of natural variants in high-powered QTL studies to reveal the highly polygenic, pleiotropic, and epistatic architecture of complex traits.
Significance statement Understanding the genetic basis of important phenotypes is a central goal of genetics. However, the highly polygenic architectures of complex traits inferred by large-scale genome-wide association studies (GWAS) in humans stand in contrast to the results of quantitative trait locus (QTL) mapping studies in model organisms. Here, we use a barcoding approach to conduct QTL mapping in budding yeast at a scale two orders of magnitude larger than the previous state of the art. The resulting increase in power reveals the polygenic nature of complex traits in yeast and offers insight into widespread patterns of pleiotropy and epistasis. Our data and analysis methods offer opportunities for future work in systems biology and have implications for large-scale GWAS in human populations.


Author(s):  
Alan E Murphy
Brian M Schilder
Nathan G Skene

Abstract Motivation Genome-wide association study (GWAS) summary statistics have popularised and accelerated genetic research. However, a lack of standardisation of the file formats used has proven problematic when running secondary analysis tools or performing meta-analyses. Results To address this issue, we have developed MungeSumstats, a Bioconductor R package for the standardisation and quality control of GWAS summary statistics. MungeSumstats can handle the most common summary statistic formats, including variant call format (VCF), and produces a reformatted, standardised, tabular summary statistic file, a VCF, or an R native data object. Availability MungeSumstats is available on Bioconductor (v 3.13) and can also be found on GitHub at: https://neurogenomics.github.io/MungeSumstats Supplementary information The analysis deriving the most common summary statistic formats is available at: https://al-murphy.github.io/SumstatFormats
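One part of standardising summary statistics is mapping the many header spellings used by different tools onto a single set of column names. The Python sketch below shows only that idea, using a small, hypothetical synonym table. MungeSumstats itself is an R/Bioconductor package with its own, much larger mapping and many further quality-control steps that are not reproduced here. The names `HEADER_SYNONYMS` and `standardise_header` are illustrative only.

```python
# Hypothetical synonym table: a few common header spellings -> one standard name.
HEADER_SYNONYMS = {
    "SNP": "SNP", "RSID": "SNP", "RS": "SNP", "MARKERNAME": "SNP",
    "CHR": "CHR", "CHROM": "CHR", "CHROMOSOME": "CHR",
    "BP": "BP", "POS": "BP", "POSITION": "BP",
    "P": "P", "PVAL": "P", "PVALUE": "P", "P_VALUE": "P",
    "BETA": "BETA", "B": "BETA", "EFFECT": "BETA",
}

def standardise_header(columns):
    """Map each column name to its standard form; unknown names are kept as-is."""
    standardised = []
    for col in columns:
        # Normalise case and separators before the lookup.
        key = col.strip().upper().replace("-", "_").replace(".", "_")
        standardised.append(HEADER_SYNONYMS.get(key, col))
    return standardised
```

For example, the header `["MarkerName", "chrom", "pos", "p-value", "Effect", "INFO"]` becomes `["SNP", "CHR", "BP", "P", "BETA", "INFO"]`. Normalising case and separators before the lookup keeps the synonym table short.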


2019
Vol 36 (8)
pp. 2626-2627
Author(s):
Corentin Molitor
Matt Brember
Fady Mohareb

Abstract Summary Over the past decade, there has been an exponential increase in the amount of disease-related genomic data available in public databases. However, this high-quality information is spread across independent sources, and researchers often need to access each of them separately. Hence, there is a growing need for tools that gather and compile this information in an easy and automated manner. Here, we present ‘VarGen’, an easy-to-use, customizable R package that fetches, annotates and ranks variants related to diseases and genetic disorders using a collection of public databases (viz. Online Mendelian Inheritance in Man, the Functional Annotation of the Mammalian Genome 5, the Genotype-Tissue Expression project and the Genome-Wide Association Studies catalog). The package can also annotate these variants to identify the most impactful ones. We expect that this tool will benefit research into variant-disease relationships. Availability and implementation VarGen is open-source and freely available via GitHub: https://github.com/MCorentin/VarGen. The software is implemented as an R package and is supported on Linux, MacOS and Windows. Supplementary information Supplementary data are available at Bioinformatics online.

