MungeSumstats: A Bioconductor package for the standardisation and quality control of many GWAS summary statistics

Author(s):  
Alan E Murphy ◽  
Brian M Schilder ◽  
Nathan G Skene

Abstract Motivation Genome-wide association studies (GWAS) summary statistics have popularised and accelerated genetic research. However, a lack of standardisation of the file formats used has proven problematic when running secondary analysis tools or performing meta-analysis studies. Results To address this issue, we have developed MungeSumstats, a Bioconductor R package for the standardisation and quality control of GWAS summary statistics. MungeSumstats can handle the most common summary statistic formats, including variant call format (VCF), producing a reformatted, standardised, tabular summary statistic file, VCF or R native data object. Availability MungeSumstats is available on Bioconductor (v 3.13) and can also be found on GitHub at: https://neurogenomics.github.io/MungeSumstats Supplementary information The analysis deriving the most common summary statistic formats is available at: https://al-murphy.github.io/SumstatFormats
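
To make the standardisation problem concrete, the following is a minimal Python sketch of header harmonisation for a tabular summary statistic file. It is not the MungeSumstats API: the synonym table `HEADER_SYNONYMS`, the target column names and both function names are assumptions chosen for illustration only.

```python
import csv
import io

# Hypothetical synonym table (an assumption, not taken from MungeSumstats):
# maps lower-cased header variants to one standard column name.
HEADER_SYNONYMS = {
    "snp": "SNP", "rsid": "SNP", "markername": "SNP",
    "chr": "CHR", "chrom": "CHR", "chromosome": "CHR",
    "bp": "BP", "pos": "BP", "position": "BP",
    "p": "P", "pval": "P", "p_value": "P",
    "beta": "BETA", "effect": "BETA",
    "se": "SE", "stderr": "SE",
}


def standardise_header(columns):
    """Map each column name to its standard form; unknown names are upper-cased."""
    return [
        HEADER_SYNONYMS.get(name.strip().lower(), name.strip().upper())
        for name in columns
    ]


def standardise_table(text, delimiter="\t"):
    """Parse delimited text and return rows keyed by standardised column names."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    header = standardise_header(rows[0])
    return [dict(zip(header, row)) for row in rows[1:]]
```

A file whose header reads `MarkerName`, `chrom`, `position`, `p_value` would come back keyed by `SNP`, `CHR`, `BP` and `P`, so downstream tools can rely on one naming scheme.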

2021 ◽  
Author(s):  
Alan E Murphy ◽  
Nathan G Skene

Genome-wide association studies (GWAS) summary statistics have democratised and accelerated genetic research. However, a lack of standardisation of the file formats used has proven problematic when running secondary analysis tools or performing meta-analysis studies. To address these issues, we have developed MungeSumstats, a Bioconductor R package for the standardisation and quality control of GWAS summary statistics. MungeSumstats can handle the most common summary statistic formats, including variant call format (VCF), producing a reformatted, standardised, tabular summary statistic file.


2020 ◽  
Vol 36 (16) ◽  
pp. 4521-4522
Author(s):  
Ilja M Nolte

Abstract Summary Summary statistics from a meta-analysis of genome-wide association studies (meta-GWAS) can be used for many follow-up analyses. One valuable application is the creation of polygenic scores. However, if polygenic scores are calculated in a validation cohort that was part of the meta-GWAS consortium, this cohort is not independent and analyses will therefore yield inflated results. The R package ‘MetaSubtract’ was developed to subtract the results of the validation cohort from meta-GWAS summary statistics analytically. The statistical formulas for a meta-analysis were inverted to compute corrected summary statistics of a meta-GWAS leaving one (or more) cohort(s) out. These formulas have been implemented in MetaSubtract for different meta-analysis methods (fixed effects inverse variance or square root sample size weighted z-score) accounting for no, single or double genomic control correction. Results obtained by MetaSubtract correlate very well with those calculated using the traditional way, i.e. by performing a meta-analysis leaving out the validation cohort. In conclusion, MetaSubtract allows researchers to compute meta-GWAS summary statistics that are independent of the GWAS results of the validation cohort without requiring access to the cohort level GWAS results of the corresponding meta-GWAS consortium. Availability and implementation https://cran.r-project.org/web/packages/MetaSubtract. Supplementary information Supplementary data are available at Bioinformatics online.
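
The inversion described above can be sketched for a single variant as follows. This is a minimal Python illustration of the textbook fixed-effects inverse-variance and square-root-sample-size-weighted z-score formulas with no genomic control correction; it is not the MetaSubtract code, and all function names are assumptions.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance meta-analysis of one variant."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def subtract_ivw(beta_meta, se_meta, beta_cohort, se_cohort):
    """Invert the inverse-variance formula to leave one cohort out."""
    w_meta = 1.0 / se_meta ** 2
    w_cohort = 1.0 / se_cohort ** 2
    w_rest = w_meta - w_cohort  # weight carried by the remaining cohorts
    if w_rest <= 0:
        raise ValueError("cohort weight must be smaller than the meta-analysis weight")
    beta_rest = (beta_meta * w_meta - beta_cohort * w_cohort) / w_rest
    return beta_rest, math.sqrt(1.0 / w_rest)


def subtract_zscore(z_meta, n_meta, z_cohort, n_cohort):
    """Invert the sqrt(sample size)-weighted z-score formula to leave one cohort out."""
    n_rest = n_meta - n_cohort
    if n_rest <= 0:
        raise ValueError("cohort sample size must be smaller than the meta-analysis size")
    z_rest = (z_meta * math.sqrt(n_meta) - z_cohort * math.sqrt(n_cohort)) / math.sqrt(n_rest)
    return z_rest, n_rest
```

Under these assumptions, subtracting a cohort from a meta-analysis of three cohorts reproduces the direct meta-analysis of the other two, which is the comparison the abstract reports as correlating very well.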


2019 ◽  
Vol 35 (19) ◽  
pp. 3701-3708 ◽  
Author(s):  
Gulnara R Svishcheva ◽  
Nadezhda M Belonogova ◽  
Irina V Zorkoltseva ◽  
Anatoly V Kirichenko ◽  
Tatiana I Axenovich

Abstract Motivation A huge number of genome-wide association studies (GWAS) summary statistics freely available in databases provide new material for gene-based association analysis aimed at identifying rare genetic variants. Only a few of the many popular gene-based methods developed for individual genotype and phenotype data are adapted for the practical use of the GWAS summary statistics as input. Results We analytically prove and numerically illustrate that all popular powerful methods developed for gene-based association analysis of individual phenotype and genotype data can be modified to utilize GWAS summary statistics. We have modified and implemented all of the popular methods, including burden and kernel machine-based tests, multiple and functional linear regression, principal components analysis and others, in the R package sumFREGAT. Using real summary statistics for coronary artery disease, we show that the new package is able to detect genes not found by the existing packages. Availability and implementation The R package sumFREGAT is freely and publicly available at: https://CRAN.R-project.org/package=sumFREGAT. Supplementary information Supplementary data are available at Bioinformatics online.
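
As an illustration of how a gene-based test can run on summary statistics alone, here is a minimal Python sketch of a weighted burden test built from per-variant z-scores and a linkage disequilibrium (LD) correlation matrix. The abstract does not give formulas, so this uses the commonly cited form in which the weighted sum of z-scores is scaled by its null standard deviation; it is not the sumFREGAT implementation, and the function names are assumptions.

```python
import math


def burden_z(z_scores, weights, ld):
    """Weighted burden statistic: sum(w * z) / sqrt(w' R w), with R the LD correlation matrix."""
    m = len(z_scores)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    # Under the null the z-scores have covariance R, so w'z has variance w' R w.
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(m) for j in range(m))
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

With four independent variants (identity LD matrix), equal weights and z-scores of 1 each, the gene-level statistic is 4 / sqrt(4) = 2, which shows how modest single-variant signals can combine at the gene level.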


2020 ◽  
Author(s):  
Jiangming Sun ◽  
Yunpeng Wang

Abstract Summary Post-GWAS studies using the results from large consortium meta-analysis often need to correctly take care of the overlapping sample issue. The gold standard approach for resolving this issue is to reperform the GWAS or meta-analysis excluding the overlapped participants. However, such an approach is time-consuming and, sometimes, restricted by the available data. deMeta provides a user friendly and computationally efficient command-line implementation for removing the effect of a contributing sub-study to a consortium from the meta-analysis results. Only the summary statistics of the meta-analysis and the sub-study to be removed are required. In addition, deMeta can generate contrasting Manhattan and quantile-quantile plots for users to visualize the impact of the sub-study on the meta-analysis results. Availability and Implementation The Python source code, examples and documentation of deMeta are publicly available at https://github.com/Computational-NeuroGenetics/[email protected] (J. Sun); [email protected] (Y. Wang) Supplementary information None.


2020 ◽  
Vol 36 (15) ◽  
pp. 4374-4376
Author(s):  
Ninon Mounier ◽  
Zoltán Kutalik

Abstract Summary Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs) and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation bGWAS package is freely available under a GPL-2 License, and can be accessed, alongside user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (22) ◽  
pp. 4724-4729 ◽  
Author(s):  
Wujuan Zhong ◽  
Cassandra N Spracklen ◽  
Karen L Mohlke ◽  
Xiaojing Zheng ◽  
Jason Fine ◽  
...  

Abstract Summary Tens of thousands of reproducibly identified GWAS (Genome-Wide Association Studies) variants, with the vast majority falling in non-coding regions resulting in no eventual protein products, call urgently for mechanistic interpretations. Although numerous methods exist, there are few, if any, methods for simultaneously testing the mediation effects of multiple correlated SNPs via some mediator (e.g. the expression of a gene in the neighborhood) on phenotypic outcome. We propose multi-SNP mediation intersection-union test (SMUT) to fill in this methodological gap. Our extensive simulations demonstrate the validity of SMUT as well as substantial, up to 92%, power gains over alternative methods. In addition, SMUT confirmed known mediators in a real dataset of Finns for plasma adiponectin level, which were missed by many alternative methods. We believe SMUT will become a useful tool to generate mechanistic hypotheses underlying GWAS variants, facilitating functional follow-up. Availability and implementation The R package SMUT is publicly available from CRAN at https://CRAN.R-project.org/package=SMUT. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 12 (1) ◽  
Author(s):  
Aitana Alonso-Gonzalez ◽  
Manuel Calaza ◽  
Cristina Rodriguez-Fontenla ◽  
Angel Carracedo

Abstract Background Attention-Deficit Hyperactivity Disorder (ADHD) is a complex neurodevelopmental disorder (NDD) which may significantly impact on the affected individual’s life. ADHD is acknowledged to have a high heritability component (70–80%). Recently, a meta-analysis of GWAS (Genome Wide Association Studies) has demonstrated the association of several independent loci. Our main aim here is to apply PASCAL (pathway scoring algorithm), a new gene-based analysis (GBA) method, to the summary statistics obtained in this meta-analysis. PASCAL will take into account the linkage disequilibrium (LD) across genomic regions in a different way than the most commonly employed GBA methods (MAGMA or VEGAS (Versatile Gene-based Association Study)). In addition to the PASCAL analysis, a gene network analysis and an enrichment analysis for KEGG and GO terms were carried out. Moreover, the GENE2FUNC tool was employed to create gene expression heatmaps and to carry out a DEG (Differentially Expressed Gene) analysis using GTEx v7 and BrainSpan data. Results PASCAL results have revealed the association of new loci with ADHD and have also highlighted other genes previously reported by MAGMA analysis. PASCAL was able to discover new associations at a gene level for ADHD: FEZF1 (p-value: 2.2 × 10⁻⁷) and FEZF1-AS1 (p-value: 4.58 × 10⁻⁷). In addition, PASCAL has been able to highlight association of other genes that share the same LD block with some previously reported ADHD susceptibility genes. Gene network analysis has revealed several interactors with the associated ADHD genes and different GO and KEGG terms have been associated. In addition, GENE2FUNC has demonstrated the existence of several up and down regulated expression clusters when the associated genes and their interactors were considered. Conclusions PASCAL has been revealed as an efficient tool to extract additional information from previous GWAS using their summary statistics. This study has identified novel ADHD associated genes that were not previously reported when other GBA methods were employed. Moreover, insight into the biological function of the ADHD associated genes across brain regions and neurodevelopmental stages is provided.


2015 ◽  
Author(s):  
Hon-Cheong SO ◽  
Pak C. SHAM

Genome-wide association studies (GWAS) have become increasingly popular these days and one of the key questions is how much heritability could be explained by all variants in GWAS. We have previously proposed an approach to answer this question, based on recovering the "true" z-statistics from a set of observed z-statistics. Only summary statistics are required. However, methods for standard error (SE) estimation are not available yet, thereby limiting the interpretation of the results. In this study we developed resampling-based approaches to estimate the SE and the methods are implemented in an R package. We found that delete-d-jackknife and parametric bootstrap approaches provide good estimates of the SE. Methods to compute the sum of heritability explained and the corresponding SE are implemented in the R package SumVg, available at https://sites.google.com/site/honcheongso/software/var-totalvg
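
The delete-d jackknife mentioned above can be sketched generically as follows. This minimal Python example estimates the standard error of an arbitrary statistic by recomputing it on every subset that omits d observations; it is not the SumVg implementation, the function name is an assumption, and full enumeration of subsets is only practical for small inputs (a real analysis would sample subsets instead).

```python
import itertools
import math


def delete_d_jackknife_se(values, statistic, d=1):
    """Delete-d jackknife standard error of statistic(values), enumerating all subsets."""
    n = len(values)
    if not 0 < d < n:
        raise ValueError("d must satisfy 0 < d < len(values)")
    # Recompute the statistic on every subset that keeps n - d observations.
    replicates = [
        statistic([values[i] for i in keep])
        for keep in itertools.combinations(range(n), n - d)
    ]
    mean_rep = sum(replicates) / len(replicates)
    sum_sq = sum((r - mean_rep) ** 2 for r in replicates)
    # Scaling factor (n - d) / (d * number of subsets) of the delete-d jackknife.
    return math.sqrt((n - d) / (d * len(replicates)) * sum_sq)
```

For the sample mean this estimator reduces to the familiar s / sqrt(n), which gives a quick sanity check before applying it to a more complex statistic such as a sum of explained heritability.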

