MungeSumstats: A Bioconductor package for the standardisation and quality control of many GWAS summary statistics

bioRxiv

10.1101/2021.06.21.449239

2021

Author(s): Alan E Murphy, Nathan G Skene

Keyword(s): Quality Control, Association Studies, Meta Analysis, Genetic Research, Secondary Analysis, R Package, Genome Wide Association Studies, Summary Statistics, Summary Statistic, File Formats

Genome-wide association studies (GWAS) summary statistics have democratised and accelerated genetic research. However, a lack of standardisation of the file formats used has proven problematic when running secondary analysis tools or performing meta-analysis studies. To address these issues, we have developed MungeSumstats, a Bioconductor R package for the standardisation and quality control of GWAS summary statistics. MungeSumstats can handle the most common summary statistic formats, including variant call format (VCF), producing a reformatted, standardised, tabular summary statistic file.

MungeSumstats: A Bioconductor package for the standardisation and quality control of many GWAS summary statistics

Bioinformatics

10.1093/bioinformatics/btab665

2021

Author(s): Alan E Murphy, Brian M Schilder, Nathan G Skene

Keyword(s): Quality Control, Association Studies, Meta Analysis, Genetic Research, Secondary Analysis, R Package, Supplementary Information, Genome Wide Association Studies, Summary Statistics, Summary Statistic

Abstract

Motivation: Genome-wide association studies (GWAS) summary statistics have popularised and accelerated genetic research. However, a lack of standardisation of the file formats used has proven problematic when running secondary analysis tools or performing meta-analysis studies.

Results: To address this issue, we have developed MungeSumstats, a Bioconductor R package for the standardisation and quality control of GWAS summary statistics. MungeSumstats can handle the most common summary statistic formats, including variant call format (VCF), producing a reformatted, standardised, tabular summary statistic file, a VCF or an R native data object.

Availability: MungeSumstats is available on Bioconductor (v3.13) and can also be found on GitHub at: https://neurogenomics.github.io/MungeSumstats

Supplementary information: The analysis deriving the most common summary statistic formats is available at: https://al-murphy.github.io/SumstatFormats
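
A minimal Python sketch of what header standardisation involves, for illustration only. A synonym table maps the column names used by different GWAS pipelines onto one set of standard names, and rows whose P-value is missing or outside [0, 1] are dropped. The synonym table, the standard column names and the functions `standardise_header` and `munge` are hypothetical. MungeSumstats itself is an R/Bioconductor package with its own, far larger mapping and many more quality-control steps.

```python
import csv
import io

# Hypothetical synonym table (tiny illustrative subset, not MungeSumstats' real mapping).
HEADER_SYNONYMS = {
    "SNP": {"snp", "rsid", "markername", "variant_id"},
    "CHR": {"chr", "chrom", "chromosome"},
    "BP": {"bp", "pos", "position", "base_pair_location"},
    "EFFECT_ALLELE": {"effect_allele", "ea"},
    "OTHER_ALLELE": {"other_allele", "non_effect_allele", "oa"},
    "BETA": {"beta", "effect"},
    "SE": {"se", "stderr", "standard_error"},
    "P": {"p", "pval", "p_value", "p-value"},
}
# Invert the table once: lower-case synonym -> standard name.
LOOKUP = {syn: std for std, syns in HEADER_SYNONYMS.items() for syn in syns}


def standardise_header(columns):
    """Map raw column names to standard names; unknown names are kept, upper-cased."""
    return [LOOKUP.get(col.strip().lower(), col.strip().upper()) for col in columns]


def munge(text, delimiter="\t"):
    """Rename headers and keep only rows with a numeric P-value in [0, 1]."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = standardise_header(next(reader))
    kept = []
    for raw in reader:
        record = dict(zip(header, raw))
        try:
            p = float(record.get("P", ""))
        except ValueError:
            continue  # missing or non-numeric P-value, e.g. "NA"
        if 0.0 <= p <= 1.0:  # also rejects NaN, since NaN comparisons are False
            kept.append(record)
    return header, kept


raw = (
    "MarkerName\tchrom\tpos\tEffect_Allele\tOther_Allele\tBeta\tStdErr\tPval\n"
    "rs1\t1\t1000\tA\tG\t0.10\t0.02\t1e-6\n"
    "rs2\t1\t2000\tC\tT\t-0.05\t0.03\tNA\n"
)
header, rows = munge(raw)
print(header)     # ['SNP', 'CHR', 'BP', 'EFFECT_ALLELE', 'OTHER_ALLELE', 'BETA', 'SE', 'P']
print(len(rows))  # 1 (rs2 is dropped because its P-value is "NA")
```

Matching on lower-cased, stripped names keeps the table small. Allele columns are only mapped from unambiguous names here, because labels such as A1/A2 mean different things in different tools.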

MetaSubtract: an R-package to analytically produce leave-one-out meta-analysis GWAS summary statistics

Bioinformatics

10.1093/bioinformatics/btaa570

2020

Vol 36 (16)

pp. 4521-4522

Author(s): Ilja M Nolte

Keyword(s): Fixed Effects, Validation Cohort, Association Studies, Meta Analysis, R Package, Supplementary Information, Genome Wide Association Studies, Summary Statistics, Polygenic Scores, Meta Analyses

Abstract

Summary: Summary statistics from a meta-analysis of genome-wide association studies (meta-GWAS) can be used for many follow-up analyses. One valuable application is the creation of polygenic scores. However, if polygenic scores are calculated in a validation cohort that was part of the meta-GWAS consortium, this cohort is not independent, and analyses will therefore yield inflated results. The R package ‘MetaSubtract’ was developed to subtract the results of the validation cohort from meta-GWAS summary statistics analytically. The statistical formulas for a meta-analysis were inverted to compute corrected summary statistics of a meta-GWAS leaving one (or more) cohort(s) out. These formulas have been implemented in MetaSubtract for different meta-analysis methods (fixed-effects inverse variance or square-root sample-size weighted z-score), accounting for no, single or double genomic control correction. Results obtained by MetaSubtract correlate very well with those calculated in the traditional way, i.e. by performing a meta-analysis leaving out the validation cohort. In conclusion, MetaSubtract allows researchers to compute meta-GWAS summary statistics that are independent of the GWAS results of the validation cohort without requiring access to the cohort-level GWAS results of the corresponding meta-GWAS consortium.

Availability and implementation: https://cran.r-project.org/web/packages/MetaSubtract

Supplementary information: Supplementary data are available at Bioinformatics online.
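
The inversion described above can be illustrated for the simplest cases. In a fixed-effects inverse-variance meta-analysis, cohort i has weight w_i = 1/SE_i², the pooled weight is W = Σ w_i and the pooled effect is β = Σ w_i β_i / W. Removing cohort k therefore only needs W − w_k and W·β − w_k·β_k. For the square-root sample-size weighted z-score, Z = Σ √n_i z_i / √(Σ n_i), so the same subtraction applies to √N·Z. The Python sketch below assumes no genomic control correction, which MetaSubtract also handles. The function names are hypothetical, and this is not MetaSubtract's code.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance meta-analysis; returns (pooled beta, pooled SE)."""
    weights = [1.0 / se**2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    return beta, 1.0 / math.sqrt(w_sum)


def ivw_subtract(beta_meta, se_meta, beta_k, se_k):
    """Remove cohort k from an IVW meta-analysis result (no genomic control)."""
    w_meta = 1.0 / se_meta**2
    w_k = 1.0 / se_k**2
    w_rest = w_meta - w_k
    if w_rest <= 0:
        raise ValueError("cohort k carries all of the meta-analysis weight")
    beta_rest = (w_meta * beta_meta - w_k * beta_k) / w_rest
    return beta_rest, 1.0 / math.sqrt(w_rest)


def z_meta(zs, ns):
    """Square-root sample-size weighted Z-score meta-analysis."""
    return sum(math.sqrt(n) * z for z, n in zip(zs, ns)) / math.sqrt(sum(ns))


def z_subtract(z_all, n_all, z_k, n_k):
    """Remove cohort k (Z-score z_k, sample size n_k) from a pooled Z-score."""
    return (math.sqrt(n_all) * z_all - math.sqrt(n_k) * z_k) / math.sqrt(n_all - n_k)


# Three cohorts; cohort 3 is the validation cohort to be left out.
betas, ses = [0.10, 0.05, 0.20], [0.02, 0.03, 0.05]
full = ivw_meta(betas, ses)
print(ivw_subtract(full[0], full[1], betas[2], ses[2]))
print(ivw_meta(betas[:2], ses[:2]))  # direct leave-one-out result, for comparison
```

Leaving out several cohorts amounts to applying `ivw_subtract` (or `z_subtract`) once per cohort.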

GWAtoolbox: an R package for fast quality control and handling of genome-wide association studies meta-analysis data

Bioinformatics

10.1093/bioinformatics/btr679

2011

Vol 28 (3)

pp. 444-445

Cited By ~ 21

Author(s): Christian Fuchsberger, Daniel Taliun, Peter P. Pramstaller, Cristian Pattaro

Keyword(s): Quality Control, Association Studies, Meta Analysis, Analysis Data, R Package, Genome Wide Association, Genome Wide Association Studies, Genome Wide

Gene-based association tests using GWAS summary statistics

Bioinformatics

10.1093/bioinformatics/btz172

2019

Vol 35 (19)

pp. 3701-3708

Cited By ~ 6

Author(s): Gulnara R Svishcheva, Nadezhda M Belonogova, Irina V Zorkoltseva, Anatoly V Kirichenko, Tatiana I Axenovich

Keyword(s): Association Analysis, Association Studies, R Package, Supplementary Information, Genome Wide Association Studies, Summary Statistics, New Material, Artery Disease, The Many, Functional Linear Regression

Abstract

Motivation: The large number of genome-wide association studies (GWAS) summary statistics freely available in databases provides new material for gene-based association analysis aimed at identifying rare genetic variants. Only a few of the many popular gene-based methods developed for individual genotype and phenotype data have been adapted for practical use with GWAS summary statistics as input.

Results: We analytically prove and numerically illustrate that all popular and powerful methods developed for gene-based association analysis of individual phenotype and genotype data can be modified to utilize GWAS summary statistics. We have modified and implemented all of the popular methods, including burden and kernel machine-based tests, multiple and functional linear regression, principal components analysis and others, in the R package sumFREGAT. Using real summary statistics for coronary artery disease, we show that the new package is able to detect genes not found by the existing packages.

Availability and implementation: The R package sumFREGAT is freely and publicly available at: https://CRAN.R-project.org/package=sumFREGAT

Supplementary information: Supplementary data are available at Bioinformatics online.
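
To give a flavour of why gene-based tests can work from summary statistics, consider the simplest one, a weighted burden test. Let z be the vector of per-variant Z-scores in a gene and R the LD (correlation) matrix among those variants. Under the null z ~ N(0, R), so S = wᵀz has variance wᵀRw, and T = S²/(wᵀRw) follows a χ² distribution with one degree of freedom. The Python sketch below is a generic illustration of that result. The function name and the default equal weights are assumptions, and it does not reproduce sumFREGAT's implementation or options.

```python
import math


def burden_test(z, ld, weights=None):
    """Burden test from per-variant Z-scores z and LD correlation matrix ld.

    Under H0, z ~ N(0, R), so S = w'z ~ N(0, w'Rw) and T = S^2 / (w'Rw) ~ chi-square(1).
    Returns (T, p-value).
    """
    m = len(z)
    w = [1.0] * m if weights is None else list(weights)  # equal weights by default (assumption)
    s = sum(wi * zi for wi, zi in zip(w, z))
    var_s = sum(w[i] * ld[i][j] * w[j] for i in range(m) for j in range(m))
    t = s * s / var_s
    # Upper tail of chi-square(1): P(T > t) = P(|N(0,1)| > sqrt(t)) = erfc(sqrt(t / 2)).
    return t, math.erfc(math.sqrt(t / 2.0))


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(burden_test([1.0, 1.0, 1.0, 1.0], identity))  # T = 4.0, p close to 0.0455
# Two variants in perfect LD carry one signal, not two: T is again 4.0, not 8.0.
print(burden_test([2.0, 2.0], [[1.0, 1.0], [1.0, 1.0]]))
```

The LD matrix is what replaces the individual-level genotypes. Without it, correlated variants would be counted as independent evidence.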

Gene-based analysis of ADHD using PASCAL: a biological insight into the novel associated genes

BMC Medical Genomics

10.1186/s12920-019-0593-5

2019

Vol 12 (1)

Author(s): Aitana Alonso-Gonzalez, Manuel Calaza, Cristina Rodriguez-Fontenla, Angel Carracedo

Keyword(s): Gene Network, Association Studies, Meta Analysis, Neurodevelopmental Disorder, Differentially Expressed Gene, Brain Regions, Genome Wide Association Studies, Summary Statistics, Biological Insight, Insight Into

Abstract

Background: Attention-Deficit Hyperactivity Disorder (ADHD) is a complex neurodevelopmental disorder (NDD) which may significantly impact the affected individual’s life. ADHD is acknowledged to have high heritability (70–80%). Recently, a meta-analysis of GWAS (Genome Wide Association Studies) has demonstrated the association of several independent loci. Our main aim here is to apply PASCAL (pathway scoring algorithm), a new gene-based analysis (GBA) method, to the summary statistics obtained in this meta-analysis. PASCAL takes into account the linkage disequilibrium (LD) across genomic regions in a different way from the most commonly employed GBA methods (MAGMA or VEGAS, the Versatile Gene-based Association Study). In addition to the PASCAL analysis, a gene network analysis and an enrichment analysis for KEGG and GO terms were carried out. Moreover, the GENE2FUNC tool was employed to create gene expression heatmaps and to carry out a differentially expressed gene (DEG) analysis using GTEx v7 and BrainSpan data.

Results: PASCAL results have revealed the association of new loci with ADHD, and they have also highlighted other genes previously reported by MAGMA analysis. PASCAL was able to discover new associations at the gene level for ADHD: FEZF1 (p-value: 2.2 × 10⁻⁷) and FEZF1-AS1 (p-value: 4.58 × 10⁻⁷). In addition, PASCAL has been able to highlight the association of other genes that share the same LD block with some previously reported ADHD susceptibility genes. Gene network analysis has revealed several interactors with the associated ADHD genes, and different GO and KEGG terms have been associated. In addition, GENE2FUNC has demonstrated the existence of several up- and down-regulated expression clusters when the associated genes and their interactors were considered.

Conclusions: PASCAL has proven to be an efficient tool to extract additional information from previous GWAS using their summary statistics. This study has identified novel ADHD-associated genes that were not previously reported when other GBA methods were employed. Moreover, insight into the biological function of the ADHD-associated genes across brain regions and neurodevelopmental stages is provided.
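
As we understand PASCAL's sum statistic, a gene's score is the sum of the χ² statistics (z²) of the SNPs mapped to it, and LD enters through the null distribution. If z ~ N(0, R), that sum is distributed as a weighted sum of χ²₁ variables whose weights are the eigenvalues of R, which PASCAL evaluates analytically. The Python sketch below swaps in plain Monte Carlo simulation of z ~ N(0, R), via a Cholesky factor, for that analytic step. This is simpler but slower and cruder for very small P-values. The function names, simulation count and seed are illustrative assumptions, not PASCAL's interface.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a small symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - acc)
            else:
                low[i][j] = (a[i][j] - acc) / low[j][j]
    return low


def sum_chisq_gene_test(z, ld, n_sim=20000, seed=1):
    """Gene score = sum of SNP chi-squares; null simulated as z ~ N(0, R) via Cholesky."""
    observed = sum(zi * zi for zi in z)
    low = cholesky(ld)
    rng = random.Random(seed)
    m = len(z)
    exceed = 0
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(m)]  # independent standard normals
        sim = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(m)]  # L e ~ N(0, R)
        if sum(s * s for s in sim) >= observed:
            exceed += 1
    # The +1 terms keep the Monte Carlo P-value away from exactly zero.
    return observed, (exceed + 1) / (n_sim + 1)


ld = [[1.0, 0.6, 0.3], [0.6, 1.0, 0.6], [0.3, 0.6, 1.0]]
print(sum_chisq_gene_test([2.5, 2.8, 2.2], ld))
```

With strong LD the same Z-scores give a larger P-value than they would for independent SNPs, which is the sense in which the test "takes LD into account".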

metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

Bioinformatics

10.1093/bioinformatics/btw052

2016

Vol 32 (13)

pp. 1981-1989

Cited By ~ 66

Author(s): Anna Cichonska, Juho Rousu, Pekka Marttinen, Antti J. Kangas, Pasi Soininen, ...

Keyword(s): Correlation Analysis, Canonical Correlation Analysis, Canonical Correlation, Association Studies, Meta Analysis, Genome Wide Association, Genome Wide Association Studies, Summary Statistics, Genome Wide

SumVg: Total heritability explained by all variants in genome-wide association studies based on summary statistics with standard error estimates

10.1101/016857

2015

Author(s): Hon-Cheong So, Pak C. Sham

Keyword(s): Error Estimates, Standard Error, Association Studies, Parametric Bootstrap, R Package, Genome Wide Association, Genome Wide Association Studies, Summary Statistics, Genome Wide, Key Questions

Genome-wide association studies (GWAS) have become increasingly popular, and one of the key questions is how much heritability could be explained by all variants in GWAS. We have previously proposed an approach to answer this question, based on recovering the "true" z-statistics from a set of observed z-statistics. Only summary statistics are required. However, methods for standard error (SE) estimation are not yet available, which limits the interpretation of the results. In this study, we developed resampling-based approaches to estimate the SE, and the methods are implemented in an R package. We found that delete-d jackknife and parametric bootstrap approaches provide good estimates of the SE. Methods to compute the sum of heritability explained and the corresponding SE are implemented in the R package SumVg, available at https://sites.google.com/site/honcheongso/software/var-totalvg
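
The delete-d jackknife mentioned above estimates a standard error by recomputing the statistic on subsamples with d observations removed. With S such subsamples and estimates θ̂_s, the variance estimate is ((n − d)/(d·S)) Σ_s (θ̂_s − θ̄)². All C(n, d) subsets are used when that is feasible, and a random selection otherwise. The generic Python sketch below applies this to an arbitrary statistic. In SumVg the statistic would be the total heritability estimate, which is not reproduced here, and the function name and arguments are assumptions.

```python
import itertools
import math
import random
import statistics


def delete_d_jackknife_se(data, statistic, d, n_subsets=None, seed=0):
    """Delete-d jackknife SE: var = (n - d) / (d * S) * sum_s (theta_s - theta_bar)^2.

    With n_subsets=None all C(n, d) deletion sets are enumerated; otherwise n_subsets
    random deletion sets are drawn (an approximation for when C(n, d) is huge).
    """
    n = len(data)
    if n_subsets is None:
        deletions = itertools.combinations(range(n), d)
    else:
        rng = random.Random(seed)
        deletions = (rng.sample(range(n), d) for _ in range(n_subsets))
    thetas = []
    for removed in deletions:
        drop = set(removed)
        thetas.append(statistic([x for i, x in enumerate(data) if i not in drop]))
    s = len(thetas)
    theta_bar = sum(thetas) / s
    var = (n - d) / (d * s) * sum((t - theta_bar) ** 2 for t in thetas)
    return math.sqrt(var)


def mean(xs):
    return sum(xs) / len(xs)


data = [1.0, 2.0, 4.0, 7.0, 11.0]
print(delete_d_jackknife_se(data, mean, d=2))
print(statistics.stdev(data) / math.sqrt(len(data)))  # for the mean, equal up to rounding
```

For the sample mean, the full-enumeration estimate reproduces the familiar s/√n for any d, which makes a convenient sanity check. The jackknife is useful precisely for statistics, like a heritability sum, that have no simple closed-form SE.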

Controlling the joint local false discovery rate is more powerful than meta-analysis methods in joint analysis of summary statistics from multiple genome-wide association studies

Bioinformatics

10.1093/bioinformatics/btw690

2016

pp. btw690

Cited By ~ 2

Author(s): Wei Jiang, Weichuan Yu

Keyword(s): False Discovery Rate, Association Studies, Meta Analysis, Genome Wide Association, Joint Analysis, Genome Wide Association Studies, Summary Statistics, Local False Discovery Rate, False Discovery, Genome Wide

Improved polygenic prediction by Bayesian multiple regression on summary statistics

Nature Communications

10.1038/s41467-019-12653-0

2019

Vol 10 (1)

Cited By ~ 34

Author(s): Luke R. Lloyd-Jones, Jian Zeng, Julia Sidorenko, Loïc Yengo, Gerhard Moser, ...

Keyword(s): Multiple Regression, Association Studies, Meta Analysis, Multiple Regression Model, Data Sets, Genome Wide Association Studies, Summary Statistics, Individual Level, Level Data, The Uk

Abstract

Accurate prediction of an individual’s phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful Bayesian multiple regression model for individual-level data (BayesR) to one that utilises summary statistics from genome-wide association studies (GWAS): SBayesR. In simulation and in cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700,000) on height and BMI, we show that, on average across traits and two independent data sets, SBayesR improves prediction R² by 5.2% relative to LDpred and by 26.5% relative to clumping and P-value thresholding.
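
The step that lets a multiple-regression model such as BayesR be fitted from summary statistics can be shown with ordinary least squares. For standardised genotypes X and phenotype y on n individuals, X'X/n is the LD correlation matrix R and X'y/n is the vector of marginal (GWAS) effects. The joint effects therefore satisfy R·b_joint = b_marginal without any individual-level data. SBayesR goes much further, with a mixture prior on effects, MCMC sampling and an LD reference, none of which is shown here. This Python sketch only demonstrates the identity. It assumes R is well conditioned and matches the GWAS sample, and its function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def joint_from_marginal(b_marginal, ld):
    """Joint (conditional) effects b_joint solving R b_joint = b_marginal."""
    return solve(ld, b_marginal)


# Two SNPs in LD (r = 0.8); only SNP 1 is causal, with joint effect 0.1.
ld = [[1.0, 0.8], [0.8, 1.0]]
b_marginal = [0.1, 0.08]  # = R @ [0.1, 0.0]: SNP 2 looks associated only through LD
print(joint_from_marginal(b_marginal, ld))  # approximately [0.1, 0.0]
```

With real data, R comes from a reference panel and is only an estimate, which is one reason summary-statistic methods regularise rather than invert R directly.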

MultiMeta: an R package for meta-analysing multi-phenotype genome-wide association studies

10.1101/013920

2015

Author(s): Dragana Vuckovic, Paolo Gasparini, Nicole Soranzo, Valentina Iotchkova

Keyword(s): Multivariate Analysis, Association Studies, Meta Analysis, R Package, Genome Wide Association, Genome Wide Association Studies, New Methods, Genome Wide, Inverse Variance

Summary: As new methods for multivariate analysis of Genome Wide Association Studies (GWAS) become available, it is important to be able to combine results from different cohorts in a meta-analysis. The R package MultiMeta provides an implementation of the inverse-variance based method for meta-analysis, generalized to an n-dimensional setting.

Availability: The R package MultiMeta can be downloaded from CRAN.
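
The usual n-dimensional generalisation of inverse-variance weighting replaces each cohort's scalar weight 1/SE² with the inverse of the covariance matrix V_i of its vector of effects on the n phenotypes. This gives β_meta = (Σ_i V_i⁻¹)⁻¹ Σ_i V_i⁻¹ β_i, with Cov(β_meta) = (Σ_i V_i⁻¹)⁻¹. The Python sketch below implements that textbook formula for small matrices. We assume this is what "generalized to an n-dimensional setting" refers to. The function names are hypothetical, and this is not MultiMeta's R interface.

```python
def mat_inv(a):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]  # [A | I]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * u for v, u in zip(m[r], m[col])]
    return [row[n:] for row in m]  # right half now holds A^-1


def mat_vec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def multivariate_ivw(betas, covs):
    """beta = (sum W_i)^-1 sum W_i beta_i and Cov = (sum W_i)^-1, with W_i = V_i^-1."""
    k = len(betas[0])
    w_sum = [[0.0] * k for _ in range(k)]
    wb_sum = [0.0] * k
    for beta, cov in zip(betas, covs):
        w = mat_inv(cov)
        wb = mat_vec(w, beta)
        for i in range(k):
            wb_sum[i] += wb[i]
            for j in range(k):
                w_sum[i][j] += w[i][j]
    cov_meta = mat_inv(w_sum)
    return mat_vec(cov_meta, wb_sum), cov_meta


# Two cohorts, two phenotypes, effects with correlated sampling errors in cohort 1.
betas = [[0.10, 0.20], [0.30, 0.00]]
covs = [[[0.01, 0.004], [0.004, 0.04]], [[0.01, 0.0], [0.0, 0.01]]]
print(multivariate_ivw(betas, covs))
```

With diagonal covariance matrices the formula reduces to running the univariate inverse-variance method separately for each phenotype. The off-diagonal terms are what let correlated traits borrow information from each other.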
