vSampler: fast and annotation-based matched variant sampling tool

Author(s):  
Dandan Huang ◽  
Zhao Wang ◽  
Yao Zhou ◽  
Qian Liang ◽  
Pak Chung Sham ◽  
...  

Abstract. Summary: Sampling of control variants with properties matched to those of input variants is widely used in enrichment analysis of genome-wide association studies/quantitative trait loci and in negative data construction for pathogenic/regulatory variant prediction methods. Spurious enrichment results caused by confounding factors, such as minor allele frequency and linkage disequilibrium pattern, can be avoided by calibrating statistical significance against matched controls. Here, we present vSampler, which can generate sets of randomly drawn variants with a comprehensive choice of matching properties, such as tissue/cell type-specific epigenomic features. Importantly, a novel data structure and sampling algorithms make vSampler significantly faster than existing tools. Availability and implementation: The vSampler web server and local program are available at http://mulinlab.org/vsampler. Supplementary information: Supplementary data are available at Bioinformatics online.
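The matching idea described in this abstract can be illustrated with a small sketch. It is not vSampler's data structure or algorithm, which the abstract does not describe. The `Variant` fields, the bin widths and the function `sample_matched` are hypothetical, and only two properties (minor allele frequency and a count of variants in LD) are matched, by coarse binning.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    vid: str
    maf: float        # minor allele frequency in [0, 0.5]
    ld_buddies: int   # number of variants in LD above some r^2 cutoff (assumed precomputed)


def bin_key(v: Variant, maf_step: float = 0.05, ld_step: int = 10) -> tuple[int, int]:
    """Map a variant to a coarse (MAF bin, LD bin) key; the step sizes are illustrative."""
    return (int(v.maf / maf_step), v.ld_buddies // ld_step)


def sample_matched(queries, pool, n_per_query=1, seed=0):
    """For each query variant, draw control variants from the same bin of the pool."""
    rng = random.Random(seed)
    query_ids = {q.vid for q in queries}
    bins = defaultdict(list)
    for v in pool:
        if v.vid not in query_ids:  # never return an input variant as a control
            bins[bin_key(v)].append(v)
    matched = {}
    for q in queries:
        candidates = bins.get(bin_key(q), [])
        k = min(n_per_query, len(candidates))  # fewer controls if the bin is small
        matched[q.vid] = rng.sample(candidates, k)
    return matched
```

Indexing the pool by bin once means each query is a dictionary lookup plus a random draw, which is the general reason a precomputed index helps with speed; whether vSampler works this way is an assumption.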

2019 ◽  
Author(s):  
Hiroshi Matsunaga ◽  
Kaoru Ito ◽  
Masato Akiyama ◽  
Atsushi Takahashi ◽  
Satoshi Koyama ◽  
...  

Abstract. Background: Genome-wide association studies (GWAS) provided many biological insights into coronary artery disease (CAD), but these studies were mainly performed in Europeans. GWAS in diverse populations have the potential to advance our understanding of CAD. Methods and Results: We conducted two GWAS for CAD in the Japanese population, which included 12,494 cases and 28,879 controls, and 2,808 cases and 7,261 controls, respectively. Then, we performed transethnic meta-analysis using the results of the CARDIoGRAMplusC4D 1000 Genomes meta-analysis with UK Biobank. We identified 3 new loci on chromosome 1q21 (CTSS), 10q26 (WDR11-FGFR2), and 11q22 (RDX-FDX1). Quantitative trait locus analyses suggested the association of CTSS and RDX-FDX1 with atherosclerotic immune cells. Tissue/cell type enrichment analysis showed the involvement of arteries, adrenal glands and fat tissues in the development of CAD. Finally, we performed tissue/cell type enrichment analysis using East Asian-frequent and European-frequent variants according to the risk allele frequencies, and identified significant enrichment of adrenal glands in the East Asian-frequent group while the enrichment of arteries and fat tissues was found in the European-frequent group. These findings indicate biological differences in CAD susceptibility between Japanese and Europeans. Conclusions: We identified 3 new loci for CAD and highlighted the genetic differences between the Japanese and European populations. Moreover, our transethnic analyses showed both shared and unique genetic architectures between the Japanese and Europeans. While most of the underlying genetic bases for CAD are shared, further analyses in diverse populations will be needed to elucidate variations fully.


2019 ◽  
Vol 35 (17) ◽  
pp. 3154-3156 ◽  
Author(s):  
Oskari Timonen ◽  
Mikko Särkkä ◽  
Tibor Fülöp ◽  
Anton Mattsson ◽  
Juha Kekäläinen ◽  
...  

Abstract. Summary: Genome-wide association studies (GWAS) aim to identify associations of genetic variations, such as single-nucleotide polymorphisms (SNPs), with a specific trait or disease. Identifying common themes such as pathways, biological processes and disease associations is needed to further explore and interpret these results. Varanto is a novel web tool for annotating, visualizing and analyzing human genetic variations using diverse data sources. Varanto can be used to query a set of input variations, retrieve their associated variation and gene level annotations, perform annotation enrichment analysis and visualize the results. Availability and implementation: The Varanto web app is developed in R and implemented as a Shiny app with a PostgreSQL database, and is freely available at http://bioinformatics.uef.fi/varanto. Source code for the tool is available at https://github.com/oqe/varanto. Supplementary information: Supplementary data are available at Bioinformatics online.
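As an illustration of what an annotation enrichment analysis can look like, the sketch below computes a one-sided hypergeometric p-value. The abstract does not state which test Varanto uses, so the choice of test is an assumption, and the function name `hypergeom_enrichment_p` and its parameters are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) when n variants are drawn without replacement from N
    background variants, K of which carry the annotation of interest."""
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    upper = min(K, n)  # cannot observe more annotated variants than exist or were drawn
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

For example, `hypergeom_enrichment_p(N=10, K=5, n=2, k=2)` asks how likely it is that both of two input variants carry an annotation present on half of a ten-variant background.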


2019 ◽  
Vol 35 (19) ◽  
pp. 3842-3845 ◽  
Author(s):  
Guangsheng Pei ◽  
Yulin Dai ◽  
Zhongming Zhao ◽  
Peilin Jia

Abstract. Motivation: Diseases and traits are under dynamic tissue-specific regulation. However, heterogeneous tissues are often collected in biomedical studies, which reduces the power to identify disease-associated variants and gene expression profiles. Results: We present deTS, an R package, to conduct tissue-specific enrichment analysis with two built-in reference panels. Statistical methods are developed and implemented for detecting tissue-specific genes and for enrichment testing of different forms of query data. Our applications using multi-trait genome-wide association studies data and cancer expression data showed that deTS could effectively identify the most relevant tissues for each query trait or sample, providing insights for future studies. Availability and implementation: https://github.com/bsml320/deTS and CRAN https://cran.r-project.org/web/packages/deTS/. Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
pp. 1-11
Author(s):  
Valentina Escott-Price ◽  
Karl Michael Schmidt

Background: Genome-wide association studies (GWAS) were successful in identifying SNPs showing association with disease, but their individual effect sizes are small and require large sample sizes to achieve statistical significance. Methods of post-GWAS analysis, including gene-based, gene-set and polygenic risk scores, combine the SNP effect sizes in an attempt to boost the power of the analyses. To avoid giving undue weight to SNPs in linkage disequilibrium (LD), the LD needs to be taken into account in these analyses. Objectives: We review methods that attempt to adjust the effect sizes (β-coefficients) of summary statistics, instead of simple LD pruning. Methods: We subject LD adjustment approaches to a mathematical analysis, recognising Tikhonov regularisation as a framework for comparison. Results: Observing the similarity of the processes involved with the more straightforward Tikhonov-regularised ordinary least squares estimate for multivariate regression coefficients, we note that current methods based on a Bayesian model for the effect sizes effectively provide an implicit choice of the regularisation parameter, which is convenient, but at the price of reduced transparency and, especially in smaller LD blocks, a risk of incomplete LD correction. Conclusions: There is no simple answer to the question of which method is best, but where interpretability of the LD adjustment is essential, as in research aiming at identifying the genomic aetiology of disorders, our study suggests that a more direct choice of mild regularisation in the correction of effect sizes may be preferable.
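The role of the regularisation parameter can be made concrete with a two-SNP sketch. It assumes standardised genotypes, so that marginal effect estimates are approximately the LD correlation matrix R times the joint effects, and it solves (R + λI) β_joint = β_marginal in closed form. The function name `tikhonov_adjust_2snp` is hypothetical and the example is not taken from the paper.

```python
def tikhonov_adjust_2snp(beta_marginal, r, lam):
    """Solve (R + lam*I) beta_joint = beta_marginal for two SNPs,
    where R = [[1, r], [r, 1]] is their LD correlation matrix."""
    b1, b2 = beta_marginal
    d = 1.0 + lam            # diagonal entry of R + lam*I
    det = d * d - r * r      # determinant of the 2x2 matrix
    if det == 0.0:
        raise ValueError("singular system: use lam > 0 when |r| = 1")
    # Closed-form inverse of [[d, r], [r, d]] applied to (b1, b2).
    return ((d * b1 - r * b2) / det, (d * b2 - r * b1) / det)
```

With `lam = 0` this is the unregularised correction: `tikhonov_adjust_2snp((1.0, 0.5), r=0.5, lam=0.0)` attributes the whole signal to the first SNP. Increasing `lam` shrinks the adjustment and keeps the system solvable even when the two SNPs are in perfect LD.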


2018 ◽  
Vol 35 (14) ◽  
pp. 2512-2514 ◽  
Author(s):  
Bongsong Kim ◽  
Xinbin Dai ◽  
Wenchao Zhang ◽  
Zhaohong Zhuang ◽  
Darlene L Sanchez ◽  
...  

Abstract. Summary: We present GWASpro, a high-performance web server for the analyses of large-scale genome-wide association studies (GWAS). GWASpro was developed to provide data analyses for large-scale molecular genetic data, coupled with complex replicated experimental designs such as found in plant science investigations and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, by which complex experimental designs that may include replications, treatments, locations and times, can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data that may consist of up to 10 million markers and 10 000 samples from replicable lines or hybrids. GWASpro provides an interface that significantly reduces the learning curve for new GWAS investigators. Availability and implementation: GWASpro is freely available at https://bioinfo.noble.org/GWASPRO. Supplementary information: Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
David M. Howard ◽  
Mark J. Adams ◽  
Toni-Kim Clarke ◽  
Jonathan D. Hafferty ◽  
Jude Gibson ◽  
...  

Abstract. Major depression is a debilitating psychiatric illness that is typically associated with low mood, anhedonia and a range of comorbidities. Depression has a heritable component that has remained difficult to elucidate with current sample sizes due to the polygenic nature of the disorder. To maximise sample size, we meta-analysed data on 807,553 individuals (246,363 cases and 561,190 controls) from the three largest genome-wide association studies of depression. We identified 102 independent variants, 269 genes, and 15 gene-sets associated with depression, including both genes and gene-pathways associated with synaptic structure and neurotransmission. Further evidence of the importance of prefrontal brain regions in depression was provided by an enrichment analysis. In an independent replication sample of 1,306,354 individuals (414,055 cases and 892,299 controls), 87 of the 102 associated variants were significant following multiple testing correction. Based on the putative genes associated with depression this work also highlights several potential drug repositioning opportunities. These findings advance our understanding of the complex genetic architecture of depression and provide several future avenues for understanding aetiology and developing new treatment approaches.


2020 ◽  
Vol 36 (15) ◽  
pp. 4374-4376
Author(s):  
Ninon Mounier ◽  
Zoltán Kutalik

Abstract. Summary: Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs), and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation: The bGWAS package is freely available under a GPL-2 License, and can be accessed, alongside user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (18) ◽  
pp. 4749-4756 ◽  
Author(s):  
Alexey A Shadrin ◽  
Oleksandr Frei ◽  
Olav B Smeland ◽  
Francesco Bettella ◽  
Kevin S O'Connell ◽  
...  

Abstract. Motivation: Determining the relative contributions of functional genetic categories is fundamental to understanding the genetic etiology of complex human traits and diseases. Here, we present Annotation Informed-MiXeR, a likelihood-based method for estimating the number of variants influencing a phenotype and their effect sizes across different functional annotation categories of the genome using summary statistics from genome-wide association studies. Results: Extensive simulations demonstrate that the model is valid for a broad range of genetic architectures. The model suggests that complex human phenotypes substantially differ in the number of causal variants, their localization in the genome and their effect sizes. Specifically, the exons of protein-coding genes harbor more than 90% of variants influencing type 2 diabetes and inflammatory bowel disease, making them good candidates for whole-exome studies. In contrast, <10% of the causal variants for schizophrenia, bipolar disorder and attention-deficit/hyperactivity disorder are located in protein-coding exons, indicating a more substantial role of regulatory mechanisms in the pathogenesis of these disorders. Availability and implementation: The software is available at: https://github.com/precimed/mixer. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (22) ◽  
pp. 4724-4729 ◽  
Author(s):  
Wujuan Zhong ◽  
Cassandra N Spracklen ◽  
Karen L Mohlke ◽  
Xiaojing Zheng ◽  
Jason Fine ◽  
...  

Abstract. Summary: Tens of thousands of reproducibly identified GWAS (Genome-Wide Association Studies) variants, with the vast majority falling in non-coding regions resulting in no eventual protein products, call urgently for mechanistic interpretations. Although numerous methods exist, there are few, if any, methods for simultaneously testing the mediation effects of multiple correlated SNPs via some mediator (e.g. the expression of a gene in the neighborhood) on phenotypic outcome. We propose multi-SNP mediation intersection-union test (SMUT) to fill in this methodological gap. Our extensive simulations demonstrate the validity of SMUT as well as substantial, up to 92%, power gains over alternative methods. In addition, SMUT confirmed known mediators in a real dataset of Finns for plasma adiponectin level, which were missed by many alternative methods. We believe SMUT will become a useful tool to generate mechanistic hypotheses underlying GWAS variants, facilitating functional follow-up. Availability and implementation: The R package SMUT is publicly available from CRAN at https://CRAN.R-project.org/package=SMUT. Supplementary information: Supplementary data are available at Bioinformatics online.
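The intersection-union principle named in this abstract can be sketched generically. A mediation effect requires both a SNP-to-mediator effect and a mediator-to-outcome effect, so the combined null is rejected only when both component nulls are rejected, which corresponds to taking the larger of the two component p-values. How SMUT obtains the component p-values is not described in the abstract; the function and argument names below are hypothetical.

```python
def intersection_union_p(p_snps_to_mediator: float, p_mediator_to_outcome: float) -> float:
    """Intersection-union test: the overall p-value is the maximum of the
    component p-values, so both effects must be significant to reject."""
    for p in (p_snps_to_mediator, p_mediator_to_outcome):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
    return max(p_snps_to_mediator, p_mediator_to_outcome)
```

A strong SNP-to-mediator signal alone is therefore not enough: `intersection_union_p(0.001, 0.2)` is driven by the weaker mediator-to-outcome evidence.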

