martini: an R package for genome-wide association studies using SNP networks

Sources Of Information ◽

Link Type ◽

New Genes ◽

Abstract Systems biology shows that genes related to the same phenotype are often functionally related. We can take advantage of this to discover new genes that affect a phenotype. However, the natural unit of analysis in genome-wide association studies (GWAS) is not the gene, but the single nucleotide polymorphism, or SNP. We introduce martini, an R package to build SNP co-function networks and use them to conduct GWAS. In SNP networks, two SNPs are connected if there is evidence they jointly contribute to the same biological function. By leveraging such information in GWAS, we search for SNPs that are not only strongly associated with a phenotype, but also functionally related. This, in turn, boosts discovery and interpretability. Martini builds such networks using three sources of information: genomic position, gene annotations, and gene-gene interactions. The resulting SNP networks involve hundreds of thousands of nodes and millions of edges, making their exploration computationally intensive. Martini implements two network-guided biomarker discovery algorithms based on graph cuts that can handle such large networks: SConES and SigMod. They both seek a small subset of SNPs that have high association scores with the phenotype of interest and are densely interconnected in the network. Both algorithms use parameters that control the relative importance of the SNPs’ association scores, the number of SNPs selected, and their interconnection. Martini includes a cross-validation procedure to set these parameters automatically. Lastly, martini includes tools to visualize the selected SNPs’ network and association properties. Martini is available on GitHub (hclimente/martini) and Bioconductor (martini).
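To make the three sources of information concrete, here is a minimal Python sketch of how a SNP network could be assembled. It is not martini's implementation. The edge rules (linking neighbouring SNPs on a chromosome, SNPs mapped to the same gene, and SNPs mapped to interacting genes) are assumptions for illustration, and the function name, SNP identifiers and gene names are hypothetical.

```python
from itertools import combinations, product


def build_snp_network(snps, snp_to_gene, gene_interactions):
    """Return a set of undirected SNP-SNP edges.

    snps: list of (snp_id, chromosome, position) tuples.
    snp_to_gene: dict mapping a SNP id to the gene it is annotated to.
    gene_interactions: iterable of (gene_a, gene_b) pairs.
    The edge rules below are illustrative assumptions.
    """
    edges = set()

    def connect(a, b):
        if a != b:
            edges.add(tuple(sorted((a, b))))

    # 1. Genomic position: link consecutive SNPs on each chromosome.
    by_chrom = {}
    for snp_id, chrom, pos in snps:
        by_chrom.setdefault(chrom, []).append((pos, snp_id))
    for members in by_chrom.values():
        members.sort()
        for (_, a), (_, b) in zip(members, members[1:]):
            connect(a, b)

    # 2. Gene annotation: link all SNPs mapped to the same gene.
    gene_to_snps = {}
    for snp_id, gene in snp_to_gene.items():
        gene_to_snps.setdefault(gene, []).append(snp_id)
    for members in gene_to_snps.values():
        for a, b in combinations(members, 2):
            connect(a, b)

    # 3. Gene-gene interactions: link SNPs of interacting genes.
    for g1, g2 in gene_interactions:
        for a, b in product(gene_to_snps.get(g1, []), gene_to_snps.get(g2, [])):
            connect(a, b)

    return edges


# Hypothetical toy input: four SNPs, two genes, one interaction.
toy_snps = [("rs1", "1", 100), ("rs2", "1", 200), ("rs3", "1", 300), ("rs4", "2", 50)]
toy_genes = {"rs1": "GENE_A", "rs3": "GENE_A", "rs4": "GENE_B"}
toy_edges = build_snp_network(toy_snps, toy_genes, [("GENE_A", "GENE_B")])
```

In this toy input each source adds edges the previous one cannot: position links rs1-rs2 and rs2-rs3, the shared gene links rs1-rs3, and the interaction links the GENE_A SNPs to rs4 on another chromosome.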

bGWAS: an R package to perform Bayesian genome wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btaa549 ◽

2020 ◽

Vol 36 (15) ◽

pp. 4374-4376

Author(s):

Ninon Mounier ◽

Zoltán Kutalik

Keyword(s):

Mendelian Randomization ◽

Causal Effect ◽

Association Studies ◽

R Package ◽

Genome Wide Association ◽

Supplementary Information ◽

Biological Mechanisms ◽

Genome Wide ◽

Related Risk

Abstract Summary Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs) and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation The bGWAS package is freely available under a GPL-2 License, and can be accessed, alongside user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information Supplementary data are available at Bioinformatics online.
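The step of combining a prior effect with an observed effect can be illustrated with a generic normal-normal model. This Python sketch is not the bGWAS code, and the package's exact formulas may differ. It assumes the observed effect is normally distributed around the true effect, that the prior is normal, that the Bayes factor compares this prior against a null of zero effect, and that the direct effect is the observed effect minus the prior effect. All names and numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def combine_prior_and_observed(observed, se, prior_mean, prior_var):
    """Return (bayes_factor, posterior_effect, direct_effect) for one variant.

    Assumed model (illustrative only):
      observed ~ N(true effect, se^2)
      H1: true effect ~ N(prior_mean, prior_var)
      H0: true effect = 0
    """
    # Marginal likelihood of the observed effect under each hypothesis.
    likelihood_h1 = normal_pdf(observed, prior_mean, se ** 2 + prior_var)
    likelihood_h0 = normal_pdf(observed, 0.0, se ** 2)
    bayes_factor = likelihood_h1 / likelihood_h0

    # Precision-weighted average of the prior and the observation.
    w_obs, w_prior = 1 / se ** 2, 1 / prior_var
    posterior = (w_obs * observed + w_prior * prior_mean) / (w_obs + w_prior)

    # Part of the observed effect not explained by the prior (assumed definition).
    direct = observed - prior_mean
    return bayes_factor, posterior, direct


# Hypothetical variant: the prior from related risk factors agrees with the data.
bf, posterior_effect, direct_effect = combine_prior_and_observed(
    observed=0.10, se=0.02, prior_mean=0.08, prior_var=0.02 ** 2
)
```

With equal prior and observation variances the posterior is the midpoint of the two estimates. A prior that agrees with the data gives a Bayes factor well above 1, which is the sense in which an informative prior can increase power.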

Qtlizer: comprehensive QTL annotation of GWAS results

Scientific Reports ◽

10.1038/s41598-020-75770-7 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Matthias Munz ◽

Inken Wohlers ◽

Eric Simon ◽

Tobias Reinberger ◽

Hauke Busch ◽

...

Keyword(s):

Association Studies ◽

Housekeeping Genes ◽

R Package ◽

Protein Abundance ◽

Base Pairs ◽

Link Type ◽

Genome Wide ◽

Wide Range ◽

Distance Limit

Abstract Exploration of genetic variant-to-gene relationships by quantitative trait loci such as expression QTLs is a frequently used tool in genome-wide association studies. However, the wide range of public QTL databases and the lack of batch annotation features complicate a comprehensive annotation of GWAS results. In this work, we introduce the tool “Qtlizer” for annotating lists of variants in human with associated changes in gene expression and protein abundance using an integrated database of published QTLs. Features include incorporation of variants in linkage disequilibrium and reverse search by gene names. Analyzing the database for base pair distances between best significant eQTLs and their affected genes suggests that the commonly used cis-distance limit of 1,000,000 base pairs might be too restrictive, implicating a substantial amount of wrongly and yet undetected eQTLs. We also ranked genes with respect to the maximum number of tissue-specific eQTL studies in which a most significant eQTL signal was consistent. For the top 100 genes we observed the strongest enrichment with housekeeping genes (P = 2 × 10⁻⁶) and with the 10% highest expressed genes (P = 0.005) after grouping eQTLs by r² > 0.95, underlining the relevance of LD information in eQTL analyses. Qtlizer can be accessed via https://genehopper.de/qtlizer or by using the respective Bioconductor R-package (https://doi.org/10.18129/B9.bioc.Qtlizer).
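The cis-distance argument can be made concrete with a small calculation. This Python sketch measures how far each eQTL variant lies from its gene and reports the share that a 1,000,000 base pair window would miss. It is an illustration only: measuring distance to the nearest gene boundary is an assumption (the abstract does not say how distance was defined), and the coordinates are hypothetical.

```python
CIS_LIMIT_BP = 1_000_000  # the commonly used cis-distance limit from the abstract


def distance_to_gene(variant_pos, gene_start, gene_end):
    """Base pairs between a variant and the nearest gene boundary (0 if inside)."""
    if gene_start <= variant_pos <= gene_end:
        return 0
    return min(abs(variant_pos - gene_start), abs(variant_pos - gene_end))


def share_beyond_limit(pairs, limit=CIS_LIMIT_BP):
    """Fraction of (variant_pos, gene_start, gene_end) pairs farther than `limit`."""
    distances = [distance_to_gene(*pair) for pair in pairs]
    return sum(d > limit for d in distances) / len(distances)


# Hypothetical eQTL-gene pairs on one chromosome.
example_pairs = [
    (1_500, 1_000, 2_000),              # variant inside the gene
    (500_000, 1_000_000, 1_010_000),    # 500 kb upstream
    (3_500_000, 1_000_000, 1_010_000),  # about 2.5 Mb downstream
]
missed_share = share_beyond_limit(example_pairs)
```

Here one of the three pairs falls outside the window, so a search restricted to 1,000,000 base pairs would not report it.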

What is Next for the Genetics of Multiple Sclerosis?

Autoimmune Diseases ◽

10.4061/2011/519450 ◽

2011 ◽

Vol 2011 ◽

pp. 1-3 ◽

Cited By ~ 4

Author(s):

Sreeram V. Ramagopalan ◽

David A. Dyment

Keyword(s):

Multiple Sclerosis ◽

Genetic Risk ◽

Neurological Disease ◽

Association Studies ◽

Genome Wide Association ◽

New Genes ◽

Histocompatibility Complex ◽

Genome Wide ◽

The Common

We review here our current understanding of the genetic aetiology of the common complex neurological disease multiple sclerosis (MS). The strongest genetic risk factor for MS is the major histocompatibility complex which was identified in the 1970s. In 2011, after a number of genome-wide association studies have been completed and have identified approximately 20 new genes for MS, we ask the question—what is next for the genetics of MS?

SumVg: Total heritability explained by all variants in genome-wide association studies based on summary statistics with standard error estimates

10.1101/016857 ◽

2015 ◽

Author(s):

Hon-Cheong SO ◽

Pak C. SHAM

Keyword(s):

Error Estimates ◽

Standard Error ◽

Association Studies ◽

Parametric Bootstrap ◽

R Package ◽

Genome Wide Association ◽

Summary Statistics ◽

Genome Wide ◽

Key Questions

Genome-wide association studies (GWAS) have become increasingly popular these days and one of the key questions is how much heritability could be explained by all variants in GWAS. We have previously proposed an approach to answer this question, based on recovering the "true" z-statistics from a set of observed z-statistics. Only summary statistics are required. However, methods for standard error (SE) estimation are not available yet, thereby limiting the interpretation of the results. In this study we developed resampling-based approaches to estimate the SE and the methods are implemented in an R package. We found that delete-d-jackknife and parametric bootstrap approaches provide good estimates of the SE. Methods to compute the sum of heritability explained and the corresponding SE are implemented in the R package SumVg, available at https://sites.google.com/site/honcheongso/software/var-totalvg
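The delete-d jackknife named above can be sketched generically in Python. This is not the SumVg code: it applies the standard delete-d jackknife variance formula to an arbitrary statistic and uses the sample mean as a stand-in for the heritability estimator. The function name and the z-statistics are hypothetical. Enumerating every subset is feasible only for tiny inputs, so a practical version would sample subsets at random.

```python
import math
from itertools import combinations
from statistics import mean, variance


def delete_d_jackknife_se(values, statistic, d):
    """Standard error of `statistic` by the delete-d jackknife.

    Recomputes the statistic with every possible set of d observations
    removed, then rescales the spread of those replicates by
    (n - d) / (d * number_of_replicates).
    """
    n = len(values)
    replicates = [
        statistic([v for i, v in enumerate(values) if i not in dropped])
        for dropped in combinations(range(n), d)
    ]
    centre = mean(replicates)
    scale = (n - d) / (d * len(replicates))
    return math.sqrt(scale * sum((r - centre) ** 2 for r in replicates))


# Hypothetical z-statistics; the mean stands in for the real estimator.
z_stats = [1.2, -0.4, 2.1, 0.3, 1.7, -1.1]
jackknife_se = delete_d_jackknife_se(z_stats, mean, d=2)
textbook_se = math.sqrt(variance(z_stats) / len(z_stats))
```

For the sample mean, the jackknife estimate over all subsets matches the textbook standard error, which makes the mean a convenient check before moving to an estimator with no closed-form SE.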

CVRMS: Cross-validated Rank-based Marker Selection for Genome-wide Prediction of Low Heritability

10.1101/756130 ◽

2019 ◽

Author(s):

Seongmun Jeong ◽

Jae-Yoon Kim ◽

Namshin Kim

Keyword(s):

Genetic Information ◽

Association Studies ◽

R Package ◽

Genome Wide Association ◽

Marker Selection ◽

Genome Wide ◽

Precise Prediction ◽

Human Animal ◽

Selection For

Abstract CVRMS is an R package designed to extract marker subsets from repeated rank-based marker datasets generated from genome-wide association studies or marker effects for genome-wide prediction (https://github.com/lovemun/CVRMS). CVRMS provides an optimized genome-wide biomarker set with the best predictability of phenotype by implementing ridge regression using genetic information. Applying our method to human, animal, and plant datasets with wide heritability (zero to one), we selected hundreds to thousands of biomarkers for precise prediction.

Genome-wide association study identifies novel type II diabetes risk loci in Jordan subpopulations

PeerJ ◽

10.7717/peerj.3618 ◽

2017 ◽

Vol 5 ◽

pp. e3618 ◽

Cited By ~ 4

Author(s):

Rana Dajani ◽

Jin Li ◽

Zhi Wei ◽

Michael E. March ◽

Qianghua Xia ◽

...

Keyword(s):

Type Ii Diabetes ◽

Genome Wide Association Study ◽

Association Studies ◽

Genome Wide Association ◽

Type Ii ◽

Arab Population ◽

Public Health Burden ◽

Link Type ◽

The prevalence of Type II Diabetes (T2D) has been increasing and has become a disease of significant public health burden in Jordan. None of the previous genome-wide association studies (GWAS) have specifically investigated the Middle East populations. The Circassian and Chechen communities in Jordan represent unique populations that are genetically distinct from the Arab population and other populations in the Caucasus. Prevalence of T2D is very high in both the Circassian and Chechen communities in Jordan despite low obesity prevalence. We conducted GWAS on T2D in these two populations and further performed meta-analysis of the results. We identified a novel T2D locus at chr20p12.2 at genome-wide significance (rs6134031, P = 1.12 × 10⁻⁸) and we replicated the results in the Wellcome Trust Case Control Consortium (WTCCC) dataset. Another locus at chr12q24.31 is associated with T2D at suggestive significance level (top SNP rs4758690, P = 4.20 × 10⁻⁵) and it is a robust eQTL for the gene, MLXIP (P = 1.10 × 10⁻¹⁴), and is significantly associated with methylation level in MLXIP, the functions of which involve cellular glucose response. Therefore, in this first GWAS of T2D in Jordan subpopulations, we identified novel and unique susceptibility loci which may help inform the genetic underpinnings of T2D in other populations.
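The abstract does not say which meta-analysis method was used. As one common choice, the Python sketch below shows an inverse-variance weighted fixed-effect meta-analysis of per-population effect estimates, with a two-sided p-value from the normal distribution. The method, the function name and the numbers are assumptions for illustration and are not taken from the study.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # conventional genome-wide significance level


def fixed_effect_meta(estimates):
    """Combine (beta, standard_error) pairs by inverse-variance weighting.

    Returns the pooled beta, its standard error and a two-sided p-value.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, p_value


# Hypothetical log-odds-ratio estimates for one SNP in two populations.
pooled_beta, pooled_se, pooled_p = fixed_effect_meta([(0.30, 0.08), (0.34, 0.08)])
reaches_significance = pooled_p < GENOME_WIDE_THRESHOLD
```

In this hypothetical example neither population alone reaches the threshold (z of about 3.8 and 4.3), but the pooled estimate does, which is the usual motivation for meta-analysing small cohorts.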

pyseer: a comprehensive tool for microbial pangenome-wide association studies

10.1101/266312 ◽

2018 ◽

Cited By ~ 1

Author(s):

John A Lees ◽

Marco Galardini ◽

Stephen D Bentley ◽

Jeffrey N Weiser ◽

Jukka Corander

Keyword(s):

Input Data ◽

Association Studies ◽

Genome Wide Association ◽

Supplementary Information ◽

Supplementary Data ◽

New Methods ◽

Link Type ◽

Abstract Summary Genome-wide association studies (GWAS) in microbes face different challenges to eukaryotes and have been addressed by a number of different methods. pyseer brings these techniques together in one package tailored to microbial GWAS, allows greater flexibility of the input data used, and adds new methods to interpret the association results. Availability and Implementation pyseer is written in python and is freely available at https://github.com/mgalardini/pyseer, or can be installed through pip. Documentation and a tutorial are available at http://[email protected] and [email protected] Supplementary information Supplementary data are available online.

Integration of genome-wide association studies and gene coexpression networks unveils promising soybean resistance genes against five common fungal pathogens

Scientific Reports ◽

10.1038/s41598-021-03864-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Fabricio Almeida-Silva ◽

Thiago M. Venancio

Keyword(s):

Candidate Genes ◽

Resistance Genes ◽

Association Studies ◽

Fungal Species ◽

Genome Wide Association ◽

Economic Losses ◽

Link Type ◽

Physical Defense ◽

Abstract Soybean is one of the most important legume crops worldwide. However, soybean yield is dramatically affected by fungal diseases, leading to economic losses of billions of dollars yearly. Here, we integrated publicly available genome-wide association studies and transcriptomic data to prioritize candidate genes associated with resistance to Cadophora gregata, Fusarium graminearum, Fusarium virguliforme, Macrophomina phaseolina, and Phakopsora pachyrhizi. We identified 188, 56, 11, 8, and 3 high-confidence candidates for resistance to F. virguliforme, F. graminearum, C. gregata, M. phaseolina and P. pachyrhizi, respectively. The prioritized candidate genes are highly conserved in the pangenome of cultivated soybeans and are heavily biased towards fungal species-specific defense responses. The vast majority of the prioritized candidate resistance genes are related to plant immunity processes, such as recognition, signaling, oxidative stress, systemic acquired resistance, and physical defense. Based on the number of resistance alleles, we selected the five most resistant accessions against each fungal species in the soybean USDA germplasm. Interestingly, the most resistant accessions do not reach the maximum theoretical resistance potential. Hence, they can be further improved to increase resistance in breeding programs or through genetic engineering. Finally, the coexpression network generated here is available in a user-friendly web application (https://soyfungigcn.venanciogroup.uenf.br/) and an R/Shiny package (https://github.com/almeidasilvaf/SoyFungiGCN) that serve as a public resource to explore soybean-pathogenic fungi interactions at the transcriptional level.

Mixed logistic regression in genome-wide association studies

BMC Bioinformatics ◽

10.1186/s12859-020-03862-2 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Jacqueline Milet ◽

David Courtin ◽

André Garcia ◽

Hervé Perdry

Keyword(s):

Logistic Regression ◽

Linear Models ◽

Cox Model ◽

Association Studies ◽

Scale Up ◽

Score Test ◽

R Package ◽

Genome Wide Association ◽

Abstract Background Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, the status being analyzed as a quantitative phenotype. Chen et al. proved in 2016 that this method is inappropriate in some situations and proposed GMMAT, a score test for the mixed logistic regression (MLR). However, this test does not produce an estimate of the variants’ effects. We propose two computationally efficient methods to estimate the variants’ effects. Their properties and those of other methods (MLM, logistic regression) are evaluated using both simulated and real genomic data from a recent GWAS in two geographically close populations in West Africa. Results We show that, when the disease prevalence differs between population strata, MLM is inappropriate to analyze binary traits. MLR performs the best in all circumstances. The variants’ effects are well evaluated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot, enhancing the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample. Conclusion The two proposed methods are implemented in the R package milorGWAS, available on CRAN. Both methods scale up to at least 10,000 individuals. The same computational strategies could be applied to other models (e.g. mixed Cox model for survival analysis).
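The idea behind a stratified QQ-plot can be sketched as computing a separate set of QQ points for each category of variants, so that inflation confined to one category is not averaged away. This Python sketch is not the milorGWAS implementation. The abstract does not state how variants are assigned to categories, so the labels here are hypothetical and the assignment is assumed to have been done already.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, sorted in ascending order.

    Expected values assume uniform p-values under the null, using the
    plotting positions (rank - 0.5) / n.
    """
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((rank - 0.5) / n) for rank in range(1, n + 1))
    return list(zip(expected, observed))


def stratified_qq(p_values_by_category):
    """Compute QQ points separately for each category of variants."""
    return {
        category: qq_points(p_values)
        for category, p_values in p_values_by_category.items()
    }


# Hypothetical p-values: one well-calibrated category, one inflated category.
calibrated = [(rank - 0.5) / 4 for rank in range(1, 5)]
inflated = [p / 10 for p in calibrated]
qq_by_category = stratified_qq({"category_1": calibrated, "category_2": inflated})
```

Points for the calibrated category fall on the diagonal, while every point for the inflated category sits one unit above it on the -log10 scale. Plotting the two sets in different colours on the same axes gives the stratified view.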

Genome-Wide Association Studies: Progress in Identifying Genetic Biomarkers in Common, Complex Diseases

Biomarker Insights ◽

10.1177/117727190700200019 ◽

2007 ◽

Vol 2 ◽

pp. 117727190700200 ◽

Cited By ~ 17

Author(s):

Stephen F. Kingsmore ◽

Ingrid E. Lindquist ◽

Joann Mudge ◽

William D. Beavis

Keyword(s):

Genetic Heterogeneity ◽

Complex Traits ◽

Biomarker Discovery ◽

Association Studies ◽

Complex Diseases ◽

Human Diseases ◽

Genome Wide Association ◽

Genetic Biomarkers ◽

Novel, comprehensive approaches for biomarker discovery and validation are urgently needed. One particular area of methodologic need is for discovery of novel genetic biomarkers in complex diseases and traits. Here, we review recent successes in the use of genome wide association (GWA) approaches to identify genetic biomarkers in common human diseases and traits. Such studies are yielding initial insights into the allelic architecture of complex traits. In general, it appears that complex diseases are associated with many common polymorphisms, implying profound genetic heterogeneity between affected individuals.