scholarly journals HiGwas: how to compute longitudinal GWAS data in population designs

2020 ◽  
Vol 36 (14) ◽  
pp. 4222-4224
Author(s):  
Zhong Wang ◽  
Nating Wang ◽  
Zilu Wang ◽  
Libo Jiang ◽  
Yaqun Wang ◽  
...  

Abstract Summary Genome-wide association studies (GWAS), particularly designed with thousands and thousands of single-nucleotide polymorphisms (SNPs) (big p) genotyped on tens of thousands of subjects (small n), are encountered by a major challenge of p ≪ n. Although the integration of longitudinal information can significantly enhance a GWAS’s power to comprehend the genetic architecture of complex traits and diseases, an additional challenge is generated by an autocorrelative process. We have developed several statistical models for addressing these two challenges by implementing dimension reduction methods and longitudinal data analysis. To make these models computationally accessible to applied geneticists, we wrote an R package of computer software, HiGwas, designed to analyze longitudinal GWAS datasets. Functions in the package encompass single SNP analyses, significance-level adjustment, preconditioning and model selection for a high-dimensional set of SNPs. HiGwas provides the estimates of genetic parameters and the confidence intervals of these estimates. We demonstrate the features of HiGwas through real data analysis and vignette document in the package. Availability and implementation https://github.com/wzhy2000/higwas. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

Author(s):  
Xiaofeng Zhu ◽  
Xiaoyin Li ◽  
Rong Xu ◽  
Tao Wang

Abstract Motivation The overall association evidence of a genetic variant with multiple traits can be evaluated by cross-phenotype association analysis using summary statistics from genome-wide association studies. Further dissecting the association pathways from a variant to multiple traits is important to understand the biological causal relationships among complex traits. Results Here, we introduce a flexible and computationally efficient Iterative Mendelian Randomization and Pleiotropy (IMRP) approach to simultaneously search for horizontal pleiotropic variants and estimate causal effect. Extensive simulations and real data applications suggest that IMRP has similar or better performance than existing Mendelian Randomization methods for both causal effect estimation and pleiotropic variant detection. The developed pleiotropy test is further extended to detect colocalization for multiple variants at a locus. IMRP will greatly facilitate our understanding of causal relationships underlying complex traits, in particular, when a large number of genetic instrumental variables are used for evaluating multiple traits. Availability and implementation The software IMRP is available at https://github.com/XiaofengZhuCase/IMRP. The simulation codes can be downloaded at http://hal.case.edu/∼xxz10/zhu-web/ under the link: MR Simulations software. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (9) ◽  
pp. 2763-2769
Author(s):  
Jie-Huei Wang ◽  
Yi-Hau Chen

Abstract Motivation In gene expression and genome-wide association studies, the identification of interaction effects is an important and challenging issue owing to its ultrahigh-dimensional nature. In particular, contaminated data and right-censored survival outcome make the associated feature screening even challenging. Results In this article, we propose an inverse probability-of-censoring weighted Kendall’s tau statistic to measure association of a survival trait with biomarkers, as well as a Kendall’s partial correlation statistic to measure the relationship of a survival trait with an interaction variable conditional on the main effects. The Kendall’s partial correlation is then used to conduct interaction screening. Simulation studies under various scenarios are performed to compare the performance of our proposal with some commonly available methods. In the real data application, we utilize our proposed method to identify epistasis associated with the clinical survival outcomes of non-small-cell lung cancer, diffuse large B-cell lymphoma and lung adenocarcinoma patients. Both simulation and real data studies demonstrate that our method performs well and outperforms existing methods in identifying main and interaction biomarkers. Availability and implementation R-package ‘IPCWK’ is available to implement this method, together with a reference manual describing how to perform the ‘IPCWK’ package. Supplementary information Supplementary data are available at Bioinformatics online.


2016 ◽  
Vol 283 (1835) ◽  
pp. 20160569 ◽  
Author(s):  
M. E. Goddard ◽  
K. E. Kemper ◽  
I. M. MacLeod ◽  
A. J. Chamberlain ◽  
B. J. Hayes

Complex or quantitative traits are important in medicine, agriculture and evolution, yet, until recently, few of the polymorphisms that cause variation in these traits were known. Genome-wide association studies (GWAS), based on the ability to assay thousands of single nucleotide polymorphisms (SNPs), have revolutionized our understanding of the genetics of complex traits. We advocate the analysis of GWAS data by a statistical method that fits all SNP effects simultaneously, assuming that these effects are drawn from a prior distribution. We illustrate how this method can be used to predict future phenotypes, to map and identify the causal mutations, and to study the genetic architecture of complex traits. The genetic architecture of complex traits is even more complex than previously thought: in almost every trait studied there are thousands of polymorphisms that explain genetic variation. Methods of predicting future phenotypes, collectively known as genomic selection or genomic prediction, have been widely adopted in livestock and crop breeding, leading to increased rates of genetic improvement.


2021 ◽  
Author(s):  
Tarek Souaid ◽  
Joya-Rita Hindy ◽  
Ernest Diab ◽  
Hampig Raphael Kourie

Bladder cancer (BC) is the most common cancer involving the urinary system and the ninth most common cancer worldwide. Tobacco smoking is the most important environmental risk factor of BC. Several single nucleotide polymorphisms have been validated by genome-wide association studies as genetic risk factors for BC. However, the identification of DNA mismatch-repair genes, including MSH2 in Lynch syndrome and MUTYH in MUTYH-associated polyposis, raises the possibility of monogenic hereditary forms of BC. Moreover, other genetic mutations may play a key role in familial and hereditary transmissions of BC. Therefore, the aim of this review is to focus on the major hereditary syndromes involved in the development of BC and to report BC genetic susceptibilities established with genome-wide significance level.


2020 ◽  
Vol 36 (15) ◽  
pp. 4374-4376
Author(s):  
Ninon Mounier ◽  
Zoltán Kutalik

Abstract Summary Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs) and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation bGWAS package is freely available under a GPL-2 License, and can be accessed, alongside with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (14) ◽  
pp. i427-i435 ◽  
Author(s):  
Héctor Climente-González ◽  
Chloé-Agathe Azencott ◽  
Samuel Kaski ◽  
Makoto Yamada

AbstractMotivationFinding non-linear relationships between biomolecules and a biological outcome is computationally expensive and statistically challenging. Existing methods have important drawbacks, including among others lack of parsimony, non-convexity and computational overhead. Here we propose block HSIC Lasso, a non-linear feature selector that does not present the previous drawbacks.ResultsWe compare block HSIC Lasso to other state-of-the-art feature selection techniques in both synthetic and real data, including experiments over three common types of genomic data: gene-expression microarrays, single-cell RNA sequencing and genome-wide association studies. In all cases, we observe that features selected by block HSIC Lasso retain more information about the underlying biology than those selected by other techniques. As a proof of concept, we applied block HSIC Lasso to a single-cell RNA sequencing experiment on mouse hippocampus. We discovered that many genes linked in the past to brain development and function are involved in the biological differences between the types of neurons.Availability and implementationBlock HSIC Lasso is implemented in the Python 2/3 package pyHSICLasso, available on PyPI. Source code is available on GitHub (https://github.com/riken-aip/pyHSICLasso).Supplementary informationSupplementary data are available at Bioinformatics online.


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Fernando P. Guerra ◽  
Haktan Suren ◽  
Jason Holliday ◽  
James H. Richards ◽  
Oliver Fiehn ◽  
...  

Abstract Background Populus trichocarpa is an important forest tree species for the generation of lignocellulosic ethanol. Understanding the genomic basis of biomass production and chemical composition of wood is fundamental in supporting genetic improvement programs. Considerable variation has been observed in this species for complex traits related to growth, phenology, ecophysiology and wood chemistry. Those traits are influenced by both polygenic control and environmental effects, and their genome architecture and regulation are only partially understood. Genome wide association studies (GWAS) represent an approach to advance that aim using thousands of single nucleotide polymorphisms (SNPs). Genotyping using exome capture methodologies represent an efficient approach to identify specific functional regions of genomes underlying phenotypic variation. Results We identified 813 K SNPs, which were utilized for genotyping 461 P. trichocarpa clones, representing 101 provenances collected from Oregon and Washington, and established in California. A GWAS performed on 20 traits, considering single SNP-marker tests identified a variable number of significant SNPs (p-value < 6.1479E-8) in association with diameter, height, leaf carbon and nitrogen contents, and δ15N. The number of significant SNPs ranged from 2 to 220 per trait. Additionally, multiple-marker analyses by sliding-windows tests detected between 6 and 192 significant windows for the analyzed traits. The significant SNPs resided within genes that encode proteins belonging to different functional classes as such protein synthesis, energy/metabolism and DNA/RNA metabolism, among others. Conclusions SNP-markers within genes associated with traits of importance for biomass production were detected. They contribute to characterize the genomic architecture of P. trichocarpa biomass required to support the development and application of marker breeding technologies.


2019 ◽  
Vol 35 (19) ◽  
pp. 3701-3708 ◽  
Author(s):  
Gulnara R Svishcheva ◽  
Nadezhda M Belonogova ◽  
Irina V Zorkoltseva ◽  
Anatoly V Kirichenko ◽  
Tatiana I Axenovich

Abstract Motivation A huge number of genome-wide association studies (GWAS) summary statistics freely available in databases provide a new material for gene-based association analysis aimed at identifying rare genetic variants. Only a few of the many popular gene-based methods developed for individual genotype and phenotype data are adapted for the practical use of the GWAS summary statistics as input. Results We analytically prove and numerically illustrate that all popular powerful methods developed for gene-based association analysis of individual phenotype and genotype data can be modified to utilize GWAS summary statistics. We have modified and implemented all of the popular methods, including burden and kernel machine-based tests, multiple and functional linear regression, principal components analysis and others, in the R package sumFREGAT. Using real summary statistics for coronary artery disease, we show that the new package is able to detect genes not found by the existing packages. Availability and implementation The R package sumFREGAT is freely and publicly available at: https://CRAN.R-project.org/package=sumFREGAT. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (22) ◽  
pp. 4724-4729 ◽  
Author(s):  
Wujuan Zhong ◽  
Cassandra N Spracklen ◽  
Karen L Mohlke ◽  
Xiaojing Zheng ◽  
Jason Fine ◽  
...  

Abstract Summary Tens of thousands of reproducibly identified GWAS (Genome-Wide Association Studies) variants, with the vast majority falling in non-coding regions resulting in no eventual protein products, call urgently for mechanistic interpretations. Although numerous methods exist, there are few, if any methods, for simultaneously testing the mediation effects of multiple correlated SNPs via some mediator (e.g. the expression of a gene in the neighborhood) on phenotypic outcome. We propose multi-SNP mediation intersection-union test (SMUT) to fill in this methodological gap. Our extensive simulations demonstrate the validity of SMUT as well as substantial, up to 92%, power gains over alternative methods. In addition, SMUT confirmed known mediators in a real dataset of Finns for plasma adiponectin level, which were missed by many alternative methods. We believe SMUT will become a useful tool to generate mechanistic hypotheses underlying GWAS variants, facilitating functional follow-up. Availability and implementation The R package SMUT is publicly available from CRAN at https://CRAN.R-project.org/package=SMUT. Supplementary information Supplementary data are available at Bioinformatics online.


2015 ◽  
Vol 75 (4) ◽  
pp. 652-659 ◽  
Author(s):  
Hirotaka Matsuo ◽  
Ken Yamamoto ◽  
Hirofumi Nakaoka ◽  
Akiyoshi Nakayama ◽  
Masayuki Sakiyama ◽  
...  

ObjectiveGout, caused by hyperuricaemia, is a multifactorial disease. Although genome-wide association studies (GWASs) of gout have been reported, they included self-reported gout cases in which clinical information was insufficient. Therefore, the relationship between genetic variation and clinical subtypes of gout remains unclear. Here, we first performed a GWAS of clinically defined gout cases only.MethodsA GWAS was conducted with 945 patients with clinically defined gout and 1213 controls in a Japanese male population, followed by replication study of 1048 clinically defined cases and 1334 controls.ResultsFive gout susceptibility loci were identified at the genome-wide significance level (p<5.0×10−8), which contained well-known urate transporter genes (ABCG2 and SLC2A9) and additional genes: rs1260326 (p=1.9×10−12; OR=1.36) of GCKR (a gene for glucose and lipid metabolism), rs2188380 (p=1.6×10−23; OR=1.75) of MYL2-CUX2 (genes associated with cholesterol and diabetes mellitus) and rs4073582 (p=6.4×10−9; OR=1.66) of CNIH-2 (a gene for regulation of glutamate signalling). The latter two are identified as novel gout loci. Furthermore, among the identified single-nucleotide polymorphisms (SNPs), we demonstrated that the SNPs of ABCG2 and SLC2A9 were differentially associated with types of gout and clinical parameters underlying specific subtypes (renal underexcretion type and renal overload type). The effect of the risk allele of each SNP on clinical parameters showed significant linear relationships with the ratio of the case–control ORs for two distinct types of gout (r=0.96 [p=4.8×10−4] for urate clearance and r=0.96 [p=5.0×10−4] for urinary urate excretion).ConclusionsOur findings provide clues to better understand the pathogenesis of gout and will be useful for development of companion diagnostics.


Sign in / Sign up

Export Citation Format

Share Document