High-throughput and efficient multilocus genome-wide association study on longitudinal outcomes

Huang Xu; Xiang Li; Yaning Yang; Yi Li; Jose Pinheiro; Kate Sasser; Hisham Hamadeh; Xu Steven; Min Yuan

doi:10.1093/bioinformatics/btaa120

Bioinformatics ◽

2020 ◽

Vol 36 (10) ◽

pp. 3004-3010

Author(s):

Huang Xu ◽

Xiang Li ◽

Yaning Yang ◽

Yi Li ◽

Jose Pinheiro ◽

...

Keyword(s):

High Throughput ◽

Association Studies ◽

Genomic Data ◽

Genome Wide Association ◽

Supplementary Information ◽

High Dimensional ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Multilocus Analysis ◽

Analytical Approaches

Abstract

Motivation: With the emergence of high-dimensional genomic data, genetic analyses such as genome-wide association studies (GWAS) have played an important role in identifying disease-related genetic variants and novel treatments. Complex longitudinal phenotypes are commonly collected in medical studies. However, because few analytical approaches are available for longitudinal traits, these data are often underutilized. In this article, we develop a high-throughput machine learning approach for multilocus GWAS using longitudinal traits by coupling Empirical Bayesian Estimates from mixed-effects modeling with a novel ℓ0-norm algorithm.

Results: Extensive simulations demonstrated that the proposed approach provided not only accurate selection of single nucleotide polymorphisms (SNPs) with comparable or higher power but also robust control of false positives. More importantly, this novel approach is highly scalable and could be more than approximately 1000 times faster than recently published approaches, making genome-wide multilocus analysis of longitudinal traits possible. In addition, our proposed approach can simultaneously analyze millions of SNPs if the computer memory allows, thereby potentially allowing a true multilocus analysis for high-dimensional genomic data. In an application to data from the Alzheimer's Disease Neuroimaging Initiative, we confirmed that our approach can identify well-known SNPs associated with AD and was much faster than recently published approaches (≥6000 times).

Availability and implementation: The source code and the testing datasets are available at https://github.com/Myuan2019/EBE_APML0.

Supplementary information: Supplementary data are available at Bioinformatics online.

GWASpro: a high-performance genome-wide association analysis server

Bioinformatics ◽

10.1093/bioinformatics/bty989 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2512-2514 ◽

Cited By ~ 4

Author(s):

Bongsong Kim ◽

Xinbin Dai ◽

Wenchao Zhang ◽

Zhaohong Zhuang ◽

Darlene L Sanchez ◽

...

Keyword(s):

High Performance ◽

Large Scale ◽

Linear Mixed Model ◽

Association Studies ◽

Learning Curves ◽

Experimental Designs ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Genome Wide

Abstract

Summary: We present GWASpro, a high-performance web server for the analysis of large-scale genome-wide association studies (GWAS). GWASpro was developed to provide data analyses for large-scale molecular genetic data coupled with complex replicated experimental designs, such as those found in plant science investigations, and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, by which complex experimental designs that may include replications, treatments, locations and times can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data that may consist of up to 10 million markers and 10 000 samples from replicable lines or hybrids. GWASpro provides an interface that significantly reduces the learning curve for new GWAS investigators.

Availability and implementation: GWASpro is freely available at https://bioinfo.noble.org/GWASPRO.

Supplementary information: Supplementary data are available at Bioinformatics online.

bGWAS: an R package to perform Bayesian genome wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btaa549 ◽

2020 ◽

Vol 36 (15) ◽

pp. 4374-4376

Author(s):

Ninon Mounier ◽

Zoltán Kutalik

Keyword(s):

Mendelian Randomization ◽

Causal Effect ◽

Association Studies ◽

R Package ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Biological Mechanisms ◽

Genome Wide ◽

Related Risk

Abstract

Summary: Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs), and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior effects and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms.

Availability and implementation: The bGWAS package is freely available under a GPL-2 License, and can be accessed, along with user guides and tutorials, from https://github.com/n-mounier/bGWAS.

Supplementary information: Supplementary data are available at Bioinformatics online.

Corrigendum of 'High throughput analysis of epistasis in genome-wide association studies with BiForce'

Bioinformatics ◽

10.1093/bioinformatics/btt444 ◽

2013 ◽

Vol 29 (20) ◽

pp. 2667-2668

Author(s):

A. Gyenesei ◽

C. A. M. Semple ◽

C. S. Haley ◽

W.-H. Wei

Keyword(s):

High Throughput ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

High Throughput Analysis ◽

Throughput Analysis ◽

Genome Wide

The Intersection of Genome-Wide Association Studies and High-Throughput Small Interfering Ribonucleic Acid Screens Allows for the Identification of Novel Pathways Relevant to Atherosclerosis

JACC Basic to Translational Science ◽

10.1016/j.jacbts.2017.03.005 ◽

2017 ◽

Vol 2 (2) ◽

pp. 209-211

Author(s):

Vivek Nanda ◽

Sophia Xiao ◽

Jianqin Ye ◽

Nicholas J. Leeper

Keyword(s):

High Throughput ◽

Ribonucleic Acid ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Generalization of Cortical Multivariate Genome-Wide Associations Within and Across Samples

10.1101/2021.04.23.441215 ◽

2021 ◽

Author(s):

Robert J. Loughnan ◽

Alexey A. Shadrin ◽

Oleksandr Frei ◽

Dennis van der Meer ◽

Weiqi Zhao ◽

...

Keyword(s):

Association Studies ◽

Genome Wide Association ◽

High Dimensional ◽

Multivariate Techniques ◽

Genome Wide Association Studies ◽

Independent Data ◽

UK Biobank ◽

Replication Rate ◽

Multiple Phenotypes ◽

Genome Wide

Abstract

Genome-Wide Association studies have typically been limited to single phenotypes, given that high dimensional phenotypes incur a large multiple comparisons burden: ~1 million tests across the genome times the number of phenotypes. Recent work demonstrates that a Multivariate Omnibus Statistic Test (MOSTest) is well powered to discover genomic effects distributed across multiple phenotypes. Applied to cortical brain MRI morphology measures, MOSTest has resulted in a drastic improvement in power to discover loci, a 10-fold increase in discovered loci compared to established approaches (min-P). One question that arises is how well these discovered loci replicate in independent data. Here we perform 10-times cross validation within 35,644 individuals from UK Biobank for imaging measures of cortical area, thickness and sulcal depth (>1,000 dimensionality for each). By deploying a replication method that aggregates discovered effects distributed across multiple phenotypes, termed PolyVertex Score (PVS), we demonstrate a higher replication yield and comparable replication rate of discovered loci for MOSTest (# replicated loci: 428-1,037, replication rate: 95-96%) in independent data when compared with the established min-P approach (# replicated loci: 30-71, replication rate: 70-84%). An out-of-sample generalization of discovered loci was conducted with a sample of 8,336 individuals from the Adolescent Brain Cognitive Development® (ABCD) study, who are on average 50 years younger than UK Biobank individuals. We observe a higher replication yield and comparable replication rate of MOSTest compared to min-P. This finding underscores the importance of using multivariate techniques for both discovery and replication of high dimensional phenotypes in Genome-Wide Association studies.
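The multiple comparisons burden described in this abstract can be made concrete with a small arithmetic sketch. The code below is only an illustration and is not part of MOSTest or the min-P approach: it assumes a plain Bonferroni correction, a family-wise error rate of 0.05, and an illustrative count of 1,000 phenotypes (the abstract states only that dimensionality exceeds 1,000). The function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_variants: int, n_phenotypes: int) -> tuple[float, int]:
    """Return the per-test p-value threshold and the total number of tests.

    Assumes a simple Bonferroni correction: the family-wise error rate
    `alpha` is divided evenly across every variant-by-phenotype test.
    """
    n_tests = n_variants * n_phenotypes
    return alpha / n_tests, n_tests


if __name__ == "__main__":
    # Illustrative numbers only: ~1 million tests across the genome (from the
    # abstract) and 1,000 phenotypes (an assumed round figure).
    for n_phenotypes in (1, 1_000):
        threshold, n_tests = bonferroni_threshold(0.05, 1_000_000, n_phenotypes)
        print(f"{n_phenotypes} phenotype(s): {n_tests} tests, threshold {threshold:.1e}")
```

Under these assumptions, moving from one phenotype to a thousand multiplies the number of tests by a thousand and shrinks the per-test threshold by the same factor, which is the burden that motivates a single omnibus test per variant.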

pyseer: a comprehensive tool for microbial pangenome-wide association studies

10.1101/266312 ◽

2018 ◽

Cited By ~ 1

Author(s):

John A Lees ◽

Marco Galardini ◽

Stephen D Bentley ◽

Jeffrey N Weiser ◽

Jukka Corander

Keyword(s):

Input Data ◽

Association Studies ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Supplementary Data ◽

New Methods ◽

Link Type ◽

Genome Wide

Abstract

Summary: Genome-wide association studies (GWAS) in microbes face different challenges to eukaryotes and have been addressed by a number of different methods. pyseer brings these techniques together in one package tailored to microbial GWAS, allows greater flexibility of the input data used, and adds new methods to interpret the association results.

Availability and implementation: pyseer is written in Python and is freely available at https://github.com/mgalardini/pyseer, or can be installed through pip. Documentation and a tutorial are available at http://[email protected] and [email protected].

Supplementary information: Supplementary data are available online.

PopCluster: an algorithm to identify genetic variants with ethnicity-dependent effects

Bioinformatics ◽

10.1093/bioinformatics/btz017 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3046-3054 ◽

Cited By ~ 2

Author(s):

Anastasia Gurinovich ◽

Harold Bae ◽

John J Farrell ◽

Stacy L Andersen ◽

Stefano Monti ◽

...

Keyword(s):

Genetic Variants ◽

Association Studies ◽

False Positive Rate ◽

Principal Component ◽

True Positive Rate ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Positive Rate

Abstract

Motivation: Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery.

Results: In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects’ ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype.

Availability and implementation: PopCluster was implemented using the R programming language, PLINK and Eigensoft software, and can be found at the following GitHub repository: https://github.com/gurinovich/PopCluster with instructions on its installation and usage.

Supplementary information: Supplementary data are available at Bioinformatics online.

Efficient multivariate analysis algorithms for longitudinal genome-wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btz304 ◽

2019 ◽

Vol 35 (23) ◽

pp. 4879-4885 ◽

Cited By ~ 4

Author(s):

Chao Ning ◽

Dan Wang ◽

Lei Zhou ◽

Julong Wei ◽

Yuanxin Liu ◽

...

Keyword(s):

Longitudinal Data ◽

Software Package ◽

Mixed Model ◽

Linear Mixed Model ◽

Association Studies ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Computational Speed

Abstract

Motivation: Current dynamic phenotyping systems introduce time as an extra dimension to genome-wide association studies (GWAS), which helps to explore the mechanism of dynamic genetic control for complex longitudinal traits. However, existing methods for longitudinal GWAS either ignore the covariance among observations at different time points or encounter computational efficiency issues.

Results: We herein developed efficient genome-wide multivariate association algorithms for longitudinal data. In contrast to existing univariate linear mixed model analyses, the proposed method has improved statistical power for association detection and improved computational speed. In addition, the new method can analyze unbalanced longitudinal data with thousands of individuals and more than ten thousand records within a few hours. The corresponding time for balanced longitudinal data is just a few minutes.

Availability and implementation: A software package implementing the efficient algorithm, named GMA (https://github.com/chaoning/GMA), is freely available for interested users in relevant fields.

Supplementary information: Supplementary data are available at Bioinformatics online.

High-throughput analysis of epistasis in genome-wide association studies with BiForce

Bioinformatics ◽

10.1093/bioinformatics/bts304 ◽

2012 ◽

Vol 28 (15) ◽

pp. 1957-1964 ◽

Cited By ~ 37

Author(s):

Attila Gyenesei ◽

Jonathan Moody ◽

Colin A.M. Semple ◽

Chris S. Haley ◽

Wen-Hua Wei

Keyword(s):

High Throughput ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

High Throughput Analysis ◽

Throughput Analysis ◽

Genome Wide

Joint Analysis of Functional Genomic Data and Genome-wide Association Studies of 18 Human Traits

The American Journal of Human Genetics ◽

10.1016/j.ajhg.2014.06.001 ◽

2014 ◽

Vol 95 (1) ◽

pp. 126 ◽

Cited By ~ 3

Author(s):

Joseph K. Pickrell

Keyword(s):

Association Studies ◽

Genomic Data ◽

Genome Wide Association ◽

Joint Analysis ◽

Genome Wide Association Studies ◽

Functional Genomic ◽

Functional Genomic Data ◽

Genome Wide
