Bayesian copy number detection and association in large-scale studies

Abstract

Background: Germline copy number variants (CNVs) increase risk for many diseases, yet detecting CNVs and quantifying their contribution to disease risk in large-scale studies is challenging due to biological and technical sources of heterogeneity that vary across the genome within and between samples.

Methods: We developed an approach called CNPBayes to identify latent batch effects in genome-wide association studies involving copy number, to provide probabilistic estimates of integer copy number across the estimated batches, and to fully integrate the copy number uncertainty in the association model for disease.

Results: Applying a hidden Markov model (HMM) to identify CNVs in a large multi-site Pancreatic Cancer Case Control study (PanC4) of 7598 participants, we found CNV inference was highly sensitive to technical noise that varied appreciably among participants. Applying CNPBayes to this dataset, we found that the major sources of technical variation were linked to sample processing by the centralized laboratory and not the individual study sites. Modeling the latent batch effects at each CNV region hierarchically, we developed probabilistic estimates of copy number that were directly incorporated in a Bayesian regression model for pancreatic cancer risk. Candidate associations aided by this approach include deletions of 8q24 near regulatory elements of the tumor oncogene MYC and of Tumor Suppressor Candidate 3 (TUSC3).

Conclusions: Laboratory effects may not account for the major sources of technical variation in genome-wide association studies. This study provides a robust Bayesian inferential framework for identifying latent batch effects, estimating copy number, and evaluating the role of copy number in heritable diseases.
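
The idea of batch-specific probabilistic copy number estimates can be illustrated with a minimal sketch. It is not the CNPBayes implementation: it assumes a simple Gaussian mixture per batch, and every name and number below (batch labels, component means, standard deviations, mixing weights, the intensity value) is hypothetical. It shows how the same intensity can map to different copy number posteriors depending on the batch, and how an expected copy number keeps that uncertainty instead of a hard call.

```python
import math

COPY_NUMBERS = [1, 2, 3]

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def copy_number_posterior(x, means, sds, weights):
    """Posterior probability of each mixture component given intensity x."""
    joint = [w * normal_pdf(x, m, s) for m, s, w in zip(means, sds, weights)]
    total = sum(joint)
    return [j / total for j in joint]

def expected_copy_number(posterior):
    """Posterior mean copy number; retains uncertainty for a downstream model."""
    return sum(cn * p for cn, p in zip(COPY_NUMBERS, posterior))

# Hypothetical batch-specific component means for copy number 1, 2 and 3.
BATCH_MEANS = {
    "batch_A": [-0.50, 0.00, 0.35],
    "batch_B": [-0.40, 0.10, 0.45],
}
SDS = [0.10, 0.10, 0.10]      # assumed equal component spread
WEIGHTS = [0.05, 0.90, 0.05]  # assumed copy number frequencies

intensity = -0.25  # the same measurement, evaluated in each batch
post_a = copy_number_posterior(intensity, BATCH_MEANS["batch_A"], SDS, WEIGHTS)
post_b = copy_number_posterior(intensity, BATCH_MEANS["batch_B"], SDS, WEIGHTS)

print("batch_A", [round(p, 3) for p in post_a], round(expected_copy_number(post_a), 3))
print("batch_B", [round(p, 3) for p in post_b], round(expected_copy_number(post_b), 3))
```

With these assumed parameters, the most probable copy number differs between the two batches for the same intensity, which is the kind of shift that an unmodeled batch effect would turn into a miscalled CNV.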

GWASpro: a high-performance genome-wide association analysis server

Bioinformatics, 2018, Vol 35 (14), pp. 2512-2514. DOI: 10.1093/bioinformatics/bty989. Cited by ~4.

Author(s): Bongsong Kim, Xinbin Dai, Wenchao Zhang, Zhaohong Zhuang, Darlene L Sanchez, ...

Keyword(s): High Performance, Large Scale, Linear Mixed Model, Association Studies, Learning Curves, Experimental Designs, Genome Wide Association, Supplementary Information, Genome Wide Association Studies, Genome Wide

Abstract

Summary: We present GWASpro, a high-performance web server for the analysis of large-scale genome-wide association studies (GWAS). GWASpro was developed to provide data analyses for large-scale molecular genetic data coupled with complex replicated experimental designs, such as those found in plant science investigations, and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, by which complex experimental designs, which may include replications, treatments, locations and times, can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data that may consist of up to 10 million markers and 10 000 samples from replicable lines or hybrids. GWASpro provides an interface that significantly reduces the learning curve for new GWAS investigators.

Availability and implementation: GWASpro is freely available at https://bioinfo.noble.org/GWASPRO.

Supplementary information: Supplementary data are available at Bioinformatics online.
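
To make the notion of a design matrix for a replicated experiment concrete, here is a minimal sketch. It does not reproduce GWASpro's interface or internals; the function names, factor names and records are all hypothetical. It builds a reference-coded fixed-effect matrix X for location and treatment, and an incidence matrix Z that maps each observation to its line, which are the two kinds of matrices a linear mixed model of the form y = Xb + Zu + e needs.

```python
def reference_coded_columns(values):
    """Indicator columns for a factor, dropping the first sorted level as reference."""
    levels = sorted(set(values))
    kept = levels[1:]
    columns = [[1.0 if v == level else 0.0 for level in kept] for v in values]
    return kept, columns

def fixed_effect_matrix(records, factors):
    """Intercept plus reference-coded indicators for each listed factor."""
    names = ["intercept"]
    rows = [[1.0] for _ in records]
    for factor in factors:
        kept, columns = reference_coded_columns([r[factor] for r in records])
        names.extend(f"{factor}={level}" for level in kept)
        for row, extra in zip(rows, columns):
            row.extend(extra)
    return names, rows

def incidence_matrix(records, key):
    """Full indicator matrix linking each observation to its level of `key`."""
    levels = sorted({r[key] for r in records})
    rows = [[1.0 if r[key] == level else 0.0 for level in levels] for r in records]
    return levels, rows

# Hypothetical replicated design: two lines, each grown at two sites.
records = [
    {"line": "L1", "location": "siteA", "treatment": "control"},
    {"line": "L1", "location": "siteB", "treatment": "drought"},
    {"line": "L2", "location": "siteA", "treatment": "drought"},
    {"line": "L2", "location": "siteB", "treatment": "control"},
]

x_names, X = fixed_effect_matrix(records, ["location", "treatment"])
z_names, Z = incidence_matrix(records, "line")

print(x_names)
for row in X:
    print(row)
print(z_names)
for row in Z:
    print(row)
```

Dropping one level per fixed factor keeps X full rank alongside the intercept, while Z keeps every line because random effects are not reference coded.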

A Bayesian Regression Model with Variable Selection for Genome-Wide Association Studies

Case Studies in Bayesian Statistical Modelling and Analysis - Wiley Series in Probability and Statistics, 2012, pp. 103-117. DOI: 10.1002/9781118394472.ch6.

Author(s): Carla Chen, Kerrie L. Mengersen, Katja Ickstadt, Jonathan M. Keith

Keyword(s): Variable Selection, Regression Model, Association Studies, Bayesian Regression, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Selection For

Genome-Wide Association Studies Reveal Susceptibility Loci for Digital Dermatitis in Holstein Cattle

Animals, 2020, Vol 10 (11), pp. 2009. DOI: 10.3390/ani10112009.

Author(s): Ellen Lai, Alexa L. Danner, Thomas R. Famula, Anita M. Oberbauer

Keyword(s): Predictive Value, Mixed Model, Linear Mixed Model, Bos Taurus, Association Studies, Bayesian Regression, Genome Wide Association, Genome Wide Association Studies, Digital Dermatitis, Genome Wide

Digital dermatitis (DD) causes lameness in dairy cattle. To detect the quantitative trait loci (QTL) associated with DD, genome-wide association studies (GWAS) were performed using high-density single nucleotide polymorphism (SNP) genotypes and binary case/control, quantitative (average number of foot warts [FW] per hoof trimming record) and recurrent (cases with ≥2 DD episodes vs. controls) phenotypes from cows across four dairies (controls n = 129 vs. FW n = 85). Linear mixed model (LMM) and random forest (RF) approaches identified the top SNPs, which were used as predictors in Bayesian regression models to assess the SNP predictive value. The LMM and RF analyses identified QTL regions containing candidate genes related to epidermal integrity, immune function, and wound healing on Bos taurus autosome (BTA) 2 for the binary and recurrent phenotypes and on BTA7 and BTA20 for the quantitative phenotype. Although larger sample sizes are necessary to reaffirm these small-effect loci amidst a strong environmental effect, the sample cohort used in this study was sufficient for estimating SNP effects with a high predictive value.

Secure large-scale genome-wide association studies using homomorphic encryption

Proceedings of the National Academy of Sciences, 2020, Vol 117 (21), pp. 11608-11613. DOI: 10.1073/pnas.1918257117. Cited by ~1.

Author(s): Marcelo Blatt, Alexander Gusev, Yuriy Polyakov, Shafi Goldwasser

Keyword(s): Large Scale, Homomorphic Encryption, Association Studies, Genome Wide Association, Single Server, Genome Wide Association Studies, Nucleotide Polymorphisms, User Interactions, Individual Level, Genome Wide

Genome-wide association studies (GWASs) seek to identify genetic variants associated with a trait, and have been a powerful approach for understanding complex diseases. A critical challenge for GWASs has been the dependence on individual-level data that typically have strict privacy requirements, creating an urgent need for methods that preserve the individual-level privacy of participants. Here, we present a privacy-preserving framework based on several advances in homomorphic encryption and demonstrate that it can perform an accurate GWAS analysis for a real dataset of more than 25,000 individuals, keeping all individual data encrypted and requiring no user interactions. Our extrapolations show that it can evaluate GWASs of 100,000 individuals and 500,000 single-nucleotide polymorphisms (SNPs) in 5.6 h on a single server node (or in 11 min on 31 server nodes running in parallel). Our performance results are more than one order of magnitude faster than prior state-of-the-art results using secure multiparty computation, which requires continuous user interactions, with the accuracy of both solutions being similar. Our homomorphic encryption advances can also be applied to other domains where large-scale statistical analyses over encrypted data are needed.
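
The two extrapolated runtimes quoted above can be checked against each other with a few lines of arithmetic. The sketch below assumes the workload divides evenly across nodes with no coordination overhead, which the abstract does not state; the variable names are illustrative, and the reported figures are rounded, so the comparison is only approximate.

```python
# Figures quoted in the abstract.
single_node_hours = 5.6
nodes = 31
reported_parallel_minutes = 11

# Ideal runtime if the work split perfectly across all nodes (an assumption).
ideal_parallel_minutes = single_node_hours * 60 / nodes

# Ratio of ideal to reported runtime; a value near 1 suggests near-linear scaling.
parallel_efficiency = ideal_parallel_minutes / reported_parallel_minutes

print(f"ideal: {ideal_parallel_minutes:.2f} min, efficiency: {parallel_efficiency:.3f}")
```

Under that assumption the ideal figure lands just under the reported 11 minutes, so the two numbers are consistent with almost linear scaling across the 31 nodes.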

A critical evaluation of results from genome-wide association studies of micronutrient status and their utility in the practice of precision nutrition

British Journal of Nutrition, 2019, Vol 122 (2), pp. 121-130. DOI: 10.1017/s0007114519001119. Cited by ~2.

Author(s): Marie-Joe Dib, Ruan Elliott, Kourosh R. Ahmadi

Keyword(s): Large Scale, Association Studies, Critical Evaluation, Water Soluble, Genome Wide Association, Genome Wide Association Studies, Micronutrient Deficiencies, Micronutrient Status, Genome Wide, Fat Soluble Vitamins

Abstract: Rapid advances in ‘omics’ technologies have paved the way forward to an era where more ‘precise’ approaches – ‘precision’ nutrition – which leverage data on genetic variability alongside the traditional indices, have been put forth as the state-of-the-art solution to redress the effects of malnutrition across the life course. We purport that this inference is premature and that it is imperative to first review and critique the existing evidence from large-scale epidemiological findings. We set out to provide a critical evaluation of findings from genome-wide association studies (GWAS) in the roadmap to precision nutrition, focusing on GWAS of micronutrient disposition. We found that a large number of loci associated with biomarkers of micronutrient status have been identified. Mean estimates of heritability of micronutrient status ranged between 20 and 35 % for minerals, 56–59 % for water-soluble vitamins and 30–70 % for fat-soluble vitamins. With some exceptions, the majority of the identified genetic variants explained little of the overall variance in status for each micronutrient, ranging between 1·3 and 8 % for minerals, <0·1–12 % for water-soluble vitamins and 1·7–2·3 % for fat-soluble vitamins. However, GWAS have provided some novel insight into mechanisms that underpin variability in micronutrient status. Our findings highlight obvious gaps that need to be addressed if the full scope of precision nutrition is ever to be realised, including research aimed at (i) dissecting the genetic basis of micronutrient deficiencies or ‘response’ to intake/supplementation, (ii) identifying trans-ethnic and ethnic-specific effects, and (iii) identifying gene–nutrient interactions for the purpose of unravelling molecular ‘behaviour’ in a range of environmental contexts.

Copy Number Variants and Common Disorders: Filling the Gaps and Exploring Complexity in Genome-Wide Association Studies

PLoS Genetics, 2007, Vol 3 (10), pp. e190. DOI: 10.1371/journal.pgen.0030190. Cited by ~139.

Author(s): Xavier Estivill, Lluís Armengol

Keyword(s): Copy Number, Association Studies, Copy Number Variants, Genome Wide Association, Genome Wide Association Studies, Genome Wide

Currently Available Versions of Genome-Wide Association Studies Cannot Be Used to Query the Common Haptoglobin Copy Number Variant

Journal of the American College of Cardiology, 2013, Vol 62 (9), pp. 860-861. DOI: 10.1016/j.jacc.2013.04.079. Cited by ~12.

Author(s): Leah E. Cahill, Majken K. Jensen, Daniel I. Chasman, Aditi Hazra, Andrew P. Levy, ...

Keyword(s): Copy Number, Association Studies, Copy Number Variant, Genome Wide Association, Genome Wide Association Studies, Genome Wide, The Common

Large-Scale Development of Gene-Associated Single-Nucleotide Polymorphism Markers for Molluscan Population Genomic, Comparative Genomic, and Genome-Wide Association Studies

DNA Research, 2013, Vol 21 (2), pp. 183-193. DOI: 10.1093/dnares/dst048. Cited by ~10.

Author(s): W. Jiao, X. Fu, J. Li, L. Li, L. Feng, ...

Keyword(s): Single Nucleotide Polymorphism, Large Scale, Association Studies, Genome Wide Association, Comparative Genomic, Genome Wide Association Studies, Nucleotide Polymorphism, Single Nucleotide, Population Genomic, Genome Wide

Progress in the genetics of common obesity and type 2 diabetes

Expert Reviews in Molecular Medicine, 2010, Vol 12. DOI: 10.1017/s1462399410001389. Cited by ~64.

Author(s): Karani S. Vimaleswaran, Ruth J.F. Loos

Keyword(s): Type 2 Diabetes, Large Scale, Association Studies, Genome Wide Association, Epidemiological Methods, Genome Wide Association Studies, Obesity And Diabetes, Genome Wide, Genomics And Proteomics

The prevalence of obesity and diabetes, which are heritable traits that arise from the interactions of multiple genes and lifestyle factors, continues to rise worldwide, causing serious health problems and imposing a substantial economic burden on societies. For the past 15 years, candidate gene and genome-wide linkage studies have been the main genetic epidemiological approaches to identify genetic loci for obesity and diabetes, yet progress has been slow and success limited. The genome-wide association approach, which has become available in recent years, has dramatically changed the pace of gene discoveries. Genome-wide association is a hypothesis-generating approach that aims to identify new loci associated with the disease or trait of interest. So far, three waves of large-scale genome-wide association studies have identified 19 loci for common obesity and 18 for common type 2 diabetes. Although the combined contribution of these loci to the variation in obesity and diabetes risk is small and their predictive value is typically low, these recently identified loci are set to substantially improve our insights into the pathophysiology of obesity and diabetes. This will require integration of genetic epidemiological methods with functional genomics and proteomics. However, the use of these novel insights for genetic screening and personalised treatment lies some way off in the future.

Large-scale Multi-omics Genome-wide Association Studies (Mo-GWAS): Guidelines for Sample Preparation and Normalization

Journal of Visualized Experiments, 2021. DOI: 10.3791/62732.

Author(s): Mustafa Bulut, Alisdair R. Fernie, Saleh Alseekh

Keyword(s): Sample Preparation, Large Scale, Association Studies, Genome Wide Association, Genome Wide Association Studies, Genome Wide
