301 Methods of genome-wide association studies and their applications in dairy cattle

Abstract Genome-wide association studies (GWAS) have been widely used to map quantitative trait loci (QTL) of complex traits and diseases since 2007. To date, the human GWAS catalog has accumulated 4,410 publications and 172,351 associations, and the animal QTLdb has curated 983 publications and 130,407 QTLs for cattle, the largest number among livestock species. During the past 13 years of development, GWAS methods have evolved from simple linear regression, through the use of principal components to address sample relatedness and mixed models, to Bayesian full-model approaches. Each of these methods has its advantages and limitations, so it is important to choose an appropriate method, especially for studies in livestock, where sample size is often limited. Note that the most popular GWAS approach, the mixed model method, originated from animal breeding and genetics research. Leveraging the national cattle genomic database at the Council on Dairy Cattle Breeding (CDCB), we have conducted GWAS analyses of various dairy traits to identify QTLs and SNP markers of importance. By combining these results with sequence and functional annotation data, we seek to understand the genetic basis of complex traits and to reveal useful knowledge that can be incorporated into more accurate genomic predictions in the future.
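
As an illustration of the simplest method named in the abstract above, the following is a minimal sketch of a single-marker linear regression test with a Bonferroni threshold. It is not the authors' pipeline: the function names are hypothetical, no covariates, principal components, or kinship are modeled, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples.

```python
from statistics import NormalDist, mean


def single_marker_test(genotype, phenotype):
    """Regress phenotype on one marker coded 0/1/2.

    Returns (beta, standard_error, p_value). The p-value uses a normal
    approximation, an assumption made here to stay within the standard library.
    """
    n = len(genotype)
    if n != len(phenotype) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mg, mp = mean(genotype), mean(phenotype)
    sxx = sum((g - mg) ** 2 for g in genotype)
    if sxx == 0:
        raise ValueError("monomorphic marker: no genotype variation")
    sxy = sum((g - mg) * (p - mp) for g, p in zip(genotype, phenotype))
    beta = sxy / sxx
    # Residual sum of squares around the fitted line.
    rss = sum((p - mp - beta * (g - mg)) ** 2 for g, p in zip(genotype, phenotype))
    se = (rss / (n - 2) / sxx) ** 0.5
    if se == 0:
        return beta, 0.0, 0.0  # perfect fit: test statistic is unbounded
    z = beta / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return beta, se, p_value


def bonferroni_threshold(alpha, n_markers):
    """Per-marker significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_markers
```

A marker would be declared significant in this sketch when its p-value falls below `bonferroni_threshold(0.05, n_markers)`. Mixed models extend this baseline by adding a random effect for relatedness, which the sketch deliberately omits.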

GWAS-Flow: A GPU accelerated framework for efficient permutation based genome-wide association studies

10.1101/783100 ◽

2019 ◽

Cited By ~ 2

Author(s):

Jan A. Freudenthal ◽

Markus J. Ankenbrand ◽

Dominik G. Grimm ◽

Arthur Korte

Keyword(s):

Complex Traits ◽

Mixed Model ◽

Linear Mixed Model ◽

Association Studies ◽

Large Datasets ◽

Genome Wide Association ◽

Small Data ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Non Gaussian

Abstract Motivation Genome-wide association studies (GWAS) are one of the most commonly used methods to detect associations between complex traits and genomic polymorphisms. As both genotyping and phenotyping of large populations have become easier, typical modern GWAS have to cope with massive amounts of data. Thus, the computational demand for these analyses has grown remarkably during the last decades. This is especially true if one wants to implement permutation-based significance thresholds instead of using the naïve Bonferroni threshold. Permutation-based methods have the advantage of providing an adjusted multiple hypothesis correction threshold that takes the underlying phenotypic distribution into account and thus removes the need to find the correct transformation for non-Gaussian phenotypes. To enable efficient analyses of large datasets and the possibility to compute permutation-based significance thresholds, we used the machine learning framework TensorFlow to develop a linear mixed model (GWAS-Flow) that can make use of the available CPU or GPU infrastructure to decrease the time of the analyses, especially for large datasets. Results We were able to show that our application GWAS-Flow outperforms custom GWAS scripts in terms of speed without losing accuracy. Apart from p-values, GWAS-Flow also computes summary statistics, such as the effect size and its standard error for each individual marker. The CPU-based version is the default choice for small data, while the GPU-based version of GWAS-Flow is especially suited for the analyses of big data. Availability GWAS-Flow is freely available on GitHub (https://github.com/Joyvalley/GWAS_Flow) and is released under the terms of the MIT-License.
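
The idea of a permutation-based significance threshold can be sketched as follows. This is not GWAS-Flow's implementation, which fits a linear mixed model in TensorFlow; as a stand-in, the sketch uses the absolute Pearson correlation between each marker and the phenotype as the test statistic. All function and parameter names are hypothetical.

```python
import random
from statistics import mean


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker or constant phenotype
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Estimate a genome-wide threshold on the test statistic.

    genotypes: list of markers, each a list of per-individual codes.
    For every permutation the phenotype is shuffled (breaking any true
    association) and the largest statistic across all markers is recorded.
    The threshold is roughly the (1 - alpha) quantile of those maxima.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_stats.append(max(abs_correlation(snp, shuffled) for snp in genotypes))
    max_stats.sort()
    idx = min(n_perm - 1, int((1 - alpha) * n_perm))
    return max_stats[idx]
```

A marker whose observed statistic exceeds the returned value would be called significant at family-wise level `alpha`. Because the null distribution is built from the observed phenotype values themselves, no Gaussian assumption on the phenotype is needed, which is the advantage the abstract describes. The cost is one full genome scan per permutation, which is why acceleration matters.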

Deep mixed model for marginal epistasis detection and population stratification correction in genome-wide association studies

BMC Bioinformatics ◽

10.1186/s12859-019-3300-9 ◽

2019 ◽

Vol 20 (S23) ◽

Cited By ~ 1

Author(s):

Haohan Wang ◽

Tianwei Yue ◽

Jingkang Yang ◽

Wei Wu ◽

Eric P. Xing

Keyword(s):

Neural Network ◽

Alzheimer's Disease ◽

Complex Traits ◽

Population Stratification ◽

Mixed Model ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Abstract Background Genome-wide Association Studies (GWAS) have contributed to unraveling associations between genetic variants in the human genome and complex traits for more than a decade. While many follow-up methods have been developed to detect interactions between SNPs, epistasis has yet to be modeled and discovered more thoroughly. Results In this paper, following the previous study of detecting marginal epistasis signals, and motivated by the universal approximation power of deep learning, we propose a neural network method that can potentially model arbitrary interactions between SNPs in genetic association studies as an extension to the mixed models in correcting confounding factors. Our method, namely Deep Mixed Model, consists of two components: 1) a confounding factor correction component, which is a large-kernel convolutional neural network that focuses on calibrating the residual phenotypes by removing factors such as population stratification, and 2) a fixed-effect estimation component, which mainly consists of a Long Short-Term Memory (LSTM) model that estimates the association effect size of SNPs with the residual phenotype. Conclusions After validating the performance of our method using simulation experiments, we further apply it to Alzheimer’s disease data sets. Our results help to gain an exploratory understanding of the genetic architecture of Alzheimer’s disease.

Investigating the Effect of Imputed Structural Variants from Whole-Genome Sequence on Genome-Wide Association and Genomic Prediction in Dairy Cattle

Animals ◽

10.3390/ani11020541 ◽

2021 ◽

Vol 11 (2) ◽

pp. 541

Author(s):

Long Chen ◽

Jennie E. Pryce ◽

Ben J. Hayes ◽

Hans D. Daetwyler

Keyword(s):

Dairy Cattle ◽

Genomic Prediction ◽

Complex Traits ◽

Prediction Accuracy ◽

Association Studies ◽

Genome Wide Association ◽

Whole Genome Sequence ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Genome Wide

Structural variations (SVs) are large DNA segments of deletions, duplications, copy number variations, inversions and translocations in a re-sequenced genome compared to a reference genome. They have been found to be associated with several complex traits in dairy cattle and could potentially help to improve genomic prediction accuracy of dairy traits. Imputation of SVs was performed in individuals genotyped with single-nucleotide polymorphism (SNP) panels without the expense of sequencing them. In this study, we generated 24,908 high-quality SVs in a total of 478 whole-genome sequenced Holstein and Jersey cattle. We imputed 4489 SVs with R² > 0.5 into 35,568 Holstein and Jersey dairy cattle with 578,999 SNPs using two pipelines, FImpute and Eagle2.3-Minimac3. Genome-wide association studies for production, fertility and overall type with these 4489 SVs revealed four significant SVs, of which two were highly linked to significant SNPs. We also estimated the variance components for SNP and SV models for these traits using genomic best linear unbiased prediction (GBLUP). Furthermore, we assessed the effect on genomic prediction accuracy of adding SVs to GBLUP models. The estimated percentage of genetic variance captured by SVs for production traits was up to 4.57% for milk yield in bulls and 3.53% for protein yield in cows. Finally, no consistent increase in genomic prediction accuracy was observed when including SVs in GBLUP.

A new approach of dissecting genetic effects for complex traits

10.1101/2020.10.16.336180 ◽

2020 ◽

Cited By ~ 1

Author(s):

Meng Luo ◽

Shiliang Gu

Keyword(s):

Population Structure ◽

Complex Traits ◽

Mixed Model ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Computationally Efficient ◽

New Approach ◽

Genome Wide ◽

Outbred Mice

Abstract During the past decades, genome-wide association studies (GWAS) have been used to successfully identify tens of thousands of genetic variants associated with complex traits in humans, animals, and plants. All common genome-wide association (GWA) methods rely on population structure correction to avoid false genotype and phenotype associations. However, population structure correction is a stringent penalization, which also impedes the identification of real associations. Here, we used recent statistical advances and proposed iterative screen regression (ISR), which enables simultaneous multiple marker associations and is shown to appropriately correct for population stratification and cryptic relatedness in GWAS. Results from analyses of simulated data suggest that the proposed ISR method performed well in terms of power (sensitivity) versus FDR (False Discovery Rate) and specificity, and also showed less bias (higher accuracy) in effect (PVE) estimation than the existing multi-loci (mixed) model and the single-locus (mixed) model. We also show the practicality of our approach by applying it to rice, outbred mice, and A. thaliana datasets. It identified several new causal loci that other methods did not detect. Our ISR provides an alternative for multi-loci GWAS, and the implementation was computationally efficient, making the analysis of large datasets practicable (n>100,000).

Faculty Opinions recommendation of Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.733803377.793550136 ◽

2018 ◽

Author(s):

Mohan Liu

Keyword(s):

Effect Size ◽

Complex Traits ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Size Distributions ◽

Complex Effect ◽

Genome Wide ◽

Level Statistics

Genome wide association analyses to understand genetic basis of flowering and plant height under three levels of nitrogen application in Brassica juncea (L.) Czern & Coss

Scientific Reports ◽

10.1038/s41598-021-83689-w ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Javed Akhatar ◽

Anna Goyal ◽

Navneet Kaur ◽

Chhaya Atri ◽

Meenakshi Mittal ◽

...

Keyword(s):

Plant Height ◽

Indian Subcontinent ◽

Association Studies ◽

Snp Markers ◽

Genome Wide Association ◽

Strong Interactions ◽

N Availability ◽

Oilseed Crop ◽

Genome Wide Association Studies ◽

Genome Wide

Abstract Timely transition to flowering, maturity and plant height are important for agronomic adaptation and productivity of Indian mustard (B. juncea), which is a major edible oilseed crop of low input ecologies in the Indian subcontinent. Breeding manipulation for these traits is difficult because of the involvement of multiple interacting genetic and environmental factors. Here, we report a genetic analysis of these traits using a population comprising 92 diverse genotypes of mustard. These genotypes were evaluated under deficient (N75), normal (N100) or excess (N125) conditions of nitrogen (N) application. Lower N availability induced early flowering and maturity in most genotypes, while high N conditions delayed both. A genotyping-by-sequencing approach helped to identify 406,888 SNP markers and undertake genome wide association studies (GWAS). A total of 282 significant marker-trait associations (MTAs) were identified. We detected strong interactions between GWAS loci and nitrogen levels. Though some trait-associated SNPs were detected repeatedly across fertility gradients, the majority were identified under deficient or normal levels of N application. Annotation of the genomic region(s) within ± 50 kb of the peak SNPs facilitated prediction of 30 candidate genes belonging to light perception, circadian, floral meristem identity, flowering regulation, gibberellic acid pathways and plant development. These included over one copy each of AGL24, AP1, FVE, FRI, GID1A and GNC. FLC and CO were predicted on chromosomes A02 and B08 respectively. CDF1, CO, FLC, AGL24, GNC and FAF2 appeared to influence the variation for plant height. Our findings may help in improving phenotypic plasticity of mustard across fertility gradients through marker-assisted breeding strategies.
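
The window-based candidate gene search mentioned in the abstract above (regions within ± 50 kb of a peak SNP) can be sketched as a simple interval-overlap lookup. This is an illustrative sketch only: the function name, the tuple layout of the gene annotation, and the example coordinates used to exercise it are hypothetical and are not taken from the study.

```python
def genes_near_snp(snp_chrom, snp_pos, genes, window=50_000):
    """Return names of genes overlapping the window around a peak SNP.

    genes: iterable of (name, chrom, start, end) tuples with start <= end,
    a hypothetical annotation layout chosen for this example.
    A gene is reported if any part of it lies in
    [snp_pos - window, snp_pos + window] on the same chromosome.
    """
    lo, hi = snp_pos - window, snp_pos + window
    return [
        name
        for name, chrom, start, end in genes
        if chrom == snp_chrom and start <= hi and end >= lo
    ]
```

Testing for overlap rather than full containment means that a gene only partly inside the window is still reported, which is usually the more permissive and safer choice when generating candidates for later validation.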

Exploring the predictive power of polygenic scores derived from genome-wide association studies: a study of 10 complex traits

Bioinformatics ◽

10.1093/bioinformatics/btw745 ◽

2017 ◽

pp. btw745 ◽

Cited By ~ 8

Author(s):

Hon-Cheong So ◽

Pak C. Sham

Keyword(s):

Complex Traits ◽

Predictive Power ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Polygenic Scores

Comprehensive evaluation of mapping complex traits in wheat using genome-wide association studies

Molecular Breeding ◽

10.1007/s11032-021-01272-7 ◽

2021 ◽

Vol 42 (1) ◽

Author(s):

Dinesh K. Saini ◽

Yuvraj Chopra ◽

Jagmohan Singh ◽

Karansher S. Sandhu ◽

Anand Kumar ◽

...

Keyword(s):

Complex Traits ◽

Comprehensive Evaluation ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Mapping Complex Traits

GWAS of three molecular traits highlights core genes and pathways alongside a highly polygenic background

10.1101/2020.04.20.051631 ◽

2020 ◽

Cited By ~ 6

Author(s):

Nasa Sinnott-Armstrong ◽

Sahin Naqvi ◽

Manuel Rivas ◽

Jonathan K Pritchard

Keyword(s):

Complex Traits ◽

Genetic Basis ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Biological Processes ◽

Uk Biobank ◽

The Core ◽

Genome Wide ◽

Core Genes

Summary Genome-wide association studies (GWAS) have been used to study the genetic basis of a wide variety of complex diseases and other traits. However, for most traits it remains difficult to interpret what genes and biological processes are impacted by the top hits. Here, as a contrast, we describe UK Biobank GWAS results for three molecular traits (urate, IGF-1, and testosterone) that are biologically simpler than most diseases, and for which we know a great deal in advance about the core genes and pathways. Unlike most GWAS of complex traits, for all three traits we find that most top hits are readily interpretable. We observe huge enrichment of significant signals near genes involved in the relevant biosynthesis, transport, or signaling pathways. We show how GWAS data illuminate the biology of variation in each trait, including insights into differences in testosterone regulation between females and males. Meanwhile, in other respects the results are reminiscent of GWAS for more-complex traits. In particular, even these molecular traits are highly polygenic, with most of the variance coming not from core genes, but from thousands to tens of thousands of variants spread across most of the genome. Given that diseases are often impacted by many distinct biological processes, including these three, our results help to illustrate why so many variants can affect risk for any given disease.

Genome-wide association study reveals candidate genes for flowering time in cowpea (Vigna unguiculata [L.] Walp)

10.1101/2021.04.01.438123 ◽

2021 ◽

Author(s):

Dev Paudel ◽

Rocheteau Dareus ◽

Julia Rosenwald ◽

Maria Munoz-Amatriain ◽

Esteban Rios

Keyword(s):

Flowering Time ◽

Candidate Genes ◽

Vigna Unguiculata ◽

Association Studies ◽

Snp Markers ◽

Genome Wide Association ◽

Human Consumption ◽

Phenotypic Variance ◽

Genome Wide Association Studies ◽

Genome Wide

Cowpea (Vigna unguiculata [L.] Walp., diploid, 2n = 22) is a major crop used as a protein source for human consumption as well as a quality feed for livestock. It is drought and heat tolerant and has been bred to develop varieties that are resilient to changing climates. Plant adaptation to new climates and plant yield are strongly affected by flowering time. Therefore, understanding the genetic basis of flowering time is critical to advance cowpea breeding. The aim of this study was to perform genome-wide association studies (GWAS) to identify marker-trait associations for flowering time in cowpea using single nucleotide polymorphism (SNP) markers. A total of 367 accessions from a cowpea mini-core collection were evaluated in Ft. Collins, CO in 2019 and 2020, and 292 accessions were evaluated in Citra, FL in 2018. These accessions were genotyped using the Cowpea iSelect Consortium Array, which contains 51,128 SNPs. GWAS revealed seven reliable SNPs for flowering time that explained 8-12% of the phenotypic variance. Candidate genes associated with flowering time, including FT, GI, CRY2, LSH3, UGT87A2, LIF2, and HTA9, were identified for the significant SNP markers. Further efforts to validate these loci will help to understand their role in flowering time in cowpea and could facilitate the transfer of some of this knowledge to other closely related legume species.
