The construction of a haplotype reference panel using extremely low coverage whole genome sequences and its application in genome-wide association studies and genomic prediction in Duroc pigs

Abstract: Genome-wide association studies (GWAS) have considerably advanced our understanding of human traits and diseases. With the increasing availability of whole genome sequences (WGS) for pathogens, it is important to establish whether GWAS of viral genomes could reveal important biological insights. Here we perform the first proof-of-concept viral GWAS examining drug resistance (DR), a phenotype with well-understood genetics. We performed a GWAS of DR in a sample of 343 HIV subtype C patients failing first-line antiretroviral treatment in rural KwaZulu-Natal, South Africa. The majority and minority variants within each sequence were called using PILON, and GWAS was performed within PLINK. HIV WGS from patients failing on different antiretroviral treatments were compared to sequences derived from individuals naive to the respective treatment. GWAS methodology was validated by identifying five associations on a genetic level that led to amino acid changes known to cause DR. Further, we highlighted the ability of GWAS to identify epistatic effects, identifying two replicable variants within amino acid 68 of the reverse transcriptase protein previously described as potential fitness compensatory mutations. A possible additional DR variant within amino acid 91 of the matrix region of the Gag protein was associated with tenofovir failure, highlighting the ability of GWAS to identify variants outside classical candidate genes. Our results also suggest a polygenic component to DR. These results validate the applicability of GWAS to HIV WGS data even in relatively small samples, and emphasise how high-throughput sequencing can provide novel and clinically relevant insights. Further, they suggested that for viruses like HIV, population structure was only a minor concern compared to that seen in bacterial or parasite GWAS. Given the small genome length and reduced burden for multiple testing, this makes HIV an ideal candidate for GWAS.
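
The point about a reduced multiple-testing burden can be illustrated with a simple Bonferroni correction. This is a minimal sketch, not the authors' procedure: the abstract does not state how significance was corrected, and both test counts below are assumptions chosen for illustration (roughly one test per nucleotide of an HIV genome of about 9,700 positions, and the one million tests that correspond to the conventional 5e-8 threshold in human GWAS).

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical test counts, for illustration only (not taken from the abstract).
HIV_TESTS = 9_700        # assumed: about one test per nucleotide position
HUMAN_TESTS = 1_000_000  # assumed: count behind the conventional 5e-8 threshold

hiv_threshold = bonferroni_threshold(HIV_TESTS)
human_threshold = bonferroni_threshold(HUMAN_TESTS)

print(f"HIV per-test threshold:   {hiv_threshold:.2e}")
print(f"Human per-test threshold: {human_threshold:.2e}")
print(f"Ratio (HIV / human):      {hiv_threshold / human_threshold:.0f}x")
```

Under these assumed counts the per-test threshold for the viral genome is about two orders of magnitude less stringent, which is one way to read the claim that small samples can still yield significant associations.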

Investigating the Effect of Imputed Structural Variants from Whole-Genome Sequence on Genome-Wide Association and Genomic Prediction in Dairy Cattle

Animals ◽

10.3390/ani11020541 ◽

2021 ◽

Vol 11 (2) ◽

pp. 541

Author(s):

Long Chen ◽

Jennie E. Pryce ◽

Ben J. Hayes ◽

Hans D. Daetwyler

Keyword(s):

Dairy Cattle ◽

Genomic Prediction ◽

Complex Traits ◽

Prediction Accuracy ◽

Association Studies ◽

Genome Wide Association ◽

Whole Genome Sequence ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Genome Wide

Structural variations (SVs) are large DNA segments of deletions, duplications, copy number variations, inversions and translocations in a re-sequenced genome compared to a reference genome. They have been found to be associated with several complex traits in dairy cattle and could potentially help to improve genomic prediction accuracy of dairy traits. Imputation of SVs was performed in individuals genotyped with single-nucleotide polymorphism (SNP) panels without the expense of sequencing them. In this study, we generated 24,908 high-quality SVs in a total of 478 whole-genome sequenced Holstein and Jersey cattle. We imputed 4489 SVs with R² > 0.5 into 35,568 Holstein and Jersey dairy cattle with 578,999 SNPs with two pipelines, FImpute and Eagle2.3-Minimac3. Genome-wide association studies for production, fertility and overall type with these 4489 SVs revealed four significant SVs, of which two were highly linked to significant SNP. We also estimated the variance components for SNP and SV models for these traits using genomic best linear unbiased prediction (GBLUP). Furthermore, we assessed the effect on genomic prediction accuracy of adding SVs to GBLUP models. The estimated percentage of genetic variance captured by SVs for production traits was up to 4.57% for milk yield in bulls and 3.53% for protein yield in cows. Finally, no consistent increase in genomic prediction accuracy was observed when including SVs in GBLUP.
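
The accuracy filter mentioned above (keeping only variants imputed with R² above 0.5) might look like the following sketch. The abstract does not say how R² was computed, so this assumes an empirical definition: the squared Pearson correlation between imputed dosages and true genotypes in a validation set. Imputation tools may report a differently estimated R². The function name and the toy arrays are hypothetical.

```python
import numpy as np


def empirical_r2(true_geno, imputed_dosage):
    """Squared Pearson correlation per variant; inputs are (animals x variants)."""
    t = true_geno - true_geno.mean(axis=0)
    d = imputed_dosage - imputed_dosage.mean(axis=0)
    cov = (t * d).sum(axis=0)
    denom = np.sqrt((t ** 2).sum(axis=0) * (d ** 2).sum(axis=0))
    # Variants with no variation in either input get r = 0 instead of NaN.
    r = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    return r ** 2


# Toy validation data (made up): 4 animals, 3 variants coded 0/1/2.
true_geno = np.array([[0, 0, 2],
                      [1, 1, 0],
                      [2, 2, 1],
                      [1, 0, 2]], dtype=float)
imputed = np.array([[0.1, 1.0, 1.0],
                    [0.9, 0.2, 1.0],
                    [1.8, 0.9, 1.0],
                    [1.2, 1.1, 1.0]])

r2 = empirical_r2(true_geno, imputed)
keep = r2 > 0.5  # the threshold quoted in the abstract
print("r2 per variant:", np.round(r2, 3))
print("variants kept:", np.flatnonzero(keep))
```

In this toy set only the first variant passes: the second is poorly imputed and the third has a constant imputed dosage, so it carries no information.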

Imputation of canine genotype array data using 365 whole-genome sequences improves power of genome-wide association studies

PLoS Genetics ◽

10.1371/journal.pgen.1008003 ◽

2019 ◽

Vol 15 (9) ◽

pp. e1008003 ◽

Cited By ~ 10

Author(s):

Jessica J. Hayward ◽

Michelle E. White ◽

Michael Boyle ◽

Laura M. Shannon ◽

Margret L. Casal ◽

...

Keyword(s):

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Genome Sequences ◽

Array Data ◽

Genome Wide ◽

Genotype Array

Imputation of canine genotype array data using 365 whole-genome sequences improves power of genome-wide association studies

10.1101/540559 ◽

2019 ◽

Author(s):

Jessica J. Hayward ◽

Michelle E. White ◽

Michael Boyle ◽

Laura M. Shannon ◽

Margret L. Casal ◽

...

Keyword(s):

Linkage Disequilibrium ◽

Association Studies ◽

Complex Trait ◽

Reference Panel ◽

Genome Wide Association ◽

Domestic Dog ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Trait Mapping ◽

Genome Wide

Abstract: Genomic resources for the domestic dog have improved with the widespread adoption of a 173k SNP array platform and updated reference genome. SNP arrays of this density are sufficient for detecting genetic associations within breeds but are underpowered for finding associations across multiple breeds or in mixed-breed dogs, where linkage disequilibrium rapidly decays between markers, even though such studies would hold particular promise for mapping complex diseases and traits. Here we introduce an imputation reference panel, consisting of 365 diverse, whole-genome sequenced dogs and wolves, which increases the number of markers that can be queried in genome-wide association studies approximately 130-fold. Using previously genotyped dogs, we show the utility of this reference panel in identifying novel associations and fine-mapping for canine body size and blood phenotypes, even when causal loci are not in strong linkage disequilibrium with any single array marker. This reference panel resource will improve future genome-wide association studies for canine complex diseases and other phenotypes.

Author Summary: Complex traits are controlled by more than one gene and as such are difficult to map. For complex trait mapping in the domestic dog, researchers use the current array of 173,000 variants, with only minimal success. Here, we use a method called imputation to increase the number of variants – from 173,000 to 24 million – that can be queried in canine association studies. We use sequence data from the whole genomes of 365 dogs and wolves to accurately predict variants, in a separate cohort of dogs, that are not present on the array. Using dog body size, we show that the increase in variants results in an increase in mapping power, through the identification of new associations and the narrowing of regions of interest. This imputation panel is particularly important because of its usefulness in improving complex trait mapping in the dog, which has significant implications for discovery of variants in humans with similar diseases.

Genome-wide association studies of fertility and calving traits in Brown Swiss cattle using imputed whole-genome sequences

BMC Genomics ◽

10.1186/s12864-017-4308-z ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 14

Author(s):

Mirjam Frischknecht ◽

Beat Bapst ◽

Franz R. Seefried ◽

Heidi Signer-Hasler ◽

...

Keyword(s):

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Genome Sequences ◽

Brown Swiss ◽

Genome Wide ◽

Calving Traits ◽

Brown Swiss Cattle

Faculty Opinions recommendation of Extremely low-coverage sequencing and imputation increases power for genome-wide association studies.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.717960893.793463835 ◽

2012 ◽

Author(s):

Nicola Mulder

Keyword(s):

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Low Coverage

Integration of genome wide association studies and whole genome sequencing provides novel insights into fat deposition in chicken

Scientific Reports ◽

10.1038/s41598-018-34364-0 ◽

2018 ◽

Vol 8 (1) ◽

Cited By ~ 8

Author(s):

Gabriel Costa Monteiro Moreira ◽

Clarissa Boschiero ◽

Aline Silva Mello Cesar ◽

James M. Reecy ◽

Thaís Fernanda Godoy ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Association Studies ◽

Fat Deposition ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Genome Wide

Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa068 ◽

2020 ◽

Vol 27 (9) ◽

pp. 1425-1430

Author(s):

Inès Krissaane ◽

Carlos De Niz ◽

Alba Gutiérrez-Sacristán ◽

Gabor Korodi ◽

Nneka Ede ◽

...

Keyword(s):

Web Services ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Cloud Platform ◽

Human Genomics ◽

Genome Wide ◽

Innovative Methodology ◽

Amazon Web Services

Abstract: (Objective) Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies. (Methods) We provide a straightforward and innovative methodology to optimize cloud configuration in order to conduct genome-wide association studies. We utilized Spark clusters on both Google Cloud Platform and Amazon Web Services, as well as Hail (http://doi.org/10.5281/zenodo.2646680) for analysis and exploration of genomic variant datasets. (Results) Comparative evaluation of numerous cloud-based cluster configurations demonstrates a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on 4 distinct whole-genome sequencing datasets. Results are consistent across the 2 cloud providers and could be highly useful for accelerating research in genetics. (Conclusions) We present a timely piece for one of the most frequently asked questions when moving to the cloud: what is the trade-off between speed and cost?

Quantifying the mapping precision of genome-wide association studies using whole-genome sequencing data

Genome Biology ◽

10.1186/s13059-017-1216-0 ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 46

Author(s):

Yang Wu ◽

Zhili Zheng ◽

Peter M. Visscher ◽

Jian Yang

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Association Studies ◽

Genome Wide Association ◽

Whole Genome Sequencing Data ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Sequencing Data ◽

Genome Wide

Genomic Prediction and Genome-Wide Association Studies of Flour Yield and Alveograph Quality Traits Using Advanced Winter Wheat Breeding Material

Genes ◽

10.3390/genes10090669 ◽

2019 ◽

Vol 10 (9) ◽

pp. 669 ◽

Cited By ~ 2

Author(s):

Peter S. Kristensen ◽

Just Jensen ◽

Jeppe R. Andersen ◽

Carlos Guzmán ◽

Jihad Orabi ◽

...

Keyword(s):

Winter Wheat ◽

Genomic Prediction ◽

Association Studies ◽

Wheat Breeding ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Quality Traits ◽

Breeding Programs ◽

Genome Wide ◽

Flour Yield

Use of genetic markers and genomic prediction might improve genetic gain for quality traits in wheat breeding programs. Here, flour yield and Alveograph quality traits were inspected in 635 F6 winter wheat breeding lines from two breeding cycles. Genome-wide association studies revealed single nucleotide polymorphisms (SNPs) on chromosome 5D significantly associated with flour yield, Alveograph P (dough tenacity), and Alveograph W (dough strength). Additionally, SNPs on chromosome 1D were associated with Alveograph P and W, SNPs on chromosome 1B were associated with Alveograph P, and SNPs on chromosome 4A were associated with Alveograph L (dough extensibility). Predictive abilities based on genomic best linear unbiased prediction (GBLUP) models ranged from 0.50 for flour yield to 0.79 for Alveograph W based on a leave-one-out cross-validation strategy. Predictive abilities were negatively affected by smaller training set sizes, lower genetic relationship between lines in training and validation sets, and by genotype–environment (G×E) interactions. Bayesian Power Lasso models and genomic feature models resulted in similar or slightly improved predictions compared to GBLUP models. SNPs with the largest effects can be used for screening large numbers of lines in early generations in breeding programs to select lines that potentially have good quality traits. In later generations, genomic predictions might be used for a more accurate selection of high quality wheat lines.
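
The leave-one-out evaluation of GBLUP described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: genotypes and phenotypes are simulated, the genomic relationship matrix uses the VanRaden formulation, the variance ratio `lam` is treated as known rather than estimated, and predictive ability is taken to be the correlation between predicted and observed phenotypes. The function names `vanraden_g` and `loo_gblup` are hypothetical.

```python
import numpy as np


def vanraden_g(M):
    """Genomic relationship matrix from a (lines x markers) 0/1/2 genotype matrix."""
    p = M.mean(axis=0) / 2.0              # allele frequency per marker
    Z = M - 2.0 * p                       # centre each marker
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def loo_gblup(y, G, lam):
    """Leave-one-out GBLUP predictions; lam = sigma_e^2 / sigma_g^2 (assumed known)."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        mu = y[train].mean()              # intercept from training lines only
        K = G[train][:, train] + lam * np.eye(n - 1)
        alpha = np.linalg.solve(K, y[train] - mu)
        preds[i] = mu + G[i, train] @ alpha
    return preds


# Simulated data (made up): 100 lines, 50 markers, heritability 0.8.
rng = np.random.default_rng(0)
n_lines, n_markers, h2 = 100, 50, 0.8
M = rng.binomial(2, 0.3, size=(n_lines, n_markers)).astype(float)
g = (M - M.mean(axis=0)) @ rng.normal(size=n_markers)
e = rng.normal(scale=np.sqrt(g.var() * (1 - h2) / h2), size=n_lines)
y = g + e

G = vanraden_g(M)
preds = loo_gblup(y, G, lam=(1 - h2) / h2)
predictive_ability = np.corrcoef(preds, y)[0, 1]
print(f"LOO predictive ability: {predictive_ability:.2f}")
```

Shrinking `n_lines` in this sketch is a quick way to see the sensitivity to training set size that the abstract reports.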
