A novel Canis lupus familiaris reference genome improves variant resolution for use in breed-specific GWAS

2021 ◽  
Vol 4 (4) ◽  
pp. e202000902 ◽  
Author(s):  
Robert A Player ◽  
Ellen R Forsyth ◽  
Kathleen J Verratti ◽  
David W Mohr ◽  
Alan F Scott ◽  
...  

Reference genome fidelity is critically important for genome-wide association studies, yet most vary widely from the study population. A typical whole genome sequencing approach implies short-read technologies resulting in fragmented assemblies with regions of ambiguity. Further information is lost by economic necessity when genotyping populations, as lower resolution technologies such as genotyping arrays are commonly used. Here, we present a phased reference genome for Canis lupus familiaris using high molecular weight DNA-sequencing technologies. We tested wet laboratory and bioinformatic approaches to demonstrate a minimum workflow to generate the 2.4 gigabase genome for a Labrador Retriever. The de novo assembly required eight Oxford Nanopore R9.4 flowcells (∼23X depth) and running a 10X Genomics library on the equivalent of one lane of an Illumina NovaSeq S1 flowcell (∼88X depth), bringing the cost of generating a nearly complete reference genome to less than $10K (USD). Mapping of short-read data from 10 Labrador Retrievers against this reference resulted in 1% more aligned reads versus the current reference (CanFam3.1, P < 0.001), and a 15% reduction of variant calls, increasing the chance of identifying true, low-effect size variants in a genome-wide association study. We believe that by incorporating the cost to produce a full genome assembly into any large-scale genotyping project, an investigator can improve study power, decrease costs, and optimize the overall scientific value of their study.

2018 ◽  
Author(s):  
Brian P. Ward ◽  
Gina Brown-Guedira ◽  
Frederic L. Kolb ◽  
David A. Van Sanford ◽  
Priyanka Tyagi ◽  
...  

Grain yield is a trait of paramount importance in the breeding of all cereals. In wheat (Triticum aestivum L.), yield has steadily increased since the Green Revolution, though the current rate of increase is not forecasted to keep pace with demand due to growing world population and affluence. While several genome-wide association studies (GWAS) on yield and related component traits have been performed in wheat, the previous lack of a reference genome has made comparisons between studies difficult. In this study, a GWAS for yield and yield-related traits was carried out on a population of 324 soft red winter wheat lines across a total of four rain-fed environments in the state of Virginia using single-nucleotide polymorphism (SNP) marker data generated by a genotyping-by-sequencing (GBS) protocol. Two separate mixed linear models were used to identify significant marker-trait associations (MTAs). The first was a single-locus model utilizing a leave-one-chromosome-out approach to estimating kinship. The second was a sub-setting kinship multi-locus method (FarmCPU). The single-locus model identified nine significant MTAs for various yield-related traits, while the FarmCPU model identified 74 significant MTAs. The availability of the wheat reference genome allowed for the description of MTAs in terms of both genetic and physical positions, and enabled more extensive post-GWAS characterization of significant MTAs. The results indicate promising avenues for increasing grain yield by exploiting variation in traits relating to the number of grains per unit area, as well as phenological traits influencing grain-filling duration of genotypes.


2020 ◽  
Author(s):  
Robert A. Player ◽  
Ellen R. Forsyth ◽  
Kathleen J. Verratti ◽  
David W. Mohr ◽  
Alan F. Scott ◽  
...  

Reference genome fidelity is critically important for genome-wide association studies (GWAS), yet many are incomplete or too dissimilar from the study population. A typical whole genome sequencing approach implies short-read technologies resulting in fragmented assemblies with regions of ambiguity and low complexity. Further information is lost by economic necessity when genotyping populations, as lower resolution technologies such as genotyping arrays are commonly utilized. Here we present a phased reference genome for Canis lupus familiaris utilizing high molecular weight sequencing technologies. We tested wet lab and bioinformatic approaches to demonstrate a minimum workflow to generate the 2.4 gigabase genome for a Labrador Retriever. The resulting de novo assembly required eight Oxford Nanopore R9.4 flowcells (~23X depth) and running a 10X Genomics library on the equivalent of one lane of an Illumina NovaSeq S1 flowcell (~88X depth), bringing the cost of generating a nearly complete reference genome to less than $10K. Mapping of publicly available short-read data from ten Labrador Retrievers against this breed-specific reference resulted in an average of approximately 1% more aligned reads compared to mapping against the current gold standard reference (CanFam3.1, p<0.001), indicating a more complete breed-specific reference. An average 15% reduction of variant calls was observed from the same mapped data, which increases the chance of identifying low effect size variants in a GWAS. We believe that by incorporating the cost to produce a full genome assembly into any large-scale canine genotyping study, an investigator can make an informed cost/benefit analysis regarding genotyping technology.


2019 ◽  
Author(s):  
Bastian Schiffthaler ◽  
Nicolas Delhomme ◽  
Carolina Bernhardsson ◽  
Jerry Jenkins ◽  
Stefan Jansson ◽  
...  

The genome assembly of the European aspen Populus tremula proved difficult for a short-read-based strategy due to high genomic variation. As a consequence, the fragmented sequence is impeding studies that benefit from highly contiguous data, particularly genome-wide association studies (GWAS) and comparative genomics. Here we present an updated assembly based on long-read sequences, optical mapping and genetic mapping. This assembly, henceforth referred to as Potra V2, is assembled into 19 contiguous chromosomes, which provides a powerful tool for future association studies. The genome sequence and any feature files are available from the PopGenIE resource.


eLife ◽  
2018 ◽  
Vol 7 ◽  
Author(s):  
Atif Rahman ◽  
Ingileif Hallgrímsdóttir ◽  
Michael Eisen ◽  
Lior Pachter

Genome-wide association studies (GWAS) rely on microarrays, or more recently mapping of sequencing reads, to genotype individuals. The reliance on prior sequencing of a reference genome limits the scope of association studies, and also precludes mapping associations outside of the reference. We present an alignment-free method for association studies of categorical phenotypes based on counting k-mers in whole-genome sequencing reads, testing for associations directly between k-mers and the trait of interest, and local assembly of the statistically significant k-mers to identify sequence differences. An analysis of the 1000 Genomes data shows that sequences identified by our method largely agree with results obtained using the standard approach. However, unlike standard GWAS, our method identifies associations with structural variations and sites not present in the reference genome. We also demonstrate that population stratification can be inferred from k-mers. Finally, application to an E. coli dataset on ampicillin resistance validates the approach.
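
The first two steps of the alignment-free approach described above (counting k-mers in reads, then testing each k-mer for association with a two-category phenotype) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `count_kmers`, `poisson_lrt`, and `rank_kmers` are hypothetical, the abstract does not state which test statistic is used, and the Poisson likelihood-ratio statistic below is chosen here purely for illustration. The sketch pools reads within each group and omits local assembly, multiple-testing correction, and population-stratification adjustment.

```python
from collections import Counter
from math import log


def count_kmers(reads, k):
    """Count every length-k substring across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def poisson_lrt(c1, n1, c2, n2):
    """Likelihood-ratio statistic for equal Poisson rates in two groups.

    c1, c2: occurrences of one k-mer in cases and controls.
    n1, n2: total k-mers counted in cases and controls (assumed > 0).
    Illustrative choice of statistic; an assumption, not taken from the source.
    """
    def term(observed, expected):
        # A zero count contributes nothing to the statistic.
        return observed * log(observed / expected) if observed > 0 else 0.0

    pooled_rate = (c1 + c2) / (n1 + n2)
    return 2.0 * (term(c1, pooled_rate * n1) + term(c2, pooled_rate * n2))


def rank_kmers(case_reads, control_reads, k):
    """Score each k-mer seen in either group, highest statistic first."""
    cases = count_kmers(case_reads, k)
    controls = count_kmers(control_reads, k)
    n1, n2 = sum(cases.values()), sum(controls.values())
    scores = {
        kmer: poisson_lrt(cases[kmer], n1, controls[kmer], n2)
        for kmer in set(cases) | set(controls)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy reads: the two groups share a prefix and differ in the suffix.
    case_reads = ["ACGTTT", "ACGTTT"]
    control_reads = ["ACGAAA", "ACGAAA"]
    for kmer, score in rank_kmers(case_reads, control_reads, k=3):
        print(kmer, round(score, 3))
```

In the toy data, k-mers shared equally by both groups score zero, while k-mers found in only one group score higher; in a real analysis those high-scoring k-mers would be the candidates passed on to local assembly.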

