Developing a 670k genotyping array to tag ∼2M SNPs across 24 horse breeds

2017 ◽  
Author(s):  
Robert J. Schaefer ◽  
Mikkel Schubert ◽  
Ernest Bailey ◽  
Danika L. Bannasch ◽  
Eric Barrey ◽  
...  

Abstract Background To date, genome-scale analyses in the domestic horse have been limited by suboptimal single nucleotide polymorphism (SNP) density and uneven genomic coverage of the current SNP genotyping arrays. The recent availability of whole genome sequences has created the opportunity to develop a next-generation, high-density equine SNP array. Results Using whole genome sequences from 153 individuals representing 24 distinct breeds, collated by the equine genomics community, we cataloged over 23 million de novo discovered genetic variants. Leveraging data from individuals with both whole genome sequences and genotypes from lower-density legacy SNP arrays, we selected a subset of ∼5 million high-quality, high-density array candidate SNPs based on breed representation and uniform spacing across the genome. Taking into account probe design recommendations from a commercial vendor (Affymetrix, now Thermo Fisher Scientific), we selected a set of ∼2 million SNPs for a next-generation high-density SNP chip (MNEc2M). Genotype data were generated with the MNEc2M array for a cohort of 332 horses from 20 breeds, and a lower-density array consisting of ∼670 thousand SNPs (MNEc670k) was designed for genotype imputation. Conclusions Here, we document the steps taken to design both the MNEc2M and MNEc670k arrays, report the genomic and technical properties of these genotyping platforms, and demonstrate the imputation capabilities of these tools for the domestic horse.
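
The abstract says candidate SNPs were chosen partly for uniform spacing across the genome but does not describe the procedure. A minimal sketch of one simple way to thin SNPs evenly, assuming fixed-width windows and a hypothetical per-SNP score (for example a breed-averaged minor allele frequency), keeps the best-scoring candidate in each window. It is an illustration only, not the MNEc2M selection pipeline.

```python
from typing import Dict, List, Tuple


def select_evenly_spaced(
    candidates: List[Tuple[int, float]], window_bp: int
) -> List[int]:
    """Pick at most one SNP per fixed-width window on one chromosome.

    candidates: (position_bp, score) pairs; the score is a hypothetical
    quality/informativeness value, not one defined in the source.
    Returns the chosen positions in ascending order.
    """
    best: Dict[int, Tuple[int, float]] = {}
    for pos, score in candidates:
        window = pos // window_bp  # index of the window this SNP falls in
        # Keep the highest-scoring SNP seen so far; ties keep the earlier one.
        if window not in best or score > best[window][1]:
            best[window] = (pos, score)
    return sorted(pos for pos, _ in best.values())


if __name__ == "__main__":
    snps = [(120, 0.10), (480, 0.35), (1_050, 0.22), (1_900, 0.05), (2_300, 0.40)]
    print(select_evenly_spaced(snps, window_bp=1_000))
```

Shrinking `window_bp` raises marker density; a real design would also weigh breed representation and probe-design constraints, which this sketch ignores.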

2021 ◽  
Vol 53 (1) ◽  
Author(s):  
Theo Meuwissen ◽  
Irene van den Berg ◽  
Mike Goddard

Abstract Background Whole-genome sequence (WGS) data are increasingly available on large numbers of individuals in animal and plant breeding and in human genetics through second-generation resequencing technologies, 1000 genomes projects, and large-scale genotype imputation from lower marker densities. Here, we present a computationally fast implementation of a variable selection genomic prediction method that can handle WGS data on more than 35,000 individuals, test its accuracy for across-breed predictions, and assess its quantitative trait locus (QTL) mapping precision. Methods The Markov chain Monte Carlo (MCMC) variable selection model (Bayes GC) simultaneously fits a genomic best linear unbiased prediction (GBLUP) term, i.e. a polygenic effect whose correlations are described by a genomic relationship matrix (G), and a Bayes C term, i.e. a set of single nucleotide polymorphisms (SNPs) with large effects selected by the model. Computational speed is improved by Metropolis–Hastings sampling that directs computations to the SNPs that are, a priori, most likely to be included in the model. Speed is further improved by running many relatively short MCMC chains. Memory requirements are reduced by storing the genotype matrix in binary form. The model was tested on a WGS dataset containing Holstein, Jersey and Australian Red cattle. The data contained 4,809,520 genotypes on 35,549 individuals together with their milk, fat and protein yields and fat and protein percentage traits. Results The prediction accuracies of the Jersey individuals improved by 1.5% when using across-breed GBLUP compared to within-breed predictions. Using WGS instead of 600 k SNP-chip data yielded on average a 3% accuracy improvement for Australian Red cows. QTL were fine-mapped by locating the SNP with the highest posterior probability of being included in the model. Various QTL known from the literature were rediscovered, and a new SNP affecting milk production was discovered on chromosome 20 at 34.501126 Mb. Owing to the high mapping precision, it was clear that many of the discovered QTL were the same across the five dairy traits. Conclusions Across-breed Bayes GC genomic prediction improved prediction accuracies compared to GBLUP. The combination of across-breed WGS data and Bayesian genomic prediction proved remarkably effective for the fine-mapping of QTL.
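
The abstract does not say how the genomic relationship matrix G behind the GBLUP term was built. Assuming the widely used construction of VanRaden (2008, method 1), G = Z Z^T / (2 Σ_j p_j (1 - p_j)), where Z holds allele counts centred by twice each SNP's allele frequency p_j, a NumPy sketch follows. It shows only the relationship matrix; it is not the Bayes GC sampler, and estimating p_j from the same sample is itself an assumption.

```python
import numpy as np


def vanraden_grm(M: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix from an n_individuals x n_snps matrix M.

    M holds counts (0/1/2) of one chosen allele per SNP. Assumes at least one
    polymorphic SNP; otherwise the scaling denominator is zero.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0              # observed frequency of the counted allele
    Z = M - 2.0 * p                       # centre each SNP column on its expected count
    scale = 2.0 * np.sum(p * (1.0 - p))   # sum of expected SNP variances under HWE
    return Z @ Z.T / scale


if __name__ == "__main__":
    M = np.array([[0, 1, 2], [1, 1, 0], [2, 1, 1]])
    print(np.round(vanraden_grm(M), 3))
```

Because every column of Z sums to zero, each row of G sums to zero, a quick sanity check when the frequencies come from the same individuals.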


2015 ◽  
Author(s):  
Shane McCarthy ◽  
Sayantan Das ◽  
Warren Kretzschmar ◽  
Olivier Delaneau ◽  
Andrew R. Wood ◽  
...  

We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.
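
To make concrete how a haplotype reference panel supplies alleles at untyped sites, here is a deliberately simplified sketch. It copies missing alleles from the single reference haplotype with the fewest mismatches at the typed sites; real imputation software uses probabilistic haplotype-copying models that mix many reference haplotypes, so this toy is an assumption for illustration and not the method behind the panel or its servers.

```python
from typing import List, Optional


def impute_haplotype(
    target: List[Optional[int]], reference: List[List[int]]
) -> List[int]:
    """Fill missing alleles (None) by copying from the closest reference haplotype.

    Alleles are coded 0/1; distance is the mismatch count at sites typed in
    the target. Ties go to the haplotype listed first in the panel.
    """
    observed = [i for i, a in enumerate(target) if a is not None]

    def mismatches(hap: List[int]) -> int:
        return sum(hap[i] != target[i] for i in observed)

    best = min(reference, key=mismatches)
    return [best[i] if a is None else a for i, a in enumerate(target)]


if __name__ == "__main__":
    panel = [[0, 0, 1, 1, 0], [1, 1, 0, 0, 1], [0, 1, 1, 0, 1]]
    print(impute_haplotype([0, None, 1, None, 0], panel))
```

A larger, more diverse panel makes it more likely that some reference haplotype closely matches the target, which is the intuition behind imputing rarer alleles with bigger panels.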


2019 ◽  
Vol 136 (6) ◽  
pp. 418-429 ◽  
Author(s):  
Sanne den Berg ◽  
Jérémie Vandenplas ◽  
Fred A. Eeuwijk ◽  
Marcos S. Lopes ◽  
Roel F. Veerkamp

2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 342-342
Author(s):  
Younes Miar ◽  
Graham Plastow ◽  
Zhiquan Wang ◽  
Mehdi Sargolzaei

Abstract The fur industry is one of the oldest and most historically significant industries in Canada. For decades, the industry has used American mink (Neovison vison) as its major source of fur because of its high-quality fur and wide range of colours. This project will seek to (1) create the first accurate whole-genome sequence assembly of mink using next-generation sequencing technology to help understand the biology and evolution of the order Carnivora, (2) design a robust and informative SNP assay for genomic discovery in mink, (3) characterize genome structure and signatures of selection and identify new genetic variants explaining variation in economically important traits, and (4) identify the genetic relationships among these traits, including feed efficiency, Aleutian disease resilience, fur quality, reproductive performance, growth rate and pelt size. One hundred mink DNA samples from the Canadian Centre for Fur Animal Research at Dalhousie Agriculture Campus (Truro, Nova Scotia) and one breeding population (Millbank Fur Farm Limited, Rockwood, Ontario) were sequenced using next-generation whole-genome sequencing at more than 30x coverage to create the first SNP assay for American mink. A DNA panel composed of these sequenced mink from five colour types was assembled to identify the most homozygous individual as the reference animal for whole-genome sequence assembly development. Phenotypic data and DNA samples from 3,323 animals were collected and will be genotyped using the customized assay. The ultimate objective is to develop new tools for implementing marker-assisted selection or genomic selection in mink breeding programs, in order to develop superior, highly efficient and healthy animals. This approach will help improve the overall performance of the North American mink industry, which is currently struggling due to several economic factors such as high feed prices, declining fur prices and the prevalence of diseases.
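
The abstract mentions picking the most homozygous sequenced animal as the reference individual but does not say how homozygosity was measured. A minimal sketch, assuming observed homozygosity is the fraction of called SNPs that are not heterozygous (genotypes coded as allele counts 0/1/2, missing calls as None); the sample IDs are hypothetical.

```python
from typing import Dict, List, Optional


def observed_homozygosity(calls: List[Optional[int]]) -> float:
    """Fraction of non-missing genotypes that are homozygous (0 or 2)."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0  # no usable calls: treat as uninformative
    return sum(g != 1 for g in called) / len(called)


def most_homozygous(genotypes: Dict[str, List[Optional[int]]]) -> str:
    """Return the sample ID with the highest observed homozygosity."""
    return max(genotypes, key=lambda s: observed_homozygosity(genotypes[s]))


if __name__ == "__main__":
    panel = {
        "mink_A": [0, 1, 2, 1, None],
        "mink_B": [0, 2, 2, 1, 0],
        "mink_C": [1, 1, 0, 1, 2],
    }
    print(most_homozygous(panel))
```

Lower heterozygosity in the chosen animal reduces the number of sites where its two haplotypes differ, which simplifies building a single consensus assembly.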


BMC Genetics ◽  
2017 ◽  
Vol 18 (1) ◽  
Author(s):  
Steven G. Larmer ◽  
Mehdi Sargolzaei ◽  
Luiz F. Brito ◽  
Ricardo V. Ventura ◽  
Flávio S. Schenkel

2021 ◽  
Author(s):  
Praveen F Cherukuri ◽  
Melissa M. Soe ◽  
David E. Condon ◽  
Shubhi Bartaria ◽  
Kaitlynn Meis ◽  
...  

Abstract Background Clinical use of genotype data requires high positive predictive value (PPV) and a thorough understanding of the genotyping platform characteristics. BeadChip arrays, such as the Global Screening Array (GSA), potentially offer a high-throughput, low-cost clinical screen for known variants. We hypothesize that quality assessment and comparison to whole-genome sequence and benchmark data establish the analytical validity of GSA genotyping. Methods To test this hypothesis, we selected 263 samples from Coriell, generated GSA genotypes in triplicate, generated whole genome sequence (rWGS) genotypes, assessed the quality of each set of genotypes, and compared the genotype sets with each other and with the 1000 Genomes Phase 3 (1KG) genotypes, a performance benchmark. For 59 genes (MAP59), we also performed theoretical and empirical evaluation of variants deemed medically actionable predispositions. Results Quality analyses detected sample contamination and increased assay failure along the chip margins. Comparison to benchmark data demonstrated that >82% of the GSA assays had a PPV of 1. GSA assays targeting transitions, genomic regions of high complexity, and common variants performed better than those targeting transversions, regions of low complexity, and rare variants. Comparison of GSA data to rWGS and 1KG data showed >99.3% concordance across all measured parameters. GSA detection of variation within the MAP59 genes was 3/261, consistent with predictions from prior studies. Conclusion We establish the analytical validity of GSA assays using quality analytics and comparison to benchmark and rWGS data. GSA assays meet the standards of a clinical screen, although assays interrogating rare variants, transversions, and variants within low-complexity regions require careful evaluation.
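
The abstract reports PPV and concordance against benchmark genotypes without spelling out the definitions. A minimal sketch, assuming genotype-level comparisons on aligned sites with genotypes coded as non-reference allele counts (0/1/2): concordance is the fraction of identical genotypes, and PPV counts an array variant call (count > 0) as a true positive only when the benchmark genotype matches exactly. The study's own definitions may differ.

```python
from typing import List, Tuple


def concordance_and_ppv(
    array_calls: List[int], benchmark: List[int]
) -> Tuple[float, float]:
    """Compare array genotypes with benchmark genotypes at the same sites.

    Returns (concordance, ppv); ppv is NaN when the array reports no variants.
    """
    if len(array_calls) != len(benchmark) or not array_calls:
        raise ValueError("call sets must be non-empty and cover the same sites")
    pairs = list(zip(array_calls, benchmark))
    # Concordance: identical genotypes over all compared sites.
    concordance = sum(a == b for a, b in pairs) / len(pairs)
    # PPV: among array variant calls, the share confirmed by the benchmark.
    positives = [(a, b) for a, b in pairs if a > 0]
    true_pos = sum(a == b for a, b in positives)
    ppv = true_pos / len(positives) if positives else float("nan")
    return concordance, ppv


if __name__ == "__main__":
    print(concordance_and_ppv([0, 1, 2, 1, 0, 2], [0, 1, 2, 0, 0, 1]))
```

Applied to one assay across many samples, this gives a per-assay PPV of the kind summarized in the abstract; applied to one sample across many assays, it gives a per-sample figure.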

