41 Using Sequence Data to Increase Accuracy of Genomic Predictions in Livestock: Are We There Yet?

Abstract As sequence data become available for many livestock species, the question is whether this information can boost the accuracy of genomic predictions beyond what has already been achieved with SNP chips. Our group has conducted several studies using simulated and real livestock populations that included from 1,000 to 100,000 animals with full or imputed sequence information. For the real datasets, potential causative variants were identified by genome-wide association (GWA) and added to the current SNP chips. Additional scenarios used only causative variants or all sequence SNP. Genomic predictions were obtained with single-step GBLUP (ssGBLUP) and, in some cases, Bayesian regressions. Overall, in real datasets, we observed no significant increase in accuracy from using all sequence SNP, causative variants alone, or causative variants combined with the SNP currently used for genomic prediction. However, an increase in accuracy of almost 100% was observed in simulated datasets when the causative variants were added to a 60k SNP panel and their simulated variances were accounted for by the prediction model. Our results show that if true causative variants are identified, together with their positions and the variance they explain, a boost in accuracy can be observed. This raises a question about the effectiveness of the methods and the size of the datasets used to select causative variants in real livestock populations. We observed that distinct GWA methods perform differently depending on the data structure and the number of genotyped animals with phenotypes. Combining large-scale sequence data with other layers of omics data (e.g., functional data) can help to identify some of the true causative variants, which could in turn increase the accuracy of genomic predictions in real populations.

Frequentist p-values for large-scale-single step genome-wide association, with an application to birth weight in American Angus cattle

Genetics Selection Evolution ◽

10.1186/s12711-019-0469-3 ◽

2019 ◽

Vol 51 (1) ◽

Cited By ~ 19

Author(s):

Ignacio Aguilar ◽

Andres Legarra ◽

Fernando Cardoso ◽

Yutaka Masuda ◽

Daniela Lourenco ◽

...

Keyword(s):

Birth Weight ◽

Large Scale ◽

Single Step ◽

Genome Wide Association ◽

P Values ◽

Angus Cattle ◽

Genome Wide

Fine-mapping of genes determining extrafusal fiber properties in murine soleus muscle

Physiological Genomics ◽

10.1152/physiolgenomics.00092.2016 ◽

2017 ◽

Vol 49 (3) ◽

pp. 141-150 ◽

Cited By ~ 9

Author(s):

A. M. Carroll ◽

R. Cheng ◽

E. S. R. Collie-Duguid ◽

C. Meharg ◽

M. E. Scholz ◽

...

Keyword(s):

Candidate Genes ◽

Muscle Fiber ◽

Soleus Muscle ◽

Genomic Sequence ◽

Sequence Data ◽

Mouse Strains ◽

Sequence Information ◽

Type I ◽

Fiber Types ◽

Genome Wide

Muscle fiber cross-sectional area (CSA) and the proportion of different fiber types are important determinants of muscle function and overall metabolism. Genetic variation plays a substantial role in phenotypic variation of these traits; however, the underlying genes remain poorly understood. This study aimed to map quantitative trait loci (QTL) affecting differences in soleus muscle fiber traits between the LG/J and SM/J mouse strains. Fiber number, CSA, and proportion of oxidative type I fibers were assessed in the soleus of 334 genotyped female and male mice of the F34 generation of advanced intercross lines (AIL) derived from the LG/J and SM/J strains. To increase the QTL detection power, these data were combined with 94 soleus samples from the F2 intercross of the same strains. The transcriptome of the soleus muscle of LG/J and SM/J females was analyzed by microarray. Genome-wide association analysis mapped four QTL (genome-wide P < 0.05) affecting the properties of muscle fibers to chromosomes 2, 3, 4, and 11. The 1.5-LOD QTL support intervals ranged between 2.36 and 4.67 Mb. On the basis of the genomic sequence information and functional and transcriptome data, we identified candidate genes for each of these QTL. The combination of analyses in F2 and F34 AIL populations with transcriptome and genomic sequence data in the parental strains is an effective strategy for refining QTL and nominating candidate genes.
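
The 1.5-LOD support interval mentioned in this abstract can be illustrated with a minimal sketch. It assumes a LOD profile sampled at markers sorted by position on a single chromosome, and it takes the interval as the contiguous run of markers around the peak whose LOD stays within 1.5 units of the maximum. The function name `lod_support_interval` and the toy values are hypothetical; the study's own mapping software may interpolate between markers or define the bounds differently.

```python
import numpy as np

def lod_support_interval(positions_mb, lod, drop=1.5):
    """Return (start, end) of the contiguous region around the LOD peak
    where LOD >= peak - drop. Marker resolution only, no interpolation."""
    pos = np.asarray(positions_mb, dtype=float)
    lod = np.asarray(lod, dtype=float)
    if pos.shape != lod.shape or pos.ndim != 1 or pos.size == 0:
        raise ValueError("positions and LOD scores must be equal-length 1-D arrays")
    peak = int(np.argmax(lod))
    threshold = lod[peak] - drop
    # Walk outwards from the peak while the LOD stays above the threshold.
    left = peak
    while left > 0 and lod[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < lod.size - 1 and lod[right + 1] >= threshold:
        right += 1
    return float(pos[left]), float(pos[right])

# Toy LOD profile (illustrative values, not data from the study).
toy_pos = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
toy_lod = [0.5, 2.0, 4.0, 5.0, 3.8, 3.0, 1.0]
start, end = lod_support_interval(toy_pos, toy_lod)
print(f"support interval: {start}-{end} Mb, width {end - start} Mb")
```

A narrower interval means fewer positional candidates, which is why the abstract reports the interval widths alongside the candidate genes.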

28 Genomic prediction for marbling score in Hanwoo cattle using sequence data

Journal of Animal Science ◽

10.1093/jas/skaa278.022 ◽

2020 ◽

Vol 98 (Supplement_4) ◽

pp. 11-12

Author(s):

Sungbong Jang ◽

Andre Garcia ◽

Seunghwan Lee ◽

Shogo Tsuruta ◽

Daniela Lourenco

Keyword(s):

Fixed Effects ◽

Sequence Data ◽

Predictive Ability ◽

Whole Genome Sequence ◽

Pedigree Information ◽

Sequence Information ◽

Whole Genome ◽

Marbling Score ◽

Hanwoo Cattle ◽

Genomic Predictions

Abstract As sequence information becomes available for some livestock species, the question is how much non-redundant information is embedded in the extra millions of SNPs or in causative variants within the sequence. The objective of this study was to assess the gain in accuracy from using imputed whole-genome SNPs and selected SNPs from sequence data in a beef cattle population. The dataset consisted of marbling score phenotypes for 545K Hanwoo cattle and pedigree information for 1.3M, of which 1,160 were genotyped for 50K SNPs. Imputation was done first to 777K and then to whole-genome sequence (WGS), which comprised 11,146,536 SNPs. Additionally, differentially expressed genes (DEG) and the 321,614 SNPs they harbor were identified based on RNA-seq analysis of animals with high and low marbling score. An extra scenario combined 50K with DEG SNPs. Genomic predictions were obtained using GBLUP and single-step GBLUP (ssGBLUP), with and without weights, and BayesR. BayesR could not be used for the WGS data because of the large number of SNPs. Predictive ability was calculated as the correlation between phenotypes adjusted for fixed effects and GEBV for 169 young animals. For all methods, WGS and DEG had a slightly negative impact on predictive ability. GBLUP and BayesR performed similarly when using 50K, DEG, and 50K + DEG, with predictive abilities of 0.19, 0.16, and 0.18, respectively. Predictive ability for ssGBLUP was 0.27, 0.26, and 0.27, in the same order. Using WGS, predictive ability was 0.17 for GBLUP and 0.26 for ssGBLUP. Weighting SNPs differently did not improve predictions. Because ssGBLUP uses all available data, not only genotyped animals with phenotypes as in the other methods, it is more robust for genomic predictions. No gain in accuracy was observed, possibly because the selected sequence variants were not causative.
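
The predictive ability defined in this abstract (the correlation between phenotypes adjusted for fixed effects and GEBV in validation animals) can be sketched as below. This is a minimal illustration, assuming Pearson correlation and assuming the adjusted phenotypes and GEBV are already available as aligned arrays; the function name `predictive_ability` and the simulated values are hypothetical and are not the study's data.

```python
import numpy as np

def predictive_ability(adjusted_phenotypes, gebv):
    """Pearson correlation between adjusted phenotypes and GEBV.
    Animals with a missing value in either array are dropped."""
    y = np.asarray(adjusted_phenotypes, dtype=float)
    g = np.asarray(gebv, dtype=float)
    if y.shape != g.shape or y.ndim != 1:
        raise ValueError("inputs must be 1-D arrays of equal length")
    keep = ~(np.isnan(y) | np.isnan(g))
    if keep.sum() < 3:
        raise ValueError("need at least three animals with both values")
    return float(np.corrcoef(y[keep], g[keep])[0, 1])

# Toy validation set of 169 animals (simulated, for illustration only).
rng = np.random.default_rng(0)
true_bv = rng.normal(size=169)
toy_gebv = true_bv + rng.normal(scale=1.5, size=169)
toy_y_adj = true_bv + rng.normal(scale=2.0, size=169)
print(round(predictive_ability(toy_y_adj, toy_gebv), 2))
```

Because the adjusted phenotype still contains environmental noise, this correlation is bounded well below 1 even for a perfect predictor, so values such as those reported above are best compared across scenarios rather than read as absolute accuracies.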

SeqBreed: a python tool to evaluate genomic prediction in complex scenarios

10.1101/748624 ◽

2019 ◽

Author(s):

M. Pérez-Enciso ◽

L. C. Ramírez-Ayala ◽

L.M. Zingaretti

Keyword(s):

Genomic Prediction ◽

Predictive Accuracy ◽

Sequence Data ◽

Association Studies ◽

Single Step ◽

Genome Wide Association ◽

Drosophila Genome ◽

Genome Wide Association Studies ◽

Complex Phenotypes ◽

Genome Wide

Abstract Background: Genomic Prediction (GP) is the procedure whereby molecular information is used to predict complex phenotypes. Although GP can significantly enhance predictive accuracy, it can be expensive and difficult to implement. To help in designing optimum experiments, including genome wide association studies and genomic selection experiments, we have developed SeqBreed, a generic and flexible python3 forward simulator. Results: SeqBreed accommodates sex and mitochondrion chromosomes as well as autopolyploidy. It can simulate any number of complex phenotypes determined by any number of causal loci. SeqBreed implements several GP methods, including single step GBLUP. We demonstrate its functionality with Drosophila Genome Reference Panel (DGRP) sequence data and with tetraploid potato genotypes. Conclusions: SeqBreed is a flexible and easy to use tool appropriate for optimizing GP or genome wide association studies. It incorporates some of the most popular GP methods and includes several visualization tools. Code is open and can be freely modified. Software, documentation and examples are available at https://github.com/miguelperezenciso/SeqBreed.

Unbiased population heterozygosity estimates from genome-wide sequence data

10.1101/2020.12.20.423694 ◽

2020 ◽

Author(s):

Thomas L Schmidt ◽

Moshe Jasper ◽

Andrew R Weeks ◽

Ary A Hoffmann

Keyword(s):

Missing Data ◽

Genetic Variability ◽

Sample Size ◽

Sequence Data ◽

Snp Markers ◽

Sequence Information ◽

List Type ◽

Invasion History ◽

Sample Sizes ◽

Genome Wide

Abstract Genetic variability within populations is a key parameter for the management of threatened species and for tracking invasion history. Heterozygosity (observed and expected) is commonly used to represent genetic variability and is increasingly being estimated with single nucleotide polymorphism (SNP) markers. While many SNP markers can provide precise estimates of genetic processes, the results of ‘downstream’ analysis of these markers may depend heavily on ‘upstream’ filtering decisions. Here we explore the downstream consequences of sample size, rare allele filtering, missing data thresholds and known population structure on estimates of heterozygosity using a ddRADseq dataset of the mosquito Aedes aegypti and a DArTseq dataset of a threatened grasshopper, Keyacris scurra. We show that estimates based on polymorphic markers only (i.e. SNP heterozygosity) are always biased by sample size, regardless of other filtering considerations. By contrast, results are consistent across sample sizes when calculations consider monomorphic as well as polymorphic sequence information (i.e. genome-wide or autosomal heterozygosity). We also show that when loci with missing genotypes are included, observed and expected heterozygosity estimates diverge in proportion to the amount of missing data permitted at each locus. To ensure robust results across studies, we make two recommendations for estimating heterozygosity: (i) autosomal heterozygosity should be reported instead of (or in addition to) SNP heterozygosity; (ii) sites with any missing data should be removed when calculating heterozygosity. Applying these protocols to K. scurra, we show that autosomal heterozygosity estimates are consistent even when populations with different sample sizes and high levels of differentiation are analysed together. This should facilitate comparisons across studies and between observed and expected measures of heterozygosity.
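
The contrast between SNP heterozygosity and autosomal heterozygosity drawn in this abstract can be illustrated with a small sketch. It assumes genotypes coded as alternate-allele counts (0, 1, 2) in an individuals-by-sites matrix, with -1 marking a missing call; the function name, the coding, and the simulated population are all hypothetical and are not the authors' pipeline. Following recommendation (ii), sites with any missing data are dropped before either estimate is computed.

```python
import numpy as np

MISSING = -1  # assumed code for a missing genotype call

def observed_heterozygosity(genotypes, polymorphic_only=False):
    """Mean proportion of heterozygous calls over individuals and sites.
    polymorphic_only=True keeps only sites variable in this sample
    (SNP heterozygosity); False keeps monomorphic sites as well."""
    g = np.asarray(genotypes)
    complete = (g != MISSING).all(axis=0)  # drop sites with any missing call
    g = g[:, complete]
    if polymorphic_only:
        alt_counts = g.sum(axis=0)
        polymorphic = (alt_counts > 0) & (alt_counts < 2 * g.shape[0])
        g = g[:, polymorphic]
    if g.shape[1] == 0:
        return float("nan")
    return float((g == 1).mean())

# Simulated population: about 90% of sites are invariant (illustrative only).
rng = np.random.default_rng(1)
n_sites = 5000
freq = np.where(rng.random(n_sites) < 0.9, 0.0, rng.uniform(0.05, 0.5, n_sites))
population = rng.binomial(2, freq, size=(200, n_sites))
for n in (5, 50):
    sample = population[rng.choice(200, size=n, replace=False)]
    print(n,
          round(observed_heterozygosity(sample), 4),
          round(observed_heterozygosity(sample, polymorphic_only=True), 4))
```

In the polymorphic-only estimate the denominator (the set of sites that happen to vary in the sample) changes with the number of individuals, whereas the all-sites estimate uses the same denominator at every sample size.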

40 Factors Influencing Accuracy of Genomic Selection with Sequence Information

Journal of Animal Science ◽

10.1093/jas/skab235.034 ◽

2021 ◽

Vol 99 (Supplement_3) ◽

pp. 20-21

Author(s):

Ignacy Misztal ◽

Ivan Pocrnic ◽

Daniela Lourenco

Keyword(s):

Population Size ◽

Genomic Selection ◽

Effective Population Size ◽

Sequence Data ◽

Single Step ◽

Substitution Effects ◽

Sequence Information ◽

Effective Population ◽

P Values ◽

Population Sizes

Abstract Incorporating sequence information only marginally increases the accuracy of genomic selection. The purpose of this study was to find out why by examining profiles of quantitative trait nucleotides (QTN). Multiple populations were simulated with different effective population sizes and numbers of animals. One hundred equidistant QTN with identical substitution effects were included in 50k SNP genotypes. Analyses used single-step GBLUP, with solutions converted to SNP values and subsequently to p-values for each SNP. Manhattan plots for standardized SNP solutions were noisy and were elevated for only a few QTN. Manhattan plots for p-values were similar to those for SNP solutions, indicating little impact of population structure. The number of significant QTN was lower with lower effective population size and increased with larger data; at most about 20% of QTN were detected. A QTN profile was created by averaging SNP solutions ±100 SNP around each QTN. The profile showed a normal-like response but with a distinct peak at the QTN. While the peak was higher with more data and higher effective population size, the normal-like response was smaller with higher effective population size. QTN explained little variance because of shrinkage. The accuracy of genomic selection would be 100% if all QTN were identified and their variances known, which would prevent shrinkage or inflation. This study shows the limits of applying QTN from sequence data to genomic selection. Even if all causative SNP are included in the data, only a fraction of them can be identified, even under a very simplistic architecture. Because the variances of QTN are assumed constant or are crude approximations (as in BayesR), the estimated QTN effects are inaccurate. Additional complications in QTN detection are closely spaced QTN and false QTN due to imputation. A small effective population size enables genomic selection by GBLUP but complicates the use of QTN.
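
The QTN profile described in this abstract (averaging SNP solutions ±100 SNP around each QTN) can be sketched as follows. The sketch assumes the SNP solutions are stored in map order in a single array and that QTN are given as integer indices into that array; QTN whose window would run past either end are skipped. The two-sided p-value helper assumes the standardized solutions follow a standard normal distribution under the null, which is a simplification and not necessarily the test used in the study. All names and toy values are hypothetical.

```python
from math import erfc, sqrt
import numpy as np

def qtn_profile(snp_solutions, qtn_positions, half_window=100):
    """Average SNP solutions at each offset in [-half_window, +half_window]
    around the given QTN indices. Returns (offsets, mean_profile)."""
    s = np.asarray(snp_solutions, dtype=float)
    offsets = np.arange(-half_window, half_window + 1)
    rows = []
    for q in qtn_positions:
        if q - half_window < 0 or q + half_window >= s.size:
            continue  # window incomplete near the ends of the array
        rows.append(s[q + offsets])
    if not rows:
        raise ValueError("no QTN has a complete window")
    return offsets, np.mean(rows, axis=0)

def two_sided_p(standardized_solutions):
    """Two-sided p-values, assuming standard normal statistics under the null."""
    z = np.asarray(standardized_solutions, dtype=float)
    return np.array([erfc(abs(v) / sqrt(2.0)) for v in z])

# Toy example: noisy solutions with a bump at each of 100 equidistant QTN.
rng = np.random.default_rng(2)
toy_solutions = rng.normal(size=50_000)
toy_qtn = np.arange(250, 50_000, 500)
toy_solutions[toy_qtn] += 3.0
offsets, profile = qtn_profile(toy_solutions, toy_qtn)
print(offsets[np.argmax(profile)])  # offset at which the averaged profile peaks
```

Averaging across QTN suppresses the marker-level noise that makes individual Manhattan plots hard to read, which is why a peak can be visible in the profile even when few single QTN reach significance.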

Single step genome-wide association studies based on genotyping by sequence data reveals novel loci for the litter traits of domestic pigs

Genomics ◽

10.1016/j.ygeno.2017.09.009 ◽

2018 ◽

Vol 110 (3) ◽

pp. 171-179 ◽

Cited By ~ 22

Author(s):

Pingxian Wu ◽

Qiang Yang ◽

Kai Wang ◽

Jie Zhou ◽

Jideng Ma ◽

...

Keyword(s):

Sequence Data ◽

Association Studies ◽

Single Step ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Domestic Pigs ◽

Genome Wide ◽

Litter Traits

Faculty Opinions recommendation of Genome-wide association and large-scale follow up identifies 16 new loci influencing lung function.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13345956.14715054 ◽

2011 ◽

Author(s):

Craig Hersh

Keyword(s):

Lung Function ◽

Large Scale ◽

Genome Wide Association ◽

Genome Wide

Faculty Opinions recommendation of Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.734261365.793558023 ◽

2019 ◽

Author(s):

Jason Flannick

Keyword(s):

Large Scale ◽

Genome Wide

AFLP-derived, Codominant Markers for Locus-specific Applications

HortScience ◽

10.21273/hortsci.33.3.514e ◽

1998 ◽

Vol 33 (3) ◽

pp. 514e-514

Author(s):

James M. Bradeen ◽

Philipp W. Simon

Keyword(s):

Linkage Mapping ◽

Large Scale ◽

Pcr Primers ◽

Inverse Pcr ◽

Sequence Information ◽

Pcr Assay ◽

Specific Primers ◽

Simultaneous Evaluation ◽

Feral Populations ◽

Diversity Assessment

The amplified fragment length polymorphism (AFLP) is a powerful marker, allowing rapid and simultaneous evaluation of multiple potentially polymorphic sites. Although well-adapted to linkage mapping and diversity assessment, AFLPs are primarily dominant in nature. Dominance, relatively high cost, and technological difficulty limit use of AFLPs for marker-aided selection and other locus-specific applications. In carrot the Y2 locus conditions carotene accumulation in the root xylem. We identified AFLP fragments linked to the dominant Y2 allele and pursued conversion of those fragments to codominant, PCR-based forms useful for locus-specific applications. The short length of AFLPs (≈60 to 500 bp) precludes development of longer, more specific primers as in SCAR development. Instead, using sequence information from cloned AFLP fragments for primer design, regions outside of the original fragment were amplified by inverse PCR or ligation-mediated PCR, cloned, and sequenced. Differences in sequences associated with Y2 vs. y2 allowed development of simple PCR assays differentiating those alleles. PCR primers flanking an insertion associated with the recessive allele amplified differently sized products for the two Y2 alleles in one assay. This assay is rapid, technologically simple (requiring no radioactivity and little advanced training or equipment), reliable, inexpensive, and codominant. Our PCR assay has a variety of large scale, locus-specific applications including genotyping diverse carrot cultivars and wild and feral populations. Efforts are underway to improve upon conversion technology and to more extensively test the techniques we have developed.
