Genomic prediction using low-coverage portable Nanopore sequencing

Most traits in livestock, crops and humans are polygenic, that is, a large number of loci contribute to genetic variation. Effects at these loci lie along a continuum ranging from common low-effect to rare high-effect variants that cumulatively contribute to the overall phenotype. Statistical methods to calculate the effect of these loci have been developed and can be used to predict phenotypes in new individuals. In agriculture, these methods are used to select superior individuals using genomic breeding values; in humans these methods are used to quantitatively measure an individual’s disease risk, termed polygenic risk scores. Both fields typically use SNP array genotypes for the analysis. Recently, genotyping-by-sequencing has become popular, due to lower cost and greater genome coverage (including structural variants). Oxford Nanopore Technologies’ (ONT) portable sequencers have the potential to combine the benefits of genotyping-by-sequencing with portability and decreased turn-around time. This introduces the potential for in-house clinical genetic disease risk screening in humans or calculating genomic breeding values on-farm in agriculture. Here we demonstrate the potential of the latter by calculating genomic breeding values for four traits in cattle using low-coverage ONT sequence data and comparing these breeding values to breeding values calculated from SNP arrays. At sequencing coverages between 2x and 4x the correlation between ONT breeding values and SNP array-based breeding values was > 0.92 when imputation was used and > 0.88 when no imputation was used. With an average sequencing coverage of 0.5x the correlation between the two methods was between 0.85 and 0.92 using imputation, depending on the trait. This suggests that ONT sequencing has potential for in-clinic or on-farm genomic prediction; however, further work is needed to validate these findings in a larger population.
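
The comparison described in this abstract (breeding values from sequence-derived genotypes versus breeding values from SNP array genotypes) can be illustrated with a minimal sketch. It assumes that a breeding value is the sum of allele dosage times estimated SNP effect and that agreement is measured with a Pearson correlation; the SNP effects, genotypes, dosages, and function names below are hypothetical toy values, not data or code from the study.

```python
from math import sqrt


def breeding_value(dosages, effects):
    """Sum of allele dosage (0, 1, 2, or a fractional imputed value) times SNP effect."""
    return sum(d * b for d, b in zip(dosages, effects))


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical SNP effects for 5 markers (illustrative only).
effects = [0.5, -0.2, 0.1, 0.3, -0.4]

# Hypothetical genotypes for 4 animals: hard calls from an array,
# and noisy fractional dosages standing in for low-coverage sequence calls.
array_genotypes = [
    [0, 1, 2, 1, 0],
    [2, 1, 0, 0, 1],
    [1, 1, 1, 2, 2],
    [0, 0, 2, 1, 1],
]
sequence_dosages = [
    [0.1, 1.0, 1.8, 1.0, 0.0],
    [1.9, 0.8, 0.2, 0.0, 1.0],
    [1.0, 1.2, 1.0, 1.7, 2.0],
    [0.0, 0.1, 2.0, 0.9, 1.2],
]

array_gebv = [breeding_value(g, effects) for g in array_genotypes]
sequence_gebv = [breeding_value(g, effects) for g in sequence_dosages]
r = pearson(array_gebv, sequence_gebv)
print(f"correlation between the two sets of breeding values: {r:.3f}")
```

In this sketch the same SNP effects are applied to both genotype sources, so any drop in correlation reflects genotype error alone.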


In-situ genomic prediction using low-coverage Nanopore sequencing

10.1101/2021.07.16.452615 ◽

2021 ◽

Author(s):

Harry Lamb ◽

Ben Hayes ◽

Imtiaz Randhawa ◽

Loan Nguyen ◽

Elizabeth Ross

Keyword(s):

Genomic Prediction ◽

Disease Risk ◽

Sequence Data ◽

Snp Array ◽

Genotyping By Sequencing ◽

Risk Scores ◽

Genomic Breeding ◽

Breeding Values ◽

Low Coverage ◽

On Farm

Most traits in livestock, crops and humans are polygenic, that is, a large number of loci contribute to genetic variation. Effects at these loci lie along a continuum ranging from common low-effect to rare high-effect variants that cumulatively contribute to the overall phenotype. Statistical methods to calculate the effect of these loci have been developed and can be used to predict phenotypes in new individuals. In agriculture, these methods are used to select superior individuals using genomic breeding values; in humans these methods are used to quantitatively measure an individual’s disease risk, termed polygenic risk scores. Both fields typically use SNP array genotypes for the analysis. Recently, genotyping-by-sequencing has become popular, due to lower cost and greater genome coverage (including structural variants). Oxford Nanopore Technologies’ (ONT) portable sequencers have the potential to combine the benefits of genotyping-by-sequencing with portability and decreased turn-around time. This introduces the potential for in-house clinical genetic disease risk screening in humans or calculating genomic breeding values on-farm in agriculture. Here we demonstrate the potential of the latter by calculating genomic breeding values for four traits in cattle using low-coverage ONT sequence data and comparing these breeding values to breeding values calculated from SNP arrays. At sequencing coverages between 2x and 4x the correlation between ONT breeding values and SNP array-based breeding values was > 0.92 when imputation was used and > 0.88 when no imputation was used. With an average sequencing coverage of 0.5x the correlation between the two methods was between 0.85 and 0.92 using imputation, depending on the trait. This demonstrates that ONT sequencing has great potential for in-clinic or on-farm genomic prediction.


A sorghum Practical Haplotype Graph facilitates genome-wide imputation and cost-effective genomic prediction

10.1101/775221 ◽

2019 ◽

Author(s):

Sarah E. Jensen ◽

Jean Rigaud Charles ◽

Kebede Muleta ◽

Peter Bradbury ◽

Terry Casstevens ◽

...

Keyword(s):

Genomic Selection ◽

Genomic Prediction ◽

Sequence Data ◽

Input Sequence ◽

Genotyping By Sequencing ◽

Cost Effective ◽

Genome Wide ◽

Variant Information ◽

Sequencing Platforms ◽

Low Coverage

Abstract: Successful management and utilization of increasingly large genomic datasets is essential for breeding programs to increase genetic gain and accelerate cultivar development. To help with data management and storage, we developed a sorghum Practical Haplotype Graph (PHG) pangenome database that stores all identified haplotypes and variant information for a given set of individuals. We developed two PHGs in sorghum, one with 24 individuals and another with 398 individuals, that reflect the diversity across genic regions of the sorghum genome. Twenty-four founders of the Chibas sorghum breeding program were sequenced at low coverage (0.01x) and processed through the PHG to identify genome-wide variants. The PHG called SNPs with only 5.9% error at 0.01x coverage, only 3% lower than its accuracy when calling SNPs from 8x coverage sequence. Additionally, 207 progeny from the Chibas genomic selection (GS) training population were sequenced and processed through the PHG. Missing genotypes in the progeny were imputed from the parental haplotypes available in the PHG and used for genomic prediction. Mean prediction accuracies with PHG SNP calls range from 0.57 to 0.73 for different traits, and are similar to prediction accuracies obtained with genotyping-by-sequencing (GBS) or markers from sequencing targeted amplicons (rhAmpSeq). This study provides a proof of concept for using a sorghum PHG to call and impute SNPs from low-coverage sequence data and also shows that the PHG can unify genotype calls from different sequencing platforms. By reducing the amount of input sequence needed, the PHG has the potential to decrease the cost of genotyping for genomic selection, making GS more feasible and facilitating larger breeding populations that can capture maximum recombination. Our results demonstrate that the PHG is a useful research and breeding tool that can maintain variant information from a diverse group of taxa, store sequence data in a condensed but readily accessible format, unify genotypes from different genotyping methods, and provide a cost-effective option for genomic selection for any species.


GPS Coordinates for Modelling Correlated Herd Effects in Genomic Prediction Models Applied to Hanwoo Beef Cattle

Animals ◽

10.3390/ani11072050 ◽

2021 ◽

Vol 11 (7) ◽

pp. 2050

Author(s):

Beatriz Castro Dias Cuyabano ◽

Gabriel Rovere ◽

Dajeong Lim ◽

Tae Hun Kim ◽

Hak Kyo Lee ◽

...

Keyword(s):

Environmental Factors ◽

Genomic Prediction ◽

Prediction Models ◽

Phenotypic Expression ◽

Genetic Evaluation ◽

Genomic Breeding ◽

Breeding Values ◽

Korean Cattle ◽

Evaluation Programs ◽

The Impact

It is widely known that the environment influences phenotypic expression and that its effects must be accounted for in genetic evaluation programs. The most commonly used method to account for environmental effects is to add herd and contemporary group to the model. Although generally informative, the herd effect treats different farms as independent units. However, if two farms are located physically close to each other, they potentially share correlated environmental factors. We introduce a method to model herd effects that uses the physical distances between farms, based on Global Positioning System (GPS) coordinates, as a proxy for the correlation matrix of these effects, with the aim of accounting for similarities and differences between farms due to environmental factors. A population of Hanwoo Korean cattle was used to evaluate the impact of modelling herd effects as correlated, in comparison to assuming the farms as completely independent units, on the variance components and genomic prediction. The main result was an increase in the reliabilities of the predicted genomic breeding values compared to reliabilities obtained with traditional models (across four traits evaluated, reliabilities of prediction presented increases that ranged from 0.05 ± 0.01 to 0.33 ± 0.03), suggesting that these models may overestimate heritabilities. Although little to no significant gain was obtained in phenotypic prediction, the increased reliability of the predicted genomic breeding values is of practical relevance for genetic evaluation programs.
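
The idea of turning physical distances between farms into a correlation matrix for herd effects can be sketched as follows. The abstract does not state how distance is mapped to correlation, so the exponential decay kernel, the `range_km` parameter, and the farm coordinates below are assumptions chosen purely for illustration; only the great-circle (haversine) distance formula is standard.

```python
from math import asin, cos, exp, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def herd_correlation(coords, range_km=50.0):
    """Correlation matrix from distances, using an ASSUMED exponential decay kernel.

    range_km is a hypothetical tuning parameter: larger values make
    distant farms more strongly correlated.
    """
    return [[exp(-haversine_km(a, b) / range_km) for b in coords] for a in coords]


# Hypothetical farm locations (latitude, longitude), for illustration only.
farms = [(36.0, 127.0), (36.1, 127.0), (35.0, 128.5)]
K = herd_correlation(farms)
for row in K:
    print(["%.3f" % v for v in row])
```

With this kernel the diagonal is exactly 1 and nearby farms receive correlations close to 1, whereas treating farms as independent corresponds to an identity matrix.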


A study of Genomic Prediction across Generations of Two Korean Pig Populations

Animals ◽

10.3390/ani9090672 ◽

2019 ◽

Vol 9 (9) ◽

pp. 672 ◽

Cited By ~ 1

Author(s):

Beatriz Castro Dias Cuyabano ◽

Hanna Wackel ◽

Donghyun Shin ◽

Cedric Gondro

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy ◽

Reference Population ◽

Relevant Information ◽

Production Traits ◽

Genomic Evaluation ◽

Genomic Breeding ◽

Breeding Values ◽

The Relationship ◽

Dense Marker

Genomic models that incorporate dense marker information have been widely used for predicting genomic breeding values since they were first introduced, and it is known that the relationship between individuals in the reference population and selection candidates affects the prediction accuracy. When genomic evaluation is performed over generations of the same population, prediction accuracy is expected to decay if the reference population is not updated. Therefore, the reference population must be updated in each generation, but little is known about the optimal way to do it. This study presents an empirical assessment of the prediction accuracy of genomic breeding values of production traits, across five generations in two Korean pig breeds. We verified the decay in prediction accuracy over time when the reference population was not updated. Additionally, we compared the prediction accuracy using only the previous generation as the reference population, as opposed to using all previous generations as the reference population. Overall, the results suggested that, although there is a clear need to continuously update the reference population, it may not be necessary to keep all ancestral genotypes. Finally, understanding how the accuracy of genomic prediction evolves over generations within a population adds relevant information to improve the performance of genomic selection.


Assessment of Imputation from Low-Pass Sequencing to Predict Merit of Beef Steers

Genes ◽

10.3390/genes11111312 ◽

2020 ◽

Vol 11 (11) ◽

pp. 1312

Author(s):

Warren M. Snelling ◽

Jesse L. Hoff ◽

Jeremiah H. Li ◽

Larry A. Kuehn ◽

Brittney N. Keel ◽

...

Keyword(s):

Genomic Prediction ◽

Genomic Sequence ◽

Bos Taurus ◽

Low Cost ◽

Bos Indicus ◽

Snp Array ◽

Attractive Alternative ◽

Functional Variant ◽

Low Pass ◽

Low Coverage

Decreasing costs are making low coverage sequencing with imputation to a comprehensive reference panel an attractive alternative to obtain functional variant genotypes that can increase the accuracy of genomic prediction. To assess the potential of low-pass sequencing, genomic sequence of 77 steers sequenced to >10X coverage was downsampled to 1X and imputed to a reference of 946 cattle representing multiple Bos taurus and Bos indicus-influenced breeds. Genotypes for nearly 60 million variants detected in the reference were imputed from the downsampled sequence. The imputed genotypes strongly agreed with the SNP array genotypes (r̄ = 0.99) and the genotypes called from the transcript sequence (r̄ = 0.97). Effects of BovineSNP50 and GGP-F250 variants on birth weight, postweaning gain, and marbling were solved without the steers’ phenotypes and genotypes, then applied to their genotypes, to predict the molecular breeding values (MBV). The steers’ MBV were similar when using imputed and array genotypes. Replacing array variants with functional sequence variants might allow more robust MBV. Imputation from low coverage sequence offers a viable, low-cost approach to obtain functional variant genotypes that could improve genomic prediction.
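
The downsampling step mentioned above (from >10X to 1X) can be illustrated with a small sketch. It assumes coverage is total sequenced bases divided by genome size and that reads are kept independently with probability target/current; the abstract does not say which tool or procedure was used, so the function names, read lengths, and genome size here are hypothetical.

```python
import random


def coverage(read_lengths, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    return sum(read_lengths) / genome_size


def downsample(read_lengths, genome_size, target, seed=0):
    """Keep each read with probability target/current so the expected coverage is `target`."""
    current = coverage(read_lengths, genome_size)
    if target >= current:
        return list(read_lengths)  # already at or below the target depth
    keep_fraction = target / current
    rng = random.Random(seed)  # seeded for reproducibility
    return [n for n in read_lengths if rng.random() < keep_fraction]


# Hypothetical toy example: a 1 Mb genome sequenced with 100,000 reads of 150 bp (15X).
genome_size = 1_000_000
reads = [150] * 100_000
subset = downsample(reads, genome_size, target=1.0)
print(f"original: {coverage(reads, genome_size):.1f}X, "
      f"downsampled: {coverage(subset, genome_size):.2f}X")
```

Because reads are kept at random, the realised coverage fluctuates slightly around the target rather than matching it exactly.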


Accuracy of genomic breeding values revisited: Assessment of two established approaches and a novel one to determine the accuracy in two-step genomic prediction

Journal of Animal Breeding and Genetics ◽

10.1111/jbg.12273 ◽

2017 ◽

Vol 134 (3) ◽

pp. 242-255 ◽

Cited By ~ 1

Author(s):

G. Ni ◽

S. Kipp ◽

H. Simianer ◽

M. Erbe

Keyword(s):

Genomic Prediction ◽

Genomic Breeding ◽

Breeding Values


Calibration and validation of predicted genomic breeding values in an advanced cycle maize population

Theoretical and Applied Genetics ◽

10.1007/s00122-021-03880-5 ◽

2021 ◽

Author(s):

Hans-Jürgen Auinger ◽

Christina Lehermeier ◽

Daniel Gianola ◽

Manfred Mayer ◽

Albrecht E. Melchinger ◽

...

Keyword(s):

Sample Size ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Dry Matter ◽

Dry Matter Content ◽

Genomic Breeding ◽

Breeding Values ◽

Average Maximum ◽

Calibration Sets ◽

Model Training

Key message: Model training on data from all selection cycles yielded the highest prediction accuracy by attenuating specific effects of individual cycles. Expected reliability was a robust predictor of accuracies obtained with different calibration sets. Abstract: The transition from phenotypic to genome-based selection requires a profound understanding of factors that determine genomic prediction accuracy. We analysed experimental data from a commercial maize breeding programme to investigate if genomic measures can assist in identifying optimal calibration sets for model training. The data set consisted of six contiguous selection cycles comprising testcrosses of 5968 doubled haploid lines genotyped with a minimum of 12,000 SNP markers. We evaluated genomic prediction accuracies in two independent prediction sets in combination with calibration sets differing in sample size and genomic measures (effective sample size, average maximum kinship, expected reliability, number of common polymorphic SNPs and linkage phase similarity). Our results indicate that across selection cycles prediction accuracies were as high as 0.57 for grain dry matter yield and 0.76 for grain dry matter content. Including data from all selection cycles in model training yielded the best results because interactions between calibration and prediction sets as well as the effects of different testers and specific years were attenuated. Among genomic measures, the expected reliability of genomic breeding values was the best predictor of empirical accuracies obtained with different calibration sets. For grain yield, a large difference between expected and empirical reliability was observed in one prediction set. We propose to use this difference as guidance for determining the weight phenotypic data of a given selection cycle should receive in model retraining and for selection when both genomic breeding values and phenotypes are available.


Estimation of genetic parameters and accuracy of genomic prediction for production traits in Duroc pigs

Czech Journal of Animal Science ◽

10.17221/150/2018-cjas ◽

2019 ◽

Vol 64 (No. 4) ◽

pp. 160-165 ◽

Cited By ~ 1

Author(s):

Bryan Irvine Lopez ◽

Vanessa Viterbo ◽

Choul Won Song ◽

Kang Seok Seo

Keyword(s):

Genomic Prediction ◽

Genetic Parameters ◽

Average Daily Gain ◽

Genetic Correlations ◽

Single Step ◽

Production Traits ◽

Muscle Area ◽

Genomic Breeding ◽

Breeding Values ◽

Heritability Estimates

Abstract: Genetic parameters and accuracy of genomic prediction for production traits in a Duroc population were estimated. Data were on 24 828 purebred Duroc pigs born in 2000–2016. After quality control procedures, 30 263 single nucleotide polymorphism markers and 560 animals remained that were used to predict the genomic breeding values of individuals. Accuracies of predicted breeding values for average daily gain (ADG), backfat thickness (BF), loin muscle area (LMA), lean percentage (LP) and age at 90 kg (D90) between pedigree-based and single-step methods were compared. Analyses were carried out with a multivariate animal model to estimate genetic parameters for production traits while univariate analyses were performed to predict the genomic breeding values of individuals. Heritability estimates from pedigree analysis were moderate to high. Heritability estimates and standard error for ADG, BF, LMA, LP and D90 were 0.35 ± 0.01, 0.35 ± 0.11, 0.24 ± 0.04, 0.42 ± 0.11 and 0.37 ± 0.03, respectively. Genetic correlations of ADG with BF and LP were low and negative. Genetic correlations of LMA with ADG, BF, LP and D90 were –0.37, –0.27, 0.48 and 0.31, respectively. High correlations were observed between ADG and D90 (–0.98), and also between BF and LP (–0.93). Accuracies of genomic breeding values for ADG, BF, LMA, LP and D90 were 0.30, 0.33, 0.38, 0.40 and 0.28, respectively. Corresponding accuracies using pedigree-based method were 0.29, 0.32, 0.38, 0.39 and 0.27, respectively. The results showed that the single-step method did not show significant advantage compared to the pedigree-based method.


Use of gene expression and whole-genome sequence information to improve the accuracy of genomic prediction for carcass traits in Hanwoo cattle

Genetics Selection Evolution ◽

10.1186/s12711-020-00574-2 ◽

2020 ◽

Vol 52 (1) ◽

Author(s):

Sara de las Heras-Saldana ◽

Bryan Irvine Lopez ◽

Nasir Moghaddar ◽

Woncheoul Park ◽

Jong-eun Park ◽

...

Keyword(s):

Gene Expression ◽

Genomic Prediction ◽

Sequence Data ◽

Snp Array ◽

Whole Genome Sequence ◽

Sequence Information ◽

Eye Muscle ◽

A Genome ◽

Accuracy Of Prediction ◽

Hanwoo Cattle

Background: In this study, we assessed the accuracy of genomic prediction for carcass weight (CWT), marbling score (MS), eye muscle area (EMA) and back fat thickness (BFT) in Hanwoo cattle when using genomic best linear unbiased prediction (GBLUP), weighted GBLUP (wGBLUP), and a BayesR model. For these models, we investigated the potential gain from using pre-selected single nucleotide polymorphisms (SNPs) from a genome-wide association study (GWAS) on imputed sequence data and from gene expression information. We used data on 13,717 animals with carcass phenotypes and imputed sequence genotypes that were split in an independent GWAS discovery set of varying size and a remaining set for validation of prediction. Expression data were used from a Hanwoo gene expression experiment based on 45 animals. Results: Using a larger number of animals in the reference set increased the accuracy of genomic prediction whereas a larger independent GWAS discovery dataset improved identification of predictive SNPs. Using pre-selected SNPs from GWAS in GBLUP improved accuracy of prediction by 0.02 for EMA and up to 0.05 for BFT, CWT, and MS, compared to a 50 k standard SNP array that gave accuracies of 0.50, 0.47, 0.58, and 0.47, respectively. Accuracy of prediction of BFT and CWT increased when BayesR was applied with the 50 k SNP array (0.02 and 0.03, respectively) and was further improved by combining the 50 k array with the top-SNPs (0.06 and 0.04, respectively). By contrast, using BayesR resulted in limited improvement for EMA and MS. wGBLUP did not improve accuracy but increased prediction bias. Based on the RNA-seq experiment, we identified informative expression quantitative trait loci, which, when used in GBLUP, improved the accuracy of prediction slightly, i.e. between 0.01 and 0.02. SNPs that were located in genes, the expression of which was associated with differences in trait phenotype, did not contribute to a higher prediction accuracy.
Conclusions: Our results show that, in Hanwoo beef cattle, when SNPs are pre-selected from GWAS on imputed sequence data, the accuracy of prediction improves only slightly whereas the contribution of SNPs that are selected based on gene expression is not significant. The benefit of statistical models to prioritize selected SNPs for estimating genomic breeding values is trait-specific and depends on the genetic architecture of each trait.
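
GBLUP, named in the abstract above, relies on a genomic relationship matrix built from SNP genotypes. The sketch below shows one widely used construction, G = ZZ' / (2 Σ p(1 − p)), where Z holds genotypes centred by twice the allele frequency. The abstract does not say which construction or software the authors used, so treating this formula as their method is an assumption, and the genotypes are hypothetical toy values.

```python
def genomic_relationship_matrix(genotypes):
    """G = Z Z' / (2 * sum(p * (1 - p))) from an animals-by-SNPs matrix of 0/1/2 counts.

    Allele frequencies are estimated from the supplied genotypes themselves,
    which is a simplifying assumption for this sketch.
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2 * n_animals) for j in range(n_snps)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    # Centre each genotype by its expected value 2p.
    centred = [[row[j] - 2 * freqs[j] for j in range(n_snps)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(zi, zk)) / scale for zk in centred]
        for zi in centred
    ]


# Hypothetical genotypes: 4 animals x 6 SNPs, coded as allele counts.
genotypes = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 1, 2],
    [2, 1, 0, 0, 1, 0],
    [1, 2, 1, 2, 2, 1],
]
G = genomic_relationship_matrix(genotypes)
for row in G:
    print(["%6.3f" % v for v in row])
```

In this toy example the first two animals differ at a single SNP, so their off-diagonal entry is the largest in the matrix; weighted variants such as wGBLUP would additionally apply per-SNP weights, which this sketch omits.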


Accuracies of direct genomic breeding values for birth and weaning weights of registered Charolais cattle in Mexico

Animal Production Science ◽

10.1071/an18363 ◽

2020 ◽

Vol 60 (6) ◽

pp. 772

Author(s):

Francisco J. Jahuey-Martínez ◽

Gaspar M. Parra-Bracamonte ◽

Dorian J. Garrick ◽

Nicolás López-Villalobos ◽

Juan C. Martínez-González ◽

...

Keyword(s):

Genomic Prediction ◽

Reference Population ◽

Single Step ◽

Bayesian Regression ◽

Nucleotide Polymorphisms ◽

Linear Unbiased Prediction ◽

Genomic Breeding ◽

Breeding Values ◽

Charolais Cattle ◽

Best Linear Unbiased

Context: Genomic prediction is now routinely used in many livestock species to rank individuals based on genomic breeding values (GEBV). Aims: This study reports the first assessment of the accuracy of direct GEBV for birth (BW) and weaning (WW) weights of registered Charolais cattle in Mexico. Methods: The population assessed included 823 animals genotyped with an array of 77,000 single nucleotide polymorphisms. Genomic prediction used genomic best linear unbiased prediction (GBLUP), Bayes C (BC), and single-step Bayesian regression (SSBR) methods in comparison with a pedigree-based BLUP method. Key results: Our results show that the genomic prediction methods provided low and similar accuracies to BLUP. The prediction accuracies of GBLUP and BC were identical at 0.31 for BW and 0.29 for WW, similar to BLUP. Prediction accuracies of SSBR for BW and WW were up to 4% higher than those by BLUP. Conclusions: Genomic prediction is feasible under current conditions, and provides a slight improvement using SSBR. Implications: Some limitations on reference population size and structure were identified and need to be addressed to obtain more accurate predictions in liveweight traits under the prevalent cattle breeding conditions of Mexico.
