Leveraging Transcriptomics Data for Genomic Prediction Models in Cassava

Abstract Background Genomic prediction models were, in principle, developed to include all the available marker information; with this approach, these models have shown moderate to high predictive accuracies in various crops. Previous studies in cassava have demonstrated that, even with relatively small training populations and low-density GBS markers, prediction models are feasible for genomic selection. In the present study, we prioritized SNPs in close proximity to genome regions with biological importance for a given trait. We used a number of strategies to select variants that were then included in single and multiple kernel GBLUP models. Specifically, our sources of information were transcriptomics, GWAS, and immunity-related genes, with the ultimate goal of increasing predictive accuracies for Cassava Brown Streak Disease (CBSD) severity. Results We used single and multi-kernel GBLUP models with markers imputed to whole genome sequence level to accommodate various sources of biological information; fitting more than one kinship matrix allowed for differential weighting of the individual marker relationships. We applied these GBLUP approaches to CBSD phenotypes (i.e., root infection and leaf severity three and six months after planting) in a Ugandan Breeding Population (n = 955). Three means of exploiting an established RNAseq experiment of CBSD-infected cassava plants were used. Compared to the biology-agnostic GBLUP model, the informed multi-kernel models increased prediction accuracy only marginally (1.78% to 2.52%). Conclusions Our results show that markers imputed to whole genome sequence level do not provide enhanced prediction accuracies compared to using standard GBS marker data in cassava. The use of transcriptomics data and other sources of biological information resulted in prediction accuracies that were nominally superior to those obtained from traditional prediction models.
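
The multi-kernel GBLUP idea in the abstract above (one kinship matrix per marker set, each with its own weight) can be illustrated with a minimal sketch. Everything here is an assumption for illustration and is not taken from the study: the VanRaden-style relationship matrix, the function names `vanraden_grm` and `multi_kernel_gblup`, the simulated genotypes, and the variance components, which are treated as known rather than estimated (in practice they would be estimated, for example by REML).

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix from an (n x m) matrix of 0/1/2 genotype codes."""
    p = M.mean(axis=0) / 2.0               # allele frequency of each marker
    Z = M - 2.0 * p                        # centre each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))    # scaling so G is on a kinship-like scale
    return Z @ Z.T / denom

def multi_kernel_gblup(y, kernels, var_g, var_e, train):
    """Predict genetic values for all individuals from the training phenotypes.

    kernels: list of (n x n) relationship matrices, one per marker set.
    var_g:   one genetic variance per kernel (ASSUMED known in this sketch).
    var_e:   residual variance (ASSUMED known).
    train:   integer indices of individuals with phenotypes.
    """
    K = sum(v * G for v, G in zip(var_g, kernels))        # weighted sum of kernels
    mu = y[train].mean()                                  # simple fixed mean
    V = K[np.ix_(train, train)] + var_e * np.eye(len(train))
    alpha = np.linalg.solve(V, y[train] - mu)
    return mu + K[:, train] @ alpha

# Simulated example: 30 individuals, 200 markers split into two hypothetical sets
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(30, 200)).astype(float)
G_prioritized = vanraden_grm(M[:, :50])    # e.g. markers near candidate genes
G_rest = vanraden_grm(M[:, 50:])           # all remaining markers
y = rng.normal(size=30)
train = np.arange(20)
pred = multi_kernel_gblup(y, [G_prioritized, G_rest], [0.3, 0.2], 0.5, train)
```

Giving the prioritized kernel its own variance lets those markers contribute more or less than their share of the genome would imply, which is the differential weighting the abstract refers to.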


On the use of whole-genome sequence data for across-breed genomic prediction and fine-scale mapping of QTL

Genetics Selection Evolution ◽

10.1186/s12711-021-00607-4 ◽

2021 ◽

Vol 53 (1) ◽

Author(s):

Theo Meuwissen ◽

Irene van den Berg ◽

Mike Goddard

Keyword(s):

Variable Selection ◽

Genome Sequence ◽

Genomic Prediction ◽

Milk Fat ◽

Genotype Imputation ◽

Whole Genome Sequence ◽

Genomic Relationship Matrix ◽

Polygenic Effect ◽

Relationship Matrix ◽

Whole Genome

Abstract Background Whole-genome sequence (WGS) data are increasingly available on large numbers of individuals in animal and plant breeding and in human genetics through second-generation resequencing technologies, 1000 genomes projects, and large-scale genotype imputation from lower marker densities. Here, we present a computationally fast implementation of a variable selection genomic prediction method that can handle WGS data on more than 35,000 individuals, test its accuracy for across-breed predictions and assess its quantitative trait locus (QTL) mapping precision. Methods The Monte Carlo Markov chain (MCMC) variable selection model (Bayes GC) fits simultaneously a genomic best linear unbiased prediction (GBLUP) term, i.e. a polygenic effect whose correlations are described by a genomic relationship matrix (G), and a Bayes C term, i.e. a set of single nucleotide polymorphisms (SNPs) with large effects selected by the model. Computational speed is improved by a Metropolis–Hastings sampling that directs computations to the SNPs that are, a priori, most likely to be included in the model. Speed is also improved by running many relatively short MCMC chains. Memory requirements are reduced by storing the genotype matrix in binary form. The model was tested on a WGS dataset containing Holstein, Jersey and Australian Red cattle. The data contained 4,809,520 genotypes on 35,549 individuals together with their milk, fat and protein yields, and fat and protein percentage traits. Results The prediction accuracies of the Jersey individuals improved by 1.5% when using across-breed GBLUP compared to within-breed predictions. Using WGS instead of 600 k SNP-chip data yielded on average a 3% accuracy improvement for Australian Red cows. QTL were fine-mapped by locating the SNP with the highest posterior probability of being included in the model. Various QTL known from the literature were rediscovered, and a new SNP affecting milk production was discovered on chromosome 20 at 34.501126 Mb. Due to the high mapping precision, it was clear that many of the discovered QTL were the same across the five dairy traits. Conclusions Across-breed Bayes GC genomic prediction improved prediction accuracies compared to GBLUP. The combination of across-breed WGS data and Bayesian genomic prediction proved remarkably effective for the fine-mapping of QTL.
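
The abstract above says only that memory is reduced by storing the genotype matrix "in binary form". As a rough illustration of why this helps, the sketch below packs four genotype codes into each byte using two bits per genotype. The 2-bit layout, the code 3 for a missing genotype, and the function names `pack_genotypes` and `unpack_genotypes` are assumptions made here, not details of the Bayes GC implementation.

```python
def pack_genotypes(genos):
    """Pack genotype codes (0, 1, 2, or 3 for missing) four per byte, 2 bits each."""
    out = bytearray((len(genos) + 3) // 4)       # round up to whole bytes
    for i, g in enumerate(genos):
        if not 0 <= g <= 3:
            raise ValueError(f"genotype code out of range: {g}")
        out[i // 4] |= g << (2 * (i % 4))        # place code in its 2-bit slot
    return bytes(out)

def unpack_genotypes(packed, n):
    """Recover the first n genotype codes from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]

# Example: five genotypes fit in two bytes instead of five
codes = [0, 1, 2, 3, 2]
packed = pack_genotypes(codes)
restored = unpack_genotypes(packed, len(codes))
```

Under this assumed layout each genotype needs a quarter of a byte, so a matrix stored one byte per genotype shrinks by a factor of four, and by more relative to 4- or 8-byte numeric types.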


Genomic prediction based on selected variants from imputed whole-genome sequence data in Australian sheep populations

Genetics Selection Evolution ◽

10.1186/s12711-019-0514-2 ◽

2019 ◽

Vol 51 (1) ◽

Cited By ~ 6

Author(s):

Nasir Moghaddar ◽

Majid Khansefid ◽

Julius H. J. van der Werf ◽

Sunduimijid Bolormaa ◽

Naomi Duijvesteijn ◽

...

Keyword(s):

Genome Sequence ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Sequence Data ◽

Whole Genome Sequence ◽

Sequence Variants ◽

Whole Genome ◽

Absolute Increase ◽

Genome Sequence Data ◽

Australian Sheep

Abstract Background Whole-genome sequence (WGS) data could contain information on genetic variants at or in high linkage disequilibrium with causative mutations that underlie the genetic variation of polygenic traits. Thus far, genomic prediction accuracy has shown limited increase when using such information in dairy cattle studies, in which one or few breeds with limited diversity predominate. The objective of our study was to evaluate the accuracy of genomic prediction in a multi-breed Australian sheep population of relatively less related target individuals, when using information on imputed WGS genotypes. Methods Between 9626 and 26,657 animals with phenotypes were available for nine economically important sheep production traits and all had WGS imputed genotypes. About 30% of the data were used to discover predictive single nucleotide polymorphisms (SNPs) based on a genome-wide association study (GWAS) and the remaining data were used for training and validation of genomic prediction. Prediction accuracy using selected variants from imputed sequence data was compared to that using a standard array of 50k SNP genotypes, thereby comparing genomic best linear unbiased prediction (GBLUP) and Bayesian methods (BayesR/BayesRC). Accuracy of genomic prediction was evaluated in two independent populations that were each lowly related to the training set, one being purebred Merino and the other crossbred Border Leicester x Merino sheep. Results A substantial improvement in prediction accuracy was observed when selected sequence variants were fitted alongside 50k genotypes as a separate variance component in GBLUP (2GBLUP) or in Bayesian analysis as a separate category of SNPs (BayesRC). From an average accuracy of 0.27 in both validation sets for the 50k array, the average absolute increase in accuracy across traits with 2GBLUP was 0.083 and 0.073 for purebred and crossbred animals, respectively, whereas with BayesRC it was 0.102 and 0.087. The average gain in accuracy was smaller when selected sequence variants were treated in the same category as 50k SNPs. Very little improvement over 50k prediction was observed when using all WGS variants. Conclusions Accuracy of genomic prediction in diverse sheep populations increased substantially by using variants selected from whole-genome sequence data based on an independent multi-breed GWAS, when compared to genomic prediction using standard 50K genotypes.
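
To put the absolute gains reported above in context, the short calculation below adds each gain to the 0.27 baseline and expresses it as a percentage of that baseline. The baseline and gains are the figures quoted in the abstract; the relative percentages are derived here from the averages and are only indicative, since the abstract does not report per-trait values.

```python
baseline = 0.27  # average 50k-array accuracy in both validation sets (from the abstract)

# Average absolute increases in accuracy reported in the abstract
gains = {
    ("2GBLUP", "purebred"): 0.083,
    ("2GBLUP", "crossbred"): 0.073,
    ("BayesRC", "purebred"): 0.102,
    ("BayesRC", "crossbred"): 0.087,
}

summary = {}
for (method, population), gain in gains.items():
    accuracy = round(baseline + gain, 3)            # implied average accuracy
    relative = round(100.0 * gain / baseline, 1)    # gain as % of the baseline
    summary[(method, population)] = (accuracy, relative)
    print(f"{method:8s} {population:10s} accuracy={accuracy:.3f} (+{relative:.1f}%)")
```

On these averages the implied accuracies lie between about 0.34 and 0.37, roughly 27% to 38% above the 50k baseline.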


Utility of whole-genome sequence data for across-breed genomic prediction

Genetics Selection Evolution ◽

10.1186/s12711-018-0396-8 ◽

2018 ◽

Vol 50 (1) ◽

Cited By ~ 21

Author(s):

Biaty Raymond ◽

Aniek C. Bouwman ◽

Chris Schrooten ◽

Jeanine Houwing-Duistermaat ◽

Roel F. Veerkamp

Keyword(s):

Genome Sequence ◽

Genomic Prediction ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data


Accuracy of genomic prediction using imputed whole-genome sequence data in white layers

Journal of Animal Breeding and Genetics ◽

10.1111/jbg.12199 ◽

2016 ◽

Vol 133 (3) ◽

pp. 167-179 ◽

Cited By ~ 24

Author(s):

M. Heidaritabar ◽

M.P.L. Calus ◽

H-J. Megens ◽

A. Vereijken ◽

M.A.M. Groenen ◽

...

Keyword(s):

Genome Sequence ◽

Genomic Prediction ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data


Efficient genomic prediction based on whole-genome sequence data using split-and-merge Bayesian variable selection

Genetics Selection Evolution ◽

10.1186/s12711-016-0225-x ◽

2016 ◽

Vol 48 (1) ◽

Cited By ~ 20

Author(s):

Mario P. L. Calus ◽

Aniek C. Bouwman ◽

Chris Schrooten ◽

Roel F. Veerkamp

Keyword(s):

Variable Selection ◽

Genome Sequence ◽

Genomic Prediction ◽

Sequence Data ◽

Bayesian Variable Selection ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data ◽

Split And Merge


Using imputed whole-genome sequence data to improve the accuracy of genomic prediction for parasite resistance in Australian sheep

Genetics Selection Evolution ◽

10.1186/s12711-019-0476-4 ◽

2019 ◽

Vol 51 (1) ◽

Cited By ~ 6

Author(s):

Mohammad Al Kalaldeh ◽

John Gibson ◽

Naomi Duijvesteijn ◽

Hans D. Daetwyler ◽

Iona MacLeod ◽

...

Keyword(s):

Genome Sequence ◽

Genomic Prediction ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Parasite Resistance ◽

Genome Sequence Data ◽

Australian Sheep


Genomic Prediction Using LD-Based Haplotypes Inferred From High-Density Chip and Imputed Sequence Variants in Chinese Simmental Beef Cattle

Frontiers in Genetics ◽

10.3389/fgene.2021.665382 ◽

2021 ◽

Vol 12 ◽

Author(s):

Hongwei Li ◽

Bo Zhu ◽

Ling Xu ◽

Zezhao Wang ◽

Lei Xu ◽

...

Keyword(s):

Beef Cattle ◽

Genomic Prediction ◽

Whole Genome Sequence ◽

Data Sets ◽

Whole Genome ◽

Nucleotide Polymorphism ◽

Single Nucleotide ◽

Individual Single Nucleotide Polymorphism ◽

Improve Accuracy ◽

The Individual

A haplotype is defined as a combination of alleles at adjacent loci belonging to the same chromosome that can be transmitted as a unit. In this study, we used both the Illumina BovineHD chip (HD chip) and imputed whole-genome sequence (WGS) data to explore haploblocks and assess haplotype effects, with haploblocks defined at different LD thresholds. The accuracies of genomic prediction (GP) for dressing percentage (DP), meat percentage (MP), and rib eye roll weight (RERW) based on haplotypes were investigated and compared for both data sets in Chinese Simmental beef cattle. The accuracies of GP using the entire imputed WGS data were lower than those using the HD chip data in all cases. For DP and MP, the haploblock approaches outperformed the individual single nucleotide polymorphism (SNP) approach (GBLUP_In_Block) at specific LD levels. Hotelling’s test confirmed that GP using LD-based haplotypes from WGS data can significantly increase the accuracies of GP for RERW, compared with the individual SNP approach (∼1.4 and 1.9% for GHBLUP and GHBLUP+GBLUP, respectively). We found that the accuracies using the haploblock approach varied with different LD thresholds. An LD threshold of r² ≥ 0.5 was optimal for most scenarios. Our results suggest that the LD-based haploblock approach can improve the accuracy of genomic prediction for carcass traits using both HD chip and imputed WGS data under the optimal LD thresholds in Chinese Simmental beef cattle.
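
One simple way to picture LD-threshold haploblocks is to walk along ordered SNPs and start a new block whenever the r² between neighbouring SNPs drops below the threshold. The sketch below does exactly that. It is a simplified stand-in and not the block-partitioning method used in the study: the adjacent-pair rule, the use of squared genotype correlation as the LD measure, and the names `r_squared` and `ld_blocks` are all assumptions for illustration.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two genotype vectors (0/1/2 codes)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0                      # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)

def ld_blocks(snps, threshold=0.5):
    """Group consecutive SNP indices into blocks.

    A SNP joins the current block when its r2 with the previous SNP is at
    least `threshold`; otherwise it starts a new block.
    """
    if not snps:
        return []
    blocks = [[0]]
    for j in range(1, len(snps)):
        if r_squared(snps[j - 1], snps[j]) >= threshold:
            blocks[-1].append(j)
        else:
            blocks.append([j])
    return blocks

# Toy data: each inner list holds one SNP's genotypes across six animals
snps = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],   # identical to SNP 0, so r2 = 1
    [2, 0, 1, 2, 0, 1],   # uncorrelated with SNP 1, so a new block starts
    [2, 0, 1, 2, 0, 1],
]
blocks = ld_blocks(snps, threshold=0.5)
```

Raising the threshold produces more, shorter blocks, which is one reason prediction accuracy can vary with the LD threshold chosen.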
