Genomic Prediction of Drought Tolerance during Seedling Stage in Maize using Low-Cost Markers

Abstract Drought tolerance in maize is a complex, polygenic trait, especially at the seedling stage. In plant breeding, such traits can be improved by genomic selection (GS), which has become a practical and effective tool. In the present study, a natural maize population, the Northeast China core population (NCCP), consisting of 379 inbred lines was genotyped with the diversity arrays technology (DArT) and genotyping-by-sequencing (GBS) platforms. The target traits, seedling emergence rate (ER), seedling plant height (SPH), and grain yield (GY), were evaluated under two natural drought environments in northeast China. Although adequate genetic variation was found for genomic selection, it was not stable between the two years. Similarly, the heritabilities of the three traits were unstable: those in 2019 (0.88, 0.82, and 0.85 for ER, SPH, and GY) were higher than those in 2020 (0.65, 0.53, and 0.33) and across both years (0.32, 0.26, and 0.33). Two kinds of markers were obtained: SilicoDArT markers from DArT-seq, and SNPs from both GBS and DArT-seq. In total, 11,865 SilicoDArT markers, 7,837 DArT SNPs, and 91,003 GBS SNPs were used for analysis after quality control. Phylogenetic trees showed that the population contained extensive relatedness (consanguinity). Under a 2-fold cross-validation scheme, the average prediction accuracies estimated with the DArT SNP dataset were 0.27, 0.19, and 0.33 for ER, SPH, and GY, respectively. Accuracies with SilicoDArT markers were close to those with DArT SNPs, at 0.26, 0.22, and 0.33. For SPH, prediction accuracy was higher with SilicoDArT markers than with DArT SNPs; in some cases, alignment to the reference genome resulted in a loss of prediction accuracy. For traits with lower heritability, filtering markers by linkage disequilibrium can improve prediction accuracy.
For the same trait, prediction accuracies estimated with either type of DArT marker were consistently higher than those estimated with GBS SNPs at the same genotyping cost. Our results show that prediction accuracy improved in some cases when population structure and marker quality were controlled, even when marker density was reduced. In the initial maize breeding cycle, SilicoDArT markers can achieve higher prediction accuracy at a lower cost. However, platforms with higher marker density, such as GBS, may play a role in subsequent breeding cycles over the long term. A natural drought experimental station can reduce the difficulty of phenotyping in water-scarce environments. Accumulating data from more years will help to stabilize heritability estimates and improve prediction accuracy in maize breeding. The experimental design and models for drought resistance also need further development.
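The abstract reports that filtering markers on linkage disequilibrium (LD) raised accuracy for the lower-heritability trait, but it does not state the filter used. The sketch below shows one simple greedy LD filter under stated assumptions: markers are sorted by position, dosages are coded 0/1/2 (or 0/1 for presence/absence SilicoDArT calls), and the 50-marker window and r² cutoff of 0.8 are hypothetical defaults. It illustrates the idea only and is not the authors' pipeline.

```python
import numpy as np

def ld_prune(genotypes, window=50, r2_max=0.8):
    """Greedy LD filter (illustrative sketch, hypothetical defaults).

    Keeps a marker only if its squared correlation (r^2) with every marker
    already kept among the preceding `window` markers is at most `r2_max`.
    `genotypes` is an (individuals x markers) array, markers assumed sorted
    by physical position. Returns the indices of kept markers.
    """
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)                  # center each marker
    norms = np.sqrt((g ** 2).sum(axis=0))   # per-marker L2 norm
    kept = []
    for j in range(g.shape[1]):
        if norms[j] == 0:                   # monomorphic marker carries no information
            continue
        recent = [k for k in kept if j - k <= window]
        r = [g[:, j] @ g[:, k] / (norms[j] * norms[k]) for k in recent]
        if all(ri ** 2 <= r2_max for ri in r):
            kept.append(j)
    return kept
```

Lowering `r2_max` removes more redundant markers; the kept column indices can then be passed to whichever prediction model is being cross-validated.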

Maximizing efficiency of genomic selection in CIMMYT’s tropical maize breeding program

Theoretical and Applied Genetics, 2020. doi:10.1007/s00122-020-03696-9

Author(s): Sikiru Adeniyi Atanda, Michael Olsen, Juan Burgueño, Jose Crossa, Daniel Dzidzienyo, ...

Keyword(s): Genomic Selection, Prediction Accuracy, Large Scale, Primary Objective, Breeding Program, Breeding Cycle, Training Set, Maize Breeding, Phenotypic Data, Breeding Programs

Abstract Key message Historical data from breeding programs can be used efficiently to improve genomic selection accuracy, especially when the training set is optimized to include the individuals most informative of the target testing set.

The current strategy for large-scale implementation of genomic selection (GS) in the International Maize and Wheat Improvement Center (CIMMYT) global maize breeding program has been to train models using information from full-sibs in a “test-half-predict-half” approach. Although effective, this approach has limitations: it requires large full-sib populations and limits the ability to shorten variety testing and breeding cycle times. The primary objective of this study was to identify optimal experimental and training set designs to maximize the prediction accuracy of GS in CIMMYT’s maize breeding programs. Training set (TS) design strategies were evaluated to determine the most efficient use of phenotypic data collected on relatives for genomic prediction (GP), using datasets containing 849 (DS1) and 1389 (DS2) doubled-haploid (DH) lines evaluated as testcrosses in 2017 and 2018, respectively. Our results show there is merit in using multiple bi-parental populations as the TS when they are selected with algorithms that maximize relatedness between the training and prediction sets. In a breeding program where relevant past breeding information is not readily available, phenotyping expenditure can be spread across connected bi-parental populations by phenotyping only a small number of lines from each population. This significantly improves prediction accuracy compared with within-population prediction, especially when the TS for within-full-sib prediction is small. Finally, we demonstrate that prediction accuracy in either sparse testing or “test-half-predict-half” can be further improved by optimizing which lines are planted for phenotyping and which are only genotyped for advancement based on GP.
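The central idea here is choosing training lines that are closely related to the lines to be predicted. A minimal sketch, assuming 0/1/2 genotype coding and using the VanRaden (2008) genomic relationship matrix, ranks candidate lines by their average relationship to the prediction set and keeps the top ones. This simple ranking rule is a stand-in chosen for illustration; the optimization algorithms evaluated in the study are not specified here and may differ. The function names are hypothetical.

```python
import numpy as np

def vanraden_g(m):
    """Genomic relationship matrix (VanRaden 2008, method 1) from an
    (individuals x markers) matrix of allele counts coded 0/1/2."""
    m = np.asarray(m, dtype=float)
    p = m.mean(axis=0) / 2.0                 # observed allele frequency per marker
    z = m - 2.0 * p                          # center each marker by 2p
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))

def select_training_set(g, candidates, targets, size):
    """Hypothetical ranking heuristic: keep the `size` candidate lines with
    the highest mean genomic relationship to the target (prediction) set."""
    cand = np.asarray(candidates)
    score = g[np.ix_(cand, np.asarray(targets))].mean(axis=1)
    order = np.argsort(-score, kind="stable")  # highest mean relationship first
    return cand[order[:size]].tolist()
```

A ranking like this favours lines from the populations most connected to the prediction set, which is the property the abstract links to higher accuracy; more elaborate criteria also penalize redundancy among the selected lines.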

Genomic Prediction and Genetic Correlation of Agronomic, Blackleg Disease, and Seed Quality Traits in Canola (Brassica napus L.)

Plants, 2020, Vol 9 (6), pp. 719. doi:10.3390/plants9060719

Author(s): Mulusew Fikere, Denise M. Barbulescu, M. Michelle Malmberg, Pankaj Maharjan, Phillip A. Salisbury, ...

Keyword(s): Genomic Selection, Genomic Prediction, Prediction Accuracy, Seed Quality, Agronomic Traits, Genetic Correlations, Quality Traits, Blackleg Disease, Genetic Progress, Seed Quality Traits

Genomic selection accelerates genetic progress in crop breeding by predicting the future phenotypes of selection candidates from their genomic information alone. Here we report genetic correlations and genomic prediction accuracies for 22 agronomic, disease, and seed quality traits measured across multiple years (2015–2017) in replicated trials under rain-fed and irrigated conditions in Victoria, Australia. Two hundred and two spring canola lines were genotyped for 62,082 single nucleotide polymorphisms (SNPs) using transcriptomic genotyping-by-sequencing (GBSt). Traits were analyzed with single-trait and bivariate genomic best linear unbiased prediction (GBLUP) models and cross-validation. The GBLUP models were also extended to include genotype-by-environment (G × E) interactions. Genomic heritability ranged from 0.31 to 0.66. Genetic correlations for the same trait across locations and years were strongly positive. Oil content was positively correlated with most agronomic traits. Strong negative correlations, not previously documented, were observed between average internal infection (a measure of blackleg disease) and arachidic and stearic acids. The genetic correlations between fatty acid traits followed the patterns expected from oil biosynthesis pathways. Genomic prediction accuracy ranged from 0.29 for emergence count to 0.69 for seed yield. Incorporating G × E improved prediction accuracy by up to 6%. These prediction accuracies indicate that genomic selection is ready for application in canola breeding.
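Single-trait GBLUP, the baseline model in this abstract, is BLUP with the genomic relationship matrix G in place of the pedigree relationship matrix: the predicted genomic values are G(G + λI)⁻¹(y − μ), with λ = σ²e/σ²g = (1 − h²)/h². The sketch below assumes 0/1/2 genotype coding, a known genomic heritability `h2`, and the overall mean as the only fixed effect; the bivariate and G × E extensions used in the study add covariance structures that are not shown. Function names are hypothetical.

```python
import numpy as np

def vanraden_g(m):
    """Genomic relationship matrix (VanRaden 2008, method 1), 0/1/2 coding."""
    m = np.asarray(m, dtype=float)
    p = m.mean(axis=0) / 2.0                 # allele frequency per marker
    z = m - 2.0 * p                          # center each marker by 2p
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(g, y_train, train, test, h2):
    """GBLUP genomic values for `test` individuals from phenotypes of `train`
    individuals. Assumes h2 is known and estimates the mean by the simple
    average (a shortcut for the generalized least squares estimate)."""
    train, test = np.asarray(train), np.asarray(test)
    y = np.asarray(y_train, dtype=float)
    lam = (1.0 - h2) / h2                    # sigma_e^2 / sigma_g^2
    g_tt = g[np.ix_(train, train)]           # relationships within the training set
    g_vt = g[np.ix_(test, train)]            # test-to-training relationships
    alpha = np.linalg.solve(g_tt + lam * np.eye(train.size), y - y.mean())
    return g_vt @ alpha
```

In practice `h2` (equivalently λ) would be estimated by REML, and accuracy would be the correlation between these predictions and the observed phenotypes of the held-out lines.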

Haplotype genomic prediction of phenotypic values based on chromosome distance and gene boundaries using low-coverage sequencing in Duroc pigs

Genetics Selection Evolution, 2021, Vol 53 (1). doi:10.1186/s12711-021-00661-y

Author(s): Cheng Bian, Dzianis Prakapenka, Cheng Tan, Ruifei Yang, Di Zhu, ...

Keyword(s): Genomic Selection, Genomic Prediction, Prediction Accuracy, Prediction Models, Average Daily Gain, Live Weight, Feed Conversion, Muscle Area, Haplotype Blocks, Low Coverage

Abstract Background Genomic selection using single nucleotide polymorphism (SNP) markers has been widely used for the genetic improvement of livestock, but most current methods of genomic selection are based on SNP models. In this study, we compared the prediction accuracies of haplotype models, with haplotype blocks based on fixed chromosome distances or on gene boundaries, with those of SNP models for genomic prediction of phenotypic values. We also examined the reasons for the successes and failures of haplotype genomic prediction. Methods We analyzed a swine population of 3195 Duroc boars with records on eight traits: body judging score (BJS), teat number (TN), age (AGW), loin muscle area (LMA), loin muscle depth (LMD), and back fat thickness (BF) at 100 kg live weight, and average daily gain (ADG) and feed conversion rate (FCR) from 30 to 100 kg live weight. Ten-fold cross-validation was used to evaluate the prediction accuracy of each SNP model and each multi-allelic haplotype model based on 488,124 autosomal SNPs from low-coverage sequencing. Haplotype blocks were defined using fixed chromosome distances or gene boundaries. Results Compared with the best SNP model, the accuracy of predicting phenotypic values with a haplotype model was greater by 7.4% for BJS, 7.1% for AGW, 6.6% for ADG, 4.9% for FCR, 2.7% for LMA, 1.9% for LMD, 1.4% for BF, and 0.3% for TN. Gene-based haplotype blocks gave the best prediction accuracy for LMA, LMD, and TN. Estimates of haplotype epistasis heritability were more strongly correlated with the increase in prediction accuracy from haplotype models than were estimates of SNP additive heritability. The increase in prediction accuracy was largest for BJS, AGW, ADG, and FCR, which also had the largest estimates of haplotype epistasis heritability: 24.4% for BJS, 14.3% for AGW, 14.5% for ADG, and 17.7% for FCR.
SNP and haplotype heritability profiles across the genome identified several genes with large genetic contributions to phenotypes: NUDT3 for LMA, LMD and BF, VRTN for TN, COL5A2 for BJS, BSND for ADG, and CARTPT for FCR. Conclusions Haplotype prediction models improved the accuracy for genomic prediction of phenotypes in Duroc pigs. For some traits, the best prediction accuracy was obtained with haplotypes defined using gene regions, which provides evidence that functional genomic information can improve the accuracy of haplotype genomic prediction for certain traits.
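To make the multi-allelic haplotype model concrete: within each block, every distinct allele string observed on a gamete is treated as one haplotype allele, and each animal is scored by how many copies (0, 1 or 2) it carries. The sketch below assumes genotypes are already phased into two 0/1 haplotypes per animal and defines blocks by a fixed physical length on a single chromosome; gene-boundary blocks would simply supply different block labels. The block length and function names are illustrative assumptions, not the study's settings.

```python
import numpy as np

def blocks_by_distance(positions_bp, block_bp):
    """Assign SNPs on one chromosome to fixed-length blocks by physical position."""
    return (np.asarray(positions_bp) // block_bp).astype(int)

def haplotype_design(hap1, hap2, block_ids):
    """Multi-allelic haplotype design matrix (illustrative sketch).

    hap1, hap2: (individuals x SNPs) phased 0/1 haplotypes, one per gamete.
    Each distinct allele string within a block becomes one column whose
    entries count the copies (0, 1 or 2) carried by each individual.
    Returns the matrix and a (block, allele string) label per column.
    """
    hap1, hap2 = np.asarray(hap1), np.asarray(hap2)
    block_ids = np.asarray(block_ids)
    columns, labels = [], []
    for b in np.unique(block_ids):
        idx = np.flatnonzero(block_ids == b)
        s1 = ["".join(map(str, row)) for row in hap1[:, idx]]  # gamete 1 strings
        s2 = ["".join(map(str, row)) for row in hap2[:, idx]]  # gamete 2 strings
        for allele in sorted(set(s1) | set(s2)):
            columns.append([(a == allele) + (c == allele) for a, c in zip(s1, s2)])
            labels.append((int(b), allele))
    return np.array(columns, dtype=int).T, labels
```

Each row of the resulting matrix sums to twice the number of blocks, so the haplotype columns can be fitted with the same BLUP or ridge machinery used for SNP dosages.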

How Population Structure Impacts Genomic Selection Accuracy in Cross-Validation: Implications for Practical Breeding

Frontiers in Plant Science, 2020, Vol 11. doi:10.3389/fpls.2020.592977

Author(s): Christian R. Werner, R. Chris Gaynor, Gregor Gorjanc, John M. Hickey, Tobias Kox, ...

Keyword(s): Family Structure, Genomic Selection, Genomic Prediction, Prediction Accuracy, Cross Validation, Careful Analysis, Critical Approach, Crop Species, Breeding Programs, Mendelian Sampling

Over the last two decades, the application of genomic selection has been extensively studied in various crop species, and it has become a common practice to report prediction accuracies using cross validation. However, genomic prediction accuracies obtained from random cross validation can be strongly inflated due to population or family structure, a characteristic shared by many breeding populations. An understanding of the effect of population and family structure on prediction accuracy is essential for the successful application of genomic selection in plant breeding programs. The objective of this study was to make this effect and its implications for practical breeding programs comprehensible for breeders and scientists with a limited background in quantitative genetics and genomic selection theory. We, therefore, compared genomic prediction accuracies obtained from different random cross validation approaches and within-family prediction in three different prediction scenarios. We used a highly structured population of 940 Brassica napus hybrids coming from 46 testcross families and two subpopulations. Our demonstrations show how genomic prediction accuracies obtained from among-family predictions in random cross validation and within-family predictions capture different measures of prediction accuracy. While among-family prediction accuracy measures prediction accuracy of both the parent average component and the Mendelian sampling term, within-family prediction only measures how accurately the Mendelian sampling term can be predicted. With this paper we aim to foster a critical approach to different measures of genomic prediction accuracy and a careful analysis of values observed in genomic selection experiments and reported in literature.
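The distinction between among-family and within-family accuracy can be reproduced with a toy simulation. Below, each true genetic value is a family (parent-average) effect plus a Mendelian sampling term, and the predictor captures the family effect well but the Mendelian term only weakly. All parameter values are hypothetical and chosen only to illustrate the effect; they are not taken from the Brassica napus data.

```python
import numpy as np

def accuracies(n_fam=20, per_fam=30, seed=0):
    """Toy simulation with hypothetical parameters.

    True value = parent-average (family) effect + Mendelian sampling term.
    The predictor recovers the family effect but only a small share of the
    Mendelian term. Returns (among-family accuracy over all individuals,
    mean within-family accuracy).
    """
    rng = np.random.default_rng(seed)
    fam = np.repeat(np.arange(n_fam), per_fam)          # family label per individual
    parent_avg = rng.normal(0.0, 1.0, n_fam)[fam]       # shared within a family
    mendel = rng.normal(0.0, 0.7, fam.size)             # individual deviation
    true = parent_avg + mendel
    pred = parent_avg + 0.2 * mendel + rng.normal(0.0, 0.5, fam.size)
    among = np.corrcoef(pred, true)[0, 1]
    within = np.mean([np.corrcoef(pred[fam == f], true[fam == f])[0, 1]
                      for f in range(n_fam)])
    return among, within
```

With these settings the pooled correlation comes out far higher than the average within-family correlation, even though the predictor carries little information about the Mendelian sampling term; this is the inflation of random cross-validation that the abstract describes.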

Optimizing Low-Cost Genotyping and Imputation Strategies for Genomic Selection in Atlantic Salmon

G3 Genes|Genome|Genetics, 2019, Vol 10 (2), pp. 581-590. doi:10.1534/g3.119.400800

Author(s): Smaragda Tsairidou, Alastair Hamilton, Diego Robledo, James E. Bron, Ross D. Houston

Keyword(s): Atlantic Salmon, Genomic Selection, Environmental Sustainability, Genomic Prediction, Prediction Accuracy, Imputation Accuracy, Cost Effective, High Density, Genotype Imputation, Breeding Programs

Genomic selection enables cumulative genetic gains in key production traits such as disease resistance, playing an important role in the economic and environmental sustainability of aquaculture production. However, it requires genome-wide genetic marker data on large populations, which can be prohibitively expensive. Genotype imputation is a cost-effective method for obtaining high-density genotypes, but its value in aquaculture breeding programs, which are characterized by large full-sibling families, has yet to be fully assessed. The aim of this study was to optimize the use of low-density genotypes and evaluate genotype imputation strategies for cost-effective genomic prediction. Phenotypes and genotypes (78,362 SNPs) were obtained for 610 individuals from a Scottish Atlantic salmon breeding program population (Landcatch, UK) challenged with sea lice (Lepeophtheirus salmonis). Genomic prediction accuracy was calculated using GBLUP approaches and compared across SNP panels of varying density and composition, with and without imputation. Imputation was tested with parents genotyped for the optimal SNP panel and offspring genotyped for a range of lower-density imputation panels. Reducing SNP density had little impact on prediction accuracy down to 5,000 SNPs, below which accuracy dropped. Imputation accuracy increased with imputation panel density. When offspring were genotyped for just 200 SNPs and parents for 5,000 SNPs, genomic prediction accuracy was 0.53. This was similar to the accuracy with the full high-density and optimal-density datasets, and markedly higher than with 200 SNPs without imputation. These results suggest that imputation from very low to medium density can be a cost-effective tool for genomic selection in Atlantic salmon breeding programs.
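The low-density panels in this study were derived from a larger SNP set. One simple way to build such a panel, shown here as an assumption rather than the study's selection method, is to space SNPs evenly along each chromosome, allocating SNPs to chromosomes in proportion to their marker counts. Imputing the missing genotypes back to high density would be done with a dedicated imputation program and is not shown. The function name `even_panel` is hypothetical.

```python
import numpy as np

def even_panel(chrom, pos, n_keep):
    """Pick about `n_keep` SNPs spread evenly along each chromosome
    (hypothetical panel design; totals may differ slightly due to rounding).
    Returns sorted indices into the original SNP list."""
    chrom, pos = np.asarray(chrom), np.asarray(pos)
    total = chrom.size
    keep = []
    for c in np.unique(chrom):
        idx = np.flatnonzero(chrom == c)
        idx = idx[np.argsort(pos[idx], kind="stable")]     # order SNPs by position
        k = max(1, int(round(n_keep * idx.size / total)))  # proportional allocation
        k = min(k, idx.size)
        picks = np.linspace(0, idx.size - 1, k).round().astype(int)
        keep.extend(idx[np.unique(picks)].tolist())
    return sorted(keep)
```

Even spacing is a reasonable default for imputation panels because it leaves no long gaps without anchor SNPs; panels can also be chosen on minor allele frequency, which this sketch ignores.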

Bayesian genomic models boost prediction accuracy for survival to Streptococcus agalactiae infection in Nile tilapia (Oreochromus nilioticus)

Genetics Selection Evolution, 2021, Vol 53 (1). doi:10.1186/s12711-021-00629-y

Author(s): Rajesh Joshi, Anders Skaarud, Alejandro Tola Alvarez, Thomas Moen, Jørgen Ødegård

Keyword(s): Genomic Selection, Streptococcus Agalactiae, Nile Tilapia, Prediction Accuracy, Predictive Accuracy, Challenge Test, Bacterial Disease, Nucleotide Polymorphisms, Genomic Selection Model, Selection For

Abstract Background Streptococcosis is a major bacterial disease in Nile tilapia caused by Streptococcus agalactiae infection, and developing resistant strains of Nile tilapia is a sustainable approach to combating this disease. In this study, we performed a controlled disease trial on 120 full-sib families to (i) quantify and characterize the potential of genomic selection for survival to S. agalactiae infection in Nile tilapia, and (ii) identify the best genomic model and the optimal density of single nucleotide polymorphisms (SNPs) for this trait. Methods In total, 40 fish per family (15 injected intraperitoneally and 25 cohabitants) were used in the challenge test. Mortalities were recorded every 3 h for 35 days. After quality control, genotypes (50,690 SNPs) and phenotypes (0 for dead and 1 for alive) were available for 2472 cohabitant fish. Genetic parameters were estimated using various genomic selection models (genomic best linear unbiased prediction (GBLUP), BayesB, BayesC, BayesR and BayesS) and a traditional pedigree-based model (PBLUP). The pedigree-based analysis used a deep 17-generation pedigree. Prediction accuracy and bias were evaluated using five replicates of tenfold cross-validation. The genomic models were further analyzed using 10 subsets of SNPs at different densities to explore the effect of pruning and SNP density on predictive accuracy. Results Moderate heritability estimates, ranging from 0.15 ± 0.03 to 0.26 ± 0.05, were obtained with the different models. Compared with the pedigree-based model, GBLUP (using all SNPs) increased prediction accuracy by 15.4%. Furthermore, using the most appropriate Bayesian genomic selection model and SNP density increased prediction accuracy by up to 71%. The 40 to 50 SNPs with non-zero effects were consistent across the BayesB, BayesC and BayesS models in terms of marker identity and/or location.
Conclusions These results demonstrate the potential of genomic selection for survival to S. agalactiae infection in Nile tilapia. Compared to the PBLUP and GBLUP models, Bayesian genomic models were found to boost the prediction accuracy significantly.
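The validation design described here (five replicates of tenfold cross-validation, reporting accuracy and bias) can be sketched as follows. Assumptions: a ridge-regression marker model stands in for the genomic models (it is closely related to GBLUP but is not BayesB, BayesC, BayesR or BayesS); accuracy is the correlation between pooled out-of-fold predictions and observed 0/1 survival; and bias is the regression slope of observations on predictions, where a slope near 1 indicates little bias. The penalty `lam` is a hypothetical value.

```python
import numpy as np

def ridge_fit_predict(x_tr, y_tr, x_te, lam):
    """Ridge regression on centered markers (SNP-BLUP-like stand-in)."""
    mu = x_tr.mean(axis=0)
    xc = x_tr - mu
    beta = np.linalg.solve(xc.T @ xc + lam * np.eye(xc.shape[1]),
                           xc.T @ (y_tr - y_tr.mean()))
    return y_tr.mean() + (x_te - mu) @ beta

def repeated_kfold(x, y, k=10, reps=5, lam=100.0, seed=0):
    """Mean accuracy cor(pred, y) and mean bias slope b(y on pred)
    over `reps` replicates of k-fold cross-validation."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    acc, slope = [], []
    for _ in range(reps):
        folds = rng.permutation(len(y)) % k          # balanced random fold labels
        pred = np.empty(len(y))
        for f in range(k):
            te, tr = folds == f, folds != f
            pred[te] = ridge_fit_predict(x[tr], y[tr], x[te], lam)
        acc.append(np.corrcoef(pred, y)[0, 1])
        slope.append(np.cov(pred, y)[0, 1] / np.var(pred, ddof=1))
    return float(np.mean(acc)), float(np.mean(slope))
```

Some studies instead compute accuracy within each fold and average, or divide by the square root of heritability; which convention the study used is not stated in the abstract.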

A classic approach for determining genomic prediction accuracy under terminal drought stress and well-watered conditions in wheat landraces and cultivars

PLoS ONE, 2021, Vol 16 (3), pp. e0247824. doi:10.1371/journal.pone.0247824

Author(s): Morteza Shabannejad, Mohammad-Reza Bihamta, Eslam Majidi-Hervan, Hadi Alipour, Asa Ebrahimi

Keyword(s): Drought Stress, Genomic Selection, Bread Wheat, Genomic Prediction, Prediction Accuracy, Terminal Drought, Genome Wide, Best Linear Unbiased, Terminal Drought Stress, Trait Associations

The present study aimed to improve the accuracy of genomic prediction for 16 agronomic traits in a diverse bread wheat (Triticum aestivum L.) germplasm under terminal drought stress and well-watered conditions in semi-arid environments. An association panel of 87 bread wheat cultivars and 199 landraces from Iranian bread wheat germplasm was planted under two irrigation systems in semi-arid climate zones. The whole panel was genotyped with 9047 single nucleotide polymorphism markers using genotyping-by-sequencing. Twenty-three marker-trait associations were selected for traits under each condition, and 17 marker-trait associations were common to the terminal drought stress and well-watered conditions. The identified marker-trait associations were mostly single nucleotide polymorphisms with minor allele effects. This study examined the effects of population structure, genomic selection method (ridge regression-best linear unbiased prediction, genomic best linear unbiased prediction, and Bayesian ridge regression), training set size, and type of marker set on genomic prediction accuracy. Prediction accuracies ranged from low (-0.32) to moderate (0.52). A marker set of 93 significant markers identified through genome-wide association studies with P values ≤ 0.001 increased genomic prediction accuracy for all traits under both conditions. This study concluded that achieving the highest genomic prediction accuracy depends on the extent of linkage disequilibrium, the genetic architecture of the trait, the genetic diversity of the population, and the genomic selection method. The results support integrating genome-wide association studies and genomic selection to enhance genomic prediction accuracy in applied breeding programs.
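The GWAS-assisted marker set in this abstract keeps markers whose single-marker association test reaches P ≤ 0.001. A minimal sketch of that filtering step follows, assuming 0/1/2 genotype coding and a simple marginal regression test with a normal approximation to the t distribution (reasonable for panels of a few hundred lines). It ignores population structure, which a real GWAS model would usually correct for, and the function names are hypothetical. To avoid optimistic accuracy, the selection would normally be repeated inside each training fold rather than run on the full data.

```python
import math
import numpy as np

def marker_pvalues(x, y):
    """Two-sided p-values for single-marker regressions of y on each column
    of x, via the correlation t-statistic and a normal approximation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Monomorphic markers (denom == 0) get r = 0 and hence p = 1.
    r = np.divide(xc.T @ yc, denom, out=np.zeros(x.shape[1]), where=denom > 0)
    t = r * np.sqrt((n - 2) / np.clip(1.0 - r ** 2, 1e-12, None))
    return np.array([math.erfc(abs(ti) / math.sqrt(2.0)) for ti in t])

def gwas_selected(x, y, alpha=1e-3):
    """Indices of markers with p <= alpha (the study's threshold was 0.001)."""
    return np.flatnonzero(marker_pvalues(x, y) <= alpha).tolist()
```

The selected columns would then be used alone or alongside the genome-wide markers in the prediction model; the abstract reports that such a set of 93 markers raised accuracy for all traits.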

Optimizing genomic prediction of host resistance to koi herpesvirus disease in carp

2019. doi:10.1101/609784

Author(s): Christos Palaiokostas, Tomas Vesely, Martin Kocour, Martin Prchal, Dagmar Pokorova, ...

Keyword(s): Genomic Selection, Genomic Prediction, Prediction Accuracy, Genetic Relationships, Animal Health, Economic Losses, The European Union, Koi Herpesvirus, Wide Range, Close Relatives

Abstract Genomic selection (GS) is increasingly applied in breeding programmes of major aquaculture species, enabling improved prediction accuracy and genetic gain compared with pedigree-based approaches. Koi herpesvirus disease (KHVD) is notifiable to the World Organisation for Animal Health and the European Union and causes major economic losses in carp production. GS has the potential to breed carp with improved resistance to KHVD, thereby contributing to disease control. In the current study, restriction-site associated DNA sequencing (RAD-seq) was applied to a population of 1,425 common carp juveniles that had been challenged with koi herpesvirus, followed by sampling of survivors and mortalities. GS was tested in a wide range of scenarios by varying both SNP density and the genetic relationships between training and validation sets. Depending on the scenario, the accuracy of correctly identifying KHVD-resistant animals using GS was 8 to 18% higher than with the pedigree best linear unbiased predictor (pBLUP). Only minor decreases in prediction accuracy were observed with decreasing SNP density. However, the genetic relationship between the training and validation sets was a key factor in the efficacy of genomic prediction of KHVD resistance in carp, with substantially lower prediction accuracy when the training and validation sets did not contain close relatives.
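For a binary outcome such as survival after a koi herpesvirus challenge, one common way to score how well predictions identify resistant animals is the area under the ROC curve (AUC). The abstract does not state the exact metric used, so this is an assumption. The sketch computes AUC in its Mann-Whitney form, as the fraction of survivor-mortality pairs that the predictions rank correctly.

```python
def auc(scores, labels):
    """ROC AUC (Mann-Whitney form): probability that a random survivor
    (label 1) receives a higher predicted score than a random mortality
    (label 0); tied scores count one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one survivor and one mortality")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For example, `auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns 0.75, because three of the four survivor-mortality pairs are ordered correctly. An AUC of 0.5 corresponds to random ranking, which makes results comparable across scenarios with different survival rates.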

Heterotic quantitative trait loci analysis and genomic prediction of seedling biomass-related traits in maize triple testcross populations

Plant Methods, 2021, Vol 17 (1). doi:10.1186/s13007-021-00785-8

Author(s): Tifu Zhang, Lu Jiang, Long Ruan, Yiliang Qian, Shuaiqiang Liang, ...

Keyword(s): Quantitative Trait Loci, Genomic Prediction, Quantitative Trait, Prediction Accuracy, Dry Weight, Leaf Length, Leaf Width, Maize Breeding, Seedling Biomass, Trait Loci

Abstract Background Heterosis has been widely used in maize breeding. However, little is known about heterotic quantitative trait loci and their roles in genomic prediction. In this study, we sought to identify heterotic quantitative trait loci for seedling biomass-related traits using a triple testcross design and to compare prediction accuracies obtained by fitting molecular markers and heterotic quantitative trait loci. Results A triple testcross population comprising 366 genotypes was constructed by crossing each of 122 intermated B73 × Mo17 genotypes with B73, Mo17, and B73 × Mo17. Mid-parent heterosis for the seedling biomass-related traits (leaf length, leaf width, leaf area, and seedling dry weight) varied widely, from less than 50% to ~150%. Relationships among the heterosis values of the seedling biomass-related traits were congruent with those among the trait performances. Based on a linkage map of 1631 markers, 14 augmented additive, two augmented dominance, and three dominance × additive epistatic quantitative trait loci for heterosis of seedling biomass-related traits were identified, each individually explaining 4.1–20.5% of the phenotypic variation. All modes of gene action, i.e., additive, partially dominant, dominant, and overdominant, were observed. In addition, ten additive × additive and six dominance × dominance epistatic interactions were identified. Using the general and specific combining ability model, we found that prediction accuracy ranged from 0.29 for leaf length to 0.56 for leaf width. An analysis with different numbers of markers showed that ~800 markers captured nearly the maximum prediction accuracy. Incorporating the heterotic quantitative trait loci into the model did not significantly change prediction accuracy; only leaf length showed a marginal improvement of 1.7%.
Conclusions Our results demonstrated that the triple testcross design is suitable for detecting heterotic quantitative trait loci and evaluating the prediction accuracy. Seedling leaf width can be used as the representative trait for seedling prediction. The heterotic quantitative trait loci are not necessary for genomic prediction of seedling biomass-related traits.
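Mid-parent heterosis, used in this abstract to describe the range of heterosis (from less than 50% to ~150%), is the deviation of the F1 from the mean of its two parents, expressed relative to that mean: MPH (%) = 100 × (F1 − MP) / MP, with MP = (P1 + P2) / 2. A minimal sketch follows; the function name is hypothetical.

```python
def mid_parent_heterosis(f1, p1, p2):
    """Mid-parent heterosis in percent: 100 * (F1 - MP) / MP, MP = (P1 + P2) / 2."""
    mp = (p1 + p2) / 2.0                     # mid-parent value
    if mp == 0:
        raise ValueError("mid-parent value is zero; heterosis is undefined")
    return 100.0 * (f1 - mp) / mp
```

For example, an F1 leaf width of 15 from parents that both measure 10 gives 50% heterosis, and an F1 of 25 gives 150%, the upper end of the range reported above.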

98 Using differential evolution to improve predictive accuracy of deep learning models applied to pig production data

Journal of Animal Science, 2020, Vol 98 (Supplement_3), p. 27. doi:10.1093/jas/skaa054.048

Author(s): Junjie Han, Cedric Gondro, Juan Steibel

Keyword(s): Deep Learning, Differential Evolution, Image Classification, Genomic Prediction, Prediction Accuracy, Predictive Accuracy, Traditional Approach, Exhaustive Search, Pig Production, Hyperparameter Selection

Abstract Deep learning (DL) is being used for prediction in precision livestock farming and in genomic prediction. However, the predictive performance of DL models depends critically on their hyperparameters. Grid search is the traditional approach to selecting hyperparameters in DL, but it requires an exhaustive search over the parameter space. We propose hyperparameter selection using differential evolution (DE), a heuristic algorithm that does not require exhaustive search. The goal of this study was to design and apply DE to optimize the hyperparameters of DL models for genomic prediction and image analysis in pig production systems. One dataset consisted of 910 pigs genotyped with 28,916 SNP markers, used to predict post-mortem meat pH. Another dataset consisted of 1,334 images of pigs eating inside a single-spaced feeder, classified as “single pig” or “multiple pigs.” The accuracy of genomic prediction was defined as the correlation between predicted and observed pH. The image classification accuracy was the proportion of correctly classified images. For genomic prediction, a multilayer perceptron (MLP) was optimized. For image classification, MLP and convolutional neural networks (CNN) were optimized. For genomic prediction, the initial hyperparameter set resulted in an accuracy of 0.032, and for image classification, the initial accuracy was between 0.72 and 0.76. After optimization using DE, the genomic prediction accuracy was 0.3688, compared with 0.334 using GBLUP. The top selected models included one layer, 60 neurons, sigmoid activation, and an L2 penalty of 0.3. The accuracy of image classification after optimization was between 0.89 and 0.92. Selected models included three layers, the adamax optimizer, and relu or elu activation for the MLP, and one layer, 64 filters, and a 5×5 filter size for the CNN.
DE can adapt the hyperparameter selection to each problem, dataset and model, and it significantly increased prediction accuracy with minimal user input.
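Differential evolution searches a continuous box by building trial points from scaled differences between population members. A minimal sketch of the classic DE/rand/1/bin variant follows. The population size, F = 0.8 and CR = 0.9 are common textbook settings rather than values reported by the study, and the sphere function in the example stands in for what would, in this application, be a validation loss of a model trained with the candidate hyperparameters (a hypothetical objective). Integer hyperparameters such as layer or neuron counts would need rounding inside the objective.

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=0):
    """Minimize f over a box with the classic DE/rand/1/bin scheme.
    `bounds` is a list of (low, high) pairs, one per dimension.
    Returns (best point, best objective value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct members other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)      # at least one coordinate is mutated
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < CR or d == j_rand:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][d]            # inherit from the target vector
                trial.append(v)
            t_cost = f(trial)
            if t_cost <= cost[i]:            # greedy one-to-one selection
                pop[i], cost[i] = trial, t_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]
```

Usage: `differential_evolution(lambda v: sum(t * t for t in v), [(-5, 5), (-5, 5)])` drives the sphere function close to zero. Each generation costs `pop_size` objective evaluations, so with an expensive model-training objective the population size and number of generations dominate the tuning budget.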
