Comparison of Genomic Selection Models for Exploring Predictive Ability of Complex Traits in Breeding Programs

2021 ◽  
Author(s):  
Lance F. Merrick ◽  
Arron H. Carter

Abstract Traits with a complex unknown genetic architecture are common in breeding programs. However, they pose a challenge for selection due to a combination of complex environmental and pleiotropic effects that impede the ability to create mapping populations to characterize the trait’s genetic basis. One such trait, seedling emergence of wheat (Triticum aestivum L.) from deep planting, presents a unique opportunity to explore the best method to use and implement GS models to predict a complex trait. Seventeen GS models were compared using two training populations: one consisting of 473 genotypes from a diverse association mapping panel (DP) phenotyped from 2015 to 2019, and the other consisting of 643 breeding lines phenotyped in 2015 and 2020 in Lind, WA, with 40,368 markers. There were only a few significant differences between GS models, with support vector machines reaching the highest accuracy of 0.56 in a single breeding line trial using cross-validations. However, the consistent moderate accuracy of cBLUP and other parametric models indicates no need to implement computationally demanding non-parametric models for complex traits. Accuracy increased from 0.40 to 0.41 in cross-validations and from 0.10 to 0.17 in independent validations when moving from diversity panel lines to breeding lines. The environmental effects of complex traits can be overcome by combining years of the same populations. Overall, our study showed that breeders can accurately predict and implement GS for a complex trait by using parametric models within their own breeding programs, with increased accuracy as they combine training populations over the years.
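The cross-validation accuracies quoted in this abstract are correlations between observed and predicted phenotypes. The sketch below illustrates that calculation only. It is not the authors' pipeline: plain ridge regression stands in for a shrinkage model of the rrBLUP type, the marker matrix and phenotypes are simulated toy data, and the penalty `lam`, the fold count and all function names are assumptions made for illustration.

```python
import random

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ridge_fit(X, y, lam):
    """Ridge marker effects on centered markers: (Z'Z + lam*I) beta = Z'(y - mean)."""
    n, p = len(X), len(X[0])
    mu = sum(y) / n
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]
    A = [[sum(z[j] * z[k] for z in Z) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    rhs = [sum(z[j] * (yi - mu) for z, yi in zip(Z, y)) for j in range(p)]
    return mu, means, solve(A, rhs)

def kfold_accuracy(X, y, k=5, lam=1.0, seed=0):
    """Mean correlation between observed and predicted values over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    accs = []
    for test in folds:
        held = set(test)
        train = [i for i in idx if i not in held]
        mu, means, beta = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        pred = [mu + sum(b * (x - m) for b, x, m in zip(beta, X[i], means))
                for i in test]
        accs.append(pearson(pred, [y[i] for i in test]))
    return sum(accs) / k

# Toy data (assumed): 60 lines, 10 markers coded 0/1/2, additive effects plus noise
rng = random.Random(1)
X = [[rng.choice([0, 1, 2]) for _ in range(10)] for _ in range(60)]
true_beta = [rng.gauss(0, 1) for _ in range(10)]
y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 1) for row in X]
accuracy = kfold_accuracy(X, y)
print(accuracy)
```

An independent validation, as opposed to cross-validation, would fit the model on one population and correlate its predictions with phenotypes from a different population or year.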

2021 ◽  
Vol 12 ◽  
Author(s):  
Owen M. Powell ◽  
Kai P. Voss-Fels ◽  
David R. Jordan ◽  
Graeme Hammer ◽  
Mark Cooper

Genomic prediction of complex traits across environments, breeding cycles, and populations remains a challenge for plant breeding. A potential explanation for this is that underlying non-additive genetic (GxG) and genotype-by-environment (GxE) interactions generate allele substitution effects that are non-stationary across different contexts. Such non-stationary effects of alleles are either ignored or assumed to be implicitly captured by most gene-to-phenotype (G2P) maps used in genomic prediction. The implicit capture of non-stationary effects of alleles requires the G2P map to be re-estimated across different contexts. We discuss the development and application of hierarchical G2P maps that explicitly capture non-stationary effects of alleles and have successfully increased short-term prediction accuracy in plant breeding. These hierarchical G2P maps achieve increases in prediction accuracy by allowing intermediate processes such as other traits and environmental factors and their interactions to contribute to complex trait variation. However, long-term prediction remains a challenge. The plant breeding community should undertake complementary simulation and empirical experiments to interrogate various hierarchical G2P maps that connect GxG and GxE interactions simultaneously. The existing genetic correlation framework can be used to assess the magnitude of non-stationary effects of alleles and the predictive ability of these hierarchical G2P maps in long-term, multi-context genomic predictions of complex traits in plant breeding.


2021 ◽  
Vol 12 ◽  
Author(s):  
Vipin Tomar ◽  
Daljit Singh ◽  
Guriqbal Singh Dhillon ◽  
Yong Suk Chung ◽  
Jesse Poland ◽  
...  

Genomic selection (GS) has the potential to improve the selection gain for complex traits in crop breeding programs from resource-poor countries. The GS model performance in multi-environment (ME) trials was assessed for 141 advanced breeding lines under four field environments via cross-predictions. We compared prediction accuracy (PA) of two GS models with or without accounting for the environmental variation on four quantitative traits of significant importance, i.e., grain yield (GRYLD), thousand-grain weight, days to heading, and days to maturity, under North and Central Indian conditions. For each trait, we generated PA using the following two different ME cross-validation (CV) schemes representing actual breeding scenarios: (1) predicting untested lines in tested environments through the ME model (ME_CV1) and (2) predicting tested lines in untested environments through the ME model (ME_CV2). The ME predictions were compared with the baseline single-environment (SE) GS model (SE_CV1) representing a breeding scenario where relationships and interactions are not leveraged across environments. Our results suggested that the ME models provide a clear advantage over SE models in terms of robust trait predictions. Both ME models provided 2–3 times higher prediction accuracies for all four traits across the four tested environments, highlighting the importance of accounting for environmental variance in GS models. While the improvement in PA from SE to ME models was significant, the CV1 and CV2 schemes did not show any clear differences within ME, indicating that the ME model was able to predict the untested environments and lines equally well. Overall, our results provide an important insight into the impact of environmental variation on GS in smaller breeding programs, which can potentially increase the rate of genetic gain by leveraging ME wheat breeding trials.
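The two cross-validation schemes differ only in which line-by-environment records are hidden from the model. A minimal sketch of the masking follows, using the abstract's own wording for CV1 (untested lines, tested environments) and CV2 (tested lines, untested environment). The line and environment labels, the 20% hold-out fraction and the function names are hypothetical; the abstract does not state how the folds were actually drawn.

```python
import random

def cv1_split(lines, envs, test_fraction=0.2, seed=0):
    """CV1: hide every record of a random subset of lines in all environments."""
    rng = random.Random(seed)
    n_test = max(1, round(test_fraction * len(lines)))
    held = set(rng.sample(lines, n_test))
    train = [(l, e) for l in lines for e in envs if l not in held]
    test = [(l, e) for l in lines for e in envs if l in held]
    return train, test

def cv2_split(lines, envs, held_env):
    """CV2 (as described in the abstract): hide all records of one environment."""
    train = [(l, e) for l in lines for e in envs if e != held_env]
    test = [(l, e) for l in lines for e in envs if e == held_env]
    return train, test

# Hypothetical labels matching the stated design: 141 lines, four environments
lines = [f"line_{i}" for i in range(141)]
envs = ["env_1", "env_2", "env_3", "env_4"]

cv1_train, cv1_test = cv1_split(lines, envs)
cv2_train, cv2_test = cv2_split(lines, envs, held_env="env_4")
print(len(cv1_train), len(cv1_test), len(cv2_train), len(cv2_test))
```

In CV1 the test lines have no phenotypes at all, so prediction relies on genomic relationships with the training lines. In CV2 every line still has records in the remaining environments, so the model can borrow information across environments.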


2020 ◽  
Vol 98 (6) ◽  
Author(s):  
Anderson Antonio Carvalho Alves ◽  
Rebeka Magalhães da Costa ◽  
Tiago Bresolin ◽  
Gerardo Alves Fernandes Júnior ◽  
Rafael Espigolan ◽  
...  

Abstract The aim of this study was to compare the predictive performance of the Genomic Best Linear Unbiased Predictor (GBLUP) and machine learning methods (Random Forest, RF; Support Vector Machine, SVM; Artificial Neural Network, ANN) in simulated populations presenting different levels of dominance effects. The simulated genome comprised 50k SNP and 300 QTL, both biallelic and randomly distributed across 29 autosomes. A total of six traits were simulated considering different values for the narrow and broad-sense heritability. In the purely additive scenario with low heritability (h2 = 0.10), the predictive ability obtained using GBLUP was slightly higher than that of the other methods, whereas ANN provided the highest accuracies for scenarios with moderate heritability (h2 = 0.30). The accuracies of dominance deviation predictions varied from 0.180 to 0.350 in GBLUP extended for dominance effects (GBLUP-D) and from 0.06 to 0.185 in RF, and they were null using the ANN and SVM methods. Although RF presented higher accuracies for total genetic effect predictions, the mean-squared error values in such a model were worse than those observed for GBLUP-D in scenarios with large additive and dominance variances. When applied to prescreen important regions, the RF approach detected QTL with high additive and/or dominance effects. Among machine learning methods, only RF was capable of implicitly covering dominance effects without increasing the number of covariates in the model, resulting in higher accuracies for the total genetic and phenotypic values as the dominance ratio increases. Nevertheless, if the interest is to infer dominance effects directly, GBLUP-D could be a more suitable method.
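The abstract reports that one method can have the higher accuracy (correlation) and yet the worse mean-squared error. The toy sketch below shows how that can happen, since correlation ignores the scale and offset of predictions while mean-squared error does not. The numbers are invented for illustration and are not taken from the study.

```python
def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def mse(pred, obs):
    """Mean-squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)

observed = [1.0, 2.0, 3.0, 4.0, 5.0]
shrunk = [0.1 * v for v in observed]   # perfect ranking, badly mis-scaled
noisy = [1.3, 1.8, 3.4, 3.7, 5.2]      # slightly worse ranking, correct scale

for name, pred in (("shrunk", shrunk), ("noisy", noisy)):
    print(name, pearson(pred, observed), mse(pred, observed))
```

The mis-scaled predictor ranks candidates perfectly but would mislead anyone who needs the predicted values themselves, which is why the two metrics are usually reported together.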


2020 ◽  
Vol 10 (12) ◽  
pp. 4599-4613
Author(s):  
Fabio Morgante ◽  
Wen Huang ◽  
Peter Sørensen ◽  
Christian Maltecca ◽  
Trudy F. C. Mackay

The ability to accurately predict complex trait phenotypes from genetic and genomic data is critical for the implementation of personalized medicine and precision agriculture; however, prediction accuracy for most complex traits is currently low. Here, we used data on whole genome sequences, deep RNA sequencing, and high quality phenotypes for three quantitative traits in the ∼200 inbred lines of the Drosophila melanogaster Genetic Reference Panel (DGRP) to compare the prediction accuracies of gene expression and genotypes for three complex traits. We found that expression levels (r = 0.28 and 0.38, for females and males, respectively) provided higher prediction accuracy than genotypes (r = 0.07 and 0.15, for females and males, respectively) for starvation resistance, similar prediction accuracy for chill coma recovery (null for both models and sexes), and lower prediction accuracy for startle response (r = 0.15 and 0.14 for female and male genotypes, respectively; and r = 0.12 and 0.11, for female and male transcripts, respectively). Models including both genotype and expression levels did not outperform the best single component model. However, accuracy increased considerably for all three traits when we included gene ontology (GO) category as an additional layer of information for both genomic variants and transcripts. We found strongly predictive GO terms for each of the three traits, some of which had a clear plausible biological interpretation. For example, for starvation resistance in females, GO:0033500 (r = 0.39 for transcripts) and GO:0032870 (r = 0.40 for transcripts) have been implicated in carbohydrate homeostasis and cellular response to hormone stimulus (including the insulin receptor signaling pathway), respectively. In summary, this study shows that integrating different sources of information improved prediction accuracy and helped elucidate the genetic architecture of three Drosophila complex phenotypes.


1999 ◽  
Vol 26 (2) ◽  
pp. 100-106 ◽  
Author(s):  
A. K. Culbreath ◽  
J. W. Todd ◽  
D. W. Gorbet ◽  
S. L. Brown ◽  
J. A. Baldwin ◽  
...  

Abstract Epidemics of tomato spotted wilt, caused by tomato spotted wilt Tospovirus (TSWV), were monitored in field plots of runner-type peanut (Arachis hypogaea L.) cultivars Georgia Green and Georgia Runner and numerous breeding lines from four different breeding programs as part of efforts toward characterizing breeding lines with potential for release as cultivars. Breeding lines were divided into early, medium and late maturity groups. The tests were conducted near Attapulgus, GA and Marianna, FL in 1997 and in Tifton, GA and Marianna, FL in 1998. Epidemics in some early and medium maturing breeding lines, including some genotypes with high oleic acid oil chemistry, were comparable to those in Georgia Green, the cultivar most frequently used in the southeastern U.S. for suppression of spotted wilt epidemics. No early maturing breeding lines had lower spotted wilt final intensity ratings or higher yields than Georgia Green. However, spotted wilt intensity ratings in some late maturing lines and a smaller number of medium maturing lines were significantly lower than those of Georgia Green. Several of those lines also produced greater pod yields than Georgia Green. Results from these experiments indicated that there is potential for improving management of spotted wilt through development of cultivars that suppress spotted wilt epidemics more than currently available cultivars. There was no indication that differences in spotted wilt ratings corresponded to differences in numbers of thrips adults or larvae.


2021 ◽  
Author(s):  
Lance F Merrick ◽  
Dennis N Lozada ◽  
Xianming Chen ◽  
Arron H Carter

Most genomic prediction models are linear regression models that assume continuous and normally distributed phenotypes, but responses to diseases such as stripe rust (caused by Puccinia striiformis f. sp. tritici) are commonly recorded in ordinal scales and percentages. Disease severity (SEV) and infection type (IT) data in germplasm screening nurseries generally do not follow these assumptions. In this regard, researchers may ignore the lack of normality, transform the phenotypes, use generalized linear models, or use supervised learning algorithms and classification models with no restriction on the distribution of response variables, which are less sensitive when modeling ordinal scores. The goal of this research was to compare classification and regression genomic selection models for skewed phenotypes using stripe rust SEV and IT in winter wheat. We extensively compared both regression and classification prediction models using two training populations composed of breeding lines phenotyped in four years (2016-2018, and 2020) and a diversity panel phenotyped in four years (2013-2016). The prediction models used 19,861 genotyping-by-sequencing single-nucleotide polymorphism markers. Overall, square root transformed phenotypes using rrBLUP and support vector machine regression models displayed the highest combination of accuracy and relative efficiency across the regression and classification models. Further, a classification system based on support vector machine and ordinal Bayesian models with a 2-Class scale for SEV reached the highest class accuracy of 0.99. This study showed that breeders can use linear and non-parametric regression models within their own breeding lines over combined years to accurately predict skewed phenotypes.


2015 ◽  
Vol 2 ◽  
pp. 81 ◽  
Author(s):  
Gilberto Antonio Peripolli Bevilaqua ◽  
Iraja Ferreira Antunes

The common bean has been the object of breeding programs aiming at the development of new cultivars adapted to varied production systems and showing differentiated nutritional characteristics. Owing to their genetic diversity, landraces can be used directly for cropping because they present desirable characteristics. Little information exists about mineral content and other quality traits for these bean landraces. The aim of this paper was to verify the variability of grain nutritional characters in bred cultivars and landraces of bean from Rio Grande do Sul state, Brazil. The experiment was conducted in 2009/2010 at the Cascata Experimental Station of Embrapa Temperate Agriculture. In whole grain of 54 bean genotypes with black and non-black coats, macroelements (nitrogen, phosphorus, potassium, calcium, magnesium and sulfur), oligoelements (iron, manganese, zinc and copper), protein and ash content, insoluble fiber, digestive nutrients and the antioxidant astragalin were determined. The results showed that the landrace varieties present a nutritional composition of macro- and oligoelements, fibers, protein and ash content in whole grain similar to that of breeding lines and cultivars. The black-coated grain from breeding programs showed better nutritional quality for macro- and oligoelement content than coloured grain, notably TB 02-04 and TB 01-01. The landraces with coloured grains TB 02-26, TB 02-24 and TB 03-13 showed high levels of astragalin.


Sports ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 3
Author(s):  
Mauro Mandorino ◽  
António J. Figueiredo ◽  
Gianluca Cima ◽  
Antonio Tessitore

This study aimed to analyze different predictive analytic techniques to forecast the risk of muscle strain injuries (MSI) in youth soccer based on training load data. Twenty-two young soccer players (age: 13.5 ± 0.3 years) were recruited, and an injury surveillance system was applied to record all MSI during the season. Anthropometric data, predicted age at peak height velocity, and skeletal age were collected. The session-RPE method was employed daily to quantify internal training/match load, and monotony, strain, and cumulative load over the weeks were calculated. A countermovement jump (CMJ) test was administered before and after each training/match to quantify players’ neuromuscular fatigue. All these data were used to predict the risk of MSI through different data mining models: Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM). Among them, SVM showed the best predictive ability (area under the curve = 0.84 ± 0.05). Then, a decision tree (DT) algorithm was employed to understand the interactions identified by the SVM model. The rules extracted by DT revealed how the risk of injury could change according to players’ maturity status, neuromuscular fatigue, anthropometric factors, higher workloads, and low recovery status. This approach made it possible to identify MSI and the underlying risk factors.
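The load variables named in this abstract can be sketched as follows, assuming the conventional session-RPE definitions: session load is the RPE rating multiplied by session duration in minutes, weekly monotony is the mean daily load divided by its standard deviation, and weekly strain is the weekly load multiplied by monotony. The abstract does not give the exact formulas used, so these definitions, the choice of the sample standard deviation, and the example week are assumptions.

```python
from statistics import mean, stdev

def session_load(rpe, minutes):
    """Session-RPE load in arbitrary units: perceived exertion x duration."""
    return rpe * minutes

def weekly_summary(daily_loads):
    """Weekly load, monotony (mean / SD of daily load) and strain (load x monotony)."""
    total = sum(daily_loads)
    sd = stdev(daily_loads)  # sample SD; an assumption, some authors may differ
    # A perfectly uniform week has zero SD, so monotony is unbounded
    monotony = mean(daily_loads) / sd if sd > 0 else float("inf")
    return {"load": total, "monotony": monotony, "strain": total * monotony}

# Hypothetical week: (RPE, minutes) per day, with two rest days
week = [(6, 90), (7, 80), (0, 0), (5, 60), (8, 90), (4, 45), (0, 0)]
daily = [session_load(rpe, minutes) for rpe, minutes in week]
summary = weekly_summary(daily)
print(summary)
```

Values like these, computed per player and per week, are the kind of features that could be passed to the classifiers listed in the abstract alongside the jump-test and maturity measurements.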


2020 ◽  
Author(s):  
Miguel Pérez-Enciso ◽  
Laura M. Zingaretti ◽  
Yuliaxis Ramayo-Caldas ◽  
Gustavo de los Campos

Abstract The analysis and prediction of complex traits using microbiome data combined with host genomic information is a topic of utmost interest. However, numerous questions remain to be answered: How useful can the microbiome be for complex trait prediction? Are microbiability estimates reliable? Can the underlying biological links between the host’s genome, microbiome, and the phenome be recovered? Here, we address these issues by (i) developing a novel simulation strategy that uses real microbiome and genotype data as input, and (ii) proposing a variance-component approach which, in the spirit of mediation analyses, quantifies the proportion of phenotypic variance explained by genome and microbiome, and dissects it into direct and indirect effects. The proposed simulation approach can mimic a genetic link between the microbiome and SNP data via a permutation procedure that retains the distributional properties of the data. Results suggest that microbiome data could significantly improve phenotype prediction accuracy, irrespective of whether some abundances are under direct genetic control by the host or not. Overall, random-effects linear methods appear robust for variance components estimation, despite the highly leptokurtic distribution of microbiota abundances. Nevertheless, we observed that accuracy depends in part on the number of microorganisms’ taxa influencing the trait of interest. While we conclude that overall genome-microbiome links can be characterized via variance components, we are less optimistic about the possibility of identifying the causative effects, i.e., individual SNPs affecting abundances; power at this level would require much larger sample sizes than the ones typically available for genome-microbiome-phenome data.

Author summary The microbiome consists of the microorganisms that live in a particular environment, including those in our organism. There is consistent evidence that these communities play an important role in numerous traits of relevance, including disease susceptibility or feed efficiency. Moreover, it has been shown that the microbiome can be relatively stable throughout an individual’s life and that it is affected by the host genome. These reasons have prompted numerous studies to determine whether and how the microbiome can be used for prediction of complex phenotypes, either using the microbiome alone or in combination with the host’s genome data. However, numerous questions remain to be answered, such as the reliability of parameter estimates or the underlying relationship between microbiome, genome, and phenotype. The few available empirical studies do not provide a clear answer to these problems. Here we address these issues by developing a novel simulation strategy and we show that, although the microbiome can significantly help in prediction, it will be difficult to retrieve the actual biological basis of interactions between the microbiome and the trait.

