Genomic selection and complex trait prediction using a fast EM algorithm applied to genome-wide markers

Hybrid wheat breeding is one of the most promising technologies for further sustainable yield increases. However, the cleistogamous nature of wheat represents a major bottleneck for a successful hybrid breeding program. Thus, an optimized breeding strategy that develops appropriate parental lines with favorable combinations of floral traits is the best way to enhance outcrossing ability. This study, therefore, aimed to dissect the genetic basis of various floral traits using a genome-wide association study (GWAS) and to assess the potential of genome-wide prediction (GP) for anther extrusion (AE), visual anther extrusion (VAE), pollen mass (PM), pollen shedding (PSH), pollen viability (PV), anther length (AL), openness of the flower (OPF), duration of floret opening (DFO), and stigma length. To this end, we employed 196 ICARDA spring bread wheat lines evaluated for three years and genotyped with 10,477 polymorphic SNPs. In total, 70 markers were significantly associated with the assessed traits at FDR ≤ 0.05; they explained a small to large proportion of the phenotypic variance (8–26.9%) and affected the traits either positively or negatively. GWAS revealed multi-marker associations among AE, VAE, PM, OPF, and DFO, most likely involving linked markers, suggesting a potential genomic region controlling the genetic association among these complex traits. Of these markers, Kukri_rep_c103359_233 and wsnp_Ex_rep_c107911_91350930 deserve particular attention. The consistently significant markers with large effects could be useful for marker-assisted selection. Genomic selection yielded medium to high prediction accuracies, ranging from 52% to 92% for the assessed traits, with the lowest value observed for stigma length and the highest for visual anther extrusion. This indicates that genomic selection can feasibly be implemented to predict the performance of hybrid floral traits with high reliability.
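
Prediction accuracy in genomic selection studies like this one is commonly reported as the Pearson correlation between observed phenotypes and genomic predictions for lines held out during cross-validation. The abstract does not state the exact validation scheme, so the following Python sketch only illustrates that common metric and a k-fold split; the helper names `pearson_r` and `kfold_indices` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def pearson_r(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation, commonly reported as GS 'prediction accuracy'."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mo = sum(observed) / n
    mp = sum(predicted) / n
    sop = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    soo = sum((o - mo) ** 2 for o in observed)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sop / math.sqrt(soo * spp)


def kfold_indices(n: int, k: int, seed: int = 1) -> List[List[int]]:
    """Shuffle line indices 0..n-1 and deal them into k validation folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]
```

For 196 lines, `kfold_indices(196, 5)` places each line in exactly one of five validation folds; a model trained on the other four folds would then be scored with `pearson_r` on the held-out fold, and the fold scores averaged.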

CAUSALdb: a database for disease/trait causal variants identified using summary statistics of genome-wide association studies

Nucleic Acids Research, 2019. DOI: 10.1093/nar/gkz1026. Cited by ~2.

Author(s): Jianhua Wang, Dandan Huang, Yao Zhou, Hongcheng Yao, Huanhuan Liu, ...

Keyword(s): Fine Mapping, Genetic Variants, Association Studies, Complex Trait, Genome Wide Association, Genome Wide Association Studies, Summary Statistics, Genome Wide, Credible Sets, Causal Variants

Abstract: Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility is still a major challenge. Herein we introduce a new database, CAUSALdb, which integrates the most comprehensive collection of GWAS summary statistics to date and identifies credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to the MeSH ontology, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots that allow credible sets to be visualized efficiently on a single web page; (v) enables online comparison of causal relations at the variant, gene and trait levels among studies with different sample sizes or populations; and (vi) offers comprehensive variant annotations by integrating large-scale base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.
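
CAUSALdb runs three fine-mapping tools that are not named here. As a much simpler illustration of what a credible set is, the sketch below assumes a single causal variant per locus and a uniform prior over variants, converts each variant's effect size and standard error into a Wakefield approximate Bayes factor, normalizes these into posterior inclusion probabilities (PIPs), and then adds variants in decreasing PIP until the chosen coverage is reached. The prior effect variance `prior_w = 0.04` and all function names are assumptions for illustration, not CAUSALdb's settings.

```python
import math
from typing import List, Sequence


def log_abf(beta: float, se: float, prior_w: float = 0.04) -> float:
    """Wakefield log approximate Bayes factor (association vs. no association).

    prior_w is the assumed prior variance of the true effect size.
    """
    v = se * se
    r = prior_w / (v + prior_w)
    z2 = (beta / se) ** 2
    return 0.5 * math.log(v / (v + prior_w)) + 0.5 * z2 * r


def single_causal_pips(betas: Sequence[float], ses: Sequence[float],
                       prior_w: float = 0.04) -> List[float]:
    """Posterior inclusion probabilities assuming exactly one causal variant."""
    labf = [log_abf(b, s, prior_w) for b, s in zip(betas, ses)]
    top = max(labf)  # subtract the max before exponentiating to avoid overflow
    w = [math.exp(x - top) for x in labf]
    total = sum(w)
    return [x / total for x in w]


def credible_set(ids: Sequence[str], pips: Sequence[float],
                 coverage: float = 0.95) -> List[str]:
    """Smallest set of top-ranked variants whose PIPs sum to >= coverage."""
    ranked = sorted(zip(ids, pips), key=lambda t: t[1], reverse=True)
    chosen: List[str] = []
    cumulative = 0.0
    for variant, pip in ranked:
        chosen.append(variant)
        cumulative += pip
        if cumulative >= coverage:
            break
    return chosen
```

With effect sizes 0.30, 0.05, 0.28 and 0.01 (standard error 0.05 each), the first and third variants form the 95% credible set, because the top variant alone carries roughly 0.9 of the posterior probability. Dedicated fine-mapping tools relax the single-causal-variant assumption by modeling linkage disequilibrium between variants.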

Resolving the Effects of Maternal and Offspring Genotype on Dyadic Outcomes in Genome Wide Complex Trait Analysis (“M-GCTA”)

Behavior Genetics, 2014, Vol 44 (5), pp. 445-455. DOI: 10.1007/s10519-014-9666-6. Cited by ~40.

Author(s): Lindon J. Eaves, Beate St. Pourcain, George Davey Smith, Timothy P. York, David M. Evans

Keyword(s): Complex Trait, Trait Analysis, Genome Wide, Offspring Genotype

Ideas in Genomic Selection that Transformed Plant Molecular Breeding: A Review

2020. DOI: 10.20944/preprints202010.0460.v1.

Author(s): Matthew McGowan, Zhiwu Zhang, Jiabo Wang, Haixiao Dong, Xiaolei Liu, ...

Keyword(s): Genomic Selection, Genetic Markers, Random Effects, Genome Wide Association Study, Single Step, Hybrid Breeding, Multiple Trait, Underlying Distribution, Genome Wide, Before And After

Estimation of breeding values through Best Linear Unbiased Prediction (BLUP) using pedigree-based kinship and Marker-Assisted Selection (MAS) are the two fundamental breeding methods used before and after the introduction of genetic markers, respectively. The emergence of high-density genome-wide markers has led to the development of two parallel series of approaches inspired by BLUP and MAS, which are collectively referred to as Genomic Selection (GS). The first series of GS methods alters pedigree-based BLUP by replacing pedigree-based kinship with marker-based kinship in a variety of ways, including weighting markers by their effects in a genome-wide association study (GWAS), joining pedigree- and marker-based kinship together in a single-step BLUP, and substituting individuals with groups in a compressed BLUP. The second series of GS methods estimates the effects of all genetic markers simultaneously and sums them regardless of their individual significance. Instead of fitting individuals as random effects, as in the BLUP series, the second series fits markers as random effects. Differing assumptions about the underlying distribution of these marker effects have led to the development of many Bayesian GS methods. This review highlights critical conceptual developments in both series and explores ongoing GS developments in machine learning, multiple-trait selection, and adaptation for hybrid breeding. Furthermore, given the increasing use and variety of GS methods in plant breeding programs, this review addresses important concerns for future GS development and application, such as the use of GWAS-assisted GS, the long-term effectiveness of GS methods, and the valid assessment of prediction accuracy.
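
The first series replaces pedigree kinship with marker-based kinship. One widely used marker-based kinship is VanRaden's first genomic relationship matrix, G = ZZ'/(2 Σ p_j(1 - p_j)), where Z holds 0/1/2 allele counts centered by twice the allele frequency p_j. The review does not prescribe this particular estimator; the stdlib-only Python sketch below, with the hypothetical function name `vanraden_g`, only shows how such a G can be built before it is used in BLUP in place of the pedigree relationship matrix.

```python
from typing import List, Sequence


def vanraden_g(markers: Sequence[Sequence[int]]) -> List[List[float]]:
    """Genomic relationship matrix G = ZZ' / (2 * sum p_j (1 - p_j)).

    markers[i][j] is the number (0, 1 or 2) of copies of the counted allele
    carried by individual i at marker j.
    """
    n, m = len(markers), len(markers[0])
    # Frequency of the counted allele at each marker.
    p = [sum(row[j] for row in markers) / (2.0 * n) for j in range(m)]
    denom = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    # Center each genotype by its expected count 2p.
    z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in markers]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / denom for k in range(n)]
            for i in range(n)]
```

Monomorphic markers contribute nothing to either the numerator or the scaling term, so they can be kept or filtered without changing G.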

Genome-wide association study and genomic selection for plant height, maturity, seed weight, and yield in soybean

2019. DOI: 10.21203/rs.2.17481/v1.

Author(s): Waltram Ravelombola, Jun Qin, Ainong Shi, Fengmin Wang, Yan Feng, ...

Keyword(s): Association Study, Genomic Selection, Candidate Genes, Plant Height, Seed Weight, Genome Wide Association Study, Agronomic Traits, SNP Markers, Genome Wide Association, Genome Wide

Abstract
Background: Soybean [Glycine max (L.) Merr.] is a legume of great interest worldwide. Enhancing genetic gain for agronomic traits via molecular approaches has long been considered the main task of soybean breeders and geneticists. The objectives of this study were to evaluate maturity, plant height, seed weight, and yield in a diverse soybean accession panel, to conduct a genome-wide association study (GWAS) for these traits and identify SNP markers associated with the four traits, and to assess genomic selection (GS) accuracy.
Results: A total of 250 soybean accessions were evaluated for maturity, plant height, seed weight, and yield over three years. This panel was genotyped with a total of 10,259 high-quality SNPs derived from genotyping by sequencing (GBS). GWAS was performed using a Bayesian Information and Linkage Disequilibrium Iteratively Nested Keyway (BLINK) model, and GS was evaluated using a ridge regression best linear unbiased predictor (rrBLUP) model. The results revealed that a total of 20, 31, 37, 31, and 23 SNPs were significantly associated with the average 3-year data for maturity, plant height, seed weight, and yield, respectively. Some significant SNPs mapped to previously described loci (E2, E4, and Dt1) affecting maturity and plant height in soybean, and a new locus on chromosome 20 was significantly associated with plant height. Glyma.10g228900, Glyma.19g200800, Glyma.09g196700, and Glyma.09g038300 were candidate genes found in the vicinity of the top or second-best SNP for maturity, plant height, seed weight, and yield, respectively. An 11.5-Mb region of chromosome 10 was associated with both seed weight and yield, and GS accuracy was trait-, year-, and population structure-dependent.
Conclusions: The SNP markers identified in this study for plant height, maturity, seed weight, and yield can be used to improve these four agronomic traits through marker-assisted selection (MAS) and GS in soybean breeding programs. After validation, the candidate genes can be transferred to new cultivars using SNP markers through MAS. The high GS accuracy confirms that the four agronomic traits can be selected in molecular breeding through GS.
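
The rrBLUP model named above belongs to the GS series that fits markers as random effects: the marker effects b solve the ridge system (Z'Z + λI)b = Z'(y - ȳ), with λ = σ²_e/σ²_b. The rrBLUP R package estimates the variance components by REML; the stdlib-only Python sketch below simply takes λ as an input, centers the genotype columns so the intercept equals the phenotypic mean, and uses hypothetical function names throughout.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def ridge_marker_effects(z: Sequence[Sequence[float]], y: Sequence[float],
                         lam: float) -> Tuple[List[float], List[float]]:
    """Return (column means, effects b) solving (Zc'Zc + lam*I) b = Zc'(y - ybar)."""
    n, m = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(m)]
    zc = [[row[j] - means[j] for j in range(m)] for row in z]
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    lhs = [[sum(zc[i][j] * zc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(m)] for j in range(m)]
    rhs = [sum(zc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    return means, solve(lhs, rhs)


def gebv(z: Sequence[Sequence[float]], means: Sequence[float],
         effects: Sequence[float]) -> List[float]:
    """Genomic estimated breeding values: centered genotypes times effects."""
    return [sum((x - mu) * b for x, mu, b in zip(row, means, effects)) for row in z]
```

Larger λ shrinks all marker effects toward zero together, which is what lets ridge regression handle far more markers than lines; with thousands of SNPs one would use a linear-algebra library rather than this explicit solver.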

Opportunities and limits of combining microbiome and genome data for complex trait prediction

2020. DOI: 10.1101/2020.10.05.325977.

Author(s): Miguel Pérez-Enciso, Laura M. Zingaretti, Yuliaxis Ramayo-Caldas, Gustavo de los Campos

Keyword(s): Complex Traits, Variance Components, Complex Trait, Parameter Estimates, Phenotypic Variance, Genome Data, Individual SNPs, Simulation Strategy, Microbiome Data, Trait Prediction

Abstract: The analysis and prediction of complex traits using microbiome data combined with host genomic information is a topic of utmost interest. However, numerous questions remain to be answered: How useful can the microbiome be for complex trait prediction? Are microbiability estimates reliable? Can the underlying biological links between the host's genome, microbiome, and phenome be recovered? Here, we address these issues by (i) developing a novel simulation strategy that uses real microbiome and genotype data as input, and (ii) proposing a variance-component approach which, in the spirit of mediation analyses, quantifies the proportion of phenotypic variance explained by the genome and the microbiome and dissects it into direct and indirect effects. The proposed simulation approach can mimic a genetic link between the microbiome and SNP data via a permutation procedure that retains the distributional properties of the data. Results suggest that microbiome data could significantly improve phenotype prediction accuracy, irrespective of whether some abundances are under direct genetic control by the host. Overall, random-effects linear methods appear robust for variance component estimation, despite the highly leptokurtic distribution of microbiota abundances. Nevertheless, we observed that accuracy depends in part on the number of microbial taxa influencing the trait of interest. While we conclude that overall genome-microbiome links can be characterized via variance components, we are less optimistic about the possibility of identifying the causative effects, i.e., individual SNPs affecting abundances; power at this level would require much larger sample sizes than those typically available for genome-microbiome-phenome data.

Author summary: The microbiome consists of the microorganisms that live in a particular environment, including those in our own bodies. There is consistent evidence that these communities play an important role in numerous traits of relevance, including disease susceptibility or feed efficiency. Moreover, it has been shown that the microbiome can be relatively stable throughout an individual's life and that it is affected by the host genome. For these reasons, numerous studies have sought to determine whether and how the microbiome can be used to predict complex phenotypes, either alone or in combination with the host's genome data. However, numerous questions remain to be answered, such as the reliability of parameter estimates or the nature of the underlying relationship between microbiome, genome, and phenotype. The few available empirical studies do not provide a clear answer to these problems. Here we address these issues by developing a novel simulation strategy, and we show that, although the microbiome can significantly help in prediction, it will be difficult to retrieve the actual biological basis of interactions between the microbiome and the trait.
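
A common way to write a genome-plus-microbiome variance-component model of the kind this abstract builds on is sketched below. The symbols are assumptions for illustration: G is a genomic relationship matrix, B a microbial similarity matrix (for example, computed from standardized abundances), and h² and m² are the heritability and microbiability. This simple form treats the genomic and microbial effects as independent, so it does not reproduce the authors' mediation-style split into direct and indirect effects, which is needed precisely when abundances are partly under host genetic control.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{1}\mu + \mathbf{g} + \mathbf{b} + \mathbf{e},\\
\mathbf{g} &\sim \mathcal{N}(\mathbf{0}, \mathbf{G}\sigma_g^2),\quad
\mathbf{b} \sim \mathcal{N}(\mathbf{0}, \mathbf{B}\sigma_b^2),\quad
\mathbf{e} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}\sigma_e^2),\\
h^2 &= \frac{\sigma_g^2}{\sigma_g^2 + \sigma_b^2 + \sigma_e^2},\qquad
m^2 = \frac{\sigma_b^2}{\sigma_g^2 + \sigma_b^2 + \sigma_e^2}.
\end{aligned}
```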

Strategies to Increase Prediction Accuracy in Genomic Selection of Complex Traits in Alfalfa (Medicago sativa L.)

Cells, 2021, Vol 10 (12), pp. 3372. DOI: 10.3390/cells10123372.

Author(s): Cesar A. Medina, Harpreet Kaur, Ian Ray, Long-Xi Yu

Keyword(s): Salt Stress, Abiotic Stress, Medicago Sativa, Genomic Selection, Complex Traits, Prediction Accuracy, Breeding Value, Phenotypic Traits, Genome Wide, Medicago Sativa L

Agronomic traits such as biomass yield and abiotic stress tolerance are genetically complex and challenging to improve through conventional breeding approaches. Genomic selection (GS) is an alternative approach in which genome-wide markers are used to determine the genomic estimated breeding value (GEBV) of individuals in a population. In alfalfa (Medicago sativa L.), previous results indicated low to moderate prediction accuracies (<70%) for complex traits such as yield and abiotic stress resistance. Prediction accuracy needs to increase before GS can be employed in breeding programs. In this paper, we reviewed different statistical models and their applications in polyploid crops, such as alfalfa and potato. Specifically, we used empirical data on alfalfa yield under salt stress to investigate approaches that use, in weighted GBLUP analyses, DNA marker importance values derived from machine learning models and marker-trait association scores from genome-wide association studies (GWAS) based on different GWASpoly models. This approach increased prediction accuracies from 50% to more than 80% for alfalfa yield under salt stress. Finally, we extended the weighted GBLUP approach to potato, analyzed 13 phenotypic traits, and obtained similar results. This is the first report in alfalfa to use variable importance and GWAS-assisted approaches to increase the prediction accuracy of GS, thus helping to select superior alfalfa lines based on their GEBVs.
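
In weighted GBLUP, each marker's contribution to the genomic relationship matrix is scaled by a weight, here derived from GWAS evidence. The abstract does not give the paper's exact weighting from machine-learning importance values or GWASpoly scores, so this sketch assumes weights proportional to -log10(p), rescaled to average 1 so that equal weights recover the unweighted VanRaden matrix. Both function names are hypothetical.

```python
import math
from typing import List, Sequence


def weights_from_pvalues(pvalues: Sequence[float]) -> List[float]:
    """Hypothetical weighting: -log10(p) per marker, rescaled to mean 1."""
    scores = [-math.log10(p) for p in pvalues]
    mean = sum(scores) / len(scores)
    # If every p equals 1, mean is 0 and the division below fails.
    return [s / mean for s in scores]


def weighted_g(markers: Sequence[Sequence[int]],
               weights: Sequence[float]) -> List[List[float]]:
    """G_w = Z D Z' / (2 * sum p_j (1 - p_j)) with D = diag(weights)."""
    n, m = len(markers), len(markers[0])
    p = [sum(row[j] for row in markers) / (2.0 * n) for j in range(m)]
    denom = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    # Center 0/1/2 allele counts by their expected value 2p.
    z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in markers]
    return [[sum(z[i][j] * weights[j] * z[k][j] for j in range(m)) / denom
             for k in range(n)] for i in range(n)]
```

Passing all-equal weights reproduces the ordinary VanRaden G, so the effect of the weighting can be compared directly against unweighted GBLUP on the same cross-validation folds.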
