Genome-enabled prediction models for black tea (Camellia sinensis) quality and drought tolerance traits

Mapping Intimacies ◽

10.1101/850792 ◽

2019 ◽

Author(s):

Robert K. Koech ◽

Pelly M. Malebe ◽

Christopher Nyarukowa ◽

Richard Mose ◽

Samson M. Kamunya ◽

...

Keyword(s):

Drought Tolerance ◽

Genomic Selection ◽

Prediction Models ◽

Black Tea ◽

Future Application ◽

Phenotypic Traits ◽

List Type ◽

Putative Qtls ◽

Validation Population ◽

Prediction Approach

Summary: Genomic selection in tea (Camellia sinensis) breeding has the potential to accelerate the selection of parents with desirable traits at the seedling stage. The study evaluated different genome-enabled prediction models for black tea quality and drought tolerance traits in discovery and validation populations. The discovery population comprised two segregating tea populations (TRFK St. 504 and TRFK St. 524) with 255 F1 progenies, and the validation population comprised 56 individual tea cultivars, genotyped using 1,421 DArTseq markers. Two-fold cross-validation was used for training the prediction models in the discovery population, and the best prediction models were subsequently fitted to the validation population. Of the four prediction approaches, the putative QTLs (Quantitative Trait Loci) + annotated proteins + KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway-based approach showed robustness and usefulness in predicting phenotypes. The Extreme Learning Machine model had better prediction ability for catechin, astringency, brightness, briskness, and colour based on the putative QTLs + annotated proteins + KEGG pathway approach. The percent variable importance of putatively annotated proteins and KEGG pathways was associated with the phenotypic traits. The findings have for the first time opened up a new avenue for future application of genomic selection in tea breeding.
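The two-fold cross-validation step mentioned in the summary can be sketched as follows. This is a minimal illustration, not the authors' pipeline: ridge regression stands in for the models the study compared (such as the Extreme Learning Machine), the marker matrix and phenotypes are synthetic, and names such as `two_fold_cv`, `ridge_fit` and the penalty `lam` are assumptions made for this example. Prediction ability is taken here as the Pearson correlation between predicted and observed phenotypes in the held-out half.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(X, y, lam):
    """Fit ridge regression on centred data; returns (intercept, marker effects)."""
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    Xc = [[row[j] - mx[j] for j in range(p)] for row in X]
    yc = [v - my for v in y]
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    beta = solve(A, b)
    intercept = my - sum(bj * mj for bj, mj in zip(beta, mx))
    return intercept, beta


def predict(model, X):
    intercept, beta = model
    return [intercept + sum(b * v for b, v in zip(beta, row)) for row in X]


def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def two_fold_cv(X, y, lam=1.0, seed=0):
    """Split individuals into two halves; train on one, predict the other, then swap."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    folds = [idx[:half], idx[half:]]
    abilities = []
    for k in (0, 1):
        test, train = folds[k], folds[1 - k]
        model = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        preds = predict(model, [X[i] for i in test])
        abilities.append(pearson(preds, [y[i] for i in test]))
    return folds, abilities


# Synthetic data (hypothetical): 40 individuals, 5 markers coded 0/1/2.
rng = random.Random(42)
true_effects = [1.0, -0.5, 0.8, 0.0, 0.3]
X = [[rng.choice([0, 1, 2]) for _ in true_effects] for _ in range(40)]
y = [sum(b * v for b, v in zip(true_effects, row)) + rng.gauss(0, 0.1) for row in X]

folds, abilities = two_fold_cv(X, y)
print("prediction ability per fold:", abilities)
```

In a real analysis the split would usually be repeated over several random seeds and the prediction abilities averaged, since a single two-fold split can be noisy with a few hundred individuals.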

Genome‐enabled prediction models for black tea (Camellia sinensis) quality and drought tolerance traits

Plant Breeding ◽

10.1111/pbr.12813 ◽

2020 ◽

Vol 139 (5) ◽

pp. 1003-1015

Author(s):

Robert K. Koech ◽

Pelly M. Malebe ◽

Christopher Nyarukowa ◽

Richard Mose ◽

Samson M. Kamunya ◽

...

Keyword(s):

Drought Tolerance ◽

Camellia Sinensis ◽

Prediction Models ◽

Black Tea

Combined linkage and association mapping of putative QTLs controlling black tea quality and drought tolerance traits

Euphytica ◽

10.1007/s10681-019-2483-5 ◽

2019 ◽

Vol 215 (10) ◽

Cited By ~ 2

Author(s):

Robert K. Koech ◽

Richard Mose ◽

Samson M. Kamunya ◽

Zeno Apostolides

Keyword(s):

Drought Tolerance ◽

Association Mapping ◽

Black Tea ◽

Putative Qtls ◽

Tea Quality

Combined linkage and association mapping of putative QTLs controlling black tea quality and drought tolerance traits

10.1101/458596 ◽

2018 ◽

Author(s):

Robert K. Koech ◽

Richard Mose ◽

Samson M. Kamunya ◽

Zeno Apostolides

Keyword(s):

Association Mapping ◽

Carbon Fixation ◽

Black Tea ◽

Interval Mapping ◽

Phenotypic Traits ◽

Phenotypic Variance ◽

Quality Traits ◽

Putative Qtls ◽

Tea Quality ◽

Population Structure Analysis

Abstract: Advancements in genotyping have opened new approaches for the identification and precise mapping of Quantitative Trait Loci (QTLs) in plants, particularly by combining linkage and association mapping (AM) analysis. In this study, a combination of linkage and AM approaches was used to identify and authenticate putative QTLs associated with black tea quality traits and percent relative water content (%RWC). The population structure analysis clustered two parents and their respective 261 F1 progenies from the two reciprocal crosses into two clusters, with 141 tea accessions in cluster one and 122 tea accessions in cluster two. The two clusters were of mixed origin, with tea accessions in population TRFK St. 504 clustering together with tea accessions in population TRFK St. 524. A total of 71 putative QTLs linked to black tea quality traits and %RWC were detected using the interval mapping (IM) method and were used as cofactors in multiple QTL model (MQM) mapping, where 46 putative QTLs were detected. The phenotypic variance explained by each QTL ranged from 2.8–23.3% in IM and 4.1–23% in MQM mapping. Using the Q-model and Q+K-model in AM, a total of 49 DArTseq markers were associated with 16 phenotypic traits. Significant marker-trait associations in AM were similar to those obtained in IM and MQM mapping, except for six more putative QTLs detected in AM, which are involved in biosynthesis of secondary metabolites, carbon fixation and abiotic stress. The combined linkage and AM approach appears to have great potential to improve the selection of desirable traits in tea breeding.

A review of deep learning applications for genomic selection

BMC Genomics ◽

10.1186/s12864-020-07319-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Osval Antonio Montesinos-López ◽

Abelardo Montesinos-López ◽

Paulino Pérez-Rodríguez ◽

José Alberto Barrón-López ◽

Johannes W. R. Martini ◽

...

Keyword(s):

Deep Learning ◽

Plant Breeding ◽

Genomic Selection ◽

Genomic Prediction ◽

Mixed Model ◽

Prediction Models ◽

Genetic Effect ◽

Training Data ◽

Additive Genetic Effect ◽

Main Body

Abstract. Background: Several conventional genomic Bayesian (or non-Bayesian) prediction methods have been proposed, including the standard additive genetic effect model, for which the variance components are estimated with mixed model equations. In recent years, deep learning (DL) methods have been considered in the context of genomic prediction. DL methods are nonparametric models that provide the flexibility to adapt to complicated associations between data and output, including very complex patterns. Main body: We review the applications of DL methods in genomic selection (GS) to obtain a meta-picture of GS performance and highlight how these tools can help solve challenging plant breeding problems. We also provide general guidance for the effective use of DL methods, including the fundamentals of DL and the requirements for its appropriate use. We discuss the pros and cons of this technique compared to traditional genomic prediction approaches, as well as the current trends in DL applications. Conclusions: The main requirement for using DL is high-quality and sufficiently large training data. Based on the current literature on GS in plant and animal breeding, we did not find clear superiority of DL in terms of prediction power compared to conventional genome-based prediction models. Nevertheless, there is clear evidence that DL algorithms capture nonlinear patterns more efficiently than conventional genome-based models. Deep learning algorithms are able to integrate data from different sources, as is usually needed in GS-assisted breeding, and they show the ability to improve prediction accuracy for large plant breeding data. It is important to apply DL to large training-testing data sets.

Forecasting the risk at infractions: an ensemble comparison of machine learning approach

Industrial Management & Data Systems ◽

10.1108/imds-10-2020-0603 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Lei Li ◽

Desheng Wu

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Short Term Memory ◽

Model Performance ◽

Large Data ◽

Support Vector ◽

Learning Approaches ◽

Content Type ◽

Day To Day Operations ◽

Prediction Approach

Purpose: The infraction of securities regulations (ISRs) by listed firms in their day-to-day operations and management has become a common problem. This paper proposes several machine learning approaches to forecast the risk of infractions by listed corporations, addressing supervision that is currently neither effective nor precise. Design/methodology/approach: The proposed research framework for forecasting ISRs includes data collection and cleaning, feature engineering, data splitting, application of the prediction approaches and model performance evaluation. We select Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machines, Artificial Neural Network and Long Short-Term Memory Networks (LSTMs) as ISR prediction models. Findings: The results show that models that include prior infractions predict ISRs significantly better than those without, especially for large sample sets. The results also indicate that, when judging whether a company has infractions, we should pay attention to novel artificial intelligence methods, the company's previous infractions, and large data sets. Originality/value: The findings could be used, to a certain degree, to address the problem of identifying listed corporations' ISRs. Overall, the results elucidate the value of prior infractions of securities regulations. This shows the importance of including more data sources when constructing distress models rather than focusing only on building increasingly complex models on the same data. This is also beneficial to the regulatory authorities.

Classification and Regression Models for Genomic Selection of Skewed Phenotypes: A Case for Disease Resistance in Winter Wheat (Triticum aestivum L.)

10.1101/2021.12.16.472985 ◽

2021 ◽

Author(s):

Lance F Merrick ◽

Dennis N Lozada ◽

Xianming Chen ◽

Arron H Carter

Keyword(s):

Support Vector Machine ◽

Winter Wheat ◽

Genomic Selection ◽

Stripe Rust ◽

Regression Models ◽

Prediction Models ◽

Support Vector ◽

Classification Models ◽

Breeding Lines ◽

Classification And Regression

Most genomic prediction models are linear regression models that assume continuous and normally distributed phenotypes, but responses to diseases such as stripe rust (caused by Puccinia striiformis f. sp. tritici) are commonly recorded in ordinal scales and percentages. Disease severity (SEV) and infection type (IT) data in germplasm screening nurseries generally do not follow these assumptions. In this regard, researchers may ignore the lack of normality, transform the phenotypes, use generalized linear models, or use supervised learning algorithms and classification models with no restriction on the distribution of response variables, which are less sensitive when modeling ordinal scores. The goal of this research was to compare classification and regression genomic selection models for skewed phenotypes using stripe rust SEV and IT in winter wheat. We extensively compared both regression and classification prediction models using two training populations composed of breeding lines phenotyped in four years (2016-2018 and 2020) and a diversity panel phenotyped in four years (2013-2016). The prediction models used 19,861 genotyping-by-sequencing single-nucleotide polymorphism markers. Overall, square root transformed phenotypes using rrBLUP and support vector machine regression models displayed the highest combination of accuracy and relative efficiency across the regression and classification models. Further, a classification system based on support vector machine and ordinal Bayesian models with a 2-Class scale for SEV reached the highest class accuracy of 0.99. This study showed that breeders can use linear and non-parametric regression models within their own breeding lines over combined years to accurately predict skewed phenotypes.

Forecasting COVID-19 Dynamics and Endpoint in Bangladesh: A Data-driven Approach

10.1101/2020.06.26.20140905 ◽

2020 ◽

Author(s):

Al-Ekram Elahee Hridoy ◽

Mohammad Naim ◽

Nazim Uddin Emon ◽

Imrul Hasan Tipo ◽

Safayet Alam ◽

...

Keyword(s):

Social Life ◽

Inflection Point ◽

Prediction Models ◽

Short Term Memory ◽

Data Driven ◽

World Health ◽

Estimation Methods ◽

Atypical Pneumonia ◽

List Type ◽

Logistic Curve

Abstract: On December 31, 2019, the World Health Organization (WHO) was informed that atypical pneumonia-like cases had emerged in Wuhan City, Hubei province, China. WHO identified the cause as a novel coronavirus and declared a global pandemic on March 11th, 2020. At the time of writing, COVID-19 had claimed more than 440 thousand lives worldwide and had driven the global economy and social life to the edge of an abyss not seen in living memory. As of now, confirmed cases in Bangladesh have surpassed 100 thousand, with more than 1343 deaths, raising serious concern among policymakers and health professionals; thus, prediction models are necessary to forecast the possible number of cases in the future. To shed light on this, we present data-driven estimation methods, Long Short-Term Memory (LSTM) networks and Logistic Curve methods, to predict the possible number of COVID-19 cases in Bangladesh for the upcoming months. The Logistic Curve results suggest that Bangladesh passed the inflection point around 28-30 May 2020, with a plausible end date of the 2nd of January 2021, and that the total number of infected people is expected to be between 187 thousand and 193 thousand, on the assumption that stringent policies are in place. The logistic curve also suggested that Bangladesh would reach peak COVID-19 cases at the end of August, with more than 185 thousand total confirmed cases, and that around 6000 thousand daily new cases may be observed. Our findings recommend that containment strategies should be implemented immediately to reduce the transmission and epidemic rate of COVID-19 in the coming days.

Highlights: According to the Logistic curve fitting analysis, the inflection point of the COVID-19 pandemic has recently passed, approximately between May 28, 2020, and May 30, 2020. It is estimated that the total number of confirmed cases will be around 187-193 thousand at the end of the epidemic. We expect that the actual number will most likely lie between these two values, under the assumption that the current transmission is stable and improved stringent policies will be in place to contain the spread of COVID-19. The estimated total death toll will be around 3600-4000 at the end of the epidemic. The epidemic of COVID-19 in Bangladesh will be mostly under control by the 2nd of January 2021 if stringent measures are taken immediately.
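The notion of an inflection point on a logistic curve can be illustrated with a short sketch. It assumes the standard three-parameter logistic form C(t) = K / (1 + exp(-r (t - t0))), which the abstract does not spell out. The parameter values below are hypothetical and are not the fitted values from the paper. Under this form, the inflection point falls at t = t0, where cumulative cases equal K/2 and the growth rate peaks at rK/4.

```python
import math


def logistic_cumulative(t, K, r, t0):
    """Cumulative cases under a logistic curve with final size K, rate r, midpoint t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def logistic_growth_rate(t, K, r, t0):
    """Instantaneous growth rate dC/dt = r * C * (1 - C / K)."""
    c = logistic_cumulative(t, K, r, t0)
    return r * c * (1.0 - c / K)


# Hypothetical parameters, chosen only for illustration.
K, r, t0 = 190_000.0, 0.05, 80

days = range(0, 200)
peak_day = max(days, key=lambda t: logistic_growth_rate(t, K, r, t0))

print("day of fastest growth (inflection point):", peak_day)
print("cumulative cases on that day:", logistic_cumulative(peak_day, K, r, t0))
print("growth rate on that day:", logistic_growth_rate(peak_day, K, r, t0))
```

Because the inflection point sits at half the final size, an estimate of the inflection date together with the case count on that date gives a rough estimate of the final epidemic size, which is one reason the abstract reports both quantities.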

Strategies to Increase Prediction Accuracy in Genomic Selection of Complex Traits in Alfalfa (Medicago sativa L.)

Cells ◽

10.3390/cells10123372 ◽

2021 ◽

Vol 10 (12) ◽

pp. 3372

Author(s):

Cesar A. Medina ◽

Harpreet Kaur ◽

Ian Ray ◽

Long-Xi Yu

Keyword(s):

Salt Stress ◽

Abiotic Stress ◽

Medicago Sativa ◽

Genomic Selection ◽

Complex Traits ◽

Prediction Accuracy ◽

Breeding Value ◽

Phenotypic Traits ◽

Genome Wide ◽

Medicago Sativa L

Agronomic traits such as biomass yield and abiotic stress tolerance are genetically complex and challenging to improve through conventional breeding approaches. Genomic selection (GS) is an alternative approach in which genome-wide markers are used to determine the genomic estimated breeding value (GEBV) of individuals in a population. In alfalfa (Medicago sativa L.), previous results indicated that low to moderate prediction accuracy values (<70%) were obtained in complex traits, such as yield and abiotic stress resistance. There is a need to increase the prediction accuracy in order to employ GS in breeding programs. In this paper we reviewed different statistical models and their applications in polyploid crops, such as alfalfa and potato. Specifically, we used empirical data on alfalfa yield under salt stress to investigate approaches that use DNA marker importance values derived from machine learning models, and genome-wide association study (GWAS) marker-trait association scores based on different GWASpoly models, in weighted GBLUP analyses. This approach increased prediction accuracies from 50% to more than 80% for alfalfa yield under salt stress. Finally, we extended the weighted GBLUP approach to potato, analyzed 13 phenotypic traits and obtained similar results. This is the first report on alfalfa to use variable importance and GWAS-assisted approaches to increase the prediction accuracy of GS, thus helping to select superior alfalfa lines based on their GEBVs.
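The weighting idea behind weighted GBLUP can be sketched as a genomic relationship matrix in which each marker contributes in proportion to a weight, for example a machine learning importance value or a GWAS score. The sketch below is an illustration under stated assumptions, not the paper's method: it uses diploid 0/1/2 marker coding and a VanRaden-style scaling, whereas alfalfa is autotetraploid, and the function name `weighted_grm`, the rescaling of weights to mean 1 and the toy genotypes are all hypothetical.

```python
def weighted_grm(M, weights=None):
    """Weighted genomic relationship matrix G = Z D Z' / (2 * sum(p * (1 - p))).

    M is a list of individuals, each a list of marker dosages coded 0/1/2
    (a diploid simplification). D is a diagonal matrix of marker weights,
    rescaled here to mean 1 so that equal weights give the unweighted matrix.
    """
    n, m = len(M), len(M[0])
    if weights is None:
        weights = [1.0] * m
    scale = m / sum(weights)
    w = [wj * scale for wj in weights]
    # Allele frequency per marker, then centre dosages by 2p.
    p = [sum(row[j] for row in M) / (2.0 * n) for j in range(m)]
    Z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in M]
    denom = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    return [[sum(w[j] * Z[a][j] * Z[b][j] for j in range(m)) / denom
             for b in range(n)] for a in range(n)]


# Toy genotypes (hypothetical): 4 individuals, 3 markers.
M = [[0, 1, 2],
     [1, 1, 0],
     [2, 0, 1],
     [1, 2, 1]]

G_plain = weighted_grm(M)
G_weighted = weighted_grm(M, weights=[5.0, 1.0, 1.0])  # first marker up-weighted

print(G_plain[0][0], G_weighted[0][0])
```

The resulting matrix would replace the ordinary relationship matrix in a GBLUP mixed model; fitting that model is outside the scope of this sketch.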

Characterizing the oligogenic architecture of plant growth phenotypes informs genomic selection approaches in a common wheat population

BMC Genomics ◽

10.1186/s12864-021-07574-6 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Noah DeWitt ◽

Mohammed Guedira ◽

Edwin Lauer ◽

J. Paul Murphy ◽

David Marshall ◽

...

Keyword(s):

Genetic Variation ◽

Plant Growth ◽

Genomic Selection ◽

Plant Height ◽

Prediction Models ◽

Growth Traits ◽

Heading Date ◽

Additive Genetic Variation ◽

Moderate Effect ◽

Over Time

Abstract. Background: Genetic variation in growth over the course of the season is a major source of grain yield variation in wheat, and for this reason variants controlling heading date and plant height are among the best-characterized in wheat genetics. While the major variants for these traits have been cloned, the importance of these variants in contributing to genetic variation for plant growth over time is not fully understood. Here we develop a biparental population segregating for major variants for both plant height and flowering time to characterize the genetic architecture of the traits and identify additional novel QTL. Results: We find that additive genetic variation for both traits is almost entirely associated with major and moderate-effect QTL, including four novel heading date QTL and four novel plant height QTL. FT2 and Vrn-A3 are proposed as candidate genes underlying QTL on chromosomes 3A and 7A, while Rht8 is mapped to chromosome 2D. These mapped QTL also underlie genetic variation in a longitudinal analysis of plant growth over time. The oligogenic architecture of these traits is further demonstrated by the superior trait prediction accuracy of QTL-based prediction models compared to polygenic genomic selection models. Conclusions: In a population constructed from two modern wheat cultivars adapted to the southeast U.S., almost all additive genetic variation in plant growth traits is associated with known major variants or novel moderate-effect QTL. Major transgressive segregation was observed in this population despite the similar plant height and heading date characters of the parental lines. This segregation is being driven primarily by a small number of mapped QTL, instead of by many small-effect, undetected QTL. As most breeding populations in the southeast U.S. segregate for known QTL for these traits, genetic variation in plant height and heading date in these populations likely emerges from similar combinations of major and moderate-effect QTL. We can make more accurate and cost-effective prediction models by targeted genotyping of key SNPs.

Similarity-based error prediction approach for real-time inflow forecasting

Hydrology Research ◽

10.2166/nh.2013.098 ◽

2013 ◽

Vol 45 (4-5) ◽

pp. 589-602 ◽

Cited By ~ 5

Author(s):

Mahmood Akbari ◽

Abbas Afshar

Keyword(s):

Real Time ◽

Nearest Neighbor ◽

Prediction Models ◽

Error Prediction ◽

K Nearest Neighbor ◽

Forecasting Models ◽

Main Challenge ◽

Inflow Forecasting ◽

Artificial Neural Network Ann ◽

Prediction Approach

Despite extensive research on hydrologic forecasting models, updating the outputs from forecasting models has remained a main challenge. Most existing output updating methods rely on the presence of persistence in the errors. This paper presents an alternative approach to updating the outputs from forecasting models in order to produce more accurate forecast results. The approach uses the concept of similarity in errors for error prediction. The K nearest neighbor (KNN) algorithm is employed as a similarity-based error prediction model, improvements are made by new data, and two other forms of the KNN are developed in this study. The KNN models are applied to the error prediction of flow forecasting models in two catchments, and the updated flows are compared to those of persistence-based methods such as autoregressive (AR) and artificial neural network (ANN) models. The results show that the similarity-based error prediction models can be recognized as an efficient alternative for real-time inflow forecasting, especially where the persistence in the error series of the flow forecasting model is relatively low.
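The similarity-based updating idea can be sketched in a few lines. This is an illustration under assumptions, not the authors' KNN variants: each past time step is described by a feature vector (here, hypothetically, the raw forecast and the previous error), the error is defined as observed minus forecast, and the predicted error is the plain average over the k most similar past situations. The function names and all numbers are invented for the example.

```python
import math


def knn_predict_error(history, query, k=3):
    """Predict the forecast error as the mean error of the k most similar past cases.

    history: list of (features, error) pairs, with error = observed - forecast.
    query:   feature vector describing the current situation.
    """
    ranked = sorted(history, key=lambda item: math.dist(item[0], query))
    nearest = ranked[:k]
    return sum(err for _, err in nearest) / len(nearest)


def update_forecast(raw_forecast, history, query, k=3):
    """Correct a raw model forecast by adding the predicted error."""
    return raw_forecast + knn_predict_error(history, query, k)


# Hypothetical history: features are [raw forecast, previous error].
history = [
    ([10.0, 0.4], 0.5),
    ([10.5, 0.6], 0.7),
    ([20.0, 0.0], 0.1),
    ([30.0, -1.5], -2.0),
    ([31.0, -1.7], -1.8),
]

query = [10.2, 0.5]  # current raw forecast of 10.2 with a previous error of 0.5
updated = update_forecast(10.2, history, query, k=2)
print("updated forecast:", updated)
```

Unlike an autoregressive correction, this lookup does not require consecutive errors to be correlated; it only requires that similar situations produced similar errors in the past, which matches the abstract's point about low-persistence error series.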