Prediction of the importance of auxiliary traits using computational intelligence and machine learning: A simulation study

PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0257213
Author(s):  
Antônio Carlos da Silva Júnior ◽  
Michele Jorge da Silva ◽  
Cosme Damião Cruz ◽  
Isabela de Castro Sant’Anna ◽  
Gabi Nunes Silva ◽  
...  

The present study evaluated the importance of auxiliary traits of a principal trait based on phenotypic information and previously known genetic structure using computational intelligence and machine learning to develop predictive tools for plant breeding. Data of an F2 population represented by 500 individuals, obtained from a cross between contrasting homozygous parents, were simulated. Phenotypic traits were simulated based on previously established means and heritability estimates (30%, 50%, and 80%); traits were distributed in a genome with 10 linkage groups, considering two alleles per marker. Four different scenarios were considered. For the principal trait, heritability was 50%, and 40 control loci were distributed in five linkage groups. Another phenotypic control trait with the same complexity as the principal trait but without any genetic relationship with it and without pleiotropy or a factorial link between the control loci for both traits was simulated. These traits shared a large number of control loci with the principal trait, but could be distinguished by the differential action of the environment on them, as reflected in heritability estimates (30%, 50%, and 80%). The coefficient of determination was used to evaluate the proposed methodologies. Multiple regression, computational intelligence, and machine learning were used to predict the importance of the tested traits. Computational intelligence and machine learning were superior in extracting nonlinear information from model inputs and quantifying the relative contributions of phenotypic traits. The R2 values ranged from 44.0% to 83.0% and from 79.0% to 94.0% for computational intelligence and machine learning, respectively. In conclusion, the relative contributions of auxiliary traits in different scenarios in plant breeding programs can be efficiently predicted using computational intelligence and machine learning.
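The simulation setup in the abstract above (phenotypes generated at a chosen heritability, models scored by the coefficient of determination) can be illustrated with a minimal sketch. It is not the authors' simulation: the unlinked additive loci, the 0/1/2 genotype coding, and the function names `simulate_phenotypes` and `r_squared` are assumptions made for illustration only.

```python
import random
import statistics


def simulate_phenotypes(genetic_values, h2, rng):
    """Add Gaussian environmental noise so that Var(G) / Var(P) is roughly h2."""
    var_g = statistics.pvariance(genetic_values)
    var_e = var_g * (1.0 - h2) / h2  # follows from h2 = Var(G) / (Var(G) + Var(E))
    sd_e = var_e ** 0.5
    return [g + rng.gauss(0.0, sd_e) for g in genetic_values]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


rng = random.Random(42)

# Assumption: 40 unlinked biallelic loci with equal additive effects;
# F2 genotypes occur in a 1:2:1 ratio and are coded 0, 1, 2.
genetic = [sum(rng.choice((0, 1, 1, 2)) for _ in range(40)) for _ in range(500)]

realized = {}
for h2 in (0.3, 0.5, 0.8):
    phenotypes = simulate_phenotypes(genetic, h2, rng)
    # Realized heritability in this sample; it fluctuates around the target.
    realized[h2] = statistics.pvariance(genetic) / statistics.pvariance(phenotypes)
```

With 500 simulated individuals the realized heritability only approximates the target value, which is one reason simulation studies of this kind typically repeat the process over many replicates.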

1997 ◽  
Vol 122 (3) ◽  
pp. 338-343 ◽  
Author(s):  
Kimberly J. Walters ◽  
George L. Hosfield ◽  
Mark A. Uebersax ◽  
James D. Kelly

Three populations of navy bean (Phaseolus vulgaris L.), consisting of recombinant inbred lines, were grown at two locations for 2 years and were used to study canning quality. The traits measured included visual appeal (VIS), texture (TXT), and washed drained mass (WDM). Genotype mean squares were significant for all three traits across populations, although location and year mean squares were higher. We found a positive correlation (r = 0.19 to 0.66) between VIS and TXT and a negative correlation (r = -0.26 to -0.66) between VIS and WDM and between TXT and WDM (r = -0.53 to -0.83) in all three populations. Heritability estimates were calculated for VIS, TXT, and WDM, and these values were moderate to high (0.48 to 0.78). Random amplified polymorphic DNA markers associated with quantitative trait loci (QTL) for the same canning quality traits were identified and studied in each population. Marker-QTL associations were established using the general linear models procedure with significance set at P=0.05. Location and population specificity was common among the marker-QTL associations identified. Coefficient of determination (R2) values for groups of markers used in multiple regression analyses ranged from 0.2 to 0.52 for VIS, 0.11 to 0.38 for TXT, and 0.25 to 0.38 for WDM. Markers were identified that were associated with multiple traits and those associations supported correlations between phenotypic traits. MAS would offer no advantage over phenotypic selection for the improvement of negatively associated traits.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Şakir Burak Bükücü ◽  
Mehmet Sütyemez ◽  
Sina Kefayati ◽  
Aibibula Paizila ◽  
Abdulqader Jighly ◽  
...  

Abstract Breeding studies in walnut (Juglans regia L.) are usually time-consuming due to the long juvenile period. Therefore, this study aimed to determine markers associated with time of leaf budburst and flowering-related traits by performing a genome-wide association study (GWAS). We investigated genotypic variation and its association with time of leaf budburst and flowering-related traits in 188 walnut accessions. Phenotypic data were obtained for 13 different traits during 3 consecutive years. We used DArT-seq for genotyping, with a total of 33,519 (14,761 SNP and 18,758 DArT) markers for genome-wide associations to identify markers underlying these traits. Significant correlations were determined among the 13 different traits. Linkage disequilibrium decayed very quickly in walnut in comparison with other plants. Sixteen quantitative trait loci (QTL) with major effects (R2 between 0.08 and 0.23) were found to be associated with a minimum of two phenotypic traits each. Of these QTL, QTL05 had the maximum number of associated traits (seven). Our study is a GWAS for time of leaf budburst and flowering-related traits in Juglans regia L., and the identified QTL have strong potential to be implemented efficiently in walnut breeding programs.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Kyle A. Parmley ◽  
Race H. Higgins ◽  
Baskar Ganapathysubramanian ◽  
Soumik Sarkar ◽  
Asheesh K. Singh

Abstract We explored the capability of fusing high dimensional phenotypic trait (phenomic) data with a machine learning (ML) approach to provide plant breeders the tools to do both in-season seed yield (SY) prediction and prescriptive cultivar development for targeted agro-management practices (e.g., row spacing and seeding density). We phenotyped 32 SoyNAM parent genotypes in two independent studies each with contrasting agro-management treatments (two row spacing, three seeding densities). Phenotypic trait data (canopy temperature, chlorophyll content, hyperspectral reflectance, leaf area index, and light interception) were generated using an array of sensors at three growth stages during the growing season and seed yield (SY) determined by machine harvest. Random forest (RF) was used to train models for SY prediction using phenotypic traits (predictor variables) to identify the optimal temporal combination of variables to maximize accuracy and resource allocation. RF models were trained using data from both experiments and individually for each agro-management treatment. We report the most important traits agnostic of agro-management practices. Several predictor variables showed conditional importance dependent on the agro-management system. We assembled predictive models to enable in-season SY prediction, enabling the development of a framework to integrate phenomics information with powerful ML for prediction-enabled prescriptive plant breeding.


2021 ◽  
Vol 34 (2) ◽  
pp. 471-478
Author(s):  
Rafael da Costa Almeida ◽  
Wilson Vitorino de Assunção Neto ◽  
Verônica Brito da Silva ◽  
Leonardo Castelo Branco Carvalho ◽  
Ângela Celis de Almeida Lopes ◽  
...  

ABSTRACT Morpho-agronomic characterization studies aiming at the discrimination and classification of lima bean accessions in relation to the centers of domestication and biological status have been of great importance for conserving the biodiversity of this species. For this purpose, researchers have widely used the multivariate analysis called discriminant analysis, which is not always capable of producing satisfactory results. Computational intelligence-based classifiers are additional tools for understanding complex classification problems. In this study, the objective was to test the use of the decision tree in the classification of lima bean according to the centers of domestication and biological status (cultivated and wild), based on eight phenotypic traits of the seed. Sixty accessions of lima bean from the Phaseolus Germplasm Bank of Universidade Federal do Piauí (BGP / UFPI) were evaluated, and classification was performed using two approaches: conventional statistics with discriminant analysis of principal components (DAPC) and computational intelligence through decision tree (DT). The results showed that the use of DT was efficient to identify patterns in the classification of lima bean accessions, due to its comprehensibility. Seed weight was one of the main descriptors used to explain the origin and diversity of the species. The results found will be useful for studies that involve the conservation of genetic resources, mainly for the maintenance of germplasm banks and in breeding programs. In addition, it is recommended to integrate machine learning algorithms in studies aimed at classifying lima bean.


2020 ◽  
Author(s):  
Anurag Sohane ◽  
Ravinder Agarwal

Abstract Various simulation tools and conventional algorithms are used to determine human knee muscle forces during dynamic movement. These may all be suitable for clinical use but have drawbacks, such as long computation times, muscle redundancy, and limited cost-effectiveness. Recently, there has been interest in developing supervised-learning prediction models for this computationally demanding process. The present work develops cost-effective and efficient machine learning (ML) models to predict knee muscle force for clinical interventions from input parameters such as height, mass, and angle. A dataset of 500 human musculoskeletal models was used to train and test four different ML models to predict knee muscle force. The dataset was obtained from the AnyBody modeling software using AnyPyTools, with a human musculoskeletal model performing a squatting movement in an inverse dynamic analysis. The results show that the random forest model outperforms the other selected models (neural network, generalized linear model, and decision tree) in terms of mean square error (MSE), coefficient of determination (R2), and correlation (r). For the test dataset, the MSE values of predicted versus actual muscle forces obtained from the random forest model for the Biceps Femoris, Rectus Femoris, Vastus Medialis, and Vastus Lateralis are 19.92, 9.06, 5.97, and 5.46; the correlations are 0.94, 0.92, 0.92, and 0.94; and the R2 values are 0.88, 0.84, 0.84, and 0.89, respectively.


tppj ◽  
2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Jenna Hershberger ◽  
Nicolas Morales ◽  
Christiano C. Simoes ◽  
Bryan Ellerbrock ◽  
Guillaume Bauchet ◽  
...  

2021 ◽  
Vol 13 (6) ◽  
pp. 1147
Author(s):  
Xiangqian Li ◽  
Wenping Yuan ◽  
Wenjie Dong

To forecast the terrestrial carbon cycle and monitor food security, vegetation growth must be accurately predicted; however, current process-based ecosystem and crop-growth models are limited in their effectiveness. This study developed a machine learning model using the extreme gradient boosting method to predict vegetation growth throughout the growing season in China from 2001 to 2018. The model used satellite-derived vegetation data for the first month of each growing season, CO2 concentration, and several meteorological factors as data sources for the explanatory variables. Results showed that the model could reproduce the spatiotemporal distribution of vegetation growth as represented by the satellite-derived normalized difference vegetation index (NDVI). The predictive error for the growing season NDVI was less than 5% for more than 98% of vegetated areas in China; the model represented seasonal variations in NDVI well. The coefficient of determination (R2) between the monthly observed and predicted NDVI was 0.83, and more than 69% of vegetated areas had an R2 > 0.8. The effectiveness of the model was examined for a severe drought year (2009), and results showed that the model could reproduce the spatiotemporal distribution of NDVI even under extreme conditions. This model provides an alternative method for predicting vegetation growth and has great potential for monitoring vegetation dynamics and crop growth.


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4655
Author(s):  
Dariusz Czerwinski ◽  
Jakub Gęca ◽  
Krzysztof Kolano

In this article, the authors propose two models for BLDC motor winding temperature estimation using machine learning methods. For the purposes of the research, measurements were made for over 160 h of motor operation, and then, they were preprocessed. The algorithms of linear regression, ElasticNet, stochastic gradient descent regressor, support vector machines, decision trees, and AdaBoost were used for predictive modeling. The ability of the models to generalize was achieved by hyperparameter tuning with the use of cross-validation. The conducted research led to promising results of the winding temperature estimation accuracy. In the case of sensorless temperature prediction (model 1), the mean absolute percentage error MAPE was below 4.5% and the coefficient of determination R2 was above 0.909. In addition, the extension of the model with the temperature measurement on the casing (model 2) allowed reducing the error value to about 1% and increasing R2 to 0.990. The results obtained for the first proposed model show that the overheating protection of the motor can be ensured without direct temperature measurement. In addition, the introduction of a simple casing temperature measurement system allows for an estimation with accuracy suitable for compensating the motor output torque changes related to temperature.
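The mean absolute percentage error (MAPE) reported in the abstract above is straightforward to compute. The sketch below uses the usual definition, the mean of |actual - predicted| / |actual| expressed in percent; the function name `mape` and the sample temperature values are hypothetical and are not taken from the article.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    relative_errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical winding temperatures in degrees Celsius, for illustration only.
measured = [60.0, 75.0, 90.0]
estimated = [61.5, 73.5, 91.8]
error_percent = mape(measured, estimated)
```

Because each error is scaled by the measured value, MAPE depends on the temperature scale used: the same absolute error gives a different percentage in kelvin than in degrees Celsius.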


Animals ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 192
Author(s):  
Xinghai Duan ◽  
Bingxing An ◽  
Lili Du ◽  
Tianpeng Chang ◽  
Mang Liang ◽  
...  

The objective of the present study was to perform a genome-wide association study (GWAS) for growth curve parameters using nonlinear models that fit original weight–age records. In this study, data from 808 Chinese Simmental beef cattle that were weighed at 0, 6, 12, and 18 months of age were used to fit the growth curve. The Gompertz model showed the highest coefficient of determination (R2 = 0.954). The parameters mature body weight (A), time-scale parameter (b), and maturity rate (K) were treated as phenotypes for single-trait GWAS and multi-trait GWAS. In total, 9, 49, and 7 significant single nucleotide polymorphisms (SNPs) associated with A, b, and K, respectively, were identified by single-trait GWAS; 22 significant SNPs were identified by multi-trait GWAS. Among them, we observed several candidate genes, including PLIN3, KCNS3, TMCO1, PRKAG3, ANGPTL2, IGF-1, SHISA9, and STK3, which were previously reported to be associated with growth and development. Further research on these candidate genes may be useful for exploring the full genetic architecture underlying growth and development traits in livestock.
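For readers unfamiliar with the Gompertz curve mentioned above, the following minimal sketch evaluates one common parameterization, W(t) = A * exp(-b * exp(-K * t)), at the four weighing ages. The abstract does not state the exact form the authors fitted, so this parameterization is an assumption, and the parameter values are hypothetical.

```python
import math


def gompertz(t, A, b, K):
    """Gompertz growth curve W(t) = A * exp(-b * exp(-K * t)).

    A: mature (asymptotic) body weight, b: time-scale parameter,
    K: maturity rate, t: age in the same time unit as 1/K.
    """
    return A * math.exp(-b * math.exp(-K * t))


# Hypothetical parameter values for illustration only (not from the study).
A, b, K = 700.0, 2.8, 0.12  # kg, dimensionless, per month

# Predicted weights at the weighing ages named in the abstract (months).
weights = {age: gompertz(age, A, b, K) for age in (0, 6, 12, 18)}

# In this parameterization growth is fastest at t = ln(b) / K,
# where the weight equals A / e.
t_inflection = math.log(b) / K
```

Under this form, A sets the asymptote, b fixes the weight at birth as A * exp(-b), and K controls how quickly the animal approaches its mature weight, which is why the three fitted values can serve as per-animal phenotypes.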


Animals ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 599
Author(s):  
Miguel A. Gutierrez-Reinoso ◽  
Pedro M. Aponte ◽  
Manuel Garcia-Herreros

Genomics comprises a set of current and valuable technologies implemented as selection tools in dairy cattle commercial breeding programs. The intensive progeny testing for production and reproductive traits based on genomic breeding values (GEBVs) has been crucial to increasing dairy cattle productivity. The knowledge of key genes and haplotypes, including their regulation mechanisms, as markers for productivity traits, may improve present and future strategies for dairy cattle selection. Genome-wide association studies (GWAS) such as quantitative trait loci (QTL), single nucleotide polymorphisms (SNPs), or single-step genomic best linear unbiased prediction (ssGBLUP) methods have already been included in global dairy programs for the estimation of marker-assisted selection-derived effects. The increase in genetic progress based on genomic prediction accuracy has also contributed to the understanding of genetic effects in dairy cattle offspring. However, crossing within inbred lines has critically increased homozygosity, with accumulated negative effects of inbreeding such as a decline in reproductive performance. Thus, inaccurate, biased estimations based on empirical, conventional models of dairy production systems face an increased risk of providing suboptimal results derived from errors in the selection of candidates of high genetic merit based just on low-heritability phenotypic traits. This extends the generation intervals and increases costs due to the significant reduction of genetic gains. The remarkable progress of genomic prediction increases the accurate selection of superior candidates. The scope of the present review is to summarize and discuss the advances and challenges of genomic tools for dairy cattle selection for optimizing breeding programs and controlling negative inbreeding depression effects on productivity and, consequently, achieving economically effective advances in food production efficiency. Particular attention is given to the potential genomic selection-derived results to facilitate precision management on modern dairy farms, including an overview of novel genome editing methodologies as perspectives toward the future.

