Automated Machine Learning: a case study of genomic “image-based” prediction in maize hybrids

Author(s):  
Giovanni Galli ◽  
Felipe Sabadin ◽  
Rafael Massahiro Yassue ◽  
Cassia Galves de Souza ◽  
Humberto Fanelli Carvalho ◽  
...  

Abstract Machine learning methods such as multilayer perceptrons (MLP) and convolutional neural networks (CNN) have emerged as promising methods for genomic prediction (GP). We assess the performance of MLP and CNN on regression and classification tasks in a case study with maize hybrids. The genomic information was provided to the MLP as a relationship matrix and to the CNN as “genomic images”. In the regression task, the machine learning models were compared with GBLUP. In the classification task, MLP and CNN were compared with each other. For classification, the traits (plant height and grain yield) were discretized to create balanced (moderate selection intensity) and unbalanced (extreme selection intensity) datasets for further evaluation. An automatic hyperparameter search was performed for MLP and CNN, and the best models were reported. For both task types, several metrics were calculated under a validation scheme to assess the effect of the prediction method and other variables. Overall, MLP and CNN were competitive with GBLUP but offered only a slight improvement when using only the additive genomic layer. This is expected, since the average effect of allele substitution is mostly linear. Nevertheless, the methodology’s potential for GP is unprecedented because it allows the creation of “multispectral genome images” that include other effects and layers of data, such as dominance, epistasis, g × e, and transcriptome, capturing linear and non-linear effects and boosting prediction accuracies. Hence, we bring new insights on automated machine learning for genomic prediction and its implications for plant breeding.
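The discretization of a continuous trait into balanced and unbalanced classes can be illustrated with a minimal sketch. It assumes that the classes are formed by selecting a top fraction of the ranked phenotypes; the abstract does not state the actual thresholds, so the 50% and 10% fractions, the simulated grain yield values, and the function name `discretize_trait` are all hypothetical.

```python
import random


def discretize_trait(values, selected_fraction):
    """Label the top `selected_fraction` of phenotypes as 1 and the rest as 0."""
    n_selected = round(len(values) * selected_fraction)
    # Rank the indices from the highest to the lowest phenotypic value.
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    labels = [0] * len(values)
    for i in ranked[:n_selected]:
        labels[i] = 1
    return labels


# Simulated phenotypes (hypothetical values, not data from the study).
random.seed(0)
grain_yield = [random.gauss(8.0, 1.5) for _ in range(1000)]

# Moderate selection intensity: half of the hybrids selected (balanced classes).
balanced = discretize_trait(grain_yield, 0.5)
# Extreme selection intensity: one tenth of the hybrids selected (unbalanced classes).
unbalanced = discretize_trait(grain_yield, 0.1)

print(sum(balanced), sum(unbalanced))
```

A smaller selected fraction corresponds to a higher selection intensity and to a more unbalanced classification problem, which is why metrics other than plain accuracy are usually needed in that setting.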

2021 ◽  

2020 ◽  
Author(s):  
Saufi Karim ◽  
Patrick Lucañas ◽  
Ain Nadrah Sazali ◽  
Nina Marie Hernandez ◽  
Francois Baillard

2020 ◽  
Vol 10 (5) ◽  
pp. 1795 ◽  
Author(s):  
Zhen Xu ◽  
Yuan Wu ◽  
Ming-zhu Qi ◽  
Ming Zheng ◽  
Chen Xiong ◽  
...  

Structural types of buildings are necessary input data for city-scale seismic damage simulations and therefore need to be collected. To this end, a machine learning (ML) method for predicting the structural types of buildings is proposed herein. Specifically, using training data from 230,683 buildings in Tangshan city, China, a supervised ML solution based on a decision forest model was designed for the prediction. The scale sensitivity and regional applicability of the designed solution are discussed, and the results show that the supervised ML solution maintains high accuracy at different scales; however, it is only suitable for cities similar to the sample city. To extend applicability to various cities, a semi-supervised ML solution was designed based on sampling investigation and self-training procedures. The downtowns of Daxing and Tongzhou districts in Beijing were selected as a case study for the semi-supervised ML solution. The overall prediction accuracies of structural types for the Daxing and Tongzhou downtowns reach 94.8% and 99.5%, respectively, which are acceptable for seismic damage simulations. Based on the predicted results, the distributions of seismic damage in the Daxing and Tongzhou downtowns were produced. This study provides a smart and efficient method for obtaining structural types for a city-scale seismic damage simulation.
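The self-training idea mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: a simple nearest-centroid classifier stands in for the decision forest, the confidence rule (a distance margin) is an assumption, and the two-dimensional features and the labels "masonry" and "frame" are hypothetical.

```python
def fit_centroids(X, y):
    """Return {label: centroid} for feature vectors X with labels y."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_with_margin(centroids, x):
    """Return (label, margin); margin is the gap between the two nearest centroids."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, label)
        for label, c in centroids.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(X_labeled, y_labeled, X_unlabeled, min_margin=1.0, max_rounds=10):
    """Repeatedly add confidently pseudo-labeled samples to the training set."""
    X, y = list(X_labeled), list(y_labeled)
    pool = list(X_unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(X, y)
        confident, remaining = [], []
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            if margin >= min_margin:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new can be pseudo-labeled with enough confidence
        for x, label in confident:
            X.append(x)
            y.append(label)
        pool = remaining
    return fit_centroids(X, y), pool


# A small surveyed sample (labeled) plus unlabeled buildings; all values hypothetical.
surveyed = [[1.0, 1.0], [9.0, 9.0]]
surveyed_types = ["masonry", "frame"]
unsurveyed = [[2.0, 1.0], [8.0, 9.0], [5.0, 5.0]]

model, still_unlabeled = self_train(surveyed, surveyed_types, unsurveyed)
print(model, still_unlabeled)
```

Samples that never reach the confidence margin stay unlabeled, so in practice they would be candidates for a further sampling investigation.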


Water ◽  
2021 ◽  
Vol 13 (24) ◽  
pp. 3482
Author(s):  
Mikhail Sarafanov ◽  
Yulia Borisova ◽  
Mikhail Maslyaev ◽  
Ilia Revin ◽  
Gleb Maximov ◽  
...  

The paper presents a hybrid approach for short-term river flood forecasting. It is based on multi-modal data fusion from different sources (weather stations, water height sensors, remote sensing data). To improve forecasting efficiency, machine learning methods and the Snowmelt-Runoff physical model are combined in a composite modeling pipeline using automated machine learning techniques. The novelty of the study lies in the application of automated machine learning to identify the individual blocks of a composite pipeline without involving an expert. This makes it possible to adapt the approach to various river basins and different types of floods. The Lena River basin was used as a case study since its modeling during spring high water is complicated by the high probability of ice-jam flooding events. Experimental comparison with existing methods confirms that the proposed approach reduces the error at each analyzed level gauging station. The value of the Nash–Sutcliffe model efficiency coefficient for the ten stations chosen for comparison is 0.80. The other approaches, based on statistical and physical models, could not surpass the threshold of 0.74. Validation for a high-water period also confirms that a composite pipeline designed using automated machine learning is much more efficient than stand-alone models.
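For readers unfamiliar with the Nash–Sutcliffe model efficiency coefficient, a minimal sketch of its standard definition follows: one minus the ratio of the squared forecast error to the variance of the observations around their mean. The function name and the sample water levels are hypothetical and are not taken from the paper.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """Return 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - squared_error / spread


# Hypothetical water levels at one gauging station.
observed_levels = [1.0, 2.0, 3.0, 4.0]

# A perfect forecast gives 1; always forecasting the observed mean gives 0.
perfect = nash_sutcliffe_efficiency(observed_levels, observed_levels)
mean_only = nash_sutcliffe_efficiency(observed_levels, [2.5, 2.5, 2.5, 2.5])
print(perfect, mean_only)
```

Values closer to 1 indicate a better fit, and values below 0 indicate a forecast that is worse than simply predicting the observed mean.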

