Application of spatio-temporal data in site-specific maize yield prediction with machine learning methods

Author(s):  
A. Nyéki ◽  
C. Kerepesi ◽  
B. Daróczy ◽  
A. Benczúr ◽  
G. Milics ◽  
...  

Abstract. In order to meet the requirements of sustainability and to determine yield drivers and limiting factors, it is increasingly likely that traditional yield modelling will be carried out using artificial intelligence (AI). The aim of this study was to predict maize yields using AI trained on spatio-temporal data. The paper advances a new method of maize yield prediction based on spatio-temporal data mining. To find the best solution, various models were used: counter-propagation artificial neural networks (CP-ANNs), XY-fused networks (XY-Fs), supervised Kohonen networks (SKNs), neural networks with rectified linear unit (ReLU) activations, extreme gradient boosting (XGBoost) and support-vector machines (SVM), each with different subsets of the independent variables in five vegetation periods. Input variables for modelling included soil parameters (pH, P2O5, K2O, Zn, clay content, ECa, draught force, cone index), micro-relief averages, and meteorological parameters for the 63 treatment units in a 15.3 ha research field. The best performing method (XGBoost) reached 92.1% and 95.3% accuracy on the training and test sets, respectively. Additionally, a novel method was introduced to treat individual units in a lattice system. Lattice-based smoothing further increased the area under the curve (AUC) to 97.5%, above that of the individual predictions of the XGBoost model. The models were developed using 48 different subsets of variables to determine which variables consistently contributed to prediction accuracy. Comparison of the resulting models showed that the best regression model was extreme gradient boosting trees, with 92.1% accuracy on the training set. In addition, the method calculates the influence of the spatial distribution of site-specific soil fertility on maize grain yields. This paper provides a new method of spatio-temporal data analysis that takes the most important factors influencing maize yields into account.
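The abstract names lattice-based smoothing but does not specify it. As a minimal sketch, assuming the treatment units sit on a rectangular grid and that smoothing means blending each unit's predicted score with the mean of its four direct neighbours, the idea and its evaluation by AUC might look like the Python below. The function names, the `self_weight` parameter and the neighbourhood rule are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence

Grid = List[List[float]]


def smooth_on_lattice(scores: Grid, self_weight: float = 0.5) -> Grid:
    """Blend each unit's score with the mean of its 4-connected neighbours.

    Assumption: units form a rectangular lattice; this rule is illustrative only.
    """
    rows, cols = len(scores), len(scores[0])
    smoothed = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                scores[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            # A 1x1 lattice has no neighbours, so the unit keeps its own score.
            neighbour_mean = sum(neighbours) / len(neighbours) if neighbours else scores[r][c]
            smoothed[r][c] = self_weight * scores[r][c] + (1 - self_weight) * neighbour_mean
    return smoothed


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the share of correctly ranked (positive, negative) pairs."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

To compare the effect of smoothing, one could flatten the grid row by row and call `auc` on the raw and the smoothed scores against the same labels. Whether smoothing helps depends on how spatially coherent the true classes are.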

2021 ◽  
pp. 289-301
Author(s):  
B. Martín ◽  
J. González–Arias ◽  
J. A. Vicente–Vírseda

Our aim was to identify an optimal analytical approach for accurately predicting complex spatio-temporal patterns in animal species distribution. We compared the performance of eight modelling techniques (generalized additive models, regression trees, bagged CART, k-nearest neighbors, stochastic gradient boosting, support vector machines, neural network, and random forest, an enhanced form of bootstrap aggregation). We also applied extreme gradient boosting, an enhanced form of gradient boosting, to predict spatial patterns in the abundance of migrating Balearic shearwaters based on data gathered within eBird. Proxies of frontal systems and ocean productivity domains, previously used to characterize the oceanographic habitats of seabirds, were derived from open-source datasets and used as predictors in the models. The random forest model showed the best performance according to the parameters assessed (RMSE and R2). The correlation between observed and predicted abundance with this model was also high. This study shows that combining machine learning techniques with the massive data provided by open data sources is a useful approach for identifying the long-term spatio-temporal distribution of species at regional spatial scales.
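The two evaluation measures named above can be written out in a few lines. This is a sketch in plain Python, assuming R2 means the coefficient of determination (the abstract does not say whether it is that or a squared correlation); the function names are illustrative.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the observed mean scores an R2 of 0 under this definition, and a perfect model scores 1, so the two measures together show both the absolute error and the share of variance explained.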


Author(s):  
Ruopeng Xie ◽  
Jiahui Li ◽  
Jiawei Wang ◽  
Wei Dai ◽  
André Leier ◽  
...  

Abstract. Virulence factors (VFs) enable pathogens to infect their hosts. A wealth of individual, disease-focused studies has identified a wide variety of VFs, and the growing mass of bacterial genome sequence data provides an opportunity for computational methods aimed at predicting VFs. Despite their attractive advantages and performance improvements, the existing methods have some limitations and drawbacks. Firstly, as the characteristics and mechanisms of VFs are continually evolving with the emergence of antibiotic resistance, it is increasingly difficult to identify novel VFs using existing tools that were developed on outdated data sets; secondly, few systematic feature engineering efforts have been made to examine the utility of different types of features for model performance, as the majority of tools focus on extracting only a few types of features. By addressing these issues, the accuracy of VF predictors can likely be significantly improved. This, in turn, would be particularly useful in the context of genome-wide predictions of VFs. In this work, we present a deep learning (DL)-based hybrid framework (termed DeepVF) that uses the stacking strategy to achieve more accurate identification of VFs. Using an enlarged, up-to-date dataset, DeepVF comprehensively explores a wide range of heterogeneous features with popular machine learning algorithms. Specifically, four classical algorithms (random forest, support vector machines, extreme gradient boosting and multilayer perceptron) and three DL algorithms (convolutional neural networks, long short-term memory networks and deep neural networks) are employed to train 62 baseline models using these features. In order to integrate their individual strengths, DeepVF combines these baseline models to construct the final meta model using the stacking strategy. Extensive benchmarking experiments demonstrate the effectiveness of DeepVF: it achieves a more accurate and stable performance compared with baseline models on the benchmark dataset and clearly outperforms state-of-the-art VF predictors on the independent test. Using the proposed hybrid ensemble model, a user-friendly online predictor of DeepVF (http://deepvf.erc.monash.edu/) is implemented. Furthermore, its utility, from the user’s viewpoint, is compared with that of existing toolkits. We believe that DeepVF will be exploited as a useful tool for screening and identifying potential VFs from protein-coding gene sequences in bacterial genomes.
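The stacking strategy itself can be illustrated independently of DeepVF. The sketch below is not the DeepVF pipeline: it assumes two toy one-feature baselines (`centroid_trainer`, a hypothetical helper) in place of the 62 baseline models, and a small logistic regression as the meta-model. It shows the essential steps: score each sample with baselines that did not see it during training, fit the meta-model on those out-of-fold scores, then refit the baselines on all data for prediction.

```python
import math
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
Model = Callable[[Sample], float]                     # maps a sample to a score in [0, 1]
Trainer = Callable[[List[Sample], List[int]], Model]  # fits a model on (X, y)


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    return 1 / (1 + math.exp(-v)) if v >= 0 else math.exp(v) / (1 + math.exp(v))


def centroid_trainer(feature: int) -> Trainer:
    """Toy baseline (assumption): closeness of one feature to the positive-class mean."""
    def train(X: List[Sample], y: List[int]) -> Model:
        pos = [x[feature] for x, t in zip(X, y) if t == 1]
        neg = [x[feature] for x, t in zip(X, y) if t == 0]
        mu_pos, mu_neg = sum(pos) / len(pos), sum(neg) / len(neg)

        def score(x: Sample) -> float:
            d_pos, d_neg = abs(x[feature] - mu_pos), abs(x[feature] - mu_neg)
            return 0.5 if d_pos + d_neg == 0 else d_neg / (d_pos + d_neg)
        return score
    return train


def out_of_fold_scores(trainers: List[Trainer], X: List[Sample], y: List[int],
                       k: int = 3) -> List[List[float]]:
    """Score every sample with baseline models that never saw it during training."""
    meta = [[0.0] * len(trainers) for _ in X]
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        held_out = [i for i in range(len(X)) if i % k == fold]
        for j, trainer in enumerate(trainers):
            model = trainer([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in held_out:
                meta[i][j] = model(X[i])
    return meta


def fit_logistic(Z: List[List[float]], y: List[int], lr: float = 0.5,
                 epochs: int = 500) -> Tuple[List[float], float]:
    """Meta-model: logistic regression fitted by plain stochastic gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, t in zip(Z, y):
            grad = sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) - t
            w = [wi - lr * grad * zi for wi, zi in zip(w, z)]
            b -= lr * grad
    return w, b


def fit_stack(trainers: List[Trainer], X: List[Sample], y: List[int], k: int = 3) -> Model:
    """Train the meta-model on out-of-fold scores, then refit baselines on all data."""
    w, b = fit_logistic(out_of_fold_scores(trainers, X, y, k), y)
    models = [trainer(X, y) for trainer in trainers]
    return lambda x: sigmoid(b + sum(wi * m(x) for wi, m in zip(w, models)))
```

Training the meta-model on out-of-fold scores rather than in-sample scores is the design choice that matters here: it keeps the meta-model from simply trusting whichever baseline overfits the training data most. The interleaved fold split assumes every training fold contains both classes.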


Author(s):  
Arusey Chebet ◽  
Otinga A. Nekesa ◽  
Wilson Ng’etich ◽  
Ruth Njoroge ◽  
Roland W. Scholz ◽  
...  

The objective of this study was to evaluate the effects of site-specific fertilizer recommendations on maize yield using the transdisciplinary (TD) process. A total of 144 farmers participated in the study over two seasons. Experiments were laid out on the farmers’ fields at four sites (Kapyemit, Kipsomba, Ngenyilel and Ziwa, in Uasin Gishu County) using a Randomized Complete Block Design in a 3 x 2 factorial arrangement. Treatments combined participation in the TD process (participants, TD2; non-participants, TD1) with three soil fertility interventions: farmers’ own practices (ST1), government recommendations (ST2), and site-specific fertilizer recommendations based on soil testing results (ST3). Maize dry weights were measured at the end of each season and subjected to analysis of variance using Genstat 14th edition. Means were separated using Fisher’s unprotected least significant difference. Soil testing and participation in the TD process had a significant effect on maize yields (p = 0.01). The mean maize grain yield was 5.43 ton ha⁻¹ for season one and 5.73 ton ha⁻¹ for season two. The maize grain yield of control farmers (TD1), 5.27 ton ha⁻¹, differed significantly (p = 0.05) from that of participating farmers (TD2), 5.96 ton ha⁻¹. Maize grain yield was increased by the application of site-specific fertilizer recommendations, which gave an overall mean of 6.57 ton ha⁻¹ for season one and 6.56 ton ha⁻¹ for season two. Following the ST3 recommendations and participating in the TD process improved soil nutrient content and thus increased maize yield. We recommend soil testing and consequent site-specific fertilizer recommendations for any initiative in managing soil fertility.
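For readers unfamiliar with the means separation step, Fisher's least significant difference (LSD) for two treatment means with equal replication is t × sqrt(2 × MSE / r), where MSE is the error mean square from the analysis of variance, r is the number of replicates per mean, and t is the critical value for the error degrees of freedom. "Unprotected" means the comparisons are made without first requiring a significant F-test. The sketch below takes the critical t value as an input (for example from a table); the abstract does not report the MSE or degrees of freedom, so any values passed in are the reader's own.

```python
import math


def least_significant_difference(t_critical: float, mse: float, replicates: int) -> float:
    """Fisher's LSD for two means with equal replication: t * sqrt(2 * MSE / r)."""
    if mse < 0 or replicates <= 0:
        raise ValueError("mse must be non-negative and replicates positive")
    return t_critical * math.sqrt(2.0 * mse / replicates)


def differ_significantly(mean_a: float, mean_b: float, lsd: float) -> bool:
    """Two means are declared different when their gap exceeds the LSD."""
    return abs(mean_a - mean_b) > lsd
```

With the reported TD1 and TD2 means (5.27 and 5.96 ton ha⁻¹), the gap of 0.69 would be declared significant for any LSD below that value.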


Author(s):  
A. Axyonov ◽  
D. Ryumin ◽  
I. Kagirov

Abstract. This paper presents a new method for collecting multimodal sign language (SL) databases, which is distinguished by the use of multimodal video data. The paper also proposes a new method of multimodal sign recognition, which is distinguished by the analysis of spatio-temporal visual features of SL units (i.e. lexemes). In general, gesture recognition is the processing of a video sequence to extract information on the movements of any articulator (a part of the human body) in time and space. With this approach, the recognition accuracy for isolated signs was 88.92%. Because it extracts and analyses spatio-temporal data, the proposed method makes it possible to identify more informative features of signs, which increases the accuracy of SL recognition.


Author(s):  
Pavel Kikin ◽  
Alexey Kolesnikov ◽  
Alexey Portnov ◽  
Denis Grischenko

The state of ecological systems, along with their general characteristics, is almost always described by indicators that vary in space and time, which significantly complicates the construction of mathematical models for predicting the state of such systems. One way to simplify and automate the construction of these models is the use of machine learning methods. The article compares traditional and neural-network-based algorithms and machine learning methods for predicting spatio-temporal series representing ecosystem data. The following algorithms and methods were analysed and compared: logistic regression, random forest, gradient boosting on decision trees, SARIMAX, long short-term memory (LSTM) networks and gated recurrent units (GRU). For the study, data sets were selected that have both spatial and temporal components: the number of mosquitoes, the number of dengue infections, the physical condition of tropical grove trees, and the water level in a river. The article discusses the necessary data preprocessing steps, which depend on the algorithm used. In addition, Kolmogorov complexity was calculated as one of the parameters that can help formalize the choice of the most suitable algorithm when constructing mathematical models of spatio-temporal data for the sets used. Based on the results of the analysis, recommendations are given on the application of particular methods and specific technical solutions, depending on the characteristics of the data set that describes a particular ecosystem.
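Kolmogorov complexity is not computable exactly, and the abstract does not say how it was estimated. A common practical proxy, assumed here purely for illustration, is the compression ratio of the serialized series: the better a general-purpose compressor shrinks the data, the more regular the series is. The sketch uses Python's standard `zlib` module; the function name and the `decimals` rounding parameter are hypothetical choices.

```python
import zlib
from typing import Sequence


def compression_complexity(series: Sequence[float], decimals: int = 2) -> float:
    """Compressed size divided by raw size of the serialized series (lower = more regular).

    Assumption: a zlib compression ratio stands in for Kolmogorov complexity.
    """
    if not series:
        raise ValueError("series must not be empty")
    raw = ",".join(f"{value:.{decimals}f}" for value in series).encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)
```

A constant series yields a ratio close to zero, while an irregular one compresses far less. For very short series the fixed compression overhead dominates and the ratio can exceed 1, so the proxy is only meaningful for series of at least a few hundred points.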

