Prediction of Winter Wheat Yield Based on Multi-Source Data and Machine Learning in China

Wheat is one of the main crops in China, and crop yield prediction is important for regional trade and national food security. There is increasing interest in how to integrate multi-source data and employ machine learning techniques to establish a simple, timely, and accurate crop yield prediction model at the level of an administrative unit. Many previous studies focused mainly on the whole crop growth period, relying on expensive manual surveys, remote sensing, or climate data. However, the effect of selecting different time windows on yield prediction remained unknown. Thus, we separated the whole growth period into four time windows and assessed their predictive ability, taking the major winter wheat production regions of China as an example. First, we developed a modeling framework that integrates climate data, remote sensing data, and soil data to predict winter wheat yield on the Google Earth Engine (GEE) platform. The results show that the models can accurately predict yield 1~2 months before the harvesting dates at the county level in China, with an R2 > 0.75 and a yield error of less than 10%. Support vector machine (SVM), Gaussian process regression (GPR), and random forest (RF) were the three best methods for predicting yields among the eight typical machine learning models tested in this study. In addition, we found that different agricultural zones and temporal training settings affect prediction accuracy. The three models perform better as more winter wheat growing season information becomes available. Our findings highlight a potentially powerful tool for predicting yield using multi-source data and machine learning in other regions and for other crops.

Winter Wheat Yield Prediction Using Convolutional Neural Networks from Environmental and Phenological Data

10.21203/rs.3.rs-789462/v1 ◽

2021 ◽

Author(s):

Amit Kumar Srivast ◽

Nima Safaei ◽

Saeed Khaki ◽

Gina Lopez ◽

Wenzhi Zeng ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Winter Wheat ◽

Crop Yield ◽

Wheat Yield ◽

Yield Prediction ◽

Learning Models ◽

Winter Wheat Yield ◽

Machine Learning Models

Crop yield forecasting depends on many interactive factors, including crop genotype, weather, soil, and management practices. This study analyzes the performance of machine learning and deep learning methods for winter wheat yield prediction using extensive datasets of weather, soil, and crop phenology. We propose a convolutional neural network (CNN) that uses the 1-dimensional convolution operation to capture the time dependencies of environmental variables. The proposed CNN, evaluated along with other machine learning models for winter wheat yield prediction in Germany, outperformed all other models tested. To address seasonality, weekly features were used that explicitly take soil moisture and meteorological events into account. Our results indicated that nonlinear models such as deep learning models and XGBoost are more effective than linear models at finding the functional relationship between crop yield and input data, and that deep neural networks had a higher prediction accuracy than XGBoost. One of the main limitations of machine learning models is their black box property. Therefore, we moved beyond prediction and performed feature selection, as it provides key results towards explaining yield prediction (variable importance by time). As such, our study indicates which variables have the most significant effect on winter wheat yield.
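
To make the 1-dimensional convolution mentioned in this abstract concrete, here is a minimal sketch in plain Python. It is not the authors' network: the function names, the filter weights, and the weekly soil-moisture values are all hypothetical and chosen only for illustration. As in most deep learning libraries, the "convolution" here slides the filter without flipping it.

```python
from typing import List


def conv1d_valid(series: List[float], kernel: List[float]) -> List[float]:
    """Slide `kernel` along `series` without padding and return one dot product per position."""
    k = len(kernel)
    if k == 0 or k > len(series):
        raise ValueError("kernel must be non-empty and no longer than the series")
    return [
        sum(series[i + j] * kernel[j] for j in range(k))
        for i in range(len(series) - k + 1)
    ]


def relu(values: List[float]) -> List[float]:
    """Keep positive responses and clip negative ones to zero."""
    return [max(0.0, v) for v in values]


# Hypothetical weekly soil-moisture values (illustrative, not data from the study).
weekly_soil_moisture = [0.30, 0.28, 0.25, 0.20, 0.22, 0.27]

# Hypothetical 3-week filter that responds when moisture drops over the window.
drying_filter = [1.0, 0.0, -1.0]

# One output per 3-week window: positive where the soil dried, zero otherwise.
feature_map = relu(conv1d_valid(weekly_soil_moisture, drying_filter))
print(feature_map)
```

In a trained CNN the filter weights are learned rather than fixed by hand, and many such filters are applied in parallel to each weekly input variable.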

Identifying the Contributions of Multi-Source Data for Winter Wheat Yield Prediction in China

Remote Sensing ◽

10.3390/rs12050750 ◽

2020 ◽

Vol 12 (5) ◽

pp. 750 ◽

Cited By ~ 3

Author(s):

Juan Cao ◽

Zhao Zhang ◽

Fulu Tao ◽

Liangliang Zhang ◽

Yuchuan Luo ◽

...

Keyword(s):

Winter Wheat ◽

Satellite Data ◽

Large Scale ◽

Wheat Yield ◽

Yield Variability ◽

Yield Prediction ◽

Climate Data ◽

Light Gradient ◽

Source Data ◽

Winter Wheat Yield

Wheat is a leading cereal grain throughout the world. Timely and reliable wheat yield prediction at a large scale is essential for the agricultural supply chain and global food security, especially in China as an important wheat producing and consuming country. The conventional approach of using climate data, satellite data, or both to build empirical and crop models has prevailed for decades. However, to what extent climate and satellite data can improve yield prediction is still unknown. In addition, socio-economic (SC) factors may also improve crop yield prediction, but their contributions need in-depth investigation, especially in regions with good irrigation conditions, sufficient fertilization, and pesticide application. Here, we made a first attempt to predict wheat yield across China from 2001 to 2015 at the county level by integrating multi-source data, including monthly climate data, satellite data (i.e., vegetation indices (VIs)), and SC factors. The results show that incorporating all the datasets using three machine learning methods (Ridge Regression (RR), Random Forest (RF), and Light Gradient Boosting (LightGBM)) achieves the best performance in yield prediction (R2: 0.68~0.75), with the largest individual contribution from climate (~0.53), followed by VIs (~0.45) and SC factors (~0.30). In addition, combinations of VIs and climate data can capture inter-annual yield variability more effectively than other combinations (e.g., climate and SC, or VIs and SC), while combining SC with climate data can better capture spatial yield variability than others. Climate data can provide extra and unique information across the entire growing season, while VIs do so mainly at their peak stage (Mar.~Apr.). Furthermore, incorporating spatial information and soil properties into the benchmark models can improve wheat yield prediction by 0.06 and 0.12, respectively. The optimal wheat prediction can be achieved with a lead time of approximately two months before maturity. Our study develops timely and robust methods for winter wheat yield prediction at a large scale in China, which can be applied to other crops and regions.

An Improved CASA Model for Estimating Winter Wheat Yield from Remote Sensing Images

Remote Sensing ◽

10.3390/rs11091088 ◽

2019 ◽

Vol 11 (9) ◽

pp. 1088 ◽

Cited By ~ 4

Author(s):

Yulong Wang ◽

Xingang Xu ◽

Linsheng Huang ◽

Guijun Yang ◽

Lingling Fan ◽

...

Keyword(s):

Remote Sensing ◽

Time Series ◽

Winter Wheat ◽

Crop Yield ◽

Satellite Remote Sensing ◽

Wheat Yield ◽

Growing Season ◽

Remote Sensing Images ◽

Casa Model ◽

Winter Wheat Yield

Accurate and timely monitoring and evaluation of regional grain crop yield is important for formulating import and export plans for agricultural products, regulating grain markets, and adjusting the planting structure. In this study, an improved Carnegie–Ames–Stanford approach (CASA) model was coupled with time-series satellite remote sensing images to estimate winter wheat yield. First, the entire 2009 growing season of winter wheat in the Tongzhou and Shunyi districts of Beijing was divided into 54 stages at five-day intervals. Net Primary Production (NPP) of winter wheat was estimated by the improved CASA model with HJ-1A/B satellite images from 39 transits. For the 15 stages without an HJ-1A/B transit, MOD17A2H data products were interpolated to obtain the spatial distribution of winter wheat NPP at five-day intervals over the entire growing season. Then, an NPP-yield conversion model was used to estimate winter wheat yield in the study area. Finally, the accuracy of the method was verified by comparing its results to the ground-measured yield. The results showed that the estimated yield of winter wheat based on remote sensing images is consistent with the ground-measured yield, with an R2 of 0.56, an RMSE of 1.22 t ha−1, and an average relative error of −6.01%. Based on time-series satellite remote sensing images, the improved CASA model can be used to estimate the NPP and thereby the yield of regional winter wheat. This approach satisfies the accuracy requirements for estimating regional winter wheat yield and thus may be used in practical applications. It also provides a technical reference for estimating large-scale crop yield.
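
The chain from stage-wise NPP to yield described above can be sketched as follows. This is a rough illustration, not the paper's improved model: it uses the commonly cited light-use-efficiency form of CASA (NPP = APAR × ε, with APAR = SOL × FPAR × 0.5), and the conversion step is a generic placeholder because the abstract does not specify the NPP-yield conversion model. All function names, the carbon fraction, the harvest index, and the input values are assumptions.

```python
from typing import List, Tuple


def apar(sol_mj_m2: float, fpar: float) -> float:
    """Absorbed photosynthetically active radiation; 0.5 is the PAR share of total solar radiation."""
    return sol_mj_m2 * fpar * 0.5


def npp_stage(sol_mj_m2: float, fpar: float, epsilon_gc_mj: float) -> float:
    """NPP for one stage (g C m^-2) as APAR times light-use efficiency."""
    return apar(sol_mj_m2, fpar) * epsilon_gc_mj


def yield_t_per_ha(total_npp_gc_m2: float,
                   carbon_fraction: float = 0.45,   # assumed carbon content of dry matter
                   harvest_index: float = 0.4) -> float:  # assumed grain share of biomass
    """Hypothetical NPP-to-yield conversion: carbon -> dry matter -> grain -> t/ha."""
    dry_matter_g_m2 = total_npp_gc_m2 / carbon_fraction
    grain_g_m2 = dry_matter_g_m2 * harvest_index
    return grain_g_m2 * 0.01  # 1 g/m^2 equals 0.01 t/ha


# Hypothetical (solar radiation MJ/m^2, FPAR, efficiency g C/MJ) per five-day stage.
stages: List[Tuple[float, float, float]] = [
    (100.0, 0.5, 0.5),
    (120.0, 0.8, 0.5),
]

total_npp = sum(npp_stage(sol, fpar, eps) for sol, fpar, eps in stages)
estimated_yield = yield_t_per_ha(total_npp)
print(total_npp, estimated_yield)
```

In the study the sum would run over all 54 five-day stages, with the 15 stages lacking an HJ-1A/B transit filled from interpolated MOD17A2H products rather than computed from the model.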

Combining Optical, Fluorescence, Thermal Satellite, and Environmental Data to Predict County-Level Maize Yield in China Using Machine Learning Approaches

Remote Sensing ◽

10.3390/rs12010021 ◽

2019 ◽

Vol 12 (1) ◽

pp. 21 ◽

Cited By ~ 6

Author(s):

Liangliang Zhang ◽

Zhao Zhang ◽

Yuchuan Luo ◽

Juan Cao ◽

Fulu Tao

Keyword(s):

Machine Learning ◽

Crop Yield ◽

Satellite Data ◽

Maize Yield ◽

Environmental Data ◽

Yield Prediction ◽

County Level ◽

Climate Data ◽

Source Data ◽

Optical Fluorescence

Maize is an extremely important grain crop, and demand for it has increased sharply throughout the world. China alone contributes nearly one-fifth of total production despite its decreasing arable land. Timely and accurate prediction of maize yield in China is critical for ensuring global food security. Previous studies primarily used visible or near-infrared (NIR) based vegetation indices (VIs), climate data, or both to predict crop yield. However, other satellite data from different spectral bands, which contain unique information on crop growth and yield, have been underutilized. In addition, although a joint application of multi-source data significantly improves crop yield prediction, the combinations of input variables that could achieve the best results have not been well investigated. Here we integrated optical, fluorescence, thermal satellite, and environmental data to predict county-level maize yield across four agro-ecological zones (AEZs) in China using a regression-based method (LASSO), two machine learning (ML) methods (RF and XGBoost), and a deep learning (DL) network (LSTM). The results showed that combining multi-source data explained more than 75% of yield variation. Satellite data at the silking stage contributed more information than other variables, and solar-induced chlorophyll fluorescence (SIF) performed almost equivalently to the enhanced vegetation index (EVI), largely due to its low signal-to-noise ratio and coarse spatial resolution. Extremely high temperature and vapor pressure deficit during the reproductive period were the most important climate variables affecting maize production in China. Soil properties and management factors contained extra information on crop growth conditions that cannot be fully captured by satellite and climate data. We found that ML and DL approaches clearly outperformed regression-based methods, and that ML was more computationally efficient and easier to generalize than DL. Our study is an important effort to combine multi-source remotely sensed and environmental data for large-scale yield prediction. The proposed methodology provides a paradigm for yield prediction of other crops and in other regions.

Predicting Maize Yield at the Plot Scale of Different Fertilizer Systems by Multi-Source Data and Machine Learning Methods

Remote Sensing ◽

10.3390/rs13183760 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3760

Author(s):

Linghua Meng ◽

Huanjun Liu ◽

Susan L. Ustin ◽

Xinle Zhang

Keyword(s):

Machine Learning ◽

Crop Yield ◽

Satellite Data ◽

Maize Yield ◽

Environmental Data ◽

Yield Prediction ◽

Climate Data ◽

Multiple Sources ◽

Adaptive Boosting ◽

Source Data

Timely and reliable maize yield prediction is essential for the agricultural supply chain and food security. Approaches that use climate data, satellite data, or both to build empirical or statistical models have prevailed for decades. However, to what extent climate and satellite data can improve yield prediction is still unknown. In addition, fertilizer information may also improve crop yield prediction, especially in regions with different fertilizer systems, such as cover crop, mineral fertilizer, or compost. Machine learning (ML) has been widely and successfully applied in crop yield prediction. Here, we attempted to predict maize yield from 1994 to 2007 at the plot scale by integrating multi-source data, including monthly climate data, satellite data (i.e., vegetation indices (VIs)), fertilizer data, and soil data, to explore how different inputs affect the accuracy of yield prediction. The results show that incorporating all of the datasets using random forests (RF) and adaptive boosting (AB) can achieve better performance in yield prediction (R2: 0.85~0.98). In addition, the combination of VIs, climate data, and soil data (VCS) can predict maize yield more effectively than other combinations (e.g., combinations of all data and combinations of VIs and soil data). Furthermore, we found that prediction accuracy differed among fertilizer systems. This paper aggregates data from multiple sources and distinguishes the effects of different fertilization scenarios on crop yield predictions. In addition, the effects of different data on crop yield were analyzed in this study. Our study provides a paradigm that can be used to improve yield predictions for other crops. It is an important effort that combines multi-source remotely sensed and environmental data for maize yield prediction at the plot scale, and it develops timely and robust methods for predicting the yield of maize grown under different fertilizing systems.

Combining Multi-Source Data and Machine Learning Approaches to Predict Winter Wheat Yield in the Conterminous United States

Remote Sensing ◽

10.3390/rs12081232 ◽

2020 ◽

Vol 12 (8) ◽

pp. 1232 ◽

Cited By ~ 7

Author(s):

Yumiao Wang ◽

Zhou Zhang ◽

Luwei Feng ◽

Qingyun Du ◽

Troy Runge

Keyword(s):

United States ◽

Machine Learning ◽

Winter Wheat ◽

Wheat Yield ◽

Growing Season ◽

The United States ◽

Learning Approaches ◽

Source Data ◽

The World ◽

Winter Wheat Yield

Winter wheat (Triticum aestivum L.) is one of the most important cereal crops, supplying essential food for the world population. Because the United States (U.S.) is a major producer and exporter of wheat to the world market, accurate and timely forecasting of wheat yield in the U.S. is fundamental to national crop management as well as global food security. Previous studies have mainly focused on developing empirical models using only satellite remote sensing images, while other yield determinants have not yet been adequately explored. In addition, these models are based on traditional statistical regression algorithms, while more advanced machine learning approaches have not been explored. To address these issues, this study used advanced machine learning algorithms to establish within-season yield prediction models for winter wheat using multi-source data. Specifically, yield driving factors were extracted from four different data sources: satellite images, climate data, soil maps, and historical yield records. Subsequently, two linear regression methods, ordinary least squares (OLS) and least absolute shrinkage and selection operator (LASSO), and four well-known machine learning methods, support vector machine (SVM), random forest (RF), Adaptive Boosting (AdaBoost), and deep neural network (DNN), were applied and compared for estimating the county-level winter wheat yield in the Conterminous United States (CONUS) within the growing season. Our models were trained on data from 2008 to 2016 and evaluated on data from 2017 and 2018. The results demonstrated that the machine learning approaches performed better than the linear regression models, with the best performance achieved by the AdaBoost model (R2 = 0.86, RMSE = 0.51 t/ha, MAE = 0.39 t/ha). Additionally, the results showed that combining data from multiple sources outperformed single-source satellite data, with the highest accuracy obtained when all four data sources were considered in the model development. Finally, the prediction accuracy was also evaluated against timeliness within the growing season, with reliable predictions (R2 > 0.84) achievable 2.5 months before harvest when the multi-source data were combined.
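
The evaluation protocol in this abstract (train on earlier years, test on later years, report R2, RMSE, and MAE) can be sketched in plain Python. The record layout, function names, and yield values below are hypothetical and are not taken from the study; only the year-based split and the three metrics follow the text.

```python
import math
from typing import Dict, List, Tuple


def split_by_year(records: List[Dict], last_train_year: int) -> Tuple[List[Dict], List[Dict]]:
    """Temporal hold-out: years up to `last_train_year` train the model, later years test it."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test


def r2_score(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean squared error, in the units of the yield (e.g. t/ha)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute error, in the units of the yield."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical county-year records (illustrative values only).
records = [
    {"year": 2015, "yield": 3.2},
    {"year": 2016, "yield": 3.5},
    {"year": 2017, "yield": 3.0},
    {"year": 2018, "yield": 4.0},
]
train, test = split_by_year(records, last_train_year=2016)

# Hypothetical observed and predicted yields (t/ha) for held-out counties.
observed = [3.0, 4.0, 5.0]
predicted = [2.5, 4.0, 5.5]
print(r2_score(observed, predicted), rmse(observed, predicted), mae(observed, predicted))
```

Splitting by year rather than at random keeps future seasons out of training, which matches how a within-season forecast would actually be used.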

Assimilating SAR and Optical Remote Sensing Data into WOFOST Model for Improving Winter Wheat Yield Estimation

2018 7th International Conference on Agro-geoinformatics (Agro-geoinformatics) ◽

10.1109/agro-geoinformatics.2018.8476074 ◽

2018 ◽

Author(s):

Wen Zhuo ◽

Jianxi Huang ◽

Li Li ◽

Ran Huang ◽

Xinran Gao ◽

...

Keyword(s):

Remote Sensing ◽

Winter Wheat ◽

Wheat Yield ◽

Remote Sensing Data ◽

Optical Remote Sensing ◽

Yield Estimation ◽

Sensing Data ◽

Winter Wheat Yield

Estimation of winter wheat yield by using remote sensing data and crop model

10.1117/12.930407 ◽

2012 ◽

Author(s):

Jianmao Guo ◽

Tengfei Zheng ◽

Qi Wang ◽

Jia Yang ◽

Junyi Shi ◽

...

Keyword(s):

Remote Sensing ◽

Winter Wheat ◽

Wheat Yield ◽

Remote Sensing Data ◽

Crop Model ◽

Sensing Data ◽

Winter Wheat Yield

Winter Wheat Yield Estimation Coupling Weight Optimization Combination Method with Remote Sensing Data from Landsat5 TM

Computer and Computing Technologies in Agriculture V - IFIP Advances in Information and Communication Technology ◽

10.1007/978-3-642-27275-2_32 ◽

2012 ◽

pp. 284-292

Author(s):

Xingang Xu ◽

Jihua Wang ◽

Wenjiang Huang ◽

Cunjun Li ◽

Xiaoyu Song ◽

...

Keyword(s):

Remote Sensing ◽

Winter Wheat ◽

Wheat Yield ◽

Remote Sensing Data ◽

Combination Method ◽

Weight Optimization ◽

Yield Estimation ◽

Sensing Data ◽

Winter Wheat Yield

Operating at the extreme: estimating the upper yield boundary of winter wheat production in commercial practice

Royal Society Open Science ◽

10.1098/rsos.191919 ◽

2020 ◽

Vol 7 (4) ◽

pp. 191919

Author(s):

Emily G. Mitchell ◽

Neil M. J. Crout ◽

Paul Wilson ◽

Andrew T. A. Wood ◽

Gilles Stupfler

Keyword(s):

Winter Wheat ◽

Crop Yield ◽

Agronomic Traits ◽

Maximum Yield ◽

Crop Protection ◽

Wheat Yield ◽

Statistical Evidence ◽

Extreme Value Analysis ◽

Value Analysis ◽

Winter Wheat Yield

Wheat farming provides 28.5% of global cereal production. After steady growth in average crop yield from 1950 to 1990, wheat yields have generally stagnated, which prompts the question of whether further improvements are possible. Statistical studies of agronomic parameters such as crop yield have so far exclusively focused on estimating parameters describing the whole of the data, rather than the highest yields specifically. These indicators include the mean or median yield of a crop, or finding the combinations of agronomic traits that are correlated with increasing average yields. In this paper, we take an alternative approach and consider high yields only. We carry out an extreme value analysis of winter wheat yield data collected in England and Wales between 2006 and 2015. This analysis suggests that, under current climate and growing conditions, there is indeed a finite upper bound for winter wheat yield, whose value we estimate to be 17.60 tonnes per hectare. We then refine the analysis for strata defined by either location or level of use of agricultural inputs. We find that there is no statistical evidence for variation of maximal yield depending on location, and neither is there statistical evidence that maximum yield levels are improved by high levels of crop protection and fertilizer use.
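
The idea of a finite upper bound can be illustrated with one common extreme value tool, the peaks-over-threshold approach. The abstract does not say which estimator the authors used, so this is an assumption: exceedances over a high threshold u are modeled with a generalized Pareto distribution with scale σ and shape ξ, and when ξ < 0 that distribution has a finite upper endpoint at u − σ/ξ. The parameter values below are hypothetical and are not the fitted values from the paper.

```python
import math


def gpd_upper_endpoint(threshold: float, scale: float, shape: float) -> float:
    """Upper endpoint of a generalized Pareto tail; finite only when the shape is negative."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    if shape >= 0:
        return math.inf
    return threshold - scale / shape


def gpd_survival(y: float, threshold: float, scale: float, shape: float) -> float:
    """P(Y > y | Y > threshold) under the generalized Pareto model."""
    if y <= threshold:
        return 1.0
    z = (y - threshold) / scale
    if shape == 0:
        return math.exp(-z)  # exponential tail in the limiting case
    base = 1.0 + shape * z
    if base <= 0:
        return 0.0  # y lies at or beyond the upper endpoint
    return base ** (-1.0 / shape)


# Hypothetical tail parameters (t/ha), chosen only to make the arithmetic easy to follow.
u, sigma, xi = 10.0, 2.0, -0.5

endpoint = gpd_upper_endpoint(u, sigma, xi)  # 10 - 2 / (-0.5) = 14
print(endpoint, gpd_survival(12.0, u, sigma, xi))
```

A negative fitted shape parameter is what would support the claim of a finite maximum yield; a zero or positive shape would imply no finite bound under this model.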
