Combining Optical, Fluorescence, Thermal Satellite, and Environmental Data to Predict County-Level Maize Yield in China Using Machine Learning Approaches

Maize is an extremely important grain crop, and demand for it has increased sharply throughout the world. China alone contributes nearly one-fifth of total production despite its decreasing arable land. Timely and accurate prediction of maize yield in China is critical for ensuring global food security. Previous studies primarily used visible or near-infrared (NIR) based vegetation indices (VIs), climate data, or both to predict crop yield. However, satellite data from other spectral bands, which contain unique information on crop growth and yield, have been underutilized. In addition, although the joint application of multi-source data significantly improves crop yield prediction, the combinations of input variables that achieve the best results have not been well investigated. Here we integrated optical, fluorescence, thermal satellite, and environmental data to predict county-level maize yield across four agro-ecological zones (AEZs) in China using a regression-based method (LASSO), two machine learning (ML) methods (RF and XGBoost), and a deep learning (DL) network (LSTM). The results showed that combining multi-source data explained more than 75% of yield variation. Satellite data at the silking stage contributed more information than other variables, and solar-induced chlorophyll fluorescence (SIF) performed almost equivalently to the enhanced vegetation index (EVI), largely because of the low signal-to-noise ratio and coarse spatial resolution of SIF. Extremely high temperature and vapor pressure deficit during the reproductive period were the most important climate variables affecting maize production in China. Soil properties and management factors contained extra information on crop growth conditions that cannot be fully captured by satellite and climate data. We found that the ML and DL approaches clearly outperformed the regression-based method, and that ML was more computationally efficient and easier to generalize than DL.
Our study is an important effort to combine multi-source remotely sensed and environmental data for large-scale yield prediction. The proposed methodology provides a paradigm for yield prediction of other crops and in other regions.
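
The phrase "explained more than 75% of yield variation" refers to the coefficient of determination. A minimal Python sketch of the usual definition, R² = 1 − SS_res/SS_tot, is given below. It assumes R² is computed this way (some studies instead report the squared Pearson correlation), and the yield values in the usage line are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Total variation of observed yields around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Variation left unexplained by the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical county yields (t/ha) and model predictions.
    print(r_squared([2.0, 4.0, 6.0], [3.0, 4.0, 5.0]))  # 0.75
```

An R² of 0.75 means the predictions account for three quarters of the variance in observed yield around its mean.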

Predicting Maize Yield at the Plot Scale of Different Fertilizer Systems by Multi-Source Data and Machine Learning Methods

Remote Sensing ◽

10.3390/rs13183760 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3760

Author(s):

Linghua Meng ◽

Huanjun Liu ◽

Susan L. Ustin ◽

Xinle Zhang

Keyword(s):

Machine Learning ◽

Crop Yield ◽

Satellite Data ◽

Maize Yield ◽

Environmental Data ◽

Yield Prediction ◽

Climate Data ◽

Multiple Sources ◽

Adaptive Boosting ◽

Source Data

Timely and reliable maize yield prediction is essential for the agricultural supply chain and food security. Empirical or statistical models built from climate data, satellite data, or both have prevailed for decades. However, the extent to which climate and satellite data can improve yield prediction is still unknown. In addition, fertilizer information may also improve crop yield prediction, especially in regions with different fertilizer systems, such as cover crops, mineral fertilizer, or compost. Machine learning (ML) has been widely and successfully applied in crop yield prediction. Here, we attempted to predict maize yield from 1994 to 2007 at the plot scale by integrating multi-source data, including monthly climate data, satellite data (i.e., vegetation indices (VIs)), fertilizer data, and soil data, to explore how different inputs affect prediction accuracy. The results show that incorporating all of the datasets using random forests (RF) and adaptive boosting (AB) achieved high accuracy in yield prediction (R²: 0.85–0.98). In addition, the combination of VIs, climate data, and soil data (VCS) predicted maize yield more effectively than other combinations (e.g., all data combined, or VIs and soil data). Furthermore, prediction accuracy differed among the fertilizer systems. This paper aggregates data from multiple sources, distinguishes the effects of different fertilization scenarios on crop yield prediction, and analyzes the effects of different data on crop yield. Our study provides a paradigm that can be used to improve yield prediction for other crops. It is an important effort to combine multi-source remotely sensed and environmental data for plot-scale maize yield prediction and to develop timely and robust methods for predicting the yield of maize grown under different fertilizer systems.
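
Comparing input combinations such as VCS against all data amounts to training one model per subset of data groups. The Python sketch below only enumerates those subsets. The one-letter codes (V = VIs, C = climate, F = fertilizer, S = soil) are an assumption modeled on the "VCS" abbreviation, and the RF/AB fitting for each subset is deliberately left out.

```python
from itertools import combinations

# Hypothetical one-letter codes for the four data groups named in the abstract.
DATA_GROUPS = {
    "V": "vegetation indices",
    "C": "monthly climate",
    "F": "fertilizer",
    "S": "soil",
}


def input_combinations(codes):
    """Return every non-empty subset of data-group codes, e.g. 'VCS'."""
    subsets = []
    for size in range(1, len(codes) + 1):
        for combo in combinations(codes, size):
            subsets.append("".join(combo))
    return subsets


if __name__ == "__main__":
    combos = input_combinations(list(DATA_GROUPS))
    print(len(combos))  # 15 candidate input sets
    print(combos[-1])   # VCFS (all four groups)
```

Each code string would then select the feature columns used to train and score one model, so four groups yield 15 candidate input sets to compare.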

Prediction of Winter Wheat Yield Based on Multi-Source Data and Machine Learning in China

Remote Sensing ◽

10.3390/rs12020236 ◽

2020 ◽

Vol 12 (2) ◽

pp. 236 ◽

Cited By ~ 6

Author(s):

Jichong Han ◽

Zhao Zhang ◽

Juan Cao ◽

Yuchuan Luo ◽

Liangliang Zhang ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Winter Wheat ◽

Crop Yield ◽

Wheat Yield ◽

Growth Period ◽

Yield Prediction ◽

Climate Data ◽

Source Data ◽

Winter Wheat Yield

Wheat is one of the main crops in China, and crop yield prediction is important for regional trade and national food security. There are increasing concerns about how to integrate multi-source data and employ machine learning techniques to establish a simple, timely, and accurate crop yield prediction model at the administrative-unit level. Many previous studies focused mainly on the whole crop growth period, using expensive manual surveys, remote sensing, or climate data. However, the effect of selecting different time windows on yield prediction was still unknown. Thus, taking the major winter wheat production regions of China as an example, we separated the whole growth period into four time windows and assessed their corresponding predictive ability. First, we developed a modeling framework on the Google Earth Engine (GEE) platform to integrate climate data, remote sensing data, and soil data for predicting winter wheat yield. The results show that the models can accurately predict yield 1–2 months before the harvesting dates at the county level in China, with R² > 0.75 and yield error of less than 10%. Support vector machine (SVM), Gaussian process regression (GPR), and random forest (RF) were the top three methods for predicting yield among the eight typical machine learning models tested in this study. We also found that agricultural zones and temporal training settings affect prediction accuracy. The three models perform better as more information on the winter wheat growing season becomes available. Our findings highlight a potentially powerful tool for predicting yield using multi-source data and machine learning for other crops and regions.
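
The abstract does not say how the four time windows were defined, or whether each model saw only its own window or all data up to that point. The Python sketch below assumes contiguous blocks of months and returns both readings: the individual windows and the cumulative inputs that grow as the season progresses. The month list in the usage lines is hypothetical.

```python
def split_windows(months, n_windows=4):
    """Split an ordered list of months into n contiguous windows.

    Returns (windows, cumulative), where cumulative[i] holds every month
    from the start of the season through the end of window i.
    """
    if not 1 <= n_windows <= len(months):
        raise ValueError("n_windows must be between 1 and len(months)")
    size, extra = divmod(len(months), n_windows)
    windows, start = [], 0
    for i in range(n_windows):
        # Spread any remainder over the earliest windows.
        end = start + size + (1 if i < extra else 0)
        windows.append(months[start:end])
        start = end
    cumulative, seen = [], []
    for window in windows:
        seen = seen + window
        cumulative.append(seen)
    return windows, cumulative


if __name__ == "__main__":
    season = ["Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May"]
    windows, cumulative = split_windows(season)
    print(windows[1])     # ['Dec', 'Jan']
    print(cumulative[1])  # ['Oct', 'Nov', 'Dec', 'Jan']
```

Under the cumulative reading, a model trained on an earlier entry can issue a forecast sooner, which is the trade-off between lead time and available season information that the abstract describes.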

Prediction of Maize Yield at the City Level in China Using Multi-Source Data

Remote Sensing ◽

10.3390/rs13010146 ◽

2021 ◽

Vol 13 (1) ◽

pp. 146

Author(s):

Xinxin Chen ◽

Lan Feng ◽

Rui Yao ◽

Xiaojun Wu ◽

Jia Sun ◽

...

Keyword(s):

Machine Learning ◽

Satellite Data ◽

Vegetation Index ◽

Maize Yield ◽

Machine Learning Algorithms ◽

Shortwave Radiation ◽

Growth Stages ◽

Climate Data ◽

Early Peak ◽

The City

Maize is a widely grown crop in China, and the relationships between agroclimatic parameters and maize yield are complicated; hence, accurate and timely yield prediction is challenging. Here, climate data, satellite data, and meteorological indices were integrated to predict maize yield at the city level in China from 2000 to 2015 using four machine learning approaches, i.e., Cubist, random forest (RF), extreme gradient boosting (XGBoost), and support vector machine (SVM). The climate variables included the diffuse flux of photosynthetically active radiation (PDf), the diffuse flux of shortwave radiation (SDf), the direct flux of shortwave radiation (SDr), minimum temperature (Tmn), potential evapotranspiration (Pet), vapor pressure deficit (Vpd), vapor pressure (Vap), and wet day frequency (Wet). The satellite data comprised the enhanced vegetation index (EVI), normalized difference vegetation index (NDVI), and soil-adjusted vegetation index (SAVI) from the Moderate Resolution Imaging Spectroradiometer (MODIS). The meteorological indices comprised growing degree days (GDD), extreme degree days (EDD), and the Standardized Precipitation Evapotranspiration Index (SPEI). The results showed that integrating all climate data, satellite data, and meteorological indices achieved the highest accuracy. The highest correlation coefficient (R) values for the Cubist, RF, SVM, and XGBoost methods were 0.828, 0.806, 0.742, and 0.758, respectively. Inputs of climate data, satellite data, and meteorological indices from all growth stages were essential for maize yield prediction, especially those from late growth stages. With the four machine learning algorithms, R improved by about 0.126, 0.117, and 0.143 when climate data from the early, peak, and late periods, respectively, were added to satellite data and meteorological indices from all stages.
R increased by 0.016, 0.016, and 0.017 when satellite data from the early, peak, and late stages, respectively, were added to climate data and meteorological indices from all stages. R increased by 0.003, 0.032, and 0.042 when meteorological indices from the early, peak, and late stages, respectively, were added to climate and satellite data from all stages. Spatial divergence was large: in the Northwest region, R reached 0.942, 0.904, 0.934, and 0.850 for Cubist, RF, SVM, and XGBoost, respectively. This study highlights the advantages of using climate data, satellite data, and meteorological indices for large-scale maize yield estimation with machine learning algorithms.
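
GDD and EDD are threshold-based temperature accumulations. The abstract does not give the thresholds or the exact formula, so the Python sketch below assumes a simple daily approximation with an 8 °C base and a 29 °C upper threshold, values frequently used in maize studies: each day adds max(0, min(T_mean, 29) − 8) to GDD and max(0, T_max − 29) to EDD. Hourly or sine-curve interpolations used elsewhere would give different totals.

```python
def degree_days(tmin, tmax, t_base=8.0, t_high=29.0):
    """Accumulate growing degree days (GDD) and extreme degree days (EDD).

    tmin, tmax: daily minimum and maximum temperatures in degrees C.
    t_base, t_high: assumed thresholds, not taken from the study.
    """
    gdd = 0.0
    edd = 0.0
    for lo, hi in zip(tmin, tmax):
        tmean = (lo + hi) / 2.0
        # Heat useful for growth, capped at the upper threshold.
        gdd += max(0.0, min(tmean, t_high) - t_base)
        # Heat above the upper threshold, treated as damaging.
        edd += max(0.0, hi - t_high)
    return gdd, edd


if __name__ == "__main__":
    # Three hypothetical summer days.
    print(degree_days([15.0, 20.0, 24.0], [27.0, 33.0, 36.0]))  # (52.5, 11.0)
```

Separating the two sums lets a model reward moderate warmth while penalizing extreme heat, which is why both indices are used as inputs.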

Identifying the Contributions of Multi-Source Data for Winter Wheat Yield Prediction in China

Remote Sensing ◽

10.3390/rs12050750 ◽

2020 ◽

Vol 12 (5) ◽

pp. 750 ◽

Cited By ~ 3

Author(s):

Juan Cao ◽

Zhao Zhang ◽

Fulu Tao ◽

Liangliang Zhang ◽

Yuchuan Luo ◽

...

Keyword(s):

Winter Wheat ◽

Satellite Data ◽

Large Scale ◽

Wheat Yield ◽

Yield Variability ◽

Yield Prediction ◽

Climate Data ◽

Light Gradient ◽

Source Data ◽

Winter Wheat Yield

Wheat is a leading cereal grain throughout the world. Timely and reliable wheat yield prediction at a large scale is essential for the agricultural supply chain and global food security, especially in China, an important wheat-producing and wheat-consuming country. The conventional approach of using climate data, satellite data, or both to build empirical and crop models has prevailed for decades. However, the extent to which climate and satellite data can improve yield prediction is still unknown. In addition, socio-economic (SC) factors may also improve crop yield prediction, but their contributions need in-depth investigation, especially in regions with good irrigation conditions, sufficient fertilization, and pesticide application. Here, we made the first attempt to predict wheat yield across China from 2001 to 2015 at the county level by integrating multi-source data, including monthly climate data, satellite data (i.e., vegetation indices (VIs)), and SC factors. The results show that incorporating all the datasets using three machine learning methods (Ridge Regression (RR), Random Forest (RF), and Light Gradient Boosting Machine (LightGBM)) achieved the best yield prediction performance (R²: 0.68–0.75), with the largest individual contribution from climate (~0.53), followed by VIs (~0.45) and SC factors (~0.30). In addition, combinations of VIs and climate data captured inter-annual yield variability more effectively than other combinations (e.g., climate and SC, or VIs and SC), while combining SC with climate data better captured spatial yield variability. Climate data provided extra and unique information across the entire growing season, whereas VIs did so only at their peak stage (March to April). Furthermore, incorporating spatial information and soil properties into the benchmark models improved wheat yield prediction by 0.06 and 0.12, respectively.
Optimal wheat yield prediction can be achieved approximately two months before maturity. Our study develops timely and robust methods for winter wheat yield prediction at a large scale in China, which can be applied to other crops and regions.

Optimal county-level crop yield prediction using MODIS-based variables and weather data: A comparative study on machine learning models

Agricultural and Forest Meteorology ◽

10.1016/j.agrformet.2021.108530 ◽

2021 ◽

Vol 307 ◽

pp. 108530

Author(s):

Sungha Ju ◽

Hyoungjoon Lim ◽

Jong Won Ma ◽

Soohyun Kim ◽

Kyungdo Lee ◽

...

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Crop Yield ◽

Weather Data ◽

Yield Prediction ◽

County Level ◽

Learning Models ◽

Machine Learning Models

Analogy-Based Crop Yield Forecasts Based on Temporal Similarity of Leaf Area Index

Remote Sensing ◽

10.3390/rs13163069 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3069

Author(s):

Yadong Liu ◽

Junhwan Kim ◽

David H. Fleisher ◽

Kwang Soo Kim

Keyword(s):

Time Series ◽

Leaf Area Index ◽

Leaf Area ◽

Crop Yield ◽

Satellite Data ◽

Growing Season ◽

Environmental Data ◽

Area Index ◽

Current Season ◽

Wide Range

Seasonal forecasts of crop yield are important components of agricultural policy decisions and farmer planning. Forecasting regional crop yield with sophisticated approaches such as machine learning and process-based models often requires a wide range of input data, which demands considerable effort for data preparation in addition to identifying data sources. Here, we propose a simpler approach, called the Analogy Based Crop-yield (ABC) forecast scheme, to make timely and accurate predictions of regional crop yield using a minimum set of inputs. In the ABC method, growing seasons from a prior long-term period (e.g., 10 years) are first identified as analogous to the current season using a similarity index based on time-series leaf area index (LAI) patterns. Crop yield in the current growing season is then forecast as the weighted average of the yields reported in the analogous seasons for the area of interest. The ABC approach was used to predict corn and soybean yields in the Midwestern U.S. at the county level for 2017–2019. MOD15A2H, a MODIS satellite LAI product, was used to compile inputs. The mean absolute percentage error (MAPE) of crop yield forecasts was <10% for corn and soybean in each growing season when the LAI time series from day of year 89 to 209 was used as input to the ABC approach. The prediction error of the ABC approach was comparable to that of a deep neural network model that relied on soil and weather data as well as satellite data in a previous study. These results indicate that the ABC approach allowed crop yield forecasts with a lead time of at least two months before harvest. In particular, the ABC scheme would be useful for regions where crop yield forecasts are limited by the availability of reliable environmental data.
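
The two ABC steps (find analogous seasons by LAI similarity, then average their yields with weights) can be sketched in Python as follows. The abstract does not specify the similarity index, the number of analogs, or the weighting, so this sketch assumes RMSE between LAI series as the dissimilarity, the k most similar past seasons as analogs, and inverse-distance weights. The MAPE helper follows the error metric the abstract reports. All function names and values are hypothetical.

```python
import math


def rmse(a, b):
    """Root mean square difference between two equal-length LAI series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def abc_forecast(current_lai, past_lai, past_yield, k=3, eps=1e-6):
    """Forecast yield as an inverse-distance weighted mean of analog seasons.

    current_lai: LAI series for the current season.
    past_lai:    dict mapping year -> LAI series on the same dates.
    past_yield:  dict mapping year -> reported yield for the same area.
    Returns (forecast, analog_years).
    """
    ranked = sorted((rmse(current_lai, s), year) for year, s in past_lai.items())
    analogs = ranked[:k]
    # Smaller LAI mismatch -> larger weight; eps avoids division by zero.
    weights = [1.0 / (dist + eps) for dist, _ in analogs]
    total = sum(w * past_yield[year] for w, (_, year) in zip(weights, analogs))
    return total / sum(weights), [year for _, year in analogs]


def mape(observed, forecast):
    """Mean absolute percentage error, in percent."""
    errors = [abs(o - f) / o for o, f in zip(observed, forecast)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    past_lai = {2010: [1.0, 2.0, 3.0], 2011: [1.0, 2.0, 3.5], 2012: [5.0, 5.0, 5.0]}
    past_yield = {2010: 10.0, 2011: 11.0, 2012: 7.0}
    forecast, years = abc_forecast([1.0, 2.0, 3.25], past_lai, past_yield, k=2)
    print(years)  # [2010, 2011]
    print(round(forecast, 2))
```

Because the only input is an LAI time series, the sketch needs no soil or weather data, which mirrors the motivation stated in the abstract.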

Analysis of agricultural crop yield prediction using statistical techniques of machine learning

Materials Today Proceedings ◽

10.1016/j.matpr.2021.01.948 ◽

2021 ◽

Author(s):

Janmejay Pant ◽

R.P. Pant ◽

Manoj Kumar Singh ◽

Devesh Pratap Singh ◽

Himanshu Pant

Keyword(s):

Machine Learning ◽

Crop Yield ◽

Statistical Techniques ◽

Yield Prediction ◽

Agricultural Crop

Crop Yield Prediction Using Machine Learning

Adalya Journal ◽

10.37896/aj9.4/012 ◽

2020 ◽

Vol 9 (4) ◽

Keyword(s):

Machine Learning ◽

Crop Yield ◽

Yield Prediction

An interaction regression model for crop yield prediction

Scientific Reports ◽

10.1038/s41598-021-97221-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Javad Ansarifar ◽

Lizhi Wang ◽

Sotirios V. Archontoulis

Keyword(s):

Machine Learning ◽

Regression Model ◽

Crop Yield ◽

Machine Learning Algorithms ◽

Training Data ◽

Yield Prediction ◽

Soybean Yield ◽

Global Food Security ◽

Management Scenario ◽

Complex Interactions

Crop yield prediction is crucial for global food security yet notoriously challenging because of the multitude of factors that jointly determine yield, including genotype, environment, management, and their complex interactions. Integrating the power of optimization, machine learning, and agronomic insight, we present a new predictive model (referred to as the interaction regression model) for crop yield prediction, which has three salient properties. First, it achieved a relative root mean square error of 8% or less in three Midwest states in the US (Illinois, Indiana, and Iowa) for both corn and soybean yield prediction, outperforming state-of-the-art machine learning algorithms. Second, it identified about a dozen environment-by-management interactions for corn and soybean yield, some of which are consistent with conventional agronomic knowledge, whereas others require additional analysis or experiments to prove or disprove. Third, it quantitatively dissected crop yield into contributions from weather, soil, management, and their interactions, allowing agronomists to pinpoint the factors that favorably or unfavorably affect yield at a given location under a given weather and management scenario. The most significant contribution of the new model is its capability to produce accurate predictions and explainable insights simultaneously. This was achieved by training the algorithm to select features and interactions that are spatially and temporally robust, balancing prediction accuracy on the training data against generalizability to the test data.
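
The accuracy figure quoted above is a relative RMSE. A minimal Python sketch of the conventional definition, RMSE divided by the mean observed yield and expressed as a percentage, follows. It assumes normalization by the observed mean (the abstract does not state the normalizer), the yields in the usage line are hypothetical, and it does not reproduce the interaction regression model itself.

```python
import math


def relative_rmse(observed, predicted):
    """RMSE normalized by the mean observed value, in percent."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)


if __name__ == "__main__":
    # Hypothetical yields; a value of 8 or less would meet the reported threshold.
    print(relative_rmse([10.0, 10.0, 10.0, 10.0], [9.0, 11.0, 9.0, 11.0]))  # 10.0
```

Normalizing by mean yield makes errors comparable across corn and soybean, whose absolute yields differ greatly.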
