Prediction of Maize Yield at the City Level in China Using Multi-Source Data

Maize is widely grown in China, and the relationships between agroclimatic parameters and maize yield are complicated; hence, accurate and timely yield prediction is challenging. Here, climate data, satellite data, and meteorological indices were integrated to predict maize yield at the city level in China from 2000 to 2015 using four machine learning approaches: Cubist, random forest (RF), extreme gradient boosting (XGBoost), and support vector machine (SVM). The climate variables included the diffuse flux of photosynthetically active radiation (PDf), the diffuse flux of shortwave radiation (SDf), the direct flux of shortwave radiation (SDr), minimum temperature (Tmn), potential evapotranspiration (Pet), vapor pressure deficit (Vpd), vapor pressure (Vap), and wet day frequency (Wet). The satellite data comprised the enhanced vegetation index (EVI), normalized difference vegetation index (NDVI), and soil-adjusted vegetation index (SAVI) from the Moderate Resolution Imaging Spectroradiometer (MODIS). The meteorological indices comprised growing degree days (GDD), extreme degree days (EDD), and the Standardized Precipitation Evapotranspiration Index (SPEI). The results showed that integrating all climate data, satellite data, and meteorological indices achieved the highest accuracy. The highest correlation coefficient (R) values for the Cubist, RF, SVM, and XGBoost methods were 0.828, 0.806, 0.742, and 0.758, respectively. Inputs from all growth stages were essential for maize yield prediction, whether climate data, satellite data, or meteorological indices, and those from the late growth stages mattered most. Across the four machine learning algorithms, R improved by about 0.126, 0.117, and 0.143 when climate data from the early, peak, and late stages, respectively, were added to satellite data and meteorological indices from all stages. R increased by 0.016, 0.016, and 0.017 when satellite data from the early, peak, and late stages, respectively, were added to climate data and meteorological indices from all stages. R increased by 0.003, 0.032, and 0.042 when meteorological indices from the early, peak, and late stages, respectively, were added to climate and satellite data from all stages. Spatial differences were large: in the Northwest region, R reached 0.942, 0.904, 0.934, and 0.850 for Cubist, RF, SVM, and XGBoost, respectively. This study highlights the advantages of using climate data, satellite data, and meteorological indices for large-scale maize yield estimation with machine learning algorithms.
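The three vegetation indices named in this abstract have standard closed-form definitions, so a small sketch may help readers see what the satellite inputs look like. The functions below use the commonly published formulas; the function names, the default soil factor of 0.5, and the sample reflectances are illustrative assumptions, not values or code from the study.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index from surface reflectances (0-1)."""
    return (nir - red) / (nir + red)


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    """Soil-adjusted vegetation index; soil_factor is the L term (0.5 is a common default)."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def evi(nir: float, red: float, blue: float) -> float:
    """Enhanced vegetation index with the coefficients commonly used for MODIS."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


if __name__ == "__main__":
    # Hypothetical reflectances for a dense green canopy (not data from the study).
    nir, red, blue = 0.45, 0.05, 0.03
    print(f"NDVI={ndvi(nir, red):.3f}")
    print(f"SAVI={savi(nir, red):.3f}")
    print(f"EVI={evi(nir, red, blue):.3f}")
```

In a workflow like the one described, such per-pixel values would presumably be aggregated over each city and growth stage before being passed to the learners; that aggregation step is not specified in the abstract and is left out here.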

Combining Optical, Fluorescence, Thermal Satellite, and Environmental Data to Predict County-Level Maize Yield in China Using Machine Learning Approaches

Remote Sensing ◽

10.3390/rs12010021 ◽

2019 ◽

Vol 12 (1) ◽

pp. 21 ◽

Cited By ~ 6

Author(s):

Liangliang Zhang ◽

Zhao Zhang ◽

Yuchuan Luo ◽

Juan Cao ◽

Fulu Tao

Keyword(s):

Machine Learning ◽

Crop Yield ◽

Satellite Data ◽

Maize Yield ◽

Environmental Data ◽

Yield Prediction ◽

County Level ◽

Climate Data ◽

Source Data ◽

Optical Fluorescence

Maize is an extremely important grain crop, and demand for it has increased sharply throughout the world. China alone contributes nearly one-fifth of total production, even as its arable land decreases. Timely and accurate prediction of maize yield in China is critical for ensuring global food security. Previous studies primarily used visible or near-infrared (NIR) based vegetation indices (VIs), climate data, or both to predict crop yield. However, satellite data from other spectral bands, which contain unique information on crop growth and yield, have been underutilized. In addition, although the joint application of multi-source data significantly improves crop yield prediction, the combinations of input variables that achieve the best results have not been well investigated. Here we integrated optical, fluorescence, thermal satellite, and environmental data to predict county-level maize yield across four agro-ecological zones (AEZs) in China using a regression-based method (LASSO), two machine learning (ML) methods (RF and XGBoost), and a deep learning (DL) network (LSTM). The results showed that combining multi-source data explained more than 75% of yield variation. Satellite data at the silking stage contributed more information than other variables, and solar-induced chlorophyll fluorescence (SIF) performed almost equivalently to the enhanced vegetation index (EVI), largely because of its low signal-to-noise ratio and coarse spatial resolution. Extremely high temperature and vapor pressure deficit during the reproductive period were the most important climate variables affecting maize production in China. Soil properties and management factors contained extra information on crop growth conditions that cannot be fully captured by satellite and climate data. We found that the ML and DL approaches outperformed the regression-based method, and that ML was more computationally efficient and easier to generalize than DL. Our study is an important effort to combine multi-source remotely sensed and environmental data for large-scale yield prediction. The proposed methodology provides a paradigm for predicting the yield of other crops and in other regions.
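Vapor pressure deficit, named above as one of the most important climate variables, can be derived from air temperature and relative humidity. The sketch below uses the Tetens-type saturation vapor pressure formula given in FAO-56; the function names are hypothetical, and the abstract does not say how the authors obtained their vapor pressure deficit values, so this is only one plausible way to compute the quantity.

```python
import math


def saturation_vapor_pressure_kpa(temp_c: float) -> float:
    """Saturation vapor pressure (kPa) at air temperature temp_c (deg C), FAO-56 form."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vapor_pressure_deficit_kpa(temp_c: float, rel_humidity_pct: float) -> float:
    """VPD (kPa): saturation vapor pressure minus actual vapor pressure."""
    e_sat = saturation_vapor_pressure_kpa(temp_c)
    e_act = e_sat * rel_humidity_pct / 100.0  # actual vapor pressure from relative humidity
    return e_sat - e_act


if __name__ == "__main__":
    # Hypothetical hot, dry afternoon versus a mild, humid one (not study data).
    print(f"{vapor_pressure_deficit_kpa(35.0, 30.0):.2f} kPa")
    print(f"{vapor_pressure_deficit_kpa(22.0, 80.0):.2f} kPa")
```

At a fixed relative humidity, the deficit grows quickly with temperature, which is one reason heat and vapor pressure deficit tend to act together during the reproductive period.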

A COMPARISON OF MACHINE-LEARNING REGRESSION ALGORITHMS FOR THE ESTIMATION OF LAI USING LANDSAT - 8 SATELLITE DATA

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-4-w16-679-2019 ◽

2019 ◽

Vol XLII-4/W16 ◽

pp. 679-683

Author(s):

V. P. Yadav ◽

R. Prasad ◽

R. Bala ◽

A. K. Vishwakarma ◽

S. A. Yadav ◽

...

Keyword(s):

Machine Learning ◽

Satellite Data ◽

Vegetation Index ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Accurate Estimation ◽

Support Vector ◽

Landsat 8 ◽

Area Index ◽

Global Circulation Models

Abstract. The leaf area index (LAI) is one of the key crop variables and plays an important role in agriculture, ecology, and climate change studies, where global circulation models use it to compute energy and water fluxes. In recent years, machine-learning algorithms have provided accurate computational approaches for estimating crop biophysical parameters from remotely sensed data. In the present study, three machine-learning algorithms, random forest regression (RFR), support vector regression (SVR), and artificial neural network regression (ANNR), were used to estimate the LAI of crops. Landsat-8 satellite images from three dates between January and March 2017 were used, covering different crop growth conditions in Varanasi district, India. The sampling regions were fully covered by major Rabi season crops such as wheat, barley, and mustard. Of the total pooled data, 60% of the samples were used to train the algorithms and the remaining 40% were used to test and validate the machine-learning regression algorithms. The highest sensitivity of the normalized difference vegetation index (NDVI) to LAI was found using the RFR algorithm (R² = 0.884, RMSE = 0.404), compared with SVR (R² = 0.847, RMSE = 0.478) and ANNR (R² = 0.829, RMSE = 0.404). Therefore, the RFR algorithm can be used for accurate estimation of crop LAI from satellite data.
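The evaluation protocol described here (a 60/40 split of pooled samples, scored with R² and RMSE) can be sketched in a few lines of standard-library Python. The function names and the fixed random seed are assumptions for illustration, and R² is computed as 1 − SS_res/SS_tot; the abstract does not state which definition of R² the authors used.

```python
import math
import random


def split_60_40(samples, seed=0):
    """Shuffle the samples and split them into 60% training and 40% testing subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.6 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the observed variable."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


if __name__ == "__main__":
    train, test = split_60_40(range(10))
    print(len(train), len(test))
    # Hypothetical observed and predicted LAI values (not data from the study).
    obs, pred = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]
    print(f"R2={r_squared(obs, pred):.3f} RMSE={rmse(obs, pred):.3f}")
```

Both metrics would be computed on the 40% held-out subset only, so that the reported scores reflect samples the regressors never saw during training.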

Predicting Maize Yield at the Plot Scale of Different Fertilizer Systems by Multi-Source Data and Machine Learning Methods

Remote Sensing ◽

10.3390/rs13183760 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3760

Author(s):

Linghua Meng ◽

Huanjun Liu ◽

Susan L. Ustin ◽

Xinle Zhang

Keyword(s):

Machine Learning ◽

Crop Yield ◽

Satellite Data ◽

Maize Yield ◽

Environmental Data ◽

Yield Prediction ◽

Climate Data ◽

Multiple Sources ◽

Adaptive Boosting ◽

Source Data

Timely and reliable maize yield prediction is essential for the agricultural supply chain and food security. Studies that use climate data, satellite data, or both to build empirical or statistical models have prevailed for decades. However, the extent to which climate and satellite data can improve yield prediction is still unknown. In addition, fertilizer information may also improve crop yield prediction, especially in regions with different fertilizer systems, such as cover crop, mineral fertilizer, or compost. Machine learning (ML) has been widely and successfully applied in crop yield prediction. Here, we attempted to predict maize yield from 1994 to 2007 at the plot scale by integrating multi-source data, including monthly climate data, satellite data (i.e., vegetation indices (VIs)), fertilizer data, and soil data, to explore how different inputs affect the accuracy of yield prediction. The results show that incorporating all of the datasets using random forests (RF) and adaptive boosting (AB) can achieve better performance in yield prediction (R²: 0.85 to 0.98). In addition, the combination of VIs, climate data, and soil data (VCS) can predict maize yield more effectively than other combinations (e.g., combinations of all data and combinations of VIs and soil data). Furthermore, we found that prediction accuracy differed among fertilizer systems. This paper aggregates data from multiple sources, distinguishes the effects of different fertilization scenarios on crop yield predictions, and analyzes the effects of the different data types on crop yield. Our study provides a paradigm that can be used to improve yield predictions for other crops. It is an important effort to combine multi-source remotely sensed and environmental data for maize yield prediction at the plot scale, and to develop timely and robust methods for predicting the yield of maize grown under different fertilizer systems.

Diachronic modeling of the population within the medieval Greater Angkor Region settlement complex

Science Advances ◽

10.1126/sciadv.abf8441 ◽

2021 ◽

Vol 7 (19) ◽

pp. eabf8441

Author(s):

Sarah Klassen ◽

Alison K. Carter ◽

Damian H. Evans ◽

Scott Ortman ◽

Miriam T. Stark ◽

...

Keyword(s):

Machine Learning ◽

Demographic History ◽

Machine Learning Algorithms ◽

Archaeological Excavation ◽

13th Century ◽

Radiocarbon Dates ◽

Ancient Civilization ◽

Demographic Study ◽

Key Aspects ◽

The City

Angkor is one of the world’s largest premodern settlement complexes (9th to 15th centuries CE), but to date, no comprehensive demographic study has been completed, and key aspects of its population and demographic history remain unknown. Here, we combine lidar, archaeological excavation data, radiocarbon dates, and machine learning algorithms to create maps that model the development of the city and its population growth through time. We conclude that the Greater Angkor Region was home to approximately 700,000 to 900,000 inhabitants at its apogee in the 13th century CE. This granular, diachronic, paleodemographic model of the Angkor complex can be applied to any ancient civilization.

Sugarcane Yield Forecast in Ivory Coast (West Africa) Based on Weather and Vegetation Index Data

Atmosphere ◽

10.3390/atmos12111459 ◽

2021 ◽

Vol 12 (11) ◽

pp. 1459

Author(s):

Edouard Pignède ◽

Philippe Roudier ◽

Arona Diedhiou ◽

Vami Hermann N’Guessan Bi ◽

Arsène T. Kobea ◽

...

Keyword(s):

West Africa ◽

Ivory Coast ◽

Vegetation Index ◽

Cultural Practices ◽

Machine Learning Algorithms ◽

Climate Data ◽

Climate Services ◽

Climate Risks ◽

The North ◽

Sugarcane Yield

One way to use climate services in the case of sugarcane is to develop models that forecast yields, helping the sector to be better prepared for climate risks. In this study, several models for forecasting sugarcane yields were developed and compared in the north of Ivory Coast (West Africa). These models were based on statistical methods, ranging from linear regression to machine learning algorithms such as the random forest method, and were fed with climate data (rainfall, temperature), satellite products (NDVI and EVI from the MODIS Vegetation Index product), and information on cropping practices. The results show that the performance of sugarcane yield forecasting depended on the area considered. At the plot level, the noise due to cultivation practices can hide the effects of climate on yields and leads to poor forecasting performance. However, models using satellite variables are more efficient, and those with EVI alone may explain 43% of yield variations. Moreover, taking cultural practices into account in the model improves the score and makes it possible to forecast, 3 months before harvest, whether yields will be high or low in 50% and 69% of cases, respectively, with errors of only 10% and 2%, respectively. These results on the predictability of sugarcane yields are useful for planning and climate risk management in this sector.

Bulk Processing of Multi-Temporal Modis Data, Statistical Analyses and Machine Learning Algorithms to Understand Climate Variables in the Indian Himalayan Region

Sensors ◽

10.3390/s21217416 ◽

2021 ◽

Vol 21 (21) ◽

pp. 7416

Author(s):

Mohd Anul Haq ◽

Prashant Baral ◽

Shivaprakash Yaragal ◽

Biswajeet Pradhan

Keyword(s):

Machine Learning ◽

Global Climate ◽

Learning Algorithms ◽

Remotely Sensed ◽

Machine Learning Algorithms ◽

Himalayan Region ◽

Lapse Rate ◽

Climate Data ◽

MODIS Data ◽

Bulk Processing

Studies relating to trends of vegetation, snowfall and temperature in the north-western Himalayan region of India are generally focused on specific areas. Therefore, a proper understanding of regional changes in climate parameters over long time periods is generally absent, which makes it harder to draw appropriate conclusions about climate change-induced effects in the Himalayan region. This study provides a broad overview of changes in patterns of vegetation, snow cover and temperature in Uttarakhand state of India through bulk processing of remotely sensed Moderate Resolution Imaging Spectroradiometer (MODIS) data, meteorological records and simulated global climate data. Additionally, regression using machine learning algorithms such as support vector regression and a long short-term memory (LSTM) network is carried out to check the possibility of predicting these environmental variables. Results from 17 years of data show an increasing trend in snow-covered area during the pre-monsoon season and decreasing vegetation cover during the monsoon season since 2001. Solar radiation and cloud cover largely control the lapse rate variations. Mean MODIS-derived land surface temperature (LST) observations are in close agreement with global climate data. Future studies focused on climate trends and environmental parameters in Uttarakhand could reasonably rely on the remotely sensed measurements and simulated climate data for the region.

Cereal yield forecasting combining satellite drought-based indices, regional climate and weather data using machine learning approaches in Morocco

10.5194/egusphere-egu21-14590 ◽

2021 ◽

Author(s):

El houssaine Bouras ◽

Lionel Jarlan ◽

Salah Er-Raki ◽

Riad Balaghi ◽

Abdelhakim Amazirh ◽

...

Keyword(s):

Machine Learning ◽

Regional Climate ◽

Model Development ◽

Machine Learning Algorithms ◽

Weather Data ◽

Drought Indices ◽

Support Vector ◽

Learning Approaches ◽

Climate Data ◽

Yield Forecasting

Cereals are the main crop in Morocco. Their production exhibits high inter-annual variability due to uncertain rainfall and recurrent drought periods. Considering the importance of this resource to the country's economy, it is important for decision makers to have reliable forecasts of the annual cereal production in order to anticipate import needs. In this study, we assessed the joint use of satellite-based drought indices, weather data (precipitation and temperature), and climate data (pseudo-oscillation indices including the NAO and the leading modes of sea surface temperature (SST) in the mid-latitudes and in the tropics) to predict cereal yields at the level of the agricultural province using machine learning algorithms (Support Vector Machine (SVM), Random Forest (RF), and eXtreme Gradient Boost (XGBoost)) in addition to Multiple Linear Regression (MLR). We also evaluated the models for different lead times along the growing season, from January (about 5 months before harvest) to March (2 months before harvest). The results show that the combination of data from the different sources outperformed the use of a single dataset, with the highest accuracy obtained when all three data sources were considered in the model development. In addition, the results show that the models can accurately predict yields in January (5 months before harvest) with R² = 0.90 and an RMSE of about 3.4 Qt ha⁻¹. Among the models compared, XGBoost performed best for predicting yields. Also, fitting a specific model for each province separately improves the statistical metrics by approximately 10-50%, depending on the province, compared with one global model applied to all the provinces. The results of this study indicate that machine learning is a promising tool for cereal yield forecasting. The proposed methodology can also be extended to different crops and different regions for crop yield forecasting.

Comparative assessment of environmental variables and machine learning algorithms for maize yield prediction in the US Midwest

Environmental Research Letters ◽

10.1088/1748-9326/ab7df9 ◽

2020 ◽

Vol 15 (6) ◽

pp. 064005

Author(s):

Yanghui Kang ◽

Mutlu Ozdogan ◽

Xiaojin Zhu ◽

Zhiwei Ye ◽

Christopher Hain ◽

...

Keyword(s):

Machine Learning ◽

Environmental Variables ◽

Learning Algorithms ◽

Maize Yield ◽

Comparative Assessment ◽

Machine Learning Algorithms ◽

Yield Prediction ◽

The US ◽

US Midwest

Interpolation of Instantaneous Air Temperature Using Geographical and MODIS Derived Variables with Machine Learning Techniques

10.20944/preprints201906.0008.v1 ◽

2019 ◽

Author(s):

Marcos Ruiz-Álvarez ◽

Francisco Alonso-Sarría ◽

Francisco Gomariz-Castillo

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Linear Regression ◽

Air Temperature ◽

Satellite Data ◽

Multivariate Linear Regression ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector

Several methods have been tried to estimate air temperature using satellite imagery. In this paper, the results of two machine learning algorithms, Support Vector Machine and Random Forest, are compared with Multivariate Linear Regression, TVX, and ordinary kriging. Several geographic, remote sensing, and time variables are used as predictors. The validation is carried out using four different statistics on a daily basis, allowing the use of ANOVA to compare the results. The main conclusion is that Random Forest with residual kriging produces the best results (R² = 0.612 ± 0.019, NSE = 0.578 ± 0.025, RMSE = 1.068 ± 0.027, PBIAS = -0.172 ± 0.046), whereas TVX produces the least accurate results. The environmental conditions in the study area are not well suited to TVX; moreover, this method only takes satellite data into account. On the other hand, the regression methods (Support Vector Machine, Random Forest, and Multivariate Linear Regression) use several parameters that are easily calculated from a Digital Elevation Model, adding very little difficulty to the use of satellite data alone. The most important variables in the Random Forest model were satellite temperature, potential irradiation, and cdayt, a cosine transformation of the Julian day.
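A cosine transformation of the day of year, like the cdayt predictor mentioned above, turns a calendar date into a smooth seasonal signal in which late December and early January sit next to each other. The abstract does not give the formula, so the sketch below is an assumed generic form; the function name, the reference day, and the 365.25-day period are all hypothetical choices rather than the authors' definition.

```python
import math
from datetime import date


def cosine_day_transform(day: date, reference_doy: int = 1, period: float = 365.25) -> float:
    """Map a date to [-1, 1] with a cosine of its day of year.

    reference_doy is the day of year that maps to +1 (an assumed parameter).
    """
    doy = day.timetuple().tm_yday  # 1 for 1 January
    return math.cos(2.0 * math.pi * (doy - reference_doy) / period)


if __name__ == "__main__":
    for d in (date(2019, 1, 1), date(2019, 4, 1), date(2019, 7, 2), date(2019, 12, 31)):
        print(d.isoformat(), round(cosine_day_transform(d), 3))
```

Because the raw day number jumps from 365 back to 1 at the turn of the year, a tree-based or linear model would otherwise treat adjacent winter days as far apart; the cosine form removes that discontinuity.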
