An Evaluation of Eight Machine Learning Regression Algorithms for Forest Aboveground Biomass Estimation from Multiple Satellite Data Products

This study provided a comprehensive evaluation of eight machine learning regression algorithms for forest aboveground biomass (AGB) estimation from satellite data based on leaf area index, canopy height, net primary production, and tree cover data, as well as climatic and topographical data. Some of these algorithms, such as extremely randomized trees, stochastic gradient boosting, and categorical boosting (CatBoost) regression, have not been commonly used for forest AGB estimation. For each algorithm, the hyperparameters were optimized using grid search with cross-validation, the optimal AGB model was developed using the training dataset (80%), and AGB was predicted on the test dataset (20%). Performance metrics, feature importance, and overestimation and underestimation were considered as indicators for evaluating the performance of an algorithm. To reduce the impacts of the random training-test data split and sampling method on the performance, the above procedures were repeated 50 times for each algorithm under the random sampling, stratified sampling, and separate modeling scenarios. The results showed that the five tree-based ensemble algorithms performed better than the three nonensemble algorithms (multivariate adaptive regression splines, support vector regression, and multilayer perceptron), and the CatBoost algorithm outperformed the other algorithms for AGB estimation. Compared with the random sampling scenario, the stratified sampling and separate modeling scenarios did not significantly improve the AGB estimates, but modeling AGB for each forest type separately provided stable results in terms of the contributions of the predictor variables to the AGB estimates. All the algorithms underestimated forest AGB when the AGB values were larger than 210 Mg/ha and overestimated it when the AGB values were less than 120 Mg/ha. This study highlighted the capability of ensemble algorithms to improve AGB estimates and the necessity of improving AGB estimates for high and low AGB levels in future studies.
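
The repeated 80/20 splitting described in this abstract can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the use of forest type as the stratum, and the seed-per-repeat scheme are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def random_split(n_samples, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) for one purely random split."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def stratified_split(strata, test_fraction=0.2, seed=0):
    """Split so that every stratum (assumed here: forest type) contributes
    roughly test_fraction of its samples to the test set."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, label in enumerate(strata):
        by_stratum[label].append(idx)
    train, test = [], []
    for label in sorted(by_stratum):
        members = by_stratum[label]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


def repeated_splits(strata, n_repeats=50, test_fraction=0.2, stratified=False):
    """Generate n_repeats independent splits, one seed per repeat."""
    splits = []
    for seed in range(n_repeats):
        if stratified:
            splits.append(stratified_split(strata, test_fraction, seed))
        else:
            splits.append(random_split(len(strata), test_fraction, seed))
    return splits
```

A model would then be fitted and scored once per split, and the scores averaged over the repeats so that no single lucky or unlucky split dominates the comparison.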

Estimating Forest Aboveground Biomass Using Gaofen-1 Images, Sentinel-1 Images, and Machine Learning Algorithms: A Case Study of the Dabie Mountain Region, China

Remote Sensing ◽

10.3390/rs14010176 ◽

2021 ◽

Vol 14 (1) ◽

pp. 176

Author(s):

Haoshuang Han ◽

Rongrong Wan ◽

Bing Li

Keyword(s):

Machine Learning ◽

Aboveground Biomass ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Mountain Region ◽

Stepwise Multiple Regression ◽

Support Vector ◽

Biomass Estimation ◽

Dabie Mountain ◽

Forest Aboveground Biomass

Quantitatively mapping forest aboveground biomass (AGB) is of great significance for the study of terrestrial carbon storage and global carbon cycles, and remote sensing-based data are a valuable source for estimating forest AGB. In this study, we evaluated the potential of machine learning algorithms (MLAs) by integrating Gaofen-1 (GF1) images, Sentinel-1 (S1) images, and topographic data for AGB estimation in the Dabie Mountain region, China. Variables extracted from GF1 and S1 images and digital elevation model data from sample plots were used to explain the field AGB value variations. The prediction capabilities of stepwise multiple regression and three MLAs, i.e., support vector machine (SVM), random forest (RF), and backpropagation neural network, were compared. The results showed that the RF model achieved the highest prediction accuracy (R2 = 0.70, RMSE = 16.26 t/ha), followed by the SVM model (R2 = 0.66, RMSE = 18.03 t/ha) for the testing datasets. Some variables extracted from the GF1 images (e.g., normalized difference vegetation index, band 1-blue, the mean texture feature of band 3-red with windows of 3 × 3), S1 images (e.g., vertical transmit-horizontal receive and vertical transmit-vertical receive backscatter coefficients), and altitude had strong correlations with field AGB values (p < 0.01). Among the explanatory variables in MLAs, variables extracted from GF1 made a greater contribution to estimating forest AGB than those derived from S1 images. These results indicate the potential of the RF model for evaluating forest AGB by combining GF1 and S1, and that it could provide a reference for biomass estimation using multi-source images.

Comparison of Ensemble Machine Learning Methods for Soil Erosion Pin Measurements

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10010042 ◽

2021 ◽

Vol 10 (1) ◽

pp. 42

Author(s):

Kieu Anh Nguyen ◽

Walter Chen ◽

Bor-Shiun Lin ◽

Uma Seeboonruang

Keyword(s):

Machine Learning ◽

Soil Erosion ◽

Ensemble Methods ◽

Machine Learning Algorithms ◽

Multivariate Adaptive Regression Splines ◽

Gradient Boosting ◽

Support Vector ◽

Ensemble Machine Learning ◽

Boosting Method ◽

Bagging Method

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) bagging, (b) boosting, and (c) stacking. The bagging method in this study refers to bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting method includes Cubist and gradient boosting machine (GBM). Finally, the stacking method is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, and decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset used in this study was sampled using stratified random sampling to achieve a 70/30 split for the training and test data, and the process was repeated three times. The performance of six ensemble methods in three categories was analyzed based on the average of three attempts. It was found that GBM performed the best among the ensemble models with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by the spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that as a group, the bagging method and the boosting method performed equally well, and the stacking method was third for the erosion pin dataset considered in this study.
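
The three scores reported in this abstract (RMSE, NSE, and the index of agreement d) can be computed as in the following sketch. The abstract does not give the formulas, so the standard definitions are assumed here, including the original Willmott form of d; the function names are illustrative.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nash_sutcliffe(obs, pred):
    """NSE = 1 - SSE / sum of squared deviations of obs from their mean.
    1 is a perfect fit; 0 means no better than predicting the mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def index_of_agreement(obs, pred):
    """Index of agreement d (original Willmott form, assumed here):
    d = 1 - SSE / sum((|p - mean_obs| + |o - mean_obs|)^2)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred)
    )
    return 1.0 - sse / potential
```

Both NSE and d equal 1 for a perfect prediction. Predicting the observed mean everywhere gives NSE = 0, which is why an NSE of 0.54 indicates a model that explains a little more than half of the variance around the mean.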

Interactive comment on “Improving maps of forest aboveground biomass: A combined approach using machine learning with a spatial statistical model” by Shaoqing Dai et al.

10.5194/bg-2020-36-sc2 ◽

2020 ◽

Author(s):

Wenli Huang

Keyword(s):

Machine Learning ◽

Statistical Model ◽

Aboveground Biomass ◽

Combined Approach ◽

Spatial Statistical Model ◽

Forest Aboveground Biomass

ESTIMATION OF REGIONAL FOREST ABOVEGROUND BIOMASS COMBINING ICESAT-GLAS WAVEFORMS AND HJ-1A/HSI HYPERSPECTRAL IMAGERIES

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xli-b7-731-2016 ◽

2016 ◽

Vol XLI-B7 ◽

pp. 731-737

Author(s):

Yanqiu Xing ◽

Sai Qiu ◽

Jianhua Ding ◽

Jing Tian

Keyword(s):

Carbon Cycle ◽

Aboveground Biomass ◽

Forest Canopy ◽

Field Investigation ◽

Support Vector ◽

Laser Altimeter ◽

Spectral Bands ◽

Spectral Imager ◽

Regional Forest ◽

Forest Aboveground Biomass

Estimation of forest aboveground biomass (AGB) is a critical challenge for understanding the global carbon cycle because it dominates the dynamics of the terrestrial carbon cycle. Light Detection and Ranging (LiDAR) systems have a unique capability for accurately estimating forest canopy height, which is directly related to forest AGB and can improve its understanding. The Geoscience Laser Altimeter System (GLAS) onboard the Ice, Cloud, and land Elevation Satellite (ICESat) is the first polar-orbiting LiDAR instrument for global observations of Earth, and it has been widely used for extracting forest AGB with footprints of nominally 70 m in diameter on the earth's surface. However, the GLAS footprints are geographically discrete, which has limited their use for producing full-coverage regional maps of forest AGB. To overcome this discontinuity, the Hyper Spectral Imager (HSI) of HJ-1A with 115 bands was combined with GLAS waveforms to predict the regional forest AGB in the study. Corresponding with the field investigation in Wangqing of Changbai Mountain, China, the GLAS waveform metrics were derived and employed to establish the AGB model, which was then used for estimating the AGB within GLAS footprints. For HSI imagery, the Minimum Noise Fraction (MNF) method was used to decrease noise and reduce the dimensionality of the spectral bands; the first three MNF components carried almost 98% of the spectral information and were used for regression against the GLAS-estimated AGB. Afterwards, the support vector regression (SVR) method was employed to establish the relationship between GLAS-estimated AGB and the three HSI MNF components (i.e., MNF1, MNF2 and MNF3), and the full-coverage regional forest AGB map was produced accordingly. The results showed that the adj.R2 and RMSE of the SVR-AGB models were 0.75 and 4.68 t hm−2 for broadleaf forests, 0.73 and 5.39 t hm−2 for coniferous forests, and 0.71 and 6.15 t hm−2 for mixed forests, respectively. The full-coverage regional forest AGB map of the study area had an accuracy of 0.62 and an RMSE of 11.11 t hm−2. The study demonstrated the great potential of combining LiDAR data and hyperspectral imagery to map the full-coverage regional forest AGB distribution with higher accuracy.
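
The adj.R2 values quoted in this abstract penalize R2 for the number of predictors. A minimal sketch of the standard adjustment follows; the abstract does not state the sample size, so the numbers in the usage line are hypothetical, and only the three-predictor count (MNF1 to MNF3) comes from the text.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need more samples than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical example: R^2 = 0.80 from 50 footprints and 3 MNF predictors.
example = adjusted_r2(0.80, n_samples=50, n_predictors=3)
```

With few predictors and a moderate sample the adjustment is small, so adj.R2 stays close to the unadjusted value.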

Modeling of Aboveground Biomass with Landsat 8 OLI and Machine Learning in Temperate Forests

Forests ◽

10.3390/f11010011 ◽

2019 ◽

Vol 11 (1) ◽

pp. 11

Author(s):

Pablito M. López-Serrano ◽

José Luis Cárdenas Domínguez ◽

José Javier Corral-Rivas ◽

Enrique Jiménez ◽

Carlos A. López-Sánchez ◽

...

Keyword(s):

Machine Learning ◽

Aboveground Biomass ◽

Goodness Of Fit ◽

Accurate Estimation ◽

Support Vector ◽

Landsat 8 ◽

Sensing Applications ◽

Learning Techniques ◽

Physical Variables ◽

Selection Of

An accurate estimation of forests’ aboveground biomass (AGB) is required because of its relevance to the carbon cycle, and because of its economic and ecological importance. The selection of appropriate variables from satellite information and physical variables is important for precise AGB prediction mapping. Because of the complex relationships for AGB prediction, non-parametric machine-learning techniques represent potentially useful techniques for AGB estimation, but their use and comparison in forest remote-sensing applications is still relatively limited. The objective of the present study was to evaluate the performance of automatic learning techniques, support vector regression (SVR) and random forest (RF), to predict the observed AGB (from 318 permanent sampling plots) from the Landsat 8 Operational Land Imager (OLI) sensor, spectral indexes, texture indexes and physical variables in the Sierra Madre Occidental in Mexico. The results showed that the best SVR model explained 80% of the total variance (root mean square error (RMSE) = 8.20 Mg ha−1). The variables that best predicted AGB, in order of importance, were the bands that belong to the region of red and near and middle infrared, and the average temperature. The results show that the SVR technique has good potential for the estimation of AGB and that the selection of the model hyperparameters has important implications for optimizing the goodness of fit.

Predicting Benzene Concentration Using Machine Learning and Time Series Algorithms

Mathematics ◽

10.3390/math8122205 ◽

2020 ◽

Vol 8 (12) ◽

pp. 2205

Author(s):

Luis Alfonso Menéndez García ◽

Fernando Sánchez Lasheras ◽

Paulino José García Nieto ◽

Laura Álvarez de Prado ◽

Antonio Bernardo Sánchez

Keyword(s):

Machine Learning ◽

Time Series ◽

Moving Average ◽

Environmental Pollutants ◽

Multivariate Adaptive Regression Splines ◽

Support Vector ◽

Learning Models ◽

Vector Autoregressive ◽

Benzene Concentration ◽

Machine Learning Models

Benzene is a pollutant which is very harmful to our health, so models are necessary to predict its concentration and relationship with other air pollutants. The data collected by eight stations in Madrid (Spain) over nine years were analyzed using the following regression-based machine learning models: multivariate linear regression (MLR), multivariate adaptive regression splines (MARS), multilayer perceptron neural network (MLP), and support vector machines (SVM), as well as autoregressive integrated moving-average (ARIMA) and vector autoregressive moving-average (VARMA) time series models. Benzene concentration predictions were made from the concentrations of four environmental pollutants: nitrogen dioxide (NO2), nitrogen oxides (NOx), particulate matter (PM10) and toluene (C7H8), and the performance measures of the proposed models were studied. In general, regression-based machine learning models are more effective at predicting than time series models.

Latest Advances in Fractional Snow Cover Mapping on MODIS Data by Machine Learning Algorithms

10.5194/egusphere-egu2020-13193 ◽

2020 ◽

Author(s):

Semih Kuter ◽

Zuhal Akyurek

Keyword(s):

Machine Learning ◽

Snow Cover ◽

General Circulation ◽

Snow Water Equivalent ◽

Machine Learning Algorithms ◽

Multivariate Adaptive Regression Splines ◽

Support Vector ◽

Landsat 8 ◽

European Alps ◽

Fractional Snow Cover

The spatial extent of snow has been declared an essential climate variable. Accurate modeling of snow cover is crucial for the better prediction of snow water equivalent and, consequently, for the success of general circulation and weather forecasting models as well as climate change and hydrological studies. This presentation mainly focuses on the latest findings of our efforts in fractional snow cover (FSC) mapping on MODIS images by data-driven machine learning methodologies. For this purpose, a dataset composed of 20 MODIS - Landsat 8 image pairs acquired between Apr 2013 and Dec 2016 over the European Alps was employed. Artificial neural networks (ANN), multivariate adaptive regression splines (MARS), support vector regression (SVR) and random forest (RF) models were trained and tested by using reference FSC maps generated from higher spatial resolution Landsat 8 binary snow maps. ANN, MARS, SVR and RF models exhibited quite good performance with average R ≈ 0.93, whereas the agreement between the reference FSC maps and the MODIS’ own product MOD10A1 (C5) was slightly poorer with R ≈ 0.88.

Improving Accuracy Estimation of Forest Aboveground Biomass Based on Incorporation of ALOS-2 PALSAR-2 and Sentinel-2A Imagery and Machine Learning: A Case Study of the Hyrcanian Forest Area (Iran)

Remote Sensing ◽

10.3390/rs10020172 ◽

2018 ◽

Vol 10 (2) ◽

pp. 172 ◽

Cited By ~ 70

Author(s):

Sasan Vafaei ◽

Javad Soosani ◽

Kamran Adeli ◽

Hadi Fadaei ◽

Hamed Naghavi ◽

...

Keyword(s):

Machine Learning ◽

Aboveground Biomass ◽

Forest Area ◽

Hyrcanian Forest ◽

Accuracy Estimation ◽

Sentinel 2A ◽

Improving Accuracy ◽

Forest Aboveground Biomass

A COMPARISON OF MACHINE-LEARNING REGRESSION ALGORITHMS FOR THE ESTIMATION OF LAI USING LANDSAT - 8 SATELLITE DATA

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-4-w16-679-2019 ◽

2019 ◽

Vol XLII-4/W16 ◽

pp. 679-683

Author(s):

V. P. Yadav ◽

R. Prasad ◽

R. Bala ◽

A. K. Vishwakarma ◽

S. A. Yadav ◽

...

Keyword(s):

Machine Learning ◽

Satellite Data ◽

Vegetation Index ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Accurate Estimation ◽

Support Vector ◽

Landsat 8 ◽

Area Index ◽

Global Circulation Models

The leaf area index (LAI) is one of the key variables of crops and plays an important role in agriculture, ecology and climate change, where global circulation models use it to compute energy and water fluxes. In recent research, machine-learning algorithms have provided accurate computational approaches for the estimation of crop biophysical parameters using remotely sensed data. Three machine-learning algorithms, random forest regression (RFR), support vector regression (SVR) and artificial neural network regression (ANNR), were used to estimate the LAI for crops in the present study. Landsat-8 satellite images from three different dates between January 2017 and March 2017 were used, covering different crop growth conditions in Varanasi district, India. The sampling regions were fully covered by major Rabi season crops such as wheat, barley and mustard. Of the total pooled data, 60% of the samples were used for training the algorithms and the remaining 40% for testing and validation of the machine-learning regression algorithms. The highest sensitivity of the normalized difference vegetation index (NDVI) to LAI was found using the RFR algorithm (R2 = 0.884, RMSE = 0.404) as compared to SVR (R2 = 0.847, RMSE = 0.478) and ANNR (R2 = 0.829, RMSE = 0.404). Therefore, the RFR algorithm can be used for accurate estimation of LAI for crops using satellite data.
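
The NDVI predictor named in this abstract is computed from red and near-infrared reflectance. A minimal sketch follows, assuming surface reflectance inputs; for Landsat 8 OLI these are band 4 (red) and band 5 (near infrared). The function name and the zero-denominator handling are illustrative choices, not part of the study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red).
    Returns None when both reflectances are zero, where NDVI is undefined."""
    denominator = nir + red
    if denominator == 0:
        return None
    return (nir - red) / denominator


# Hypothetical reflectances for three pixels as (NIR, red) pairs.
pixels = [(0.50, 0.10), (0.30, 0.30), (0.0, 0.0)]
ndvi_values = [ndvi(nir, red) for nir, red in pixels]
```

Dense green canopies reflect strongly in the near infrared and absorb red light, so their NDVI approaches 1, which is why the index tracks LAI over the growing season.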

Machine learning model for rice yield prediction using KNN regression.

10.31220/agrirxiv.2021.00070 ◽

2021 ◽

Author(s):

Akhil Wilson ◽

Raji Sukumar ◽

N. Hemalatha

Keyword(s):

Machine Learning ◽

Rice Yield ◽

Support Vector ◽

Yield Prediction ◽

Challenging Problem ◽

Random Forest Regression ◽

Machine Learning Model ◽

Smart Farming ◽

Regression Algorithms ◽

The One

The prediction of agricultural yield is one of the challenging problems in smart farming. We predicted the yield of rice in the state of Kerala, India, with the help of machine learning by considering the soil properties, microclimatic conditions and area of rice. We used Decision Tree Regression, Random Forest Regression, Linear Regression, K Nearest Neighbour (KNN) Regression, XGBoost Regression and Support Vector Regression algorithms to predict the rice yield. In our experiments, KNN regression performed best, with 98.77% accuracy.
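
KNN regression predicts a value as the average of the targets of the k closest training samples. The sketch below shows the idea in plain Python with Euclidean distance; the feature layout, the value of k, and the function name are hypothetical, since the abstract does not describe the configuration used.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict the target for `query` as the mean target of its k nearest
    training samples under Euclidean distance."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    by_distance = sorted(
        (math.dist(features, query), target)
        for features, target in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / k


# Hypothetical two-feature samples with made-up yield values.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
train_y = [1.0, 2.0, 3.0, 100.0]
prediction = knn_predict(train_X, train_y, query=(0.1, 0.1), k=3)
```

Because the method relies on raw distances, features measured on very different scales (for example soil pH and cultivated area) are usually standardized first so that no single feature dominates the neighbour search.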
