Influence of Variable Selection and Forest Type on Forest Aboveground Biomass Estimation Using Machine Learning Algorithms

Li;  Li;  Li;  Liu

doi:10.3390/f10121073

Forests ◽

10.3390/f10121073 ◽

2019 ◽

Vol 10 (12) ◽

pp. 1073 ◽

Cited By ~ 10

Author(s):

Li ◽

Liu

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Variable Selection ◽

Aboveground Biomass ◽

Forest Type ◽

Learning Algorithms ◽

Forest Biomass ◽

Machine Learning Algorithms ◽

Biomass Estimation ◽

Landsat 8

Forest biomass is a major store of carbon and plays a crucial role in the regional and global carbon cycle. Accurate forest biomass assessment is important for monitoring and mapping the status of and changes in forests. However, while remote sensing-based forest biomass estimation in general is well developed and extensively used, improving the accuracy of biomass estimation remains challenging. In this paper, we used China’s National Forest Continuous Inventory data and Landsat 8 Operational Land Imager data in combination with three algorithms, linear regression (LR), random forest (RF), and extreme gradient boosting (XGBoost), to establish biomass estimation models based on forest type. In the modeling process, two methods of variable selection, i.e., stepwise regression and a variable importance-based method, were used to select optimal variable subsets for LR and the machine learning algorithms (i.e., RF and XGBoost), respectively. The accuracy of the models was significantly improved, and the following conclusions were drawn: (1) Variable selection is very important for improving the performance of models, especially for machine learning algorithms, and the influence of variable selection on XGBoost is significantly greater than that on RF. (2) Machine learning algorithms have advantages in aboveground biomass (AGB) estimation, and the XGBoost and RF models significantly improved the estimation accuracy compared with the LR models. Although the problems of overestimation and underestimation were not fully eliminated, the XGBoost algorithm worked well and reduced these problems to a certain extent. (3) Modeling AGB by forest type is an advantageous method for improving performance at the lower and higher values of AGB. Some of these conclusions may differ for other study areas. The methods used in this paper provide an optional and useful approach for improving the accuracy of AGB estimation based on remote sensing data, and the estimated AGB provides a reference basis for monitoring the forest ecosystem of the study area.
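
The abstract does not spell out the variable importance-based procedure, so the following is only a minimal sketch of one common form of it: repeatedly drop the least important variable and keep the subset with the lowest error. The function names, the variable names, the importance values, and the toy error function are all hypothetical stand-ins; in practice the importances and the error would come from refitting an RF or XGBoost model on each subset.

```python
def select_by_importance(features, rank_importance, score):
    """Backward elimination driven by variable importance.

    rank_importance(subset) -> dict mapping each feature to its importance.
    score(subset) -> error of a model fitted on that subset (lower is better).
    Returns the best-scoring subset and its score.
    """
    current = list(features)
    best_subset, best_score = list(current), score(current)
    while len(current) > 1:
        importances = rank_importance(current)
        # Remove the variable the model currently relies on least.
        weakest = min(current, key=lambda f: importances[f])
        current.remove(weakest)
        candidate_score = score(current)
        if candidate_score < best_score:
            best_subset, best_score = list(current), candidate_score
    return best_subset, best_score


# Hypothetical importances (assumption, not values from the paper).
TOY_IMPORTANCE = {"ndvi": 0.40, "band5": 0.30, "texture_mean": 0.20,
                  "band1": 0.06, "slope": 0.04}
INFORMATIVE = {"ndvi", "band5", "texture_mean"}


def toy_rank(subset):
    # Stand-in for refitting a model and reading its importances.
    return {f: TOY_IMPORTANCE[f] for f in subset}


def toy_error(subset):
    # Stand-in for a cross-validated RMSE: informative variables lower
    # the error, uninformative ones add a small penalty.
    gain = sum(TOY_IMPORTANCE[f] for f in subset if f in INFORMATIVE)
    noise = sum(1 for f in subset if f not in INFORMATIVE)
    return 20.0 - 10.0 * gain + 1.0 * noise


selected, error = select_by_importance(list(TOY_IMPORTANCE), toy_rank, toy_error)
print(selected, round(error, 3))
```

With these toy inputs the loop keeps the three informative variables; with real models the result depends on how importance and error are measured.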

Forest aboveground biomass estimation using Landsat 8 and Sentinel-1A data with machine learning algorithms

Scientific Reports ◽

10.1038/s41598-020-67024-3 ◽

2020 ◽

Vol 10 (1) ◽

Cited By ~ 3

Author(s):

Yingchang Li ◽

Mingyang Li ◽

Chao Li ◽

Zhenzhen Liu

Keyword(s):

Machine Learning ◽

Aboveground Biomass ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Biomass Estimation ◽

Landsat 8 ◽

Forest Aboveground Biomass

The Effect of Synergistic Approaches of Features and Ensemble Learning Algorithms on Aboveground Biomass Estimation of Natural Secondary Forests Based on ALS and Landsat 8

Sensors ◽

10.3390/s21175974 ◽

2021 ◽

Vol 21 (17) ◽

pp. 5974

Author(s):

Chunyu Du ◽

Wenyi Fan ◽

Ye Ma ◽

Hung-Il Jin ◽

Zhen Zhen

Keyword(s):

Machine Learning ◽

Ensemble Learning ◽

Aboveground Biomass ◽

Laser Scanning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Biomass Estimation ◽

Landsat 8 ◽

Secondary Forests ◽

Two Factors

Although the combination of Airborne Laser Scanning (ALS) data, optical imagery, and machine learning algorithms has been shown to improve the estimation of aboveground biomass (AGB), the synergistic approaches of different data and ensemble learning algorithms have not been fully investigated, especially for natural secondary forests (NSFs) with complex structures. This study aimed to explore the effects of these two factors on AGB estimation of NSFs based on ALS data and Landsat 8 imagery. The synergistic method of extracting novel features (i.e., COLI1 and COLI2) using optimal Landsat 8 features and the best-performing ALS feature (i.e., elevation mean) yielded higher accuracy of AGB estimation than either optical-only or ALS-only features. However, both novel features failed to improve the accuracy compared to the simple combination of the untransformed features that generated them. The convolutional neural network (CNN) model was far superior to the other classic machine learning algorithms regardless of the features used. The stacked generalization (SG) algorithms, a kind of ensemble learning algorithm, greatly improved the accuracies compared to the corresponding base models, and the SG with the CNN meta-model performed best. This study provides technical support for wall-to-wall AGB mapping of NSFs in northeastern China using efficient features and algorithms.

Mapping Allochemical Limestone Formations in Hazara, Pakistan Using Google Cloud Architecture: Application of Machine-Learning Algorithms on Multispectral Data

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10020058 ◽

2021 ◽

Vol 10 (2) ◽

pp. 58

Author(s):

Muhammad Fawad Akbar Khan ◽

Khan Muhammad ◽

Shahid Bashir ◽

Shahab Ud Din ◽

Muhammad Hanif

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Learning Algorithms ◽

Remote Sensing Data ◽

Kappa Coefficient ◽

Machine Learning Algorithms ◽

Landsat 8 ◽

Sensing Data ◽

Fossiliferous Limestone

Low-resolution Geological Survey of Pakistan (GSP) maps surrounding the region of interest show oolitic and fossiliferous limestone occurrences in the Samanasuk, Lockhart, and Margalla hill formations in the Hazara division, Pakistan. Machine-learning algorithms (MLAs) have rarely been applied to multispectral remote sensing data for differentiating between limestone formations formed in different depositional environments, such as oolitic or fossiliferous. Unlike previous studies, which mostly report lithological classification by MLAs of rock types with different chemical compositions, this paper aimed to investigate the potential of MLAs for mapping subclasses within the same lithology, i.e., limestone. Additionally, the selection of appropriate data labels, training algorithms, hyperparameters, and remote sensing data sources was investigated while applying these MLAs. In this paper, first, oolitic (Samanasuk) and fossiliferous (Lockhart and Margalla) limestone-bearing formations, along with the adjoining Hazara formation, were mapped using random forest (RF), support vector machine (SVM), classification and regression tree (CART), and naïve Bayes (NB) MLAs. The RF algorithm reported the best accuracy of 83.28% and a Kappa coefficient of 0.78. To further improve the targeted allochemical limestone formation map, annotation labels were generated by the fusion of maps obtained from principal component analysis (PCA), decorrelation stretching (DS), and X-means clustering applied to ASTER-L1T, Landsat-8, and Sentinel-2 datasets. These labels were used to train and validate SVM, CART, NB, and RF MLAs to obtain a binary classification map of limestone occurrences in the Hazara division, Pakistan, using the Google Earth Engine (GEE) platform. The classification of Landsat-8 data by CART reported 99.63% accuracy, with a Kappa coefficient of 0.99, and was in good agreement with the field validation. This binary limestone map was further classified into oolitic (Samanasuk) and fossiliferous (Lockhart and Margalla) formations by all four MLAs; in this case, RF surpassed all the other algorithms with an improved accuracy of 96.36%. This improvement can be attributed to better annotation, resulting in a binary limestone classification map, which formed a mask for improved classification of oolitic and fossiliferous limestone in the area.
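
The abstract reports overall accuracy together with a Kappa coefficient. As a rough illustration of how the two relate, the sketch below computes both from a confusion matrix using the standard Cohen's kappa definition. The 2-class matrix is made up for the example and is not data from the paper.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of reference class i that
    were predicted as class j.
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(size)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical matrix: rows are reference classes, columns are predictions.
example_matrix = [[45, 5],
                  [10, 40]]
accuracy, kappa = accuracy_and_kappa(example_matrix)
print(f"accuracy={accuracy:.2f}, kappa={kappa:.2f}")
```

Kappa is lower than overall accuracy here because it discounts the agreement that the class proportions alone would produce by chance.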

Comparative Analysis of Machine Learning Algorithms in Automatic Identification and Extraction of Water Boundaries

Applied Sciences ◽

10.3390/app112110062 ◽

2021 ◽

Vol 11 (21) ◽

pp. 10062

Author(s):

Aimin Li ◽

Meng Fan ◽

Guangduo Qin ◽

Youcheng Xu ◽

Hailong Wang

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Decision Tree ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Water Bodies ◽

Support Vector ◽

Landsat 8 ◽

Transfer Performance ◽

Remote Sensing Images

Monitoring open water bodies accurately is important for assessing the role of ecosystem services in the context of human survival and climate change. There are many methods available for water body extraction based on remote sensing images, such as the normalized difference water index (NDWI), modified NDWI (MNDWI), and machine learning algorithms. Based on Landsat-8 remote sensing images, this study focuses on the effects of six machine learning algorithms and three threshold methods used to extract water bodies, evaluates the transfer performance of models applied to remote sensing images in different periods, and compares the differences among these models. The results are as follows. (1) Different algorithms require different numbers of samples to reach their optimal performance. The logistic regression algorithm requires a minimum of 110 samples. As the number of samples increases, the order of the optimal model is support vector machine, neural network, random forest, decision tree, and XGBoost. (2) The accuracy of each machine learning algorithm on the test set does not represent its performance in a local area. (3) When these models are directly applied to remote sensing images in different periods, the AUC indicators of each machine learning algorithm for three regions all show a significant decline, with a decrease range of 0.33–66.52%, and the differences among the algorithms' performances in the three areas are obvious. Generally, the decision tree algorithm has good transfer performance among the machine learning algorithms, with area under curve (AUC) indexes of 0.790, 0.518, and 0.697 in the three areas, respectively, and an average value of 0.668. The Otsu threshold algorithm is the best of the threshold methods, with AUC indexes of 0.970, 0.617, and 0.908 in the three regions, respectively, and an average AUC of 0.832.
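
To make the index and threshold methods named in the abstract concrete, here is a small sketch of NDWI followed by an Otsu threshold. It assumes the common definition NDWI = (Green − NIR) / (Green + NIR); MNDWI has the same form with a shortwave-infrared band in place of NIR. The reflectance values are invented, and the histogram-based Otsu routine is a plain-Python illustration rather than the implementation used in the study.

```python
def ndwi(green, nir):
    """Normalized difference water index for one pixel."""
    total = green + nir
    return (green - nir) / total if total != 0 else 0.0


def otsu_threshold(values, bins=64):
    """Return the cut that maximizes between-class variance of a histogram."""
    low, high = min(values), max(values)
    if low == high:
        return low
    width = (high - low) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - low) / width), bins - 1)] += 1
    centers = [low + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    total_sum = sum(h * c for h, c in zip(hist, centers))
    best_variance, best_cut = -1.0, low
    count_below, sum_below = 0, 0.0
    for i in range(bins - 1):
        count_below += hist[i]
        sum_below += hist[i] * centers[i]
        count_above = total - count_below
        if count_below == 0 or count_above == 0:
            continue
        mean_below = sum_below / count_below
        mean_above = (total_sum - sum_below) / count_above
        variance = count_below * count_above * (mean_below - mean_above) ** 2
        if variance > best_variance:
            best_variance = variance
            best_cut = low + (i + 1) * width  # upper edge of bin i
    return best_cut


# Hypothetical (green, nir) reflectance pairs, not values from the paper.
water_pixels = [(0.10, 0.02), (0.12, 0.03), (0.09, 0.02)]
land_pixels = [(0.08, 0.30), (0.10, 0.35), (0.07, 0.25)]
water_ndwi = [ndwi(g, n) for g, n in water_pixels]
land_ndwi = [ndwi(g, n) for g, n in land_pixels]
threshold = otsu_threshold(water_ndwi + land_ndwi)
print(f"threshold={threshold:.3f}")
```

Pixels with NDWI above the threshold would be labeled water. On real scenes the two classes overlap, which is one reason a threshold that works in one region or period may transfer poorly to another.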

Aboveground biomass estimation using multi-sensor data synergy and machine learning algorithms in a dense tropical forest

Applied Geography ◽

10.1016/j.apgeog.2018.05.011 ◽

2018 ◽

Vol 96 ◽

pp. 29-40 ◽

Cited By ~ 29

Author(s):

Sujit Madhab Ghosh ◽

Mukunda Dev Behera

Keyword(s):

Machine Learning ◽

Tropical Forest ◽

Aboveground Biomass ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Sensor Data ◽

Biomass Estimation ◽

Data Synergy

Machine Learning Algorithms for Automatic Lithological Mapping Using Remote Sensing Data: A Case Study from Souk Arbaa Sahel, Sidi Ifni Inlier, Western Anti-Atlas, Morocco

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi8060248 ◽

2019 ◽

Vol 8 (6) ◽

pp. 248 ◽

Cited By ~ 7

Author(s):

Imane Bachri ◽

Mustapha Hakdaoui ◽

Mohammed Raji ◽

Ana Cláudia Teodoro ◽

Abdelmajid Benbouziane

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Learning Algorithms ◽

Remote Sensing Data ◽

Machine Learning Algorithms ◽

High Dimensional ◽

Landsat 8 ◽

Landsat 8 Oli ◽

Lithological Mapping ◽

Sensing Data

Remote sensing data have proved to be a valuable resource in a variety of earth science applications. Using high-dimensional data with advanced methods such as machine learning algorithms (MLAs), a sub-domain of artificial intelligence, enhances lithological mapping by spectral classification. Support vector machines (SVM) are one of the most popular MLAs, with the ability to define non-linear decision boundaries in high-dimensional feature space by solving a quadratic optimization problem. This paper describes a supervised classification method using SVM for lithological mapping in the region of Souk Arbaa Sahel, belonging to the Sidi Ifni inlier, located in southern Morocco (Western Anti-Atlas). The aims of this study were (1) to refine the existing lithological map of this region, and (2) to evaluate the performance of the SVM approach by using combined spectral features of Landsat 8 OLI with digital elevation model (DEM) geomorphometric attributes of ALOS/PALSAR data. We performed an SVM classification to allow the joint use of geomorphometric features and multispectral data of Landsat 8 OLI. The results indicated an overall classification accuracy of 85%. From the results obtained, we can conclude that the classification approach produced an image of lithological units in which formations such as silt, alluvium, limestone, dolomite, conglomerate, sandstone, rhyolite, andesite, granodiorite, quartzite, lutite, and ignimbrite are easily identified, coinciding with those on the published geological map. This result confirms the ability of SVM as a supervised learning algorithm for lithological mapping purposes.

Modeling wetland aboveground biomass in the Poyang Lake National Nature Reserve using machine learning algorithms and Landsat-8 imagery

Journal of Applied Remote Sensing ◽

10.1117/1.jrs.12.046029 ◽

2018 ◽

Vol 12 (04) ◽

pp. 1 ◽

Cited By ~ 2

Author(s):

Rongrong Wan ◽

Peng Wang ◽

Xiaolong Wang

Keyword(s):

Machine Learning ◽

Aboveground Biomass ◽

Nature Reserve ◽

Poyang Lake ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Landsat 8 ◽

National Nature Reserve

Estimating Forest Aboveground Biomass Using Gaofen-1 Images, Sentinel-1 Images, and Machine Learning Algorithms: A Case Study of the Dabie Mountain Region, China

Remote Sensing ◽

10.3390/rs14010176 ◽

2021 ◽

Vol 14 (1) ◽

pp. 176

Author(s):

Haoshuang Han ◽

Rongrong Wan ◽

Bing Li

Keyword(s):

Machine Learning ◽

Aboveground Biomass ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Mountain Region ◽

Stepwise Multiple Regression ◽

Support Vector ◽

Biomass Estimation ◽

Dabie Mountain ◽

Forest Aboveground Biomass

Quantitatively mapping forest aboveground biomass (AGB) is of great significance for the study of terrestrial carbon storage and global carbon cycles, and remote sensing-based data are a valuable source for estimating forest AGB. In this study, we evaluated the potential of machine learning algorithms (MLAs) by integrating Gaofen-1 (GF1) images, Sentinel-1 (S1) images, and topographic data for AGB estimation in the Dabie Mountain region, China. Variables extracted from GF1 and S1 images and digital elevation model data from sample plots were used to explain the field AGB value variations. The prediction capabilities of stepwise multiple regression and three MLAs, i.e., support vector machine (SVM), random forest (RF), and backpropagation neural network, were compared. The results showed that the RF model achieved the highest prediction accuracy (R² = 0.70, RMSE = 16.26 t/ha), followed by the SVM model (R² = 0.66, RMSE = 18.03 t/ha) for the testing datasets. Some variables extracted from the GF1 images (e.g., normalized differential vegetation index, band 1-blue, the mean texture feature of band 3-red with windows of 3 × 3), S1 images (e.g., vertical transmit-horizontal receive and vertical transmit-vertical receive backscatter coefficient), and altitude had strong correlations with field AGB values (p < 0.01). Among the explanatory variables in MLAs, variables extracted from GF1 made a greater contribution to estimating forest AGB than those derived from S1 images. These results indicate the potential of the RF model for evaluating forest AGB by combining GF1 and S1, and that it could provide a reference for biomass estimation using multi-source images.
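
The two accuracy measures quoted above can be computed as in the sketch below. It assumes the usual definitions, RMSE as the root of the mean squared residual and R² as one minus the ratio of residual to total sum of squares; the abstract does not say which R² variant the authors used (some studies report the squared correlation instead). The plot values are invented for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the units of the observations."""
    residuals = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(residuals) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical field AGB and model predictions in t/ha (not paper data).
field_agb = [50.0, 80.0, 110.0, 140.0]
model_agb = [60.0, 75.0, 100.0, 145.0]
error = rmse(field_agb, model_agb)
fit = r_squared(field_agb, model_agb)
print(f"RMSE={error:.2f} t/ha, R2={fit:.3f}")
```

Because RMSE carries the units of AGB, it is only comparable between studies with similar biomass ranges, whereas R² is unitless.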

Semantic segmentation of PolSAR image data using advanced deep learning model

Scientific Reports ◽

10.1038/s41598-021-94422-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Rajat Garg ◽

Anil Kumar ◽

Nikunj Bansal ◽

Manish Prateek ◽

Shashi Kumar

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Deep Learning ◽

Urban Area ◽

Urban Areas ◽

Learning Algorithms ◽

Semantic Segmentation ◽

Learning Model ◽

Machine Learning Algorithms ◽

Deep Learning Model

Urban area mapping is an important application of remote sensing that aims at estimating land cover, and changes in it, in urban areas. A major challenge in analyzing Synthetic Aperture Radar (SAR) based remote sensing data is that highly vegetated urban areas and oriented urban targets closely resemble actual vegetation. This similarity leads to misclassification of urban areas as forest cover. The present work is a precursor study for the dual-frequency L and S-band NASA-ISRO Synthetic Aperture Radar (NISAR) mission and aims at minimizing the misclassification of such highly vegetated and oriented urban targets into the vegetation class with the help of deep learning. In this study, three machine learning algorithms, Random Forest (RF), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), have been implemented along with a deep learning model, DeepLabv3+, for semantic segmentation of Polarimetric SAR (PolSAR) data. It is a general perception that a large dataset is required for the successful implementation of any deep learning model, but in the field of SAR-based remote sensing, a major issue is the unavailability of a large benchmark labeled dataset for training deep learning algorithms from scratch. In the current work, it has been shown that a pre-trained deep learning model, DeepLabv3+, outperforms the machine learning algorithms for the land use and land cover (LULC) classification task even with a small dataset using transfer learning. The highest pixel accuracy of 87.78% and an overall pixel accuracy of 85.65% have been achieved with DeepLabv3+. Random Forest performs best among the machine learning algorithms with an overall pixel accuracy of 77.91%, while SVM and KNN trail with overall accuracies of 77.01% and 76.47%, respectively. The highest precision of 0.9228 is recorded for the urban class in the semantic segmentation task with DeepLabv3+, while the machine learning algorithms SVM and RF gave comparable results with precisions of 0.8977 and 0.8958, respectively.

Comparison of Machine Learning Algorithms for Wildland-Urban Interface Fuelbreak Planning Integrating ALS and UAV-Borne LiDAR Data and Multispectral Images

Drones ◽

10.3390/drones4020021 ◽

2020 ◽

Vol 4 (2) ◽

pp. 21 ◽

Cited By ~ 1

Author(s):

Francisco Rodríguez-Puerta ◽

Rafael Alonso Ponce ◽

Fernando Pérez-Rodríguez ◽

Beatriz Águeda ◽

Saray Martín-García ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Random Forest ◽

Learning Algorithms ◽

Remote Sensing Data ◽

Machine Learning Algorithms ◽

Data Sources ◽

Lidar Data ◽

Sensing Data ◽

Sentinel 2

Controlling vegetation fuels around human settlements is a crucial strategy for reducing fire severity in forests, buildings and infrastructure, as well as protecting human lives. Each country has its own regulations in this respect, but they all have in common that by reducing fuel load, we in turn reduce the intensity and severity of the fire. Data acquired by Unmanned Aerial Vehicles (UAV), combined with other passive and active remote sensing data, gave the best performance for planning Wildland-Urban Interface (WUI) fuelbreaks through machine learning algorithms. Nine remote sensing data sources (active and passive) and four supervised classification algorithms (Random Forest, Linear and Radial Support Vector Machine and Artificial Neural Networks) were tested to classify five fuel-area types. We used very high-density Light Detection and Ranging (LiDAR) data acquired by UAV (154 returns·m−2 and ortho-mosaic of 5-cm pixels), multispectral data from the satellites Pleiades-1B and Sentinel-2, and low-density LiDAR data acquired by Airborne Laser Scanning (ALS) (0.5 returns·m−2, ortho-mosaic of 25-cm pixels). Through the Variable Selection Using Random Forest (VSURF) procedure, a pre-selection of final variables was carried out to train the model. The four algorithms were compared, and it was concluded that the differences among them in overall accuracy (OA) on training datasets were negligible. Although the highest accuracy in the training step was obtained in SVML (OA=94.46%) and in testing in ANN (OA=91.91%), Random Forest was considered to be the most reliable algorithm, since it produced more consistent predictions due to the smaller differences between training and testing performance. Using a combination of Sentinel-2 and the two LiDAR data (UAV and ALS), Random Forest obtained an OA of 90.66% in training and of 91.80% in testing datasets. The differences in accuracy between the data sources used are much greater than between algorithms. LiDAR growth metrics calculated using point clouds on different dates and multispectral information from different seasons of the year are the most important variables in the classification. Our results support the essential role of UAVs in fuelbreak planning and management and thus, in the prevention of forest fires.
