Estimation of Forest Above-Ground Biomass by Geographically Weighted Regression and Machine Learning with Sentinel Imagery

Accurate estimation of forest above-ground biomass (AGB) is crucial for sustaining forest management and mitigating climate change in support of REDD+ (reducing emissions from deforestation and forest degradation, plus the sustainable management of forests, and the conservation and enhancement of forest carbon stocks) processes. Recently launched Sentinel imagery offers a new opportunity for forest AGB mapping and monitoring. In this study, texture characteristics and backscatter coefficients of Sentinel-1, in addition to multispectral bands, vegetation indices, and biophysical variables of Sentinel-2, based on 56 measured AGB samples in the center of the Changbai Mountains, China, were used to develop biomass prediction models through geographically weighted regression (GWR) and machine learning (ML) algorithms, such as the artificial neural network (ANN), support vector machine for regression (SVR), and random forest (RF). The results showed that texture characteristics and vegetation biophysical variables were the most important predictors. SVR was the best method for predicting and mapping the patterns of AGB in the study site with limited samples; its mean error, mean absolute error, root mean square error, and correlation coefficient were 4 × 10−3, 0.07, 0.08 Mg·ha−1, and 1, respectively. Predicted values of AGB from the four models ranged from 11.80 to 324.12 Mg·ha−1, and those for broadleaved deciduous forests were the most accurate, while those for AGB above 160 Mg·ha−1 were the least accurate. The study demonstrated encouraging results in forest AGB mapping of the normal vegetated area using the freely accessible and high-resolution Sentinel imagery, based on ML techniques.
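
The four accuracy measures named in this abstract (mean error, mean absolute error, root mean square error, and correlation coefficient) can be computed from paired observed and predicted values. The following is a minimal sketch, not the authors' code: the function name `error_metrics`, the sign convention (predicted minus observed), and the sample values are all assumptions made for illustration.

```python
import math

def error_metrics(observed, predicted):
    """Return mean error, mean absolute error, RMSE, and Pearson r."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)
    # Assumed sign convention: residual = predicted - observed.
    residuals = [p - o for o, p in zip(observed, predicted)]
    me = sum(residuals) / n
    mae = sum(abs(e) for e in residuals) / n
    rmse = math.sqrt(sum(e * e for e in residuals) / n)
    # Pearson correlation between observed and predicted values.
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    return {"ME": me, "MAE": mae, "RMSE": rmse, "r": r}

# Made-up AGB values in Mg/ha, for illustration only (not from the study).
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
metrics = error_metrics(observed, predicted)
print(metrics)
```

Because positive and negative residuals cancel in the mean error, a value near zero indicates low bias rather than low error, which is why it is usually reported alongside MAE and RMSE.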

Comparison of Capability of SAR and Optical Data in Mapping Forest above Ground Biomass Based on Machine Learning

Environmental Sciences Proceedings ◽

10.3390/iecg2020-07916 ◽

2020 ◽

Vol 5 (1) ◽

pp. 13

Author(s):

Negar Tavasoli ◽

Hossein Arefi

Keyword(s):

Machine Learning ◽

Carbon Stock ◽

Vegetation Indices ◽

Forest Biomass ◽

Principal Component ◽

Optical Data ◽

Above Ground Biomass ◽

Ground Biomass ◽

Texture Characteristics ◽

Sentinel 2

Assessment of forest above ground biomass (AGB) is critical for managing forests and understanding the role of forests as a source of carbon fluxes. Recently, satellite remote sensing products have offered the chance to map forest biomass and carbon stock. The present study compares the potential of a combination of ALOS PALSAR and Sentinel-1 SAR data with that of Sentinel-2 optical data for estimating above ground biomass and carbon stock using the Genetic-Random Forest (GA-RF) machine learning algorithm. Polarimetric decompositions, texture characteristics and backscatter coefficients of ALOS PALSAR and Sentinel-1, and vegetation indices, tasseled cap, texture parameters and principal component analysis (PCA) of Sentinel-2, based on measured AGB samples, were used to estimate biomass. The overall coefficients of determination (R2) of AGB modelling using the combination of ALOS PALSAR and Sentinel-1 data and using Sentinel-2 data were 0.70 and 0.62, respectively. The results showed that combining ALOS PALSAR and Sentinel-1 data to predict AGB with the GA-RF model performed better than using Sentinel-2 data.

Prediction of above ground biomass and C-stocks based on UAV-LiDAR, multispectral imagery and machine learning methods.

10.5194/egusphere-egu21-15708 ◽

2021 ◽

Author(s):

Jaime Caballer Revenga ◽

Katerina Trepekli ◽

Stefan Oehmcke ◽

Fabian Gieseke ◽

Rasmus Jensen ◽

...

Keyword(s):

Machine Learning ◽

Near Field ◽

Multispectral Image ◽

Successful Outcome ◽

Observation System ◽

Support Vector ◽

Multispectral Imagery ◽

Above Ground Biomass ◽

Ground Biomass ◽

C Stocks

Current efforts to enhance the understanding of the global carbon (C) cycle rely on novel monitoring campaigns of C sequestration in terrestrial ecosystems. The successful outcome of such efforts will be relevant to sectors ranging from climate change and land use studies (global scale) to precision agriculture and land management consultancy (local scale). To that end, current investigations apply recently developed scientific instrumentation, e.g. Light Detection and Ranging (LiDAR), and computational methods, e.g. Machine Learning (ML). Near-field remote sensing, i.e. Unmanned Aerial Vehicle (UAV)-LiDAR, can provide high-resolution LiDAR data, increasing the monitoring accuracy of C stock estimates and biophysical variables at the ecosystem scale. In contrast to previous approaches (e.g. image-derived vegetation indices), UAV-LiDAR provides a true 3D description of the canopy vertical structure. In order to evaluate the potential of new approaches towards precise C stock quantification in an agricultural field in Denmark (13 ha), using near-field remotely sensed data, we compare the results based on 3D canopy metrics derived from UAV-LiDAR against the well-established multispectral image-based metrics. Then, the performance of six different machine learning (ML) models (two Random Forest variations, KNN, AdaBoost, ElasticNet, Support Vector, and Linear regression), designed to predict above ground biomass (AGB) based on a set of features derived from (i) UAV-LiDAR point cloud data (PCD) and (ii) multispectral imagery, is evaluated. Their prediction quality is tested against unseen data from the same species and sampling campaigns. Also, the sources of uncertainty are assessed, as well as the importance of each predicting feature. The field work was conducted within the footprint of an Integrated Carbon Observation System (ICOS) class 1 station site, facilitating ecosystem traits monitoring in real time.
The aerial and biomass sampling campaigns were operated at a 15-day frequency during the crops' growing period, during which UAV-LiDAR and multispectral image data as well as ground truth biomass data were collected simultaneously. By means of laboratory analysis, C and nutrient content in the crops' biomass was also determined. Based on arithmetic and morphological methods, the PCD were pre-processed to remove noise and classify them into ground and vegetation points. By means of the methods described, we demonstrate that UAV-LiDAR combined with multispectral data and ML methods can be used to accurately estimate AGB, 3D ecosystem structure as well as C-stocks in agricultural ecosystems.

Above-Ground Biomass Estimation in Oats Using UAV Remote Sensing and Machine Learning

Sensors ◽

10.3390/s22020601 ◽

2022 ◽

Vol 22 (2) ◽

pp. 601

Author(s):

Prakriti Sharma ◽

Larry Leigh ◽

Jiyul Chang ◽

Maitiniyazi Maimaitijiang ◽

Melanie Caffé

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Vegetation Indices ◽

Support Vector ◽

Biomass Estimation ◽

Surface Model ◽

Above Ground Biomass ◽

Ground Biomass ◽

South Shore ◽

Oat Biomass

Current strategies for phenotyping above-ground biomass in field breeding nurseries demand significant investment in both time and labor. Unmanned aerial vehicles (UAV) can be used to derive vegetation indices (VIs) with high throughput and could provide an efficient way to predict forage yield with high accuracy. The main objective of the study is to investigate the potential of UAV-based multispectral data and machine learning approaches in the estimation of oat biomass. A UAV equipped with a multispectral sensor was flown over three experimental oat fields in Volga, South Shore, and Beresford, South Dakota, USA, throughout the pre- and post-heading growth phases of oats in 2019. A variety of VIs derived from UAV-based multispectral imagery were employed to build oat biomass estimation models using four machine-learning algorithms: partial least squares (PLS), support vector machine (SVM), artificial neural network (ANN), and random forest (RF). The results showed that several VIs derived from the UAV-collected images were significantly positively correlated with dry biomass for Volga and Beresford (r = 0.2–0.65); however, in South Shore, VIs were either not significantly or only weakly correlated with biomass. For Beresford, approximately 70% of the variance was explained by PLS, RF, and SVM validation models using data collected during the post-heading phase. For Volga, validation models had a lower coefficient of determination (R2 = 0.20–0.25) and higher error (RMSE = 700–800 kg/ha) than training models (R2 = 0.50–0.60; RMSE = 500–690 kg/ha). In South Shore, validation models were only able to explain approximately 15–20% of the variation in biomass, possibly because of the insignificant correlation between VIs and biomass. Overall, this study indicates that airborne remote sensing with machine learning has potential for above-ground biomass estimation in oat breeding nurseries.
The main limitation was inconsistent accuracy in model prediction across locations. Multiple-year spectral data, along with the inclusion of textural features like crop surface model (CSM) derived height and volumetric indicators, should be considered in future studies while estimating biophysical parameters like biomass.

Estimating Mangrove Above-Ground Biomass Using Extreme Gradient Boosting Decision Trees Algorithm with Fused Sentinel-2 and ALOS-2 PALSAR-2 Data in Can Gio Biosphere Reserve, Vietnam

Remote Sensing ◽

10.3390/rs12050777 ◽

2020 ◽

Vol 12 (5) ◽

pp. 777 ◽

Cited By ~ 9

Author(s):

Tien Dat Pham ◽

Nga Nhu Le ◽

Nam Thang Ha ◽

Luong Viet Nguyen ◽

Junshi Xia ◽

...

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Biosphere Reserve ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Support Vector ◽

Above Ground Biomass ◽

Ground Biomass ◽

Extreme Gradient Boosting ◽

Sentinel 2

This study investigates the effectiveness of gradient boosting decision tree techniques in estimating mangrove above-ground biomass (AGB) at the Can Gio biosphere reserve (Vietnam). For this purpose, we employed a novel gradient-boosting regression technique called the extreme gradient boosting regression (XGBR) algorithm, and implemented and verified a mangrove AGB model using data from a field survey of 121 sampling plots conducted during the dry season. The dataset fuses the data of the Sentinel-2 multispectral instrument (MSI) and the dual polarimetric (HH, HV) data of ALOS-2 PALSAR-2. The performance measures of the proposed model (root-mean-square error (RMSE) and coefficient of determination (R2)) were compared with those of other machine learning techniques, namely gradient boosting regression (GBR), support vector regression (SVR), Gaussian process regression (GPR), and random forests regression (RFR). The XGBR model obtained a promising result, with R2 = 0.805 and RMSE = 28.13 Mg ha−1, and yielded the highest predictive performance among the five machine learning models. In the XGBR model, the estimated mangrove AGB ranged from 11 to 293 Mg ha−1 (average = 106.93 Mg ha−1). This work demonstrates that XGBR with the combined Sentinel-2 and ALOS-2 PALSAR-2 data can accurately estimate the mangrove AGB in the Can Gio biosphere reserve. The general applicability of the XGBR model combined with multi-source optical and SAR data should be further tested and compared in a large-scale study of forest AGBs in different geographical and climatic ecosystems.
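
Comparing regression models by coefficient of determination and RMSE, as this abstract describes, can be sketched as follows. This is an illustrative example only: the model names `model_a` and `model_b`, the helper functions, and every number are hypothetical and do not come from the study, and no gradient boosting model is actually fitted here.

```python
import math

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root-mean-square error, in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

# Hypothetical held-out AGB observations (Mg/ha) and predictions from two
# hypothetical models; the values are invented for illustration.
observed = [50.0, 100.0, 150.0, 200.0]
candidate_predictions = {
    "model_a": [60.0, 90.0, 160.0, 190.0],
    "model_b": [80.0, 80.0, 170.0, 170.0],
}

scores = {
    name: {"R2": r_squared(observed, pred), "RMSE": rmse(observed, pred)}
    for name, pred in candidate_predictions.items()
}
# Pick the model with the highest R2 on the held-out data.
best = max(scores, key=lambda name: scores[name]["R2"])
print(best, scores[best])
```

On a single shared test set the two criteria rank models identically, because both depend only on the residual sum of squares; they can disagree when models are scored on different data.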

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because it can find non-intuitive regularities in high-dimensional datasets, machine learning can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best among these methods, with R2=0.84, MSE=0.55 for the training set and R2=0.83, MSE=0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
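
The y-randomization test mentioned in this abstract checks that a model's fit is not a chance artifact: the target values are shuffled, the model is refitted, and the scores on shuffled targets should fall far below the score on the real targets. The sketch below illustrates the idea with a one-descriptor least-squares line swapped in for the paper's SVM; the function names, the synthetic data, and the number of rounds are all assumptions.

```python
import random

def fit_r2(x, y):
    """R^2 of a one-variable least-squares line fitted to (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def y_randomization(x, y, n_rounds=50, seed=0):
    """Refit on shuffled targets and return the R^2 of each round."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break the descriptor-target relationship
        scores.append(fit_r2(x, shuffled))
    return scores

# Synthetic descriptor and target with a strong linear link plus small
# deterministic noise (invented data, not from the study).
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 + 0.5 * ((int(xi) * 7) % 5 - 2) for xi in x]

real_r2 = fit_r2(x, y)
shuffled_scores = y_randomization(x, y)
mean_shuffled_r2 = sum(shuffled_scores) / len(shuffled_scores)
print(real_r2, mean_shuffled_r2)
```

A large gap between `real_r2` and `mean_shuffled_r2` suggests the fit reflects a genuine relationship; similar values would point to chance correlation or overfitting.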

Machine Learning Methods Applied to the Prediction of Pseudo-nitzschia spp. Blooms in the Galician Rias Baixas (NW Spain)

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10040199 ◽

2021 ◽

Vol 10 (4) ◽

pp. 199

Author(s):

Francisco M. Bellas Aláez ◽

Jesus M. Torres Palenzuela ◽

Evangelos Spyrakos ◽

Luis González Vilas

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Prediction Models ◽

Support Vector ◽

False Alarms ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Rías Baixas ◽

New Algorithms

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.

Development of Machine Learning Models for Prediction of Smoking Cessation Outcome

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18052584 ◽

2021 ◽

Vol 18 (5) ◽

pp. 2584

Author(s):

Cheng-Chien Lai ◽

Wei-Hsin Huang ◽

Betty Chia-Chen Chang ◽

Lee-Ching Hwang

Keyword(s):

Machine Learning ◽

Smoking Cessation ◽

Success Rate ◽

Prediction Models ◽

Smoking Status ◽

Medical Center ◽

Machine Learning Algorithms ◽

Classification And Regression Tree ◽

Support Vector ◽

Smoking Cessation Outcome

Predictors for success in smoking cessation have been studied, but a prediction model capable of providing a success rate for each patient attempting to quit smoking is still lacking. The aim of this study is to develop prediction models using machine learning algorithms to predict the outcome of smoking cessation. Data were acquired from patients who underwent a smoking cessation program at one medical center in Northern Taiwan. A total of 4875 enrollments fulfilled our inclusion criteria. Models with artificial neural network (ANN), support vector machine (SVM), random forest (RF), logistic regression (LoR), k-nearest neighbor (KNN), classification and regression tree (CART), and naïve Bayes (NB) were trained to predict the final smoking status of the patients in a six-month period. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve (AUC or ROC value) were used to determine the performance of the models. We adopted the ANN model, which reached a slightly better performance, with a sensitivity of 0.704, a specificity of 0.567, an accuracy of 0.640, and an ROC value of 0.660 (95% confidence interval (CI): 0.617–0.702) for prediction of smoking cessation outcome. A predictive model for smoking cessation was constructed. The model could aid in providing the predicted success rate for all smokers. It also has the potential to achieve personalized and precision medicine for treatment of smoking cessation.
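
Sensitivity, specificity, and accuracy, as reported in this abstract, all derive from the four cells of a binary confusion matrix. The sketch below shows the arithmetic under the assumption that label 1 marks the positive class; the function name and the ten example labels are invented for illustration and are unrelated to the study's data.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, and precision for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    return {
        "sensitivity": tp / (tp + fn),  # share of actual positives found
        "specificity": tn / (tn + fp),  # share of actual negatives found
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),    # share of predicted positives that are right
    }

# Invented labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
result = classification_metrics(y_true, y_pred)
print(result)
```

This sketch does not guard against empty classes; a denominator of zero (for example, no predicted positives) would raise `ZeroDivisionError` and needs explicit handling in real use.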

Machine Learning-Based Prediction of Air Quality

Applied Sciences ◽

10.3390/app10249151 ◽

2020 ◽

Vol 10 (24) ◽

pp. 9151

Author(s):

Yun-Chia Liang ◽

Yona Maimury ◽

Angela Hsiang-Ling Chen ◽

Josue Rodolfo Cuevas Juarez

Keyword(s):

Machine Learning ◽

Air Quality ◽

Random Forest ◽

Prediction Models ◽

Superior Performance ◽

Support Vector ◽

Economic Activities ◽

Adaptive Boosting ◽

Series Of Experiments ◽

Artificial Neural Network Ann

Air, an essential natural resource, has been compromised in terms of quality by economic activities. Considerable research has been devoted to predicting instances of poor air quality, but most studies are limited by insufficient longitudinal data, making it difficult to account for seasonal and other factors. Several prediction models have been developed using an 11-year dataset collected by Taiwan’s Environmental Protection Administration (EPA). Machine learning methods, including adaptive boosting (AdaBoost), artificial neural network (ANN), random forest, stacking ensemble, and support vector machine (SVM), produce promising results for air quality index (AQI) level predictions. A series of experiments, using datasets for three different regions to obtain the best prediction performance from the stacking ensemble, AdaBoost, and random forest, found that the stacking ensemble delivers consistently superior performance for R2 and RMSE, while AdaBoost provides the best results for MAE.

Forecasting the risk at infractions: an ensemble comparison of machine learning approach

Industrial Management & Data Systems ◽

10.1108/imds-10-2020-0603 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Lei Li ◽

Desheng Wu

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Short Term Memory ◽

Model Performance ◽

Large Data ◽

Support Vector ◽

Learning Approaches ◽

Content Type ◽

Day To Day Operations ◽

Prediction Approach

Purpose: The infraction of securities regulations (ISRs) by listed firms in their day-to-day operations and management has become a common problem. This paper proposes several machine learning approaches to forecast the risk of infractions by listed corporates, in order to address ineffective and imprecise financial supervision. Design/methodology/approach: The overall proposed research framework designed for forecasting the infractions (ISRs) includes data collection and cleaning, feature engineering, data split, prediction approach application and model performance evaluation. We select Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machines, Artificial Neural Network and Long Short-Term Memory Networks (LSTMs) as ISRs prediction models. Findings: The research results show that the proposed models that include prior infractions predict ISRs significantly better than those without them, especially for large sample sets. The results also indicate that, when judging whether a company has infractions, we should pay attention to novel artificial intelligence methods, previous infractions of the company, and large data sets. Originality/value: The findings could be utilized, to a certain degree, to address the problem of identifying listed corporates' ISRs. Overall, the results elucidate the value of prior infractions of securities regulations (ISRs). This shows the importance of including more data sources when constructing distress models, rather than only focusing on building increasingly complex models on the same data. This is also beneficial to the regulatory authorities.

Machine Learning Frameworks in Cancer Detection

E3S Web of Conferences ◽

10.1051/e3sconf/202129701073 ◽

2021 ◽

Vol 297 ◽

pp. 01073

Author(s):

Sabyasachi Pramanik ◽

K. Martin Sagayam ◽

Om Prakash Jena

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Cancer Development ◽

Support Vector ◽

Learning Approaches ◽

Learning Techniques ◽

Fact Finding ◽

Risk Of Cancer

Cancer has been described as a diverse illness with several distinct subtypes that may occur simultaneously. As a result, early detection and forecasting of cancer types have become essential in cancer fact-finding methods, since they may help to improve the clinical treatment of cancer survivors. The significance of categorizing cancer sufferers into higher- or lower-threat categories has prompted numerous fact-finding associates from the bioscience and genomics fields to investigate the utilization of machine learning (ML) algorithms in cancer diagnosis and treatment. Because of this, these methods have been used with the goal of simulating the development and treatment of malignant diseases in humans. Furthermore, the capacity of machine learning techniques to identify important characteristics from complicated datasets demonstrates the significance of these technologies. These technologies include Bayesian networks and artificial neural networks, along with a number of other approaches. Decision Trees and Support Vector Machines, which have already been extensively used in cancer research for the creation of predictive models, also lead to accurate decision making. The application of machine learning techniques may undoubtedly enhance our knowledge of cancer development; nevertheless, a sufficient degree of validation is required before these approaches can be considered for use in daily clinical practice. An overview of current machine learning approaches utilized in the simulation of cancer development is presented in this paper. All of the supervised machine learning approaches described here, along with a variety of input characteristics and data samples, are used to build the prediction models. In light of the increasing trend towards the use of machine learning methods in biomedical research, we present the most recent papers that have used these approaches to predict risk of cancer or patient outcomes in order to better understand cancer.