Machine Learning Algorithms for Chromophoric Dissolved Organic Matter (CDOM) Estimation Based on Landsat 8 Images

Chromophoric dissolved organic matter (CDOM) is crucial in the biogeochemical cycle and carbon cycle of aquatic environments. However, in inland waters, remotely sensed estimates of CDOM remain challenging due to the low optical signal of CDOM and complex optical conditions. Therefore, developing efficient, practical and robust models to estimate the CDOM absorption coefficient in inland waters is essential for successful water environment monitoring and management. We examined and improved different machine learning algorithms using extensive CDOM measurements and Landsat 8 images covering different trophic states to develop a robust CDOM estimation model. The algorithms were evaluated using 111 Landsat 8 images and 1708 field measurements covering CDOM light absorption coefficient a(254) values from 2.64 to 34.04 m−1. Overall, the four machine learning algorithms achieved more than 70% accuracy for CDOM absorption coefficient estimation. Based on model training, validation and application to Landsat 8 OLI images, we found that Gaussian process regression (GPR) had higher stability and estimation accuracy (R2 = 0.74, mean relative error (MRE) = 22.2%) than the other models. The estimation accuracy and MRE were R2 = 0.75 and MRE = 22.5% for the backpropagation (BP) neural network, R2 = 0.71 and MRE = 24.4% for random forest regression (RFR) and R2 = 0.71 and MRE = 24.4% for support vector regression (SVR). In contrast, the best three empirical models had estimation accuracies of R2 less than 0.56. The model accuracies for Landsat images of Lake Qiandaohu (oligo-mesotrophic state) were better than those for Lake Taihu (eutrophic state) because of the more complex optical conditions in eutrophic lakes. Therefore, machine learning algorithms have great potential for CDOM monitoring in inland waters based on large datasets. Our study demonstrates that machine learning algorithms can be used to map CDOM spatial-temporal patterns in inland waters.
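
The accuracy figures in this abstract are reported as R2 and mean relative error (MRE). A minimal sketch of how such scores are commonly computed from paired field and satellite-derived values follows. It assumes R2 is the coefficient of determination and MRE is the mean of |predicted - observed| / observed, expressed as a percentage; the abstract does not give the exact definitions, and the function names and sample values are illustrative only.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_relative_error(observed, predicted):
    """Mean of |predicted - observed| / observed, in percent."""
    ratios = [abs(p - o) / o for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Made-up a(254) values in m^-1, for illustration only (not data from the study).
field = [3.0, 8.0, 15.0, 30.0]
model = [3.3, 7.2, 16.5, 27.0]

print("R2  =", round(r_squared(field, model), 3))
print("MRE =", round(mean_relative_error(field, model), 1), "%")
```

Some studies report R2 as the squared Pearson correlation instead, which can differ from the coefficient of determination when predictions are biased, so the definition is worth checking before comparing scores across papers.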

Mapping Allochemical Limestone Formations in Hazara, Pakistan Using Google Cloud Architecture: Application of Machine-Learning Algorithms on Multispectral Data

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10020058 ◽

2021 ◽

Vol 10 (2) ◽

pp. 58

Author(s):

Muhammad Fawad Akbar Khan ◽

Khan Muhammad ◽

Shahid Bashir ◽

Shahab Ud Din ◽

Muhammad Hanif

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Learning Algorithms ◽

Remote Sensing Data ◽

Kappa Coefficient ◽

Machine Learning Algorithms ◽

Landsat 8 ◽

Sensing Data ◽

Fossiliferous Limestone

Low-resolution Geological Survey of Pakistan (GSP) maps surrounding the region of interest show oolitic limestone occurrences in the Samanasuk formation and fossiliferous limestone occurrences in the Lockhart and Margalla hill formations in the Hazara division, Pakistan. Machine-learning algorithms (MLAs) have rarely been applied to multispectral remote sensing data for differentiating between limestone formations formed in different depositional environments, such as oolitic or fossiliferous. Unlike previous studies, which mostly report MLA-based lithological classification of rock types having different chemical compositions, this paper aimed to investigate the potential of MLAs for mapping subclasses within the same lithology, i.e., limestone. Additionally, the selection of appropriate data labels, training algorithms, hyperparameters, and remote sensing data sources was investigated while applying these MLAs. In this paper, first, the oolitic (Samanasuk) and fossiliferous (Lockhart and Margalla) limestone-bearing formations, along with the adjoining Hazara formation, were mapped using random forest (RF), support vector machine (SVM), classification and regression tree (CART), and naïve Bayes (NB) MLAs. The RF algorithm reported the best accuracy of 83.28% and a Kappa coefficient of 0.78. To further improve the targeted allochemical limestone formation map, annotation labels were generated by fusing maps obtained from principal component analysis (PCA), decorrelation stretching (DS), and X-means clustering applied to ASTER-L1T, Landsat-8, and Sentinel-2 datasets. These labels were used to train and validate SVM, CART, NB, and RF MLAs to obtain a binary classification map of limestone occurrences in the Hazara division, Pakistan, using the Google Earth Engine (GEE) platform. The classification of Landsat-8 data by CART reported 99.63% accuracy, with a Kappa coefficient of 0.99, and was in good agreement with the field validation. This binary limestone map was further classified into oolitic (Samanasuk) and fossiliferous (Lockhart and Margalla) formations by all four MLAs; in this case, RF surpassed all the other algorithms with an improved accuracy of 96.36%. This improvement can be attributed to better annotation, resulting in a binary limestone classification map, which formed a mask for improved classification of oolitic and fossiliferous limestone in the area.
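
The classification results in this abstract are summarized by overall accuracy and the Kappa coefficient. As a hedged illustration of how those two figures relate, the sketch below computes both from a confusion matrix using the standard Cohen's kappa formula. The matrix counts are invented for the example and the function names are illustrative; nothing here reproduces the authors' Google Earth Engine workflow.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    total = sum(sum(row) for row in matrix)
    p_observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_observed - p_expected) / (1.0 - p_expected)


# Rows = reference class, columns = predicted class (made-up counts).
confusion = [
    [45, 5],   # reference: limestone
    [10, 40],  # reference: non-limestone
]

print("Overall accuracy:", overall_accuracy(confusion))
print("Kappa:", round(cohen_kappa(confusion), 3))
```

Because kappa discounts chance agreement, it is always lower than overall accuracy for an imperfect classifier, which is consistent with the paired figures quoted in the abstract (for example 83.28% accuracy alongside a Kappa of 0.78).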

Parameterization of the light absorption properties of chromophoric dissolved organic matter in the Baltic Sea and Pomeranian lakes

Ocean Science ◽

10.5194/os-12-1013-2016 ◽

2016 ◽

Vol 12 (4) ◽

pp. 1013-1032 ◽

Cited By ~ 12

Author(s):

Justyna Meler ◽

Piotr Kowalczuk ◽

Mirosława Ostrowska ◽

Dariusz Ficek ◽

Monika Zabłocka ◽

...

Keyword(s):

Organic Matter ◽

Dissolved Organic Matter ◽

Absorption Coefficient ◽

Baltic Sea ◽

Chlorophyll A ◽

Chromophoric Dissolved Organic Matter ◽

The Baltic Sea ◽

Absorption Properties ◽

Chlorophyll A Concentration ◽

The Baltic

Abstract. This study presents three alternative models for estimating the absorption properties of chromophoric dissolved organic matter aCDOM(λ). For this analysis we used a database containing 556 absorption spectra measured in 2006–2009 in different regions of the Baltic Sea (open and coastal waters, the Gulf of Gdańsk and the Pomeranian Bay), at river mouths, in the Szczecin Lagoon and also in three lakes in Pomerania (Poland) – Obłęskie, Łebsko and Chotkowskie. The variability range of the chromophoric dissolved organic matter (CDOM) absorption coefficient at 400 nm, aCDOM(400), lay within 0.15–8.85 m−1. The variability in aCDOM(λ) was parameterized with respect to the variability over 3 orders of magnitude in the chlorophyll a concentration Chl a (0.7–119 mg m−3). The chlorophyll a concentration and aCDOM(400) were correlated, and a statistically significant, nonlinear empirical relationship between these parameters was derived (R2 = 0.83). On the basis of the covariance between these parameters, we derived two empirical mathematical models that enabled us to design the CDOM absorption coefficient dynamics in natural waters and reconstruct the complete CDOM absorption spectrum in the UV and visible spectral domains. The input variable in the first model was the chlorophyll a concentration, and in the second one it was aCDOM(400). Both models were fitted to a power function, and a second-order polynomial function was used as the exponent. Regression coefficients for these formulas were determined for wavelengths from 240 to 700 nm at 5 nm intervals. Both approximations reflected the real shape of the absorption spectra with a low level of uncertainty. Comparison of these approximations with other models of light absorption by CDOM demonstrated that our parameterizations were superior (bias from −1.45 to 62 %, RMSE from 22 to 220 %) for estimating CDOM absorption in the optically complex waters of the Baltic Sea and Pomeranian lakes.
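
One plausible reading of "a power function with a second-order polynomial as the exponent" is a base-10 power whose exponent is quadratic in the logarithm of the input variable. The sketch below uses that assumed form purely for illustration. The base, the log transform and the coefficients A, B and C are hypothetical placeholders, not the regression coefficients derived by the authors, which the abstract says were determined per wavelength from 240 to 700 nm.

```python
import math


def a_cdom(x, coeffs):
    """Assumed form: a_CDOM(lambda) = 10 ** (A*L**2 + B*L + C), with L = log10(x).

    x is the input variable (Chl a or aCDOM(400) in the abstract's two models);
    coeffs is an (A, B, C) tuple for one wavelength.
    """
    a, b, c = coeffs
    log_x = math.log10(x)
    return 10.0 ** (a * log_x ** 2 + b * log_x + c)


# Hypothetical coefficients keyed by wavelength in nm (NOT values from the paper).
COEFFS = {
    400: (0.05, 0.40, -0.30),
    440: (0.05, 0.40, -0.55),
}


def spectrum(x, table=COEFFS):
    """Evaluate the assumed model at every tabulated wavelength."""
    return {wavelength: a_cdom(x, c) for wavelength, c in sorted(table.items())}


for wavelength, value in spectrum(10.0).items():
    print(wavelength, "nm:", round(value, 3), "m^-1")
```

Storing one coefficient triple per wavelength mirrors the abstract's description of coefficients fitted at 5 nm intervals, so a full spectrum is reconstructed by evaluating the same function across the table.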

Influence of Variable Selection and Forest Type on Forest Aboveground Biomass Estimation Using Machine Learning Algorithms

Forests ◽

10.3390/f10121073 ◽

2019 ◽

Vol 10 (12) ◽

pp. 1073 ◽

Cited By ~ 10

Author(s):

Li ◽

Liu

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Variable Selection ◽

Aboveground Biomass ◽

Forest Type ◽

Learning Algorithms ◽

Forest Biomass ◽

Machine Learning Algorithms ◽

Biomass Estimation ◽

Landsat 8

Forest biomass is a major store of carbon and plays a crucial role in the regional and global carbon cycle. Accurate forest biomass assessment is important for monitoring and mapping the status of and changes in forests. However, while remote sensing-based forest biomass estimation in general is well developed and extensively used, improving the accuracy of biomass estimation remains challenging. In this paper, we used China’s National Forest Continuous Inventory data and Landsat 8 Operational Land Imager data in combination with three algorithms, namely linear regression (LR), random forest (RF), and extreme gradient boosting (XGBoost), to establish biomass estimation models based on forest type. In the modeling process, two methods of variable selection, i.e., stepwise regression and a variable importance-based method, were used to select optimal variable subsets for LR and the machine learning algorithms (RF and XGBoost), respectively. The accuracy of the models was significantly improved, and the following conclusions were drawn: (1) Variable selection is very important for improving the performance of models, especially for machine learning algorithms, and the influence of variable selection on XGBoost is significantly greater than that on RF. (2) Machine learning algorithms have advantages in aboveground biomass (AGB) estimation, and the XGBoost and RF models significantly improved the estimation accuracy compared with the LR models. Although the problems of overestimation and underestimation were not fully eliminated, the XGBoost algorithm worked well and reduced these problems to a certain extent. (3) AGB modeling based on forest type is an advantageous approach for improving performance at the lower and higher values of AGB. Some conclusions in this paper may differ for other study areas. The methods used in this paper provide an optional and useful approach for improving the accuracy of AGB estimation based on remote sensing data, and the estimated AGB provides a reference basis for monitoring the forest ecosystem of the study area.

A Comparison of Machine-Learning Regression Algorithms for the Estimation of LAI Using Landsat-8 Satellite Data

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-4-w16-679-2019 ◽

2019 ◽

Vol XLII-4/W16 ◽

pp. 679-683

Author(s):

V. P. Yadav ◽

R. Prasad ◽

R. Bala ◽

A. K. Vishwakarma ◽

S. A. Yadav ◽

...

Keyword(s):

Machine Learning ◽

Satellite Data ◽

Vegetation Index ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Accurate Estimation ◽

Support Vector ◽

Landsat 8 ◽

Area Index ◽

Global Circulation Models

Abstract. The leaf area index (LAI) is one of the key variables of crops and plays an important role in agriculture, ecology and climate change, where global circulation models use it to compute energy and water fluxes. In recent research, machine-learning algorithms have provided accurate computational approaches for the estimation of crop biophysical parameters using remotely sensed data. Three machine-learning algorithms, random forest regression (RFR), support vector regression (SVR) and artificial neural network regression (ANNR), were used to estimate the LAI for crops in the present study. Landsat-8 satellite images from three different dates between January 2017 and March 2017 were used, covering different crop growth conditions in Varanasi district, India. The sampling regions were fully covered by major Rabi season crops such as wheat, barley and mustard. Of the total pooled data, 60% of the samples were used for training the algorithms and the remaining 40% were used for testing and validation of the machine-learning regression algorithms. The highest sensitivity of the normalized difference vegetation index (NDVI) to LAI was found using the RFR algorithm (R2 = 0.884, RMSE = 0.404) as compared to SVR (R2 = 0.847, RMSE = 0.478) and ANNR (R2 = 0.829, RMSE = 0.404). Therefore, the RFR algorithm can be used for accurate estimation of LAI for crops using satellite data.
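
The data preparation described in this abstract can be sketched in a few lines: compute NDVI from red and near-infrared reflectance, pair it with measured LAI, and split the pooled samples 60/40. NDVI is the standard (NIR - Red) / (NIR + Red) ratio, which for Landsat 8 OLI uses band 5 (NIR) and band 4 (red). The sample values, the random seed and the function names are assumptions made for illustration, and the abstract does not say how the authors drew their split.

```python
import random


def ndvi(red, nir):
    """Normalized difference vegetation index from surface reflectance."""
    return (nir - red) / (nir + red)


def split_samples(samples, train_fraction=0.6, seed=0):
    """Shuffle a copy of the samples and split it into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# (red reflectance, NIR reflectance, measured LAI): made-up values.
samples = [
    (0.08, 0.40, 2.9), (0.10, 0.35, 2.1), (0.06, 0.45, 3.6), (0.12, 0.30, 1.5),
    (0.07, 0.42, 3.2), (0.09, 0.38, 2.5), (0.11, 0.33, 1.9), (0.05, 0.48, 4.0),
    (0.13, 0.28, 1.2), (0.08, 0.41, 3.0),
]

# Feature/target pairs that a regressor such as RFR, SVR or ANNR would consume.
pairs = [(ndvi(red, nir), lai) for red, nir, lai in samples]
train, test = split_samples(pairs)

print(len(train), "training samples,", len(test), "test samples")
```

Fixing the seed keeps the split reproducible, which matters when comparing several regressors on the same held-out samples.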

Comparative Analysis of Machine Learning Algorithms in Automatic Identification and Extraction of Water Boundaries

Applied Sciences ◽

10.3390/app112110062 ◽

2021 ◽

Vol 11 (21) ◽

pp. 10062

Author(s):

Aimin Li ◽

Meng Fan ◽

Guangduo Qin ◽

Youcheng Xu ◽

Hailong Wang

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Decision Tree ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Water Bodies ◽

Support Vector ◽

Landsat 8 ◽

Transfer Performance ◽

Remote Sensing Images

Monitoring open water bodies accurately is important for assessing the role of ecosystem services in the context of human survival and climate change. There are many methods available for water body extraction based on remote sensing images, such as the normalized difference water index (NDWI), modified NDWI (MNDWI), and machine learning algorithms. Based on Landsat-8 remote sensing images, this study focuses on the effects of six machine learning algorithms and three threshold methods used to extract water bodies, evaluates the transfer performance of models applied to remote sensing images from different periods, and compares the differences among these models. The results are as follows. (1) Different algorithms require different numbers of samples to reach their optimal performance. The logistic regression algorithm requires a minimum of 110 samples. As the number of samples increases, the order of the optimal model is support vector machine, neural network, random forest, decision tree, and XGBoost. (2) The accuracy of each machine learning algorithm on the test set cannot represent its performance in a local area. (3) When these models are directly applied to remote sensing images from different periods, the area under the curve (AUC) of each machine learning algorithm for the three regions shows a significant decline, with a decrease range of 0.33–66.52%, and the differences among the algorithm performances in the three areas are obvious. Generally, the decision tree algorithm has good transfer performance among the machine learning algorithms, with AUC values of 0.790, 0.518, and 0.697 in the three areas, respectively, and an average of 0.668. The Otsu threshold algorithm is the best of the threshold methods, with AUC values of 0.970, 0.617, and 0.908 in the three regions, respectively, and an average AUC of 0.832.
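
To make the threshold approach concrete, the following sketch computes NDWI as (Green - NIR) / (Green + NIR), the McFeeters formulation, and then picks a water/non-water cut with Otsu's method, which chooses the threshold that maximizes between-class variance of a histogram. This is a plain-Python illustration on a handful of invented pixel values; the bin count and function names are assumptions, and it is not the processing chain used in the study.

```python
def ndwi(green, nir):
    """Normalized difference water index (green and NIR reflectance)."""
    return (green - nir) / (green + nir)


def otsu_threshold(values, bins=64):
    """Return the histogram cut that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))

    best_score, best_cut = -1.0, lo
    count_low, sum_low = 0, 0.0
    for i in range(bins - 1):
        count_low += hist[i]
        sum_low += centers[i] * hist[i]
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        score = count_low * count_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_score, best_cut = score, lo + (i + 1) * width
    return best_cut


# Made-up NDWI values: four land pixels followed by four water pixels.
pixels = [-0.50, -0.45, -0.42, -0.40, 0.40, 0.45, 0.48, 0.50]
cut = otsu_threshold(pixels)
water_mask = [v > cut for v in pixels]
print("threshold:", cut, "water pixels:", sum(water_mask))
```

Because the threshold is derived from each image's own histogram rather than from training samples, an approach of this kind has no fitted model to transfer between acquisition dates, which may help explain why the abstract reports higher AUC for Otsu than for the learned models when images from other periods are used.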

Evaluation of Machine Learning Algorithms for Surface Water Delineation Using Landsat 8 Images

Journal of Advanced Research in Dynamical and Control Systems ◽

10.5373/jardcs/v12i3/20201184 ◽

2020 ◽

Vol 12 (3) ◽

pp. 207-216

Author(s):

Bijeesh T.V.

Keyword(s):

Machine Learning ◽

Surface Water ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Landsat 8

Satellite retrieval of the absorption coefficient of chromophoric dissolved organic matter in continental margins

Journal of Geophysical Research Atmospheres ◽

10.1029/95jc02561 ◽

1995 ◽

Vol 100 (C12) ◽

pp. 24847 ◽

Cited By ~ 29

Author(s):

Frank E. Hoge ◽

Mark E. Williams ◽

Robert N. Swift ◽

James K. Yungel ◽

Anthony Vodacek

Keyword(s):

Organic Matter ◽

Dissolved Organic Matter ◽

Absorption Coefficient ◽

Chromophoric Dissolved Organic Matter ◽

Continental Margins ◽

Satellite Retrieval

Chromophoric dissolved organic matter in inland waters: Present knowledge and future challenges

The Science of The Total Environment ◽

10.1016/j.scitotenv.2020.143550 ◽

2021 ◽

Vol 759 ◽

pp. 143550

Author(s):

Yunlin Zhang ◽

Lei Zhou ◽

Yongqiang Zhou ◽

Liuqing Zhang ◽

Xiaolong Yao ◽

...

Keyword(s):

Organic Matter ◽

Dissolved Organic Matter ◽

Present Knowledge ◽

Chromophoric Dissolved Organic Matter ◽

Inland Waters ◽

Future Challenges

Evaluation of Machine Learning Algorithms for Surface Water Extraction in a Landsat 8 Scene of Nepal

Sensors ◽

10.3390/s19122769 ◽

2019 ◽

Vol 19 (12) ◽

pp. 2769 ◽

Cited By ~ 8

Author(s):

Tri Dev Acharya ◽

Anoj Subedi ◽

Dong Ha Lee

Keyword(s):

Machine Learning ◽

Surface Water ◽

Recursive Partitioning ◽

Learning Algorithms ◽

High Elevation ◽

Machine Learning Algorithms ◽

Water Extraction ◽

Support Vector ◽

Landsat 8 ◽

Water Index

With over 6000 rivers and 5358 lakes, surface water is one of the most important resources in Nepal. However, the quantity and quality of Nepal’s rivers and lakes are decreasing due to human activities and climate change. Despite the advancement of remote sensing technology and the availability of open access data and tools, the monitoring and extraction of surface water have not been carried out in Nepal. Single or multiple water index methods have been applied in the extraction of surface water with satisfactory results. Extending our previous study, we evaluated six different machine learning algorithms: Naive Bayes (NB), recursive partitioning and regression trees (RPART), neural networks (NNET), support vector machines (SVM), random forest (RF), and gradient boosted machines (GBM) to extract surface water in Nepal. Using three secondary bands (slope, NDVI and NDWI), the algorithms were evaluated for their performance with the addition of extra information. As a result, all the applied machine learning algorithms, except NB and RPART, showed good performance. RF showed an overall accuracy (OA) and kappa coefficient (Kappa) of 1 for all the multiband data with the reference dataset, followed by GBM, NNET, and SVM. Performance was better in the hilly regions and flat lands than in the Himalayas, with their ice, snow and shadows, and the addition of slope and NDWI improved the results. Adding a single secondary band is better than adding multiple bands for most algorithms except NNET. From current and previous studies, it is recommended to separate a study area into regions with and without snow, or of low and high elevation, and then apply machine learning algorithms to the original Landsat data or with the addition of slope or NDWI for better performance.
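
The idea of adding secondary bands to the original Landsat data can be illustrated as a per-pixel feature vector that optionally appends slope, NDVI or NDWI to the spectral bands. The sketch below is an assumption-laden illustration: the band subset in BANDS, the dictionary layout and the function names are hypothetical, slope is assumed to be precomputed from an elevation model, and the classifiers themselves are out of scope.

```python
# Assumed subset of Landsat 8 OLI reflectance bands used as base features.
BANDS = ("blue", "green", "red", "nir", "swir1", "swir2")


def ndvi(red, nir):
    return (nir - red) / (nir + red)


def ndwi(green, nir):
    return (green - nir) / (green + nir)


def build_features(pixel, extras=()):
    """Base bands plus any requested secondary bands, in the order given."""
    features = [pixel[band] for band in BANDS]
    for name in extras:
        if name == "slope":
            features.append(pixel["slope"])  # assumed precomputed from a DEM
        elif name == "ndvi":
            features.append(ndvi(pixel["red"], pixel["nir"]))
        elif name == "ndwi":
            features.append(ndwi(pixel["green"], pixel["nir"]))
        else:
            raise ValueError(f"unknown secondary band: {name}")
    return features


# Made-up reflectance values for a single water-like pixel.
pixel = {"blue": 0.06, "green": 0.08, "red": 0.05, "nir": 0.02,
         "swir1": 0.01, "swir2": 0.01, "slope": 1.5}

print(build_features(pixel))                      # original bands only
print(build_features(pixel, extras=("ndwi",)))    # one secondary band
print(build_features(pixel, extras=("slope", "ndvi", "ndwi")))
```

Keeping the secondary bands as an explicit argument makes it easy to compare the single-band and multi-band configurations that the abstract contrasts.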
