Remote sensing inversion of water quality in coastal sea area based on machine learning: a case study of Shenzhen bay, China

Author(s):  
Xiaotong Zhu ◽  
Jinhui Jeanne Huang

Remote sensing offers wide spatial coverage, rapid acquisition, and low cost for long-term dynamic monitoring of the water environment. With the rise of artificial intelligence, machine learning has enabled remote sensing inversion of seawater quality to achieve higher prediction accuracy. However, because the water quality parameters differ in their physicochemical properties, algorithm performance varies considerably from one parameter to another. To improve the predictive accuracy for seawater quality parameters, we proposed a technical framework to identify the optimal machine learning algorithms using Sentinel-2 satellite data and in-situ seawater sample data. In the study, we selected three algorithms, i.e. support vector regression (SVR), XGBoost and deep learning (DL), and four seawater quality parameters, i.e. dissolved oxygen (DO), total dissolved solids (TDS), turbidity (TUR) and chlorophyll-a (Chla). The results show that SVR is the more precise algorithm for DO inversion (R² = 0.81). XGBoost has the best accuracy for Chla and TUR inversion (R² = 0.75 and 0.78, respectively), while DL performs better for TDS (R² = 0.789). Overall, this research provides theoretical support for high-precision remote sensing inversion of offshore seawater quality parameters based on machine learning.
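The per-parameter comparison described above can be illustrated with a minimal sketch, assuming each candidate model has already produced predictions for the same in-situ samples. The function names and the toy numbers are hypothetical and are not taken from the study; only the idea of ranking algorithms by R² for each parameter comes from the abstract.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def best_model_per_parameter(observed, predictions):
    """Return {parameter: (model_name, r2)} for the highest-R^2 model.

    observed:    {parameter: [in-situ values]}
    predictions: {parameter: {model_name: [predicted values]}}
    """
    best = {}
    for param, by_model in predictions.items():
        scores = {m: r_squared(observed[param], p) for m, p in by_model.items()}
        winner = max(scores, key=scores.get)
        best[param] = (winner, scores[winner])
    return best


# Hypothetical toy values, NOT data from the study.
observed = {"DO": [6.0, 7.0, 8.0, 9.0]}
predictions = {
    "DO": {
        "SVR": [6.1, 6.9, 8.2, 8.8],
        "XGBoost": [6.5, 6.5, 8.5, 8.5],
    }
}
best = best_model_per_parameter(observed, predictions)
print(best)
```

In practice the predictions would come from held-out samples rather than the training set, so that the R² values reflect generalization rather than fit.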

2020 ◽  
Author(s):  
Yu Li ◽  
Youyue Sun ◽  
Jinhui Jeanne Huang ◽  
Edward McBean

With ecological and environmental problems in lakes becoming increasingly prominent, the demand for monitoring lake water quality by satellite remote sensing is growing. Traditional water quality sampling is normally conducted manually and is time-consuming and labor-intensive. It cannot provide a full picture of water bodies over time because of the limited number of sampling points and the low sampling frequency. A novel approach is proposed that uses hyperspectral remote sensing in conjunction with machine learning to retrieve water quality parameters and map them across a lake. The retrieval of both optically active parameters, chlorophyll-a (CHLA) and dissolved oxygen concentration (DO), and non-optically active parameters, total phosphorus (TP), total nitrogen (TN), turbidity (TB) and pH, was studied. Three machine learning algorithms were compared: Random Forests (RF), Support Vector Regression (SVR) and Artificial Neural Networks. Water quality parameters collected by Environment and Climate Change Canada over 20 years were used as the ground truth for model training and validation. Two sets of remote sensing data, from MODIS and Sentinel-2, were utilized and evaluated. This research proposes a new approach for retrieving both optically active and non-optically active parameters of water bodies and provides a new strategy for water quality monitoring.


2022 ◽  
Vol 4 ◽  
Author(s):  
Matthew D. Stocker ◽  
Yakov A. Pachepsky ◽  
Robert L. Hill

The microbial quality of irrigation water is an important issue as the use of contaminated waters has been linked to several foodborne outbreaks. To expedite microbial water quality determinations, many researchers estimate concentrations of the microbial contamination indicator Escherichia coli (E. coli) from the concentrations of physicochemical water quality parameters. However, these relationships are often non-linear and exhibit changes above or below certain threshold values. Machine learning (ML) algorithms have been shown to make accurate predictions in datasets with complex relationships. The purpose of this work was to evaluate several ML models for the prediction of E. coli in agricultural pond waters. Two ponds in Maryland were monitored from 2016 to 2018 during the irrigation season. E. coli concentrations along with 12 other water quality parameters were measured in water samples. The resulting datasets were used to predict E. coli using stochastic gradient boosting (SGB) machines, random forest (RF), support vector machines (SVM), and k-nearest neighbor (kNN) algorithms. The RF model provided the lowest RMSE value for predicted E. coli concentrations in both ponds in individual years and over consecutive years in almost all cases. For individual years, the RMSE of the predicted E. coli concentrations (log10 CFU 100 mL⁻¹) ranged from 0.244 to 0.346 and 0.304 to 0.418 for Pond 1 and 2, respectively. For the 3-year datasets, these values were 0.334 and 0.381 for Pond 1 and 2, respectively. In most cases there was no significant difference (P > 0.05) between the RMSE of RF and other ML models when these RMSE were treated as statistics derived from 10-fold cross-validation performed with five repeats. Important E. coli predictors were turbidity, dissolved organic matter content, specific conductance, chlorophyll concentration, and temperature. Model predictive performance did not significantly differ when 5 predictors were used vs. 8 or 12, indicating that more tedious and costly measurements provide no substantial improvement in the predictive accuracy of the evaluated algorithms.
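The evaluation scheme mentioned above, 10-fold cross-validation performed with five repeats and scored by RMSE, can be sketched as follows. This is only an illustration under stated assumptions: `repeated_kfold_indices`, `cross_validated_rmse`, and the placeholder model `fit_mean` are hypothetical names, the data are synthetic, and the mean predictor stands in for the RF, SGB, SVM, and kNN models, which are not reproduced here.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repeat
        for k in range(n_folds):
            test_idx = order[k::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield train_idx, test_idx


def cross_validated_rmse(x, y, fit, n_folds=10, n_repeats=5, seed=0):
    """Return one RMSE per fold; fit(x_train, y_train) must return a predict function."""
    scores = []
    for train_idx, test_idx in repeated_kfold_indices(len(y), n_folds, n_repeats, seed):
        predict = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(x[i]) for i in test_idx]
        scores.append(rmse([y[i] for i in test_idx], preds))
    return scores


def fit_mean(x_train, y_train):
    """Placeholder model (an assumption): always predicts the training mean."""
    mean_y = sum(y_train) / len(y_train)
    return lambda _x: mean_y


# Synthetic data, NOT measurements from the ponds.
x = [[float(i)] for i in range(50)]
y = [0.1 * i for i in range(50)]
scores = cross_validated_rmse(x, y, fit_mean)
print(len(scores), sum(scores) / len(scores))
```

The 50 per-fold RMSE values (10 folds times 5 repeats) are what would then be compared between models with a significance test.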


2020 ◽  
Vol 12 (10) ◽  
pp. 1586
Author(s):  
Leonardo F. Arias-Rodriguez ◽  
Zheng Duan ◽  
Rodrigo Sepúlveda ◽  
Sergio I. Martinez-Martinez ◽  
Markus Disse

Remote-sensing-based machine learning approaches for water quality parameters estimation, Secchi Disk Depth (SDD) and Turbidity, were developed for the Valle de Bravo reservoir in central Mexico. This waterbody is a multipurpose reservoir, which provides drinking water to the metropolitan area of Mexico City. To reveal the water quality status of inland waters in the last decade, evaluation of MERIS imagery is a substantial approach. This study incorporated in-situ collected measurements across the reservoir and remote sensing reflectance data from the Medium Resolution Imaging Spectrometer (MERIS). Machine learning approaches with varying complexities were tested, and the optimal model for SDD and Turbidity was determined. Cross-validation demonstrated that the satellite-based estimates are consistent with the in-situ measurements for both SDD and Turbidity, with R² values of 0.81 to 0.86 and RMSE of 0.15 m and 0.95 nephelometric turbidity units (NTU). The best model was applied to time series of MERIS images to analyze the spatial and temporal variations of the reservoir’s water quality from 2002 to 2012. The derived analysis revealed yearly patterns caused by dry and rainy seasons, and several disruptions were identified. The reservoir varied from trophic to intermittent hypertrophic status, while SDD ranged from 0 to 1.93 m and Turbidity reached up to 23.70 NTU. Results suggest that the drought events of 2006 and 2009 were associated with a deterioration in water quality. The water quality displayed slow recovery through 2011–2012. This study demonstrates the usefulness of satellite observations for supporting inland water quality monitoring and water management in this region.


2021 ◽  
Vol 11 (21) ◽  
pp. 10062
Author(s):  
Aimin Li ◽  
Meng Fan ◽  
Guangduo Qin ◽  
Youcheng Xu ◽  
Hailong Wang

Monitoring open water bodies accurately is important for assessing the role of ecosystem services in the context of human survival and climate change. There are many methods available for water body extraction based on remote sensing images, such as the normalized difference water index (NDWI), modified NDWI (MNDWI), and machine learning algorithms. Based on Landsat-8 remote sensing images, this study focuses on the effects of six machine learning algorithms and three threshold methods used to extract water bodies, evaluates the transfer performance of models applied to remote sensing images in different periods, and compares the differences among these models. The results are as follows. (1) Different algorithms require different numbers of samples to reach their optimal performance. The logistic regression algorithm requires a minimum of 110 samples. As the number of samples increases, the order of the optimal model is support vector machine, neural network, random forest, decision tree, and XGBoost. (2) The accuracy of each machine learning algorithm on the test set does not represent its performance in a local area. (3) When these models are directly applied to remote sensing images in different periods, the area under curve (AUC) indicators of each machine learning algorithm for the three regions all show a significant decline, with a decrease range of 0.33–66.52%, and the differences among the algorithms' performances in the three areas are obvious. Generally, the decision tree algorithm has good transfer performance among the machine learning algorithms, with AUC indexes of 0.790, 0.518, and 0.697 in the three areas, respectively, and an average value of 0.668. The Otsu threshold algorithm is the best of the threshold methods, with AUC indexes of 0.970, 0.617, and 0.908 in the three regions, respectively, and an average AUC of 0.832.
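As an illustration of the threshold-based route mentioned above, the following sketch computes NDWI with its standard definition, (Green − NIR) / (Green + NIR), and separates water from land with a textbook Otsu threshold. It is not the authors' implementation: the function names, the bin count, and the six reflectance pairs are assumptions made for the example.

```python
def ndwi(green, nir):
    """Normalized difference water index for one pixel: (G - NIR) / (G + NIR)."""
    total = green + nir
    return 0.0 if total == 0 else (green - nir) / total


def otsu_threshold(values, n_bins=64):
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / n_bins
    hist = [0] * n_bins
    for v in values:
        hist[min(int((v - lo) / width), n_bins - 1)] += 1
    centers = [lo + (b + 0.5) * width for b in range(n_bins)]
    total = len(values)
    total_sum = sum(c * h for c, h in zip(centers, hist))
    best_var, best_bin = -1.0, 0
    w0, sum0 = 0, 0.0
    for b in range(n_bins - 1):
        w0 += hist[b]
        sum0 += centers[b] * hist[b]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_var, best_bin = between, b
    return lo + (best_bin + 1) * width  # upper edge of the last low-class bin


# Hypothetical (green, NIR) reflectances: three water-like, then three land-like pixels.
pixels = [(0.10, 0.02), (0.09, 0.03), (0.11, 0.02),
          (0.08, 0.30), (0.07, 0.25), (0.09, 0.35)]
index_values = [ndwi(g, n) for g, n in pixels]
threshold = otsu_threshold(index_values)
water_mask = [v > threshold for v in index_values]
print(threshold, water_mask)
```

For MNDWI the same code applies with a shortwave-infrared band passed in place of NIR.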


2018 ◽  
Vol 10 (10) ◽  
pp. 1522 ◽  
Author(s):  
Gina Leonita ◽  
Monika Kuffer ◽  
Richard Sliuzas ◽  
Claudio Persello

The survey-based slum mapping (SBSM) program conducted by the Indonesian government to reach the national target of “cities without slums” by 2019 shows mapping inconsistencies due to several reasons, e.g., the dependency on the surveyor’s experiences and the complexity of the slum indicators set. By relying on such inconsistent maps, it will be difficult to monitor the national slum upgrading program’s progress. Remote sensing imagery combined with machine learning algorithms could support the reduction of these inconsistencies. This study evaluates the performance of two machine learning algorithms, i.e., support vector machine (SVM) and random forest (RF), for slum mapping in support of the slum mapping campaign in Bandung, Indonesia. Recognizing the complexity in differentiating slum and formal areas in Indonesia, the study used a combination of spectral, contextual, and morphological features. In addition, sequential feature selection (SFS) combined with the Hilbert–Schmidt independence criterion (HSIC) was used to select significant features for classifying slums. Overall, the highest accuracy (88.5%) was achieved by the SVM with SFS using contextual, morphological, and spectral features, which is higher than the estimated accuracy of the SBSM. To evaluate the potential of machine learning-based slum mapping (MLBSM) in support of slum upgrading programs, interviews were conducted with several local and national stakeholders. Results show that local acceptance for a remote sensing-based slum mapping approach varies among stakeholder groups. Therefore, a locally adapted framework is required to combine ground surveys with robust and consistent machine learning methods, for being able to deal with big data, and to allow the rapid extraction of consistent information on the dynamics of slums at a large scale.


2021 ◽  
Vol 13 (22) ◽  
pp. 4662
Author(s):  
Zhi Qiao ◽  
Siyang Sun ◽  
Qun’ou Jiang ◽  
Ling Xiao ◽  
Yunqi Wang ◽  
...  

Some essential water conservation areas in China have continuously suffered from serious problems such as water pollution and water quality deterioration in recent decades, which calls for a real-time water pollution monitoring system to support water resources management. On the basis of remote sensing data and ground monitoring data, this study first constructed a more accurate retrieval model for total phosphorus (TP) concentration by comparing 12 machine learning algorithms, including support vector machine (SVM), artificial neural network (ANN), Bayesian ridge regression (BRR), lasso regression (Lasso), elastic net (EN), linear regression (LR), decision tree regressor (DTR), K neighbor regressor (KNR), random forest regressor (RFR), extra trees regressor (ETR), AdaBoost regressor (ABR) and gradient boosting regressor (GBR). Then, this study applied the constructed retrieval model to explore the spatial-temporal evolution of the Miyun Reservoir and finally assessed the water quality. The results showed that the model of TP concentration built by the ETR algorithm had the best accuracy, with the coefficient R2 reaching over 85% and the mean absolute error lower than 0.000433. The TP concentration in Miyun Reservoir was between 0.0380 and 0.1298 mg/L, and there was relatively significant spatial and temporal heterogeneity. It changed remarkably during the periods of the flood season, winter tillage, planting, and regreening, and it was lower in summer than in other seasons. Moreover, the TP in the southwest part of the reservoir was generally lower than in the northeast, as there was less interference from human activities. According to the Environmental Quality Standard for the surface water environment, the water quality of Miyun Reservoir was safe overall, except for cases exceeding the standard in the spring and in September. These conclusions can provide a significant scientific reference for water quality monitoring and management in Miyun Reservoir.


Author(s):  
Xiaohang Li ◽  
Jianli Ding ◽  
Nurmemet Ilyas

Surface water quality is an important factor affecting the ecological environment and the human living environment. Monitoring surface water quality with remote sensing is of great value for water resources protection and water quality evaluation. Finding the spectral index most sensitive to water quality is extremely important for surface water quality analysis and treatment in the Ebinur Lake Basin, which lies in an arid area. This study used Sentinel-2 MSI data at 10 m resolution to monitor the water quality of the watershed quickly. Through laboratory experiments and measurement data from the Ebinur Lake Basin, 22 water quality parameters (WQPs) were obtained. Through Z-score and redundancy analysis, 9 WQPs with significant contributions were extracted. Based on the remote sensing spectral bands, 4 water indexes (NDWI, NWI, EWI, AWEI-nsh) and the two-dimensional (2D) modeling spectral indexes (DI, RI, NDI) were computed, and a correlation analysis between the WQPs and the two kinds of spectral indexes showed that the WQPs correlate more strongly overall with the 2D modeling spectral indexes. The Water Quality Index (WQI) was then calculated and modeled with the 2D spectral indexes, and it was predicted through 4 machine learning algorithms (RF, SVM, PLSR, PLSR-SVM). The results show that the inversion effect of the 2D modeling spectral indexes on the WQPs is superior to that of the water indexes, and the correlation coefficient of the DI (R12-R1), the difference index of the SWIR-2 and BLUE bands, reaches 0.787. On this basis, the three kinds of 2D modeling spectral indexes were used to invert the WQI, and the best correlation coefficient, 0.69, was obtained with the RI (R11/R8), the ratio index of the SWIR-1 and NIR bands. In the WQI prediction, the partial least squares regression support vector machine (PLSR-SVM) model shows good modeling and prediction performance (R2c = 0.873, R2v = 0.87). The research results provide a reference for remote monitoring of surface water in arid areas and a basis for water quality prediction and safety evaluation.
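The two-band index search described above can be sketched as follows, assuming the usual definitions: DI = Ri − Rj and RI = Ri / Rj (consistent with the R12-R1 and R11/R8 forms quoted in the abstract), and NDI = (Ri − Rj) / (Ri + Rj), which is an assumption here because the abstract does not spell it out. The band labels, reflectance values, and the synthetic target are hypothetical.

```python
import math
from itertools import permutations


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Two-band index definitions (NDI form is assumed, see lead-in).
INDEX_FUNCS = {
    "DI": lambda ri, rj: ri - rj,
    "RI": lambda ri, rj: ri / rj,
    "NDI": lambda ri, rj: (ri - rj) / (ri + rj),
}


def best_two_band_index(bands, target):
    """Search all ordered band pairs and index types for the highest |r|."""
    best = None
    for (name_i, ri), (name_j, rj) in permutations(bands.items(), 2):
        for kind, func in INDEX_FUNCS.items():
            values = [func(a, b) for a, b in zip(ri, rj)]
            r = pearson(values, target)
            if best is None or abs(r) > abs(best[3]):
                best = (kind, name_i, name_j, r)
    return best


# Hypothetical reflectances for five samples, NOT data from the study.
bands = {
    "B1": [0.05, 0.08, 0.06, 0.09, 0.07],
    "B8": [0.20, 0.15, 0.25, 0.18, 0.22],
    "B12": [0.10, 0.20, 0.12, 0.30, 0.15],
}
# Synthetic target built as B12 - B1 so that a DI on those bands should win.
target = [b12 - b1 for b12, b1 in zip(bands["B12"], bands["B1"])]
best = best_two_band_index(bands, target)
print(best)
```

With real data the target would be a measured WQP or the WQI, and the winning index would then feed the regression models.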


2018 ◽  
Author(s):  
Nazmul Hossain ◽  
Fumihiko Yokota ◽  
Akira Fukuda ◽  
Ashir Ahmed

BACKGROUND Predictive analytics through machine learning has been used extensively across industries, including eHealth and mHealth, for analyzing patients’ health data, predicting diseases, enhancing the productivity of technology or devices used for providing healthcare services, and so on. However, not enough studies have been conducted to predict the usage of eHealth by rural patients in developing countries. OBJECTIVE The objective of this study is to predict rural patients’ use of eHealth through supervised machine learning algorithms and propose the best-fitted model after evaluating their performances in terms of predictive accuracy. METHODS Data were collected between June and July 2016 through a field survey with a structured questionnaire from 292 randomly selected rural patients in a remote North-Western sub-district of Bangladesh. Four supervised machine learning algorithms, namely logistic regression, boosted decision tree, support vector machine, and artificial neural network, were chosen for this experiment. A ‘correlation-based feature selection’ technique was applied to include the most relevant but not redundant features in the model. A 10-fold cross-validation technique was applied to reduce bias and over-fitting of the data. RESULTS Logistic regression outperformed the other three algorithms with 85.9% predictive accuracy, 86.4% precision, 90.5% recall, 88.1% F-score, and AUC of 91.5%, followed by neural network, decision tree and support vector machine with accuracy rates of 84.2%, 82.9%, and 80.4%, respectively. CONCLUSIONS The findings of this study are expected to be helpful for eHealth practitioners in selecting appropriate areas to serve and dealing with both under-capacity and over-capacity by predicting the patients’ response in advance with a certain level of accuracy and precision.
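The accuracy, precision, recall, and F-score reported above are all derived from the confusion counts of a binary classifier. A minimal sketch, using hypothetical labels rather than the survey data, and assuming the F-score is the usual harmonic mean of precision and recall:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels (1 = uses eHealth, 0 = does not), NOT the survey data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

AUC is not covered by this sketch because it requires predicted scores rather than hard labels.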

