Modelling lake trophic state: A random forest approach

10.7287/peerj.preprints.1319v3 ◽

2015 ◽

Author(s):

Jeffrey W Hollister ◽

W. Bryan Milstead ◽

Betty J. Kreakie

Keyword(s):

Water Quality ◽

Random Forests ◽

Trophic State ◽

Mean Squared Error ◽

Quality Data ◽

Full Model ◽

Water Quality Data ◽

Squared Error ◽

The Mean

Productivity of lentic ecosystems is well studied and it is widely accepted that as nutrient inputs increase, productivity increases and lakes transition from lower trophic states (e.g. oligotrophic) to higher trophic states (e.g. eutrophic). These broad trophic state classifications are good predictors of ecosystem condition, services, and disservices (e.g. recreation, aesthetics, and harmful algal blooms). While the relationship between nutrients and trophic state provides reliable predictions, it requires in situ water quality data to parameterize the model. This limits the application of these models to lakes with existing and, more importantly, available water quality data. To address this, we take advantage of a large national lakes water quality database (i.e. the National Lakes Assessment), land use/land cover data, lake morphometry data, and other universally available data, and apply data mining approaches to predict trophic state. Using these data and random forests, we first model chlorophyll a, then classify the resultant predictions into trophic states. The full model estimates chlorophyll a with both in situ and universally available data. The mean squared error and adjusted R² of this model were 0.09 and 0.8, respectively. The second model uses universally available GIS data only. Its mean squared error was 0.22 and its adjusted R² was 0.48. The accuracy of the trophic state classifications derived from the chlorophyll a predictions was 69% for the full model and 49% for the “GIS only” model. Random forests extend the usefulness of the class predictions by providing prediction probabilities for each lake. This allows us to make trophic state predictions and also indicate the level of uncertainty around those predictions. For the full model, these predicted class probabilities ranged from 0.42 to 1. For the GIS only model, they ranged from 0.33 to 0.96.
It is our conclusion that in situ data are required for better predictions, yet GIS and universally available data provide trophic state predictions, with estimated uncertainty, that still have the potential for a broad array of applications. The source code and data for this manuscript are available from https://github.com/USEPA/LakeTrophicModelling.
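
The classification step described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code, which lives in the repository linked above. The chlorophyll a cut-offs (2, 7, and 30 µg/L), the function names, and the per-tree predictions are all assumptions made for the example. The sketch bins a chlorophyll a value into a trophic state and treats the share of trees voting for each class as one possible form of class probability.

```python
from bisect import bisect_left
from collections import Counter

# Assumed chlorophyll a cut-offs in µg/L (illustrative, not taken from the abstract).
CUTOFFS = [2.0, 7.0, 30.0]
STATES = ["oligotrophic", "mesotrophic", "eutrophic", "hypereutrophic"]


def trophic_state(chla):
    """Map a chlorophyll a concentration (µg/L) to a trophic state class."""
    # bisect_left places a value equal to a cut-off in the lower class.
    return STATES[bisect_left(CUTOFFS, chla)]


def class_probabilities(tree_predictions):
    """Share of per-tree chlorophyll a predictions that fall in each class."""
    votes = Counter(trophic_state(p) for p in tree_predictions)
    n = len(tree_predictions)
    return {state: votes.get(state, 0) / n for state in STATES}


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical chlorophyll a predictions from five trees for a single lake.
trees = [1.5, 3.0, 4.2, 6.8, 9.1]
probs = class_probabilities(trees)
best = max(probs, key=probs.get)
print(best, probs[best])
```

A lake whose trees mostly agree gets a probability near 1 for its predicted class, while a lake whose trees are split across classes gets a lower value, which is the kind of per-lake uncertainty the abstract refers to.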


Modelling lake trophic state: A random forest approach

10.7287/peerj.preprints.1319v2 ◽

2015 ◽

Author(s):

Jeffrey W Hollister ◽

W. Bryan Milstead ◽

Betty J. Kreakie

Keyword(s):

Water Quality ◽

Random Forests ◽

Trophic State ◽

Mean Squared Error ◽

Quality Data ◽

Full Model ◽

Water Quality Data ◽

Squared Error ◽

The Mean


Modelling lake trophic state: A random forest approach

10.7287/peerj.preprints.1319 ◽

2015 ◽

Cited By ~ 1

Author(s):

Jeffrey W Hollister ◽

W. Bryan Milstead ◽

Betty J. Kreakie

Keyword(s):

Water Quality ◽

Random Forests ◽

Trophic State ◽

Mean Squared Error ◽

Quality Data ◽

Full Model ◽

Water Quality Data ◽

Squared Error ◽

The Mean


OPTMIZATION OF BIO-OPTICAL MODEL PARAMETERS FOR TURBID LAKE WATER QUALITY ESTIMATION USING LANDSAT 8 AND WASI-2D

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-3-w11-67-2020 ◽

2020 ◽

Vol XLII-3/W11 ◽

pp. 67-72

Author(s):

A. Manuel ◽

A. C. Blanco ◽

A. M. Tamondong ◽

R. Jalbuena ◽

O. Cabrera ◽

...

Keyword(s):

Water Quality ◽

Chlorophyll A ◽

The Philippines ◽

Quality Data ◽

Model Parameters ◽

Landsat 8 ◽

Water Quality Data ◽

Chlorophyll A Concentration ◽

In Situ Data

Abstract. Laguna Lake, the Philippines’ largest freshwater lake, has always been historically, economically, and ecologically significant to the people living near it. However, as it lies at the center of urban development in Metro Manila, it suffers from water quality degradation. Water quality sampling by current field methods is not enough to assess the spatial and temporal variations of water quality in the lake. Regular water quality monitoring is advised, and remote sensing addresses the need for synchronized and frequent observation and provides an efficient way to obtain bio-optical water quality parameters. Bio-optical models must be optimized because local parameters change regionally and seasonally and therefore require calibration. Field spectral measurements and in-situ water quality data taken simultaneously with satellite overpasses were used to calibrate the bio-optical modelling tool WASI-2D to estimate chlorophyll-a concentration from the corresponding Landsat-8 images. The initial output values for chlorophyll-a concentration, which ranged from 10–40 μg/L, had an RMSE of up to 10 μg/L when compared with in situ data. Further refinements in the initial and constant parameters of the model improved chlorophyll-a concentration retrieval from the Landsat-8 images. The refined outputs gave a chlorophyll-a concentration range of 5–12 μg/L, well within the usual range of measured values in the lake, with an RMSE of 2.28 μg/L compared to in situ data.
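
The RMSE figures quoted in this abstract compare retrieved and in situ chlorophyll-a values. A minimal sketch of that comparison is given below; the function name and the sample concentrations are hypothetical and are not data from the study.

```python
from math import sqrt


def rmse(observed, estimated):
    """Root mean squared error between paired observed and estimated values."""
    if len(observed) != len(estimated):
        raise ValueError("observed and estimated must have the same length")
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical in situ and retrieved chlorophyll-a concentrations (µg/L).
in_situ = [10.0, 12.0]
retrieved = [8.0, 14.0]
print(rmse(in_situ, retrieved))
```

Because RMSE is expressed in the same unit as the measurement, it can be read directly against the concentration range reported for the lake.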


Generation of geolocated and radiometrically corrected true reflectance surfaces in the visible portion of the electromagnetic spectrum over large bodies of water using images from a sUAS

Journal of Unmanned Vehicle Systems ◽

10.1139/juvs-2019-0020 ◽

2020 ◽

Vol 8 (3) ◽

pp. 172-185

Author(s):

Juan G. Arango ◽

Brandon K. Holzbauer-Schweitzer ◽

Robert W. Nairn ◽

Robert C. Knox

Keyword(s):

Water Quality ◽

Near Infrared ◽

Quality Parameters ◽

Water Quality Parameters ◽

Quality Data ◽

Electromagnetic Spectrum ◽

Water Quality Data ◽

Ground Control Points ◽

Bodies Of Water

The focus of this study was to develop true reflectance surfaces in the visible portion of the electromagnetic spectrum from small unmanned aerial system (sUAS) images obtained over large bodies of water when no ground control points were available. The goal of the research was to produce true reflectance surfaces from which reflectance values could be extracted and used to estimate optical water quality parameters utilizing limited in-situ water quality analyses. Multispectral imagery was collected using a sUAS equipped with a multispectral sensor, capable of obtaining information in the blue (0.475 μm), green (0.560 μm), red (0.668 μm), red edge (0.717 μm), and near infrared (0.840 μm) portions of the electromagnetic spectrum. To develop a reliable and repeatable protocol, a five-step methodology was implemented: (i) image and water quality data collection, (ii) image processing, (iii) reflectance extraction, (iv) statistical interpolation, and (v) data validation. Results indicate that the created protocol generates geolocated and radiometrically corrected true reflectance surfaces from sUAS missions flown over large bodies of water. Subsequently, relationships between true reflectance values and in-situ water quality parameters were developed.


A Multi-sensor Process for In-Situ Monitoring of Water Pollution in Rivers or Lakes for High-Resolution Quantitative and Qualitative Water Quality Data

2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES) ◽

10.1109/cse-euc-dcabes.2016.171 ◽

2016 ◽

Cited By ~ 5

Author(s):

Sukanya Randhawa ◽

Sandeep S. Sandha ◽

Biplav Srivastava

Keyword(s):

Water Pollution ◽

Water Quality ◽

High Resolution ◽

In Situ Monitoring ◽

Quality Data ◽

Water Quality Data


Application of multivariate statistical techniques for investigating climate change and anthropogenic effects on surface water quality assessment: case study of Zohreh river, Hendijan, Iran

Applied Water Science ◽

10.1007/s13201-021-01399-0 ◽

2021 ◽

Vol 11 (6) ◽

Author(s):

Jalal Valiallahi ◽

Saideh Khaffaf Roudy

Keyword(s):

Water Quality ◽

Cluster Analysis ◽

Factor Analysis ◽

Monitoring Program ◽

T Test ◽

Total Variance ◽

Quality Data ◽

Water Quality Data ◽

Significant Difference ◽

The Mean

Abstract. In the present study, spatial variations in Zohreh River water quality data were evaluated and interpreted using multivariate analytical techniques, including factor analysis and cluster analysis; ArcGIS® software was also used. The research method included field observation, numerical modeling, and laboratory analyses. The dataset consisted of 11,250 observations from a seven-year monitoring program (measurement of 15 variables at 3 main stations from April 2010 to March 2017). Factor analysis with principal component analysis extraction yielded seven varactors that together contributed 82% of total variance, and the contribution of each varactor to the total variance was evaluated. Cluster analysis was complemented with a t-test, which made it possible to compare water quality between two clusters; the results of the factor analysis were used to facilitate the t-test analysis. The t-test revealed a significant difference, at the 95% confidence level, between the two clusters in the means of varactors 1, 2, 6 and 7, but no significant difference in the means of varactors 3, 4 and 5. The results show the effect of agricultural fertilizers on stations located downstream of the ASK dam.


Predicting sediment and nutrient concentrations from high-frequency water-quality data

10.1101/599712 ◽

2019 ◽

Author(s):

Catherine Leigh ◽

Sevvandi Kandanaarachchi ◽

James M. McGree ◽

Rob J. Hyndman ◽

Omar Alsibai ◽

...

Keyword(s):

Water Quality ◽

High Frequency ◽

Water Quality Monitoring ◽

Quality Monitoring ◽

Quality Data ◽

Water Quality Data ◽

Quality Reporting ◽

River Level ◽

Manual Sampling

Abstract. Water-quality monitoring in rivers often focuses on the concentrations of sediments and nutrients, constituents that can smother biota and cause eutrophication. However, the physical and economic constraints of manual sampling prohibit data collection at the frequency required to adequately capture the variation in concentrations through time. Here, we developed models to predict total suspended solids (TSS) and oxidized nitrogen (NOx) concentrations based on high-frequency time series of turbidity, conductivity and river level data from in situ sensors in rivers flowing into the Great Barrier Reef lagoon. We fit generalized-linear mixed-effects models with continuous first-order autoregressive correlation structures to water-quality data collected by manual sampling at two freshwater sites and one estuarine site and used the fitted models to predict TSS and NOx from the in situ sensor data. These models described the temporal autocorrelation in the data and handled observations collected at irregular frequencies, characteristics typical of water-quality monitoring data. Turbidity proved a useful and generalizable surrogate of TSS, with high predictive ability in the estuarine and fresh water sites. Turbidity, conductivity and river level served as combined surrogates of NOx. However, the relationship between NOx and the covariates was more complex than that between TSS and turbidity, and consequently the ability to predict NOx was lower and less generalizable across sites than for TSS. Furthermore, prediction intervals tended to increase during events, for both TSS and NOx models, highlighting the need to include measures of uncertainty routinely in water-quality reporting. Our study also highlights that surrogate-based models used to predict sediments and nutrients need to better incorporate temporal components if variance estimates are to be unbiased and model inference meaningful.
The transferability of models across sites, and potentially regions, will become increasingly important as organizations move to automated sensing for water-quality monitoring throughout catchments.


ESTIMATION OF CHLOROPHYLL-A CONCENTRATION IN SAMPALOC LAKE USING UAS MULTISPECTRAL REMOTE SENSING AND REGRESSION ANALYSIS

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-4-w19-297-2019 ◽

2019 ◽

Vol XLII-4/W19 ◽

pp. 297-303

Author(s):

R. M. G. Maravilla ◽

J. P. Quinalayo ◽

A. C. Blanco ◽

C. G. Candido ◽

E. V. Gubatanga ◽

...

Keyword(s):

Water Quality ◽

Regression Analysis ◽

Chlorophyll A ◽

Linear Regression Analysis ◽

Quality Data ◽

Water Quality Data ◽

Microalgal Biomass ◽

Chlorophyll A Concentration ◽

Chl A

Abstract. Sampaloc Lake provides a livelihood for residents through aquaculture. An increase in the number of fish pens in the lake threatens its water quality. One monitored parameter is microalgal biomass, measured as chlorophyll-a concentration. This study aims to generate a chlorophyll-a concentration model for easier monitoring of the lake. In-situ water quality data were collected using a chl-a data logger and a water quality meter at 357 and 12 locations, respectively. Using a Parrot Sequoia+ multispectral camera, 1496 of 2148 images were acquired and calibrated, producing 18 × 18 cm resolution Green (G), Red (R), Red Edge (RE) and Near Infrared (NIR) reflectance images. NIR was used to mask out non-water features and to correct sun glint. The in-situ data and the extracted pixel values were used for simple linear regression analysis. A model with 5 variables (R/NIR, RE², NIR², R/NIR², and NIR/RE²) was generated, yielding an R² of 0.586 and an RMSE of 0.958 μg/l. A chlorophyll-a concentration map was produced, showing that chl-a is higher where fish pens are located and decreases with distance from the pens. Although fish pens are visible in certain areas of the lake, those areas still show low chlorophyll-a because few residential areas or establishments lie adjacent to them. Also, not all fish pens have the same chlorophyll-a concentration, because the population per fish pen is inconsistent. The center of the lake has low chlorophyll-a as it is far from human activities. The only outlet, Sabang Creek, also shows a high concentration of chlorophyll-a.
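
The five regression variables named in this abstract can be derived from band reflectances along the lines sketched below. The sketch assumes that the "2" in the variable names denotes squaring and that R/NIR2 means R divided by NIR squared; both readings, the function names, and the sample reflectances are assumptions. The fitted coefficients are not given in the abstract, so only the predictors and the R² metric are shown.

```python
def band_features(r, re, nir):
    """Build the five predictors from red, red edge and near infrared reflectance."""
    return {
        "R/NIR": r / nir,
        "RE^2": re ** 2,
        "NIR^2": nir ** 2,
        "R/NIR^2": r / nir ** 2,   # assumed to mean R / (NIR squared)
        "NIR/RE^2": nir / re ** 2,  # assumed to mean NIR / (RE squared)
    }


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical reflectance values for a single water pixel.
features = band_features(r=0.5, re=0.25, nir=0.5)
print(features)
```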


Advanced monitoring of water systems using in situ measurement stations: data validation and fault detection

Water Science & Technology ◽

10.2166/wst.2013.302 ◽

2013 ◽

Vol 68 (5) ◽

pp. 1022-1030 ◽

Cited By ~ 16

Author(s):

Janelcy Alferes ◽

Sovanna Tik ◽

John Copp ◽

Peter A. Vanrolleghem

Keyword(s):

Water Quality ◽

Treatment Plant ◽

Principal Component ◽

Quality Data ◽

Water Quality Data ◽

Data Set ◽

Data Quality Assessment ◽

Multiple Sensor ◽

Pertinent Information

In situ continuous monitoring at high frequency is used to collect water quality information about water bodies. However, it is crucial that the collected data be evaluated and validated for the appropriate interpretation of the data so as to ensure that the monitoring programme is effective. Software tools for data quality assessment with a practical orientation are proposed. As water quality data often contain redundant information, multivariate methods can be used to detect correlations, pertinent information among variables and to identify multiple sensor faults. While principal component analysis can be used to reduce the dimensionality of the original variable data set, monitoring of some statistical metrics and their violation of confidence limits can be used to detect faulty or abnormal data and can help the user apply corrective action(s). The developed algorithms are illustrated with automated monitoring systems installed in an urban river and at the inlet of a wastewater treatment plant.
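
The idea of watching a statistic for violations of confidence limits can be illustrated with a much simpler univariate check than the multivariate methods the abstract describes. The sketch below is not the authors' algorithm: it derives limits of mean ± 3 standard deviations from a reference period and flags readings outside them. The function names, the factor of 3, and the pH readings are assumptions.

```python
from statistics import mean, stdev


def control_limits(reference, k=3.0):
    """Lower and upper limits as mean ± k sample standard deviations."""
    m, s = mean(reference), stdev(reference)
    return m - k * s, m + k * s


def flag_faults(values, limits):
    """Return True for each value that lies outside the limits."""
    low, high = limits
    return [not (low <= v <= high) for v in values]


# Hypothetical pH readings from a period when the sensor is known to be healthy.
reference = [7.1, 7.0, 7.2, 6.9, 7.0, 7.1]
limits = control_limits(reference)

# New readings; the second one simulates a sensor fault.
flags = flag_faults([7.0, 9.5, 7.1], limits)
print(flags)
```

A multivariate version would apply the same limit-checking idea to statistics computed from principal component scores instead of raw readings.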
