A framework for automated anomaly detection in high frequency water-quality data from in situ sensors

Abstract. Water-quality monitoring in rivers often focuses on the concentrations of sediments and nutrients, constituents that can smother biota and cause eutrophication. However, the physical and economic constraints of manual sampling prohibit data collection at the frequency required to adequately capture the variation in concentrations through time. Here, we developed models to predict total suspended solids (TSS) and oxidized nitrogen (NOx) concentrations based on high-frequency time series of turbidity, conductivity and river level data from in situ sensors in rivers flowing into the Great Barrier Reef lagoon. We fit generalized-linear mixed-effects models with continuous first-order autoregressive correlation structures to water-quality data collected by manual sampling at two freshwater sites and one estuarine site and used the fitted models to predict TSS and NOx from the in situ sensor data. These models described the temporal autocorrelation in the data and handled observations collected at irregular frequencies, characteristics typical of water-quality monitoring data. Turbidity proved a useful and generalizable surrogate of TSS, with high predictive ability in the estuarine and freshwater sites. Turbidity, conductivity and river level served as combined surrogates of NOx. However, the relationship between NOx and the covariates was more complex than that between TSS and turbidity, and consequently the ability to predict NOx was lower and less generalizable across sites than for TSS. Furthermore, prediction intervals tended to increase during events, for both TSS and NOx models, highlighting the need to include measures of uncertainty routinely in water-quality reporting. Our study also highlights that surrogate-based models used to predict sediments and nutrients need to better incorporate temporal components if variance estimates are to be unbiased and model inference meaningful. The transferability of models across sites, and potentially regions, will become increasingly important as organizations move to automated sensing for water-quality monitoring throughout catchments.
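
The continuous first-order autoregressive structure named in this abstract lets correlation decay with the elapsed time between samples rather than with their position in the series, which is why it can handle irregular sampling. The following is a minimal sketch of that idea, not the authors' code: the parameter `phi` (correlation at a lag of one time unit), the function name and the sampling times are all illustrative assumptions.

```python
def car1_correlation(times, phi):
    """Return a continuous AR(1) correlation matrix for the given times.

    corr(i, j) = phi ** |t_i - t_j|, so unevenly spaced samples are
    handled naturally. `phi` is an assumed value in [0, 1).
    """
    if not 0.0 <= phi < 1.0:
        raise ValueError("phi must lie in [0, 1)")
    return [[phi ** abs(ti - tj) for tj in times] for ti in times]


# Hypothetical sampling times in days; note the uneven gaps.
sample_days = [0.0, 0.5, 3.0, 10.0]
corr = car1_correlation(sample_days, phi=0.8)

# Print the matrix: entries shrink as the time gap grows.
for row in corr:
    print(" ".join(f"{value:.3f}" for value in row))
```

Samples taken half a day apart remain strongly correlated, while those ten days apart are nearly independent, regardless of how many observations fall in between.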


A Feature‐Based Procedure for Detecting Technical Outliers in Water‐Quality Data From In Situ Sensors

Water Resources Research ◽

10.1029/2019wr024906 ◽

2019 ◽

Vol 55 (11) ◽

pp. 8547-8568 ◽

Cited By ~ 1

Author(s):

Priyanga Dilini Talagala ◽

Rob J. Hyndman ◽

Catherine Leigh ◽

Kerrie Mengersen ◽

Kate Smith‐Miles

Keyword(s):

Water Quality ◽

Quality Data ◽

Water Quality Data ◽

In Situ Sensors ◽

Feature Based


Identification of long-term trends and seasonality in high-frequency water quality data from the Yangtze River basin, China

PLoS ONE ◽

10.1371/journal.pone.0188889 ◽

2018 ◽

Vol 13 (2) ◽

pp. e0188889 ◽

Cited By ~ 22

Author(s):

Weili Duan ◽

Bin He ◽

Yaning Chen ◽

Shan Zou ◽

Yi Wang ◽

...

Keyword(s):

Water Quality ◽

High Frequency ◽

Yangtze River ◽

Yangtze River Basin ◽

Quality Data ◽

The Yangtze River ◽

Water Quality Data ◽

The Yangtze River Basin ◽

Long Term Trends


OPTMIZATION OF BIO-OPTICAL MODEL PARAMETERS FOR TURBID LAKE WATER QUALITY ESTIMATION USING LANDSAT 8 AND WASI-2D

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-3-w11-67-2020 ◽

2020 ◽

Vol XLII-3/W11 ◽

pp. 67-72

Author(s):

A. Manuel ◽

A. C. Blanco ◽

A. M. Tamondong ◽

R. Jalbuena ◽

O. Cabrera ◽

...

Keyword(s):

Water Quality ◽

Chlorophyll A ◽

The Philippines ◽

Quality Data ◽

Model Parameters ◽

Landsat 8 ◽

Water Quality Data ◽

Chlorophyll A Concentration ◽

In Situ Data

Abstract. Laguna Lake, the Philippines’ largest freshwater lake, has always been historically, economically, and ecologically significant to the people living near it. However, as it lies at the center of urban development in Metro Manila, it suffers from water quality degradation. Water quality sampling by current field methods is not enough to assess the spatial and temporal variations of water quality in the lake. Regular water quality monitoring is advised, and remote sensing addresses the need for synchronized and frequent observation and provides an efficient way to obtain bio-optical water quality parameters. Bio-optical models must be optimized because local parameters change regionally and seasonally and therefore require calibration. Field spectral measurements and in-situ water quality data taken simultaneously with the satellite overpass were used to calibrate the bio-optical modelling tool WASI-2D to get estimates of chlorophyll-a concentration from the corresponding Landsat-8 images. The initial output values for chlorophyll-a concentration, which range from 10 to 40 μg/L, have an RMSE of up to 10 μg/L when compared with in situ data. Further refinements in the initial and constant parameters of the model resulted in an improved chlorophyll-a concentration retrieval from the Landsat-8 images. The outputs provided a chlorophyll-a concentration range from 5 to 12 μg/L, well within the usual range of measured values in the lake, with an RMSE of 2.28 μg/L compared to in situ data.


Generation of geolocated and radiometrically corrected true reflectance surfaces in the visible portion of the electromagnetic spectrum over large bodies of water using images from a sUAS

Journal of Unmanned Vehicle Systems ◽

10.1139/juvs-2019-0020 ◽

2020 ◽

Vol 8 (3) ◽

pp. 172-185

Author(s):

Juan G. Arango ◽

Brandon K. Holzbauer-Schweitzer ◽

Robert W. Nairn ◽

Robert C. Knox

Keyword(s):

Water Quality ◽

Near Infrared ◽

Quality Parameters ◽

Water Quality Parameters ◽

Quality Data ◽

Electromagnetic Spectrum ◽

Water Quality Data ◽

Ground Control Points ◽

Bodies Of Water

The focus of this study was to develop true reflectance surfaces in the visible portion of the electromagnetic spectrum from small unmanned aerial system (sUAS) images obtained over large bodies of water when no ground control points were available. The goal of the research was to produce true reflectance surfaces from which reflectance values could be extracted and used to estimate optical water quality parameters utilizing limited in-situ water quality analyses. Multispectral imagery was collected using a sUAS equipped with a multispectral sensor, capable of obtaining information in the blue (0.475 μm), green (0.560 μm), red (0.668 μm), red edge (0.717 μm), and near infrared (0.840 μm) portions of the electromagnetic spectrum. To develop a reliable and repeatable protocol, a five-step methodology was implemented: (i) image and water quality data collection, (ii) image processing, (iii) reflectance extraction, (iv) statistical interpolation, and (v) data validation. Results indicate that the created protocol generates geolocated and radiometrically corrected true reflectance surfaces from sUAS missions flown over large bodies of water. Subsequently, relationships between true reflectance values and in-situ water quality parameters were developed.


A Multi-sensor Process for In-Situ Monitoring of Water Pollution in Rivers or Lakes for High-Resolution Quantitative and Qualitative Water Quality Data

2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES) ◽

10.1109/cse-euc-dcabes.2016.171 ◽

2016 ◽

Cited By ~ 5

Author(s):

Sukanya Randhawa ◽

Sandeep S. Sandha ◽

Biplav Srivastava

Keyword(s):

Water Pollution ◽

Water Quality ◽

High Resolution ◽

In Situ Monitoring ◽

Quality Data ◽

Water Quality Data


Modelling lake trophic state: A random forest approach

10.7287/peerj.preprints.1319v3 ◽

2015 ◽

Author(s):

Jeffrey W Hollister ◽

W. Bryan Milstead ◽

Betty J. Kreakie

Keyword(s):

Water Quality ◽

Random Forests ◽

Trophic State ◽

Mean Squared Error ◽

Quality Data ◽

Full Model ◽

Water Quality Data ◽

Squared Error ◽

The Mean

Productivity of lentic ecosystems is well studied and it is widely accepted that as nutrient inputs increase, productivity increases and lakes transition from lower trophic states (e.g. oligotrophic) to higher trophic states (e.g. eutrophic). These broad trophic state classifications are good predictors of ecosystem condition, services, and disservices (e.g. recreation, aesthetics, and harmful algal blooms). While the relationship between nutrients and trophic state provides reliable predictions, it requires in situ water quality data in order to parameterize the model. This limits the application of these models to lakes with existing and, more importantly, available water quality data. To address this, we take advantage of the availability of a large national lakes water quality database (i.e. the National Lakes Assessment), land use/land cover data, lake morphometry data, and other universally available data, and apply data mining approaches to predict trophic state. Using these data and random forests, we first model chlorophyll a, then classify the resultant predictions into trophic states. The full model estimates chlorophyll a with both in situ and universally available data. The mean squared error and adjusted R2 of this model were 0.09 and 0.8, respectively. The second model uses universally available GIS data only. The mean squared error was 0.22 and the adjusted R2 was 0.48. The accuracy of the trophic state classifications derived from the chlorophyll a predictions was 69% for the full model and 49% for the “GIS only” model. Random forests extend the usefulness of the class predictions by providing prediction probabilities for each lake. This allows us to make trophic state predictions and also indicate the level of uncertainty around those predictions. For the full model, these predicted class probabilities ranged from 0.42 to 1. For the GIS only model, they ranged from 0.33 to 0.96. It is our conclusion that in situ data are required for better predictions, yet GIS and universally available data provide trophic state predictions, with estimated uncertainty, that still have the potential for a broad array of applications. The source code and data for this manuscript are available from https://github.com/USEPA/LakeTrophicModelling.
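
The two-step approach described in this abstract (predict chlorophyll a, then bin the predictions into trophic states and score the classification) can be illustrated with a short sketch. This is not the authors' code, which is in the repository linked above. The chlorophyll a cut-offs below are assumed for illustration because the abstract does not state them, and the observed and predicted values are invented.

```python
import bisect

# Assumed chlorophyll a upper bounds in ug/L for the first three classes.
# These cut-offs are illustrative; the abstract does not give them.
CUTOFFS = [2.0, 7.0, 30.0]
STATES = ["oligotrophic", "mesotrophic", "eutrophic", "hypereutrophic"]


def trophic_state(chl_a):
    """Map a chlorophyll a concentration to a trophic state label."""
    # bisect_left makes each upper bound inclusive (e.g. 2.0 -> oligotrophic).
    return STATES[bisect.bisect_left(CUTOFFS, chl_a)]


def classification_accuracy(observed, predicted):
    """Fraction of lakes whose predicted class matches the observed class."""
    matches = sum(
        trophic_state(o) == trophic_state(p) for o, p in zip(observed, predicted)
    )
    return matches / len(observed)


# Hypothetical observed and model-predicted chlorophyll a values (ug/L).
observed_chl = [1.2, 5.0, 12.0, 45.0, 6.5]
predicted_chl = [1.8, 8.1, 10.5, 28.0, 6.9]

accuracy = classification_accuracy(observed_chl, predicted_chl)
print(f"classification accuracy: {accuracy:.0%}")
```

A prediction can be numerically close to the observation and still be misclassified when the two values fall on opposite sides of a cut-off, which is one reason classification accuracy can lag behind regression fit.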


Modelling sub-daily phytoplankton dynamics and analysing primary production controls in the lower Thames catchment, UK

10.5194/egusphere-egu2020-8010 ◽

2020 ◽

Author(s):

Devanshi Pathak ◽

Michael Hutchins ◽

François Edwards

Keyword(s):

Water Quality ◽

Primary Production ◽

Water Temperature ◽

High Frequency ◽

Phytoplankton Growth ◽

Quality Data ◽

Phytoplankton Dynamics ◽

Water Quality Data ◽

River Thames ◽

The Impact

River phytoplankton provide food for primary consumers, and are a major source of oxygen in many rivers. However, high phytoplankton concentrations can hamper river water quality and ecosystem functioning, making it crucial to predict and prevent harmful phytoplankton growth in rivers. In this study, we modify an existing mechanistic water quality model to simulate sub-daily changes in water quality, and present its application in the River Thames catchment. So far, the modelling studies in the River Thames have focused on daily to weekly time-steps, and have shown limited predictive ability in modelling phytoplankton concentrations. With the availability of high-frequency water quality data, modelling tools can be improved to better understand process interactions for phytoplankton growth in dynamic rivers. The modified model in this study uses high-frequency water quality data along a 62 km stretch in the lower Thames to simulate river flows, water temperature, nutrients, and phytoplankton concentrations at sub-daily time-steps for 2013-14. Model performance is judged by percentage error in mean and Nash-Sutcliffe Efficiency (NSE) statistics. The model satisfactorily simulates the observed diurnal variability and transport of phytoplankton concentrations within the river stretch, with NSE values greater than 0.7 at all calibration sites. Phytoplankton blooms develop within an optimum range of flows (16-81 m³/s) and temperature (11-18 °C), and are largely influenced by phytoplankton growth and death rate parameters. We find that phytoplankton growth in the lower Thames is mainly limited by physical controls such as residence time, light, and water temperature, and shows some nutrient limitation arising from phosphorus depletion in summer. The model is tested under different future scenarios to evaluate the impact of changes in climate and management conditions on primary production and its controls. Our findings provide support for the argument that the sub-daily modelling of phytoplankton is a step forward in better prediction and management of phytoplankton dynamics in river systems.
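
The two performance measures named in this abstract can be computed as in the sketch below. The NSE formula is the standard one (1 minus the ratio of squared model error to the variance of the observations about their mean). The sign convention for percentage error in mean (simulated minus observed) is an assumption, and the sample values are invented.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit; 0 means the model
    is no better than predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined when observations are constant")
    return 1.0 - squared_error / variance


def percent_error_in_mean(observed, simulated):
    """Percentage difference between simulated and observed means.

    Assumed convention: positive values mean the model overestimates.
    """
    mean_obs = sum(observed) / len(observed)
    mean_sim = sum(simulated) / len(simulated)
    return 100.0 * (mean_sim - mean_obs) / mean_obs


# Hypothetical observed and simulated concentrations.
obs = [10.0, 12.0, 15.0, 11.0]
sim = [9.5, 12.5, 14.0, 11.5]

nse = nash_sutcliffe(obs, sim)
bias = percent_error_in_mean(obs, sim)
print(f"NSE = {nse:.3f}, percentage error in mean = {bias:.2f}%")
```

The two statistics are complementary: NSE penalizes errors in timing and magnitude at each time step, while percentage error in mean only reports overall bias.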


Identification of phosphorus export from low-runoff yielding areas using combined application of high frequency water quality data and MODHMS modelling

The Science of The Total Environment ◽

10.1016/j.scitotenv.2012.03.021 ◽

2012 ◽

Vol 426 ◽

pp. 264-271 ◽

Cited By ~ 16

Author(s):

Michael J. Donn ◽

Olga V. Barron ◽

Anthony D. Barr

Keyword(s):

Water Quality ◽

High Frequency ◽

Quality Data ◽

Water Quality Data ◽

Combined Application ◽

Phosphorus Export


Modelling lake trophic state: A random forest approach

10.7287/peerj.preprints.1319v1 ◽

2015 ◽

Author(s):

Jeffrey W Hollister ◽

W. Bryan Milstead ◽

Betty J. Kreakie

Keyword(s):

Water Quality ◽

Random Forests ◽

Trophic State ◽

Mean Squared Error ◽

Quality Data ◽

Full Model ◽

Water Quality Data ◽

Squared Error ◽

The Mean

Productivity of lentic ecosystems is well studied and it is widely accepted that as nutrient inputs increase, productivity increases and lakes transition from lower trophic states (e.g. oligotrophic) to higher trophic states (e.g. eutrophic). These broad trophic state classifications are good predictors of ecosystem condition, services, and disservices (e.g. recreation, aesthetics, and harmful algal blooms). While the relationship between nutrients and trophic state provides reliable predictions, it requires in situ water quality data in order to parameterize the model. This limits the application of these models to lakes with existing and, more importantly, available water quality data. To address this, we take advantage of the availability of a large national lakes water quality database (i.e. the National Lakes Assessment), land use/land cover data, lake morphometry data, and other universally available data, and apply data mining approaches to predict trophic state. Using these data and random forests, we first model chlorophyll a, then classify the resultant predictions into trophic states. The full model estimates chlorophyll a with both in situ and universally available data. The mean squared error and adjusted R2 of this model were 0.09 and 0.8, respectively. The second model uses universally available GIS data only. The mean squared error was 0.22 and the adjusted R2 was 0.48. The accuracy of the trophic state classifications derived from the chlorophyll a predictions was 69% for the full model and 49% for the “GIS only” model. Random forests extend the usefulness of the class predictions by providing prediction probabilities for each lake. This allows us to make trophic state predictions and also indicate the level of uncertainty around those predictions. For the full model, these predicted class probabilities ranged from 0.42 to 1. For the GIS only model, they ranged from 0.33 to 0.96. It is our conclusion that in situ data are required for better predictions, yet GIS and universally available data provide trophic state predictions, with estimated uncertainty, that still have the potential for a broad array of applications. The source code and data for this manuscript are available from https://github.com/USEPA/LakeTrophicModelling.
