Water-Quality Data Imputation with a High Percentage of Missing Values: A Machine Learning Approach

2021 ◽  
Vol 13 (11) ◽  
pp. 6318
Author(s):  
Rafael Rodríguez ◽  
Marcos Pastorini ◽  
Lorena Etcheverry ◽  
Christian Chreties ◽  
Mónica Fossati ◽  
...  

The monitoring of surface-water quality followed by water-quality modeling and analysis is essential for generating effective strategies in surface-water-resource management. However, worldwide, particularly in developing countries, water-quality studies are limited due to the lack of complete and reliable datasets of surface-water-quality variables. In this context, several statistical and machine-learning models were assessed for imputing water-quality data at six monitoring stations located in the Santa Lucía Chico river (Uruguay), a mixed lotic and lentic river system. The challenge of this study lies in the high percentage of missing data (between 50% and 70%) and the high temporal and spatial variability that characterizes the water-quality variables. The competing algorithms implement univariate and multivariate imputation methods (inverse distance weighting (IDW), Random Forest Regressor (RFR), Ridge (R), Bayesian Ridge (BR), AdaBoost (AB), Huber Regressor (HR), Support Vector Regressor (SVR) and K-nearest neighbors Regressor (KNNR)). According to the results, more than 76% of the imputation outcomes are considered “satisfactory” (NSE > 0.45). Imputation performance is better at the monitoring stations located inside the reservoir than at those positioned along the mainstream. IDW was the model with the best imputation results, followed by RFR, HR and SVR. The approach proposed in this study is expected to help water-resource researchers and managers augment water-quality datasets and overcome the missing-data issue, enabling more future studies of water quality.
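As a rough illustration of the inverse distance weighting idea named in the abstract, the sketch below fills one missing value from neighbouring observations. It is not the authors' implementation: the abstract does not say which distance measure or exponent was used, so the function name `idw_impute`, the `power=2.0` default, and the use of station-to-station distances are all assumptions made for the example.

```python
from typing import Optional, Sequence


def idw_impute(
    values: Sequence[Optional[float]],
    distances: Sequence[float],
    power: float = 2.0,  # assumed exponent; not stated in the abstract
) -> Optional[float]:
    """Estimate a missing value as a distance-weighted mean of neighbours.

    values    -- observations at neighbouring stations (None = also missing)
    distances -- distance from the target station to each neighbour
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for value, distance in zip(values, distances):
        if value is None:
            continue  # a neighbour without data contributes nothing
        if distance == 0:
            return value  # co-located observation: use it directly
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        return None  # no neighbour had data, so nothing can be imputed
    return weighted_sum / weight_total


# Hypothetical readings from three neighbouring stations, one of them missing.
estimate = idw_impute([10.0, None, 20.0], [1.0, 2.0, 1.0])
print(estimate)
```

Closer neighbours dominate the estimate as `power` grows; with `power=0` the function reduces to a plain mean of the available neighbours.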

Author(s):  
Rafael Rodriguez ◽  
Marcos Pastorini ◽  
Lorena Etcheverry ◽  
Christian Chreties ◽  
Mónica Fossati ◽  
...  

The monitoring of surface-water quality followed by water-quality modeling and analysis is essential for generating effective strategies in water-resource management. However, worldwide, particularly in developing countries, water-quality studies are limited due to the lack of complete and reliable datasets of surface-water-quality variables. In this context, several statistical and machine-learning models were assessed for imputing water-quality data at six monitoring stations located in the Santa Lucía Chico river (Uruguay), a mixed lotic and lentic river system. The challenge of this study lies in the high percentage of missing data (between 50% and 70%) and the high temporal and spatial variability that characterizes the water-quality variables. The competing algorithms include both univariate and multivariate imputation methods (inverse distance weighting (IDW), Random Forest Regressor (RFR), Ridge (R), Bayesian Ridge (BR), AdaBoost (AB), Huber Regressor (HR), Support Vector Regressor (SVR), and K-nearest neighbors Regressor (KNNR)). According to the results, more than 76% of the imputation outcomes are considered satisfactory (NSE > 0.45). Imputation performance is better at the monitoring stations located inside the reservoir than at those positioned along the mainstream. IDW was the model most frequently selected for data imputation.
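The "satisfactory" threshold above refers to the Nash-Sutcliffe efficiency (NSE), which compares the squared imputation errors with the variance of the observations around their mean. A minimal sketch using the standard definition follows; the 0.45 cut-off is taken from the abstract, while the function names and the sample numbers are hypothetical.

```python
from typing import Sequence


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squares around the observed mean."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_term = sum((o - mean_observed) ** 2 for o in observed)
    if variance_term == 0:
        raise ValueError("NSE is undefined when all observations are equal")
    return 1.0 - squared_error / variance_term


def is_satisfactory(score: float, threshold: float = 0.45) -> bool:
    """Apply the NSE > 0.45 criterion quoted in the abstract."""
    return score > threshold


# Hypothetical held-out observations and their imputed counterparts.
score = nse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(score, is_satisfactory(score))
```

An NSE of 1 means a perfect match, and an NSE of 0 means the imputation is no better than predicting the observed mean everywhere.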


2018 ◽  
Vol 69 (8) ◽  
pp. 2045-2049
Author(s):  
Catalina Gabriela Gheorghe ◽  
Andreea Bondarev ◽  
Ion Onutu

Monitoring of environmental factors supports several important objectives regarding water quality: forecasting, warning and intervention. The aim of this paper is to investigate water-quality parameters at some potential pollutant sources in the northern, southern and southeastern areas of Romania. Surface-water-quality data for selected chemical parameters were collected and analyzed at different points from March to May 2017.


2017 ◽  
Vol 21 (2) ◽  
pp. 949-961 ◽  
Author(s):  
Hang Zheng ◽  
Yang Hong ◽  
Di Long ◽  
Hua Jing

Abstract. Surface water quality monitoring (SWQM) provides essential information for water environmental protection. However, SWQM is costly and limited in terms of equipment and sites. The global popularity of social media and intelligent mobile devices with GPS and photography functions allows citizens to monitor surface water quality. This study aims to propose a method for SWQM using social media platforms. Specifically, a WeChat-based application platform is built to collect water quality reports from volunteers, which have been proven valuable for water quality monitoring. The methods for data screening and volunteer recruitment are discussed based on the collected reports. The proposed methods provide a framework for collecting water quality data from citizens and offer a primary foundation for big data analysis in future research.

