Gradient Boosting Machine for Phosphorus Removal Prediction in Multi-Soil-Layering (MSL) system operated in a rural area 

Author(s):  
Sofyan Sbahi ◽  
Naaila Ouazzani ◽  
Abderrahmane Lahrouni ◽  
Abdessamed Hejjaj ◽  
Laila Mandi

<p>The quality of effluents from wastewater treatment plants remains a challenge, especially in underprivileged rural areas where water resources are heavily affected by pollution, depletion and excessive exploitation. Thus, the prediction of phosphorus removal is one of the most important tasks in the management of wastewater effluent. Predictive model accuracy is crucial for the safe reuse of treated water with respect to public health and the environment. However, linear models applied to a high-dimensional dataset may be unable to yield accurate and interpretable predictions. To address this complexity, the current study evaluates the effect of hydraulic retention time (HRT) on the removal of orthophosphates (PO<sub>4</sub>–P) and total phosphorus (TP) by the eco-friendly multi-soil-layering (MSL) technology. In addition, it attempts to predict this removal from domestic wastewater using a combined approach based on a feature selection technique and the gradient boosting machine (GBM) algorithm. Sixteen physicochemical and bacterial indicators were monitored over a one-year period. The results show that the HRT significantly (p < 0.01) affects the removal of phosphorus by the MSL system. The HRT, pH, PO<sub>4</sub>–P and TP were identified as relevant for predicting the removal of TP, while HRT and PO<sub>4</sub>–P were sufficient for predicting the removal rate of PO<sub>4</sub>–P. The accuracy analysis on the validation dataset demonstrates that the GBM models are highly credible, as they achieve an R² > 0.92, while the sensitivity analysis reveals that the HRT was the most important factor affecting phosphorus removal in the MSL system. In addition, the modeling results show that the GBM model is useful for predicting pollutant removal by the MSL technology and for investigating its behavior.</p>
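
The boosting idea behind a GBM can be illustrated with a minimal, standard-library-only sketch. This is not the authors' model or software: it boosts one-split regression stumps on a single predictor (HRT) under squared-error loss, and the HRT and removal values are made-up placeholders rather than data from the study. The helper names (`fit_stump`, `fit_gbm`, `predict`, `r2_score`) are hypothetical.

```python
from statistics import mean


def fit_stump(x, residuals):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    levels = sorted(set(x))
    for lo, hi in zip(levels, levels[1:]):
        threshold = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = sum((r - left_value) ** 2 for r in left)
        sse += sum((r - right_value) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]


def fit_gbm(x, y, n_rounds=100, learning_rate=0.1):
    """Boost stumps on the residuals (the negative gradient of squared loss)."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        threshold, left_value, right_value = fit_stump(x, residuals)
        stumps.append((threshold, left_value, right_value))
        pred = [
            pi + learning_rate * (left_value if xi <= threshold else right_value)
            for xi, pi in zip(x, pred)
        ]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    out = []
    for xi in x:
        total = model["base"]
        for threshold, left_value, right_value in model["stumps"]:
            step = left_value if xi <= threshold else right_value
            total += model["learning_rate"] * step
        out.append(total)
    return out


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical data: HRT in hours vs. TP removal in percent (NOT from the study).
hrt = [6, 12, 18, 24, 36, 48]
tp_removal = [41.0, 55.0, 63.0, 70.0, 74.0, 76.0]

weak = fit_gbm(hrt, tp_removal, n_rounds=1)
strong = fit_gbm(hrt, tp_removal, n_rounds=100)
r2_weak = r2_score(tp_removal, predict(weak, hrt))
r2_strong = r2_score(tp_removal, predict(strong, hrt))
```

Here `r2_weak` and `r2_strong` are training-set scores that only show how additional boosting rounds reduce the residual error. The R² reported in the abstract was computed on a separate validation dataset, which this sketch does not reproduce.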

2018 ◽  
Vol 38 ◽  
pp. 01009 ◽  
Author(s):  
Lingwei Kong ◽  
Lu Wang ◽  
Yi Zhang ◽  
Rongwu Mei ◽  
Yu Zhang

In this study, a new coupling system of a biological filter bed and a subsurface-flow constructed wetland based on a self-ventilation network was proposed, and the pollutant removal efficiency of the pilot coupling system with different substrate configurations was compared at low and high influent concentrations. The study found that the comparison system (b) had better removal rates than the original system (a): the removal rates at low influent concentration were 74.10%, 94.14%, 73.57% and 69.53%, and those at high influent concentration were 81.30%, 90.28%, 88.57% and 75.36%, for CODCr, NH4+-N, TN and TP, respectively. Compared with the original system (a), the removal of these main water quality indexes by the comparison system (b) improved by 11.00%, 11.55%, 2.69% and 8.09%, respectively, at low influent concentration and by 4.20%, 9.20%, 7.66% and 13.61%, respectively, at high influent concentration. This shows that the optimized configuration of the various substrates had a significant effect and was more beneficial to the degradation and removal of pollutants. Adsorption and interception by the substrates in the constructed wetland were the main pathway of phosphorus removal. Self-ventilation ensured the level of DO in the coupling system, so phosphorus removal was less affected than in a traditional wetland structure.


Author(s):  
Lorena Peñacoba-Antona ◽  
Montserrat Gómez-Delgado ◽  
Abraham Esteve-Núñez

METland is a new variety of Constructed Wetland (CW) for treating wastewater in which gravel is replaced by a biocompatible electroconductive material to stimulate the metabolism of electroactive bacteria. The system requires a remarkably low land footprint (0.4 m²/pe) compared to conventional CW, owing to the high pollutant removal rate exhibited by such microorganisms. In order to predict the optimal locations for METland, a methodology based on Multi-Criteria Evaluation (MCE) techniques applied to Geographical Information Systems (GIS) has been proposed. Seven criteria were evaluated and weighted using the Analytical Hierarchy Process (AHP). Finally, a Global Sensitivity Analysis (GSA) was performed using the Sobol method for resource optimization. The model was tested in two locations, oceanic and Mediterranean, to prove its feasibility under different geographical, demographic and climate conditions. The GSA identified the most influential factors in the model: (i) land use, (ii) distance to population centers, and (iii) distance to river beds. Interestingly, the model could predict the most suitable locations using only these three key factors (responsible for 78% of the output variance). The proposed methodology will help decision-making stakeholders in implementing nature-based solutions, including constructed wetlands, for treating wastewater in rural areas.
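
As a rough illustration of how AHP turns pairwise judgments into criterion weights, the sketch below uses the row geometric-mean approximation. It is not the authors' procedure: the abstract does not give the comparison matrix, so the judgments here are invented, only three of the seven criteria are shown, and `ahp_weights` is a hypothetical helper name.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Criteria named in the abstract; the pairwise judgments are hypothetical.
criteria = ["land use", "distance to population centers", "distance to river beds"]

# pairwise[i][j] states how much more important criterion i is than criterion j.
pairwise = [
    [1, 2, 4],
    [1 / 2, 1, 2],
    [1 / 4, 1 / 2, 1],
]

weights = ahp_weights(pairwise)
ranked = sorted(zip(criteria, weights), key=lambda item: item[1], reverse=True)
```

Because this example matrix is perfectly consistent, the weights come out in the ratio 4:2:1. Real judgment matrices are usually inconsistent to some degree, which is why AHP applications typically also check a consistency ratio.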


2021 ◽  
Author(s):  
Bruno C. Perez ◽  
Marco C.A.M. Bink ◽  
Gary A. Churchill ◽  
Karen L. Svenson ◽  
Mario P.L. Calus

Recent literature suggests that machine learning methods can capture interactions between loci and could therefore outperform linear models when predicting traits with relevant epistatic effects. However, investigating this empirically requires data with high mapping resolution and phenotypes for traits with known non-additive gene action. The objective of the present study was to compare the performance of linear methods (GBLUP, BayesB and elastic net [ENET]) with a non-parametric tree-based ensemble method (gradient boosting machine [GBM]) for genomic prediction of complex traits in mice. The dataset contained phenotypic and genotypic information for 835 animals from 6 non-overlapping generations. The traits analyzed were bone mineral density (BMD), body weight at 10, 15 and 20 weeks (BW10, BW15 and BW20), fat percentage (FAT%), circulating cholesterol (CHOL), glucose (GLUC), insulin (INS) and triglycerides (TGL), and urine creatinine (UCRT). After quality control, the genotype dataset contained 50,112 SNP markers. Animals from older generations formed the reference subset, while animals in the latest generation were candidates for the validation subset. We also evaluated the impact of different levels of connectedness between reference and validation sets. Model performance was measured as the Pearson's correlation coefficient and mean squared error (MSE) between adjusted phenotypes and the models' predictions for animals in the validation subset. Outcomes were also compared across models by checking the overlap in top markers and animals. Linear models outperformed GBM for seven out of ten traits. For these models, accuracy was proportional to the trait's heritability. For the traits BMD, CHOL and GLUC, the GBM model showed better prediction accuracy and lower MSE. Interestingly, for these three traits there is evidence in the literature that a relevant portion of phenotypic variance is explained by epistatic effects.
We noticed that for lower connectedness, i.e., imposing a gap of one to two generations between reference and validation populations, the superior performance of GBM was only maintained for GLUC. For some traits, fitting a subset of top markers selected from a GBM model into linear and GBM models improved prediction accuracy. The GBM model consistently had fewer markers and animals in common among the top ranked than the linear models. Our results indicate that GBM is more strongly affected by data size and by decreased connectedness between reference and validation sets than the linear models. Nevertheless, our results indicate that GBM is a competitive method for predicting complex traits in an outbred mouse population, especially for traits with assumed epistatic effects.
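
The two performance measures named above can be computed as in the following sketch, which uses only the Python standard library. The numbers are invented for illustration and are not from the mouse dataset; `pearson_r` and `mse` are hypothetical helper names.

```python
import math


def pearson_r(a, b):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical validation-subset values (NOT from the study).
adjusted_phenotypes = [1.0, 2.0, 3.0, 4.0]
predictions = [1.5, 2.5, 3.5, 4.5]

accuracy = pearson_r(adjusted_phenotypes, predictions)
error = mse(adjusted_phenotypes, predictions)
```

In this toy case the predictions are shifted by a constant, so the correlation is perfect while the MSE is not zero. This is one reason to report both measures: correlation captures ranking ability, whereas MSE also penalizes bias.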


Entropy ◽  
2021 ◽  
Vol 23 (1) ◽  
pp. 116
Author(s):  
Xiangfa Zhao ◽  
Guobing Sun

Automatic sleep staging with only one channel is a challenging problem in sleep-related research. In this paper, a simple and efficient method named PPG-based multi-class automatic sleep staging (PMSS) is proposed, using only a photoplethysmography (PPG) signal. Single-channel PPG data were obtained from four categories of subjects in the CAP sleep database. After preprocessing of the PPG data, features were extracted from the time domain, frequency domain, and nonlinear domain, for a total of 21 features. Finally, the Light Gradient Boosting Machine (LightGBM) classifier was used for multi-class sleep staging. The accuracy of the multi-class automatic sleep staging was over 70%, and the Cohen's kappa statistic (κ) was over 0.6. This showed that the PMSS method can also be applied to sleep staging for patients with sleep disorders.
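
Cohen's kappa corrects the raw agreement between reference and predicted stages for the agreement expected by chance. A minimal sketch follows; the stage labels and epoch sequences are hypothetical, since the abstract does not list the classes used, and `cohen_kappa` is an illustrative helper rather than the authors' code.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(reference)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    # Chance agreement from the marginal label frequencies of both raters.
    expected = sum(ref_counts[label] * pred_counts[label] for label in ref_counts) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical per-epoch labels (NOT from the CAP sleep database).
reference = ["wake", "wake", "nrem", "nrem", "rem", "rem"]
predicted = ["wake", "wake", "nrem", "rem", "rem", "rem"]

kappa = cohen_kappa(reference, predicted)
```

With five of six epochs matching, the observed agreement is 5/6 and the chance agreement is 1/3, giving a kappa of 0.75.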


2013 ◽  
Vol 44 (6) ◽  
pp. 1114-1128 ◽  
Author(s):  
M. J. Gunnarsdottir ◽  
S. M. Gardarsson ◽  
H. O. Andradottir

This paper explores the fate and transport of microbial contamination in a cold climate and coarse aquifers. A confirmed norovirus outbreak in a small rural water supply in the late summer of 2004, which is estimated to have infected over 100 people, is used as a case study. A septic system, 80 m upstream of the water intake, is considered to have contaminated the drinking water. The water samples tested were negative for coliform and strongly positive for norovirus. Modelling predicts that a 4.8-log10 removal was possible in the 8 m thick vadose zone, but only a 0.7-log10 and a 2.7-log10 removal in the aquifer for viruses and Escherichia coli, respectively. The model results indicate that the 80 m setback distance was inadequate and that an aquifer transport distance of roughly 900 m was needed to achieve 9-log10 viral removal. Sensitivity analysis showed that the parameters with the most influence on the modelled removal rate are grain size diameter, groundwater velocity, temperature and acidity. The results demonstrate a need for systematic evaluation of septic systems in rural areas in lesser-studied coarse strata at low temperatures, thereby strengthening the data used in regulatory requirements and allowing more confident determination of safe setback distances.
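
For readers unfamiliar with log10 removal values, the short sketch below converts them to the fraction of organisms remaining and adds the vadose-zone and aquifer figures quoted above. Summing them assumes the two zones act as independent barriers in series, which is an assumption of this illustration and not a statement of how the authors' transport model works. `remaining_fraction` is a hypothetical helper name.

```python
def remaining_fraction(log10_removal):
    """Fraction of organisms still present after a given log10 removal."""
    return 10 ** (-log10_removal)


# Values quoted in the abstract.
vadose_log = 4.8           # 8 m thick vadose zone
aquifer_virus_log = 0.7    # aquifer, viruses
aquifer_ecoli_log = 2.7    # aquifer, Escherichia coli

# Assumption: barriers in series, so log removals add (fractions multiply).
total_virus_log = vadose_log + aquifer_virus_log
total_ecoli_log = vadose_log + aquifer_ecoli_log

virus_fraction_left = remaining_fraction(total_virus_log)
ecoli_fraction_left = remaining_fraction(total_ecoli_log)
```

Under that assumption the combined viral removal is about 5.5-log10, well short of the 9-log10 target mentioned in the abstract.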

