COMPARISON OF RANDOM FOREST AND MULTIPLE LINEAR REGRESSION TO MODEL THE MASS BALANCE OF BIOSOLIDS FROM A COMPLEX BIOSOLIDS MANAGEMENT AREA

2021 ◽  
Author(s):  
Thaís Bremm Pluth ◽  
Dominic A. Brose
Water ◽  
2020 ◽  
Vol 12 (11) ◽  
pp. 3019
Author(s):  
Alberto Fernández del Castillo ◽  
Marycarmen Verduzco Garibay ◽  
Carolina Senés-Guerrero ◽  
Carlos Yebra-Montes ◽  
José de Anda ◽  
...  

Systems combining anaerobic bioreactors with constructed wetlands (CW) have proven to be adequate and efficient for wastewater treatment. Detailed knowledge of the removal dynamics of contaminants can ensure positive results for engineering and design. Mathematical modeling is a useful approach to studying the dynamics of contaminant removal in wastewater. In this study, water quality monitoring was performed in a system composed of a septic tank (ST), an upflow anaerobic filter (UAF), and a horizontal flow constructed wetland (HFCW). Biological oxygen demand (BOD5), chemical oxygen demand (COD), total Kjeldahl nitrogen (TKN), NH3, organic nitrogen (ON), total suspended solids (TSS), NO2−, and NO3− were measured biweekly during a 3-month period. First-order kinetics, multiple linear regression, and mass balance models were fitted to the data. First-order models were useful to predict the outlet concentration of pollutants (R2 > 0.87). Relevant multiple linear regression models were found, which could be applied to facilitate the system’s monitoring and provide valuable information to control and improve the biological and physical processes necessary for wastewater treatment. Finally, the values of important parameters (μmax, Ks, and Yx/s) in the mass-balance models were determined with the aid of a differential neural network (DNN) and an optimization algorithm. The estimated parameters indicated the high robustness of the treatment system, since performance remained stable despite variations in wastewater composition.
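The abstract does not state the exact form of its first-order model. As a minimal sketch, assuming the common plug-flow form C_out = C_in · exp(−k·t), the rate constant can be estimated from one inlet/outlet pair and then used to predict the outlet concentration. The function names, the retention time, and the concentrations below are hypothetical and are not taken from the study.

```python
import math


def first_order_outlet(c_in, k, retention_time):
    """Outlet concentration under assumed plug-flow first-order decay."""
    return c_in * math.exp(-k * retention_time)


def estimate_rate_constant(c_in, c_out, retention_time):
    """Solve C_out = C_in * exp(-k * t) for k."""
    return math.log(c_in / c_out) / retention_time


# Hypothetical example: COD falls from 200 to 50 mg/L over 2 days.
k = estimate_rate_constant(c_in=200.0, c_out=50.0, retention_time=2.0)
predicted = first_order_outlet(c_in=200.0, k=k, retention_time=2.0)
print(f"k = {k:.3f} per day, predicted outlet = {predicted:.1f} mg/L")
```

In practice k would be fitted over many sampling dates rather than a single pair, and the fit quality reported as R2.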


2017 ◽  
Vol 41 (6) ◽  
pp. 648-664 ◽  
Author(s):  
Sérgio Henrique Godinho Silva ◽  
Anita Fernanda dos Santos Teixeira ◽  
Michele Duarte de Menezes ◽  
Luiz Roberto Guimarães Guilherme ◽  
Fatima Maria de Souza Moreira ◽  
...  

ABSTRACT Determination of soil properties helps in the correct management of soil fertility. The portable X-ray fluorescence spectrometer (pXRF) has been recently adopted to determine total chemical element contents in soils, allowing soil property inferences. However, these studies are still scarce in Brazil and other countries. The objectives of this work were to predict soil properties using pXRF data, comparing stepwise multiple linear regression (SMLR) and random forest (RF) methods, as well as mapping and validating soil properties. A total of 120 soil samples were collected at three depths and submitted to laboratory analyses. pXRF was used on the samples and total element contents were determined. From pXRF data, SMLR and RF were used to predict soil laboratory results, reflecting soil properties, and the models were validated. The best method was used to spatialize soil properties. Using SMLR, models had high values of R² (≥0.8); however, the highest accuracy was obtained in RF modeling. Exchangeable Ca, Al, Mg, potential and effective cation exchange capacity, soil organic matter, pH, and base saturation had adequate adjustment and accurate predictions with RF. Eight out of the 10 soil properties predicted by RF using pXRF data had CaO as the most important variable helping predictions, followed by P2O5, Zn and Cr. Maps generated using RF from pXRF data had high accuracy for six soil properties, reaching R² up to 0.83. pXRF in association with RF can be used to predict soil properties with high accuracy at low cost and in little time, besides providing variables that aid digital soil mapping.


2021 ◽  
Vol 931 (1) ◽  
pp. 012013
Author(s):  
Le Thi Nhut Suong ◽  
A V Bondarev ◽  
E V Kozlova

Abstract Geochemical studies of organic matter in source rocks play an important role in predicting the oil and gas accumulation of any territory, especially in oil and gas shale. For deeper understanding, pyrolytic analyses are often carried out on samples before and after extraction of hydrocarbons with chloroform. However, extraction is a laborious and time-consuming process that doubles both the workload of laboratory equipment and the time required. In this work, machine learning regression algorithms are applied to forecast S2ex from the pyrolytic results of non-extracted samples. This study uses more than 300 samples from 3 different wells in the Bazhenov formation, Western Siberia. To develop a prediction model, 5 different machine learning regression algorithms were tested and compared: Multiple Linear Regression, Polynomial Regression, Support Vector Regression, Decision Tree, and Random Forest. The performance of these algorithms is assessed by the R-squared coefficient. The data from well X2 were used to build the model, divided into two parts: 80% for training and 20% for checking. The model was also used to predict wells X1 and X3. These predictions were then compared with the real results obtained from standard experiments. Despite the limited amount of data, the result exceeded all expectations. The predictions also show that the relationship between the parameters before and after extraction is complex and non-linear. The evidence is that the R2 values of Multiple Linear Regression and Polynomial Regression are negative, which means these models fit worse than simply predicting the mean. Random Forest and Decision Tree, however, perform well. With the same algorithms, all geochemical parameters could be predicted by depth, or the approach could be applied to well-logging data.
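A negative R2 can look surprising, so a short illustration may help. The sketch below computes R2 from its standard definition, 1 − SS_res / SS_tot, and shows that a model predicting the mean scores 0 while a model with the trend reversed scores below 0. The data are made up for illustration and do not come from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up observations and two made-up sets of predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
mean_pred = [2.5, 2.5, 2.5, 2.5]      # always predicts the mean
reversed_pred = [4.0, 3.0, 2.0, 1.0]  # gets the trend backwards

r2_mean = r_squared(y_true, mean_pred)
r2_reversed = r_squared(y_true, reversed_pred)
print(r2_mean, r2_reversed)
```

A negative value on held-out wells therefore indicates that the fitted relationship does not transfer, not that the computation failed.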


CATENA ◽  
2020 ◽  
Vol 194 ◽  
pp. 104715 ◽  
Author(s):  
Mohammad Reza Pahlavan-Rad ◽  
Khodadad Dahmardeh ◽  
Mojtaba Hadizadeh ◽  
Gholamali Keykha ◽  
Nader Mohammadnia ◽  
...  

2019 ◽  
Vol 46 (5) ◽  
pp. 353-363 ◽  
Author(s):  
Chaozhe Jiang ◽  
Ping Huang ◽  
Javad Lessan ◽  
Liping Fu ◽  
Chao Wen

Accurate prediction of recoverable train delay can support train dispatchers’ decision-making in timetable rescheduling and improve service reliability. In this paper, we present the results of an effort to develop a primary delay recovery (PDR) predictor model using train operation records from the Wuhan-Guangzhou (W-G) high-speed railway. To this end, we first identified the main variables that contribute to delay, including dwell buffer time, running buffer time, magnitude of primary delay time, and individual sections’ influence. Different models are applied and calibrated to predict the PDR. The validation results on test datasets indicate that the random forest regression (RFR) model outperforms the other three alternative models, namely, multiple linear regression (MLR), support vector machine (SVM), and artificial neural networks (ANN), in terms of prediction accuracy. Specifically, the evaluation results show that when the prediction tolerance is less than 1 min, the RFR model can achieve up to 80.4% prediction accuracy, while the accuracy level is 44.4%, 78.5%, and 78.5% for the MLR, SVM, and ANN models, respectively.
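The abstract does not define its accuracy measure precisely. One plausible reading, assumed here, is the share of predictions whose absolute error is below the tolerance. The sketch below implements that reading; the function name and the delay values are hypothetical.

```python
def accuracy_within_tolerance(y_true, y_pred, tolerance):
    """Share of predictions with absolute error below the tolerance."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) < tolerance)
    return hits / len(y_true)


# Hypothetical recovered delays in minutes (observed vs. predicted).
observed = [3.0, 5.0, 0.0, 8.0, 2.0]
predicted = [3.4, 6.5, 0.2, 7.1, 4.0]

share = accuracy_within_tolerance(observed, predicted, tolerance=1.0)
print(f"{share:.0%} of predictions fall within 1 min")
```

Unlike R2 or RMSE, this measure ignores how large the misses are, which is worth keeping in mind when comparing models on it.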


2019 ◽  
Author(s):  
Sharad Vikram ◽  
Ashley Collier-Oxandale ◽  
Michael Ostertag ◽  
Massimiliano Menarini ◽  
Camron Chermak ◽  
...  

Abstract. Advances in ambient environmental monitoring technologies are enabling concerned communities and citizens to collect data to better understand their local environment and potential exposures. These mobile, low-cost tools make it possible to collect data with increased temporal and spatial resolution, providing data on a large scale with unprecedented levels of detail. This type of data has the potential to empower people to make personal decisions about their exposure and support the development of local strategies for reducing pollution and improving health outcomes. However, calibration of these low-cost instruments has been a challenge. Often, a sensor package is calibrated via field calibration. This involves colocating the sensor package with a high-quality reference instrument for an extended period and then applying machine learning or another model-fitting technique, such as multiple linear regression, to develop a calibration model for converting raw sensor signals to pollutant concentrations. Although this method helps to correct for the effects of ambient conditions (e.g., temperature) and cross-sensitivities with non-target pollutants, there is a growing body of evidence that calibration models can overfit to a given location or set of environmental conditions on account of the incidental correlation between pollutant levels and environmental conditions, including diurnal cycles. As a result, a sensor package trained at a field site may provide less reliable data when moved, or transferred, to a different location. This is a potential concern for applications seeking to perform monitoring away from regulatory monitoring sites, such as personal mobile monitoring or high-resolution monitoring of a neighborhood. We performed experiments confirming that transferability is indeed a problem and show that it can be improved by collecting data from multiple regulatory sites and building a calibration model that leverages data from a more diverse dataset.
We deployed three sensor packages to each of three sites with reference monitors (nine packages total) and then rotated the sensor packages through the sites over time. Two sites were in San Diego, CA, with a third outside of Bakersfield, CA, offering varying environmental conditions, general air quality composition, and pollutant concentrations. When compared to prior single-site calibration, the multi-site approach exhibits better model transferability for a range of modeling approaches. Our experiments also reveal that random forest is especially prone to overfitting, and confirm prior results that transfer is a significant source of both bias and standard error. Bias dominated in our experiments, suggesting that transferability might be easily increased by detecting and correcting for bias. Also, given that many monitoring applications involve the deployment of many sensor packages based on the same sensing technology, there is an opportunity to leverage the availability of multiple sensors at multiple sites during calibration. We contribute a new neural network architecture termed split-NN that splits the model into two stages, in which the first stage corrects for sensor-to-sensor variation and the second stage uses the combined data of all the sensors to build a model for a single sensor package. The split-NN modeling approach outperforms multiple linear regression, traditional 2- and 4-layer neural networks, and random forest models.
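The remark that bias dominated can be made concrete with the standard identity MSE = bias² + variance of the error. The sketch below is not the authors' method: it only splits the error of a transferred calibration into those two parts and shows the effect of subtracting a constant bias. The readings are invented, and in a real deployment the bias would have to be estimated from some reference data at the new site.

```python
def error_components(reference, predicted):
    """Return (bias, error variance, MSE); MSE equals bias**2 + variance."""
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    bias = sum(errors) / n
    variance = sum((e - bias) ** 2 for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    return bias, variance, mse


# Invented reference and calibrated-sensor readings at a new site.
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [13.0, 22.0, 34.0, 43.0]

bias, variance, mse = error_components(reference, predicted)

# Subtracting the constant bias leaves only the variance term.
corrected = [p - bias for p in predicted]
_, _, mse_corrected = error_components(reference, corrected)
print(bias, variance, mse, mse_corrected)
```

When the bias term is the larger share of the error, as in this toy case, a simple offset correction removes most of it.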

