Prediction of the Concentration of Dissolved Oxygen in Running Water by Employing A Random Forest Machine Learning Technique

Author(s):  
Mohammad Hafez Ahmed

Dissolved oxygen (DO) is a key indicator of the ecological health of rivers. Modeling DO is a major challenge because of the complex interactions among its process components. Given the vital importance of DO in water bodies, its accurate prediction is a critical issue in ecosystem management. Because current process-based water quality models are intricate, a data-driven model could be an effective alternative. In this study, a random forest machine learning technique is employed to predict the DO level by identifying its major drivers. Time series of half-hourly water quality data, spanning 2007 to 2019, for the South Branch Potomac River near Springfield, WV, are obtained from the United States Geological Survey database. Key drivers are identified, and models are formulated for different scenarios of input variables. The model is calibrated for each input scenario using 80% of the data. Water temperature and pH are found to be the most influential predictors of DO. However, satisfactory model performance is achieved only when water temperature, pH, and specific conductance are all used as input variables. The model is validated by predicting DO concentrations for the remaining 20% of the data. A comparison with the traditional multiple linear regression method shows that the random forest model performs significantly better. The study insights are therefore expected to be useful for estimating stream and river DO levels at various sites with a minimum number of predictors, and to help build a robust framework for ecosystem health management across an environmental gradient.
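As a rough illustration of the workflow described in the abstract above (an 80/20 calibration/validation split and a random forest regressor driven by water temperature, pH, and specific conductance), here is a minimal pure-Python sketch. It is not the author's implementation: the data are synthetic, the DO formula and every hyperparameter (`n_trees`, `max_depth`, `min_leaf`, `max_features`) are assumptions chosen for illustration, and the reference model is a simple mean predictor rather than the multiple linear regression used in the study.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (impurity of a node)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(X, y, rng, depth=0, max_depth=4, min_leaf=5, max_features=2):
    """Grow one regression tree; each split looks at a random feature subset."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean(y)  # leaf: predict the mean target
    best = None
    for j in rng.sample(range(len(X[0])), max_features):
        for threshold in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= threshold]
            right = [i for i, row in enumerate(X) if row[j] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, j, threshold, left, right)
    if best is None:
        return mean(y)
    _, j, threshold, left, right = best
    kwargs = dict(max_depth=max_depth, min_leaf=min_leaf, max_features=max_features)
    return (
        j,
        threshold,
        build_tree([X[i] for i in left], [y[i] for i in left], rng, depth + 1, **kwargs),
        build_tree([X[i] for i in right], [y[i] for i in right], rng, depth + 1, **kwargs),
    )

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        j, threshold, left, right = node
        node = left if row[j] <= threshold else right
    return node

def fit_forest(X, y, n_trees=10, seed=0):
    """Train each tree on a bootstrap resample of the calibration data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, row) for tree in forest)

# Synthetic stand-in data (NOT the USGS record); the DO relation is invented.
data_rng = random.Random(42)
X, y = [], []
for _ in range(150):
    temp = data_rng.uniform(0, 30)       # water temperature, deg C
    ph = data_rng.uniform(7, 9)          # pH
    cond = data_rng.uniform(100, 400)    # specific conductance, uS/cm
    X.append([temp, ph, cond])
    y.append(14.0 - 0.25 * temp + 0.8 * (ph - 8) + data_rng.gauss(0, 0.2))

split = len(X) * 8 // 10                 # 80% calibration, 20% validation
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

forest = fit_forest(X_train, y_train)
preds = [predict_forest(forest, row) for row in X_test]
rf_mse = mean((p - t) ** 2 for p, t in zip(preds, y_test))
baseline_mse = mean((mean(y_train) - t) ** 2 for t in y_test)
print(f"random forest MSE: {rf_mse:.3f}, mean-predictor MSE: {baseline_mse:.3f}")
```

For real half-hourly time series, a chronological split or blocked validation may be more appropriate than the simple positional split shown here; the abstract does not say which was used.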

2013 ◽  
Vol 726-731 ◽  
pp. 3256-3261
Author(s):  
Jia Fei Zhou ◽  
Cong Feng Wang ◽  
De Fu Liu ◽  
Jing Wen Xiang ◽  
Ping Zhao ◽  
...  

Field hydrology and water quality data were collected near the Gezhouba Dam in early December 2012 to analyze the response of Chinese sturgeon survival conditions to water temperature, dissolved oxygen (DO), pH, transparency (SD) and bottom flow velocity. The results showed that the water temperature lag was inconspicuous. The water temperature of Gezhouba Dam Sanjiang (GDS) was lower than that of Gezhouba Dam River (GDR), which hindered the propagation of sturgeon eggs. DO decreased rapidly in the vertical water column of GDS, and pH ranged from 7.5 to 7.71. The hydrology and water quality were suitable for the survival of sturgeon eggs and fry, except for the bottom flow velocity.


Mekatronika ◽  
2020 ◽  
Vol 2 (1) ◽  
pp. 73-78
Author(s):  
Nur Fahriza Mohd Ali ◽  
Ahmad Farhan Mohd Sadullah ◽  
Anwar P.P. Abdul Majeed ◽  
Mohd Azraai Mohd Razman ◽  
Rabiu Muazu Musa

A door-to-door journey in a public transportation system is a notable concept that is being promoted to encourage users to consider public transport as an important alternative. The door-to-door journey integrates the travel segments from home to destination, including all visible amenities. Users’ preferences regarding the travel time of these key segments need to be understood. Machine learning has been seen as a robust computational approach for forecasting travel mode choice. However, which model is the best predictor remains an open question. To address this issue, we employed several prominent machine learning models, specifically Random Forest (RF), Naïve Bayes (NB), Logistic Regression (LR), k-Nearest Neighbor (kNN) and Support Vector Machine (SVM), and compared their performance in predicting the travel mode choice of users in the city of Kuantan. The data were collected in Kuantan City via a Revealed/Stated Preferences (RPSP) Survey between 8:00 AM and 5:00 PM on weekdays. The collected data were split in a ratio of 80:20 for training and testing before the models were evaluated. The results showed that Random Forest provided satisfactory classification accuracies of up to 68.3% and 61.3% on the training and testing data, respectively, compared with the other evaluated machine learning models. In summary, Random Forest gives good results on both the training and testing data and is considered the best predictor in this research for forecasting users’ mode choice in the city of Kuantan.


2021 ◽  
Vol 37 (5) ◽  
pp. 901-910
Author(s):  
Juan Huan ◽  
Bo Chen ◽  
Xian Gen Xu ◽  
Hui Li ◽  
Ming Bao Li ◽  
...  

Highlights: Random Forest (RF) and LSTM were developed for river DO prediction. pH is the most important feature affecting DO prediction. Models based on RF perform better than those without RF, and RF reduces the dimensionality of the input data. The RF-LSTM model outperformed the SVR, RF-SVR, BP, RF-BP, LSTM, and RNN models in DO prediction.

Abstract. To improve the prediction accuracy of dissolved oxygen in rivers, a dissolved oxygen prediction model based on Random Forest (RF) and Long Short-Term Memory (LSTM) networks is proposed. First, the Random Forest performs feature selection, which reduces the input dimension of the data and eliminates the influence of irrelevant variables on the prediction of dissolved oxygen. Then, the LSTM river dissolved oxygen prediction model is built to fit the relationship between water quality data and dissolved oxygen. Finally, real river water quality data are used for verification. The experimental results show that the mean square error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and coefficient of determination (R²) of the RF-LSTM model are 0.658, 0.528, 13.502, 0.811, and 0.744, respectively, which are better than those of the other models. The RF-LSTM model has good predictive performance and can provide a reference for river water quality management.

Keywords: Dissolved oxygen prediction, LSTM, Random forest, Time series, Water quality management.
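The five evaluation metrics named in the abstract above have standard definitions, sketched below in plain Python. The function name `regression_metrics` and the three sample values are illustrative assumptions and are not data from the paper; MAPE is expressed in percent, which matches the magnitude of the reported 13.502.

```python
import math

def regression_metrics(actual, predicted):
    """Return MSE, MAE, MAPE (percent), RMSE and R^2 for paired observations."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an observed value is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}

# Made-up DO values in mg/L, for illustration only.
metrics = regression_metrics([8.0, 10.0, 12.0], [9.0, 10.0, 11.0])
print(metrics)
```

Because RMSE is the square root of MSE, the reported values are internally consistent: the square root of 0.658 is about 0.811.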


Author(s):  
Katarzyna Młyńczak ◽  
Dominik Golicki

Abstract

Purpose: We aim to compare the psychometric properties of the EQ-5D-5L questionnaire with the EQ-5D-3L version and EQ VAS, based on a survey conducted in a sample representing the general adult population of Poland.

Methods: The survey comprised health-related quality of life (HRQoL) questionnaires: EQ-5D-5L, EQ VAS, SF-12 and EQ-5D-3L, together with demographic and socio-economic characteristics items. The EQ-5D index values were estimated based on a directly measured value set for Poland. The following psychometric properties were analysed: feasibility, distribution of responses, redistribution from EQ-5D-3L to EQ-5D-5L, inconsistencies, ceiling effects, informativity power and construct validity. We proposed a novel approach to the construct validity assessment, based on the use of a machine learning technique known as the random forest algorithm.

Results: From March to June 2014, 3978 subjects (aged 18–87, 53.2% female) were surveyed. The EQ-5D-5L questionnaire had a lower ceiling effect compared to EQ-5D-3L (38.0% vs 46.6%). Redistribution from EQ-5D-3L to EQ-5D-5L was similar for each dimension, and the mean inconsistency did not exceed 5%. The results of known-groups validation confirmed the hypothesis concerning the relationship between the EQ-5D index values and age, sex and occurrence of diabetes.

Conclusions: The EQ-5D-5L, in comparison with its EQ-5D-3L equivalent, showed similar or better psychometric properties within the general population of a country. We assessed the construct validity of the questionnaire with a novel approach that was based on a machine learning technique known as the random forest algorithm.


2021 ◽  
Vol 13 (16) ◽  
pp. 3273
Author(s):  
Ping Lao ◽  
Qi Liu ◽  
Yuhao Ding ◽  
Yu Wang ◽  
Yuan Li ◽  
...  

Satellite rainrate estimation is a great challenge, especially in mesoscale convective systems (MCSs), mainly because there is no direct physical connection between observable cloud parameters and surface rainrate. In this study, machine learning was employed to estimate rainrate in the MCS domain using cloud top temperature (CTT) derived from a geostationary satellite. Five kinds of machine learning models were investigated, i.e., polynomial regression, support vector machine, decision tree, random forest, and multilayer perceptron, and the precipitation of the Climate Prediction Center morphing technique (CMORPH) was used as the reference. A total of 31 CTT-related features were designed as potential inputs for training an algorithm, and all of them were shown to contribute positively to the algorithm. Random forest (RF) shows the best performance among the five kinds of models. By combining the classification and regression schemes of the RF model, an RF-based hybrid algorithm was proposed that first discriminates rainy pixels and then estimates their rainrate. For the MCS samples considered in this study, this algorithm generates the best estimation, and its accuracy is clearly higher than that of the operational precipitation product of FY-4A. These results demonstrate the promising feasibility of applying a machine learning technique to the satellite precipitation retrieval problem.
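The two-stage idea in the abstract above (classify each pixel as rainy or not, then estimate a rainrate only for rainy pixels) can be sketched as a small wrapper. Everything concrete here is a placeholder assumption: the function names, the single CTT input in place of the study's 31 features, and the threshold rules that stand in for the trained random forest classifier and regressor.

```python
from typing import Callable, List, Sequence

def hybrid_rainrate(
    pixels: Sequence[float],
    is_rainy: Callable[[float], bool],
    estimate_rate: Callable[[float], float],
) -> List[float]:
    """Stage 1 screens rainy pixels; stage 2 estimates rate for those only."""
    return [estimate_rate(px) if is_rainy(px) else 0.0 for px in pixels]

# Placeholder stand-ins for trained models (hypothetical thresholds).
def toy_classifier(ctt_kelvin: float) -> bool:
    """Treat colder cloud tops as rainy; 235 K is an arbitrary cut-off."""
    return ctt_kelvin < 235.0

def toy_regressor(ctt_kelvin: float) -> float:
    """Invented linear relation, in mm/h, used only to make the sketch run."""
    return 0.5 * (235.0 - ctt_kelvin)

rates = hybrid_rainrate([250.0, 225.0, 215.0], toy_classifier, toy_regressor)
print(rates)
```

Separating the stages lets the regressor be trained only on rainy samples, so the many zero-rain pixels do not dominate its fit.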


2021 ◽  
Vol 19 (6) ◽  
pp. 584-602
Author(s):  
Lucian Jose Gonçales ◽  
Kleinner Farias ◽  
Lucas Kupssinskü ◽  
Matheus Segalotto

EEG signals are a relevant indicator for measuring aspects related to human factors in software engineering. EEG is used in software engineering to train machine learning techniques for a wide range of applications, including classifying task difficulty and developers’ level of experience. The EEG signal contains noise such as abnormal readings, electrical interference, and eye movements, which are usually not of interest to the analysis and therefore reduce the precision of the machine learning techniques. However, research in software engineering has not yet demonstrated the effectiveness of applying filters to EEG signals. The objective of this work is to analyze the effectiveness of filters on EEG signals in the software engineering context. As the literature has not focused on the classification of developers’ code comprehension, this study analyzes the effectiveness of applying EEG filters when training a machine learning technique to classify developers' code comprehension. A Random Forest (RF) machine learning technique was trained with filtered EEG signals to classify the developers' code comprehension. This study also trained another random forest classifier with unfiltered EEG data. Both models were trained using 10-fold cross-validation. This work measures the classifiers' effectiveness using the f-measure metric. The t-test, Wilcoxon, and Mann-Whitney U tests were used to analyze the difference in f-measure between the classifier trained with filtered EEG and the classifier trained with unfiltered EEG. The tests indicated a significant difference after applying EEG filters to classify developers' code comprehension with the random forest classifier. The conclusion is that the use of EEG filters significantly improves the effectiveness of classifying code comprehension using the random forest technique.
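The evaluation protocol described in the abstract above relies on two standard ingredients, 10-fold cross-validation and the f-measure. A minimal sketch of both follows; the helper names `k_fold_indices` and `f_measure` and the toy labels are assumptions for illustration, and the folds are contiguous without shuffling, which the abstract does not specify.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; fold sizes differ by at most one."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop

def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero/undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy labels: 1 = code comprehended, 0 = not comprehended (made up).
score = f_measure([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
folds = list(k_fold_indices(25, k=10))
print(round(score, 3), [len(test) for _, test in folds])
```

In a setup like the one described, one f-measure would be computed per fold for each classifier, and the two resulting sets of scores would then be compared with the statistical tests.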


2021 ◽  
pp. 1-13
Author(s):  
D. Senthilkumar ◽  
D. George Washington ◽  
A.K. Reshmy ◽  
M. Noornisha

Predicting water quality is a very important issue in an ecosystem, and it can be used to control the increase of water contamination. Water quality prediction is also a complex non-linear multi-target learning problem, and extracting a relevant subset of features from a large number of features with multiple targets is a challenging task. Existing water quality prediction models do not address the multiple targets simultaneously and do not identify the non-linear relationships between the features and the target variables. Therefore, this study proposes a multi-task learning method that handles multi-target regression using a non-linear machine learning technique. Experiments are conducted to build a prediction model based on the proposed method and to evaluate its accuracy on a water quality dataset. The experimental results indicate that our method increases the overall accuracy on the experimental dataset compared with existing methods, while using a reduced number of significant features.

