Prediction of the Concentration of Dissolved Oxygen in Running Water by Employing A Random Forest Machine Learning Technique

Mapping Intimacies ◽

10.20944/preprints202004.0342.v1 ◽

2020 ◽

Author(s):

Mohammad Hafez Ahmed

Keyword(s):

Machine Learning ◽

Water Quality ◽

Random Forest ◽

Dissolved Oxygen ◽

Water Temperature ◽

Quality Data ◽

Linear Regression Method ◽

Machine Learning Technique ◽

Learning Technique ◽

Input Variables

Dissolved oxygen (DO) is a key indicator in the study of the ecological health of rivers. Modeling DO is a major challenge because of the complex interactions among the various processes that control it. Given its vital importance in water bodies, accurate prediction of DO is a critical issue in ecosystem management. Because current process-based water quality models are intricate, a data-driven model could be an effective alternative tool. In this study, a random forest machine learning technique is employed to predict the DO level by identifying its major drivers. Time series of half-hourly water quality data, spanning 2007 to 2019, for the South Branch Potomac River near Springfield, WV, are obtained from the United States Geological Survey database. Key drivers are identified, and models are formulated for different scenarios of input variables. The model is calibrated for each input scenario using 80% of the data. Water temperature and pH are found to be the most influential predictors of DO. Nevertheless, satisfactory model performance is achieved by using water temperature, pH, and specific conductance as input variables. The model is validated by predicting DO concentrations for the remaining 20% of the data. A comparison with the traditional multiple linear regression method shows that the random forest model performs significantly better. The study insights are therefore expected to be useful for estimating stream and river DO levels at various sites with a minimum number of predictors, and to help build a sturdy framework for ecosystem health management across an environmental gradient.
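
The abstract reports calibrating on 80% of the record and validating on the remaining 20%, but it does not say whether the split was random or chronological. For an autocorrelated half-hourly series, a chronological split is one common choice because it keeps later observations out of calibration. A minimal stdlib sketch, assuming a chronological split; the function name, field layout, and values are hypothetical and this is not the author's code:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(records: Sequence[T], train_fraction: float = 0.8) -> Tuple[List[T], List[T]]:
    """Split time-ordered records into a calibration block and a later validation block.

    No shuffling is done, so every validation record comes after every calibration record.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(records) * train_fraction)
    return list(records[:cut]), list(records[cut:])


# Hypothetical half-hourly rows: (water_temp_C, pH, specific_conductance_uS_cm, DO_mg_L)
rows = [(10.0 + 0.1 * i, 7.5, 250.0, 11.0 - 0.05 * i) for i in range(10)]
calibration, validation = chronological_split(rows)
print(len(calibration), len(validation))  # 8 2
```

Shuffling before the cut would instead give a random 80/20 split; with half-hourly data, neighbouring samples would then land in both sets and be nearly identical, which tends to make validation scores look optimistic.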

Machine Learning Technique to Prognosis Diabetes Disease: Random Forest Classifier Approach

Advanced Computing and Intelligent Technologies - Lecture Notes in Networks and Systems ◽

10.1007/978-981-16-2164-2_19 ◽

2021 ◽

pp. 219-244

Author(s):

Prajyot Palimkar ◽

Rabindra Nath Shaw ◽

Ankush Ghosh

Keyword(s):

Machine Learning ◽

Random Forest ◽

Random Forest Classifier ◽

Machine Learning Technique ◽

Learning Technique

Hydrology and Water Quality Survey Near the Gezhouba Dam in the Winter

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.726-731.3256 ◽

2013 ◽

Vol 726-731 ◽

pp. 3256-3261

Author(s):

Jia Fei Zhou ◽

Cong Feng Wang ◽

De Fu Liu ◽

Jing Wen Xiang ◽

Ping Zhao ◽

...

Keyword(s):

Water Quality ◽

Dissolved Oxygen ◽

Water Temperature ◽

Flow Velocity ◽

Water Column ◽

Quality Data ◽

Chinese Sturgeon ◽

Water Quality Data ◽

Gezhouba Dam ◽

Survival Condition

Field hydrology and water quality data were collected near the Gezhouba Dam in early December 2012 to analyze how the survival conditions of Chinese sturgeon respond to water temperature, dissolved oxygen (DO), pH, transparency (SD), and bottom flow velocity. The results showed that the water temperature lag was inconspicuous. The water temperature at Gezhouba Dam Sanjiang (GDS) was lower than that at Gezhouba Dam River (GDR), which hindered the propagation of sturgeon eggs. DO decreased rapidly through the vertical water column at GDS, and pH ranged from 7.5 to 7.71. The hydrology and water quality were suitable for the survival of sturgeon eggs and fry, except for bottom flow velocity.

Assessing the soil quality of Bansloi river basin, eastern India using soil-quality indices (SQIs) and Random Forest machine learning technique

Ecological Indicators ◽

10.1016/j.ecolind.2020.106804 ◽

2020 ◽

Vol 118 ◽

pp. 106804

Author(s):

Gopal Chandra Paul ◽

Sunil Saha ◽

Krishna Gopal Ghosh

Keyword(s):

Machine Learning ◽

Random Forest ◽

Soil Quality ◽

River Basin ◽

Eastern India ◽

Quality Indices ◽

Machine Learning Technique ◽

Learning Technique

Mode Choice Prediction using Machine Learning Technique for A Door-to-Door Journey in Kuantan City

Mekatronika ◽

10.15282/mekatronika.v2i1.6745 ◽

2020 ◽

Vol 2 (1) ◽

pp. 73-78

Author(s):

Nur Fahriza Mohd Ali ◽

Ahmad Farhan Mohd Sadullah ◽

Anwar P.P. Abdul Majeed ◽

Mohd Azraai Mohd Razman ◽

Rabiu Muazu Musa

Keyword(s):

Machine Learning ◽

Random Forest ◽

Mode Choice ◽

Learning Models ◽

Machine Learning Technique ◽

Travel Mode Choice ◽

Testing Data ◽

Learning Technique ◽

The City ◽

Machine Learning Models

A door-to-door journey in a public transportation system is a notable concept that is being actively promoted to encourage users to consider public transport as an important alternative. The door-to-door journey integrates the travel segments from home to destination, including all visible amenities. Users' preferences regarding the travel time of these key segments need to be understood. Machine learning techniques have been seen as a robust computational advance for forecasting travel mode choice. However, which model is the best predictor remains an open question. To address this issue, we employed several pre-eminent machine learning models, specifically Random Forest (RF), Naïve Bayes (NB), Logistic Regression (LR), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), and compared their performance in predicting the travel mode choice of users in the city of Kuantan. Data were collected in Kuantan City via a Revealed/Stated Preferences (RPSP) survey between 8:00 AM and 5:00 PM on weekdays. The collected data were split 80:20 into training and testing sets before the models were evaluated. The results showed that Random Forest provided satisfactory classification accuracies of up to 68.3% on the training data and 61.3% on the testing data, compared with the other evaluated machine learning models. In summary, Random Forest gave good results on both the training and testing data and is considered the best predictor in this research for forecasting users' mode choice in the city of Kuantan.

River Dissolved Oxygen Prediction Based on Random Forest and LSTM

Applied Engineering in Agriculture ◽

10.13031/aea.14496 ◽

2021 ◽

Vol 37 (5) ◽

pp. 901-910

Author(s):

Juan Huan ◽

Bo Chen ◽

Xian Gen Xu ◽

Hui Li ◽

Ming Bao Li ◽

...

Keyword(s):

Water Quality ◽

Random Forest ◽

Quality Management ◽

Dissolved Oxygen ◽

Prediction Model ◽

Water Quality Management ◽

Quality Data ◽

Mean Square ◽

Water Quality Data ◽

Better Than

Highlights: Random Forest (RF) and LSTM models were developed for river DO prediction. pH is the most important feature affecting DO prediction. Models based on RF perform better than those that are not, and RF reduces the dimensionality of the input data. The RF-LSTM model outperformed the SVR, RF-SVR, BP, RF-BP, LSTM, and RNN models in DO prediction.

Abstract. To improve the prediction accuracy of dissolved oxygen in rivers, a dissolved oxygen prediction model based on Random Forest (RF) and Long Short-Term Memory networks (LSTM) is proposed. First, the Random Forest performs feature selection, which reduces the input dimension of the data and eliminates the influence of irrelevant variables on the prediction of dissolved oxygen. Then an LSTM river dissolved oxygen prediction model is built to fit the relationship between water quality data and dissolved oxygen, and finally the model is verified with real river water quality data. The experimental results show that the mean square error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and coefficient of determination (R2) of the RF-LSTM model are 0.658, 0.528, 13.502, 0.811, and 0.744, respectively, which are better than those of the other models. The RF-LSTM model has good predictive performance and can provide a reference for river water quality management.

Keywords: Dissolved oxygen prediction, LSTM, Random forest, Time series, Water quality management.
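
The five error measures quoted in this abstract have standard definitions, and the sketch below computes them from paired observed and predicted DO values using only the standard library. The sample values are made up for illustration. R2 is taken here as 1 - SSE/SST, which is one common definition; some studies report the squared Pearson correlation instead, and the abstract does not say which was used. As a consistency check on the reported figures, RMSE is the square root of MSE, and the square root of 0.658 is about 0.811, matching the reported RMSE.

```python
import math
from typing import Dict, Sequence


def regression_metrics(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return MSE, MAE, MAPE (in %), RMSE and R2 for paired observations and predictions."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)  # sum of squared errors
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an observation is zero; DO concentrations are assumed positive here.
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    r2 = 1.0 - sse / sst  # undefined (division by zero) if all observations are equal
    return {"MSE": mse, "MAE": mae, "MAPE": mape, "RMSE": math.sqrt(mse), "R2": r2}


# Hypothetical DO values in mg/L
obs = [8.0, 9.0, 10.0, 11.0]
pred = [8.5, 9.0, 9.5, 11.0]
print(regression_metrics(obs, pred))
```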

Validity of the EQ-5D-5L questionnaire among the general population of Poland

Quality of Life Research ◽

10.1007/s11136-020-02667-3 ◽

2020 ◽

Author(s):

Katarzyna Młyńczak ◽

Dominik Golicki

Keyword(s):

Machine Learning ◽

Random Forest ◽

General Population ◽

Construct Validity ◽

Psychometric Properties ◽

Machine Learning Technique ◽

Validity Assessment ◽

Novel Approach ◽

Learning Technique ◽

Eq Vas

Abstract

Purpose: We aim to compare the psychometric properties of the EQ-5D-5L questionnaire with the EQ-5D-3L version and EQ VAS, based on a survey conducted in a sample representing the general adult population of Poland.

Methods: The survey comprised health-related quality of life (HRQoL) questionnaires: EQ-5D-5L, EQ VAS, SF-12 and EQ-5D-3L, together with demographic and socio-economic characteristics items. The EQ-5D index values were estimated based on a directly measured value set for Poland. The following psychometric properties were analysed: feasibility, distribution of responses, redistribution from EQ-5D-3L to EQ-5D-5L, inconsistencies, ceiling effects, informativity power and construct validity. We proposed a novel approach to construct validity assessment, based on the use of a machine learning technique known as the random forest algorithm.

Results: From March to June 2014, 3978 subjects (aged 18–87, 53.2% female) were surveyed. The EQ-5D-5L questionnaire had a lower ceiling effect compared to EQ-5D-3L (38.0% vs 46.6%). Redistribution from EQ-5D-3L to EQ-5D-5L was similar for each dimension, and the mean inconsistency did not exceed 5%. The results of known-groups validation confirmed the hypothesis concerning the relationship between the EQ-5D index values and age, sex and occurrence of diabetes.

Conclusions: The EQ-5D-5L, in comparison with its EQ-5D-3L equivalent, showed similar or better psychometric properties within the general population of a country. We assessed the construct validity of the questionnaire with a novel approach that was based on a machine learning technique known as the random forest algorithm.

Rainrate Estimation from FY-4A Cloud Top Temperature for Mesoscale Convective Systems by Using Machine Learning Algorithm

Remote Sensing ◽

10.3390/rs13163273 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3273

Author(s):

Ping Lao ◽

Qi Liu ◽

Yuhao Ding ◽

Yu Wang ◽

Yuan Li ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Mesoscale Convective Systems ◽

Support Vector ◽

Positive Contribution ◽

Machine Learning Technique ◽

Convective Systems ◽

Learning Technique ◽

Mesoscale Convective ◽

Cloud Top Temperature

Satellite rainrate estimation is a great challenge, especially in mesoscale convective systems (MCSs), mainly because there is no direct physical connection between observable cloud parameters and surface rainrate. A machine learning technique was employed in this study to estimate rainrate in the MCS domain using cloud top temperature (CTT) derived from a geostationary satellite. Five kinds of machine learning models were investigated, i.e., polynomial regression, support vector machine, decision tree, random forest, and multilayer perceptron, with the precipitation from the Climate Prediction Center morphing technique (CMORPH) used as the reference. A total of 31 CTT-related features were designed as potential inputs for training the algorithm, and all of them were shown to contribute positively to it. Random forest (RF) showed the best performance among the five kinds of models. By combining the classification and regression schemes of the RF model, an RF-based hybrid algorithm was proposed that first identifies rainy pixels and then estimates their rainrate. For the MCS samples considered in this study, this algorithm produced the best estimates, and its accuracy was clearly higher than that of the operational FY-4A precipitation product. These results demonstrate the promising feasibility of applying machine learning techniques to the satellite precipitation retrieval problem.
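
The two-stage hybrid idea described above (first classify a pixel as rainy, then regress its rainrate) can be sketched independently of the learner. In this minimal sketch, the hypothetical `is_rainy` and `estimate_rate` callables stand in for the trained RF classifier and RF regressor; the single-feature toy models and the 230 K threshold are illustrative assumptions, not values from the paper, which used 31 CTT-related features.

```python
from typing import Callable, List, Sequence

Features = Sequence[float]


def hybrid_rainrate(pixels: Sequence[Features],
                    is_rainy: Callable[[Features], bool],
                    estimate_rate: Callable[[Features], float]) -> List[float]:
    """Stage 1 decides rain/no-rain per pixel; stage 2 estimates a rainrate only for rainy pixels."""
    rates: List[float] = []
    for x in pixels:
        if is_rainy(x):
            # Clamp so a regressor can never return a negative rainrate.
            rates.append(max(0.0, estimate_rate(x)))
        else:
            rates.append(0.0)  # non-rainy pixels are assigned zero rain
    return rates


# Hypothetical stand-ins for trained models: feature 0 is a cloud top temperature in K.
def toy_is_rainy(x: Features) -> bool:
    return x[0] < 230.0


def toy_rate(x: Features) -> float:
    return 0.5 * (230.0 - x[0])


print(hybrid_rainrate([[250.0], [220.0], [200.0]], toy_is_rainy, toy_rate))  # [0.0, 5.0, 15.0]
```

Separating the stages lets the regressor concentrate on rainy pixels instead of also having to reproduce the many zero-rain pixels; the clamp to 0.0 is a defensive choice in this sketch only.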

The effects of applying filters on EEG signals for classifying developers’ code comprehension

Journal of Applied Research and Technology ◽

10.22201/icat.24486736e.2021.19.6.1299 ◽

2021 ◽

Vol 19 (6) ◽

pp. 584-602

Author(s):

Lucian Jose Gonçales ◽

Kleinner Farias ◽

Lucas Kupssinskü ◽

Matheus Segalotto

Keyword(s):

Machine Learning ◽

Software Engineering ◽

Random Forest ◽

Random Forest Classifier ◽

Machine Learning Techniques ◽

Eeg Signals ◽

Machine Learning Technique ◽

Learning Techniques ◽

Learning Technique ◽

F Measure

EEG signals are a relevant indicator for measuring aspects related to human factors in Software Engineering. In software engineering, EEG is used to train machine learning techniques for a wide range of applications, including classifying task difficulty and developers’ level of experience. The EEG signal contains noise such as abnormal readings, electrical interference, and eye movements, which are usually not of interest to the analysis and therefore reduce the precision of the machine learning techniques. However, research in software engineering has not demonstrated the effectiveness of applying filters to EEG signals. The objective of this work is to analyze the effectiveness of filters on EEG signals in the software engineering context. As the literature has not focused on classifying developers' code comprehension, this study analyzes the effectiveness of applying EEG filters when training a machine learning technique to classify developers' code comprehension. A Random Forest (RF) machine learning technique was trained with filtered EEG signals to classify developers' code comprehension, and another random forest classifier was trained with unfiltered EEG data. Both models were trained using 10-fold cross-validation, and their effectiveness was measured with the f-measure metric. The t-test, the Wilcoxon test, and the Mann-Whitney U test were used to analyze the difference in effectiveness (f-measure) between the classifier trained with filtered EEG and the classifier trained with unfiltered EEG. The tests showed a significant difference after applying EEG filters to classify developers' code comprehension with the random forest classifier. The conclusion is that the use of EEG filters significantly improves the effectiveness of classifying code comprehension with the random forest technique.
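
The evaluation protocol described above (10-fold cross-validation scored with the f-measure) can be sketched in plain Python. The contiguous, unshuffled folds and the binary positive label are assumptions; the abstract does not state how the folds were formed or how many comprehension classes were used. Per-fold f-measure values are the kind of samples that the statistical tests named above compare between the filtered and unfiltered classifiers.

```python
from typing import Hashable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds and return (train, test) index pairs.

    Fold sizes differ by at most one, and every index appears in exactly one test fold.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def f_measure(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1) -> float:
    """Binary F1 score: harmonic mean of precision and recall for the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


splits = k_fold_indices(25, k=10)
print([len(test) for _, test in splits])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
print(f_measure([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```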

Multi-task learning framework for predicting water quality using non-linear machine learning technique

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-212117 ◽

2021 ◽

pp. 1-13

Author(s):

D. Senthilkumar ◽

D. George Washington ◽

A.K. Reshmy ◽

M. Noornisha

Keyword(s):

Machine Learning ◽

Water Quality ◽

Prediction Model ◽

Quality Prediction ◽

Machine Learning Technique ◽

Water Quality Prediction ◽

Task Learning ◽

Learning Technique ◽

Non Linear ◽

Linear Machine

Predicting water quality is a very important issue for an ecosystem, and it can be used to control the increase of water contamination. Water quality prediction is also a prominent, complex, non-linear multi-target learning problem, and extracting a relevant subset of features from a large number of features with multiple targets is a challenging task. Existing water quality prediction models neither address multi-target learning simultaneously nor identify the non-linear relationships between the features and the target variables. Therefore, this study proposes a multi-task learning method that handles multi-target regression using a non-linear machine learning technique. Experiments are conducted to build a prediction model based on the proposed method and to evaluate its accuracy on a water quality dataset. The experimental results indicate that the method increases the overall accuracy on the experimental dataset compared with existing methods while using a reduced number of significant features.

Genre e-sport gaming tournament classification using machine learning technique based on decision tree, Naïve Bayes, and random forest algorithm

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/1088/1/012037 ◽

2021 ◽

Vol 1088 (1) ◽

pp. 012037

Author(s):

Arif Rinaldi Dikananda ◽

Irfan Ali ◽

Fathurrohman ◽

Rizki Ade Rinaldi ◽

Iin

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Random Forest Algorithm ◽

Machine Learning Technique ◽

Learning Technique
