Development and Evaluation of the Combined Machine Learning Models for the Prediction of Dam Inflow

Predicting dam inflow is necessary for effective water management. This study developed machine learning models to predict inflow into the Soyang River Dam in South Korea, using 40 years of weather and dam inflow data. Six algorithms were used: decision tree (DT), multilayer perceptron (MLP), random forest (RF), gradient boosting (GB), recurrent neural network–long short-term memory (RNN–LSTM), and convolutional neural network–LSTM (CNN–LSTM). Among these, the MLP model gave the best dam inflow predictions, with a Nash–Sutcliffe efficiency (NSE) of 0.812, root mean squared error (RMSE) of 77.218 m³/s, mean absolute error (MAE) of 29.034 m³/s, correlation coefficient (R) of 0.924, and coefficient of determination (R²) of 0.817. However, when dam inflow was below 100 m³/s, the ensemble models (RF and GB) outperformed the MLP. Therefore, two combined machine learning (CombML) models (RF_MLP and GB_MLP) were developed, using the ensemble method (RF or GB) at precipitation below 16 mm and the MLP at precipitation above 16 mm. The 16 mm threshold is the average daily precipitation at an inflow of 100 m³/s or more. Verification gave NSE 0.857, RMSE 68.417 m³/s, MAE 18.063 m³/s, R 0.927, and R² 0.859 for RF_MLP, and NSE 0.829, RMSE 73.918 m³/s, MAE 18.093 m³/s, R 0.912, and R² 0.831 for GB_MLP, which implies that the combined models predict dam inflow most accurately. The CombML results show that combining several machine learning algorithms makes it possible to predict inflow while accounting for flow characteristics such as flow regimes.
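
The error measures and the precipitation-based switching described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the callable model interface, and the choice to send a sample at exactly 16 mm to the MLP are assumptions, since the abstract only says "below" and "above" 16 mm.

```python
import math

PRECIP_THRESHOLD_MM = 16.0  # switching threshold quoted in the abstract


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SS_residual / SS_about_observed_mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def rmse(observed, predicted):
    """Root mean squared error, in the units of the data (here m^3/s)."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def mae(observed, predicted):
    """Mean absolute error, in the units of the data."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def combined_predict(precip_mm, features, low_flow_model, high_flow_model):
    """Route one sample to the ensemble model or the MLP by daily precipitation.

    low_flow_model and high_flow_model are hypothetical callables that map a
    feature vector to a predicted inflow. A value of exactly 16 mm goes to the
    high-flow model, which is an assumption.
    """
    if precip_mm < PRECIP_THRESHOLD_MM:
        return low_flow_model(features)
    return high_flow_model(features)
```

An NSE of 1 means a perfect fit, and an NSE of 0 means the model is no better than always predicting the observed mean, which is why it is reported alongside RMSE and MAE.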

Comparison of Machine Learning Algorithms for Discharge Prediction of Multipurpose Dam

Water ◽

10.3390/w13233369 ◽

2021 ◽

Vol 13 (23) ◽

pp. 3369

Author(s):

Jiyeong Hong ◽

Seoro Lee ◽

Gwanjae Lee ◽

Dongseok Yang ◽

Joo Hyun Bae ◽

...

Keyword(s):

Machine Learning ◽

Mean Squared Error ◽

Learning Algorithms ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Physical Models ◽

Gradient Boosting ◽

Activity Schedules ◽

Discharge Data ◽

Dam Inflow

For effective water management in the downstream area of a dam, it is necessary to estimate the amount of discharge from the dam to quantify the flow downstream of the dam. In this study, a machine learning model was constructed to predict the amount of discharge from Soyang River Dam using precipitation and dam inflow/discharge data from 1980 to 2020. Decision tree, multilayer perceptron, random forest, gradient boosting, RNN-LSTM, and CNN-LSTM were used as algorithms. The RNN-LSTM model achieved a Nash–Sutcliffe efficiency (NSE) of 0.796, root-mean-squared error (RMSE) of 48.996 m³/s, mean absolute error (MAE) of 10.024 m³/s, R of 0.898, and R² of 0.807, showing the best results in dam discharge prediction. The prediction of dam discharge using machine learning algorithms showed that it is possible to predict the amount of discharge, addressing limitations of physical models, such as the difficulty in applying human activity schedules and the need for various input data.

Techniques for Detecting Malware Traffic: A Comprehensive Approach to Feature Selection and Classification

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39088 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1-10

Author(s):

Harsha A K

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Learning Algorithms ◽

Malware Detection ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Steady Increase ◽

Extreme Gradient Boosting

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to malware detection, such as packet content analysis, are ineffective on encrypted data. In the absence of actual packet contents, we can use other features such as packet size, arrival time, source and destination addresses, and similar metadata to detect malware. Such information can be used to train machine learning classifiers to distinguish malicious from benign packets. In this paper, we offer an efficient malware detection approach using machine learning classification algorithms such as support vector machine, random forest, and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set to assess their respective performances. We further tune the hyperparameters of the algorithms to achieve better results. Random forest and extreme gradient boosting performed exceptionally well in our experiments, with area under the curve values of 0.9928 and 0.9998, respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.
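
The area under the ROC curve reported above has a simple rank interpretation: it is the probability that a randomly chosen malicious sample receives a higher score than a randomly chosen benign one. The sketch below computes it that way in plain Python. The function name and the 0/1 label encoding are assumptions for illustration and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (e.g. malicious), 0 for the negative class.
    scores: classifier scores, higher meaning "more likely positive".
    Tied pairs count as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))
```

The pairwise loop is quadratic in the number of samples, so it is meant to show the definition rather than to score a large traffic capture.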

Abstract 13455: Predicting Emergency Department Disposition at Triage for Suspected Patients With Acute Coronary Syndrome Using Machine Learning Algorithms

Circulation ◽

10.1161/circ.142.suppl_3.13455 ◽

2020 ◽

Vol 142 (Suppl_3) ◽

Author(s):

Stephanie O Frisch ◽

Zeineb Bouzid ◽

Jessica Zègre-Hemsey ◽

Clifton W CALLAWAY ◽

Holli A Devon ◽

...

Keyword(s):

Machine Learning ◽

Acute Coronary Syndrome ◽

Critical Care ◽

Random Forest ◽

Characteristic Curve ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Coronary Syndrome ◽

Critical Care Admission

Introduction: Overcrowded emergency departments (ED) and undifferentiated patients make the provision of care and resources challenging. We examined whether machine learning algorithms could identify ED patients’ disposition (hospitalization and critical care admission) using readily available objective triage data among patients with symptoms suggestive of acute coronary syndrome (ACS). Methods: This was a retrospective observational cohort study of adult patients who were triaged at the ED for a suspected coronary event. A total of 162 input variables (k) were extracted from the electronic health record: demographics (k=3), mode of transportation (k=1), past medical/surgical history (k=57), first ED vital signs (k=7), home medications (k=31), symptomology (k=40), and the computer generated automatic interpretation of 12-lead electrocardiogram (k=23). The primary outcomes were hospitalization and critical care admission (i.e., admission to intensive or step-down care unit). We used 10-fold stratified cross validation to evaluate the performance of five machine learning algorithms to predict the study outcomes: logistic regression, naïve Bayes, random forest, gradient boosting and artificial neural network classifiers. We determined the best model by comparing the area under the receiver operating characteristic curve (AUC) of all models. Results: Included were 1201 patients (age 64±14, 39% female; 10% Black) with a total of 956 hospitalizations, and 169 critical care admissions. The best performing machine learning classifier for the outcome of hospitalization was gradient boosting machine with an AUC of 0.85 (95% CI, 0.82–0.89), 89% sensitivity, and F-score of 0.83; random forest classifier performed the best for the outcome of critical care admission with an AUC of 0.73 (95% CI, 0.70–0.77), 76% sensitivity, and F-score of 0.56. 
Conclusion: Predictive machine learning algorithms demonstrate excellent to good discriminative power to predict hospitalization and critical care admission, respectively. Administrators and clinicians could benefit from machine learning approaches to predict hospitalization and critical care admission, to optimize and allocate scarce ED and hospital resources and provide optimal care.
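
Stratified 10-fold cross-validation, as used in this study, keeps the proportion of each outcome class roughly equal in every fold, which matters when one outcome (such as critical care admission) is rare. The following is a minimal pure-Python sketch of the splitting step only. The function name, the seed parameter, and the round-robin assignment are illustrative assumptions, not the authors' procedure.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for stratified k-fold CV.

    Indices of each class are shuffled and then dealt round-robin into k
    folds, so every fold receives a near-equal share of every class.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(
            index for j, fold in enumerate(folds) if j != i for index in fold
        )
        yield train, test
```

Each model would then be fitted on the training indices and scored on the held-out fold, with the AUC averaged over the folds.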

RESEARCH OF APPLICATIONS OF MACHINE LEARNING ALGORITHMS IN IMPROVING OPC SOLUTIONS

International Forum “Microelectronics – 2020”. Joung Scientists Scholarship “Microelectronics – 2020”. XIII International conference «Silicon – 2020». XII young scientists scholarship for silicon nanostructures and devices physics, material science, process and analysis ◽

10.29003/m1647.silicon-2020/350-354 ◽

2020 ◽

Author(s):

Pavel Tryasoguzov ◽

Georgiy Teplov ◽

Alexey Kuzovkov

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Network Models ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Neural Network Models ◽

Machine Learning Methods ◽

Applications Of Machine Learning ◽

Topological Drawing

In this paper, the effectiveness of machine learning methods for solving OPC problems was considered. The task was to determine the direction and the amount of displacement of the boundary of a segment of the topological drawing. The generated training database was used to train regression, random forest, gradient boosting, and feedforward convolutional neural network models.

Decomposition-Based Soil Moisture Estimation Using UAVSAR Fully Polarimetric Images

Agronomy ◽

10.3390/agronomy11010145 ◽

2021 ◽

Vol 11 (1) ◽

pp. 145

Author(s):

Zeinab Akhavan ◽

Mahdi Hasanlou ◽

Mehdi Hosseini ◽

Heather McNairn

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Soil Moisture ◽

Random Forest ◽

Performance Enhancement ◽

Learning Algorithms ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Eigenvalue And Eigenvector ◽

Moisture Estimation

Polarimetric decomposition extracts scattering features that are indicative of the physical characteristics of the target. In this study, three polarimetric decomposition methods were tested for soil moisture estimation over agricultural fields using machine learning algorithms. Features extracted from the model-based Freeman–Durden, eigenvalue- and eigenvector-based H/A/α, and Van Zyl decompositions were used as inputs to random forest and neural network regression algorithms. These algorithms were applied to retrieve soil moisture over soybean, wheat, and corn fields. A time series of polarimetric Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) data acquired during the Soil Moisture Active Passive Experiment 2012 (SMAPVEX12) field campaign was used for the training and validation of the algorithms. Three feature selection methods were tested to determine the best input features for the machine learning algorithms. The most accurate soil moisture estimates were derived from the random forest regression algorithm for soybeans, with a coefficient of determination (R²) of 0.86, root mean square error (RMSE) of 0.041 m³ m⁻³, and mean absolute error (MAE) of 0.030 m³ m⁻³. Feature selection also impacted results. Some features, such as anisotropy, Horizontal transmit and Horizontal receive (HH), and surface roughness parameters (correlation length and RMS-H), improved the performance of all algorithms because these parameters have a direct impact on the backscattered signal.

Classification of hazelnut cultivars: comparison of DL4J and ensemble learning algorithms

Notulae Botanicae Horti Agrobotanici Cluj-Napoca ◽

10.15835/nbha48412041 ◽

2020 ◽

Vol 48 (4) ◽

pp. 2316-2327

Author(s):

Caner KOC ◽

Dilara GERDAN ◽

Maksut B. EMİNOĞLU ◽

Uğur YEGÜL ◽

Bulent KOC ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Random Forest ◽

Ensemble Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Performance Criteria ◽

Gradient Boosting ◽

Data Set

Classification of hazelnuts is one of the value-adding processes that increase the marketability and profitability of hazelnut production. While traditional classification methods are commonly used, machine learning and deep learning can be implemented to enhance hazelnut classification. This paper presents the results of a comparative study of machine learning frameworks to classify hazelnut (Corylus avellana L.) cultivars (‘Sivri’, ‘Kara’, ‘Tombul’) using DL4J and ensemble learning algorithms. For each cultivar, 50 samples were used for evaluations. Maximum length, width, compression strength, and weight of hazelnuts were measured using a caliper and a force transducer. Gradient boosting machine (Boosting), random forest (Bagging), and DL4J feedforward (Deep Learning) algorithms were applied. The models were evaluated using 10-fold cross-validation. The classifier performance criteria of accuracy (%), error percentage (%), F-Measure, Cohen’s Kappa, recall, precision, true positive (TP), false positive (FP), true negative (TN), and false negative (FN) values are provided in the results section. The results showed classification accuracies of 94% for Gradient Boosting, 100% for Random Forest, and 94% for DL4J Feedforward algorithms.
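
Several of the performance criteria listed above (precision, recall, F-Measure, Cohen's Kappa) can be derived from a single confusion matrix. The sketch below shows one way to compute them for a multi-class problem. The function names are assumptions, and the matrix in the usage note is invented for illustration; it is not data from the paper.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as
    class j. Kappa compares observed agreement with the agreement expected
    by chance from the row and column totals.
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(size)) / total
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)


def per_class_metrics(confusion, cls):
    """Return (precision, recall, F-measure) for one class index."""
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(row[cls] for row in confusion) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For a made-up three-class matrix such as `[[45, 5, 0], [3, 44, 3], [0, 4, 46]]` with 50 samples per class, accuracy is 135/150 and chance agreement is 1/3, so kappa is lower than the raw accuracy.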

Implementation of the solution to the oil displacement problem using machine learning classifiers and neural networks

Eastern-European Journal of Enterprise Technologies ◽

10.15587/1729-4061.2021.241858 ◽

2021 ◽

Vol 5 (4 (113)) ◽

pp. 55-63

Author(s):

Beimbet Daribayev ◽

Aksultan Mukhanbet ◽

Yedil Nurakhov ◽

Timur Imankulov

Keyword(s):

Neural Network ◽

Machine Learning ◽

Neural Networks ◽

Learning Algorithms ◽

High Accuracy ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Machine Learning Classifiers ◽

Oil Displacement ◽

Learning Classifiers

The problem of oil displacement was solved using neural networks and machine learning classifiers. The Buckley-Leverett model, which describes the process of oil displacement by water, was selected. It consists of the continuity equations for the oil and water phases and Darcy’s law. The challenge is to optimize the oil displacement problem. Optimization is performed at three levels: vectorization of calculations; implementation of classical algorithms; implementation of the algorithm using neural networks. A feature of the proposed method is the identification of the approach with the highest accuracy and the smallest errors by comparing the results of machine learning classifiers and types of neural networks. The paper is also one of the first in which machine learning classifiers were compared with neural and recurrent neural networks. The classification was carried out with three classification algorithms: decision tree, support vector machine (SVM), and gradient boosting. As a result of the study, the Gradient Boosting classifier and the neural network showed high accuracy, 99.99 % and 97.4 %, respectively. The recurrent neural network trained faster than the others. The SVM classifier had the lowest accuracy score. To achieve this goal, a dataset was created containing over 67,000 data for class 10. These data are important for the problems of oil displacement in porous media. The proposed methodology provides a simple and elegant way to instill oil knowledge into machine learning algorithms. This removes two of the most significant drawbacks of machine learning algorithms: the need for large datasets and the robustness of extrapolation. The presented principles can be generalized in countless ways in the future and should lead to a new class of algorithms for solving both forward and inverse oil problems.
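
For reference, the Buckley-Leverett model named in this abstract is usually written as follows in one dimension. This is the standard textbook form under the usual assumptions (incompressible phases, no gravity, no capillary pressure). The symbols are introduced here for illustration, and the abstract does not state which exact formulation the authors solved.

```latex
% Continuity for each phase (\alpha = w for water, o for oil),
% with porosity \phi and saturations S_w + S_o = 1:
\phi \frac{\partial S_\alpha}{\partial t} + \frac{\partial u_\alpha}{\partial x} = 0,
\qquad S_w + S_o = 1

% Darcy's law for each phase, with absolute permeability k,
% relative permeability k_{r\alpha}(S_w), and viscosity \mu_\alpha:
u_\alpha = -\frac{k \, k_{r\alpha}(S_w)}{\mu_\alpha} \frac{\partial p}{\partial x}

% Eliminating the pressure with the total velocity u = u_w + u_o
% gives a single transport equation for the water saturation:
\frac{\partial S_w}{\partial t} + \frac{u}{\phi} \frac{\partial f_w(S_w)}{\partial x} = 0,
\qquad
f_w = \frac{k_{rw}/\mu_w}{k_{rw}/\mu_w + k_{ro}/\mu_o}
```

Here $f_w$ is the fractional flow of water, the quantity that makes the transport equation nonlinear.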

Application of Data Mining Algorithms for Dementia in People with HIV/AIDS

Computational and Mathematical Methods in Medicine ◽

10.1155/2021/4602465 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Luana Ibiapina Cordeiro Calíope Pinheiro ◽

Maria Lúcia Duarte Pereira ◽

Marcial Porto Fernandez ◽

Francisco Mardônio Vieira Filho ◽

Wilson Jorge Correia Pinto de Abreu ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Data Mining ◽

Logistic Regression ◽

Random Forest ◽

Decision Tree ◽

Learning Algorithms ◽

Principal Component ◽

Machine Learning Algorithms ◽

Hiv Aids

Dementia interferes with an individual’s motor, behavioural, and intellectual functions, leaving them unable to perform instrumental activities of daily living. This study aimed to identify, through data mining, the best performing algorithm and the most relevant characteristics for categorising individuals with HIV/AIDS at high risk of dementia. Principal component analysis (PCA) was applied, and the following machine learning algorithms were compared: logistic regression, decision tree, neural network, KNN, and random forest. The database used for this study was built from data collected from 270 individuals infected with HIV/AIDS and followed up at the outpatient clinic of a reference hospital for infectious and parasitic diseases in the State of Ceará, Brazil, from January to April 2019. The performance of the algorithms was first analysed for the 104 characteristics available in the database; after dimensionality reduction, the quality of the machine learning algorithms improved in the tests, even though about 30% of the variation was lost. When considering only 23 characteristics, the precision of the algorithms was 86% for random forest, 56% for logistic regression, 68% for decision tree, 60% for KNN, and 59% for neural network. The random forest algorithm proved to be more effective than the others, obtaining 84% precision and 86% accuracy.
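
The trade-off mentioned above, fewer characteristics at the cost of some lost variation, is commonly handled by keeping the smallest number of principal components whose eigenvalues reach a target share of the total variance. The sketch below shows that selection rule only. The function name and the 70% default are assumptions chosen to echo the "about 30%" figure, and the abstract does not say that the authors selected components this way.

```python
def components_for_variance(eigenvalues, target=0.70):
    """Return (count, retained_share) for the fewest components meeting target.

    eigenvalues: variances along each principal component (any order).
    target: required cumulative share of total variance, between 0 and 1.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return count, cumulative / total
    return len(ordered), 1.0
```

With hypothetical eigenvalues `[5, 3, 1, 1]`, two components already retain 80% of the variance, so the remaining two could be dropped.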

Comparative analysis of machine learning algorithms in water extraction

Journal of Physics Conference Series ◽

10.1088/1742-6596/2076/1/012045 ◽

2021 ◽

Vol 2076 (1) ◽

pp. 012045

Author(s):

Aimin Li ◽

Meng Fan ◽

Guangduo Qin

Keyword(s):

Neural Network ◽

Machine Learning ◽

Logistic Regression ◽

Comparative Analysis ◽

Random Forest ◽

Decision Tree ◽

Water Body ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector

Abstract: There are many traditional methods available for water body extraction based on remote sensing images, such as the normalised difference water index (NDWI), modified NDWI (MNDWI), and the multi-band spectrum method, but the accuracy of these methods is limited. In recent years, machine learning algorithms have developed rapidly and been applied widely. In the present research, machine learning models including decision tree, logistic regression, random forest, neural network, support vector machine (SVM), and XGBoost were applied to Landsat-8 images. Parameters were determined for each model through cross validation and a grid search method. Moreover, the merits and demerits of the models in water body extraction were discussed, and a comparative analysis was performed with three methods for determining thresholds in the traditional NDWI. The results show that the neural network performs excellently and is a stable model, followed by the SVM and the logistic regression algorithm. Furthermore, the ensemble algorithms, including random forest and XGBoost, were affected by sample distribution, and the decision tree model returned the poorest performance.
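
The traditional baseline in this comparison, NDWI with a threshold, is simple enough to show directly. NDWI is (Green − NIR) / (Green + NIR) and MNDWI replaces the near-infrared band with a shortwave-infrared band. The sketch below uses plain nested lists of reflectance values. The function names and the default threshold of 0 are assumptions, and the three threshold-selection methods compared in the paper are not specified in the abstract.

```python
def ndwi(green, nir):
    """Normalised difference water index for one pixel."""
    total = green + nir
    return 0.0 if total == 0 else (green - nir) / total


def mndwi(green, swir):
    """Modified NDWI: same form, with shortwave infrared instead of NIR."""
    total = green + swir
    return 0.0 if total == 0 else (green - swir) / total


def water_mask(green_band, nir_band, threshold=0.0):
    """Label a pixel as water when its NDWI exceeds the threshold.

    green_band and nir_band are equally shaped lists of rows of reflectance.
    The default threshold of 0 is a common starting point, used here only
    as an assumption.
    """
    return [
        [ndwi(g, n) > threshold for g, n in zip(green_row, nir_row)]
        for green_row, nir_row in zip(green_band, nir_band)
    ]
```

Water reflects more green than near-infrared light, so water pixels tend toward positive NDWI while vegetation and soil tend toward negative values.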

Comparative analysis of multiple machine learning algorithms for epileptic seizure prediction

Journal of Physics Conference Series ◽

10.1088/1742-6596/2161/1/012055 ◽

2022 ◽

Vol 2161 (1) ◽

pp. 012055

Author(s):

H O Lekshmy ◽

Dhanyalaxmi Panickar ◽

Sandhya Harikumar

Keyword(s):

Machine Learning ◽

Random Forest ◽

Short Term Memory ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Seizure Prediction ◽

Short Term ◽

K Nearest Neighbors ◽

Term Memory ◽

Long Short Term Memory

Abstract: Epilepsy is a common neurological disease that affects more than 2 percent of the population globally. An imbalance in brain electrical activities causes unpredictable seizures, which eventually leads to epilepsy. Neurostimulators have the power to intervene in advance and avoid the occurrence of seizures. Their efficiency can be increased with the help of heuristics such as advance seizure prediction. Early identification of the preictal state helps activate the neurostimulator in time. This research concentrates on the performance analysis of various machine learning algorithms on recorded EEG data. Through this study, we aim to find the best model, which can be used to create an ensemble model for better learning. This involves modeling and simulation of classical machine learning techniques such as logistic regression, naive Bayes, K nearest neighbors, and random forest, and deep learning techniques such as artificial neural networks, convolutional neural networks, long short-term memory, and autoencoders. In this analysis, Random Forest and Long Short-Term Memory performed well among all models in terms of sensitivity and specificity.
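
Sensitivity and specificity, the two criteria used to rank the models here, can be computed from binary predictions as sketched below. The function name and the encoding (1 for a preictal segment, 0 for any other segment) are assumptions for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels encoded as 0/1.

    Sensitivity is the share of true positives that were detected.
    Specificity is the share of true negatives that were correctly rejected.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity
```

For seizure prediction the two pull in different directions: missing a preictal state lowers sensitivity, while false alarms that trigger the neurostimulator unnecessarily lower specificity.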
