Machine learning techniques to predict daily rainfall amount

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Chalachew Muluken Liyew ◽  
Haileyesus Amsaya Melese

Abstract Predicting the amount of daily rainfall improves agricultural productivity and secures the food and water supply needed to keep citizens healthy. To predict rainfall, several studies have been conducted using data mining and machine learning techniques on environmental datasets from different countries. An erratic rainfall distribution in the country affects the agriculture on which its economy depends. Rainfall water should be used wisely, and such use planned and practiced across the country, to minimize the problems of drought and flooding. The main objective of this study is to identify the relevant atmospheric features that cause rainfall and to predict the intensity of daily rainfall using machine learning techniques. The Pearson correlation technique was used to select the relevant environmental variables, which served as inputs to the machine learning models. The dataset was collected from the local meteorological office at Bahir Dar City, Ethiopia, to measure the performance of three machine learning techniques (Multivariate Linear Regression, Random Forest, and Extreme Gradient Boosting). Root mean squared error (RMSE) and mean absolute error (MAE) were used to measure the performance of the machine learning models. The results of the study revealed that the Extreme Gradient Boosting algorithm performed better than the others.
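The feature-selection and evaluation steps described above, Pearson correlation for input screening and RMSE/MAE for model scoring, can be sketched in a few lines. The data below are toy values; the study's actual inputs came from the Bahir Dar meteorological records.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rmse(actual, pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

humidity = [60, 65, 70, 75, 80]           # hypothetical predictor
rain_mm  = [2.0, 3.0, 4.0, 5.0, 6.0]      # perfectly linear toy relation
print(round(pearson_r(humidity, rain_mm), 3))  # 1.0 for this toy data
```

Features whose correlation with rainfall falls below a chosen threshold would be dropped before model training.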


2021 ◽  
Author(s):  
Eric Sonny Mathew ◽  
Moussa Tembely ◽  
Waleed AlAmeri ◽  
Emad W. Al-Shalabi ◽  
Abdul Ravoof Shaik

Abstract A meticulous interpretation of steady-state or unsteady-state relative permeability (Kr) experimental data is required to determine a complete set of Kr curves. In this work, three different machine learning models were developed to assist in a faster estimation of these curves from steady-state drainage coreflooding experimental runs. The three models that were tested and compared were the extreme gradient boosting (XGB), deep neural network (DNN), and recurrent neural network (RNN) algorithms. Based on existing mathematical models, a leading-edge framework was developed in which a large database of Kr and Pc curves was generated. This database was used to perform thousands of coreflood simulation runs representing oil-water drainage steady-state experiments. The results obtained from these simulation runs, mainly the pressure drop along with other conventional core analysis data, were utilized to estimate Kr curves based on Darcy's law. These analytically estimated Kr curves, along with the previously generated Pc curves, were fed as features into the machine learning models. The entire dataset was split into 80% for training and 20% for testing. A k-fold cross-validation technique was applied to increase model accuracy by splitting the 80% training portion into 10 folds. In this manner, for each of the 10 experiments, 9 folds were used for training and the remaining one was used for model validation. Once trained and validated, the model was subjected to blind testing on the remaining 20% of the dataset. The machine learning model learns to capture the fluid flow behavior inside the core from the training dataset. The trained and tested model was then employed to estimate Kr curves from available experimental results. The performance of the developed model was assessed using the coefficient of determination (R2) along with the loss calculated during training and validation of the model.
The respective cross plots, along with comparisons of ground-truth versus AI-predicted curves, indicate that the model is capable of making accurate predictions, with error percentages between 0.2% and 0.6% when history matching experimental data, for all three tested ML techniques (XGB, DNN, and RNN). This implies that the AI-based model exhibits better efficiency and reliability in determining Kr curves than conventional methods. The results also include a comparison between classical machine learning approaches and shallow and deep neural networks in terms of accuracy in predicting the final Kr curves. The various models discussed in this work currently focus on the prediction of Kr curves for drainage steady-state experiments; however, the work can be extended to capture the imbibition cycle as well.
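The splitting scheme described above, an 80/20 hold-out plus 10-fold cross-validation on the training portion, reduces to index bookkeeping. A minimal sketch, assuming a contiguous (unshuffled) split:

```python
def train_test_split_indices(n, test_frac=0.2):
    """Split indices 0..n-1 into a training block and a blind-test block."""
    cut = int(n * (1 - test_frac))
    return list(range(cut)), list(range(cut, n))

def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists for each of k folds."""
    fold_size = len(indices) // k
    for i in range(k):
        val = indices[i * fold_size:(i + 1) * fold_size]
        train = [j for j in indices if j not in val]
        yield train, val

train_idx, test_idx = train_test_split_indices(100)   # 80 train, 20 blind test
folds = list(k_fold_indices(train_idx, k=10))
print(len(folds), len(folds[0][0]), len(folds[0][1]))  # 10 72 8
```

Each of the 10 folds serves once as the validation set; the 20 held-out indices are touched only for the final blind test.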


Water ◽  
2020 ◽  
Vol 12 (6) ◽  
pp. 1703 ◽  
Author(s):  
Joost P. den Bieman ◽  
Josefine M. Wilms ◽  
Henk F. P. van den Boogaard ◽  
Marcel R. A. van Gent

Wave overtopping is an important design criterion for coastal structures such as dikes, breakwaters and promenades. Hence, the prediction of the expected wave overtopping discharge is an important research topic. Existing prediction tools consist of empirical overtopping formulae, machine learning techniques like neural networks, and numerical models. In this paper, an innovative machine learning method—gradient boosting decision trees—is applied to the prediction of mean wave overtopping discharges. This new machine learning model is trained using the CLASH wave overtopping database. Optimizations to its performance are realized by using feature engineering and hyperparameter tuning. The model is shown to outperform an existing neural network model by reducing the error on the prediction of the CLASH database by a factor of 2.8. The model predictions follow physically realistic trends for variations of important features, and behave regularly in regions of the input parameter space with little or no data coverage.
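As a toy illustration of the gradient boosting decision tree idea, the following sketch boosts one-node regression trees (stumps) against squared-error residuals. The feature and target values are hypothetical stand-ins, not CLASH variables, and the paper itself used a tuned library implementation:

```python
def fit_stump(x, residuals):
    """Best single-threshold split minimizing squared error on the residuals."""
    best = None
    for t in sorted(set(x))[:-1]:                     # last value gives an empty right side
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda xi: lmean if xi <= t else rmean

def boost(x, y, rounds=200, lr=0.1):
    """Additively fit stumps to the current residuals, shrunk by lr."""
    pred = [0.0] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(lr * s(xi) for s in stumps)

x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]    # e.g. a dimensionless freeboard (toy)
y = [5.0, 3.0, 2.0, 1.0, 0.5, 0.2]    # e.g. overtopping discharge (toy)
model = boost(x, y)
print(abs(model(0.5) - 5.0) < 1.0)    # the boosted fit approaches the targets
```

Library implementations add deeper trees, regularization, and the hyperparameter tuning the paper describes, but the residual-fitting loop is the core mechanism.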


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Thuy-Anh Nguyen ◽  
Hai-Bang Ly ◽  
Binh Thai Pham

In the design process of foundations, pavements, retaining walls, and other geotechnical structures, the estimation of soil strength-related parameters is crucial. In particular, the friction angle is a critical shear strength factor in assessing the stability and deformation of geotechnical structures. In practice, laboratory or field tests are conducted to determine the friction angle of soil; however, these tests are often time-consuming and quite expensive. Therefore, the prediction of geo-mechanical properties of soils using machine learning techniques has been widely applied in recent times. In this study, a Bayesian regularization backpropagation neural network is built to predict the internal friction angle of soil based on 145 data points collected from experiments. The performance of the model is evaluated by three statistical criteria: the Pearson correlation coefficient (R), root mean square error (RMSE), and mean absolute error (MAE). The results show that the proposed algorithm performed well for the prediction of the friction angle of soil (R = 0.8885, RMSE = 0.0442, and MAE = 0.0328). It can therefore be concluded that the backpropagation neural network-based machine learning model is a reasonably accurate and useful prediction tool for engineers in the predesign phase.
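Bayesian regularization trains the network by minimizing a weighted sum of the squared prediction errors and the squared network weights, F = beta * E_D + alpha * E_W, rather than the data error alone. A minimal sketch of that objective (toy values; in the actual algorithm alpha and beta are adapted automatically during training):

```python
def regularized_objective(errors, weights, alpha=0.01, beta=1.0):
    """Bayesian-regularization objective F = beta * E_D + alpha * E_W."""
    e_d = sum(e * e for e in errors)    # sum of squared prediction errors
    e_w = sum(w * w for w in weights)   # sum of squared network weights
    return beta * e_d + alpha * e_w

# Penalizing large weights discourages overfitting, which matters on a
# dataset as small as the 145 samples used in this study.
print(regularized_objective(errors=[0.1, -0.2], weights=[1.5, -0.8, 0.3]))
```

Backpropagation then descends on F instead of E_D, trading a little training error for smoother, better-generalizing weights.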


Telecom ◽  
2022 ◽  
Vol 3 (1) ◽  
pp. 52-69
Author(s):  
Jabed Al Faysal ◽  
Sk Tahmid Mostafa ◽  
Jannatul Sultana Tamanna ◽  
Khondoker Mirazul Mumenin ◽  
Md. Mashrur Arifin ◽  
...  

In the past few years, Internet of Things (IoT) devices have evolved rapidly, and their use is increasing dramatically as they make our daily activities easier than ever. However, numerous security flaws persist in IoT devices because most of them lack the memory and computing resources necessary for adequate security operations. As a result, IoT devices are affected by a variety of attacks, and a single attack on network systems or devices can lead to significant damage to data security and privacy. Machine learning techniques, however, can be applied to detect IoT attacks. In this paper, a hybrid machine learning scheme called XGB-RF is proposed for detecting intrusion attacks. The proposed hybrid method was applied to the N-BaIoT dataset, which contains hazardous botnet attacks. Random forest (RF) was used for feature selection, and an eXtreme Gradient Boosting (XGB) classifier was used to detect different types of attacks on IoT environments. The performance of the proposed XGB-RF scheme was evaluated on several metrics, and the model successfully detects 99.94% of the attacks. Compared with state-of-the-art algorithms, the proposed model achieved better performance on every metric. As the proposed scheme is capable of detecting botnet attacks effectively, it can significantly contribute to reducing the security concerns associated with IoT systems.
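The two-stage structure of the XGB-RF scheme, importance-based feature selection followed by a classifier trained on the reduced feature set, can be sketched as follows. The feature names and importance scores are hypothetical stand-ins for RF-derived values on N-BaIoT:

```python
def select_top_features(importances, k):
    """Return the names of the k highest-importance features."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy importance scores standing in for a trained random forest's output:
rf_importances = {"pkt_rate": 0.41, "pkt_size": 0.08,
                  "iat_mean": 0.33, "proto": 0.18}
selected = select_top_features(rf_importances, k=2)
print(selected)  # ['pkt_rate', 'iat_mean']
```

The XGB classifier is then trained only on the selected columns, which reduces dimensionality and training cost before the final attack/benign decision.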


2019 ◽  
Author(s):  
KC Govinda ◽  
Md Mahmudulla Hassan ◽  
Suman Sirimulla

Abstract Kinases are one of the most important classes of drug targets for therapeutic use. Algorithms that can accurately predict the drug-kinase inhibitor constant (pKi) of kinases can considerably accelerate the drug discovery process. In this study, we developed computational models, leveraging machine learning techniques, to predict ligand-kinase pKi values. Kinase-ligand inhibitor constant (Ki) data were retrieved from the Drug Target Commons (DTC) and Metz databases. Machine learning models were developed based on structural and physicochemical features of the proteins and topological pharmacophore atomic triplet fingerprints of the ligands. Three machine learning models [random forest regression (RFR), extreme gradient boosting (XGBoost), and artificial neural network (ANN)] were tested for model development. The performance of our models was evaluated using several metrics with 95% confidence intervals. The RFR model was finally selected based on the evaluation metrics on the test datasets and used for the web implementation. The selected model achieved a Pearson correlation coefficient (R) of 0.887 (0.881, 0.893), root-mean-square error (RMSE) of 0.475 (0.465, 0.486), concordance index of 0.854 (0.851, 0.858), and an area under the receiver operating characteristic curve (AUC-ROC) of 0.957 (0.954, 0.960) during the internal 5-fold cross-validation.
Availability: GitHub: https://github.com/sirimullalab/KinasepKipred; Docker: sirimullalab/kinasepkipred
Implementation: https://drugdiscovery.utep.edu/pki/
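The 95% confidence intervals reported alongside each metric can be obtained with a percentile bootstrap; the paper does not state its exact CI procedure, so the bootstrap below is an assumption, shown with toy values:

```python
import random

def bootstrap_ci(values, stat, n_boot=1000, seed=0):
    """95% percentile-bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    stats = sorted(stat([rng.choice(values) for _ in values])
                   for _ in range(n_boot))
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot)]

def mean(xs):
    return sum(xs) / len(xs)

# Toy per-fold R values standing in for the cross-validation results:
r_samples = [0.86, 0.88, 0.89, 0.90, 0.87, 0.88]
lo, hi = bootstrap_ci(r_samples, mean)
print(lo <= mean(r_samples) <= hi)  # the point estimate lies inside the CI
```

Resampling the evaluation data with replacement and recomputing the metric many times gives an empirical distribution whose 2.5th and 97.5th percentiles bound the interval.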


Author(s):  
Parthiban Loganathan ◽  
Amit Baburao Mahindrakar

Abstract An intercomparison of streamflow simulation and discharge prediction using several well-known machine learning techniques was performed. A daily streamflow discharge model was developed for 35 observation stations located in the large-scale Cauvery river basin. Various hydrological indices were calculated for the observed and predicted discharges to compare and evaluate the replicability of local hydrological conditions. The variance and bias of the proposed extreme gradient boosting decision tree model were less than 15%, lower than those of the other machine learning techniques considered in this study. The model's Nash–Sutcliffe efficiency and coefficient of determination values are above 0.7 for both the training and testing phases, which demonstrates the effectiveness of the model's performance. The comparison of monthly observed and model-predicted discharges during the validation period illustrates the model's ability to represent the peaks and falls in high-, medium-, and low-flow zones. The assessment and comparison of hydrological indices between observed and predicted discharges illustrate the model's ability to represent the baseflow, high-spell, and low-spell statistics. Simulating streamflow and predicting discharge are essential for water resource planning and management, especially in large-scale river basins. The proposed machine learning technique demonstrates a significant improvement in model efficiency by reducing variance and bias, which in turn improves the replicability of local-scale hydrology.
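The Nash–Sutcliffe efficiency used to judge the model compares prediction error against the variance of the observations: NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2), with 1 indicating a perfect fit and values above 0.7 generally considered good. A minimal sketch with hypothetical discharge values:

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    m = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - m) ** 2 for o in observed)
    return 1 - num / den

obs = [10.0, 20.0, 30.0, 40.0]   # toy daily discharges
sim = [12.0, 19.0, 29.0, 41.0]   # toy model predictions
print(round(nash_sutcliffe(obs, sim), 3))  # 0.986
```

An NSE of 0 means the model is no better than predicting the mean observed discharge, which is why the paper's values above 0.7 are meaningful.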


2022 ◽  
Vol 15 (1) ◽  
pp. 35
Author(s):  
Shekar Shetty ◽  
Mohamed Musa ◽  
Xavier Brédart

In this study, we apply several advanced machine learning techniques, including extreme gradient boosting (XGBoost), support vector machine (SVM), and a deep neural network, to predict bankruptcy using easily obtainable financial data of 3728 Belgian Small and Medium Enterprises (SMEs) during the period 2002–2012. Using the above-mentioned machine learning techniques, we predict bankruptcies with a global accuracy of 82–83% using only three easily obtainable financial ratios: the return on assets, the current ratio, and the solvency ratio. While the prediction accuracy is similar to that of several previous models in the literature, our model is very simple to implement and represents an accurate and user-friendly tool to discriminate between bankrupt and non-bankrupt firms.
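The three input ratios can be computed directly from balance-sheet items. A minimal sketch with hypothetical figures; note that solvency-ratio definitions vary, so equity over total assets is an assumption here:

```python
def financial_ratios(net_income, total_assets, current_assets,
                     current_liabilities, equity):
    """The three predictors used by the bankruptcy models."""
    return {
        "return_on_assets": net_income / total_assets,
        "current_ratio": current_assets / current_liabilities,
        "solvency_ratio": equity / total_assets,   # one common definition
    }

# Toy balance-sheet figures (thousands of EUR):
r = financial_ratios(net_income=50, total_assets=1000,
                     current_assets=300, current_liabilities=150, equity=400)
print(r)  # {'return_on_assets': 0.05, 'current_ratio': 2.0, 'solvency_ratio': 0.4}
```

Each firm-year then becomes a three-dimensional feature vector, which is what keeps the model simple to implement.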


Information ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 514
Author(s):  
Fahad Rahman Amik ◽  
Akash Lanard ◽  
Ahnaf Ismat ◽  
Sifat Momen

Pre-owned cars (i.e., cars with one or more previous retail owners) are extremely popular in Bangladesh. Customers who plan to purchase a pre-owned car often struggle to find a car within budget as well as to predict the price of a particular pre-owned car. Currently, Bangladesh lacks online services that can assist customers purchasing pre-owned cars. A good prediction of pre-owned car prices can greatly help customers make an informed buying decision. In this article, we look into this problem and develop a forecasting system (using machine learning techniques) that helps a potential buyer estimate the price of a pre-owned car they are interested in. A dataset was collected and pre-processed, and exploratory data analysis was performed. Following that, various machine learning regression algorithms, including linear regression, LASSO (Least Absolute Shrinkage and Selection Operator) regression, decision tree, random forest, and extreme gradient boosting, were applied. After evaluating the performance of each method, the best-performing model (XGBoost) was chosen. This model predicts prices correctly more than 91% of the time. Finally, the model was deployed as a web application on a local machine so that it can later be made available to end users.
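The "more than 91% of the time" figure suggests an accuracy-within-tolerance criterion for the regression output. The exact criterion is not stated in the abstract, so the 10% relative tolerance below is an assumption, shown with toy prices:

```python
def accuracy_within(actual, predicted, tol=0.10):
    """Fraction of predictions within a relative tolerance of the true value."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) <= tol * a)
    return hits / len(actual)

actual    = [500000, 800000, 1200000, 650000]   # toy prices in BDT
predicted = [520000, 790000, 1400000, 640000]
print(accuracy_within(actual, predicted))  # 0.75 (3 of 4 within 10%)
```

Under this kind of criterion, a prediction counts as "correct" if it lands close enough to the listed price for a buyer's budgeting purposes.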


Author(s):  
Shihang Wang ◽  
Zongmin Li ◽  
Yuhong Wang ◽  
Qi Zhang

This research provides a general methodology for distinguishing disaster-related anti-rumor spreaders from a non-ignorant population base with strong connections in their social circles. Several important influencing factors are examined and illustrated. User information from the most recent microblog posts of 3793 Sina Weibo users was collected. Natural language processing (NLP) was used for the sentiment and short-text similarity analyses, and four machine learning techniques, i.e., logistic regression (LR), support vector machines (SVM), random forest (RF), and extreme gradient boosting (XGBoost), were compared on different rumor-refuting microblogs, after which a valid and robust XGBoost model was trained and validated to predict who would retweet disaster-related rumor-refuting microblogs. Compared with traditional prediction variables that use only user profile information, the similarity and sentiment analyses of the most recent user microblog contents were found to significantly improve prediction precision and robustness. The number of user microblogs also proved to be a valuable reference for all samples during the prediction process. This prediction methodology could be even more useful for WeChat or Facebook, as these have relatively stable closed-loop communication channels, which means that rumors are more likely to be refuted by acquaintances. The methodology will therefore be further optimized and validated on WeChat-like channels in the future. The novel rumor-refuting approach presented in this research harnessed NLP for user microblog content analysis and then used the NLP results as additional prediction variables to identify anti-rumor spreaders. Compared to previous studies, this study therefore presents a new and effective decision support for rumor countermeasures.
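The short-text similarity feature can be illustrated with a simple token-overlap (Jaccard) score. The paper's actual NLP similarity measure on Chinese microblogs is more sophisticated, so this is only a structural sketch with toy English text:

```python
def jaccard_similarity(text_a, text_b):
    """Token-overlap similarity: |A ∩ B| / |A ∪ B| over word sets."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b)

s = jaccard_similarity("flood rumor refuted by officials",
                       "officials refuted the flood rumor")
print(round(s, 2))  # 0.67
```

A user whose recent posts score high against a rumor-refuting microblog is, per the paper's finding, more likely to retweet it, so scores like this enter the XGBoost model as additional prediction variables.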

