Rainfall Prediction using Machine Learning and Deep Learning Algorithms

The project entitled as “Rainfall Prediction using Machine Learning & Deep Learning Algorithms” is a research project which is developed in Python Language and dataset is stored in Microsoft Excel. This prediction uses various machine learning and deep learning algorithms to find which algorithm predicts with most accurately. Rainfall prediction can be achieved by using binary classification under Data Mining. Predicting the rainfall is very important in several aspects of one’s country and can help from preventing serious natural disasters. For this prediction, Artificial Neural Network using Forward and Backward Propagation, Ada Boost, Gradient Boosting and XGBoost algorithms are used in this model for predicting the rainfall. There are totally five modules used in this project. The Data Analysis Module will analyse the datasets and finding the missing values in the dataset. The Data Pre-processing includes Data Cleaning which is the process of filling the missing values in the dataset. The Feature Transformation Module is used to modify the features of the dataset. The Data Mining Module is used to train the dataset to models using any algorithm for learning the pattern. The Model Evaluation Module is used to measure the performance of the model and finalize the overall best accuracy for the prediction. Dataset used in this prediction is for the country Australia. This main aim of the project is to compare the various boosting algorithms with the neural network and find the best algorithm among them. This prediction can be major advantage to the farmers in order to plant the types of crops according to the needy of water. Overall, we analyse the algorithm which is feasible for qualitatively predicting the rainfall.

Download Full-text

Development and Evaluation of the Combined Machine Learning Models for the Prediction of Dam Inflow

Water ◽

10.3390/w12102927 ◽

2020 ◽

Vol 12 (10) ◽

pp. 2927

Author(s):

Jiyeong Hong ◽

Seoro Lee ◽

Joo Hyun Bae ◽

Jimin Lee ◽

Woon Ji Park ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Multilayer Perceptron ◽

Short Term Memory ◽

Learning Algorithms ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Dam Inflow

Predicting dam inflow is necessary for effective water management. This study created machine learning algorithms to predict the amount of inflow into the Soyang River Dam in South Korea, using weather and dam inflow data for 40 years. A total of six algorithms were used, as follows: decision tree (DT), multilayer perceptron (MLP), random forest (RF), gradient boosting (GB), recurrent neural network–long short-term memory (RNN–LSTM), and convolutional neural network–LSTM (CNN–LSTM). Among these models, the multilayer perceptron model showed the best results in predicting dam inflow, with the Nash–Sutcliffe efficiency (NSE) value of 0.812, root mean squared errors (RMSE) of 77.218 m3/s, mean absolute error (MAE) of 29.034 m3/s, correlation coefficient (R) of 0.924, and determination coefficient (R2) of 0.817. However, when the amount of dam inflow is below 100 m3/s, the ensemble models (random forest and gradient boosting models) performed better than MLP for the prediction of dam inflow. Therefore, two combined machine learning (CombML) models (RF_MLP and GB_MLP) were developed for the prediction of the dam inflow using the ensemble methods (RF and GB) at precipitation below 16 mm, and the MLP at precipitation above 16 mm. The precipitation of 16 mm is the average daily precipitation at the inflow of 100 m3/s or more. The results show the accuracy verification results of NSE 0.857, RMSE 68.417 m3/s, MAE 18.063 m3/s, R 0.927, and R2 0.859 in RF_MLP, and NSE 0.829, RMSE 73.918 m3/s, MAE 18.093 m3/s, R 0.912, and R2 0.831 in GB_MLP, which infers that the combination of the models predicts the dam inflow the most accurately. CombML algorithms showed that it is possible to predict inflow through inflow learning, considering flow characteristics such as flow regimes, by combining several machine learning algorithms.

Download Full-text

Classification of potential blood donors using machine learning algorithms approach

Jurnal Teknologi dan Sistem Komputer ◽

10.14710/jtsiskom.2020.13619 ◽

2020 ◽

Vol 8 (3) ◽

pp. 217-221

Author(s):

Merinda Lestandy ◽

Lailis Syafa'ah ◽

Amrul Faruq

Keyword(s):

Neural Network ◽

Machine Learning ◽

Blood Donor ◽

Blood Donors ◽

Naive Bayes ◽

Learning Algorithms ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Blood Type ◽

The Neural Network

Blood donation is the process of taking blood from someone used for blood transfusions. Blood type, sex, age, blood pressure, and hemoglobin are blood donor criteria that must be met and processed manually to classify blood donor eligibility. The manual process resulted in an irregular blood supply because blood donor candidates did not meet the criteria. This study implements machine learning algorithms includes kNN, naïve Bayes, and neural network methods to determine the eligibility of blood donors. This study used 600 training data divided into two classes, namely potential and non-potential donors. The test results show that the accuracy of the neural network is 84.3 %, higher than kNN and naïve Bayes, respectively of 75 % and 84.17 %. It indicates that the neural network method outperforms comparing with kNN and naïve Bayes.

Download Full-text

Landslide Susceptibility Prediction Modeling Based on Remote Sensing and a Novel Deep Learning Algorithm of a Cascade-Parallel Recurrent Neural Network

Sensors ◽

10.3390/s20061576 ◽

2020 ◽

Vol 20 (6) ◽

pp. 1576 ◽

Cited By ~ 13

Author(s):

Li Zhu ◽

Lianghao Huang ◽

Linyu Fan ◽

Jinsong Huang ◽

Faming Huang ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Remote Sensing ◽

Deep Learning ◽

Environmental Factors ◽

Recurrent Neural Network ◽

Landslide Susceptibility ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Predictive Rate

Landslide susceptibility prediction (LSP) modeling is an important and challenging problem. Landslide features are generally uncorrelated or nonlinearly correlated, resulting in limited LSP performance when leveraging conventional machine learning models. In this study, a deep-learning-based model using the long short-term memory (LSTM) recurrent neural network and conditional random field (CRF) in cascade-parallel form was proposed for making LSPs based on remote sensing (RS) images and a geographic information system (GIS). The RS images are the main data sources of landslide-related environmental factors, and a GIS is used to analyze, store, and display spatial big data. The cascade-parallel LSTM-CRF consists of frequency ratio values of environmental factors in the input layers, cascade-parallel LSTM for feature extraction in the hidden layers, and cascade-parallel full connection for classification and CRF for landslide/non-landslide state modeling in the output layers. The cascade-parallel form of LSTM can extract features from different layers and merge them into concrete features. The CRF is used to calculate the energy relationship between two grid points, and the extracted features are further smoothed and optimized. As a case study, the cascade-parallel LSTM-CRF was applied to Shicheng County of Jiangxi Province in China. A total of 2709 landslide grid cells were recorded and 2709 non-landslide grid cells were randomly selected from the study area. The results show that, compared with existing main traditional machine learning algorithms, such as multilayer perception, logistic regression, and decision tree, the proposed cascade-parallel LSTM-CRF had a higher landslide prediction rate (positive predictive rate: 72.44%, negative predictive rate: 80%, total predictive rate: 75.67%). In conclusion, the proposed cascade-parallel LSTM-CRF is a novel data-driven deep learning model that overcomes the limitations of traditional machine learning algorithms and achieves promising results for making LSPs.

Download Full-text

Wheat Lodging Detection from UAS Imagery Using Machine Learning Algorithms

Remote Sensing ◽

10.3390/rs12111838 ◽

2020 ◽

Vol 12 (11) ◽

pp. 1838 ◽

Cited By ~ 8

Author(s):

Zhao Zhang ◽

Paulo Flores ◽

C. Igathinathane ◽

Dayakar L. Naik ◽

Ravi Kiran ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Standard Deviation ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Superior Performance ◽

Support Vector ◽

Unmanned Aerial Systems

The current mainstream approach of using manual measurements and visual inspections for crop lodging detection is inefficient, time-consuming, and subjective. An innovative method for wheat lodging detection that can overcome or alleviate these shortcomings would be welcomed. This study proposed a systematic approach for wheat lodging detection in research plots (372 experimental plots), which consisted of using unmanned aerial systems (UAS) for aerial imagery acquisition, manual field evaluation, and machine learning algorithms to detect the occurrence or not of lodging. UAS imagery was collected on three different dates (23 and 30 July 2019, and 8 August 2019) after lodging occurred. Traditional machine learning and deep learning were evaluated and compared in this study in terms of classification accuracy and standard deviation. For traditional machine learning, five types of features (i.e. gray level co-occurrence matrix, local binary pattern, Gabor, intensity, and Hu-moment) were extracted and fed into three traditional machine learning algorithms (i.e., random forest (RF), neural network, and support vector machine) for detecting lodged plots. For the datasets on each imagery collection date, the accuracies of the three algorithms were not significantly different from each other. For any of the three algorithms, accuracies on the first and last date datasets had the lowest and highest values, respectively. Incorporating standard deviation as a measurement of performance robustness, RF was determined as the most satisfactory. Regarding deep learning, three different convolutional neural networks (simple convolutional neural network, VGG-16, and GoogLeNet) were tested. For any of the single date datasets, GoogLeNet consistently had superior performance over the other two methods. Further comparisons between RF and GoogLeNet demonstrated that the detection accuracies of the two methods were not significantly different from each other (p > 0.05); hence, the choice of any of the two would not affect the final detection accuracies. However, considering the fact that the average accuracy of GoogLeNet (93%) was larger than RF (91%), it was recommended to use GoogLeNet for wheat lodging detection. This research demonstrated that UAS RGB imagery, coupled with the GoogLeNet machine learning algorithm, can be a novel, reliable, objective, simple, low-cost, and effective (accuracy > 90%) tool for wheat lodging detection.

Download Full-text

Machine Learning Methods for Fear Classification Based on Physiological Features

Sensors ◽

10.3390/s21134519 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4519

Author(s):

Livia Petrescu ◽

Cătălin Petrescu ◽

Ana Oprea ◽

Oana Mitruț ◽

Gabriela Moise ◽

...

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Dimensionality Reduction ◽

Data Augmentation ◽

Binary Classification ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Physiological Data ◽

Gradient Boosting ◽

Support Vector

This paper focuses on the binary classification of the emotion of fear, based on the physiological data and subjective responses stored in the DEAP dataset. We performed a mapping between the discrete and dimensional emotional information considering the participants’ ratings and extracted a substantial set of 40 types of features from the physiological data, which represented the input to various machine learning algorithms—Decision Trees, k-Nearest Neighbors, Support Vector Machine and artificial networks—accompanied by dimensionality reduction, feature selection and the tuning of the most relevant hyperparameters, boosting classification accuracy. The methodology we approached included tackling different situations, such as resolving the problem of having an imbalanced dataset through data augmentation, reducing overfitting, computing various metrics in order to obtain the most reliable classification scores and applying the Local Interpretable Model-Agnostic Explanations method for interpretation and for explaining predictions in a human-understandable manner. The results show that fear can be predicted very well (accuracies ranging from 91.7% using Gradient Boosting Trees to 93.5% using dimensionality reduction and Support Vector Machine) by extracting the most relevant features from the physiological data and by searching for the best parameters which maximize the machine learning algorithms’ classification scores.

Download Full-text

Local Epigenomic Data are more Informative than Local Genome Sequence Data in Predicting Enhancer-Promoter Interactions Using Neural Networks

Genes ◽

10.3390/genes11010041 ◽

2019 ◽

Vol 11 (1) ◽

pp. 41 ◽

Cited By ~ 1

Author(s):

Mengli Xiao ◽

Zhong Zhuang ◽

Wei Pan

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Test Data ◽

Sequence Data ◽

Gradient Boosting ◽

Genome Wide Association Studies ◽

Learning Methods ◽

Machine Learning Methods ◽

Local Sequence

Enhancer-promoter interactions (EPIs) are crucial for transcriptional regulation. Mapping such interactions proves useful for understanding disease regulations and discovering risk genes in genome-wide association studies. Some previous studies showed that machine learning methods, as computational alternatives to costly experimental approaches, performed well in predicting EPIs from local sequence and/or local epigenomic data. In particular, deep learning methods were demonstrated to outperform traditional machine learning methods, and using DNA sequence data alone could perform either better than or almost as well as only utilizing epigenomic data. However, most, if not all, of these previous studies were based on randomly splitting enhancer-promoter pairs as training, tuning, and test data, which has recently been pointed out to be problematic; due to multiple and duplicating/overlapping enhancers (and promoters) in enhancer-promoter pairs in EPI data, such random splitting does not lead to independent training, tuning, and test data, thus resulting in model over-fitting and over-estimating predictive performance. Here, after correcting this design issue, we extensively studied the performance of various deep learning models with local sequence and epigenomic data around enhancer-promoter pairs. Our results confirmed much lower performance using either sequence or epigenomic data alone, or both, than reported previously. We also demonstrated that local epigenomic features were more informative than local sequence data. Our results were based on an extensive exploration of many convolutional neural network (CNN) and feed-forward neural network (FNN) structures, and of gradient boosting as a representative of traditional machine learning.

Download Full-text

Gradient boosting for the prediction of gas chromatographic retention indices

Сорбционные и хроматографические процессы ◽

10.17308/sorpchrom.2019.19/2223 ◽

2019 ◽

Vol 19 (6) ◽

pp. 630-635

Author(s):

Dmitriy D. Matyushin ◽

Anastasia Yu. Sholokhova ◽

Aleksey K. Buryak

Keyword(s):

Neural Network ◽

Machine Learning ◽

Retention Indices ◽

Gradient Boosting ◽

Chromatographic Retention ◽

Relative Deviation ◽

The Neural Network ◽

Gas Chromatographic Retention Indices ◽

Chromatographic Retention Indices ◽

Hidden Layer

The estimation of gas chromatographic retention indices based on compounds structures is an importantproblem. Predicted retention indices can be used in a mass spectral library search for the identificationof unknowns. Various machine learning methods are used for this task, but methods based on decisiontrees, in particular gradient boosting, are not used widely. The aim of this work is to examine the usability ofthis method for the retention index prediction. 177 molecular descriptors computed with Chemistry Development Kit are used as the input representation of a molecule. Random subsets of the whole NIST 17 database are used as training, test and validation sets. 8000 trees with 6 leaves each are used. A neural network with one hidden layer (90 hidden nodes) is used for the comparison. The same data sets and the set of descriptors are used for the neural network and gradient boosting. The model based on gradient boosting outperforms the neural network with one hidden layer for subsets of NIST 17 and for the set of essential oils.The performance of this model is comparable or better than performance of other modern retention prediction models. The average relative deviation is ~3.0%, the median relative deviation is ~1.7% for subsets of NIST 17. The median absolute deviation is ~34 retention index units. Only non-polar liquid stationary phases (such as polydimethylsiloxane, 5% phenyl 95% polydimethylsiloxane, squalane) are considered. Errors obtained with different machine learning algorithms and with the same representation of the molecule strongly correlate with each other.

Download Full-text

The prediction of molecule atomization energy using neural network and extreme gradient boosting

Journal of Physics Conference Series ◽

10.1088/1742-6596/2072/1/012005 ◽

2021 ◽

Vol 2072 (1) ◽

pp. 012005

Author(s):

M Sumanto ◽

M A Martoprawiro ◽

A L Ivansyah

Keyword(s):

Neural Network ◽

Machine Learning ◽

Atomization Energy ◽

Gradient Boosting ◽

Intelligence System ◽

The Neural Network ◽

Extreme Gradient Boosting ◽

Boosting Method ◽

The Right ◽

Parameter Values

Abstract Machine Learning is an artificial intelligence system, where the system has the ability to learn automatically from experience without being explicitly programmed. The learning process from Machine Learning starts from observing the data and then looking at the pattern of the data. The main purpose of this process is to make computers learn automatically. In this study, we will use Machine Learning to predict molecular atomization energy. From various methods in Machine Learning, we use two methods namely Neural Network and Extreme Gradient Boosting. Both methods have several parameters that must be adjusted so that the predicted value of the atomization energy of the molecule has the lowest possible error. We are trying to find the right parameter values for both methods. For the neural network method, it is quite difficult to find the right parameter value because it takes a long time to train the model of the neural network to find out whether the model is good or bad, while for the Extreme Gradient Boosting method the time needed to train the model is shorter, so it is quite easy to find the right parameter values for the model. This study also looked at the effects of the modification on the dataset with the output transformation of normalization and standardization then removing molecules containing Br atoms and changing the entry in the Coulomb matrix to 0 if the distance between atoms in the molecule exceeds 2 angstrom.

Download Full-text

Deep Learning Aided Data-Driven Fault Diagnosis of Rotatory Machine: A Comprehensive Review

Energies ◽

10.3390/en14165150 ◽

2021 ◽

Vol 14 (16) ◽

pp. 5150

Author(s):

Shiza Mushtaq ◽

M. M. Manjurul Islam ◽

Muhammad Sohaib

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Fault Diagnosis ◽

Learning Algorithms ◽

Data Driven ◽

Support Vector ◽

Comprehensive Review ◽

Bearing Fault ◽

Bearing Fault Diagnosis

This paper presents a comprehensive review of the developments made in rotating bearing fault diagnosis, a crucial component of a rotatory machine, during the past decade. A data-driven fault diagnosis framework consists of data acquisition, feature extraction/feature learning, and decision making based on shallow/deep learning algorithms. In this review paper, various signal processing techniques, classical machine learning approaches, and deep learning algorithms used for bearing fault diagnosis have been discussed. Moreover, highlights of the available public datasets that have been widely used in bearing fault diagnosis experiments, such as Case Western Reserve University (CWRU), Paderborn University Bearing, PRONOSTIA, and Intelligent Maintenance Systems (IMS), are discussed in this paper. A comparison of machine learning techniques, such as support vector machines, k-nearest neighbors, artificial neural networks, etc., deep learning algorithms such as a deep convolutional network (CNN), auto-encoder-based deep neural network (AE-DNN), deep belief network (DBN), deep recurrent neural network (RNN), and other deep learning methods that have been utilized for the diagnosis of rotary machines bearing fault, is presented.

Download Full-text

Mood Detection from Physical and Neurophysical Data Using Deep Learning Models

Complexity ◽

10.1155/2019/6434578 ◽

2019 ◽

Vol 2019 ◽

pp. 1-15

Author(s):

Zeynep Hilal Kilimci ◽

Aykut Güven ◽

Mitat Uysal ◽

Selim Akyokus

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Behavioral Data ◽

Support Vector ◽

Learning Models ◽

Physical Data ◽

Conventional Machine

Nowadays, smart devices as a part of daily life collect data about their users with the help of sensors placed on them. Sensor data are usually physical data but mobile applications collect more than physical data like device usage habits and personal interests. Collected data are usually classified as personal, but they contain valuable information about their users when it is analyzed and interpreted. One of the main purposes of personal data analysis is to make predictions about users. Collected data can be divided into two major categories: physical and behavioral data. Behavioral data are also named as neurophysical data. Physical and neurophysical parameters are collected as a part of this study. Physical data contains measurements of the users like heartbeats, sleep quality, energy, movement/mobility parameters. Neurophysical data contain keystroke patterns like typing speed and typing errors. Users’ emotional/mood statuses are also investigated by asking daily questions. Six questions are asked to the users daily in order to determine the mood of them. These questions are emotion-attached questions, and depending on the answers, users’ emotional states are graded. Our aim is to show that there is a connection between users’ physical/neurophysical parameters and mood/emotional conditions. To prove our hypothesis, we collect and measure physical and neurophysical parameters of 15 users for 1 year. The novelty of this work to the literature is the usage of both combinations of physical and neurophysical parameters. Another novelty is that the emotion classification task is performed by both conventional machine learning algorithms and deep learning models. For this purpose, Feedforward Neural Network (FFNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM) neural network are employed as deep learning methodologies. Multinomial Naïve Bayes (MNB), Support Vector Regression (SVR), Decision Tree (DT), Random Forest (RF), and Decision Integration Strategy (DIS) are evaluated as conventional machine learning algorithms. To the best of our knowledge, this is the very first attempt to analyze the neurophysical conditions of the users by evaluating deep learning models for mood analysis and enriching physical characteristics with neurophysical parameters. Experiment results demonstrate that the utilization of deep learning methodologies and the combination of both physical and neurophysical parameters enhances the classification success of the system to interpret the mood of the users. A wide range of comparative and extensive experiments shows that the proposed model exhibits noteworthy results compared to the state-of-art studies.

Download Full-text