Estimation of Reference Evapotranspiration Using Spatial and Temporal Machine Learning Approaches

Hydrology ◽  
2021 ◽  
Vol 8 (1) ◽  
pp. 25
Author(s):  
Ali Rashid Niaghi ◽  
Oveis Hassanijalilian ◽  
Jalal Shiri

Evapotranspiration (ET) is widely employed to measure total water loss between land and atmosphere due to its major contribution to the water balance on both regional and global scales. Considering the challenges of quantifying nonlinear ET processes, machine learning (ML) techniques have been increasingly utilized to estimate ET due to their powerful advantage of capturing complex nonlinear structures and characteristics. However, limited studies have been conducted in subhumid climates to simulate local and spatial ETo using common ML methods. The current study aims to present a methodology that does not require local data for ETo simulation. The present study, therefore, seeks to estimate and compare reference ET (ETo) using four common ML methods with local and spatial approaches based on continuous 17-year daily climate data from six weather stations across the Red River Valley, which has a subhumid climate. The four ML models included Gene Expression Programming (GEP), Support Vector Machine (SVM), Multiple Linear Regression (LR), and Random Forest (RF), with three input combinations: temperature-based (maximum and minimum air temperature, Tmax and Tmin), mass transfer-based (Tmax, Tmin, and wind speed U), and radiation-based (solar radiation Rs, Tmax, and Tmin) measurements. The estimates yielded by the four ML models were compared against each other under the spatial and local approaches using four statistical indicators, namely the root mean square error (RMSE), the mean absolute error (MAE), the correlation coefficient (r²), and the scatter index (SI), which were used to assess the ML models' performance. The comparison between combinations showed the lowest SI and RMSE values for the RF model with the radiation-based combination. Furthermore, the RF model showed the best performance for all combinations among the four defined models, either spatially or locally. In general, the LR, GEP, and SVM models were improved when a local approach was used.
The results showed the best performance for the radiation-based combination and the RF model, with higher accuracy for all stations either locally or spatially, while the spatial SVM and GEP models showed the lowest performance among the models and approaches.
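The four statistical indicators used to compare the models follow directly from their definitions. The sketch below is illustrative (function and variable names are our own, not the authors' code); the scatter index is taken as the RMSE normalised by the mean of the observed values, its usual definition.

```python
import math

def evaluation_metrics(observed, simulated):
    """Compute the four indicators used to compare the ETo models:
    RMSE, MAE, correlation coefficient squared (r^2), and scatter index (SI)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)
    mae = sum(abs(o - s) for o, s in zip(observed, simulated)) / n
    # Pearson correlation coefficient, squared
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_obs) ** 2 for o in observed)
    var_s = sum((s - mean_sim) ** 2 for s in simulated)
    r2 = (cov / math.sqrt(var_o * var_s)) ** 2
    # Scatter index: RMSE normalised by the observed mean
    si = rmse / mean_obs
    return {"RMSE": rmse, "MAE": mae, "r2": r2, "SI": si}
```

Lower RMSE, MAE, and SI and higher r² indicate a better match between simulated and observed ETo.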

2021 ◽  
Author(s):  
El houssaine Bouras ◽  
Lionel Jarlan ◽  
Salah Er-Raki ◽  
Riad Balaghi ◽  
Abdelhakim Amazirh ◽  
...  

Cereals are the main crop in Morocco. Their production exhibits high inter-annual variability due to uncertain rainfall and recurrent drought periods. Considering the importance of this resource to the country's economy, it is thus important for decision makers to have reliable forecasts of the annual cereal production in order to pre-empt importation needs. In this study, we assessed the joint use of satellite-based drought indices, weather data (precipitation and temperature), and climate data (pseudo-oscillation indices including the NAO and the leading modes of sea surface temperature (SST) in the mid-latitudes and the tropics) to predict cereal yields at the level of the agricultural province using machine learning algorithms (Support Vector Machine (SVM), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost)) in addition to Multiple Linear Regression (MLR). We also evaluated the models for different lead times along the growing season, from January (about 5 months before harvest) to March (2 months before harvest). The results show that combining data from the different sources outperformed the use of a single dataset, with the highest accuracy obtained when all three data sources were considered in the model development. In addition, the results show that the models can accurately predict yields in January (5 months before harvesting) with an R² of 0.90 and an RMSE of about 3.4 Qt·ha⁻¹. When comparing model performance, XGBoost was the best at predicting yields. Also, fitting a specific model for each province separately improves the statistical metrics by approximately 10–50%, depending on the province, compared to one global model applied to all provinces. The results of this study point out that machine learning is a promising tool for cereal yield forecasting, and the proposed methodology can be extended to different crops and regions.
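The lead-time evaluation described above can be sketched generically: for a forecast issued in a given month, only predictors observed up to that month are used, and each year is predicted from a model fitted on the remaining years. The sketch below reduces the MLR to a single illustrative predictor; all names are hypothetical, not the authors' code.

```python
def fit_linear(x, y):
    """Ordinary least squares for y = a*x + b (closed form, one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return a, my - a * mx

def leave_one_year_out(features_by_month, yields, month):
    """Evaluate a forecast issued in `month`: only predictors available by
    that month are used, and each year is held out in turn."""
    x = features_by_month[month]
    errors = []
    for i in range(len(yields)):
        train_x = x[:i] + x[i + 1:]
        train_y = yields[:i] + yields[i + 1:]
        a, b = fit_linear(train_x, train_y)
        errors.append(abs(a * x[i] + b - yields[i]))
    return sum(errors) / len(errors)  # MAE over held-out years
```

Repeating the loop for January, February, and March features shows how skill changes with lead time.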


Author(s):  
Sachin Kumar ◽  
Karan Veer

Aims: The objective of this research is to predict COVID-19 cases in India based on machine learning approaches. Background: COVID-19, a respiratory disease caused by one of the coronavirus family members, led to a worldwide pandemic in 2020. The virus was first detected in the city of Wuhan, China, in December 2019 and took less than three months to spread across the globe. Objective: In this paper, we propose a regression model based on the Support Vector Machine (SVM) to forecast the number of deaths, the number of recovered cases, and the total confirmed cases for the next 30 days. Method: For prediction, the data were collected from GitHub and the Ministry of Health and Family Welfare of India from March 14, 2020, to December 3, 2020. The model was implemented in Python 3.6 under Anaconda to forecast corona trends until September 21, 2020. The proposed methodology predicts values using SVM-based regression models with polynomial, linear, and RBF kernels. The dataset was divided into training and test sets with 40% and 60% test sizes and verified against real data. The model performance parameters were evaluated using mean square error, mean absolute error, and percentage accuracy. Results and Conclusion: The results show that the polynomial kernel obtained an accuracy score above 95%, the linear kernel above 90%, and the RBF kernel above 85% in predicting cumulative deaths, confirmed cases, and recovered cases.
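The 30-day-ahead forecasting setup can be illustrated with a minimal stand-in for the SVR pipeline: here a plain least-squares linear trend replaces the kernel regressor, and all names and data are illustrative rather than the authors' implementation.

```python
def fit_trend(days, cases):
    """Least-squares linear trend through (day index, cumulative count) pairs."""
    n = len(days)
    md, mc = sum(days) / n, sum(cases) / n
    slope = sum((d - md) * (c - mc) for d, c in zip(days, cases)) / sum(
        (d - md) ** 2 for d in days
    )
    return slope, mc - slope * md

def forecast_next_30(days, cases):
    """Extrapolate the fitted trend over the next 30 days."""
    slope, intercept = fit_trend(days, cases)
    last = max(days)
    return [slope * d + intercept for d in range(last + 1, last + 31)]
```

In the paper's pipeline, the trend model would be replaced by an SVM regressor with a polynomial, linear, or RBF kernel, with accuracy then measured against the held-out real counts.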


Author(s):  
Ali Rashid Niaghi ◽  
Oveis Hassanijalilian ◽  
Jalal Shiri

The ASCE-EWRI reference evapotranspiration (ETo) equation is recommended as a standardized method for reference crop ETo estimation. However, the various climate data required as inputs by the standardized method are limiting factors in most cases and restrict ETo estimation. This paper assessed the potential of different machine learning (ML) models for ETo estimation using limited meteorological data. The ML models used to estimate daily ETo included Gene Expression Programming (GEP), Support Vector Machine (SVM), Multiple Linear Regression (LR), and Random Forest (RF). Three input combinations (daily maximum and minimum temperature, Tmax and Tmin; wind speed, W, with Tmax and Tmin; and solar radiation, Rs, with Tmax and Tmin) were considered using meteorological data during 2003–2016 from six weather stations in the Red River Valley. To understand the performance of the applied models with the various combinations, station-based and yearly-based tests were conducted with local and spatial approaches. In the local and spatial analyses, the LR and RF models showed the lowest rate of improvement compared to GEP and SVM. The spatial RF and SVM approaches showed the lowest and highest scatter index values, 0.333 and 0.457, respectively. As a result, the radiation-based combination and the RF model showed the best performance, with higher accuracy for all stations either locally or spatially, while the spatial SVM and GEP showed the lowest performance among the models and approaches.


Water ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 547 ◽  
Author(s):  
Ahmed Elbeltagi ◽  
Nikul Kumari ◽  
Jaydeo K. Dharpure ◽  
Ali Mokhtar ◽  
Karam Alsafadi ◽  
...  

Drought is a fundamental physical feature of the climate pattern worldwide. Over the past few decades, the occurrence of this natural disaster has accelerated, significantly impacting agricultural systems, economies, environments, water resources, and supplies. Therefore, it is essential to develop new techniques that enable comprehensive determination and observation of droughts over large areas with satisfactory spatial and temporal resolution. This study modeled a new drought index, the Combined Terrestrial Evapotranspiration Index (CTEI), developed for the Ganga river basin. For this, five Machine Learning (ML) techniques derived from artificial intelligence theories were applied: the Support Vector Machine (SVM) algorithm, decision trees, Matern 5/2 Gaussian process regression, boosted trees, and bagged trees. These techniques were driven by twelve different models generated from input combinations of satellite data and hydrometeorological parameters. The results indicated that the eighth model performed best among all the models, with the SVM algorithm resulting in an R² value of 0.82 and the lowest errors in terms of the Root Mean Squared Error (RMSE, 0.33) and Mean Absolute Error (MAE, 0.20), followed by the Matern 5/2 Gaussian model with an R² value of 0.75 and RMSE and MAE of 0.39 and 0.21 mm/day, respectively. Moreover, among the five methods, SVM and Matern 5/2 Gaussian process regression were the best-performing ML algorithms in our study of CTEI predictions for the Ganga basin.
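Choosing among the twelve input-combination models reduces to ranking them by an error criterion on a common validation set. A simplified sketch of that comparison, with hypothetical names rather than the study's code:

```python
import math

def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def best_model(observed, candidate_predictions):
    """Rank candidate input-combination models by RMSE and return the winner
    together with the full score table."""
    scores = {name: rmse(observed, pred)
              for name, pred in candidate_predictions.items()}
    winner = min(scores, key=scores.get)
    return winner, scores
```

In practice, R² and MAE would be tabulated alongside RMSE, as in the study, so that the ranking can be cross-checked across criteria.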


Materials ◽  
2021 ◽  
Vol 14 (17) ◽  
pp. 4934 ◽  
Author(s):  
Furqan Farooq ◽  
Slawomir Czarnecki ◽  
Pawel Niewiadomski ◽  
Fahid Aslam ◽  
Hisham Alabduljabbar ◽  
...  

Artificial intelligence and machine learning are employed to create functions for the prediction of self-compacting concrete (SCC) strength based on the proportions of the input variables, with waste material incorporated as cement replacement. Artificial Neural Network (ANN), Support Vector Machine (SVM), and Gene Expression Programming (GEP) models, built on 300 datasets, have been utilized to foresee the mechanical properties of SCC. The data used in modeling consist of several input parameters such as cement, water–binder ratio, coarse aggregate, fine aggregate, and fly ash (FA), in combination with the superplasticizer. The best predictive models were selected based on the coefficient of determination (R²) and model validation. Empirical relations with mathematical expressions have been proposed using ANN, SVM, and GEP. The efficiency of the models is assessed by permutation feature importance, statistical analysis, and comparison between regression models. The results reveal that the proposed machine learning models achieved robust accuracy and strong predictive performance.
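Permutation feature importance, one of the assessment tools named above, can be sketched in a few lines: each feature column is shuffled in turn and the resulting increase in error over the unshuffled baseline is recorded. The toy model and all names below are our own illustration, not the paper's implementation.

```python
import random

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(model, X, y, metric, seed=0):
    """Shuffle one feature column at a time and record the increase in error
    relative to the baseline; larger increases mean more important features."""
    rng = random.Random(seed)
    baseline = metric(y, [model(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + [col[i]] + row[j + 1:] for i, row in enumerate(X)]
        importances.append(metric(y, [model(row) for row in X_perm]) - baseline)
    return importances
```

A feature the model ignores shows an importance near zero, while shuffling a feature the model relies on degrades the error noticeably.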


Author(s):  
S. Ravikumar ◽  
E. Kannan

Cardiotocography (CTG) is a biophysical method for assessing fetal condition that primarily relies on the recording and automated analysis of fetal heart activity. Computerized fetal monitoring systems provide a quantitative description of the CTG signals. Although effective conclusion-generation methods for decision support are still required to identify fetal risks such as prematurity, the proposed method and outcome data can confirm the assessment of the fetal state after birth. Low birth weight is one of the main attributes that significantly indicates an abnormal fetal outcome. These predictions are assessed in a continuous experimental decision support system, providing valuable information that can be used to obtain additional insight into the fetal state using machine learning techniques. Advancements in modern obstetric practice have enabled the use of numerous reliable and robust machine learning approaches for classifying fetal heart rate signals. The Naïve Bayes (NB) classifier, Support Vector Machine (SVM), Decision Trees (DT), and Random Forest (RF) are used in the proposed method. To assess the outcomes, metrics such as precision, accuracy, F1 score, recall, sensitivity, logarithmic loss, and mean absolute error have been computed; these metrics help predict fetal risk.
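The listed classification metrics follow directly from the confusion-matrix counts. A minimal sketch, assuming a binary at-risk label (1 = at risk); function and variable names are illustrative, not the authors' code:

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall (sensitivity), F1, and accuracy for a binary
    fetal-state label, computed from raw confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}
```

Recall is the same quantity as sensitivity; F1 is the harmonic mean of precision and recall, which is why it penalises classifiers that trade one for the other.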


2019 ◽  
Vol 20 (5) ◽  
pp. 488-500 ◽  
Author(s):  
Yan Hu ◽  
Yi Lu ◽  
Shuo Wang ◽  
Mengying Zhang ◽  
Xiaosheng Qu ◽  
...  

Background: Globally, the number of cancer patients and deaths continues to increase yearly, and cancer has therefore become one of the world's highest causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: In this review, in order to study the application of machine learning in predicting anticancer drug activity, several machine learning approaches were selected, including Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB), and examples of their application in anticancer drug design are listed. Results: Machine learning contributes a great deal to anticancer drug design and helps researchers by saving time and cost; however, it can only be an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many examples of success in the identification and prediction of anticancer drug activity are discussed, and anticancer drug research remains in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.
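Of the approaches listed, k-Nearest Neighbor is the simplest to sketch: a candidate compound is labeled by majority vote among its k closest training examples in feature space. The example below is a generic illustration with made-up names and data, not taken from any of the reviewed works.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """k-Nearest Neighbor classification: label a query point by majority
    vote among its k closest training examples (Euclidean distance)."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

In a drug-activity setting, the feature vectors would be molecular descriptors and the labels activity classes; k controls the bias-variance trade-off.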


2021 ◽  
Vol 10 (4) ◽  
pp. 199
Author(s):  
Francisco M. Bellas Aláez ◽  
Jesus M. Torres Palenzuela ◽  
Evangelos Spyrakos ◽  
Luis González Vilas

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
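The sensitivity/specificity balance discussed above depends on the decision threshold of the alarm; sweeping the threshold makes the trade-off explicit. A minimal sketch with illustrative names, assuming the classifier outputs a bloom-probability score:

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Sensitivity and specificity of an alarm raised when score >= threshold."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def sweep(y_true, scores, thresholds):
    """Trace the sensitivity/specificity trade-off over candidate thresholds."""
    return {t: sensitivity_specificity(y_true, scores, t) for t in thresholds}
```

Lowering the threshold raises sensitivity at the cost of specificity (more false alarms), which is the trade-off the abstract attributes to the classical models.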


Energies ◽  
2021 ◽  
Vol 14 (4) ◽  
pp. 1055
Author(s):  
Qian Sun ◽  
William Ampomah ◽  
Junyu You ◽  
Martha Cather ◽  
Robert Balch

Machine-learning technologies have exhibited robust competence in solving many petroleum engineering problems. Their accurate predictions and fast computational speed make feasible a large volume of time-consuming engineering processes such as history-matching and field development optimization. The Southwest Regional Partnership on Carbon Sequestration (SWP) project requires rigorous history-matching and multi-objective optimization processes, which suits the strengths of machine-learning approaches. Although the machine-learning proxy models are trained and validated before being applied to practical problems, their error margin inevitably introduces uncertainty into the results. In this paper, a hybrid numerical machine-learning workflow for solving various optimization problems is presented. By coupling expert machine-learning proxies with a global optimizer, the workflow successfully solves the history-matching and CO2 water-alternating-gas (WAG) design problems with low computational overhead. The history-matching work considers the heterogeneities of multiphase relative permeability characteristics, and the CO2-WAG injection design takes multiple techno-economic objective functions into account. This work trained an expert response surface, a support vector machine, and a multi-layer neural network as proxy models to effectively learn the high-dimensional nonlinear data structure. The proposed workflow revisits the high-fidelity numerical simulator for validation purposes. The experience gained from this work provides valuable guiding insights for similar CO2 enhanced oil recovery (EOR) projects.
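The proxy-plus-optimizer workflow can be sketched generically: sample the expensive simulator under a limited budget, train a cheap surrogate on those runs, search against the surrogate, and finally revisit the simulator to validate the proposal. Everything below (the one-dimensional toy simulator, the nearest-neighbour surrogate, and plain random search) is a simplified stand-in for the paper's response-surface/SVM/neural-network proxies and its global optimizer.

```python
import random

def expensive_simulator(x):
    """Stand-in for the high-fidelity reservoir simulator (illustrative only)."""
    return (x - 3.0) ** 2 + 1.0

def train_surrogate(samples):
    """'Train' a proxy by nearest-neighbour lookup over the sampled runs."""
    def proxy(x):
        return min(samples, key=lambda s: abs(s[0] - x))[1]
    return proxy

def optimize_with_proxy(seed=0, n_train=200, n_search=2000):
    rng = random.Random(seed)
    # 1. Build a training set from a limited budget of simulator calls.
    xs = [rng.uniform(0.0, 6.0) for _ in range(n_train)]
    samples = [(x, expensive_simulator(x)) for x in xs]
    proxy = train_surrogate(samples)
    # 2. Global search (random search here) against the cheap proxy.
    best_x = min((rng.uniform(0.0, 6.0) for _ in range(n_search)), key=proxy)
    # 3. Revisit the high-fidelity simulator to validate the proposal.
    return best_x, expensive_simulator(best_x)
```

Step 3 is the validation loop the paper recommends: the candidate found under the proxy is always re-evaluated with the full simulator before being accepted, which bounds the uncertainty introduced by the proxy's error margin.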


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1694
Author(s):  
Mathew Ashik ◽  
A. Jyothish ◽  
S. Anandaram ◽  
P. Vinod ◽  
Francesco Mercaldo ◽  
...  

Malware is one of the most significant threats in today's computing world, since the number of websites distributing malware is increasing at a rapid rate. Malware analysis and prevention methods are increasingly necessary for computer systems connected to the Internet. Such software exploits a system's vulnerabilities to steal valuable information without the user's knowledge and stealthily sends it to remote servers controlled by attackers. Traditionally, anti-malware products use signatures for detecting known malware. However, the signature-based method does not scale to detecting obfuscated and packed malware. Since the cause of a problem is often best understood by studying the structural aspects of a program, such as mnemonics, instruction opcodes, and API calls, in this paper we investigate the relevance of these features in unpacked malicious and benign executables to identify features that classify the executable. Prominent features are extracted using Minimum Redundancy and Maximum Relevance (mRMR) and Analysis of Variance (ANOVA). Experiments were conducted on four datasets using machine learning and deep learning approaches such as Support Vector Machine (SVM), Naïve Bayes, J48, Random Forest (RF), and XGBoost. In addition, we evaluated the performance of a collection of deep neural networks, namely a deep dense network, a One-Dimensional Convolutional Neural Network (1D-CNN), and a CNN-LSTM, in classifying unknown samples, and observed promising results using APIs and system calls. On combining APIs/system calls with static features, a marginal performance improvement was attained compared to models trained only on dynamic features. Moreover, to improve accuracy, we implemented our solution using distinct deep learning methods and demonstrated a fine-tuned deep neural network that resulted in F1-scores of 99.1% and 98.48% on Dataset-2 and Dataset-3, respectively.
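ANOVA-based feature ranking, one of the two selection methods named above, scores each feature by the ratio of between-class to within-class variance: features whose values separate malicious from benign samples get high F-scores. A minimal pure-Python sketch with illustrative names (mRMR is not shown):

```python
def anova_f(feature_values, labels):
    """One-way ANOVA F-statistic for a single feature across classes:
    between-class variance over within-class variance."""
    groups = {}
    for v, lab in zip(feature_values, labels):
        groups.setdefault(lab, []).append(v)
    k = len(groups)
    n = len(feature_values)
    grand = sum(feature_values) / n
    means = {lab: sum(vs) / len(vs) for lab, vs in groups.items()}
    ss_between = sum(len(vs) * (means[lab] - grand) ** 2
                     for lab, vs in groups.items())
    ss_within = sum((v - means[lab]) ** 2
                    for lab, vs in groups.items() for v in vs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_features(X, labels):
    """Rank feature columns (e.g., opcode or API-call frequencies) by F-score,
    highest first."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

The top-ranked columns would then be fed to the downstream classifiers (SVM, RF, XGBoost, or the neural networks) in place of the full feature set.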

