Interpolation of Instantaneous Air Temperature Using Geographical and MODIS Derived Variables with Machine Learning Techniques

Several methods have been tried to estimate air temperature using satellite imagery. In this paper, the results of two machine learning algorithms, Support Vector Machines and Random Forest, are compared with Multiple Linear Regression and Ordinary kriging. Several geographic, remote sensing and time variables are used as predictors. Validation is carried out with two different approaches, a leave-one-out cross-validation in the spatial domain and a spatio-temporal k-block cross-validation, and with four different statistics computed on a daily basis, which allows ANOVA to be used to compare the results. The main conclusion is that Random Forest produces the best results (R² = 0.888 ± 0.026, root mean square error = 3.01 ± 0.325 using k-block cross-validation). The regression methods (Support Vector Machine, Random Forest and Multiple Linear Regression) are calibrated with MODIS data and several predictors that are easily calculated from a Digital Elevation Model. The most important variables in the Random Forest model were satellite temperature, potential irradiation and cdayt, a cosine transformation of the Julian day.
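The abstract defines cdayt only as a cosine transformation of the Julian day. A minimal Python sketch, assuming the common form cos(2πd/N), where d is the day of the year and N the number of days in that year (the authors' exact phase and period are not given here), illustrates the motivation: a raw day count jumps from 365 to 1 at New Year, whereas the cosine maps 31 December and 1 January to almost the same value.

```python
import math


def cdayt(day_of_year, days_in_year=365):
    """Cosine transform of the day of the year.

    Assumed form: cos(2 * pi * d / N). The result is close to 1 around the
    turn of the year and close to -1 around mid-year, so consecutive days
    across the year boundary get nearly identical values.
    """
    if not 1 <= day_of_year <= days_in_year:
        raise ValueError("day_of_year must lie in [1, days_in_year]")
    return math.cos(2.0 * math.pi * day_of_year / days_in_year)


# The transform is continuous across New Year (day 365 -> day 1).
for d in (365, 1, 91, 182):
    print(d, round(cdayt(d), 4))
```

A cosine alone cannot separate spring from autumn (days 91 and 274 give the same value), so a sine term is often used alongside it; whether the original study did so is not stated in the abstract.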

Interpolation of Instantaneous Air Temperature Using Geographical and MODIS Derived Variables with Machine Learning Techniques

10.20944/preprints201906.0008.v1 ◽

2019 ◽

Author(s):

Marcos Ruiz-Álvarez ◽

Francisco Alonso-Sarría ◽

Francisco Gomariz-Castillo

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Linear Regression ◽

Air Temperature ◽

Satellite Data ◽

Multivariate Linear Regression ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector

Several methods have been tried to estimate air temperature using satellite imagery. In this paper, the results of two machine learning algorithms, Support Vector Machine and Random Forest, are compared with Multivariate Linear Regression, TVX and Ordinary kriging. Several geographic, remote sensing and time variables are used as predictors. Validation uses four different statistics computed on a daily basis, which allows ANOVA to be used to compare the results. The main conclusion is that Random Forest with residual kriging produces the best results (R² = 0.612 ± 0.019, NSE = 0.578 ± 0.025, RMSE = 1.068 ± 0.027, PBIAS = -0.172 ± 0.046), whereas TVX produces the least accurate results. The environmental conditions in the study area are not well suited to TVX; moreover, this method uses only satellite data. The regression methods (Support Vector Machine, Random Forest and Multivariate Linear Regression), on the other hand, use several parameters that are easily calculated from a Digital Elevation Model, adding very little difficulty compared with using satellite data alone. The most important variables in the Random Forest model were satellite temperature, potential irradiation and cdayt, a cosine transformation of the Julian day.
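The four daily statistics named above can be written in a few lines of Python. This is a sketch, not the authors' code: R² is assumed to be the squared Pearson correlation, and PBIAS conventions differ between sources; here PBIAS = 100·Σ(obs - sim)/Σobs, so a negative value indicates overestimation.

```python
import math
from statistics import mean


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(mean((s - o) ** 2 for o, s in zip(obs, sim)))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares of obs."""
    o_bar = mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - o_bar) ** 2 for o in obs)
    return 1.0 - sse / sst


def r2(obs, sim):
    """R² taken here as the squared Pearson correlation (assumption)."""
    o_bar, s_bar = mean(obs), mean(sim)
    cov = sum((o - o_bar) * (s - s_bar) for o, s in zip(obs, sim))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_s = sum((s - s_bar) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def pbias(obs, sim):
    """Percent bias, 100 * sum(obs - sim) / sum(obs); negative = overestimation.

    Sign conventions differ between sources, and the value is unstable when
    sum(obs) is near zero (e.g. temperatures in °C around freezing).
    """
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)
```

Computed once per day across the validation stations, each statistic yields one value per day and method; such daily samples are presumably what the ANOVA compares.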

Machine Learning Algorithms For Understanding The Determinants of Under-Five Mortality

10.21203/rs.3.rs-1021040/v1 ◽

2021 ◽

Author(s):

Rakesh Kumar Saroj ◽

Pawan Kumar Yadav ◽

Rajneesh Singh ◽

Obvious Nchimunya Chilyabanyama

Keyword(s):

Machine Learning ◽

Random Forest ◽

Information Gain ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Mortality Data ◽

Mortality Factors ◽

Under Five ◽

Learning Techniques

Background: The death rate of children under five in India has declined over the last few decades, but a few larger states still perform poorly. This is a matter of serious concern for child health as well as for social development. Machine learning techniques now play a crucial role in smart health care systems by capturing hidden factors and patterns in outcomes. This study explores the use of machine learning techniques to predict under-five mortality and to find the important factors that cause it. The data were taken from the National Family Health Survey-IV for Uttar Pradesh. We used four machine learning techniques, namely decision tree, support vector machine, random forest and logistic regression, to predict under-five mortality and compared the accuracy of each model. We also used information gain to rank the variables by their importance for accurate prediction of under-five mortality. Result: Random Forest (RF) predicts under-five mortality with the highest accuracy, 97.5%. The number of living children, births in the last five years, educational level, birth order, total children ever born, current breastfeeding and size of the child at birth were identified as essential factors for under-five mortality. Conclusion: The study applies machine learning techniques to predict and identify important factors for under-five mortality. The random forest model provides an excellent predictive result for estimating the risk factors of under-five mortality. Based on these outcomes, policymakers can make policies and plans to reduce under-five mortality.
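Information gain, which the study uses to rank variables, measures how much knowing a feature reduces the entropy of the outcome. A minimal sketch for categorical features follows; the feature names and records are hypothetical toy data, not NFHS-IV records.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Label entropy minus the weighted entropy after grouping by feature value."""
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (feature name, gain) pairs from most to least informative."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: 1 = died before age five, 0 = survived.
toy_labels = [1, 1, 0, 0, 0, 1]
toy_features = {
    "feature_a": ["no", "no", "yes", "yes", "yes", "no"],
    "feature_b": ["x", "y", "x", "y", "x", "y"],
}
print(rank_features(toy_features, toy_labels))
```

Information gain on raw categories favors features with many distinct values; gain ratio is a common correction when features differ widely in cardinality.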

A Novel Approach for Detecting DGA-Based Botnets in DNS Queries Using Machine Learning Techniques

Journal of Computer Networks and Communications ◽

10.1155/2021/4767388 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Ali Soleymani ◽

Fatemeh Arabgol

Keyword(s):

Machine Learning ◽

Random Forest ◽

Text Mining ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Detection Accuracy ◽

Domain Name ◽

Botnet Detection ◽

Learning Techniques

In today’s security landscape, advanced threats are becoming increasingly difficult to detect as attack patterns diversify. Classical approaches that rely heavily on static matching, such as blacklists or regular-expression patterns, may lack flexibility and be unreliable when detecting malicious activity in system data. This is where machine learning techniques can show their value, providing new insights and higher detection rates. This research investigates the behavior of botnets that use domain-flux techniques to hide their command and control channels, and describes the machine learning algorithms and text mining used to analyze the network DNS protocol and identify botnets. For this purpose, extracted and labeled domain name datasets containing benign and infected (DGA botnet) data were used. Data preprocessing techniques based on a text-mining approach were applied to explore domain name strings with n-gram analysis and PCA, and performance was further improved by extracting statistical features with principal component analysis. The proposed model was evaluated with several machine learning classifiers, namely decision tree, support vector machine, random forest and logistic regression. Experimental results show that the random forest algorithm can be used effectively for botnet detection and achieves the best detection accuracy among the classifiers tested.
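Character n-grams turn a domain name into counts that PCA or a classifier can consume. The sketch below illustrates the general technique, not the authors' pipeline, and rests on its own assumptions: it keeps only the label to the left of the top-level domain (public suffixes such as co.uk are not handled), builds a bigram vocabulary from training domains, and maps each domain to a count vector. The example domains are made up.

```python
from collections import Counter


def second_level_label(domain):
    """Return the label left of the TLD, e.g. 'example' for 'www.example.com'.

    Assumption for this sketch: public suffixes such as 'co.uk' are not handled.
    """
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) > 1 else labels[0]


def char_ngrams(text, n=2):
    """Overlapping character n-grams of a string."""
    return [text[i : i + n] for i in range(len(text) - n + 1)]


def build_vocabulary(domains, n=2):
    """Map every n-gram seen in the training domains to a column index."""
    grams = sorted({g for d in domains for g in char_ngrams(second_level_label(d), n)})
    return {g: i for i, g in enumerate(grams)}


def vectorize(domain, vocab, n=2):
    """Bag-of-n-grams count vector; n-grams outside the vocabulary are ignored."""
    counts = Counter(char_ngrams(second_level_label(domain), n))
    vec = [0] * len(vocab)
    for gram, c in counts.items():
        if gram in vocab:
            vec[vocab[gram]] = c
    return vec


vocab = build_vocabulary(["google.com", "wikipedia.org"])
print(vectorize("xkqzvbw.net", vocab))  # random-looking name: no shared bigrams
```

Rows produced by `vectorize` can be stacked into a matrix for PCA and a classifier such as random forest, as the abstract describes; those steps are omitted here.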

Predicting Future Products Rate using Machine Learning Algorithms

International Journal of Intelligent Systems and Applications ◽

10.5815/ijisa.2020.05.04 ◽

2020 ◽

Vol 12 (5) ◽

pp. 41-51

Author(s):

Shaimaa Mahmoud ◽

Mahmoud Hussein ◽

Arabi Keshk

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Mean Squared Error ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Regression ◽

Data Set ◽

Squared Error

Opinion mining of social network data is considered one of the most important research areas because large numbers of users interact with different topics on these networks. This paper discusses the problem of predicting future product ratings from users’ comments. Researchers have addressed this problem with machine learning algorithms (e.g. Logistic Regression, Random Forest Regression, Support Vector Regression, Simple Linear Regression, Multiple Linear Regression, Polynomial Regression and Decision Tree); however, the accuracy of these techniques still needs to be improved. In this study, we introduce an approach for predicting future product ratings using LR, RFR and SVR. Our dataset consists of tweets and their ratings on a scale from 1 to 5. The main goal of our approach is to improve prediction accuracy over existing techniques. SVR predicts future product ratings with a Mean Squared Error (MSE) of 0.4122, the Linear Regression model with an MSE of 0.4986 and Random Forest Regression with an MSE of 0.4770. These results are better than the accuracy of existing approaches.
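MSE, the metric reported above, averages the squared differences between true and predicted ratings. The sketch adds two assumptions of its own: an optional clip of regression outputs to the 1-5 rating scale, and a mean-rating baseline that gives a reference point for reading values such as 0.41 or 0.50.

```python
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between true ratings and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def clip_to_scale(preds, low=1.0, high=5.0):
    """Clip regression outputs to the 1-5 rating scale (optional, assumed step)."""
    return [min(max(p, low), high) for p in preds]


def baseline_mse(train_ratings, test_ratings):
    """MSE of always predicting the training mean, a floor any model should beat."""
    m = mean(train_ratings)
    return mse(test_ratings, [m] * len(test_ratings))
```

A model whose MSE is not clearly below `baseline_mse` on the same test split has learned little beyond the average rating.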

Using Artificial Neural Networks to Improve CFS Week 3-4 Precipitation and 2-Meter Air Temperature Forecasts

Weather and Forecasting ◽

10.1175/waf-d-20-0014.1 ◽

2021 ◽

Author(s):

Yun Fan ◽

Vladimir Krasnopolsky ◽

Huug van den Dool ◽

Chung-Yu Wu ◽

Jon Gottschalck

Keyword(s):

Neural Network ◽

Machine Learning ◽

Linear Regression ◽

Multiple Linear Regression ◽

Forecast Accuracy ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Nonlinear Features ◽

Model Output Statistics ◽

High Level

Forecast skill from dynamical forecast models decreases quickly with projection time due to various errors. Therefore, post-processing methods, from simple bias correction to more complicated Model Output Statistics based on multiple linear regression, are used to improve raw model forecasts. These methods usually show clear improvement over the raw model forecasts, especially for short-range weather forecasts. However, linear approaches have limitations because the relationship between predictands and predictors may be nonlinear. This is even truer for extended-range forecasts, such as Week 3-4 forecasts. In this study, neural network techniques are used to seek or model the relationships between a set of predictors and predictands, and ultimately to improve Week 3-4 precipitation and 2-meter temperature forecasts made by the NOAA NCEP Climate Forecast System. Recent advances in machine learning, namely more flexible and capable algorithms and the availability of big datasets, enable us not only to explore nonlinear features or relationships within a given large dataset, but also to extract more sophisticated pattern relationships and co-variabilities hidden within the multi-dimensional predictors and predictands. These more sophisticated relationships and high-level statistical information are then used to correct the model's Week 3-4 precipitation and 2-meter temperature forecasts. The results show that neural network techniques can, to some extent, significantly improve Week 3-4 forecast accuracy and greatly increase efficiency over traditional multiple linear regression methods.
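The abstract contrasts neural networks with linear post-processing. As a hedged illustration of that linear baseline only (no neural network is shown), the sketch implements the two simplest steps mentioned: an additive bias correction and a one-predictor linear regression in the spirit of Model Output Statistics, both fitted on paired hindcasts and observations. Operational MOS uses many predictors; this single-predictor version is an assumption for brevity.

```python
from statistics import mean


def fit_bias(forecasts, observations):
    """Mean error (forecast minus observation) over a training period."""
    return mean(f - o for f, o in zip(forecasts, observations))


def apply_bias(forecasts, bias):
    """Remove a constant bias from new forecasts."""
    return [f - bias for f in forecasts]


def fit_linear_mos(forecasts, observations):
    """Least-squares fit of obs = a + b * forecast (one-predictor sketch)."""
    f_bar, o_bar = mean(forecasts), mean(observations)
    sxx = sum((f - f_bar) ** 2 for f in forecasts)
    sxy = sum((f - f_bar) * (o - o_bar) for f, o in zip(forecasts, observations))
    b = sxy / sxx
    return o_bar - b * f_bar, b


def apply_mos(forecasts, coef):
    """Apply fitted (a, b) coefficients to new raw forecasts."""
    a, b = coef
    return [a + b * f for f in forecasts]
```

The regression step can rescale forecasts as well as shift them, but both corrections stay linear in the raw forecast, which is the limitation the study's neural networks are meant to address.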

Using Machine Learning to Predict Heart Disease

WSEAS TRANSACTIONS ON BIOLOGY AND BIOMEDICINE ◽

10.37394/23208.2022.19.1 ◽

2022 ◽

Vol 19 ◽

pp. 1-9

Author(s):

Nikhil Bora ◽

Sreedevi Gutta ◽

Ahmad Hadaegh

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Heart Disease ◽

Random Forest ◽

Data Science ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor

Heart disease has become one of the leading causes of death on the planet and one of the most life-threatening diseases. Early prediction of heart disease can help reduce the death rate. Predicting heart disease has become one of the most difficult challenges in the medical sector in recent years. According to recent statistics, about one person dies from heart disease every minute. Healthcare produces massive amounts of data, and data science is critical for analyzing them. This paper proposes heart disease prediction using different machine learning algorithms, such as logistic regression, naïve Bayes, support vector machine, k nearest neighbor (KNN), random forest and extreme gradient boosting. We used these techniques to predict the likelihood of a person developing heart disease on the basis of features (such as cholesterol, blood pressure, age and sex) extracted from the datasets. We used two separate datasets. The first, from the well-known UCI Machine Learning Repository, has 303 records with 14 attributes (13 features and one target). The second, from the Kaggle website, contains 1,190 patient records with 11 features and one target; it combines five popular heart disease datasets. This study compares the accuracy of various machine learning techniques. For the first dataset, Support Vector Machine (SVM) achieved the highest accuracy, 92%. For the second dataset, Random Forest achieved the highest accuracy, 94.12%. When we combined both datasets, Random Forest again gave the highest accuracy, 93.31%.
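k nearest neighbor, one of the algorithms compared, fits in a few lines. This is a generic illustration on made-up two-feature points, not the study's setup. Because it uses Euclidean distance, features such as cholesterol and age would need to be standardized first so that no single unit dominates the distance.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict a class by majority vote among the k nearest training rows.

    Distance is Euclidean, so inputs are assumed to be on comparable scales.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    nearest = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

An odd k avoids tied votes in two-class problems such as disease versus no disease.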

Rice Crop Yield Prediction Using Multi-Level Machine Learning Techniques

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9062 ◽

2020 ◽

Vol 17 (9) ◽

pp. 4280-4286

Author(s):

G. L. Anoop ◽

C. Nandini

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Decision Tree ◽

Multiple Linear Regression ◽

Crop Yield ◽

Machine Learning Algorithms ◽

Rice Crop ◽

Machine Learning Techniques ◽

Indian Government ◽

Regression Methods

Agriculture and allied production contribute to the Indian economy and to India's food security. A crop yield prediction model will help farmers, agriculture departments and organizations make better decisions. In this paper, we propose multi-level machine learning algorithms to predict rice crop yield. The data were collected from an Indian Government website for four districts of Karnataka (Mysore, Mandya, Raichur and Koppal) and are publicly available. In the proposed method, we first pre-processed the collected data using z-scores, normalization and standardized residuals; multilevel decision tree and multilevel multiple linear regression methods were then applied to predict rice crop yield, and the performance of both was evaluated. The experimental results show that multiple linear regression is more accurate than the decision tree technique. These predictions can guide farmers toward better decisions that improve yield and livelihood under particular temperature or climatic scenarios.
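The abstract lists z-scores and normalization among the pre-processing steps without defining them. A sketch of z-score standardization, plus min-max scaling as one common meaning of "normalization" (an assumption, not confirmed by the source):

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize to zero mean and unit population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant series")
    return [(v - lo) / (hi - lo) for v in values]
```

When a model is evaluated on held-out districts or years, the mean and standard deviation should come from the training data only, then be applied unchanged to the test data.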

Machine Learning For Prognosis of Life Expectancy and Diseases

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j9156.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 1765-1771

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Life Expectancy ◽

Multiple Linear Regression ◽

Machine Learning Algorithms ◽

Economic Factors ◽

Average Life Expectancy ◽

Average Percentage ◽

Hiv Aids

Longevity depends on various facets, such as the economic growth of a country and the health innovations of the region. Along with predicting life expectancy, we also assess how susceptible a particular continent is to a few chronic diseases. These factors have a strong impact on the potential life span of the population. We study the biological and economic aspects of continents and their countries to predict the life expectancy of the population and to estimate the probability of a continent having long-standing diseases such as measles and HIV/AIDS. Our research is based on the premise that life expectancy depends on, or correlates with, various factors, including both health and economic factors. Two machine learning algorithms, simple linear regression and multiple linear regression, are used to predict life expectancy across continents, whereas classification algorithms, including decision tree and random forest, were applied to classify the likelihood of disease occurrence. Comparing the algorithms, we infer that multiple linear regression produces the most accurate predictions of the average life expectancy of a population given current features of the continent, such as the adult mortality rate, alcohol consumption rate, infant deaths, the country's GDP, the average percentage expenditure of the population on health care and treatment, the schooling rate and other such features. We also study five diseases, namely HIV/AIDS, measles, diphtheria, hepatitis B and polio. The experiments showed that, in most cases, random forest produces better classification results based on the economic factors of various countries across different continents.
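Multiple linear regression, reported above as the most accurate life expectancy model, chooses coefficients that minimize the sum of squared errors. A dependency-free sketch via the normal equations follows; it is illustrative only. In practice a QR- or SVD-based solver such as `numpy.linalg.lstsq` is numerically safer, and any predictor names in the usage note are hypothetical.

```python
def fit_mlr(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y.

    An intercept column of ones is prepended. The system is solved with
    Gauss-Jordan elimination and partial pivoting, adequate for a few predictors.
    """
    rows = [[1.0, *map(float, r)] for r in X]
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        div = A[col][col]
        A[col] = [v / div for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


def predict_mlr(beta, x):
    """Intercept plus the dot product of coefficients and one feature row."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))
```

With rows of hypothetical `[adult_mortality, schooling]` values, `fit_mlr` returns `[intercept, coef_adult_mortality, coef_schooling]`, and `predict_mlr` applies them to a new row.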

Computation of High-Performance Concrete Compressive Strength Using Standalone and Ensembled Machine Learning Techniques

Materials ◽

10.3390/ma14227034 ◽

2021 ◽

Vol 14 (22) ◽

pp. 7034

Author(s):

Yue Xu ◽

Waqas Ahmad ◽

Ayaz Ahmad ◽

Krzysztof Adam Ostrowski ◽

Marta Dudek ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Support Vector Regression ◽

High Performance ◽

Cross Validation ◽

High Performance Concrete ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Techniques ◽

Fold Cross Validation

The current trend in modern research revolves around novel techniques that can predict the characteristics of materials without consuming time, effort and experimental costs. The application of machine learning techniques to compute various properties of materials is gaining more attention. This study aims to use both standalone and ensemble machine learning techniques to forecast the 28-day compressive strength of high-performance concrete. One standalone technique, support vector regression (SVR), and two ensemble techniques, AdaBoost and random forest, were applied for this purpose. To validate the performance of each technique, the coefficient of determination (R²), statistical checks and k-fold cross-validation checks were used. Additionally, the contribution of the input parameters to the predictions was determined by sensitivity analysis. All the techniques employed showed improved performance in predicting the outcomes. The random forest model was the most accurate, with an R² value of 0.93, compared with R² values of 0.83 and 0.90 for the support vector regression and AdaBoost models, respectively. In addition, the statistical and k-fold cross-validation checks confirmed the random forest model as the best performer, with lower error values. However, the prediction performance of the support vector regression and AdaBoost models was also within an acceptable range. This shows that novel machine learning techniques can be used to predict the mechanical properties of high-performance concrete.
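The k-fold cross-validation check partitions the samples into k folds and trains k models, each tested on a different held-out fold. A minimal index generator follows, assuming shuffled folds with a fixed seed; the study's value of k and its shuffling scheme are not stated in the abstract.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = idx[start : start + size]
        train = idx[:start] + idx[start + size :]
        yield train, test
        start += size
```

Each model is fitted on `train` and scored, for example with R² or RMSE, on `test`; averaging the k scores gives the cross-validated estimate.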

Performance Evaluation Of Machine Learning Techniques On RPAS Remote Sensing Images

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f1197.0886s19 ◽

2019 ◽

Vol 8 (6S) ◽

pp. 1035-1039

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Random Forest ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Approaches ◽

Kappa Index ◽

Remote Sensing Images ◽

Accuracy Measure

Recent advancements in remote sensing platforms, from satellites to close-range Remotely Piloted Aircraft Systems (RPAS), have driven a growing demand for innovative image processing and classification tools. Machine learning approaches are a prevalent group of data-driven inference tools that offer broad scope when applied to remotely sensed data. In this paper, different machine learning approaches are applied to remote sensing images with open-source packages in R to find out which algorithm achieves better accuracy. We carried out a rigorous comparison of four machine learning algorithms: Support Vector Machine, Random Forest, Classification and Regression Tree, and Naive Bayes. These algorithms are evaluated using classification accuracy, the Kappa index and the area under the curve as accuracy metrics. Ten runs were performed to obtain the variance of the results on the training set, and validation was carried out using k-fold cross-validation. The study identifies Random Forest as the best method according to the accuracy measures under different conditions. Random Forest is efficient to train, highly stable with respect to variations in classification model parameter values, and significantly more accurate than the other machine learning approaches tested.
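Overall accuracy and the Kappa index are both computed from a confusion matrix. A minimal sketch, assuming the "Kappa index" is Cohen's kappa (the usual choice in remote sensing accuracy assessment), with rows as reference classes and columns as predicted classes; the area under the curve is not covered here.

```python
def overall_accuracy(cm):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohens_kappa(cm):
    """Cohen's kappa: agreement beyond what the class marginals imply by chance."""
    total = sum(map(sum, cm))
    p_observed = sum(cm[i][i] for i in range(len(cm))) / total
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(col) for col in zip(*cm)]
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)
```

For the matrix `[[20, 5], [10, 15]]`, overall accuracy is 0.7 but chance agreement is 0.5, so kappa is only 0.4: the statistic discounts agreement that the class proportions alone would produce.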
