Performance Evaluation of Regression Models for the Prediction of the COVID-19 Reproduction Rate

This paper evaluates the performance of multiple non-linear regression techniques, namely support-vector regression (SVR), k-nearest neighbor (KNN), Random Forest Regressor, Gradient Boosting, and XGBOOST, for COVID-19 reproduction rate prediction, and studies the impact of feature selection algorithms and hyperparameter tuning on prediction. Sixteen features (for example, Total_cases_per_million and Total_deaths_per_million) related to significant factors, such as testing, death, positivity rate, active cases, stringency index, and population density, are considered for the COVID-19 reproduction rate prediction. These 16 features are ranked using Random Forest, Gradient Boosting, and XGBOOST feature selection algorithms. Seven features are selected from the 16 according to the ranks assigned by most of the above-mentioned feature selection algorithms. Predictions by historical statistical models are based solely on the predicted feature and the assumption that future instances resemble past occurrences. In contrast, techniques such as Random Forest, XGBOOST, Gradient Boosting, KNN, and SVR consider the influence of other significant features when predicting the result. The performance of reproduction rate prediction is measured by mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), R-Squared, relative absolute error (RAE), and root relative squared error (RRSE) metrics. The performances of algorithms with and without feature selection are similar, but a remarkable difference is seen with hyperparameter tuning. The results suggest that the reproduction rate is highly dependent on many features, and the prediction should not be based solely upon past values. Without hyperparameter tuning, the minimum value of RAE is 0.117315935 with feature selection and 0.0968989 without feature selection. KNN attains a low MAE value of 0.0008 and performs well without feature selection and with hyperparameter tuning. The results show that predictions performed using all features and hyperparameter tuning are more accurate than predictions performed using selected features.
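The six metrics named in this abstract can be sketched in a few lines. The following is a minimal illustration using the standard textbook definitions, not the authors' code; the function name `regression_metrics` and the sample values are hypothetical. RAE and RRSE compare the model's errors against a baseline that always predicts the mean of the observed values.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R2, RAE and RRSE for two equal-length sequences."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    # Baseline errors: always predict the mean of the observed values.
    base_abs = sum(abs(t - mean_true) for t in y_true)
    base_sq = sum((t - mean_true) ** 2 for t in y_true)
    mse = sum(sq_err) / n
    return {
        "MAE": sum(abs_err) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1 - sum(sq_err) / base_sq,
        "RAE": sum(abs_err) / base_abs,
        "RRSE": math.sqrt(sum(sq_err) / base_sq),
    }

# Made-up reproduction-rate values, for illustration only.
observed = [1.0, 1.2, 0.9, 1.1]
predicted = [1.1, 1.1, 1.0, 1.0]
metrics = regression_metrics(observed, predicted)
```

Under these definitions RRSE squared equals 1 minus R-Squared, so the two metrics carry the same information on a given test set.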

Techniques for Detecting Malware Traffic: A Comprehensive Approach to Feature Selection and Classification

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39088 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1-10

Author(s):

Harsha A K

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Learning Algorithms ◽

Malware Detection ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Steady Increase ◽

Extreme Gradient Boosting

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detecting malware, such as packet content analysis, are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features such as packet size, arrival time, source and destination addresses, and other such metadata to detect malware. Such information can be used to train machine learning classifiers to distinguish malicious from benign packets. In this paper, we offer an efficient malware detection approach using machine learning classification algorithms such as support vector machine, random forest, and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set in order to assess their respective performances. We further attempt to tune the hyperparameters of the algorithms in order to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998 respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.
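The area under the curve values quoted above summarize how well a classifier's scores rank malicious packets above benign ones. As a rough sketch of that metric (not the authors' evaluation code; `roc_auc`, the labels, and the scores are hypothetical), the AUC can be computed as the fraction of positive/negative pairs that the classifier orders correctly:

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Labels are 1 (malicious) or 0 (benign).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up classifier scores, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of samples, which is fine for a demonstration; library implementations typically use a sort-based method instead.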

Visualization & Prediction of COVID-19 Future Outbreak by Using Machine Learning

International Journal of Information Technology and Computer Science ◽

10.5815/ijitcs.2021.03.02 ◽

2021 ◽

Vol 13 (3) ◽

pp. 16-32

Author(s):

Ahmed Hassan Mohammed Hassan ◽

Arfan Ali Mohammed Qasem ◽

Walaa Faisal Mohammed Abdalla ◽

Omer H. Elhassan

Keyword(s):

Machine Learning ◽

Polynomial Regression ◽

Mean Squared Error ◽

Absolute Error ◽

Future Perspective ◽

Support Vector ◽

Squared Error ◽

Vector Machines ◽

The World ◽

Negative Factors

Day by day, the cumulative incidence of COVID-19 is rapidly increasing. After the spread of the coronavirus epidemic and the death of more than a million people around the world, scientists and researchers have turned to modern technologies such as machine learning to help the world get rid of the Coronavirus (COVID-19) epidemic. Machine Learning (ML) can be deployed very effectively to track and predict the disease. ML techniques have been applied in areas that need to identify dangerous negative factors and define their priorities. The purpose of the proposed system is to predict the number of people infected with COVID-19 using ML. Four standard models are used for COVID-19 prediction: Neural Network (NN), Support Vector Machines (SVM), Bayesian Network (BN), and Polynomial Regression (PR). The data used to test these models consist of the numbers of deaths, newly infected cases, and recoveries, which are forecast for the next 20 days. Five measures were used to evaluate the performance of each model: root mean squared error (RMSE), mean squared error (MSE), mean absolute error (MAE), explained variance score, and R2 score (R2). The value of the proposed system lies in offering a promising mechanism for applying these models to the current scenario of the COVID-19 epidemic. The results showed that NN outperformed the other models, while SVM performed poorly in all predictions on the available dataset. Our results indicate that infections will increase slightly in the coming days. We also find that the results give rise to hope because of the low death rate. For the future, case explanation and data amalgamation must be kept up persistently.
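Two of the measures listed in this abstract, the explained variance score and the R2 score, are easy to confuse. The sketch below uses the standard definitions (it is not taken from the paper, and the case counts are invented) to show the difference: explained variance ignores a constant bias in the forecast, while R2 penalizes it.

```python
def r2_score(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def explained_variance(y_true, y_pred):
    """1 minus the ratio of residual variance to target variance."""
    n = len(y_true)
    resid = [t - p for t, p in zip(y_true, y_pred)]
    mean_resid = sum(resid) / n
    mean_true = sum(y_true) / n
    var_resid = sum((r - mean_resid) ** 2 for r in resid) / n
    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    return 1 - var_resid / var_true

# Invented daily case counts; the forecast is always 10 too high.
cases = [100.0, 120.0, 140.0, 160.0]
forecast = [110.0, 130.0, 150.0, 170.0]
ev = explained_variance(cases, forecast)
r2 = r2_score(cases, forecast)
```

Here the explained variance is perfect because the residuals never vary, whereas R2 drops to about 0.8 because every forecast is offset from the observation.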

Support Vector Machine to Predict Electricity Consumption in the Energy Management Laboratory

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v5i3.2947 ◽

2021 ◽

Vol 5 (3) ◽

pp. 466-473

Author(s):

Azam Zamhuri Fuadi ◽

Irsyad Nashirul Haq ◽

Edi Leksono

Keyword(s):

Support Vector Machine ◽

Energy Management ◽

Mean Squared Error ◽

Measurement Data ◽

Electricity Consumption ◽

Absolute Error ◽

Support Vector ◽

Mean Error ◽

Squared Error ◽

Electrical Loads

Predicting electricity consumption is needed to perform energy management. Electricity consumption prediction is also very important in the development of intelligent power grids and advanced electrification network information. We implement a Support Vector Machine (SVM) to predict electrical loads and compare the results to measured electrical loads. Laboratory electrical loads have their own characteristics compared to residential, commercial, or industrial loads, so we use electrical load data from an energy management laboratory as the prediction target. C and gamma are the parameters searched with GridSearchCV to obtain optimal SVM input parameters. Our prediction data are compared to measurement data, and accuracy is assessed using RMSE (Root Mean Squared Error), MAE (Mean Absolute Error), and MSE (Mean Squared Error) values. On this basis, we obtain the optimal parameter values C = 1e6 and gamma = 2.97e-07, with RMSE = 0.37, MAE = 0.21, and MSE = 0.14.
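As a quick consistency check on the reported errors (a worked calculation, not part of the original abstract): RMSE is by definition the square root of MSE, so the two reported values should agree up to rounding.

```latex
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{0.14} \approx 0.374
```

This matches the reported RMSE of 0.37 to two decimal places.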

Forecasting New Student Candidates Using the Random Forest Method

Lontar Komputer Jurnal Ilmiah Teknologi Informasi ◽

10.24843/lkjiti.2020.v11.i01.p05 ◽

2020 ◽

Vol 11 (1) ◽

pp. 44

Author(s):

Rahmat Robi Waliyansyah ◽

Nugroho Dwi Saputro

Keyword(s):

Random Forest ◽

Mean Squared Error ◽

College Education ◽

Absolute Error ◽

Coefficient Of Determination ◽

Squared Error ◽

Random Forest Method ◽

Study Programs ◽

New Students ◽

New Student

College education institutions regularly hold new student admissions, and the number of new students can rise or fall. This study considers new student admissions at the University of PGRI Semarang (UPGRIS) for the 2014/2015 through 2018/2019 academic years, which involve many admissions selection stages. To meet the minimum required ratio between the number of students and the development of human resources, facilities, and infrastructure, it is necessary to predict how much the number of students increases each year. A system for forecasting the number of prospective new students requires a good forecasting method and sufficiently precise calculations to predict the number of prospective students who register. In this study, the method used is the Random Forest method. The forecasting models are evaluated using Random Sampling and Cross-validation. The evaluation measures are Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Coefficient of Determination (R2). The results of this study identify the five study programs with the highest and the five with the lowest new student admissions. UPGRIS will therefore make a new strategy for the five lowest study programs so that the desired number of new students is achieved.

Modelling and Forecasting Portfolio Inflows

10.4018/978-1-6684-2408-7.ch069 ◽

2022 ◽

pp. 1427-1448

Author(s):

Mogari I. Rapoo ◽

Elias Munapo ◽

Martin M. Chanza ◽

Olusegun Sunday Ewemooje

Keyword(s):

Banking Sector ◽

Mean Squared Error ◽

Model Performance ◽

Absolute Error ◽

Support Vector ◽

Vector Autoregressive ◽

Squared Error ◽

Daily Data ◽

Determining Factors ◽

Push Factor

This chapter analyses the efficiency of support vector regression (SVR), artificial neural networks (ANNs), and structural vector autoregressive (SVAR) models in terms of in-sample forecasting of portfolio inflows (PIs). Daily time series data sourced from Rand Merchant Bank (RMB) covering the period from 1st March 2004 to 1st February 2016 were used. Mean squared error, root mean squared error, mean absolute error, mean absolute squared error, and root mean scaled log error were used to evaluate model performance. The results showed that SVR has the best modelling performance of the models compared. In determining the factors that affect the allocation of PIs into South Africa based on SVAR, 69% of the variation was explained by pull factors while 9% was explained by push factors. Hence, the SVR model is more accurate than ANNs. This chapter therefore recommends that the banking sector, particularly RMB, should use machine learning techniques in modelling PIs for a better financial solution.

Prediction of Apnea-Hypopnea Index Using Sound Data Collected by a Noncontact Device

Otolaryngology ◽

10.1177/0194599819900014 ◽

2020 ◽

Vol 162 (3) ◽

pp. 392-399

Author(s):

Jeong-Whun Kim ◽

Taehoon Kim ◽

Jaeyoung Shin ◽

Kyogu Lee ◽

Sunkyu Choi ◽

...

Keyword(s):

Random Forest ◽

Mean Squared Error ◽

Absolute Error ◽

Sleep Stages ◽

Support Vector ◽

Apnea Hypopnea Index ◽

Tertiary Referral Hospital ◽

Obstructive Sleep ◽

Using Data ◽

Sound Features

Objective: To predict the apnea-hypopnea index (AHI) in patients with obstructive sleep apnea (OSA) using data from breathing sounds recorded using a noncontact device during sleep. Study Design: Prospective cohort study. Setting: Tertiary referral hospital. Subject and Methods: Audio recordings during sleep were performed using an air-conduction microphone during polysomnography. Breathing sounds recorded from all sleep stages were analyzed. After noise reduction preprocessing, the audio data were segmented into 5-second windows and sound features were extracted. Estimation of AHI by regression analysis was performed using a Gaussian process, support vector machine, random forest, and simple linear regression, along with 10-fold cross-validation. Results: In total, 116 patients who underwent attended, in-laboratory, full-night polysomnography were included. Overall, random forest resulted in the highest performance with the highest correlation coefficient (0.83) and least mean absolute error (9.64 events/h) and root mean squared error (13.72 events/h). Other models resulted in somewhat lower but similar performances, with correlation coefficients ranging from 0.74 to 0.79. The estimated AHI tended to be underestimated as the severity of OSA increased. Regarding bias and precision, estimation performances in the severe OSA subgroup were the lowest, regardless of the model used. Among sound features, derivative of the area methods of moments of overall standard deviation demonstrated the highest correlation with AHI. Conclusion: AHI was fairly predictable by using data from breathing sounds generated during sleep. The prediction model may be useful not only for prescreening but also for follow-up after treatment in patients with OSA.
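To illustrate what 10-fold cross-validation over 116 patients looks like in practice, here is a minimal sketch of the index split. It assumes one sample per patient and contiguous, unshuffled folds; the abstract does not say how the authors assigned patients to folds, and `kfold_indices` is a hypothetical helper.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# 116 patients split into 10 folds, as in the study's setup.
folds = list(kfold_indices(116, 10))
fold_sizes = [len(test) for _, test in folds]
```

Each patient appears in exactly one test fold, so every AHI estimate comes from a model that never saw that patient during training.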

Predicting and Analysing the Behaviour of COVID-19

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit217213 ◽

2021 ◽

pp. 40-46

Author(s):

Gaurav Singh ◽

Shivam Rai ◽

Himanshu Mishra ◽

Manoj Kumar

Keyword(s):

Machine Learning ◽

Polynomial Regression ◽

Mean Squared Error ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Systems Science ◽

Data Repository ◽

Support Vector ◽

Squared Error

The prime objective of this work is to predict and analyse the COVID-19 pandemic around the world using machine learning algorithms such as Polynomial Regression, Support Vector Machine, and Ridge Regression. We further assess and compare the performance of these regression algorithms in terms of R squared, Mean Absolute Error, Mean Squared Error, and Root Mean Squared Error. In this work, we used the dataset available in the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. We analyzed the COVID-19 cases from 22/1/2020 to the time of writing. We applied a supervised machine learning prediction model to forecast the possible confirmed cases for the next ten days.
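As a rough idea of how polynomial regression can produce a ten-day forecast, the sketch below fits a polynomial by least squares and extrapolates it. This is an illustration under stated assumptions, not the authors' pipeline: the helper names are hypothetical, the case counts are synthetic, and the fit uses the normal equations solved by Gaussian elimination rather than a library routine.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    m = degree + 1
    # Build A^T A and A^T y for the Vandermonde matrix A[i][j] = xs[i] ** j.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [row + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aug[i][m] - tail) / aug[i][i]
    return coeffs

def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

# Synthetic confirmed-case counts following an exact quadratic trend.
days = list(range(10))
confirmed = [5 + 2 * d + 3 * d * d for d in days]
coeffs = fit_polynomial(days, confirmed, degree=2)
forecast = [predict(coeffs, d) for d in range(10, 20)]  # next ten days
```

Polynomial extrapolation grows without bound outside the fitted range, so a forecast like this is only plausible over short horizons such as the ten days mentioned above.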

Predicting Future Products Rate using Machine Learning Algorithms

International Journal of Intelligent Systems and Applications ◽

10.5815/ijisa.2020.05.04 ◽

2020 ◽

Vol 12 (5) ◽

pp. 41-51

Author(s):

Shaimaa Mahmoud ◽

Mahmoud Hussein ◽

Arabi Keshk

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Mean Squared Error ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Regression ◽

Data Set ◽

Squared Error

Opinion mining in social network data is considered one of the most important research areas because a large number of users interact with different topics on these networks. This paper discusses the problem of predicting future product rates from users’ comments. Researchers have approached this problem using machine learning algorithms (e.g., Logistic Regression, Random Forest Regression, Support Vector Regression, Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, and Decision Tree). However, the accuracy of these techniques still needs to be improved. In this study, we introduce an approach for predicting future product rates using LR, RFR, and SVR. Our data set consists of tweets and their rates on a scale of 1 to 5. The main goal of our approach is to improve prediction accuracy over existing techniques. SVR predicts future product rates with a Mean Squared Error (MSE) of 0.4122, the Linear Regression model with an MSE of 0.4986, and Random Forest Regression with an MSE of 0.4770. This is better than the accuracy of existing approaches.

Forecasting of Air Pollution Index PM2.5 Using Support Vector Machine(SVM)

Journal of Computing Research and Innovation ◽

10.24191/jcrinn.v5i3.149 ◽

2020 ◽

Vol 5 (3) ◽

pp. 43-53

Author(s):

Nor Hayati Binti Shafii ◽

Rohana Alias ◽

Nur Fithrinnissaa Zamani ◽

Nur Fatihah Fauzi

Keyword(s):

Air Pollution ◽

Support Vector Machine ◽

Air Quality ◽

Mean Squared Error ◽

Mean Absolute Error ◽

Pollution Index ◽

Absolute Error ◽

Support Vector ◽

Air Pollution Index ◽

Squared Error

Air pollution is a closely monitored problem in areas with high population density such as big cities. Many regions in Malaysia are facing extreme air quality issues. This situation is caused by several factors such as human behavior, environmental awareness, and technological development. Assessing the air pollution index (API) accurately is very important for controlling its impact on environmental and human health. The work presented here aims to assess the air pollution index of PM2.5 using a Support Vector Machine (SVM) and to compare the accuracy of four different kernel functions in the SVM. The data were provided by the Department of Environment (DOE) and recorded at two Continuous Air Quality Monitoring (CAQM) stations located at Tanah Merah and Kota Bharu. The results are analyzed using mean absolute error (MAE) and root mean squared error (RMSE). It is found that the proposed model using the Radial Basis Function (RBF) kernel, with cost and gamma parameters both equal to 100, can effectively and accurately forecast the air pollution index, with MAE and RMSE of 0.03868583 and 0.06251793, respectively, for API in Kota Bharu and 0.03857308 (MAE) and 0.05895648 (RMSE) for API in Tanah Merah.
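For readers unfamiliar with the RBF kernel, the sketch below shows its standard form, K(x, z) = exp(-gamma * ||x - z||^2), evaluated with the gamma value reported above. The input vectors are made up, and the snippet illustrates only the kernel function, not the authors' SVM training.

```python
import math

def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Hypothetical scaled feature vectors; gamma = 100 as reported in the abstract.
same = rbf_kernel([0.2, 0.5], [0.2, 0.5], gamma=100)
near = rbf_kernel([0.2, 0.5], [0.25, 0.5], gamma=100)
far = rbf_kernel([0.2, 0.5], [0.6, 0.5], gamma=100)
```

With a gamma this large, similarity falls off very quickly with distance, so each training point influences only its close neighbours; this assumes the inputs are scaled to a small range.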

Soft Computing Techniques for Weather Change Predictions in Delhi

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d7382.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 793-800

Keyword(s):

Soft Computing ◽

Mean Squared Error ◽

Weather Forecasting ◽

Weather Prediction ◽

Research Work ◽

Absolute Error ◽

Support Vector ◽

Squared Error ◽

Proposed Model ◽

Soft Computing Techniques

Weather forecasting and warning is the application of science and technology to predict the state of the weather at a given location for a future time. The adverse effects of weather have endangered the lives of the general public in previous years. Unpredicted floods and super cyclones have created havoc in many places. Government and private agencies are working on weather behaviour, but the task remains challenging and incomplete. However, the application of soft computing techniques to weather prediction has recently achieved significant performance. This research work presents a comparative study of soft computing techniques, namely MultiLayer Perceptron (MLP), Support Vector Machine (SVM), and J48 Decision Tree, for forecasting the weather of Delhi using ten years of data comprising temperature, dew, humidity, air pressure, wind speed, and visibility. The paper compares the above models using four error measures, Relative Absolute Error (RAE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Root Relative Squared Error (R2), together with a proposed model that defines a new algorithm. Performance could be enhanced further if text mining is applied in the proposed model.
