A Study of Machine Learning Algorithms for DDoS Detection

Sheikh Shehzad Ahmed

doi:10.22214/ijraset.2021.34922

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.34922 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 174-178

Author(s):

Sheikh Shehzad Ahmed

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithms ◽

Random Forest Classifier ◽

Attack Detection ◽

Machine Learning Algorithms ◽

The Internet ◽

Ddos Attacks ◽

Decision Tree Classifier ◽

Tree Classifier

The Internet is used practically everywhere in today's digital environment. With the increased use of the Internet comes an increase in the number of threats. DDoS attacks are one of the most popular types of cyber-attacks nowadays. With the fast advancement of technology, the harm caused by DDoS attacks has grown increasingly severe. Because DDoS attacks may readily modify the ports/protocols utilized or how they function, the basic features of these attacks must be examined. Machine learning approaches have also been used extensively in intrusion detection research. Still, it is unclear what features are applicable and which approach would be better suited for detection. With this in mind, the research presents a machine learning-based DDoS attack detection approach. To train the attack detection model, we employ four Machine Learning algorithms: Decision Tree classifier (ID3), k-Nearest Neighbors (k-NN), Logistic Regression, and Random Forest classifier. The results of our experiments show that the Random Forest classifier is more accurate in recognizing attacks.
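As an illustration of one of the four algorithms named above, the following is a minimal k-NN sketch in plain Python. The two flow features (packets per second, mean packet size), the toy samples, and the function name `knn_predict` are assumptions made for this example; they are not taken from the paper, which does not list its features here.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority label among its k nearest training points."""
    # Euclidean distance from the query to every training sample.
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest samples and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical flow features: (packets per second, mean packet size in bytes).
train_X = [(10, 500), (12, 480), (15, 520), (900, 60), (950, 64), (870, 58)]
train_y = ["benign", "benign", "benign", "attack", "attack", "attack"]

prediction = knn_predict(train_X, train_y, (880, 62), k=3)
```

In practice the features would usually be scaled first, since unscaled Euclidean distance lets the feature with the largest range dominate.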

Fake News Data Exploration and Analytics

Electronics ◽

10.3390/electronics10192326 ◽

2021 ◽

Vol 10 (19) ◽

pp. 2326

Author(s):

Mazhar Javed Awan ◽

Awais Yasin ◽

Haitham Nobanee ◽

Ahmed Abid Ali ◽

Zain Shahzad ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Random Forest Classifier ◽

The Internet ◽

Fake News ◽

Learning Models ◽

Decision Tree Classifier ◽

Tree Classifier ◽

Machine Learning Models

Before the internet, people acquired their news from the radio, television, and newspapers. With the internet, the news moved online, and suddenly anyone could post information on websites such as Facebook and Twitter. The spread of fake news has also increased with social media, and it has become one of the most significant issues of this century. People use fake news to tarnish the reputation of a reputable organization for their own benefit. The main aim of this project is to build a tool that uses machine learning to examine the language patterns that characterize fake and real news. This paper proposes machine learning models that can successfully detect fake news. These models identify which news is real or fake and specify the accuracy of said news, even in a complex environment. After data preprocessing and exploration, we applied three machine learning models: random forest classifier, logistic regression, and term frequency-inverse document frequency (TF-IDF) vectorizer. The accuracy of the TF-IDF vectorizer, logistic regression, random forest classifier, and decision tree classifier models was approximately 99.52%, 98.63%, 99.63%, and 99.68%, respectively. Machine learning models can be considered a great choice for finding reality-based results and can be applied to other unstructured data for various sentiment analysis applications.
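The TF-IDF weighting mentioned above can be sketched in a few lines of plain Python. This uses the textbook form tf * log(N / df) with whitespace tokenization. The sample headlines and the function name `tfidf` are assumptions for illustration, and library implementations often apply smoothed variants of the formula, so exact weights will differ.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df).

    Assumes every document contains at least one token.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical headlines, invented for this example.
headlines = [
    "shocking secret cure revealed",
    "council approves budget",
    "budget cure debate",
]
vectors = tfidf(headlines)
```

Terms that appear in fewer documents receive higher weights, which is what lets a downstream classifier such as logistic regression focus on distinctive vocabulary.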

An Efficient Classifier for U2R, R2L, DoS Attack

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a1942.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 644-647

Keyword(s):

Machine Learning ◽

Network Security ◽

Learning Algorithms ◽

Research Area ◽

Attack Detection ◽

Machine Learning Algorithms ◽

The Internet ◽

Detection Accuracy ◽

Cyber Attack ◽

Detection Systems

The internet has become an irreplaceable communication and information tool in the current world. With the ever-growing importance and massive use of the internet today, there has been interest from researchers in finding the ideal Cyber Attack Detection Systems (CADSs), also referred to as Intrusion Detection Systems (IDSs), to protect against network security vulnerabilities. CADSs presently exist in many variants but can be largely categorized into two broad classes, signature-based detection and anomaly detection, based on their approach to recognizing attack packets. Signature-based CADSs use the well-known signatures or fingerprints of attack packets to signal their entry across the gateways of secured networks. Signature-based CADSs can only recognize threats that use a known signature; new attacks with unknown signatures can, therefore, strike without notice. Alternatively, anomaly-based CADSs are able to detect and report any abnormal traffic within the network. There are many ways of identifying anomalies, and different machine learning algorithms have been introduced to counter such threats. Most systems, however, fall short of complete attack prevention in the real world due to system administration and configuration, system complexity, and abuse of authorized access. Several scholars and researchers have achieved significant milestones in the development of CADSs owing to the importance of computer and network security. This paper reviews the current trends in CADSs, analyzing the efficiency or level of detection accuracy of machine learning algorithms for cyber-attack detection with the aim of identifying the best. CADS is a developing research area that continues to attract researchers due to its critical objective.

Ensemble-Based Machine Learning for Predicting Sudden Human Fall Using Health Data

Mathematical Problems in Engineering ◽

10.1155/2021/8608630 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Utkarsh Saxena ◽

Soumen Moulik ◽

Soumya Ranjan Nayak ◽

Thomas Hanne ◽

Diptendu Sinha Roy

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Majority Voting ◽

Support Vector ◽

Human Beings ◽

Medical Terminology ◽

Decision Tree Classifier ◽

Tree Classifier ◽

Health Parameters

We attempt to predict the accidental fall of human beings due to sudden abnormal changes in their health parameters such as blood pressure, heart rate, and sugar level. In medical terminology, this problem is known as Syncope. The primary motivation is to prevent such falls by predicting abnormal changes in these health parameters that might trigger a sudden fall. We apply various machine learning algorithms such as logistic regression, a decision tree classifier, a random forest classifier, K-Nearest Neighbours (KNN), a support vector machine, and a naive Bayes classifier on a relevant dataset and verify our results with the cross-validation method. We observe that the KNN algorithm provides the best accuracy in predicting such a fall. However, the accuracy results of some other algorithms are also very close. Thus, we move one step further and propose an ensemble model, Majority Voting, which aggregates the prediction results of multiple machine learning algorithms and finally indicates the probability of a fall that corresponds to a particular human being. The proposed ensemble algorithm yields 87.42% accuracy, which is greater than the accuracy provided by the KNN algorithm.
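The Majority Voting idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the base-classifier outputs are invented, and treating the vote share as a rough fall probability is an assumption of this sketch.

```python
from collections import Counter


def majority_vote(predictions):
    """Return (winning label, share of votes) for one sample."""
    votes = Counter(predictions)
    label, count = votes.most_common(1)[0]
    return label, count / len(predictions)


# Hypothetical outputs of five base classifiers for one person (1 = fall).
base_predictions = [1, 0, 1, 1, 0]
label, vote_share = majority_vote(base_predictions)
```

Using an odd number of base classifiers avoids ties in the binary case.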

Heart Failure Detection Using Quantum-Enhanced Machine Learning and Traditional Machine Learning Techniques for Internet of Artificially Intelligent Medical Things

Wireless Communications and Mobile Computing ◽

10.1155/2021/1616725 ◽

2021 ◽

Vol 2021 ◽

pp. 1-16

Author(s):

Yogesh Kumar ◽

Apeksha Koul ◽

Pushpendra Singh Sisodia ◽

Jana Shafi ◽

Verma Kavita ◽

...

Keyword(s):

Machine Learning ◽

Heart Failure ◽

Random Forest ◽

Learning Algorithms ◽

Failure Detection ◽

Random Forest Classifier ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Research Progress ◽

Record Management

Quantum-enhanced machine learning plays a vital role in healthcare because of its robust application concerning current research scenarios, the growth of novel medical trials, patient information and record management, procurement of chronic disease detection, and many more. For this reason, the healthcare industry is applying quantum computing to sustain patient-oriented attention to healthcare patrons. The present work summarizes the recent research progress in quantum-enhanced machine learning and its significance in heart failure detection on a dataset of 14 attributes. In this paper, the number of qubits in terms of the features of heart failure data is normalized using min-max, PCA, and standard scaling, and is further optimized using the pipelining technique. The current work verifies that quantum-enhanced machine learning algorithms such as quantum random forest (QRF), quantum K nearest neighbour (QKNN), quantum decision tree (QDT), and quantum Gaussian Naïve Bayes (QGNB) are better than traditional machine learning algorithms in heart failure detection. The best accuracy rate, 0.89, was attained by the quantum random forest classifier. In addition, the quantum random forest classifier also achieved the best results in F1 score, recall, and precision, at 0.88, 0.93, and 0.89, respectively. The computation time taken by traditional and quantum-enhanced machine learning algorithms has also been compared, where the quantum random forest has the least execution time by 150 microseconds. Hence, the work provides a way to quantify the differences between standard and quantum-enhanced machine learning algorithms to select the optimal method for detecting heart failure.
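For reference, the accuracy, precision, recall, and F1 metrics quoted above are computed from confusion-matrix counts. The sketch below shows the standard definitions for a binary problem. The label vectors are made up for illustration and do not reproduce the paper's figures.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels (1 = heart failure) and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```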

Detailed Analysis of Intrusion Detection using Machine Learning Algorithms

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a2127.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 1894-1899 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Svm Classifier ◽

Learning Approaches ◽

Decision Tree Classifier ◽

Internet Users ◽

Tree Classifier ◽

Challenging Tasks

The number of internet users has increased exponentially over the years, and intrusive activities have increased significantly as well. Detecting an intrusion attack in a system connected over a network is one of the most challenging tasks in today’s world. A significant number of techniques based on machine learning approaches have been developed to detect these intrusion attacks. Even though these techniques are good, they are not good enough to detect all kinds of attacks. In this paper, different machine learning algorithms are analyzed on the NSL-KDD dataset, with pre-processing steps such as one-hot encoding, feature selection, and random sampling, to find the best-performing model for detecting these attacks. The attacks in the dataset are classified into four types: Probe, DoS, U2R, and R2L, while non-attack traffic is labeled Normal. The dataset is in two parts: KDD-Train and KDD-Test. The models are trained and tested on the dataset to measure accuracy and to understand and compare the performance of the different machine learning algorithms. The machine learning algorithms used are Naive Bayes Classifier, Decision Tree Classifier, Random Forest Classifier, KNeighbours Classifier, Logistic Regression, SVM Classifier, and Voting Classifier. These techniques are compared according to their capability to detect the attacks. This comparison will help to find the algorithm that works best for detecting different kinds of intrusion attacks.
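The one-hot encoding step named above turns a categorical column into one binary column per category. A minimal plain-Python sketch follows. The sample protocol values and the helper name `one_hot_encode` are assumptions chosen for illustration, not details taken from the paper.

```python
def one_hot_encode(values):
    """Return (sorted categories, one binary row per input value)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # Mark the column for this value's category.
        encoded.append(row)
    return categories, encoded


# Hypothetical categorical column, e.g. the protocol of each connection.
protocols = ["tcp", "udp", "icmp", "tcp"]
categories, encoded = one_hot_encode(protocols)
```

The categories must be fixed from the training split and reused on the test split so that both share the same columns.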

Exploration of Neighbor Kernels and Feature Estimators for Heart Disease Prediction using Machine Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3472.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 599-605

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Random Forest ◽

Random Forest Classifier ◽

Support Vector ◽

Classification Algorithms ◽

Disease Prediction ◽

Decision Tree Classifier ◽

Support Vector Classifier ◽

Tree Classifier

In this era of rapid technological growth, people suffer from various diseases. Heart disease is common across the population irrespective of age. Although technology is advancing, the prediction and identification of heart disease remains a challenging issue. Because of the limited availability of patient symptom data, predicting heart disease is a difficult task. With this in view, we used the Heart Disease Prediction dataset extracted from the UCI Machine Learning Repository to analyze and compare various parameters of the classification algorithms. The parameter analysis of the classification algorithms for the heart disease classes is done in five ways. First, the dataset is analyzed using the correlation matrix, feature importance analysis, the target distribution of the dataset, and disease probability based on the density distribution of age and sex. Second, the dataset is fitted to a K-Nearest Neighbor classifier to analyze the performance for various numbers of neighbors with and without PCA. Third, the dataset is fitted to a Support Vector classifier to analyze the performance for various kernels with and without PCA. Fourth, the dataset is fitted to a Decision Tree classifier to analyze the performance for various combinations of features with and without PCA. Fifth, the dataset is fitted to a Random Forest classifier to analyze the performance for various numbers of estimators with and without PCA. The implementation is done in Python using the Spyder IDE with Anaconda Navigator. Experimental results show that for the KNN classifier, 12 neighbours is found to be effective, with a score of 0.52 before applying PCA and 0.53 after applying PCA. For the Support Vector classifier, the rbf kernel is found to be effective, with a score of 0.519 with and without PCA. For the Decision Tree classifier, the score is 0.47 for 7 features before applying PCA and 0.49 for 4 features after applying PCA. For the Random Forest classifier, the score is 0.53 for 500 estimators before applying PCA and 0.52 for 500 estimators after applying PCA.

Security Analysis of DDoS Attacks Using Machine Learning Algorithms in Networks Traffic

Electronics ◽

10.3390/electronics10232919 ◽

2021 ◽

Vol 10 (23) ◽

pp. 2919

Author(s):

Rami J. Alzahrani ◽

Ahmed Alzahrani

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

New Technology ◽

Security Analysis ◽

Computation Time ◽

Machine Learning Algorithms ◽

The Internet ◽

Ddos Attacks ◽

Iot Devices

The recent advance in information technology has created a new era named the Internet of Things (IoT). This new technology allows objects (things) to be connected to the Internet, such as smart TVs, printers, cameras, smartphones, smartwatches, etc. This trend provides new services and applications for many users and enhances their lifestyle. The rapid growth of the IoT makes the incorporation and connection of several devices a predominant procedure. Although there are many advantages of IoT devices, there are different challenges that come as network anomalies. In this research, the current studies in the use of deep learning (DL) in DDoS intrusion detection have been presented. This research aims to implement different Machine Learning (ML) algorithms in WEKA tools to analyze the detection performance for DDoS attacks using the most recent CICDDoS2019 datasets. CICDDoS2019 was found to be the model with best results. This research has used six different types of ML algorithms, which are K-Nearest Neighbors (K-NN), support vector machine (SVM), naïve Bayes (NB), decision tree (DT), random forest (RF), and logistic regression (LR). The best accuracy result in the presented evaluation was achieved when utilizing the Decision Tree (DT) and Random Forest (RF) algorithms, 99% and 99%, respectively. However, the DT is better than RF because it has a shorter computation time, 4.53 s and 84.2 s, respectively. Finally, open issues for further research in future work are presented.

Heart Disease Prediction using Machine Learning Techniques

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d4537.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 10316-10320

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Random Forest ◽

Descriptive Analysis ◽

Random Forest Classifier ◽

Machine Learning Techniques ◽

Decision Tree Classifier ◽

Random Forest Regression ◽

Learning Techniques ◽

Tree Classifier

Nowadays, heart disease has become a major disease among people irrespective of age. We are seeing even children dying of heart disease. If we can predict the disease before it becomes fatal, the chances of survival may improve greatly. Everyone has different values of pulse rate and blood pressure. We are living in an age of data. With the rise of technology, the amount of data generated is increasing daily, and terabytes of data are being produced and stored. For example, hospitals produce a huge amount of patient data such as chest pain, heart rate, blood pressure, pulse rate, etc. If we can obtain this data and apply machine learning techniques, we can reduce the probability of people dying. In this paper, we survey different classification and grouping strategies, for example, KNN, Decision tree classifier, Gaussian Naïve Bayes, Support vector machine, Linear regression, Logistic regression, Random forest classifier, Random forest regression, and linear descriptive analysis. We take the 14 attributes present in the dataset, which comes from the UCI repository and contains a large amount of medical information, as input to develop an accurate model for predicting heart disease. In the proposed research, the performance of the diagnosis model is obtained using classification strategies. This paper proposes an accurate model to predict whether a person has coronary disease or not. This is implemented by comparing the accuracies of different machine learning strategies such as KNN, Decision tree classifier, Gaussian Naïve Bayes, SVM, Logistic regression, Random forest classifier, Linear regression, Random forest regression, and linear descriptive analysis.

Cardiotocography Data Analysis to Predict Fetal Health Risks with Tree-Based Ensemble Learning

International Journal of Information Technology and Computer Science ◽

10.5815/ijitcs.2021.05.03 ◽

2021 ◽

Vol 13 (5) ◽

pp. 30-40

Author(s):

Pankaj Bhowmik ◽

Pulak Chandra Bhowmik ◽

U. A. Md. Ehsan Ali ◽

Md. Sohrawordi

Keyword(s):

Machine Learning ◽

Ensemble Learning ◽

Health Risks ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Model Assessment ◽

Decision Tree Classifier ◽

Chi Square ◽

Fetal Health ◽

Tree Classifier

A sizeable number of women face difficulties during pregnancy, which can eventually lead to serious health problems for the fetus. However, early detection of these risks can save the invaluable lives of both infants and mothers. Cardiotocography (CTG) data provides sophisticated information by monitoring the heart rate signal of the fetus, and is used to predict potential risks to fetal wellbeing and to make clinical conclusions. This paper proposes to analyze the antepartum CTG data (available on the UCI Machine Learning Repository) and develop an efficient tree-based ensemble learning (EL) classifier model to predict fetal health status. In this study, EL uses the Stacking approach, and a concise overview of this approach is discussed and developed accordingly. The study also endeavors to apply distinct machine learning algorithms to the CTG dataset and determine their performance. The Stacking EL technique in this paper involves four tree-based machine learning algorithms as base learners, namely, Random Forest classifier, Decision Tree classifier, Extra Trees classifier, and Deep Forest classifier. The CTG dataset contains 21 features, but only the 10 most important features are selected from the dataset with the Chi-square method for this experiment, and then the features are normalized with Min-Max scaling. Following that, Grid Search is applied for tuning the hyperparameters of the base algorithms. Subsequently, 10-fold cross-validation is performed to select the meta learner of the EL classifier model. A comparative model assessment is then made between the individual base learning algorithms and the EL classifier model, and the findings show the EL classifier's superiority in predicting fetal health risks, with an accuracy of about 96.05%. This study concludes that the Stacking EL approach can be a substantial paradigm in machine learning studies to improve models’ accuracy and reduce the error rate.
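The Min-Max scaling step mentioned above maps each feature to the range [0, 1] via (x - min) / (max - min). A minimal sketch follows. The sample heart-rate values and the function name `min_max_scale` are assumptions for illustration only.

```python
def min_max_scale(column):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical feature column, e.g. baseline fetal heart rate in bpm.
baseline_heart_rate = [120, 140, 160, 130]
scaled = min_max_scale(baseline_heart_rate)
```

The minimum and maximum should be taken from the training folds only and then reused on held-out data, otherwise cross-validation scores can be optimistic.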

Comparison of machine learning algorithms for DDoS attack detection in SDN

Information and Control Systems ◽

10.31799/1684-8853-2020-3-59-70 ◽

2020 ◽

pp. 59-70

Author(s):

Duc Le ◽

Minh Dao ◽

Quyen Nguyen

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Learning Algorithms ◽

Attack Detection ◽

Machine Learning Algorithms ◽

Software Defined Networks ◽

Ddos Attacks ◽

Ddos Attack ◽

Flow Table ◽

Ddos Attack Detection

Introduction: Distributed denial-of-service (DDoS) has become a common attack type in cyber security. Apart from the conventional DDoS attacks, software-defined networks also face some other typical DDoS attacks, such as flow-table attacks or controller attacks. One of the most recent solutions for detecting a DDoS attack is using machine learning algorithms to classify the traffic. Purpose: Analysis of applying machine learning algorithms in order to prevent DDoS attacks in software-defined networks. Results: A comparison of six algorithms (random forest, decision tree, naïve Bayes, support vector machine, multilayer perceptron, k-nearest neighbors) with accuracy and processing time as the criteria has shown that decision tree and naïve Bayes are the most suitable algorithms for DDoS attack detection. Compared to the other algorithms, they have higher accuracy, faster processing time, and lower resource consumption. The main features that distinguish malicious traffic from normal traffic are the number of bytes in a flow, time flow, Ethernet source address, and Ethernet destination address. A flow-table attack can be detected more easily than a bandwidth attack, as all six algorithms can predict this type with high accuracy. Practical relevance: Important features which play a supporting role in correct data classification facilitate the development of a DDoS protection system with a smaller dataset, focusing only on the necessary data. The algorithms more suitable for machine learning can help us to detect DDoS attacks in software-defined networks more accurately.
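The two comparison criteria named above, accuracy and processing time, can be measured with a small harness like the one below. This is a sketch under stated assumptions: the threshold rule `byte_threshold_rule`, its cutoff, and the sample flows are hypothetical stand-ins for a trained model and real traffic.

```python
import time


def evaluate(classifier, samples, labels):
    """Return (accuracy, elapsed seconds) for a callable classifier."""
    start = time.perf_counter()
    predictions = [classifier(sample) for sample in samples]
    elapsed = time.perf_counter() - start
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels), elapsed


def byte_threshold_rule(flow):
    """Hypothetical stand-in model: flag flows with very few bytes."""
    return "attack" if flow["bytes"] < 100 else "normal"


# Invented flows; real experiments would use captured flow statistics.
flows = [{"bytes": 40}, {"bytes": 60}, {"bytes": 1500}, {"bytes": 900}]
labels = ["attack", "attack", "normal", "normal"]

accuracy, seconds = evaluate(byte_threshold_rule, flows, labels)
```

Timing a single pass is noisy, so repeated runs and an average would give a fairer comparison between algorithms.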
