Reducing U2R and R2L category false negative rates with support vector machines

The KDD Cup '99 dataset is commonly used for training and testing machine learning algorithms for intrusion detection systems (IDS). Two major downsides of the dataset are the distribution and proportions of U2R and R2L instances, which represent the most dangerous attack types, and the existence of R2L attack instances that are identical to normal traffic. These properties make the minority categories harder to detect and complicate building a machine learning model that detects these attacks with a sufficiently low false negative rate. This paper presents a new support vector machine based intrusion detection system that classifies unknown data instances according to both the feature values and weight factors that represent the importance of each feature to the classification. An increased detection rate and a significantly decreased false negative rate for the U2R and R2L categories, which have very few instances in the training set, are demonstrated empirically.
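
The abstract does not say how the weight factors enter the classifier. As a minimal sketch, assuming the weights scale each feature's contribution to the squared distance inside an RBF kernel, the idea could look like the following. The function name, the `gamma` default, and the sample records are hypothetical and are not taken from the paper.

```python
import math


def weighted_rbf_kernel(x, z, weights, gamma=1.0):
    """RBF kernel whose squared distance scales each feature by a weight.

    Assumption: this is one plausible way to combine feature values with
    importance weights; the paper's actual formulation is not given here.
    """
    if not (len(x) == len(z) == len(weights)):
        raise ValueError("x, z and weights must have the same length")
    # Weighted squared Euclidean distance between the two records.
    sq_dist = sum(w * (a - b) ** 2 for a, b, w in zip(x, z, weights))
    return math.exp(-gamma * sq_dist)


# Hypothetical two-feature records that differ only in the second feature.
a = [0.0, 1.0]
b = [0.0, 5.0]

# With zero weight on the second feature the records look identical.
similar = weighted_rbf_kernel(a, b, weights=[1.0, 0.0])
# With equal weights the difference in the second feature dominates.
distant = weighted_rbf_kernel(a, b, weights=[1.0, 1.0])
print(similar, distant)
```

A kernel of this shape lets features judged unimportant contribute little to the similarity between records, which is one way a rare attack category could be separated on the few features that matter for it.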

Machine-Learning-Enabled Intrusion Detection System for Cellular Connected UAV Networks

Electronics ◽

10.3390/electronics10131549 ◽

2021 ◽

Vol 10 (13) ◽

pp. 1549

Author(s):

Rakesh Shrestha ◽

Atefeh Omidkar ◽

Sajjad Ahmadi Roudi ◽

Robert Abbas ◽

Shiho Kim

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Wildlife Conservation ◽

Detection System ◽

False Negative ◽

False Negative Rate ◽

Telecommunication Networks ◽

Security Model ◽

Accuracy Rate ◽

Negative Rate

The recent development and adoption of unmanned aerial vehicles (UAVs) is due to their wide variety of applications in the public and private sectors, from parcel delivery to wildlife conservation. The integration of UAVs, 5G, and satellite technologies has prompted telecommunication networks to evolve to provide higher-quality and more stable service to remote areas. However, security concerns are growing as UAV nodes become attractive targets for cyberattacks, owing to their rapidly growing volumes and weak built-in security. In this paper, we propose a UAV- and satellite-based 5G-network security model that harnesses machine learning to effectively detect vulnerabilities and cyberattacks. The solution is divided into two main parts: the creation of an intrusion detection model using various machine learning (ML) algorithms, and the implementation of the ML-based model in terrestrial or satellite gateways. The system identifies various attack types using the realistic CSE-CIC IDS-2018 network dataset published by the Canadian Institute for Cybersecurity (CIC). The dataset consists of seven different types of new and contemporary attacks. This paper demonstrates that ML algorithms can be used to classify packets in UAV networks as benign or malicious to enhance security. Finally, the tested ML algorithms are compared for effectiveness in terms of accuracy rate, precision, recall, F1-score, and false-negative rate. The decision tree algorithm performed best among the ML classifiers tested, obtaining a maximum accuracy rate of 99.99% and a minimum false negative rate of 0% in detecting various attacks.
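
The comparison metrics named above all follow from the four confusion-matrix counts of a binary (benign versus malicious) classifier. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts.

    tp/fn count malicious samples that were detected/missed;
    tn/fp count benign samples that were passed/flagged.
    """
    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero when a class is empty.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "false_negative_rate": ratio(fn, fn + tp),
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fp=10, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the false negative rate equals one minus recall, so a classifier reporting a 0% false negative rate has detected every malicious sample in the test set.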

Machine Learning Model for Credit Card Fraud Detection- A Comparative Analysis

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/6/6 ◽

2021 ◽

Author(s):

Pratyush Sharma ◽

Souradeep Banerjee ◽

Devyanshi Tiwari ◽

Jagdish Chandra Patni

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Credit Card ◽

Detection System ◽

Fraud Detection ◽

Learning Model ◽

Machine Learning Algorithms ◽

Support Vector ◽

Machine Learning Model ◽

Artificial Neural Network (ANN)

In today's world, we are on an express train to a cashless society, which has led to a tremendous escalation in credit card transactions. The flip side is that fraudulent activities are on the increase; therefore, a methodical fraud detection system is indispensable to cardholders as well as card-issuing banks. In this paper, we use different machine learning algorithms, namely random forest, logistic regression, Support Vector Machine (SVM), and neural networks, to train machine learning models on the given dataset, and we present a comparative study of the accuracy and other performance measures achieved by each algorithm. Using a comparative analysis of the F1 score, we identify which algorithm is best suited to fraud detection. Our study concluded that the Artificial Neural Network (ANN) performed best, with an F1 score of 0.91.

Heart disease prediction using machine learning techniques : a survey

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.8.10557 ◽

2018 ◽

Vol 7 (2.8) ◽

pp. 684 ◽

Cited By ~ 12

Author(s):

V V. Ramalingam ◽

Ayantan Dandapath ◽

M Karthik Raja

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Complex Data ◽

Learning Techniques ◽

Vector Machines ◽

Supervised Learning Algorithms ◽

Life Threatening

Heart-related diseases, or cardiovascular diseases (CVDs), have been the main cause of a huge number of deaths worldwide over the last few decades and have emerged as the most life-threatening diseases, not only in India but in the whole world. There is therefore a need for a reliable, accurate, and feasible system to diagnose such diseases in time for proper treatment. Machine learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. Many researchers have recently been using machine learning techniques to help the health care industry and its professionals diagnose heart-related diseases. This paper presents a survey of various models based on such algorithms and techniques and analyzes their performance. Models based on supervised learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), Naïve Bayes, Decision Trees (DT), Random Forest (RF), and ensemble models are found to be very popular among researchers.

Research on Parallel Support Vector Machine Based on Spark Big Data Platform

Scientific Programming ◽

10.1155/2021/7998417 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Yao Huimin

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Big Data ◽

Support Vector Machines ◽

Cross Validation ◽

Machine Learning Algorithms ◽

Support Vector ◽

Lambda Architecture ◽

Vector Machines ◽

Data Platform

With the development of cloud computing and distributed cluster technology, the concept of big data has been expanded and extended in terms of capacity and value, and machine learning technology has also received unprecedented attention in recent years. Traditional machine learning algorithms cannot be parallelized effectively, so a parallelized support vector machine based on the Spark big data platform is proposed. Firstly, the big data platform is designed with the Lambda architecture, which is divided into three layers: Batch Layer, Serving Layer, and Speed Layer. Secondly, in order to improve the training efficiency of support vector machines on large-scale data, the “special points” other than support vectors are considered when merging two support vector machines, that is, the nonsupport vectors in one subset that violate the training results of the other subset, and a cross-validation merging algorithm is proposed. Then, a parallelized support vector machine based on cross-validation is proposed, and the parallelization process of the support vector machine is realized on the Spark platform. Finally, experiments on different datasets verify the effectiveness and stability of the proposed method. Experimental results show that the proposed parallelized support vector machine performs well in speed-up ratio, training time, and prediction accuracy.
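
The abstract does not define what it means for a point to "violate" the training result of the other subset. A minimal sketch, assuming a linear model and the usual margin condition (a labeled point violates a trained SVM when y * f(x) < 1), is shown below. The function names, the weights, and the sample points are hypothetical, and the paper's own merging criterion may differ.

```python
def decision_value(w, b, x):
    """Linear SVM decision function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def violating_points(w, b, points):
    """Return the (x, y) pairs that lie inside or beyond the margin of (w, b).

    Assumption: "violation" is taken to mean y * f(x) < 1, with labels
    y in {-1, +1}. These are the candidate "special points" to carry over
    when two sub-models are merged.
    """
    return [(x, y) for x, y in points if y * decision_value(w, b, x) < 1.0]


# Hypothetical model trained on subset A: the boundary is the line x1 = 0.
w_a, b_a = [1.0, 0.0], 0.0

# Hypothetical nonsupport vectors from subset B.
subset_b = [
    ([2.0, 0.0], 1),    # correctly classified, outside the margin
    ([0.5, 0.0], 1),    # correct side but inside the margin
    ([-3.0, 1.0], -1),  # correctly classified, outside the margin
    ([1.0, 0.0], -1),   # wrong side of the boundary
]

print(violating_points(w_a, b_a, subset_b))
```

Under this assumption, only the second and fourth points would be added to the support vectors when the two sub-models are merged, which keeps the merged training set small.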

Comparative Performance of Machine Learning Algorithms for Cryptocurrency Forecasting

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v11.i3.pp1121-1128 ◽

2018 ◽

Vol 11 (3) ◽

pp. 1121 ◽

Cited By ~ 4

Author(s):

Nor Azizah Hitam ◽

Amelia Ritahani Ismail

Keyword(s):

Machine Learning ◽

Time Series Data ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Series Data ◽

Support Vector ◽

Small Range ◽

Accuracy Rate ◽

Comparative Performance ◽

Vector Machines

Machine learning is a branch of artificial intelligence that can forecast future values based on previous experience. Methods have been proposed to construct models using machine learning algorithms such as Neural Networks (NN), Support Vector Machines (SVM), and Deep Learning. This paper presents a comparative performance analysis of machine learning algorithms for cryptocurrency forecasting. Specifically, this paper concentrates on forecasting time series data. SVM has several advantages over the other models in forecasting, and previous research revealed that SVM provides results that are close to the actual values while also improving accuracy. However, recent research has shown that, due to the small range of samples and data manipulation by inadequate evidence and professional analyzers, the overall status and accuracy rate of the forecasting need to be improved in further studies. Thus, further research on the accuracy rate of the forecasted price is needed.

Machine-Learning-Based External Plagiarism Detecting Methodology From Monolingual Documents

Feature Dimension Reduction for Content-Based Image Identification - Advances in Multimedia and Interactive Technologies ◽

10.4018/978-1-5225-5775-3.ch007 ◽

2018 ◽

pp. 122-139

Author(s):

Saugata Bose ◽

Ritambhra Korpal

Keyword(s):

Machine Learning ◽

Language Processing ◽

Confusion Matrix ◽

False Negative ◽

False Negative Rate ◽

Search Space ◽

Machine Learning Algorithms ◽

C4.5 Decision Tree ◽

N Gram ◽

Four Levels

In this chapter, an approach is proposed in which natural language processing (NLP) techniques and supervised machine learning algorithms are combined to detect external plagiarism. The major emphasis is on constructing a framework to detect plagiarism in monolingual texts by implementing an n-gram frequency comparison approach. The framework is based on 120 characteristics extracted during pre-processing using simple NLP approaches. Afterward, filter metrics are applied to select the most relevant features, and a supervised classification learning algorithm is used to classify the documents into four levels of plagiarism. Then, a confusion matrix is built to estimate the false positives and false negatives. Finally, the authors show that a C4.5 decision tree-based classifier is more suitable than naive Bayes in terms of accuracy. The framework achieved 89% accuracy with low false positive and false negative rates, and it shows higher precision and recall than the passage similarity method, the sentence similarity method, and the search space reduction method.
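
The chapter's exact comparison measure is not stated in the abstract. As a minimal sketch of n-gram frequency comparison, assuming word trigrams and cosine similarity between the two frequency vectors, a suspicious document could be compared with a source as follows. The function names and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(text, n=3):
    """Count word n-grams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two n-gram frequency vectors."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[g] * counts_b[g] for g in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        # A text shorter than n words has no n-grams to compare.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical source and suspicious passages.
source = "support vector machines are popular and powerful learning techniques"
suspect = "support vector machines are popular and widely used classifiers"

score = cosine_similarity(ngram_counts(source), ngram_counts(suspect))
print(f"trigram similarity: {score:.3f}")
```

A score like this would be only one of the extracted characteristics; the framework described above feeds many such features into a supervised classifier rather than thresholding a single similarity value.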

Combination with Machine Learning Algorithms for the Classification in E-Bussiness

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.230-232.625 ◽

2011 ◽

Vol 230-232 ◽

pp. 625-628

Author(s):

Lei Shi ◽

Xin Ming Ma ◽

Xiao Hong Hu

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Set Theory ◽

Rough Set ◽

Rough Set Theory ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector ◽

Mathematical Tool ◽

Vector Machines

E-business has grown rapidly in the last decade, and a massive amount of data on customer purchases, browsing patterns, and preferences has been generated. Classification of electronic data plays a pivotal role in mining valuable information and has thus become one of the most important applications in E-business. Support Vector Machines are popular and powerful machine learning techniques, and they offer state-of-the-art performance. Rough set theory is a formal mathematical tool for dealing with incomplete or imprecise information, and one of its important applications is feature selection. In this paper, rough set theory and support vector machines are combined to construct a classification model that classifies E-business data effectively.

Linear Support Vector Machines for Prediction of Student Performance in School-Based Education

Mathematical Problems in Engineering ◽

10.1155/2020/4761468 ◽

2020 ◽

Vol 2020 ◽

pp. 1-7

Author(s):

Nalindren Naicker ◽

Timothy Adeliyi ◽

Jeanette Wing

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Student Performance ◽

State Of The Art ◽

Learning Algorithms ◽

The State ◽

Machine Learning Algorithms ◽

Superior Performance ◽

Support Vector ◽

Vector Machines

Educational Data Mining (EDM) is a rich research field in computer science. Tools and techniques in EDM are useful for predicting student performance, which gives practitioners useful insights for developing appropriate intervention strategies to improve pass rates and increase retention. The performance of state-of-the-art machine learning classifiers is very much dependent on the task at hand. Support vector machines have been used extensively in classification problems; however, the extant literature shows a gap in the application of linear support vector machines as a predictor of student performance. The aim of this study was to compare the performance of linear support vector machines with that of state-of-the-art classical machine learning algorithms in order to determine the algorithm that would improve prediction of student performance. In this quantitative study, an experimental research design was used. Experiments were set up using feature selection on a publicly available dataset of 1000 alpha-numeric student records. Linear support vector machines, benchmarked against ten categorical machine learning algorithms, showed superior performance in predicting student performance. The results of this research showed that features like race, gender, and lunch influence performance in mathematics, whilst access to lunch was the primary factor influencing reading and writing performance.

LOCAL DESCRIPTOR MATCHING WITH SUPPORT VECTOR MACHINES

International Journal of Information Acquisition ◽

10.1142/s0219878910002051 ◽

2010 ◽

Vol 07 (01) ◽

pp. 59-80

Author(s):

D. CHENG ◽

S. Q. XIE ◽

E. HÄMMERLE

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Local Descriptor ◽

Local Descriptors ◽

Vector Machines ◽

Three Stages ◽

Image Transformations

Local descriptor matching is the most overlooked of the three stages of the local descriptor process, and this paper proposes a new method for matching local descriptors based on support vector machines. Experimental results show that the developed method is more robust for matching local descriptors for all image transformations considered. The method can be integrated with different local descriptor methods and with different machine learning algorithms, which shows that the approach is sufficiently robust and versatile.

An prediction of Healthy Diet required to Ease the recovery from Covid-19 using the approach of Machine Learning.

Journal of Physics Conference Series ◽

10.1088/1742-6596/2161/1/012019 ◽

2022 ◽

Vol 2161 (1) ◽

pp. 012019

Author(s):

Rencita Maria Colaco ◽

Shreya ◽

N V Subba Reddy ◽

U Dinesh Acharya

Keyword(s):

Machine Learning ◽

Blood Pressure ◽

Machine Learning Algorithms ◽

Healthy Diet ◽

Support Vector ◽

Decision Tree Algorithm ◽

Huge Number ◽

Vector Machines ◽

The World ◽

Main Factor

COVID-19, a virus that has shaken the world, has taken a huge number of lives. According to research, there are also many recovery cases. The most important factor in surviving this disease is good immunity, and not everyone has the same level of immunity. One main factor on which immunity depends is a healthy diet. If a healthy diet is maintained routinely, immunity against this virus increases. People therefore need to be informed about maintaining a healthy diet. Using a dataset on healthy diets and various machine learning algorithms, we can determine what type of diet a person needs. By using algorithms like Random Forest, KNN, logistic regression, and Support Vector Machines, we can determine the type of diet and the probability of recovery. The dataset required for analysis needs to contain all the information regarding the diet. Based on the dataset, the prediction is made using the Decision Tree algorithm. This method of finding the appropriate diet for a particular person based on sugar level, blood pressure, and BMI can be highly useful research during this pandemic.
