Improved Intrusion Detection Algorithm based on TLBO and GA Algorithms

Optimization algorithms are widely used for intrusion detection. This is attributable to the increasing number of audit data features and the decreasing performance of human-based smart Intrusion Detection Systems (IDS) in terms of classification accuracy and training time. In this paper, an improved intrusion detection method for binary classification was presented and discussed in detail. The proposed method combined the New Teaching-Learning-Based Optimization (NTLBO) algorithm with Support Vector Machine (SVM), Extreme Learning Machine (ELM), and Logistic Regression (LR) (for feature selection and weighting); the NTLBO algorithm was used with these supervised machine learning techniques for Feature Subset Selection (FSS). Selecting the smallest number of features without affecting the accuracy of the result was treated as a multi-objective optimization problem. NTLBO was proposed in this paper as an FSS mechanism, and its algorithm-specific, parameter-less concept (which requires no parameter tuning during optimization) was explored. The experiments were performed on two prominent intrusion detection datasets (KDDCUP’99 and CICIDS 2017), where significant enhancements were observed with the suggested NTLBO algorithm compared with the classical Teaching-Learning-Based Optimization (TLBO) algorithm. NTLBO also presented better results than many existing works, reaching 100% accuracy on the KDDCUP’99 dataset and 97% on the CICIDS dataset.
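
The trade-off described in this abstract (few features, no loss of accuracy) can be illustrated with a small sketch. It is not the paper's NTLBO: it uses the classical TLBO teacher and learner phases, a weighted-sum scalarisation of the two objectives (the weight `alpha` is an assumption), and a hypothetical `toy_accuracy` function that stands in for a cross-validated classifier such as SVM.

```python
import random


def fitness(mask, accuracy_fn, alpha=0.9):
    """Weighted sum of accuracy and subset compactness (alpha is an assumed weight)."""
    n_selected = sum(mask)
    if n_selected == 0:
        return 0.0  # an empty feature subset cannot be evaluated
    return alpha * accuracy_fn(mask) + (1 - alpha) * (1 - n_selected / len(mask))


def toy_accuracy(mask, informative=(0, 3, 5)):
    """Hypothetical stand-in for classifier accuracy: share of informative features kept."""
    return sum(mask[i] for i in informative) / len(informative)


def to_mask(position):
    """Threshold a continuous position in [0, 1] into a binary feature mask."""
    return [1 if v > 0.5 else 0 for v in position]


def clip01(v):
    return min(1.0, max(0.0, v))


def tlbo_select(n_features, accuracy_fn, pop_size=10, iters=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]

    def score(p):
        return fitness(to_mask(p), accuracy_fn)

    for _ in range(iters):
        # Teacher phase: move each learner towards the best solution, away from the mean.
        teacher = max(pop, key=score)
        mean = [sum(p[j] for p in pop) / pop_size for j in range(n_features)]
        for i, p in enumerate(pop):
            tf = rng.choice((1, 2))  # teaching factor
            cand = [clip01(p[j] + rng.random() * (teacher[j] - tf * mean[j]))
                    for j in range(n_features)]
            if score(cand) > score(p):
                pop[i] = cand
        # Learner phase: each learner interacts with one random peer.
        for i in range(pop_size):
            p = pop[i]
            q = pop[rng.choice([x for x in range(pop_size) if x != i])]
            sign = 1 if score(p) > score(q) else -1
            cand = [clip01(p[j] + sign * rng.random() * (p[j] - q[j]))
                    for j in range(n_features)]
            if score(cand) > score(p):
                pop[i] = cand

    best = max(pop, key=score)
    return to_mask(best), score(best)
```

Calling `tlbo_select(8, toy_accuracy)` returns a binary mask and its fitness. No population size or iteration count has to be tuned beyond the usual stopping criteria, which is the sense in which TLBO-style algorithms are described as parameter-less.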

Optimized machine learning algorithm for intrusion detection

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v24.i1.pp590-599 ◽

2021 ◽

Vol 24 (1) ◽

pp. 590

Author(s):

Royida A. Ibrahem Alhayali ◽

Mohammad Aljanabi ◽

Ahmed Hussein Ali ◽

Mostafa Abdulghfoor Mohammed ◽

Tole Sutikno

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Learning Algorithm ◽

Detection System ◽

Optimization Algorithms ◽

Feature Subset Selection ◽

Supervised Machine Learning ◽

Support Vector ◽

Feature Subset ◽

Training Time

Intrusion detection is mainly achieved by using optimization algorithms. Their use is necessitated by the increasing number of features in audit data, as well as the poor performance of human-based smart intrusion detection systems (IDS) in terms of training time and classification accuracy. This article presents an improved intrusion detection technique for binary classification. The proposal combines several techniques, including the Rao optimization algorithm, extreme learning machine (ELM), support vector machine (SVM), and logistic regression (LR) (for feature selection and weighting), as well as a hybrid Rao-SVM algorithm with supervised machine learning (ML) techniques for feature subset selection (FSS). Selecting the smallest number of features without sacrificing accuracy in FSS was treated as a multi-objective optimization problem. The algorithm-specific, parameter-less concept of the proposed Rao-SVM was also explored in this study. The KDDCup 99 and CICIDS 2017 datasets were used for the experiments, where significant improvements were noted with the new Rao-SVM compared with the other algorithms. Rao-SVM presented better results than many existing works, reaching 100% accuracy on the KDDCup 99 dataset and 97% on the CICIDS dataset.

Improved TLBO-JAYA Algorithm for Subset Feature Selection and Parameter Optimisation in Intrusion Detection System

Complexity ◽

10.1155/2020/5287684 ◽

2020 ◽

Vol 2020 ◽

pp. 1-18 ◽

Cited By ~ 1

Author(s):

Mohammad Aljanabi ◽

Mohd Arfian Ismail ◽

Vitaly Mezhuyev

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Parameter Tuning ◽

Feature Subset Selection ◽

Supervised Machine Learning ◽

Support Vector ◽

Feature Subset

Many optimisation-based intrusion detection algorithms have been developed and are widely used for intrusion identification. This condition is attributed to the increasing number of audit data features and the decreasing performance of human-based smart intrusion detection systems regarding classification accuracy, false alarm rate, and classification time. Feature selection and classifier parameter tuning are important factors that affect the performance of any intrusion detection system. In this paper, an improved intrusion detection algorithm for multiclass classification was presented and discussed in detail. The proposed method combined the improved teaching-learning-based optimisation (ITLBO) algorithm, improved parallel JAYA (IPJAYA) algorithm, and support vector machine. ITLBO with supervised machine learning (ML) technique was used for feature subset selection (FSS). The selection of the least number of features without causing an effect on the result accuracy in FSS is a multiobjective optimisation problem. This work proposes ITLBO as an FSS mechanism, and its algorithm-specific, parameterless concept (no parameter tuning is required during optimisation) was explored. IPJAYA in this study was used to update the C and gamma parameters of the support vector machine (SVM). Several experiments were performed on the prominent intrusion ML dataset, where significant enhancements were observed with the suggested ITLBO-IPJAYA-SVM algorithm compared with the classical TLBO and JAYA algorithms.
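
As a rough illustration of tuning the SVM parameters C and gamma with a JAYA-style search, the sketch below implements the classical JAYA update (move towards the best solution and away from the worst), not the paper's IPJAYA. The search runs over log2(C) and log2(gamma); the bounds and the `toy_cv_error` objective are assumptions that stand in for a real cross-validated SVM error.

```python
import random


def toy_cv_error(params):
    """Hypothetical stand-in for cross-validated SVM error over (log2 C, log2 gamma)."""
    log_c, log_gamma = params
    return (log_c - 3.0) ** 2 + (log_gamma + 5.0) ** 2


def jaya_minimise(objective, bounds, pop_size=12, iters=60, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(iters):
        costs = [objective(p) for p in pop]
        best = pop[costs.index(min(costs))]
        worst = pop[costs.index(max(costs))]
        for i, p in enumerate(pop):
            cand = []
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Classical JAYA step: attracted to the best, repelled from the worst.
                v = p[j] + r1 * (best[j] - abs(p[j])) - r2 * (worst[j] - abs(p[j]))
                cand.append(min(hi, max(lo, v)))  # keep the candidate inside the bounds
            if objective(cand) < costs[i]:  # greedy acceptance
                pop[i] = cand
    return min(pop, key=objective)


# Assumed search ranges for log2(C) and log2(gamma).
BOUNDS = [(-5.0, 15.0), (-15.0, 3.0)]
```

`jaya_minimise(toy_cv_error, BOUNDS)` returns the best `[log2 C, log2 gamma]` pair found. Because a candidate replaces its parent only when it is strictly better, the best cost in the population never gets worse from one iteration to the next.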

Predictive Modelling of Employee Turnover in Indian IT Industry Using Machine Learning Techniques

Vision The Journal of Business Perspective ◽

10.1177/0972262918821221 ◽

2019 ◽

Vol 23 (1) ◽

pp. 12-21 ◽

Cited By ~ 2

Author(s):

Shikha N. Khera ◽

Divya

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Confusion Matrix ◽

Predictive Modelling ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

It Industry ◽

Knowledge Based ◽

Employee Attrition

The information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based losses to the companies. The aim of this research is to develop a model to predict employee attrition and provide organizations with opportunities to address any issue and improve retention. A predictive model was developed based on a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the Human Resource databases of three IT companies in India, including employment status (the response variable) at the time of collection. Results from the confusion matrix for the SVM model showed that the model has an accuracy of 85 per cent. The results also show that the model performs better in predicting who will leave the firm than in predicting who will not.
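
The relationship between a confusion matrix, overall accuracy, and per-class performance can be sketched as follows. The counts in the usage note are hypothetical and are not taken from the paper; they only show how a model can reach 85 per cent accuracy while identifying leavers better than stayers.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, recall (positive class) and specificity (negative class) from counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all employees classified correctly
        "recall": tp / (tp + fn),        # share of actual leavers that were identified
        "specificity": tn / (tn + fp),   # share of actual stayers that were identified
    }
```

With the hypothetical counts `confusion_metrics(tp=45, fn=5, fp=10, tn=40)`, accuracy is 0.85, recall for leavers is 0.9, and specificity for stayers is 0.8.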

Sentiment Analysis using various Machine Learning and Deep Learning Techniques

Journal of the Nigerian Society of Physical Sciences ◽

10.46481/jnsps.2021.308 ◽

2021 ◽

pp. 385-394

Author(s):

V Umarani ◽

A Julian ◽

J Deepa

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Process ◽

Learning Techniques

Sentiment analysis has gained a lot of attention from researchers in the last year because it has been widely applied to a variety of application domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed as comments, feedback, or critiques. The sentiment analysis process can be automated using machine learning techniques, which analyse text patterns faster. The supervised machine learning technique is the most used mechanism for sentiment analysis. The proposed work discusses the flow of the sentiment analysis process and investigates common supervised machine learning techniques such as multinomial naive Bayes, Bernoulli naive Bayes, logistic regression, support vector machine, random forest, K-nearest neighbor, and decision tree, as well as deep learning techniques such as Long Short-Term Memory and Convolutional Neural Network. The work examines these learning methods on a standard data set. The experimental results demonstrate the performance of the various classifiers in terms of precision, recall, F1-score, ROC curve, accuracy, running time, and k-fold cross-validation. They also help in appreciating the novelty of several deep learning techniques and give the user an overview for choosing the right technique for their application.
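
The k-fold cross-validation mentioned among the evaluation criteria can be sketched in a few lines. This is a minimal, unshuffled version written for illustration; the function name is hypothetical, and in practice the samples would normally be shuffled (and often stratified by class) before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds get one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

Each sample appears in exactly one test fold, so averaging a metric such as F1-score over the k folds uses every sample for testing once and for training k - 1 times.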

A novel feature selection algorithm based on damping oscillation theory

PLoS ONE ◽

10.1371/journal.pone.0255307 ◽

2021 ◽

Vol 16 (8) ◽

pp. e0255307

Author(s):

Fujun Wang ◽

Xing Wang

Keyword(s):

Feature Selection ◽

Optimization Algorithm ◽

Euclidean Distance ◽

Oscillation Theory ◽

Feature Subset Selection ◽

Support Vector ◽

Data Sets ◽

Feature Subset ◽

Selection Algorithm ◽

Filter Model

Feature selection is an important task in big data analysis and information retrieval processing. It reduces the number of features by removing noisy and extraneous data. In this paper, a feature subset selection algorithm based on damping oscillation theory and a support vector machine classifier is proposed. This algorithm is called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Gray Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on the Kendall coefficient and Euclidean distance is proposed, which is used to measure the correlation and redundancy of the candidate feature subset. Second, the wrapper model is an improved grey wolf optimization algorithm, in which the position update formula has been improved in order to achieve optimal results. Third, the filter model and the wrapper model are dynamically adjusted by the damping oscillation theory to find an optimal feature subset. Therefore, MKMDIGWO achieves both the efficiency of the filter model and the high precision of the wrapper model. Experimental results on five UCI public data sets and two microarray data sets have demonstrated that the MKMDIGWO algorithm achieves higher classification accuracy than four other state-of-the-art algorithms. The maximum ACC value of the MKMDIGWO algorithm is at least 0.5% higher than other algorithms on 10 data sets.
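
The two quantities named for the filter model can be computed as in the sketch below. It shows only the building blocks: Kendall's tau-a (without tie correction) as a rank-correlation measure and the Euclidean distance between two feature vectors. How MKMDIGWO combines them into a relevance and redundancy score is not stated in the abstract and is not reproduced here.

```python
import math
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall rank correlation (tau-a): (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def euclidean(u, v):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

A feature whose values rank the samples in the same order as the target gives a tau close to 1, and two features that are far apart in Euclidean distance are less likely to carry redundant information.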

Machine Learning Frameworks in Cancer Detection

E3S Web of Conferences ◽

10.1051/e3sconf/202129701073 ◽

2021 ◽

Vol 297 ◽

pp. 01073

Author(s):

Sabyasachi Pramanik ◽

K. Martin Sagayam ◽

Om Prakash Jena

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Cancer Development ◽

Support Vector ◽

Learning Approaches ◽

Learning Techniques ◽

Fact Finding ◽

Risk Of Cancer

Cancer has been described as a diverse illness with several distinct subtypes that may occur simultaneously. As a result, early detection and forecast of cancer types have become essential in cancer fact-finding methods, since they may help to improve the clinical treatment of cancer survivors. The significance of categorizing cancer sufferers into higher- or lower-threat categories has prompted numerous fact-finding associates from the bioscience and genomics fields to investigate the utilization of machine learning (ML) algorithms in cancer diagnosis and treatment. Because of this, these methods have been used with the goal of simulating the development and treatment of malignant diseases in humans. Furthermore, the capacity of machine learning techniques to identify important characteristics from complicated datasets demonstrates the significance of these technologies. These technologies include Bayesian networks and artificial neural networks, along with a number of other approaches. Decision Trees and Support Vector Machines, which have already been extensively used in cancer research for the creation of predictive models, also lead to accurate decision making. The application of machine learning techniques may undoubtedly enhance our knowledge of cancer development; nevertheless, a sufficient degree of validation is required before these approaches can be considered for use in daily clinical practice. An overview of current machine learning approaches utilized in the simulation of cancer development is presented in this paper. All of the supervised machine learning approaches described here, along with a variety of input characteristics and data samples, are used to build the prediction models. In light of the increasing trend towards the use of machine learning methods in biomedical research, we offer the most current papers that have used these approaches to predict risk of cancer or patient outcomes in order to better understand cancer.

IntruDTree: A Machine Learning-Based Cyber Security Intrusion Detection Model

10.20944/preprints202004.0481.v1 ◽

2020 ◽

Author(s):

Iqbal H. Sarker ◽

Yoosef B. Abushark ◽

Fawaz Alsolami ◽

Asif Irshad Khan

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Cyber Security ◽

Intrusion Detection System ◽

Detection System ◽

Machine Learning Techniques ◽

Support Vector ◽

Security Model ◽

K Nearest Neighbor ◽

Detection Model

Cyber security has recently received enormous attention in today’s security concerns, due to the popularity of the Internet-of-Things (IoT), the tremendous growth of computer networks, and the huge number of relevant applications. Thus, detecting various cyber-attacks or anomalies in a network and building an effective intrusion detection system that performs an essential role in today’s security is becoming more important. Artificial intelligence, particularly machine learning techniques, can be used for building such a data-driven intelligent intrusion detection system. To achieve this goal, in this paper, we present an Intrusion Detection Tree (“IntruDTree”) machine-learning-based security model that first takes into account the ranking of security features according to their importance and then builds a tree-based generalized intrusion detection model based on the selected important features. This model is not only effective in terms of prediction accuracy for unseen test cases but also minimizes the computational complexity of the model by reducing the feature dimensions. Finally, the effectiveness of our IntruDTree model was examined by conducting experiments on cybersecurity datasets and computing precision, recall, F-score, accuracy, and ROC values. We also compare the results of the IntruDTree model with several traditional popular machine learning methods, such as the naive Bayes classifier, logistic regression, support vector machines, and k-nearest neighbor, to analyze the effectiveness of the resulting security model.
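
The first step described here, ranking features by importance before building a tree on the top-ranked ones, can be sketched as follows. The abstract does not state which importance measure IntruDTree uses; this sketch assumes Gini impurity reduction on binary features and binary labels, and all function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def gini_gain(column, labels):
    """Impurity reduction obtained by splitting the labels on one binary feature."""
    left = [y for x, y in zip(column, labels) if x == 0]
    right = [y for x, y in zip(column, labels) if x == 1]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


def rank_features(rows, labels):
    """Return feature indices ordered from most to least important."""
    n_features = len(rows[0])
    gains = [(gini_gain([r[j] for r in rows], labels), j) for j in range(n_features)]
    # Sort by descending gain; ties are broken by the lower feature index.
    return [j for _, j in sorted(gains, key=lambda t: (-t[0], t[1]))]
```

Keeping only the first few indices returned by `rank_features` reduces the feature dimension before the tree is trained, which is how the abstract motivates the lower computational complexity.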

Machine Learning Techniques to Predict Software Defect

Encyclopedia of Business Analytics and Optimization ◽

10.4018/978-1-4666-5202-6.ch129 ◽

2014 ◽

pp. 1422-1434 ◽

Cited By ~ 1

Author(s):

Ramakanta Mohanty ◽

Vadlamani Ravi

Keyword(s):

Machine Learning ◽

Feature Subset Selection ◽

Machine Learning Techniques ◽

Group Method ◽

Software Project ◽

Feature Subset ◽

Software Defects ◽

Software Defect ◽

Learning Techniques ◽

Sensitivity Specificity

Over the past 10 years, many researchers have proposed predicting software defects using various metrics based on measurable aspects of source code entities (e.g. methods, classes, files or modules) and the social structure of the software project. However, these metrics could not achieve very high sensitivity, specificity and accuracy. In this chapter, we propose the use of machine learning techniques to predict software defects. The effectiveness of all these techniques is demonstrated on ten datasets taken from the literature. Based on an experiment, it is observed that PNN outperformed all other techniques in terms of accuracy and sensitivity on all the software defect datasets, followed by CART and the Group Method of Data Handling. We also performed feature selection using a t-statistic-based approach to select feature subsets across different folds for a given technique, followed by feature subset selection. By taking the most important variables, we invoked the classifiers again and observed that PNN outperformed the other classifiers in terms of sensitivity and accuracy. Moreover, the set of ‘if-then’ rules yielded by J48 and CART can be used as an expert system for the prediction of software defects.

An Intelligent Network Intrusion Detection System Based on Multi-Modal Support Vector Machines

International Journal of Information Security and Privacy ◽

10.4018/ijisp.2013100104 ◽

2013 ◽

Vol 7 (4) ◽

pp. 37-52

Author(s):

Srinivasa K G

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Machine Learning Techniques ◽

Support Vector ◽

Intelligent Network ◽

Statistical Machine Learning ◽

High Detection Rate ◽

Network Intrusion

The increase in the number of network-based transactions for both personal and professional use has made network security significant and indispensable. The possible attacks that an Intrusion Detection System (IDS) has to tackle can be of an existing type or of an entirely new type. The challenge for researchers is to develop an intelligent IDS which can detect new attacks as efficiently as it detects known ones. Intrusion Detection Systems are rendered intelligent by employing machine learning techniques. In this paper we present a statistical machine learning approach to the IDS using the Support Vector Machine (SVM). Unlike conventional SVMs, this paper describes a multi-model approach which makes use of an extra layer over the existing SVM. The network traffic is modeled into connections based on protocols at various network layers. These connection statistics are given as input to the SVM, which in turn plots each input vector. New attacks are identified by plotting them with respect to the trained system. The experimental results demonstrate the lower execution time of the proposed system, with a high detection rate and a low number of false positives. The 1999 DARPA IDS dataset is used as the evaluation dataset for both training and testing. The proposed system, SVM NIDS, is benchmarked against SNORT (Roesch, M. 1999), an open source IDS.

Using Random Forests on Real-World City Data for Urban Planning in a Visual Semantic Decision Support System

Sensors ◽

10.3390/s19102266 ◽

2019 ◽

Vol 19 (10) ◽

pp. 2266 ◽

Cited By ~ 1

Author(s):

Nikolaos Sideris ◽

Georgios Bardis ◽

Athanasios Voulodimos ◽

Georgios Miaoulis ◽

Djamchid Ghazanfarpour

Keyword(s):

Machine Learning ◽

Urban Planning ◽

Random Forests ◽

Real World ◽

Performance Metrics ◽

World City ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Real World Data

The constantly increasing amount and availability of urban data derived from varying sources leads to an assortment of challenges that include, among others, the consolidation, visualization, and maximal exploitation prospects of the aforementioned data. A preeminent problem affecting urban planning is the appropriate choice of location to host a particular activity (either a commercial or a common welfare service) or the correct use of an existing building or empty space. In this paper, we propose an approach that addresses these challenges with the aid of machine learning techniques. The proposed system combines, fuses, and merges various types of data from different sources, encodes them using a novel semantic model that can capture and utilize both low-level geometric information and higher-level semantic information, and subsequently feeds them to the random forests classifier, as well as to other supervised machine learning models for comparison. Our experimental evaluation on multiple real-world data sets, comparing the performance of several classifiers (including Feedforward Neural Networks, Support Vector Machines, Bag of Decision Trees, k-Nearest Neighbors and Naïve Bayes), indicated the superiority of Random Forests in terms of the examined performance metrics (Accuracy, Specificity, Precision, Recall, F-measure and G-mean).
