Securing BGP by Handling Dynamic Network Behavior and Unbalanced Datasets

2021 ◽  
Vol 13 (6) ◽  
pp. 41-52
Author(s):  
Rahul Deo Verma ◽  
Shefalika Ghosh Samaddar ◽  
A. B. Samaddar

The Border Gateway Protocol (BGP) provides crucial routing information for the Internet infrastructure. Abnormal routing behavior affects the stability and connectivity of the global Internet. The biggest hurdles in detecting BGP attacks are the extremely unbalanced class distribution of the data set and the dynamic nature of the network, which together result in inferior classifier performance. In this paper we propose an efficient approach to managing these problems. The proposed approach tackles the unbalanced dataset classification by turning the binary classification problem into a multiclass classification problem. This is achieved by splitting the majority-class samples evenly into multiple segments using Affinity Propagation, where the number of segments is chosen so that the number of samples in any segment closely matches the number of minority-class samples. These segments, together with the minority class, are then treated as distinct classes and used to train the Extreme Learning Machine (ELM). The RIPE and BCNET datasets are used to evaluate the performance of the proposed technique. When no feature selection is used, the proposed technique improves the F1 score by 1.9% compared to state-of-the-art techniques. With the Fisher feature selection algorithm, the proposed algorithm achieved the highest F1 score of 76.3%, a 1.7% improvement over the compared techniques. Additionally, the MIQ feature selection technique improves the accuracy by 3.5%. For the BCNET dataset, the proposed technique improves the F1 score by 1.8% with the Fisher feature selection technique. The experimental findings confirm that the new technique substantially improves on previous approaches.
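The core idea, decomposing the majority class into minority-sized segments and treating each segment as its own class before training an ELM, can be illustrated with a brief Python sketch. This is not the authors' implementation: the synthetic data, the AffinityPropagation `preference`/`damping` values, and the tiny NumPy stand-in for the ELM are assumptions made here for illustration.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)

# Imbalanced binary problem: class 0 is the majority, class 1 the minority.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.95, 0.05],
                           random_state=0)
X_maj, X_min = X[y == 0], X[y == 1]

# 1) Segment the majority class with Affinity Propagation. The `preference`
#    value is a hypothetical knob; the paper chooses the number of segments so
#    that each segment is roughly the size of the minority class.
ap = AffinityPropagation(preference=-500, damping=0.9, random_state=0).fit(X_maj)
segments = ap.labels_

# 2) Relabel: the minority class keeps label 0, and each majority segment
#    becomes its own class, turning the task into multiclass classification.
X_multi = np.vstack([X_min, X_maj])
y_multi = np.concatenate([np.zeros(len(X_min), dtype=int), segments + 1])

# 3) Minimal ELM stand-in: fixed random hidden layer, closed-form output weights.
n_hidden, n_classes = 200, int(y_multi.max()) + 1
W = rng.normal(size=(X_multi.shape[1], n_hidden))
b = rng.normal(size=n_hidden)
H = np.tanh(X_multi @ W + b)            # hidden-layer activations
T = np.eye(n_classes)[y_multi]          # one-hot targets
beta = np.linalg.pinv(H) @ T            # least-squares output weights

def predict(X_new):
    """Map any predicted majority segment back to the original binary labels."""
    multi = (np.tanh(X_new @ W + b) @ beta).argmax(axis=1)
    return (multi == 0).astype(int)     # 1 = minority (anomalous), 0 = majority
```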

Author(s):  
Norsyela Muhammad Noor Mathivanan ◽  
Nor Azura Md.Ghani ◽  
Roziah Mohd Janor

The curse of dimensionality and the empty space phenomenon have emerged as critical problems in text classification. One way of dealing with these problems is to apply a feature selection technique before building a classification model. This helps to reduce the time complexity and sometimes increases the classification accuracy. This study introduces a feature selection technique using K-Means clustering to overcome the weaknesses of traditional feature selection techniques such as principal component analysis (PCA), which require a lot of time to transform all the input data. The proposed technique decides which features to retain based on the significance value of each feature in a cluster. This study found that K-Means clustering helps to increase the efficiency of the KNN model for a large data set, while the KNN model without feature selection is suitable for a small data set. A comparison between K-Means clustering and PCA as feature selection techniques shows that the proposed technique is better than PCA, especially in terms of computation time. Hence, K-Means clustering is found to be helpful in reducing the data dimensionality with less time complexity than PCA, without affecting the accuracy of the KNN model for high-frequency data.
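As a rough illustration of the idea (the abstract does not spell out how the per-cluster significance value is computed), the sketch below clusters the feature columns with K-Means and keeps one representative feature per cluster before fitting a KNN model. The representative-by-centroid-distance rule, the cluster count, and the synthetic data are assumptions of this sketch, not the authors' exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification

def kmeans_select_features(X, n_keep=20, random_state=0):
    """Cluster the standardized feature columns and keep, from each cluster,
    the feature closest to its centroid."""
    Xs = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=n_keep, n_init=10, random_state=random_state)
    labels = km.fit_predict(Xs.T)                 # one cluster label per feature
    keep = []
    for c in range(n_keep):
        members = np.where(labels == c)[0]
        if members.size == 0:
            continue
        dists = np.linalg.norm(Xs.T[members] - km.cluster_centers_[c], axis=1)
        keep.append(members[np.argmin(dists)])    # cluster representative
    return np.array(sorted(keep))

# Demo on synthetic data standing in for a document-term matrix:
X, y = make_classification(n_samples=300, n_features=60, n_informative=15,
                           random_state=0)
cols = kmeans_select_features(X, n_keep=20)
knn = KNeighborsClassifier(n_neighbors=5).fit(X[:, cols], y)
print("kept", len(cols), "of", X.shape[1], "features")
```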


Electronics ◽  
2021 ◽  
Vol 10 (17) ◽  
pp. 2099
Author(s):  
Paweł Ziemba ◽  
Jarosław Becker ◽  
Aneta Becker ◽  
Aleksandra Radomska-Zalas ◽  
Mateusz Pawluk ◽  
...  

One of the important research problems in the context of financial institutions is the assessment of credit risk and the decision whether to grant or refuse a loan. Recently, machine learning-based methods have increasingly been employed to solve such problems. However, the selection of an appropriate feature selection technique, sampling mechanism, and/or classifier for credit decision support is very challenging and can affect the quality of the loan recommendations. To address this challenging task, this article examines the effectiveness of various data science techniques for credit decision support. In particular, a processing pipeline was designed that consists of methods for data resampling, feature discretization, feature selection, and binary classification. We suggest building appropriate decision models by leveraging pertinent methods for binary classification, feature selection, data resampling, and feature discretization. The selected models' feasibility analysis was performed through rigorous experiments on real data describing clients' ability to repay loans. During the experiments, we analyzed the impact of feature selection on the results of binary classification, and the impact of data resampling with feature discretization on the results of feature selection and binary classification. After experimental evaluation, we found that the correlation-based feature selection technique and the random forest classifier yield superior performance in solving the underlying problem.
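A hedged sketch of such a pipeline is shown below: SMOTE for resampling, quantile discretization, an absolute-Pearson-correlation feature scorer as a simple proxy for correlation-based feature selection, and a random forest. The specific components, parameters, synthetic data, and the assumed availability of the imbalanced-learn package are this sketch's choices, not the authors' exact configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE          # resampling step (assumed)
from imblearn.pipeline import Pipeline

def abs_pearson(X, y):
    """Correlation-style scores: |Pearson r| between each feature and the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(0) / (np.linalg.norm(Xc, axis=0) *
                                     np.linalg.norm(yc) + 1e-12)
    return np.abs(r)

# Synthetic stand-in for loan-repayment data with an imbalanced target.
X, y = make_classification(n_samples=3000, n_features=30, weights=[0.8, 0.2],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

pipe = Pipeline([
    ("resample", SMOTE(random_state=0)),
    ("discretize", KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")),
    ("select", SelectKBest(score_func=abs_pearson, k=15)),
    ("clf", RandomForestClassifier(n_estimators=300, random_state=0)),
])
pipe.fit(X_tr, y_tr)
print(classification_report(y_te, pipe.predict(X_te)))
```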


2019 ◽  
Vol 24 (2) ◽  
pp. 119-127
Author(s):  
Tariq Ali ◽  
Asif Nawaz ◽  
Hafiza Ayesha Sadia

High dimensionality is a well-known problem in which a data set contains a huge number of features, yet not all are helpful for a particular data mining task, such as classification or clustering. Therefore, feature selection is frequently used to reduce data set dimensionality. Feature selection is a multi-objective task that reduces dataset dimensionality, decreases the running time, and also improves the expected accuracy. In this study, our goal is to reduce the number of features of electroencephalography data for eye state classification and achieve the same or even better classification accuracy with the fewest features. We propose a genetic algorithm-based feature selection technique with the KNN classifier. The accuracy obtained with the feature subset selected by the proposed technique is improved compared to the full feature set. Results show that the classification accuracy of the proposed strategy is improved by 3% on average compared with the accuracy without feature selection.
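A minimal sketch of genetic-algorithm feature selection with KNN in the fitness loop, in the spirit of the approach above, follows. The synthetic stand-in for the EEG eye-state data, the population size, generation count, crossover scheme, and mutation rate are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)
# Stand-in for the 14-channel EEG eye-state data set.
X, y = make_classification(n_samples=500, n_features=14, random_state=0)

def fitness(mask):
    """Cross-validated KNN accuracy on the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, X[:, mask.astype(bool)], y, cv=5).mean()

# Random population of binary feature masks.
pop = rng.integers(0, 2, size=(20, X.shape[1]))
for gen in range(30):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]   # truncation selection
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, X.shape[1])          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.05       # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```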


2021 ◽  
Vol 3 (3) ◽  
Author(s):  
Wokili Abdullahi ◽  
Mary Ogbuka Kenneth ◽  
Morufu Olalere

Features in educational data are ambiguous, which leads to noisy features and curse-of-dimensionality problems. These problems are addressed via feature selection. Existing feature selection models were created using single-level embedded, wrapper-based, or filter-based methods. However, single-level filter-based methods ignore feature dependencies and the interaction with the classifier. Embedded and wrapper-based feature selection methods interact with the classifier, but they can only select the optimal subset for a particular classifier, so their selected features may be worse for other classifiers. Hence, this research proposes a robust Cascade Bi-Level (CBL) feature selection technique for student performance prediction that minimizes the limitations of using a single-level technique. The proposed CBL feature selection technique consists of the Relief technique at the first level and Particle Swarm Optimization (PSO) at the second level. The proposed technique was evaluated using the UCI student performance dataset. In comparison with single-level feature selection, the proposed technique achieved an accuracy of 94.94% for the binary classification task, better than the 93.67% achieved by single-level PSO. These results show that CBL can effectively predict student performance.
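The cascade idea can be sketched as follows: a basic Relief scorer prunes the feature set at the first level, then a simplified binary PSO searches the remaining features with a classifier-in-the-loop fitness. The synthetic data, the decision-tree classifier in the fitness function, the swarm settings, and the continuous relaxation of binary PSO are assumptions of this sketch, not the authors' configuration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=30, random_state=0)  # stand-in data

# Level 1: basic Relief weights (numeric features, nearest hit/miss per sample).
def relief_scores(X, y):
    w = np.zeros(X.shape[1])
    for i in range(len(X)):
        d = np.abs(X - X[i]).sum(axis=1)
        d[i] = np.inf
        hit = np.argmin(np.where(y == y[i], d, np.inf))     # nearest same-class sample
        miss = np.argmin(np.where(y != y[i], d, np.inf))    # nearest other-class sample
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w

top = np.argsort(relief_scores(X, y))[::-1][:15]   # keep the 15 best-scoring features
Xr = X[:, top]

# Level 2: simplified binary PSO over the pruned features; fitness = CV accuracy.
def fitness(mask):
    return 0.0 if mask.sum() == 0 else cross_val_score(
        DecisionTreeClassifier(random_state=0), Xr[:, mask > 0.5], y, cv=5).mean()

n_particles, dim = 15, Xr.shape[1]
pos = rng.random((n_particles, dim))
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([fitness(p > 0.5) for p in pos])
gbest = pbest[np.argmax(pbest_fit)]
for _ in range(25):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = 1 / (1 + np.exp(-(pos + vel)))   # continuous relaxation kept in (0, 1)
    fits = np.array([fitness(p > 0.5) for p in pos])
    improved = fits > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fits[improved]
    gbest = pbest[np.argmax(pbest_fit)]

print("features kept:", top[np.flatnonzero(gbest > 0.5)])
```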


Author(s):  
Hari Shanker Hota ◽  
Dinesh Sharma ◽  
Akhilesh Shrivas

Introduction: The entire world is shifting towards electronic communication through Email for fast and secure communication. Millions of users, including organizations and governments, rely on Email services. This growing number of Email users face problems such as phishing, and detecting phishing Email is a challenging task, especially for non-IT users. Automatic detection of phishing Email is therefore essential and should be deployed along with Email software. Various authors have worked in the field of phishing Email classification with different feature selection and optimization techniques for better performance. Objective: This paper attempts to build a model for the detection of phishing Email using data mining techniques. This paper's significant contribution is to develop and apply a Feature Selection Technique (FST) to reduce the features of the phishing Email benchmark data set. Methods: The proposed Pruning Based Feature Selection Technique (PBFST) is used to determine the rank of a feature based on the level of the tree at which the feature appears. The proposed algorithm is integrated with the already developed Bucket Based Feature Selection Technique (BBFST), which is used internally to rank features within a particular level of the tree. Results: Experimental work was done with the open-source WEKA data mining software using 10-fold cross-validation. The proposed FST was compared with other ranking-based FSTs to check the performance of the C4.5 classifier on the phishing Email data set. Conclusion: The proposed FST removes 33 of the 47 features in the phishing Email data set, and the C4.5 algorithm produces a remarkable accuracy of 99.06% with only 11 features, which is better than other existing FSTs.
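A hedged sketch of ranking features by the tree level at which they first appear, which is the spirit of the pruning-based ranking described above, is given below. scikit-learn's CART tree stands in for WEKA's C4.5/J48, and the synthetic data and cut-off k are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

# Synthetic stand-in with 47 features, echoing the data set size in the abstract.
X, y = make_classification(n_samples=1000, n_features=47, n_informative=10,
                           random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

def feature_levels(tree):
    """Return the shallowest tree level at which each feature is used for a split."""
    t = tree.tree_
    levels = np.full(tree.n_features_in_, np.inf)
    stack = [(0, 0)]                               # (node_id, depth)
    while stack:
        node, depth = stack.pop()
        if t.children_left[node] != t.children_right[node]:   # internal node
            f = t.feature[node]
            levels[f] = min(levels[f], depth)
            stack.append((t.children_left[node], depth + 1))
            stack.append((t.children_right[node], depth + 1))
    return levels

levels = feature_levels(tree)
k = 11                                             # feature count retained in the abstract
selected = np.argsort(levels)[:k]                  # shallower level = higher rank
print("top-ranked features:", selected)
```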


Author(s):  
Hua Tang ◽  
Chunmei Zhang ◽  
Rong Chen ◽  
Po Huang ◽  
Chenggang Duan ◽  
...  
