Securing BGP by Handling Dynamic Network Behavior and Unbalanced Datasets

2021 ◽  
Vol 13 (6) ◽  
pp. 41-52
Author(s):  
Rahul Deo Verma ◽  
Shefalika Ghosh Samaddar ◽  
A. B. Samaddar

The Border Gateway Protocol (BGP) provides crucial routing information for the Internet infrastructure. Abnormal routing behavior affects the stability and connectivity of the global Internet. The biggest hurdles in detecting BGP attacks are the extremely unbalanced class distribution of the data set and the dynamic nature of the network, which together result in inferior classifier performance. In this paper we propose an efficient approach to managing these problems. The proposed approach tackles the unbalanced dataset classification by turning the binary classification problem into a multiclass classification problem. This is achieved by splitting the majority-class samples evenly into multiple segments using Affinity Propagation, where the number of segments is chosen so that the number of samples in any segment closely matches the number of minority-class samples. These segments, together with the minority class, are then treated as distinct classes and used to train the Extreme Learning Machine (ELM). The RIPE and BCNET datasets are used to evaluate the performance of the proposed technique. When no feature selection is used, the proposed technique improves the F1 score by 1.9% compared to state-of-the-art techniques. With the Fisher feature selection algorithm, the proposed algorithm achieved the highest F1 score of 76.3%, a 1.7% improvement over the compared techniques. Additionally, the MIQ feature selection technique improves the accuracy by 3.5%. For the BCNET dataset, the proposed technique improves the F1 score by 1.8% with the Fisher feature selection technique. The experimental findings confirm that the new technique substantially improves on previous approaches.
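The core idea, decomposing the majority class into minority-sized segments and treating each segment as its own class before training an ELM, can be illustrated with a brief Python sketch. This is not the authors' implementation: the synthetic data, the AffinityPropagation `preference`/`damping` values, and the tiny NumPy stand-in for the ELM are assumptions made here for illustration.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)

# Imbalanced binary problem: class 0 is the majority, class 1 the minority.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.95, 0.05],
                           random_state=0)
X_maj, X_min = X[y == 0], X[y == 1]

# 1) Segment the majority class with Affinity Propagation. The `preference`
#    value is a hypothetical knob; the paper chooses the number of segments so
#    that each segment is roughly the size of the minority class.
ap = AffinityPropagation(preference=-500, damping=0.9, random_state=0).fit(X_maj)
segments = ap.labels_

# 2) Relabel: the minority class keeps label 0, and each majority segment
#    becomes its own class, turning the task into multiclass classification.
X_multi = np.vstack([X_min, X_maj])
y_multi = np.concatenate([np.zeros(len(X_min), dtype=int), segments + 1])

# 3) Minimal ELM stand-in: fixed random hidden layer, closed-form output weights.
n_hidden, n_classes = 200, int(y_multi.max()) + 1
W = rng.normal(size=(X_multi.shape[1], n_hidden))
b = rng.normal(size=n_hidden)
H = np.tanh(X_multi @ W + b)            # hidden-layer activations
T = np.eye(n_classes)[y_multi]          # one-hot targets
beta = np.linalg.pinv(H) @ T            # least-squares output weights

def predict(X_new):
    """Map any predicted majority segment back to the original binary labels."""
    multi = (np.tanh(X_new @ W + b) @ beta).argmax(axis=1)
    return (multi == 0).astype(int)     # 1 = minority (anomalous), 0 = majority
```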

Author(s):  
Norsyela Muhammad Noor Mathivanan ◽  
Nor Azura Md.Ghani ◽  
Roziah Mohd Janor

The curse of dimensionality and the empty space phenomenon have emerged as critical problems in text classification. One way of dealing with these problems is to apply a feature selection technique before building a classification model. This helps to reduce the time complexity and sometimes increases the classification accuracy. This study introduces a feature selection technique using K-Means clustering to overcome the weaknesses of traditional feature selection techniques such as principal component analysis (PCA), which require a lot of time to transform all the input data. The proposed technique decides which features to retain based on the significance value of each feature in a cluster. This study found that K-Means clustering helps to increase the efficiency of the KNN model for a large data set, while the KNN model without feature selection is suitable for a small data set. A comparison between K-Means clustering and PCA as feature selection techniques shows that the proposed technique is better than PCA, especially in terms of computation time. Hence, K-Means clustering is found to be helpful in reducing the data dimensionality with less time complexity than PCA, without affecting the accuracy of the KNN model for high-frequency data.
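As a rough illustration of the idea (the abstract does not spell out how the per-cluster significance value is computed), the sketch below clusters the feature columns with K-Means and keeps one representative feature per cluster before fitting a KNN model. The representative-by-centroid-distance rule, the cluster count, and the synthetic data are assumptions of this sketch, not the authors' exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification

def kmeans_select_features(X, n_keep=20, random_state=0):
    """Cluster the standardized feature columns and keep, from each cluster,
    the feature closest to its centroid."""
    Xs = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=n_keep, n_init=10, random_state=random_state)
    labels = km.fit_predict(Xs.T)                 # one cluster label per feature
    keep = []
    for c in range(n_keep):
        members = np.where(labels == c)[0]
        if members.size == 0:
            continue
        dists = np.linalg.norm(Xs.T[members] - km.cluster_centers_[c], axis=1)
        keep.append(members[np.argmin(dists)])    # cluster representative
    return np.array(sorted(keep))

# Demo on synthetic data standing in for a document-term matrix:
X, y = make_classification(n_samples=300, n_features=60, n_informative=15,
                           random_state=0)
cols = kmeans_select_features(X, n_keep=20)
knn = KNeighborsClassifier(n_neighbors=5).fit(X[:, cols], y)
print("kept", len(cols), "of", X.shape[1], "features")
```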


Electronics ◽  
2021 ◽  
Vol 10 (17) ◽  
pp. 2099
Author(s):  
Paweł Ziemba ◽  
Jarosław Becker ◽  
Aneta Becker ◽  
Aleksandra Radomska-Zalas ◽  
Mateusz Pawluk ◽  
...  

One of the important research problems in the context of financial institutions is the assessment of credit risk and the decision whether to grant or refuse a loan. Recently, machine learning-based methods have increasingly been employed to solve such problems. However, the selection of an appropriate feature selection technique, sampling mechanism, and/or classifier for credit decision support is very challenging and can affect the quality of the loan recommendations. To address this challenging task, this article examines the effectiveness of various data science techniques for credit decision support. In particular, a processing pipeline was designed that consists of methods for data resampling, feature discretization, feature selection, and binary classification. We suggest building appropriate decision models by leveraging pertinent methods for binary classification, feature selection, data resampling, and feature discretization. The selected models' feasibility analysis was performed through rigorous experiments on real data describing clients' ability to repay loans. During the experiments, we analyzed the impact of feature selection on the results of binary classification, and the impact of data resampling with feature discretization on the results of feature selection and binary classification. After experimental evaluation, we found that the correlation-based feature selection technique and the random forest classifier yield superior performance in solving the underlying problem.
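A hedged sketch of such a pipeline is shown below: SMOTE for resampling, quantile discretization, an absolute-Pearson-correlation feature scorer as a simple proxy for correlation-based feature selection, and a random forest. The specific components, parameters, synthetic data, and the assumed availability of the imbalanced-learn package are this sketch's choices, not the authors' exact configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE          # resampling step (assumed)
from imblearn.pipeline import Pipeline

def abs_pearson(X, y):
    """Correlation-style scores: |Pearson r| between each feature and the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(0) / (np.linalg.norm(Xc, axis=0) *
                                     np.linalg.norm(yc) + 1e-12)
    return np.abs(r)

# Synthetic stand-in for loan-repayment data with an imbalanced target.
X, y = make_classification(n_samples=3000, n_features=30, weights=[0.8, 0.2],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

pipe = Pipeline([
    ("resample", SMOTE(random_state=0)),
    ("discretize", KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")),
    ("select", SelectKBest(score_func=abs_pearson, k=15)),
    ("clf", RandomForestClassifier(n_estimators=300, random_state=0)),
])
pipe.fit(X_tr, y_tr)
print(classification_report(y_te, pipe.predict(X_te)))
```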


2019 ◽  
Vol 24 (2) ◽  
pp. 119-127
Author(s):  
Tariq Ali ◽  
Asif Nawaz ◽  
Hafiza Ayesha Sadia

High dimensionality is a well-known problem in which a data set contains a huge number of features, yet not all are helpful for a particular data mining task, such as classification or clustering. Therefore, feature selection is frequently used to reduce data set dimensionality. Feature selection is a multi-objective task that reduces dataset dimensionality, decreases the running time, and also improves the expected accuracy. In this study, our goal is to reduce the number of features of electroencephalography data for eye state classification and achieve the same or even better classification accuracy with the fewest features. We propose a genetic algorithm-based feature selection technique with the KNN classifier. The accuracy obtained with the feature subset selected by the proposed technique is improved compared to the full feature set. Results show that the classification accuracy of the proposed strategy is improved by 3% on average compared with the accuracy without feature selection.
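A minimal sketch of genetic-algorithm feature selection with KNN in the fitness loop, in the spirit of the approach above, follows. The synthetic stand-in for the EEG eye-state data, the population size, generation count, crossover scheme, and mutation rate are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)
# Stand-in for the 14-channel EEG eye-state data set.
X, y = make_classification(n_samples=500, n_features=14, random_state=0)

def fitness(mask):
    """Cross-validated KNN accuracy on the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, X[:, mask.astype(bool)], y, cv=5).mean()

# Random population of binary feature masks.
pop = rng.integers(0, 2, size=(20, X.shape[1]))
for gen in range(30):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]   # truncation selection
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, X.shape[1])          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.05       # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```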


2021 ◽  
Vol 3 (3) ◽  
Author(s):  
Wokili Abdullahi ◽  
Mary Ogbuka Kenneth ◽  
Morufu Olalere

Features in educational data are ambiguous, which leads to noisy features and curse-of-dimensionality problems. These problems are addressed via feature selection. Existing feature selection models were created using single-level embedded, wrapper-based, or filter-based methods. However, single-level filter-based methods ignore feature dependencies and the interaction with the classifier. Embedded and wrapper-based feature selection methods interact with the classifier, but they can only select the optimal subset for a particular classifier, so their selected features may be worse for other classifiers. Hence, this research proposes a robust Cascade Bi-Level (CBL) feature selection technique for student performance prediction that minimizes the limitations of using a single-level technique. The proposed CBL feature selection technique consists of the Relief technique at the first level and Particle Swarm Optimization (PSO) at the second level. The proposed technique was evaluated using the UCI student performance dataset. In comparison with single-level feature selection, the proposed technique achieved an accuracy of 94.94% for the binary classification task, better than the 93.67% achieved by single-level PSO. These results show that CBL can effectively predict student performance.
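The cascade idea can be sketched as follows: a basic Relief scorer prunes the feature set at the first level, then a simplified binary PSO searches the remaining features with a classifier-in-the-loop fitness. The synthetic data, the decision-tree classifier in the fitness function, the swarm settings, and the continuous relaxation of binary PSO are assumptions of this sketch, not the authors' configuration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=30, random_state=0)  # stand-in data

# Level 1: basic Relief weights (numeric features, nearest hit/miss per sample).
def relief_scores(X, y):
    w = np.zeros(X.shape[1])
    for i in range(len(X)):
        d = np.abs(X - X[i]).sum(axis=1)
        d[i] = np.inf
        hit = np.argmin(np.where(y == y[i], d, np.inf))     # nearest same-class sample
        miss = np.argmin(np.where(y != y[i], d, np.inf))    # nearest other-class sample
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w

top = np.argsort(relief_scores(X, y))[::-1][:15]   # keep the 15 best-scoring features
Xr = X[:, top]

# Level 2: simplified binary PSO over the pruned features; fitness = CV accuracy.
def fitness(mask):
    return 0.0 if mask.sum() == 0 else cross_val_score(
        DecisionTreeClassifier(random_state=0), Xr[:, mask > 0.5], y, cv=5).mean()

n_particles, dim = 15, Xr.shape[1]
pos = rng.random((n_particles, dim))
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([fitness(p > 0.5) for p in pos])
gbest = pbest[np.argmax(pbest_fit)]
for _ in range(25):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = 1 / (1 + np.exp(-(pos + vel)))   # continuous relaxation kept in (0, 1)
    fits = np.array([fitness(p > 0.5) for p in pos])
    improved = fits > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fits[improved]
    gbest = pbest[np.argmax(pbest_fit)]

print("features kept:", top[np.flatnonzero(gbest > 0.5)])
```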


Author(s):  
Hari Shanker Hota ◽  
Dinesh Sharma ◽  
Akhilesh Shrivas

Introduction: The entire world is shifting towards electronic communication through Email for fast and secure communication. Millions of users, including organizations and governments, rely on Email services. This growing number of Email users face problems such as phishing, and detecting phishing Email is a challenging task, especially for non-IT users. Automatic detection of phishing Email is therefore essential and should be deployed along with Email software. Various authors have worked in the field of phishing Email classification with different feature selection and optimization techniques for better performance. Objective: This paper attempts to build a model for the detection of phishing Email using data mining techniques. This paper's significant contribution is to develop and apply a Feature Selection Technique (FST) to reduce the features of the phishing Email benchmark data set. Methods: The proposed Pruning Based Feature Selection Technique (PBFST) is used to determine the rank of a feature based on the level of the tree at which the feature appears. The proposed algorithm is integrated with the already developed Bucket Based Feature Selection Technique (BBFST), which is used internally to rank features within a particular level of the tree. Results: Experimental work was done with the open-source WEKA data mining software using 10-fold cross-validation. The proposed FST was compared with other ranking-based FSTs to check the performance of the C4.5 classifier on the phishing Email data set. Conclusion: The proposed FST removes 33 of the 47 features in the phishing Email data set, and the C4.5 algorithm produces a remarkable accuracy of 99.06% with only 11 features, which is better than other existing FSTs.
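A hedged sketch of ranking features by the tree level at which they first appear, which is the spirit of the pruning-based ranking described above, is given below. scikit-learn's CART tree stands in for WEKA's C4.5/J48, and the synthetic data and cut-off k are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

# Synthetic stand-in with 47 features, echoing the data set size in the abstract.
X, y = make_classification(n_samples=1000, n_features=47, n_informative=10,
                           random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

def feature_levels(tree):
    """Return the shallowest tree level at which each feature is used for a split."""
    t = tree.tree_
    levels = np.full(tree.n_features_in_, np.inf)
    stack = [(0, 0)]                               # (node_id, depth)
    while stack:
        node, depth = stack.pop()
        if t.children_left[node] != t.children_right[node]:   # internal node
            f = t.feature[node]
            levels[f] = min(levels[f], depth)
            stack.append((t.children_left[node], depth + 1))
            stack.append((t.children_right[node], depth + 1))
    return levels

levels = feature_levels(tree)
k = 11                                             # feature count retained in the abstract
selected = np.argsort(levels)[:k]                  # shallower level = higher rank
print("top-ranked features:", selected)
```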


Author(s):  
Hua Tang ◽  
Chunmei Zhang ◽  
Rong Chen ◽  
Po Huang ◽  
Chenggang Duan ◽  
...  
