IoT Botnet Attack Detection Based on Optimized Extreme Gradient Boosting and Feature Selection

Nowadays, Internet of Things (IoT) technology has various network applications and has attracted the interest of many research and industrial communities. In particular, the number of vulnerable or unprotected IoT devices has drastically increased, along with the amount of suspicious activity such as IoT botnets and large-scale cyber-attacks. To address this security issue, researchers have deployed machine and deep learning methods to detect attacks targeting compromised IoT devices. Despite these efforts, developing an efficient and effective attack detection approach for resource-constrained IoT devices remains a challenging task for the security research community. In this paper, we propose an efficient and effective IoT botnet attack detection approach. The proposed approach relies on a Fisher-score-based feature selection method, which determines the most relevant features, along with a genetic-based extreme gradient boosting (GXGBoost) model, which detects IoT botnet attacks. The Fisher score is a representative filter-based feature selection method that retains significant features and discards irrelevant ones by minimizing intra-class distance and maximizing inter-class distance. GXGBoost, in turn, is an optimized and effective model used to classify the IoT botnet attacks. Several experiments were conducted on a public botnet dataset of IoT devices. The evaluation results, obtained with holdout and 10-fold cross-validation, showed that the proposed approach achieved a high detection rate using only three of the 115 data traffic features and improved the overall performance of the IoT botnet attack detection process.
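The following is a minimal, dependency-free sketch of Fisher-score feature ranking. It assumes the common textbook form F_j = sum_k n_k (mu_kj - mu_j)^2 / sum_k n_k var_kj. The numerator is the inter-class spread and the denominator is the intra-class spread. The helper names `fisher_scores` and `select_top_k` and the toy data are illustrative only and do not come from the paper. The genetic tuning of XGBoost is not shown.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Fisher score of every column of X (a list of rows) for class labels y.

    F_j = sum_k n_k * (mu_kj - mu_j)^2 / sum_k n_k * var_kj
    High scores mean class means are far apart relative to within-class spread.
    """
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)

    scores = []
    for j in range(len(X[0])):
        overall_mean = sum(row[j] for row in X) / len(X)
        between, within = 0.0, 0.0
        for rows in groups.values():
            values = [row[j] for row in rows]
            n_k = len(values)
            mean_k = sum(values) / n_k
            var_k = sum((v - mean_k) ** 2 for v in values) / n_k
            between += n_k * (mean_k - overall_mean) ** 2  # inter-class term
            within += n_k * var_k                          # intra-class term
        if within > 0:
            scores.append(between / within)
        else:
            # Constant inside every class: perfectly separating if the means differ.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Indices of the k features with the highest Fisher score."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: feature 0 separates the two classes, features 1 and 2 are noise.
X = [[0.10, 5.0, 1.0], [0.20, 3.0, 2.0], [0.15, 4.0, 1.5],
     [0.90, 4.5, 1.2], [1.00, 3.5, 1.8], [0.95, 4.2, 1.4]]
y = [0, 0, 0, 1, 1, 1]
print(select_top_k(X, y, 1))  # → [0]
```

In the setting described above, k would be 3 and X would hold the 115 traffic features.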

Towards Optimization of Malware Detection using Chi-square Feature Selection on Ensemble Classifiers

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.d2359.0410421 ◽

2021 ◽

Vol 10 (4) ◽

pp. 254-262

Author(s):

*Fadare Oluwaseun Gbenga ◽

Adetunmbi Adebayo Olusola ◽

(Mrs) Oyinloye Oghenerukevwe Eloho ◽

Mogaji Stephen Alaba

Keyword(s):

Feature Selection ◽

Malware Detection ◽

Feature Selection Method ◽

Ensemble Methods ◽

Nearest Neighbors ◽

Selection Method ◽

Gradient Boosting ◽

K Nearest Neighbors ◽

Chi Square ◽

Extreme Gradient Boosting

The proliferation of malware variants is probably the greatest problem in PC security, and protecting information in the form of source code against unauthorized access is a central issue in computer security. In recent times, machine learning has been extensively researched for malware detection, and ensemble techniques have proven highly effective in terms of detection accuracy. This paper proposes a framework that combines Chi-square feature selection with eight ensemble learning classifiers built on five base learners: K-Nearest Neighbors, Naïve Bayes, Support Vector Machine, Decision Trees, and Logistic Regression. Among the base learners, K-Nearest Neighbors returns the highest accuracy: 95.37% with Chi-square feature selection and 87.89% without feature selection. Among the ensemble methods, the Extreme Gradient Boosting classifier achieves the highest accuracy: 97.407% with Chi-square feature selection and 91.72% without feature selection. Extreme Gradient Boosting and Random Forest lead on the seven evaluation measures with Chi-square feature selection and without feature selection, respectively. The results show that tree-based ensemble models are compelling for malware classification.
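Chi-square feature selection scores each feature by how strongly its values depend on the class. The sketch below computes the Pearson statistic from a feature/class contingency table. It assumes categorical features, such as a hypothetical binary flag indicating whether a sample calls a given API. The paper's actual malware features are not described here. scikit-learn's `SelectKBest` with `chi2` offers a library route, but it computes the statistic from per-class feature sums, which is a related but not identical formulation.

```python
from collections import Counter


def chi_square(feature, labels):
    """Pearson chi-square statistic between a categorical feature and class labels.

    Sums (observed - expected)^2 / expected over the contingency table,
    where expected counts assume the feature is independent of the class.
    """
    n = len(labels)
    joint = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    stat = 0.0
    for f_val, f_n in feature_counts.items():
        for y_val, y_n in label_counts.items():
            expected = f_n * y_n / n
            observed = joint[(f_val, y_val)]  # Counter returns 0 for unseen pairs
            stat += (observed - expected) ** 2 / expected
    return stat


def select_k_best(columns, labels, k):
    """Indices of the k feature columns with the largest chi-square statistic."""
    scores = [chi_square(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical binary features; label 1 = malware, 0 = benign.
labels = [0, 0, 1, 1]
calls_api_a = [0, 0, 1, 1]  # perfectly associated with the label
calls_api_b = [0, 1, 0, 1]  # independent of the label
print(chi_square(calls_api_a, labels))  # → 4.0
print(chi_square(calls_api_b, labels))  # → 0.0
print(select_k_best([calls_api_b, calls_api_a], labels, 1))  # → [1]
```

The selected columns would then be passed to the ensemble classifiers.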

A Novel Framework Based on Deep Learning and ANOVA Feature Selection Method for Diagnosis of COVID-19 Cases from Chest X-Ray Images

Computational Intelligence and Neuroscience ◽

10.1155/2022/4694567 ◽

2022 ◽

Vol 2022 ◽

pp. 1-11

Author(s):

Hamid Nasiri ◽

Seyed Ali Alavi

Keyword(s):

Feature Selection ◽

Deep Learning ◽

False Negative ◽

Feature Selection Method ◽

Multiclass Classification ◽

Selection Method ◽

Gradient Boosting ◽

X Ray ◽

Extreme Gradient Boosting ◽

Chest X Ray

Background and Objective. The new coronavirus disease (known as COVID-19) was first identified in Wuhan and quickly spread worldwide, wreaking havoc on the economy and people’s everyday lives. As the number of COVID-19 cases rapidly increases, a reliable detection technique is needed to identify affected individuals, care for them in the early stages of COVID-19, and reduce the virus’s transmission. The most accessible method for COVID-19 identification is Reverse Transcriptase-Polymerase Chain Reaction (RT-PCR); however, it is time-consuming and can produce false-negative results. These limitations encouraged us to propose a novel deep learning framework that can aid radiologists in diagnosing COVID-19 cases from chest X-ray images. Methods. In this paper, a pretrained network, DenseNet169, was employed to extract features from X-ray images. The features were then filtered with a feature selection method, analysis of variance (ANOVA), to reduce computation and time complexity and to overcome the curse of dimensionality, thereby improving accuracy. Finally, the selected features were classified by eXtreme Gradient Boosting (XGBoost). The ChestX-ray8 dataset was employed to train and evaluate the proposed method. Results and Conclusion. The proposed method reached 98.72% accuracy for two-class classification (COVID-19, No-findings) and 92% accuracy for multiclass classification (COVID-19, No-findings, and Pneumonia). Its precision, recall, and specificity on two-class classification were 99.21%, 93.33%, and 100%, respectively. For multiclass classification, it achieved 94.07% precision, 88.46% recall, and 100% specificity. The experimental results show that the proposed framework outperforms other methods and can help radiologists diagnose COVID-19 cases.
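ANOVA feature selection ranks each feature by its one-way F statistic, F = (SS_between / (k - 1)) / (SS_within / (N - k)), and keeps the top-scoring columns. The sketch below assumes the inputs are plain numeric feature vectors, for example rows of extracted CNN features. The helper names are hypothetical, and the number of features kept by the paper is not given in the abstract, so `k` is a placeholder. scikit-learn exposes the same statistic as `f_classif`, which can be used with `SelectKBest`.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across the classes."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    # Between-group sum of squares: distance of each class mean from the grand mean.
    ss_between = sum(len(g) * (means[c] - grand_mean) ** 2 for c, g in groups.items())
    # Within-group sum of squares: spread of samples around their own class mean.
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_by_anova(X, y, k):
    """Keep the k columns of X (a list of rows) with the highest F statistic."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    keep = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return keep, [[row[j] for j in keep] for row in X]


print(anova_f([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))  # → 13.5
```

The reduced feature matrix returned by `select_by_anova` is what a downstream classifier such as XGBoost would be trained on.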

Prediction of hot spots in protein–DNA binding interfaces based on supervised isometric feature mapping and extreme gradient boosting

BMC Bioinformatics ◽

10.1186/s12859-020-03683-3 ◽

2020 ◽

Vol 21 (S13) ◽

Cited By ~ 2

Author(s):

Ke Li ◽

Sijia Zhang ◽

Di Yan ◽

Yannan Bin ◽

Junfeng Xia

Keyword(s):

Feature Selection ◽

Manifold Learning ◽

Hot Spots ◽

Large Scale ◽

Computational Method ◽

Gradient Boosting ◽

Feature Mapping ◽

Accessible Information ◽

Extreme Gradient Boosting ◽

Isometric Feature Mapping

Background. Identification of hot spots in protein-DNA interfaces provides crucial information for research on protein-DNA interaction and for drug design. As experimental methods for determining hot spots are time-consuming, labor-intensive, and expensive, reliable computational methods are needed to predict hot spots on a large scale. Results. Here, we propose a new method named sxPDH, based on supervised isometric feature mapping (S-ISOMAP) and extreme gradient boosting (XGBoost), to predict hot spots in protein-DNA complexes. We obtained 114 features from a combination of protein sequence, structure, network, and solvent accessibility information, and systematically assessed various feature selection methods and manifold-learning-based dimensionality reduction methods. The results show that S-ISOMAP is superior to the other feature selection and manifold learning methods. XGBoost was then used to build the hot spot prediction model sxPDH on the three dimensionality-reduced features obtained from S-ISOMAP. Conclusion. Our method sxPDH boosts prediction performance using S-ISOMAP and XGBoost. The AUC of the model is 0.773, and the F1 score is 0.713. Experimental results on a benchmark dataset indicate that sxPDH generally outperforms state-of-the-art methods in predicting hot spots.
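The sketch below covers the label-aware distance and geodesic-distance steps of S-ISOMAP. It assumes one formulation from the S-ISOMAP literature. Same-class pairs get sqrt(1 - exp(-d^2/beta)) and different-class pairs get sqrt(exp(d^2/beta)) - alpha, with beta defaulting to the mean pairwise Euclidean distance. The alpha, beta and neighbourhood size used by sxPDH are not stated in the abstract, so the defaults here are assumptions. The final classical-MDS step, which would turn the geodesic matrix into the three-dimensional embedding, is omitted.

```python
import math


def supervised_distances(X, y, alpha=0.5, beta=None):
    """Label-aware dissimilarities in the style of supervised ISOMAP.

    Same-class pairs are squashed below 1; different-class pairs are pushed
    to at least 1 - alpha, so classes separate in the resulting geometry.
    """
    n = len(X)
    d = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    if beta is None:
        # Assumed default: mean Euclidean distance over all pairs.
        pairs = [d[i][j] for i in range(n) for j in range(i + 1, n)]
        beta = sum(pairs) / len(pairs)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            s = d[i][j] ** 2 / beta
            if y[i] == y[j]:
                D[i][j] = math.sqrt(1.0 - math.exp(-s))
            else:
                # math.exp raises OverflowError if s is very large (about > 709).
                D[i][j] = math.sqrt(math.exp(s)) - alpha
    return D


def geodesic_distances(D, n_neighbors=2):
    """Shortest-path distances on the k-nearest-neighbour graph (Floyd-Warshall)."""
    n = len(D)
    G = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        G[i][i] = 0.0
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: D[i][j])
        for j in nearest[:n_neighbors]:
            G[i][j] = G[j][i] = D[i][j]  # keep the neighbourhood graph symmetric
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if G[i][m] + G[m][j] < G[i][j]:
                    G[i][j] = G[i][m] + G[m][j]
    return G


# Toy example: two tight clusters with different labels.
X = [[0, 0], [0, 1], [5, 5], [5, 6]]
y = [0, 0, 1, 1]
D = supervised_distances(X, y)
G = geodesic_distances(D, n_neighbors=2)
```

If the neighbourhood is too small, the graph can split into disconnected pieces, which shows up as `inf` entries in `G`.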

A novel feature selection method for large-scale data sets

Intelligent Data Analysis ◽

10.3233/ida-2005-9302 ◽

2005 ◽

Vol 9 (3) ◽

pp. 237-251 ◽

Cited By ~ 1

Author(s):

Wei-Chou Chen ◽

Ming-Chun Yang ◽

Shian-Shyong Tseng

Keyword(s):

Feature Selection ◽

Large Scale ◽

Feature Selection Method ◽

Selection Method ◽

Large Scale Data ◽

Scale Data

A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark

Information ◽

10.3390/info7010006 ◽

2016 ◽

Vol 7 (1) ◽

pp. 6 ◽

Cited By ~ 15

Author(s):

Yong Wang ◽

Wenlong Ke ◽

Xiaoling Tao

Keyword(s):

Feature Selection ◽

Network Traffic ◽

Large Scale ◽

Feature Selection Method ◽

Selection Method ◽

Traffic Classification ◽

Large Scale Network ◽

Network Traffic Classification ◽

Scale Network

Density Based Feature Selection Method for Medical Datasets

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3875.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 4370-4374

Keyword(s):

Feature Selection ◽

Density Function ◽

Selection Process ◽

Research Work ◽

Feature Selection Method ◽

Curse Of Dimensionality ◽

Selection Method ◽

Gradient Boosting ◽

Classification Algorithms ◽

Feature Subset

High-dimensional data that must be processed for improved analysis are common in the medical domain. To deal with the curse of dimensionality, a feature selection process is employed in almost all data mining applications. In this research work, the Density based Feature Selection (DFS) method, which ranks features by finding the Probability Density Function (PDF) of each feature, is applied to medical datasets that suffer from the curse of dimensionality. DFS is a filter-based approach that selects the most discriminatory features from the given feature set: it evaluates the importance of each feature with regard to the target class using the density function. Because it ranks the whole feature set to select the most discriminatory features, the DFS method has major advantages over other methods. This research work finds the best feature subset for the prediction and classification of high-dimensional medical datasets. The PDF-based DFS method is applied to three medical datasets, namely the Chronic Kidney Disease (CKD) dataset, the Breast Cancer Wisconsin Dataset, and the Parkinsons Dataset. The proposed feature selection method evaluates the merit of each feature, assigns weights to the features, and ranks them by their feature density. The reduced feature subset is then validated by applying three classification algorithms, namely Support Vector Machine (SVM), Gradient Boosting, and Convolutional Neural Network (CNN). The performance of the classification algorithms is evaluated using Accuracy, Sensitivity, and Specificity. Experimental results indicate that the performance of SVM, Gradient Boosting, and CNN improves after the feature selection process.
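The abstract does not give the exact DFS scoring formula, so the following is only a stand-in that illustrates ranking features by class-conditional densities. It estimates p(x | class) for each feature with a Gaussian kernel density estimate and scores the feature as 1 minus the overlap of those densities. The bandwidth, grid size and helper names are assumptions, not the paper's settings.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a Gaussian kernel density estimate p(x) built from samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


def density_separation(values, labels, bandwidth=0.5, grid_size=400):
    """Score a feature by how little its class-conditional densities overlap.

    Returns approximately 1 - integral of min_c p(x | c) dx: near 1 when the
    classes occupy different ranges of the feature, near 0 when they coincide.
    """
    lo = min(values) - 4 * bandwidth
    hi = max(values) + 4 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    curves = []
    for c in sorted(set(labels)):
        pdf = gaussian_kde([v for v, lab in zip(values, labels) if lab == c], bandwidth)
        curves.append([pdf(x) for x in grid])
    # Riemann-sum approximation of the overlap integral.
    overlap = sum(min(curve[i] for curve in curves) for i in range(grid_size)) * step
    return 1.0 - overlap


def rank_features(X, y, **kwargs):
    """Feature indices of X (a list of rows), most discriminative first."""
    scores = [density_separation([row[j] for row in X], y, **kwargs)
              for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data: column 0 separates the classes, column 1 overlaps heavily.
X = [[0, 1], [0.2, 2], [0.1, 3], [5, 1.5], [5.2, 2.5], [5.1, 2]]
y = [0, 0, 0, 1, 1, 1]
print(rank_features(X, y))
```

A subset of the top-ranked features would then be passed to SVM, Gradient Boosting or a CNN, as described above.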

Optimization of feature selection method for high dimensional data using fisher score and minimum spanning tree

2014 Annual IEEE India Conference (INDICON) ◽

10.1109/indicon.2014.7030450 ◽

2014 ◽

Cited By ~ 9

Author(s):

Bharat Singh ◽

Jitendra Singh Sankhwar ◽

Om Prakash Vyas

Keyword(s):

Feature Selection ◽

Spanning Tree ◽

Minimum Spanning Tree ◽

High Dimensional Data ◽

Feature Selection Method ◽

Selection Method ◽

High Dimensional ◽

Fisher Score

Diagnosis of brushless synchronous generator using numerical modeling

COMPEL The International Journal for Computation and Mathematics in Electrical and Electronic Engineering ◽

10.1108/compel-01-2020-0018 ◽

2020 ◽

Vol 39 (5) ◽

pp. 1241-1254

Author(s):

Mehdi Rahnama ◽

Abolfazl Vahedi ◽

Arta Mohammad-Alikhani ◽

Noureddine Takorabet

Keyword(s):

Feature Selection ◽

Fault Detection ◽

Feature Selection Method ◽

Synchronous Generator ◽

Selection Method ◽

Open Circuit ◽

Content Type ◽

Detection Approach ◽

Terminal Voltage ◽

Harmonic Components

Purpose. Timely fault diagnosis in electrical machines is a critical issue, as it can prevent faults from developing and reduce repair time and cost. In brushless synchronous generators, fault diagnosis is even more significant because these machines are widely used to generate electrical power all around the world. Therefore, this study aims to propose a fault detection approach for the brushless synchronous generator. In this approach, a novel extension of the Relief feature selection method is developed. Design/methodology/approach. In this paper, taking advantage of the finite element method (FEM), a brushless synchronous machine is modeled to evaluate its performance under two conditions: the normal condition and an open-circuit fault of one diode of the rotating rectifier. The harmonic behavior of the machine’s terminal voltage is then obtained under both conditions. Next, the harmonic components are ranked using the extension of Relief to extract the components most appropriate for fault detection. A fault detection approach is thus proposed based on the ranked harmonic components and a support vector machine classifier. Findings. The proposed diagnosis approach is verified by an experimental test. Results show that this approach can effectively detect the open-circuit fault of the diode rectifier with an accuracy of 98.5% using five harmonic components of the terminal voltage [1]. Originality/value. This paper proposes a novel feature selection method, based on an extension of the Relief method, to select the most effective FFT components, and also presents FEM modeling of a brushless synchronous generator under normal and one-diode open-circuit fault conditions.
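The paper's extension of Relief is not detailed in the abstract. For reference, the sketch below shows the original two-class Relief update that such extensions build on. Each instance moves a feature's weight up by its range-normalised difference to the nearest miss (other class) and down by its difference to the nearest hit (same class). This version iterates over every instance deterministically rather than sampling instances at random. The toy data stand in for harmonic-component features and are invented.

```python
def relief(X, y):
    """Original two-class Relief weights, iterating over every instance."""
    n, m = len(X), len(X[0])
    lows = [min(row[j] for row in X) for j in range(m)]
    highs = [max(row[j] for row in X) for j in range(m)]
    span = [h - lw if h > lw else 1.0 for lw, h in zip(lows, highs)]

    def diff(a, b, j):
        # Per-feature difference scaled to [0, 1] by the feature's range.
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(m))

    weights = [0.0] * m
    for i, xi in enumerate(X):
        hit = min((k for k in range(n) if k != i and y[k] == y[i]),
                  key=lambda k: dist(xi, X[k]))
        miss = min((k for k in range(n) if y[k] != y[i]),
                   key=lambda k: dist(xi, X[k]))
        for j in range(m):
            # Reward features that differ on the nearest miss and penalise
            # features that differ on the nearest hit.
            weights[j] += (diff(xi, X[miss], j) - diff(xi, X[hit], j)) / n
    return weights


# Toy data: feature 0 tracks the class (e.g. normal vs. faulty), feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [1.0, 0.4], [0.9, 0.8], [0.8, 0.2]]
y = [0, 0, 0, 1, 1, 1]
print(relief(X, y))
```

Features would then be ranked by weight and the top few (five harmonic components in the study) passed to the SVM.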

Seasonal-adjustment Based Feature Selection Method for Predicting Epidemic with Large-scale Search Engine Logs

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining - KDD '19 ◽

10.1145/3292500.3330766 ◽

2019 ◽

Author(s):

Thien Q. Tran ◽

Jun Sakuma

Keyword(s):

Feature Selection ◽

Search Engine ◽

Large Scale ◽

Feature Selection Method ◽

Selection Method ◽

Seasonal Adjustment

Identifying the Signatures and Rules of Circulating Extracellular MicroRNA for Distinguishing Cancer Subtypes

Frontiers in Genetics ◽

10.3389/fgene.2021.651610 ◽

2021 ◽

Vol 12 ◽

Author(s):

Fei Yuan ◽

Zhandong Li ◽

Lei Chen ◽

Tao Zeng ◽

Yu-Hang Zhang ◽

...

Keyword(s):

Feature Selection ◽

Cancer Diagnosis ◽

Large Scale ◽

Feature Selection Method ◽

Selection Method ◽

Quantitative Classification ◽

Cancer Subtypes ◽

Clinical Cancer ◽

Cancer Types ◽

Functional Analyses

Cancer is one of the most threatening diseases to humans. It can invade multiple vital organs, including the lung, liver, stomach, pancreas, and even the brain. The identification of cancer biomarkers is one of the most significant components of cancer studies, as it is the foundation of clinical cancer diagnosis and related drug development. During large-scale screening for cancer prevention and early diagnosis, obtaining cancer-related tissue is impossible. Thus, the identification of cancer-associated circulating biomarkers from liquid biopsies has been proposed and has become the most important direction for research on clinical cancer diagnosis. Here, we analyzed pan-cancer extracellular microRNA profiles using multiple machine-learning models. The extracellular microRNA profiles of 11 cancer types and non-cancer samples were first analyzed by Boruta to extract important microRNAs. The selected microRNAs were then evaluated by the Max-Relevance and Min-Redundancy feature selection method, yielding a feature list that was fed into the incremental feature selection method to identify candidate circulating extracellular microRNAs for cancer recognition and classification. A series of quantitative classification rules was also established for such cancer classification, thereby providing a solid research foundation for further biomarker exploration and functional analyses of tumorigenesis at the level of circulating extracellular microRNA.
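The mRMR-then-IFS stage described above can be sketched as follows. This is a minimal version that assumes discrete (e.g. discretised) feature values and uses the difference form of mRMR, which scores a candidate by its relevance minus its mean redundancy. The Boruta pre-filter is not shown. The `evaluate` callback passed to the incremental step is hypothetical and stands in for whatever classifier and validation scheme is used. The toy features are invented.

```python
import math
from collections import Counter


def mutual_info(a, b):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log2(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in joint.items())


def mrmr_rank(columns, labels, k=None):
    """Greedy mRMR (difference form): relevance minus mean redundancy."""
    k = len(columns) if k is None else k
    relevance = [mutual_info(col, labels) for col in columns]
    selected, remaining = [], list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_info(columns[j], columns[s]) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties go to the earliest index
        selected.append(best)
        remaining.remove(best)
    return selected


def incremental_feature_selection(ranked, evaluate):
    """Try prefixes ranked[:1], ranked[:2], ... and keep the best-scoring one.

    `evaluate` is a user-supplied (hypothetical) function, e.g. the
    cross-validated accuracy of a classifier trained on those feature indices.
    """
    best_subset, best_score = [], -math.inf
    for size in range(1, len(ranked) + 1):
        s = evaluate(ranked[:size])
        if s > best_score:
            best_subset, best_score = ranked[:size], s
    return best_subset, best_score


# Toy data: column 1 duplicates column 0, column 2 adds independent information.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
c = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [int(p or (q and r)) for p, q, r in zip(a, b, c)]
print(mrmr_rank([a, list(a), b], labels, k=2))  # → [0, 2]
```

The redundant copy of column 0 is skipped in favour of the weaker but non-redundant column 2, which is the behaviour mRMR is designed to produce.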
