IoT Botnet Attack Detection Based on Optimized Extreme Gradient Boosting and Feature Selection

Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 6336 ◽  
Author(s):  
Mnahi Alqahtani ◽  
Hassan Mathkour ◽  
Mohamed Maher Ben Ismail

Nowadays, Internet of Things (IoT) technology has various network applications and has attracted the interest of many research and industrial communities. In particular, the number of vulnerable or unprotected IoT devices has increased drastically, along with the amount of suspicious activity such as IoT botnets and large-scale cyber-attacks. To address this security issue, researchers have deployed machine learning and deep learning methods to detect attacks targeting compromised IoT devices. Despite these efforts, developing an efficient and effective attack detection approach for resource-constrained IoT devices remains a challenging task for the security research community. In this paper, we propose an efficient and effective IoT botnet attack detection approach. The proposed approach relies on a Fisher-score-based feature selection method together with a genetic-based extreme gradient boosting (GXGBoost) model to determine the most relevant features and to detect IoT botnet attacks. The Fisher score is a representative filter-based feature selection method that identifies significant features and discards irrelevant ones by favoring features that minimize intra-class distance and maximize inter-class distance. GXGBoost, an extreme gradient boosting model optimized with a genetic algorithm, is then used to classify the IoT botnet attacks. Several experiments were conducted on a public botnet dataset of IoT devices. The evaluation results, obtained with holdout and 10-fold cross-validation, showed that the proposed approach achieved a high detection rate using only three of the 115 traffic features and improved the overall performance of the IoT botnet attack detection process.
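The Fisher score described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming features are numeric columns of a list of rows; the function names `fisher_scores` and `top_k_features` are hypothetical, and the genetic optimization of XGBoost is not shown. Each score divides the weighted spread of class means around the global mean (inter-class distance) by the weighted within-class variance (intra-class distance), so larger scores mark more discriminative features.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Fisher score of each column of X (a list of numeric rows) for labels y."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    scores = []
    for j in range(len(X[0])):
        overall_mean = fmean(row[j] for row in X)
        between = 0.0  # weighted spread of class means around the global mean
        within = 0.0   # weighted variance inside each class
        for rows in by_class.values():
            column = [r[j] for r in rows]
            n_k = len(column)
            between += n_k * (fmean(column) - overall_mean) ** 2
            within += n_k * pvariance(column)
        if within > 0:
            scores.append(between / within)
        else:
            # Perfectly tight classes: infinitely separable if the means differ
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(X, y, k):
    """Indices of the k features with the highest Fisher scores."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

Calling `top_k_features(X, y, 3)` would mirror the three-of-115 feature subset reported above, although which features end up selected depends entirely on the data.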

Author(s):  
*Fadare Oluwaseun Gbenga ◽  
Adetunmbi Adebayo Olusola ◽  
(Mrs) Oyinloye Oghenerukevwe Eloho ◽  
Mogaji Stephen Alaba

The proliferation of malware variants is arguably the greatest problem in computer security, and protecting information in the form of source code against unauthorized access is a central issue in the field. In recent years, machine learning has been extensively researched for malware detection, and ensemble techniques have proven highly effective in terms of detection accuracy. This paper proposes a framework that combines Chi-square feature selection with eight ensemble learning classifiers built on five base learners: K-Nearest Neighbors, Naïve Bayes, Support Vector Machine, Decision Trees, and Logistic Regression. Among the base learners, K-Nearest Neighbors achieves the highest accuracy: 95.37% with Chi-square feature selection and 87.89% without feature selection. Among the ensembles, the Extreme Gradient Boosting (XGBoost) classifier achieves the highest accuracy: 97.407% with Chi-square feature selection and 91.72% without feature selection. Across the seven evaluation measures, XGBoost leads when Chi-square feature selection is applied, and Random Forest leads among the ensemble methods without feature selection. The results show that tree-based ensemble models are compelling for malware classification.
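For readers unfamiliar with Chi-square feature scoring, the following sketch computes the statistic for each non-negative feature (for example, counts of opcodes or API calls; that feature semantics is an assumption, not taken from the paper). It follows the formulation used by scikit-learn's `chi2`: the per-class sum of each feature is compared with the sum expected if the feature were independent of the class. The names `chi2_scores` and `select_by_chi2` are hypothetical, and this is not the authors' code.

```python
from collections import Counter


def chi2_scores(X, y):
    """Chi-square statistic of each non-negative feature column against labels y."""
    n = len(X)
    n_features = len(X[0])
    class_counts = Counter(y)
    classes = sorted(class_counts)

    # Observed: total value of each feature accumulated within each class
    observed = {c: [0.0] * n_features for c in classes}
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            if value < 0:
                raise ValueError("chi-square scoring requires non-negative features")
            observed[label][j] += value

    scores = []
    for j in range(n_features):
        feature_total = sum(observed[c][j] for c in classes)
        score = 0.0
        for c in classes:
            # Expected: the feature total split in proportion to class frequency
            expected = feature_total * class_counts[c] / n
            if expected > 0:
                score += (observed[c][j] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_by_chi2(X, y, k):
    """Indices of the k features with the largest chi-square statistic."""
    scores = chi2_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

The reduced columns returned by `select_by_chi2` would then be passed to the base learners or ensembles; a feature whose mass is split across classes exactly in proportion to class frequency scores zero.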


2022 ◽  
Vol 2022 ◽  
pp. 1-11
Author(s):  
Hamid Nasiri ◽  
Seyed Ali Alavi

Background and Objective. The new coronavirus disease (known as COVID-19) was first identified in Wuhan and quickly spread worldwide, wreaking havoc on the economy and people’s everyday lives. As the number of COVID-19 cases is rapidly increasing, a reliable detection technique is needed to identify affected individuals, care for them in the early stages of COVID-19, and reduce the virus’s transmission. The most accessible method for COVID-19 identification is Reverse Transcriptase-Polymerase Chain Reaction (RT-PCR); however, it is time-consuming and can produce false-negative results. These limitations encouraged us to propose a novel deep learning framework that can aid radiologists in diagnosing COVID-19 cases from chest X-ray images. Methods. In this paper, a pretrained network, DenseNet169, was employed to extract features from X-ray images. Features were then selected with analysis of variance (ANOVA) to reduce computation and time complexity and to mitigate the curse of dimensionality, thereby improving accuracy. Finally, the selected features were classified with eXtreme Gradient Boosting (XGBoost). The ChestX-ray8 dataset was employed to train and evaluate the proposed method. Results and Conclusion. The proposed method reached 98.72% accuracy for two-class classification (COVID-19, No-findings) and 92% accuracy for multiclass classification (COVID-19, No-findings, and Pneumonia). Its precision, recall, and specificity on two-class classification were 99.21%, 93.33%, and 100%, respectively. For multiclass classification, it achieved 94.07% precision, 88.46% recall, and 100% specificity. The experimental results show that the proposed framework outperforms other methods and can help radiologists diagnose COVID-19 cases.
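The ANOVA step can be illustrated with a one-way F-statistic per feature. This sketch assumes the DenseNet169 features have already been extracted into numeric rows (the extraction itself is not shown), and `anova_f` and `select_k_best` are hypothetical names. The ratio of between-class to within-class mean squares is the standard one-way ANOVA F-statistic; features with larger F values are kept.

```python
from collections import defaultdict
from statistics import fmean


def anova_f(X, y):
    """One-way ANOVA F-statistic of each feature column across the classes in y."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    n, k = len(X), len(groups)

    scores = []
    for j in range(len(X[0])):
        grand_mean = fmean(row[j] for row in X)
        ss_between = 0.0  # variation of class means around the grand mean
        ss_within = 0.0   # variation of samples around their own class mean
        for rows in groups.values():
            column = [r[j] for r in rows]
            class_mean = fmean(column)
            ss_between += len(column) * (class_mean - grand_mean) ** 2
            ss_within += sum((v - class_mean) ** 2 for v in column)
        ms_between = ss_between / (k - 1)
        ms_within = ss_within / (n - k)
        if ms_within > 0:
            scores.append(ms_between / ms_within)
        else:
            scores.append(float("inf") if ms_between > 0 else 0.0)
    return scores


def select_k_best(X, y, k):
    """Keep the k highest-F features; return their indices and the reduced rows."""
    scores = anova_f(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]
```

With three labels standing in for COVID-19, No-findings, and Pneumonia, `select_k_best(X, y, k)` returns the indices of the k highest-scoring features together with the reduced rows that would be passed on to the classifier.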


2020 ◽  
Vol 21 (S13) ◽  
Author(s):  
Ke Li ◽  
Sijia Zhang ◽  
Di Yan ◽  
Yannan Bin ◽  
Junfeng Xia

Abstract Background Identification of hot spots in protein-DNA interfaces provides crucial information for research on protein-DNA interaction and drug design. As experimental methods for determining hot spots are time-consuming, labor-intensive, and expensive, reliable computational methods are needed to predict hot spots on a large scale. Results Here, we propose a new method named sxPDH, based on supervised isometric feature mapping (S-ISOMAP) and extreme gradient boosting (XGBoost), to predict hot spots in protein-DNA complexes. We obtained 114 features combining protein sequence, structure, network, and solvent accessibility information, and systematically assessed various feature selection methods and manifold-learning-based dimensionality reduction methods. The results show that S-ISOMAP is superior to the other feature selection and manifold learning methods. XGBoost was then used to build the hot spot prediction model sxPDH on the three dimensionality-reduced features obtained from S-ISOMAP. Conclusion Our method sxPDH boosts prediction performance using S-ISOMAP and XGBoost. The AUC of the model is 0.773, and the F1 score is 0.713. Experimental results on a benchmark dataset indicate that sxPDH generally achieves better performance in predicting hot spots than state-of-the-art methods.
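The supervised part of S-ISOMAP lies in how pairwise distances are built before the usual ISOMAP steps. The sketch below assumes the dissimilarity of the original S-Isomap formulation: same-class pairs are shrunk to sqrt(1 - exp(-d^2/β)) and different-class pairs are stretched to sqrt(exp(d^2/β)) - α, with β the mean pairwise Euclidean distance and α set to a hypothetical 0.5. It then builds a k-nearest-neighbor graph and computes geodesic distances with Floyd-Warshall. The final classical MDS embedding (which would produce low-dimensional features such as the three used by sxPDH) is omitted, and this is not the sxPDH code.

```python
import math


def supervised_dissimilarity(X, y, alpha=0.5):
    """S-Isomap-style dissimilarity: shrink same-class, stretch cross-class distances.

    alpha is a hypothetical default; beta is the mean pairwise Euclidean distance.
    """
    n = len(X)
    d = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    pairs = [d[i][j] for i in range(n) for j in range(i + 1, n)]
    beta = sum(pairs) / len(pairs)

    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            s = d[i][j] ** 2 / beta
            if y[i] == y[j]:
                D[i][j] = math.sqrt(1.0 - math.exp(-s))  # always below 1
            else:
                D[i][j] = math.sqrt(math.exp(s)) - alpha  # grows quickly with d
    return D


def geodesic_distances(D, k):
    """Shortest paths over a symmetric k-nearest-neighbour graph (Floyd-Warshall)."""
    n = len(D)
    G = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        G[i][i] = 0.0
        nearest = sorted((D[i][j], j) for j in range(n) if j != i)[:k]
        for weight, j in nearest:
            G[i][j] = min(G[i][j], weight)
            G[j][i] = min(G[j][i], weight)
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if G[i][m] + G[m][j] < G[i][j]:
                    G[i][j] = G[i][m] + G[m][j]
    return G
```

Infinite entries in the geodesic matrix mean the neighbor graph is disconnected; k would have to be increased before running MDS on it.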


2005 ◽  
Vol 9 (3) ◽  
pp. 237-251 ◽  
Author(s):  
Wei-Chou Chen ◽  
Ming-Chun Yang ◽  
Shian-Shyong Tseng

High-dimensional data found in the medical domain need to be processed for improved data analysis. To deal with the curse of dimensionality, a feature selection step is employed in almost all data mining applications. In this research work, a Density-based Feature Selection (DFS) method, which ranks features by estimating the Probability Density Function (PDF) of each feature, is applied to medical datasets that suffer from the curse of dimensionality. DFS is a filter-based approach that selects the most discriminatory features from the given feature set by evaluating the importance of each feature with respect to the target class through its density function. Its main advantage over other methods is that, as a ranking method, it selects the most discriminatory features from the whole feature set. This research work finds the best feature subset for prediction and classification on high-dimensional medical datasets. The PDF-based DFS method is applied to three medical datasets: the Chronic Kidney Disease (CKD) dataset, the Breast Cancer Wisconsin dataset, and the Parkinsons dataset. The proposed method evaluates the merit of each feature, assigns it a weight, and ranks the features by their density. The reduced feature subset is then validated with three classification algorithms: Support Vector Machine (SVM), Gradient Boosting, and Convolutional Neural Network (CNN). Their performance is evaluated using accuracy, sensitivity, and specificity. Experimental results indicate that the performance of SVM, Gradient Boosting, and CNN improves after feature selection.
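The abstract does not specify how DFS turns each feature's probability density into a weight. The sketch below assumes one plausible, hypothetical rule: estimate each class's density for a feature with a Gaussian kernel density estimate (bandwidth from Silverman's rule of thumb) and score the feature by one minus the overlap of the two class densities, so well-separated features score near 1 and uninformative ones near 0. It handles two classes only and is illustrative, not the paper's method.

```python
import math
from statistics import pstdev


def gaussian_kde(values, bandwidth):
    """Return a Gaussian kernel density estimate as a function of x."""
    norm = len(values) * bandwidth * math.sqrt(2.0 * math.pi)

    def pdf(x):
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm

    return pdf


def density_scores(X, y, grid_size=200):
    """Hypothetical DFS-style score: 1 - overlap of the two class-conditional densities."""
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")

    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        spread = pstdev(column) or 1.0  # avoid a zero bandwidth for constant features
        h = 1.06 * spread * len(column) ** (-0.2)  # Silverman's rule of thumb
        pdfs = [gaussian_kde([row[j] for row, c in zip(X, y) if c == cls], h)
                for cls in classes]
        lo, hi = min(column) - 3 * h, max(column) + 3 * h
        step = (hi - lo) / (grid_size - 1)
        # Riemann-sum approximation of the integral of min(p0, p1)
        overlap = sum(min(pdfs[0](lo + g * step), pdfs[1](lo + g * step)) * step
                      for g in range(grid_size))
        scores.append(1.0 - overlap)
    return scores
```

Sorting features by these scores gives a ranking from which a reduced subset can be taken for the downstream classifiers.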


Author(s):  
Mehdi Rahnama ◽  
Abolfazl Vahedi ◽  
Arta Mohammad-Alikhani ◽  
Noureddine Takorabet

Purpose Timely fault diagnosis in electrical machines is critical, as it can prevent faults from developing and reduce repair time and cost. Fault diagnosis is even more important in brushless synchronous generators because they are widely used to generate electrical power around the world. Therefore, this study aims to propose a fault detection approach for the brushless synchronous generator, in which a novel extension of the Relief feature selection method is developed. Design/methodology/approach In this paper, a brushless synchronous machine is modeled with the finite element method (FEM) to evaluate its performance under two conditions: the normal condition and an open-circuit fault in one diode of the rotating rectifier. The harmonic behavior of the machine's terminal voltage is obtained under both conditions. The harmonic components are then ranked with the extension of Relief to extract the components most appropriate for fault detection. On this basis, a fault detection approach is proposed that combines the ranked harmonic components with a support vector machine classifier. Findings The proposed diagnosis approach is verified by an experimental test. Results show that the approach detects the open-circuit fault of the diode rectifier with an accuracy of 98.5% using five harmonic components of the terminal voltage [1]. Originality/value This paper proposes a novel feature selection method, based on an extension of Relief, to select the most effective FFT components, together with FEM modeling of a brushless synchronous generator under normal and one-diode open-circuit conditions.
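The paper's extension of Relief is not described in the abstract, so the following sketch shows only the classic two-class Relief weight update it builds on, applied to hypothetical feature vectors (for instance, amplitudes of individual terminal-voltage harmonics). For each instance, a feature's weight rises when it differs from the nearest instance of the other class (nearest miss) and falls when it differs from the nearest instance of the same class (nearest hit). Unlike the original algorithm, which samples instances at random, this version visits every instance once so the result is deterministic; `relief_weights` is a hypothetical name.

```python
def relief_weights(X, y):
    """Classic two-class Relief weights, visiting every instance once (deterministic)."""
    n, p = len(X), len(X[0])
    # Scale per-feature differences by the feature's range so weights are comparable
    ranges = []
    for j in range(p):
        column = [row[j] for row in X]
        ranges.append((max(column) - min(column)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / ranges[j]

    def distance(a, b):
        return sum(diff(a, b, j) for j in range(p))

    weights = [0.0] * p
    for i, x in enumerate(X):
        hits = [X[k] for k in range(n) if k != i and y[k] == y[i]]
        misses = [X[k] for k in range(n) if y[k] != y[i]]
        near_hit = min(hits, key=lambda r: distance(x, r))
        near_miss = min(misses, key=lambda r: distance(x, r))
        for j in range(p):
            # Reward separating the classes, penalize variation within a class
            weights[j] += (diff(x, near_miss, j) - diff(x, near_hit, j)) / n
    return weights
```

Ranking features by these weights and keeping the top few (five harmonics in the study above) would give the input to the SVM classifier; the sketch assumes each class has at least two instances.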


2021 ◽  
Vol 12 ◽  
Author(s):  
Fei Yuan ◽  
Zhandong Li ◽  
Lei Chen ◽  
Tao Zeng ◽  
Yu-Hang Zhang ◽  
...  

Cancer is one of the most serious threats to human health. It can invade multiple vital organs, including the lung, liver, stomach, pancreas, and even the brain. The identification of cancer biomarkers is one of the most important components of cancer research, as it underpins clinical cancer diagnosis and related drug development. During large-scale screening for cancer prevention and early diagnosis, obtaining cancer-related tissue is impossible. Thus, the identification of cancer-associated circulating biomarkers through liquid biopsy has been proposed and has become the most important direction in research on clinical cancer diagnosis. Here, we analyzed pan-cancer extracellular microRNA profiles using multiple machine learning models. The extracellular microRNA profiles of 11 cancer types and non-cancer samples were first analyzed with Boruta to extract important microRNAs. The selected microRNAs were then evaluated with the Max-Relevance and Min-Redundancy (mRMR) feature selection method, producing a ranked feature list that was fed into the incremental feature selection method to identify candidate circulating extracellular microRNAs for cancer recognition and classification. A series of quantitative classification rules was also established for this cancer classification, providing a solid research foundation for further biomarker exploration and functional analyses of tumorigenesis at the level of circulating extracellular microRNA.
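The mRMR and incremental feature selection (IFS) stages can be sketched for discrete features as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline, and the Boruta pre-filter is omitted. `mrmr_order` greedily ranks features by mutual information with the class minus mean mutual information with already-ranked features (the difference form of mRMR), and `incremental_feature_selection` scores growing prefixes of that ranking with a caller-supplied `evaluate` function, a hypothetical stand-in for training and scoring a classifier.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[z]))
               for (x, z), c in count_ab.items())


def mrmr_order(X, y):
    """Greedy mRMR ranking (relevance minus mean redundancy) of discrete features."""
    columns = [[row[j] for row in X] for j in range(len(X[0]))]
    relevance = [mutual_information(col, y) for col in columns]
    remaining, ranked = set(range(len(columns))), []
    while remaining:
        def score(j):
            if not ranked:
                return relevance[j]
            redundancy = sum(mutual_information(columns[j], columns[s]) for s in ranked)
            return relevance[j] - redundancy / len(ranked)
        best = max(sorted(remaining), key=score)  # ties go to the lowest index
        ranked.append(best)
        remaining.remove(best)
    return ranked


def incremental_feature_selection(ranked, evaluate):
    """Evaluate growing prefixes of the ranking; keep the best-scoring prefix."""
    best_subset, best_score = [], -math.inf
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        if score > best_score:
            best_subset, best_score = ranked[:k], score
    return best_subset, best_score
```

In practice, `evaluate` would train a classifier on the candidate microRNA subset and return a cross-validated score; a feature that duplicates an already-ranked one is pushed down the mRMR list even if it is highly relevant on its own.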

