feature subset
Recently Published Documents



Author(s):  
Riyadh Rahef Nuiaa ◽  
Selvakumar Manickam ◽  
Ali Hakem Alsaeedi ◽  
Esraa Saleh Alomari

Cyberattacks have grown steadily over the last few years. The distributed reflection denial of service (DRDoS) attack, a new variant of the distributed denial of service (DDoS) attack, has been on the rise. DRDoS attacks are more difficult to mitigate because of their dynamics and attack strategy. The number of features examined when investigating traffic behavior influences the performance of an intrusion detection system. A feature selection model therefore improves the accuracy of the detection mechanism and also reduces detection time by reducing the number of features. The proposed model, called proactive feature selection (PFS), aims to detect DRDoS attacks based on feature selection. It uses a nature-inspired optimization algorithm for feature subset selection. Three machine learning algorithms, i.e., k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM), were evaluated as potential classifiers for assessing the selected features. The CICDDoS2019 dataset was used for evaluation. The performance of each classifier is compared to previous models. The results indicate that the suggested model works better than current approaches, providing a higher detection rate (DR), a low false-positive rate (FPR), and increased detection accuracy (DA). The PFS model achieves an accuracy of 89.59% in detecting DRDoS attacks.
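The three metrics named in this abstract (DR, FPR, DA) follow directly from confusion-matrix counts. The sketch below is only an illustration of those standard definitions; the function name and the example counts are hypothetical and do not come from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute detection rate, false-positive rate, and accuracy.

    tp/fn count attack samples flagged / missed;
    fp/tn count benign samples flagged / passed.
    """
    total = tp + fp + tn + fn
    dr = tp / (tp + fn) if (tp + fn) else 0.0    # share of attacks detected
    fpr = fp / (fp + tn) if (fp + tn) else 0.0   # share of benign traffic flagged
    da = (tp + tn) / total if total else 0.0     # share of all samples classified correctly
    return {"DR": dr, "FPR": fpr, "DA": da}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```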


2022 ◽  
Vol 4 (1) ◽  
Author(s):  
Linyang Zhu ◽  
Weiwei Zhang ◽  
Guohua Tu

Feature selection aims to select relevant and useful features and is a vital challenge in turbulence modeling by machine learning methods. In this paper, a new posterior feature selection method based on a validation dataset is proposed, which is an efficient and universal method for complex systems including turbulence. Unlike the a priori feature importance ranking of the filter method and the exhaustive search over feature subsets of the wrapper method, the proposed method ranks the features according to model performance on the validation dataset and generates the feature subsets in order of feature importance. Using the features from the proposed method, a black-box model is built with an artificial neural network (ANN) to reproduce the behavior of the Spalart-Allmaras (S-A) turbulence model for high Reynolds number (Re) airfoil flows in aeronautical engineering. The results show that, compared with the model without feature selection, the generalization ability of the model after feature selection is significantly improved. To some extent, it is also demonstrated that although feature importance can be reflected by the model parameters during training, artificial feature selection is still very necessary.
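The idea of ranking features by validation performance and then emitting subsets in that order can be sketched as follows. This is not the paper's procedure: as a stand-in, each feature is scored by the validation error of a one-feature least-squares fit, and all names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return slope, my - slope * mx


def validation_mse(train_x, train_y, val_x, val_y):
    """Fit on the training split, measure squared error on the validation split."""
    slope, intercept = fit_line(train_x, train_y)
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(val_x, val_y)) / len(val_y)


def rank_features(train, val, target):
    """Order feature names from lowest to highest validation error (assumed scoring rule)."""
    errors = {
        name: validation_mse(train[name], train[target], val[name], val[target])
        for name in train if name != target
    }
    return sorted(errors, key=errors.get)


def nested_subsets(ranking):
    """Generate feature subsets in order of importance: top-1, top-2, ..."""
    return [ranking[:k] for k in range(1, len(ranking) + 1)]


# Toy data: y depends exactly on "a", loosely on "c", and not on "b".
train = {"a": [0, 1, 2, 3], "b": [1, 0, 1, 0], "c": [0, 1, 1, 3], "y": [0, 2, 4, 6]}
val = {"a": [4, 5], "b": [0, 1], "c": [3, 6], "y": [8, 10]}
ranking = rank_features(train, val, target="y")
subsets = nested_subsets(ranking)
print(ranking, subsets)
```

Each subset in `subsets` would then be used to train the actual model, and the subset with the best validation score kept.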


2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Deepti Sisodia ◽  
Dilip Singh Sisodia

Purpose: The problem of choosing the most useful features from hundreds of features in time-series user click data arises in online advertising for the classification of fraudulent publishers. Selecting feature subsets is a key issue in such classification tasks. In practice, filter approaches are common; however, they neglect the correlations among features. Conversely, wrapper approaches could not be applied due to their complexity. Moreover, existing feature selection methods could not handle such data, which is one of the major causes of instability in feature selection.
Design/methodology/approach: To overcome these issues, a majority voting-based hybrid feature selection method, namely feature distillation and accumulated selection (FDAS), is proposed to investigate the optimal subset of relevant features for analyzing the publisher's fraudulent conduct. FDAS works in two phases: (1) feature distillation, where significant features from standard filter and wrapper feature selection methods are obtained using majority voting; (2) accumulated selection, where an accumulated evaluation of relevant feature subsets is performed to search for an optimal feature subset using effective machine learning (ML) models.
Findings: Empirical results show enhanced classification performance with the proposed features in average precision, recall, F1-score and AUC in publisher identification and classification.
Originality/value: FDAS is evaluated on FDMA2012 user-click data and nine other benchmark datasets to gauge its generalizing characteristics: first, considering the original features; second, with relevant feature subsets selected by feature selection (FS) methods; third, with the optimal feature subset obtained by the proposed approach. An ANOVA significance test is conducted to demonstrate significant differences between independent features.
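The majority-voting step of the feature distillation phase can be illustrated with a small sketch. It assumes that each base selector returns a set of feature names and that a feature is kept when more than half of the selectors choose it; the feature names and selector outputs are hypothetical, and the exact voting rule used by FDAS may differ.

```python
from collections import Counter


def majority_vote(selections):
    """Keep features chosen by more than half of the base selectors."""
    votes = Counter(feature for chosen in selections for feature in set(chosen))
    needed = len(selections) / 2
    return sorted(feature for feature, count in votes.items() if count > needed)


# Hypothetical outputs of three filter/wrapper selectors on click data.
selections = [
    {"clicks", "ip_count", "hour"},
    {"clicks", "ip_count"},
    {"clicks", "referrer"},
]
distilled = majority_vote(selections)
print(distilled)
```

The distilled features would then feed the accumulated selection phase, where subsets are evaluated with ML models.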


2022 ◽  
Vol 13 (1) ◽  

Feature selection is performed to eliminate irrelevant features to reduce computational overheads. Metaheuristic algorithms have become popular for the task of feature selection due to their effectiveness and flexibility. Hybridization of two or more such metaheuristics has become popular in solving optimization problems. In this paper, we propose a hybrid wrapper feature selection technique based on binary butterfly optimization algorithm (bBOA) and Simulated Annealing (SA). The SA is combined with the bBOA in a pipeline fashion such that the best solution obtained by the bBOA is passed on to the SA for further improvement. The SA solution improves the best solution obtained so far by searching in its neighborhood. Thus the SA tries to enhance the exploitation property of the bBOA. The proposed method is tested on twenty datasets from the UCI repository and the results are compared with five popular algorithms for feature selection. The results confirm the effectiveness of the hybrid approach in improving the classification accuracy and selecting the optimal feature subset.
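The pipeline described above, where the best bBOA solution is handed to SA for a neighborhood search, can be sketched for the SA stage alone. The bBOA stage is not implemented here: `start` is a stand-in for its best binary mask, and the toy fitness function, cooling schedule, and parameter values are all assumptions made for illustration.

```python
import math
import random


def flip_one(mask, rng):
    """Neighbor of a binary feature mask: flip one randomly chosen bit."""
    i = rng.randrange(len(mask))
    return mask[:i] + [1 - mask[i]] + mask[i + 1:]


def simulated_annealing(start, fitness, rng, t0=1.0, cooling=0.95, steps=200):
    """Refine a starting mask; higher fitness is better."""
    current, best = list(start), list(start)
    temperature = t0
    for _ in range(steps):
        candidate = flip_one(current, rng)
        delta = fitness(candidate) - fitness(current)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current = candidate
            if fitness(current) > fitness(best):
                best = list(current)
        temperature *= cooling
    return best


RELEVANT = {0, 2}  # hypothetical indices of the useful features


def toy_fitness(mask):
    """Reward selected relevant features, penalize selected irrelevant ones."""
    gain = sum(bit for i, bit in enumerate(mask) if i in RELEVANT)
    cost = sum(bit for i, bit in enumerate(mask) if i not in RELEVANT)
    return gain - 0.5 * cost


start = [1, 1, 0, 1, 0]  # stand-in for the best solution found by bBOA
refined = simulated_annealing(start, toy_fitness, random.Random(0))
print(refined, toy_fitness(refined))
```

In a real wrapper method the fitness would be a classifier's accuracy on the selected features, typically combined with a penalty on subset size.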


2022 ◽  
Vol 16 (1) ◽  

The number of attacks has increased with the rapid development of web communication over the last couple of years. Anomaly detection has become important for detecting novel attacks in an intrusion detection system (IDS). Achieving high accuracy is a significant challenge in designing an IDS. This work also emphasizes applying different feature selection techniques to identify the most suitable feature subset. The author uses extremely randomized trees (Extra-Tree) to compute feature importance and tries multiple thresholds on the feature importance values to find the best features. If a single classifier is used and its output is wrong, the final decision may be wrong, so the author applies an Extra-Tree classifier to the best-selected features. The proposed method is evaluated on the standard datasets KDD CUP'99, NSL-KDD, and UNSW-NB15. The experimental results show that the proposed approach performs better than existing methods in detection rate, false alarm rate, and accuracy.
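Trying several thresholds on feature importance values amounts to a small sweep, sketched below. The importance values, feature names, and the `evaluate` function are hypothetical placeholders; in practice the importances would come from a trained Extra-Trees model and `evaluate` would train and score a classifier on the subset.

```python
def select_by_threshold(importances, threshold):
    """Keep features whose importance is at least the threshold."""
    return [name for name, weight in importances.items() if weight >= threshold]


def best_threshold(importances, thresholds, evaluate):
    """Return (score, threshold, subset) for the best-scoring non-empty subset."""
    results = []
    for threshold in thresholds:
        subset = select_by_threshold(importances, threshold)
        if subset:  # skip thresholds that remove every feature
            results.append((evaluate(subset), threshold, subset))
    return max(results, key=lambda result: result[0])


# Hypothetical importances, standing in for values from a tree ensemble.
importances = {"duration": 0.40, "src_bytes": 0.30, "dst_bytes": 0.20, "flag": 0.10}

# Placeholder evaluation: a made-up accuracy per subset size.
fake_accuracy = {4: 0.90, 3: 0.93, 2: 0.95, 1: 0.85}
score, threshold, subset = best_threshold(
    importances, [0.05, 0.15, 0.25, 0.35], lambda s: fake_accuracy[len(s)]
)
print(score, threshold, subset)
```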


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Fei Guo ◽  
Zhixiang Yin ◽  
Kai Zhou ◽  
Jiasi Li

Long noncoding RNAs (lncRNAs) are a class of RNAs that are longer than 200 nt and do not encode proteins. Studies have shown that lncRNAs can regulate gene expression at the epigenetic, transcriptional, and posttranscriptional levels; they are not only closely related to the occurrence, development, and prevention of human diseases but can also regulate plant flowering and participate in plant responses to abiotic stresses such as drought and salt. Accurately and efficiently identifying lncRNAs therefore remains an essential task for research in this area. Many identification tools based on machine learning and deep learning algorithms exist, but most use human and mouse gene sequences as training sets, seldom plant sequences, and apply only one feature selection method, or one class of methods, after feature extraction. We developed an identification model covering dicots, monocots, algae, mosses, and ferns. After comparing 20 feature selection methods (seven filter and thirteen wrapper methods), each combined with seven classifiers, while considering both the correlation between features and model redundancy, we found that the WOA-XGBoost-based model performed better, with an accuracy of 91.55%, an AUC of 96.78%, and an F1 score of 91.68%. Meanwhile, the number of elements in the feature subset was reduced to 23, which effectively improved prediction accuracy and modeling efficiency.


2021 ◽  
Vol 23 (12) ◽  
pp. 525-541
Author(s):  
Mrs. K. Radha ◽  
Mrs. R.V. Sudha ◽  
Mrs. M. Meena ◽  
Dr. R. Jayavadivel ◽  
...  

With recent advances in knowledge, the complexity of multimedia has increased significantly, and new areas of research have opened up in the search for new multimedia content. Content-based image retrieval (CBIR) is used to extract images associated with image queries (IQs) from huge databases. The CBIR schemes available at present have limited functionality because they use only a limited number of features. This paper presents an improved cuckoo search algorithm with rough sets for processing large amounts of data using selected examples. The improved cuckoo search algorithm mimics the brood-parasitic behavior of some cuckoo species. The modified cuckoo search uses rough set theory to create a fitness function that takes into account both the number of features and the quality of classification. For an image entered as an IQ, distance metrics are used to find the matching images in the database. This is the central idea of CBIR. The proposed CBIR method extracts shape features based on RGB color using the Canny edge detector (CED) and a neutrosophic clustering algorithm. After the YCbCr color conversion, the CED is applied to obtain the features used to extract the vascular matrix. The combination of these techniques improves the efficiency of the CBIR image retrieval infrastructure. In this work, recursive neural network techniques are used to measure similarity. In addition, the precision and recall scores are measured to evaluate system performance. The proposed CBIR system provides more precise and accurate values than the complex CBIR system.
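A fitness function that trades classification quality against the number of selected features is often written as a weighted sum. The sketch below uses that common form as an assumption; the abstract does not give the paper's formula, and the weight `alpha` and the example values are hypothetical.

```python
def subset_fitness(quality, n_selected, n_total, alpha=0.9):
    """Weighted trade-off between classification quality and subset size.

    quality: classification quality of the subset, in [0, 1]
             (for rough sets, this could be the dependency degree).
    alpha:   weight on quality; (1 - alpha) rewards smaller subsets.
    """
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("n_selected must lie between 0 and n_total")
    reduction = 1 - n_selected / n_total  # 1.0 when no features are kept
    return alpha * quality + (1 - alpha) * reduction


# Hypothetical candidates: a large accurate subset vs. a small, slightly weaker one.
large = subset_fitness(quality=0.82, n_selected=18, n_total=20)
small = subset_fitness(quality=0.80, n_selected=5, n_total=20)
print(large, small)
```

With these example values the smaller subset scores higher, because its size advantage outweighs the small loss in quality.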


2021 ◽  
Vol 12 (1) ◽  
pp. 136
Author(s):  
Ihsan Ullah ◽  
Andre Rios ◽  
Vaibhav Gala ◽  
Susan Mckeever

Trust and credibility in machine learning models are bolstered by the ability of a model to explain its decisions. While explainability of deep learning models is a well-known challenge, a further challenge is clarity of the explanation itself for relevant stakeholders of the model. Layer-wise Relevance Propagation (LRP), an established explainability technique developed for deep models in computer vision, provides intuitive human-readable heat maps of input images. We present the novel application of LRP with tabular datasets containing mixed data (categorical and numerical) using a deep neural network (1D-CNN), for Credit Card Fraud detection and Telecom Customer Churn prediction use cases. We show how LRP is more effective than traditional explainability concepts of Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP) for explainability. This effectiveness is both local to a sample level and holistic over the whole testing set. We also discuss the significant computational time advantage of LRP (1–2 s) over LIME (22 s) and SHAP (108 s) on the same laptop, and thus its potential for real time application scenarios. In addition, our validation of LRP has highlighted features for enhancing model performance, thus opening up a new area of research of using XAI as an approach for feature subset selection.

