Feature Subset Selection for Malware Detection in Smart IoT Platforms

Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1374
Author(s):  
Jemal Abawajy ◽  
Abdulbasit Darem ◽  
Asma A. Alhashmi

Malicious software (“malware”) has become one of the most serious cybersecurity issues in the Android ecosystem. Given the fast evolution of Android malware releases, it is practically infeasible to manually detect malware apps in the Android ecosystem. As a result, machine learning has become an emerging approach for malware detection. Since machine learning performance is largely influenced by the availability of high-quality and relevant features, feature selection approaches play a key role in machine learning-based detection of malware. In this paper, we formulate the feature selection problem as a quadratic programming problem and analyse how commonly used filter-based feature selection methods work, with emphasis on Android malware detection. We compare and contrast several feature selection methods along several factors, including the composition of the relevant features selected. We empirically evaluate the predictive accuracy of the feature subset selection algorithms and compare their predictive accuracy and execution time using several learning algorithms. The results of the experiments confirm that feature selection is necessary for improving the accuracy of the learning models as well as decreasing the run time. The results also show that the performance of the feature selection algorithms varies from one learning algorithm to another and that no single feature selection approach performs better than the others all the time.
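A minimal sketch of the kind of filter-based comparison the abstract describes, assuming a binary feature matrix (e.g., permission or API-call indicators) and malware/benign labels; synthetic data stands in for a real Android dataset, and the paper's quadratic programming formulation is not reproduced here.

```python
# Compare two filter-based feature selection methods on a downstream classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for an Android feature matrix; binarized to mimic
# permission/API-call indicator features.
X, y = make_classification(n_samples=1000, n_features=200, n_informative=20,
                           random_state=0)
X = (X > 0).astype(int)

filters = {"chi2": chi2, "mutual_info": mutual_info_classif}
for name, score_fn in filters.items():
    X_sel = SelectKBest(score_fn, k=30).fit_transform(X, y)  # keep top-30 features
    acc = cross_val_score(RandomForestClassifier(random_state=0),
                          X_sel, y, cv=5).mean()
    print(f"{name}: 5-fold accuracy = {acc:.3f}")
```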

2021 ◽  
pp. 08-16
Author(s):  
Mohamed Abdel-Basset ◽  
Mohamed Elhoseny

In the current epidemic situation, people are facing several mental disorders related to Depression, Anxiety, and Stress (DAS). Numerous scales have been developed for measuring DAS levels, and DAS-21 is one of them. At the same time, machine learning (ML) models are widely applied to resolve classification problems efficiently, and feature selection (FS) approaches can be designed to improve classifier results. In this context, this paper develops an intelligent feature selection with ML-based risk management (IFSML-RM) technique for DAS prediction. The IFSML-RM technique follows a two-stage process: quantum elephant herd optimization-based FS (QEHO-FS) and decision tree (DT) based classification. In the first stage, the QEHO algorithm operates on the input data to select a valuable subset of features. The chosen features are then fed into the DT classifier to determine the presence or absence of DAS. A detailed experimentation process is carried out on a benchmark dataset, and the experimental results demonstrate the superior performance of the IFSML-RM technique in terms of different performance measures.
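An illustrative sketch of the two-stage pipeline, assuming a generic population-based search as a stand-in for the QEHO-FS stage (the quantum elephant herd operators themselves are not reproduced) and synthetic data in place of the DAS-21 benchmark.

```python
# Stage 1: search over binary feature masks; Stage 2: decision tree classification.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=40, n_informative=8,
                           random_state=0)

def fitness(mask):
    """Cross-validated DT accuracy on the features flagged by a binary mask."""
    if mask.sum() == 0:
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

# Random-search placeholder for the metaheuristic: keep the best of many masks.
best_mask, best_fit = None, -1.0
for _ in range(50):
    mask = rng.integers(0, 2, size=X.shape[1])
    f = fitness(mask)
    if f > best_fit:
        best_mask, best_fit = mask, f

print("selected features:", np.flatnonzero(best_mask),
      "cv accuracy:", round(best_fit, 3))
```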


2020 ◽  
Vol 8 (2S7) ◽  
pp. 2237-2240

In diagnosis and prediction systems, algorithms working on datasets with a high number of dimensions tend to take more time than those with fewer dimensions. Feature subset selection algorithms enhance the efficiency of machine learning algorithms in prediction problems by selecting a subset of the total features and thus pruning redundancy and noise. In this article, such a feature subset selection method is proposed and implemented to diagnose breast cancer using the Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) algorithms. The feature selection algorithm is based on Social Group Optimization (SGO), an evolutionary algorithm. The proposed model achieves higher accuracy in diagnosing breast cancer than other feature selection-based machine learning algorithms.
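A sketch of the subset-evaluation step that an SGO-style search would repeatedly call, using the scikit-learn breast cancer dataset as a stand-in; the Social Group Optimization loop itself is not reproduced, and the candidate subset shown is hypothetical.

```python
# Evaluate a candidate feature subset with both SVM and KNN.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

def evaluate(subset):
    """Return mean 5-fold accuracy of SVM and KNN on the chosen feature subset."""
    Xs = X[:, subset]
    scores = {}
    for name, clf in {"SVM": SVC(), "KNN": KNeighborsClassifier()}.items():
        pipe = make_pipeline(StandardScaler(), clf)
        scores[name] = cross_val_score(pipe, Xs, y, cv=5).mean()
    return scores

# Example call with a hypothetical subset produced by the optimizer.
print(evaluate(np.array([0, 3, 7, 20, 21, 27])))
```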


Proceedings ◽  
2021 ◽  
Vol 74 (1) ◽  
pp. 21
Author(s):  
Hülya Başeğmez ◽  
Emrah Sezer ◽  
Çiğdem Selçukcan Erol

Recently, gene selection has played an important role in cancer diagnosis and classification. In this study, highly descriptive genes were selected for use in cancer diagnosis, with the aim of developing a classification analysis for cancer diagnosis from microarray data. For this purpose, a comparative analysis and the intersections of six different methods, obtained by combining two feature selection algorithms with three search algorithms, are presented. The six feature subset selection methods show that, instead of 15,155 genes, 24 genes should be the focus. Cancer diagnosis may therefore be possible using these 24 reduced candidate genes, rather than the larger feature sets of similar studies. However, to assess the diagnostic value of these candidate genes, they should be examined in a wet laboratory.
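A minimal sketch of intersecting the gene subsets chosen by different selectors, mirroring the idea of focusing only on genes picked by every method; synthetic expression data and two generic scikit-learn selectors stand in for the paper's six method combinations and the 15,155-gene microarray.

```python
# Intersect the feature subsets chosen by several selection methods.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

X, y = make_classification(n_samples=100, n_features=500, n_informative=15,
                           random_state=0)

selectors = {
    "anova": SelectKBest(f_classif, k=50),
    "mutual_info": SelectKBest(mutual_info_classif, k=50),
}
chosen = [set(np.flatnonzero(s.fit(X, y).get_support()))
          for s in selectors.values()]
consensus = set.intersection(*chosen)  # genes every method agrees on
print(f"{len(consensus)} genes selected by every method:", sorted(consensus))
```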


2019 ◽  
Vol 8 (2) ◽  
pp. 3316-3322

Huge amounts of healthcare data are produced every day by the various healthcare sectors. The accumulated data can be effectively analyzed to identify people's risk of chronic diseases. The process of predicting the presence or absence of a disease, and of diagnosing various diseases using historical medical data, is known as healthcare analytics. Healthcare analytics improves patient care and also supports the practice of medical practitioners. Feature selection is considered a core aspect of machine learning and contributes greatly to the performance of a machine learning model. In this paper, symmetry-based feature subset selection is proposed to select, from healthcare data, the optimal features that contribute to the prediction outcome. The multilayer perceptron (MLP) algorithm is used as a classifier, predicting the outcome from the features selected by the symmetry-based feature subset selection technique. The chronic disease datasets Diabetes, Cancer, Breast Cancer, and Heart Disease, collected from the UCI repository, are used to conduct the experiment. The experimental results demonstrate that the proposed hybrid combination of the feature selection technique and the multilayer perceptron outperforms existing approaches in accuracy.
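A sketch of one common symmetry-based ranking, symmetric uncertainty SU(X, y) = 2·I(X; y) / (H(X) + H(y)), followed by an MLP classifier; the exact measure and datasets used in the paper may differ, and the breast cancer dataset here is only a stand-in for the UCI chronic disease data.

```python
# Rank features by symmetric uncertainty, then train an MLP on the top-ranked ones.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import mutual_info_score
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler

def entropy(labels):
    """Shannon entropy (natural log) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

X, y = load_breast_cancer(return_X_y=True)
# Discretize continuous features so mutual information/entropy are well defined.
Xd = KBinsDiscretizer(n_bins=10, encode="ordinal",
                      strategy="uniform").fit_transform(X)

su = np.array([2 * mutual_info_score(Xd[:, j], y)
               / (entropy(Xd[:, j]) + entropy(y))
               for j in range(Xd.shape[1])])
top = np.argsort(su)[::-1][:10]  # keep the 10 highest-scoring features

mlp = make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0))
print("cv accuracy on selected features:",
      round(cross_val_score(mlp, X[:, top], y, cv=5).mean(), 3))
```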


2015 ◽  
Vol 25 (09n10) ◽  
pp. 1467-1490 ◽  
Author(s):  
Huanjing Wang ◽  
Taghi M. Khoshgoftaar ◽  
Naeem Seliya

Software quality modeling is the process of using software metrics from previous iterations of development to locate potentially faulty modules in current under-development code. This has become an important part of the software development process, allowing practitioners to focus development efforts where they are most needed. One difficulty encountered in software quality modeling is the problem of high dimensionality, where the number of available software metrics is too large for a classifier to work well. In this case, many of the metrics may be redundant or irrelevant to defect prediction results, so selecting the subset of software metrics that are the best predictors becomes important. This process is called feature (metric) selection. There are three major forms of feature selection: filter-based feature rankers, which use statistical measures to assign a score to each feature and present the user with a ranked list; filter-based feature subset evaluation, which uses statistical measures on feature subsets to find the best feature subset; and wrapper-based subset selection, which builds classification models using different subsets to find the one that maximizes performance. Software practitioners are interested in which feature selection methods are best at providing the most stable feature subset in the face of changes to the data (here, the addition or removal of instances). In this study, we select feature subsets using fifteen feature selection methods and then use our newly proposed Average Pairwise Tanimoto Index (APTI) to evaluate the stability of the feature selection methods. We evaluate the stability of the feature selection methods on pairs of subsamples generated by our fixed-overlap partitions algorithm. Four different levels of overlap are considered in this study, and 13 software metric datasets from two real-world software projects are used. Results demonstrate that ReliefF (RF) is the most stable feature selection method and that wrapper-based feature subset selection shows the least stability. In addition, as the overlap of the partitions increases, the stability of the feature selection strategies increases.
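A minimal sketch of the stability measure described above, assuming the Tanimoto (Jaccard) index |A ∩ B| / |A ∪ B| between two selected feature subsets, averaged over all pairs of subsets obtained from the overlapping subsamples; this is an illustration of the idea, not the authors' exact implementation.

```python
# Average pairwise Tanimoto index over feature subsets selected on subsamples.
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature index sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def average_pairwise_tanimoto(subsets):
    """Mean Tanimoto index over all unordered pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Example: hypothetical feature subsets selected on three overlapping subsamples.
subsets = [{1, 4, 7, 9}, {1, 4, 8, 9}, {1, 3, 4, 9}]
print(round(average_pairwise_tanimoto(subsets), 3))
```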

