Feature Subset Selection for Malware Detection in Smart IoT Platforms

Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1374
Author(s):  
Jemal Abawajy ◽  
Abdulbasit Darem ◽  
Asma A. Alhashmi

Malicious software (“malware”) has become one of the most serious cybersecurity issues in the Android ecosystem. Given the fast evolution of Android malware releases, it is practically infeasible to manually detect malware apps in the Android ecosystem. As a result, machine learning has become an emerging approach for malware detection. Since machine learning performance is largely influenced by the availability of high-quality and relevant features, feature selection approaches play a key role in machine learning-based detection of malware. In this paper, we formulate the feature selection problem as a quadratic programming problem and analyse how commonly used filter-based feature selection methods work, with emphasis on Android malware detection. We compare and contrast several feature selection methods along several factors, including the composition of the relevant features selected. We empirically evaluate the predictive accuracy of the feature subset selection algorithms and compare their predictive accuracy and execution time using several learning algorithms. The results of the experiments confirm that feature selection is necessary for improving the accuracy of the learning models as well as decreasing the run time. The results also show that the performance of the feature selection algorithms varies from one learning algorithm to another and that no single feature selection approach performs better than the others all the time.
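A minimal sketch of the kind of filter-based comparison the abstract describes, assuming a binary feature matrix (e.g., permission or API-call indicators) and malware/benign labels; synthetic data stands in for a real Android dataset, and the paper's quadratic programming formulation is not reproduced here.

```python
# Compare two filter-based feature selection methods on a downstream classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for an Android feature matrix; binarized to mimic
# permission/API-call indicator features.
X, y = make_classification(n_samples=1000, n_features=200, n_informative=20,
                           random_state=0)
X = (X > 0).astype(int)

filters = {"chi2": chi2, "mutual_info": mutual_info_classif}
for name, score_fn in filters.items():
    X_sel = SelectKBest(score_fn, k=30).fit_transform(X, y)  # keep top-30 features
    acc = cross_val_score(RandomForestClassifier(random_state=0),
                          X_sel, y, cv=5).mean()
    print(f"{name}: 5-fold accuracy = {acc:.3f}")
```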

2021 ◽  
pp. 08-16
Author(s):  
Mohamed Abdel-Basset ◽  
Mohamed Elhoseny

In the current epidemic situation, people are facing several mental disorders related to Depression, Anxiety, and Stress (DAS). Numerous scales have been developed for measuring DAS levels, and DAS-21 is one of them. At the same time, machine learning (ML) models are widely applied to resolve classification problems efficiently, and feature selection (FS) approaches can be designed to improve classifier results. In this context, this paper develops an intelligent feature selection with ML-based risk management (IFSML-RM) technique for DAS prediction. The IFSML-RM technique follows a two-stage process: quantum elephant herd optimization-based FS (QEHO-FS) and decision tree (DT) based classification. In the first stage, the QEHO algorithm operates on the input data to select a valuable subset of features. The chosen features are then fed into the DT classifier to determine the presence or absence of DAS. A detailed experimentation process is carried out on a benchmark dataset, and the experimental results demonstrate the superior performance of the IFSML-RM technique in terms of different performance measures.
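An illustrative sketch of the two-stage pipeline, assuming a generic population-based search as a stand-in for the QEHO-FS stage (the quantum elephant herd operators themselves are not reproduced) and synthetic data in place of the DAS-21 benchmark.

```python
# Stage 1: search over binary feature masks; Stage 2: decision tree classification.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=40, n_informative=8,
                           random_state=0)

def fitness(mask):
    """Cross-validated DT accuracy on the features flagged by a binary mask."""
    if mask.sum() == 0:
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

# Random-search placeholder for the metaheuristic: keep the best of many masks.
best_mask, best_fit = None, -1.0
for _ in range(50):
    mask = rng.integers(0, 2, size=X.shape[1])
    f = fitness(mask)
    if f > best_fit:
        best_mask, best_fit = mask, f

print("selected features:", np.flatnonzero(best_mask),
      "cv accuracy:", round(best_fit, 3))
```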


2020 ◽  
Vol 8 (2S7) ◽  
pp. 2237-2240

In diagnosis and prediction systems, algorithms working on datasets with a high number of dimensions tend to take more time than those with fewer dimensions. Feature subset selection algorithms enhance the efficiency of machine learning algorithms in prediction problems by selecting a subset of the total features and thus pruning redundancy and noise. In this article, such a feature subset selection method is proposed and implemented to diagnose breast cancer using the Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) algorithms. The feature selection algorithm is based on Social Group Optimization (SGO), an evolutionary algorithm. The proposed model achieves higher accuracy in diagnosing breast cancer than other feature selection-based machine learning algorithms.
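A sketch of the subset-evaluation step that an SGO-style search would repeatedly call, using the scikit-learn breast cancer dataset as a stand-in; the Social Group Optimization loop itself is not reproduced, and the candidate subset shown is hypothetical.

```python
# Evaluate a candidate feature subset with both SVM and KNN.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

def evaluate(subset):
    """Return mean 5-fold accuracy of SVM and KNN on the chosen feature subset."""
    Xs = X[:, subset]
    scores = {}
    for name, clf in {"SVM": SVC(), "KNN": KNeighborsClassifier()}.items():
        pipe = make_pipeline(StandardScaler(), clf)
        scores[name] = cross_val_score(pipe, Xs, y, cv=5).mean()
    return scores

# Example call with a hypothetical subset produced by the optimizer.
print(evaluate(np.array([0, 3, 7, 20, 21, 27])))
```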


Proceedings ◽  
2021 ◽  
Vol 74 (1) ◽  
pp. 21
Author(s):  
Hülya Başeğmez ◽  
Emrah Sezer ◽  
Çiğdem Selçukcan Erol

Recently, gene selection has played an important role in cancer diagnosis and classification. In this study, highly descriptive genes were selected for use in cancer diagnosis, with the aim of developing a classification analysis for cancer diagnosis from microarray data. For this purpose, a comparative analysis and the intersections of six different methods, obtained by combining two feature selection algorithms with three search algorithms, are presented. The six feature subset selection methods show that, instead of 15,155 genes, 24 genes should be the focus. Cancer diagnosis may therefore be possible using these 24 reduced candidate genes, rather than the larger feature sets of similar studies. However, to assess the diagnostic value of these candidate genes, they should be examined in a wet laboratory.
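A minimal sketch of intersecting the gene subsets chosen by different selectors, mirroring the idea of focusing only on genes picked by every method; synthetic expression data and two generic scikit-learn selectors stand in for the paper's six method combinations and the 15,155-gene microarray.

```python
# Intersect the feature subsets chosen by several selection methods.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

X, y = make_classification(n_samples=100, n_features=500, n_informative=15,
                           random_state=0)

selectors = {
    "anova": SelectKBest(f_classif, k=50),
    "mutual_info": SelectKBest(mutual_info_classif, k=50),
}
chosen = [set(np.flatnonzero(s.fit(X, y).get_support()))
          for s in selectors.values()]
consensus = set.intersection(*chosen)  # genes every method agrees on
print(f"{len(consensus)} genes selected by every method:", sorted(consensus))
```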


2019 ◽  
Vol 8 (2) ◽  
pp. 3316-3322

Huge amounts of healthcare data are produced every day by the various healthcare sectors. The accumulated data can be effectively analyzed to identify people's risk of chronic diseases. The process of predicting the presence or absence of a disease, and of diagnosing various diseases using historical medical data, is known as healthcare analytics. Healthcare analytics improves patient care and also supports the practice of medical practitioners. Feature selection is considered a core aspect of machine learning and contributes greatly to the performance of a machine learning model. In this paper, symmetry-based feature subset selection is proposed to select, from healthcare data, the optimal features that contribute to the prediction outcome. The multilayer perceptron (MLP) algorithm is used as a classifier, predicting the outcome from the features selected by the symmetry-based feature subset selection technique. The chronic disease datasets Diabetes, Cancer, Breast Cancer, and Heart Disease, collected from the UCI repository, are used to conduct the experiment. The experimental results demonstrate that the proposed hybrid combination of the feature selection technique and the multilayer perceptron outperforms existing approaches in accuracy.
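A sketch of one common symmetry-based ranking, symmetric uncertainty SU(X, y) = 2·I(X; y) / (H(X) + H(y)), followed by an MLP classifier; the exact measure and datasets used in the paper may differ, and the breast cancer dataset here is only a stand-in for the UCI chronic disease data.

```python
# Rank features by symmetric uncertainty, then train an MLP on the top-ranked ones.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import mutual_info_score
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler

def entropy(labels):
    """Shannon entropy (natural log) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

X, y = load_breast_cancer(return_X_y=True)
# Discretize continuous features so mutual information/entropy are well defined.
Xd = KBinsDiscretizer(n_bins=10, encode="ordinal",
                      strategy="uniform").fit_transform(X)

su = np.array([2 * mutual_info_score(Xd[:, j], y)
               / (entropy(Xd[:, j]) + entropy(y))
               for j in range(Xd.shape[1])])
top = np.argsort(su)[::-1][:10]  # keep the 10 highest-scoring features

mlp = make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0))
print("cv accuracy on selected features:",
      round(cross_val_score(mlp, X[:, top], y, cv=5).mean(), 3))
```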


2015 ◽  
Vol 25 (09n10) ◽  
pp. 1467-1490 ◽  
Author(s):  
Huanjing Wang ◽  
Taghi M. Khoshgoftaar ◽  
Naeem Seliya

Software quality modeling is the process of using software metrics from previous iterations of development to locate potentially faulty modules in current under-development code. This has become an important part of the software development process, allowing practitioners to focus development efforts where they are most needed. One difficulty encountered in software quality modeling is the problem of high dimensionality, where the number of available software metrics is too large for a classifier to work well. In this case, many of the metrics may be redundant or irrelevant to defect prediction results, so selecting the subset of software metrics that are the best predictors becomes important. This process is called feature (metric) selection. There are three major forms of feature selection: filter-based feature rankers, which use statistical measures to assign a score to each feature and present the user with a ranked list; filter-based feature subset evaluation, which uses statistical measures on feature subsets to find the best feature subset; and wrapper-based subset selection, which builds classification models using different subsets to find the one that maximizes performance. Software practitioners are interested in which feature selection methods are best at providing the most stable feature subset in the face of changes to the data (here, the addition or removal of instances). In this study, we select feature subsets using fifteen feature selection methods and then use our newly proposed Average Pairwise Tanimoto Index (APTI) to evaluate the stability of the feature selection methods. We evaluate the stability of the feature selection methods on pairs of subsamples generated by our fixed-overlap partitions algorithm. Four different levels of overlap are considered in this study, and 13 software metric datasets from two real-world software projects are used. Results demonstrate that ReliefF (RF) is the most stable feature selection method and that wrapper-based feature subset selection shows the least stability. In addition, as the overlap of the partitions increases, the stability of the feature selection strategies increases.
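A minimal sketch of the stability measure described above, assuming the Tanimoto (Jaccard) index |A ∩ B| / |A ∪ B| between two selected feature subsets, averaged over all pairs of subsets obtained from the overlapping subsamples; this is an illustration of the idea, not the authors' exact implementation.

```python
# Average pairwise Tanimoto index over feature subsets selected on subsamples.
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature index sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def average_pairwise_tanimoto(subsets):
    """Mean Tanimoto index over all unordered pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Example: hypothetical feature subsets selected on three overlapping subsamples.
subsets = [{1, 4, 7, 9}, {1, 4, 8, 9}, {1, 3, 4, 9}]
print(round(average_pairwise_tanimoto(subsets), 3))
```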

