A New Feature Selection Method Based on a Self-Variant Genetic Algorithm Applied to Android Malware Detection

In solving classification problems in machine learning and pattern recognition, data pre-processing is particularly important. Processing high-dimensional feature datasets increases the time and space complexity of computation and reduces the accuracy of classification models, so a good feature selection method is essential. This paper presents a new algorithm for feature selection that retains the selection and mutation operators of traditional genetic algorithms. On the one hand, the global search capability of the algorithm is ensured by changing the population size; on the other hand, the optimal mutation probability for the feature selection problem is found for different population sizes. During the iteration of the algorithm, the population size does not change, no matter how many transformations are made, and remains equal to the initialized population size; this spatial invariance is physically defined as symmetry. The proposed method is compared with other algorithms and validated on different datasets. The experimental results show that the algorithm performs well. In addition, we apply the algorithm to a practical Android software classification problem, and the results again show its superiority.
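As a rough illustration of the kind of search this abstract describes, the sketch below runs a genetic algorithm over feature bitmasks using only selection and mutation, with the population size held fixed. The tournament selection, the elitism step, the toy fitness function, and every parameter value are assumptions made for illustration; the abstract does not specify them, and this is not the paper's algorithm.

```python
import random

def evolve_feature_mask(fitness, n_features, pop_size=20, mutation_prob=0.05,
                        generations=50, seed=0):
    """Search for a feature bitmask using selection and mutation only.

    `fitness` is a hypothetical callable that scores a tuple of 0/1 flags
    (higher is better). There is no crossover operator.
    """
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(n_features))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = [best]  # elitism (assumed): carry the best mask forward
        while len(next_population) < pop_size:
            # Binary tournament selection (an assumed choice of selection operator).
            a, b = rng.sample(population, 2)
            parent = a if fitness(a) >= fitness(b) else b
            # Bit-flip mutation: each flag flips independently with mutation_prob.
            child = tuple(bit ^ 1 if rng.random() < mutation_prob else bit
                          for bit in parent)
            next_population.append(child)
        population = next_population  # size stays equal to pop_size
        best = max(population, key=fitness)
    return best

# Toy fitness (made up): reward the first three features, penalise the rest.
def toy_fitness(mask):
    return 2 * sum(mask[:3]) - sum(mask[3:])

best_mask = evolve_feature_mask(toy_fitness, n_features=10)
```

In a real setting the fitness would typically be a classifier's validation score on the selected features, which makes each evaluation far more expensive than in this toy.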


A lazy feature selection method for multi-label classification

Intelligent Data Analysis ◽

10.3233/ida-194878 ◽

2021 ◽

Vol 25 (1) ◽

pp. 21-34

Author(s):

Rafael B. Pereira ◽

Alexandre Plastino ◽

Bianca Zadrozny ◽

Luiz H.C. Merschmann

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Feature Selection Method ◽

Selection Method ◽

Video Classification ◽

Classification Problems ◽

Class Label ◽

New Feature ◽

Feature Selection Techniques ◽

Biomolecular Analysis

In many important application domains, such as text categorization, biomolecular analysis, scene or video classification and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This has led, in recent years, to a substantial amount of research in multi-label classification. More specifically, feature selection methods have been developed to identify relevant and informative features for multi-label classification. This work presents a new feature selection method based on the lazy feature selection paradigm and designed specifically for the multi-label context. Experimental results show that the proposed technique is competitive with multi-label feature selection techniques currently used in the literature, and is clearly more scalable in a scenario where the amount of data is increasing.


A Grey Wolf Optimizer Feature Selection method and its Effect on the Performance of Document Classification Problem

Journal Port Science Research ◽

10.36371/port.2021.2.9 ◽

2021 ◽

Vol 4 (2) ◽

pp. 116-122

Author(s):

Ibraheem Al-Jadir ◽

Waleed A. Mahmoud

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Optimization Methods ◽

Classification Problem ◽

Performance Outcomes ◽

Grey Wolf Optimizer ◽

Great Success ◽

Grey Wolf ◽

Feature Selection Problem ◽

Krill Herd

Optimization methods are considered one of the most highly developed areas of Artificial Intelligence (AI). The success of Particle Swarm Optimization (PSO) and Genetic Algorithms (GA) has encouraged researchers to develop other methods that achieve better performance and respond better to modern needs. Grey Wolf Optimization (GWO) and Krill Herd (KH) are among the methods that have shown great success in different applications in the last few years. In this paper, we present a comparative study of different optimization methods, including KH and GWO, for solving the problem of document feature selection for classification. These methods model feature selection as a typical optimization problem. Because of the complexity and non-linearity of this kind of problem, advanced techniques are needed to judge which feature subset is optimal for improving the classification of text documents. The test results showed the superiority of GWO over its counterparts on the specified evaluation measures.


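Implementasi teknik seleksi fitur pada klasifikasi malware Android menggunakan support vector machine (SVM) [Implementation of feature selection techniques for Android malware classification using a support vector machine (SVM)]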

Repositor ◽

10.22219/repositor.v1i1.1 ◽

2019 ◽

Vol 1 (1) ◽

pp. 1

Author(s):

Hendra Saputra ◽

Setio Basuki ◽

Mahar Faiqurahman

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Chi Square ◽

Android Malware ◽

Correlation Based Feature Selection ◽

Selection Of

Abstract: Android malware has grown significantly as the platform has advanced and the variety of techniques used in Android development has increased. Machine learning can now be used to model the patterns of static and dynamic features of Android malware. To classify malware types accurately, the researchers relate the features of an application to the features required by each malware category. The malware categories used are those widely in circulation today. A Support Vector Machine (SVM) is used to classify the malware types, specifically a one-against-one multi-class SVM with an RBF kernel. The features used for classification are Permission and Broadcast Receiver. To improve classification accuracy, feature selection methods are applied: Correlation-based Feature Selection (CFS), Gain Ratio (GR) and Chi-Square (CHI). The results obtained with feature selection are evaluated alongside the results obtained without it. Classification accuracy was 90.83% with CFS, 91.25% with GR and CHI, and 91.67% without feature selection. The tests indicate that Permission and Broadcast Receiver can be used to classify malware types, but the feature selection methods used give accuracy slightly below that obtained without feature selection. Keywords: Android malware classification, feature selection, SVM, one-against-one multi-class SVM
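Chi-Square is one of the feature selection methods named in this abstract. As a minimal sketch of how such a score can rank features, the code below computes the Pearson chi-square statistic between a categorical feature and the class labels from their contingency table. The feature names and the six-sample dataset are made up for illustration and do not come from the paper.

```python
from collections import Counter

def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic between a categorical feature and class labels."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    score = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            # Expected count if the feature and the label were independent.
            expected = f_count * c_count / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score

# Made-up toy data: 1 means the app requests the (hypothetical) permission.
labels       = ["malware", "malware", "malware", "benign", "benign", "benign"]
sends_sms    = [1, 1, 1, 0, 0, 0]   # perfectly aligned with the label
uses_network = [1, 0, 1, 1, 0, 1]   # same distribution in both classes

scores = {
    "sends_sms": chi_square_score(sends_sms, labels),
    "uses_network": chi_square_score(uses_network, labels),
}
# Features sorted from most to least associated with the class label.
ranked = sorted(scores, key=scores.get, reverse=True)
```

A feature whose distribution is identical across classes scores 0, so it would be the first to be dropped when keeping only the top-ranked features.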


Android Malware Detection Using Machine Learning with Feature Selection Based on the Genetic Algorithm

Mathematics ◽

10.3390/math9212813 ◽

2021 ◽

Vol 9 (21) ◽

pp. 2813

Author(s):

Jaehyeong Lee ◽

Hyuk Jang ◽

Sungmin Ha ◽

Yourim Yoon

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Genetic Algorithms ◽

Feature Selection ◽

Malware Detection ◽

Feature Selection Method ◽

Machine Learning Algorithms ◽

Android Malware ◽

Detection Techniques ◽

Android Malware Detection

Since the discovery that machine learning can be used to effectively detect Android malware, many studies on machine learning-based malware detection techniques have been conducted. Several methods based on feature selection, particularly genetic algorithms, have been proposed to increase performance and reduce costs. However, because they have yet to be compared with other methods and their many features have not been sufficiently verified, such methods have certain limitations. This study investigates whether genetic algorithm-based feature selection helps Android malware detection. We applied nine machine learning algorithms with genetic algorithm-based feature selection to 1104 static features across 5000 benign applications and 2500 malware samples included in the Andro-AutoPsy dataset. Comparative experimental results show that the genetic algorithm performed better than the information gain-based method, which is generally used as a feature selection method. Moreover, machine learning using the proposed genetic algorithm-based feature selection has an absolute advantage in terms of time compared to machine learning without feature selection. The results indicate that incorporating genetic algorithms into Android malware detection is a valuable approach. Furthermore, to improve malware detection performance, it is useful to apply genetic algorithm-based feature selection to machine learning.
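The information gain-based method used as the baseline here can be sketched in a few lines. The code below computes the information gain of a discrete feature as the reduction in label entropy after splitting on that feature. The feature names and the four-sample dataset are made up for illustration; the abstract does not describe the study's exact baseline configuration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

# Made-up toy data: 1 = malware, 0 = benign; feature names are hypothetical.
labels = [1, 1, 0, 0]
features = {
    "requests_sms": [1, 1, 0, 0],  # splits the classes perfectly
    "has_icon":     [1, 0, 1, 0],  # carries no information about the class
}
gains = {name: information_gain(column, labels) for name, column in features.items()}
top_feature = max(gains, key=gains.get)
```

Unlike a genetic algorithm, this scores each feature independently, so it cannot account for interactions between features.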


Binary Differential Evolution based Feature Selection Method with Mutual Information for Imbalanced Classification Problems

2021 IEEE Congress on Evolutionary Computation (CEC) ◽

10.1109/cec45853.2021.9504882 ◽

2021 ◽

Author(s):

Arka Ghosh ◽

Bing Xue ◽

Mengjie Zhang

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Differential Evolution ◽

Feature Selection Method ◽

Selection Method ◽

Classification Problems ◽

Imbalanced Classification ◽

Binary Differential Evolution


A New Feature Selection Method for One-Class Classification Problems

IEEE Transactions on Systems Man and Cybernetics Part C (Applications and Reviews) ◽

10.1109/tsmcc.2012.2196794 ◽

2012 ◽

Vol 42 (6) ◽

pp. 1500-1509 ◽

Cited By ~ 19

Author(s):

Young-Seon Jeong ◽

In-Ho Kang ◽

Myong-Kee Jeong ◽

Dongjoon Kong

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Classification Problems ◽

New Feature ◽

One Class Classification


Heart Disease Prediction Based on an Optimal Feature Selection Method using Autoencoder

International Journal of Scientific Research in Science and Technology ◽

10.32628/ijsrst20748 ◽

2020 ◽

pp. 25-38

Author(s):

Azhar M. A. ◽

Princy Ann Thomas

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Feature Selection Method ◽

Classification Problem ◽

Machine Learning Techniques ◽

Classification Problems ◽

Process Data ◽

Integration Algorithm ◽

Learning Techniques ◽

Hybrid Classification

Heart failure is a common disease that can lead to dangerous situations. Data is available within healthcare systems, but successful analysis methods for finding connections and patterns in that data have been lacking. Machine learning methods can help remedy this, and they give better insight into the underlying classification problem. In many classification problems, the huge size of the data makes it difficult to learn good classifiers before unwanted features are removed. In this work, we use an artificial neural network-based autoencoder for effective feature selection. The aim of feature selection is to improve prediction performance and provide a better understanding of the process data. A hybrid classification method with a dynamic integration algorithm aims to find optimal features by applying machine learning techniques, improving performance in the prediction of cardiovascular disease.


Ensemble Feature Selection Method Based on Bio-inspired Algorithms for Multi-objective Classification Problem

Advances on Smart and Soft Computing - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-15-6048-4_15 ◽

2020 ◽

pp. 167-176

Author(s):

Mohammad Aizat Basir ◽

Mohamed Saifullah Hussin ◽

Yuhanis Yusof

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Classification Problem ◽

Selection Method ◽

Multi Objective ◽

Objective Classification


A graph-based feature selection method for learning to rank using spectral clustering for redundancy minimization and biased PageRank for relevance analysis

Computer Science and Information Systems ◽

10.2298/csis201220042y ◽

2021 ◽

pp. 42-42

Author(s):

Jen-Yuan Yeh ◽

Cheng-Jung Tsai

Keyword(s):

Feature Selection ◽

Spectral Clustering ◽

Learning To Rank ◽

Feature Selection Method ◽

Selection Method ◽

Feature Selection Problem ◽

Ranking Problem ◽

Relevance Score ◽

Ranking Svm ◽

Benchmark Datasets

This paper addresses the feature selection problem in learning to rank (LTR). We propose a graph-based feature selection method, named FS-SCPR, which comprises four steps: (i) use ranking information to assess the similarity between features and construct an undirected feature similarity graph; (ii) apply spectral clustering to cluster features using eigenvectors of matrices extracted from the graph; (iii) utilize biased PageRank to assign a relevance score with respect to the ranking problem to each feature by incorporating each feature's ranking performance as preference to bias the PageRank computation; and (iv) apply optimization to select the feature from each cluster with both the highest relevance score and most information of the features in the cluster. We also develop a new LTR for information retrieval (IR) approach that first exploits FS-SCPR as a preprocessor to determine discriminative and useful features and then employs Ranking SVM to derive a ranking model with the selected features. An evaluation, conducted using the LETOR benchmark datasets, demonstrated the competitive performance of our approach compared to representative feature selection methods and state-of-the-art LTR methods.
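Step (iii) above relies on biased PageRank. The sketch below shows one common formulation, power iteration with a non-uniform teleport vector, applied to a small feature similarity graph. The abstract does not give FS-SCPR's exact formulation, so the damping factor, the iteration count, the handling of isolated nodes, and the toy similarity and preference values are all assumptions.

```python
def biased_pagerank(similarity, preference, damping=0.85, iterations=100):
    """Power iteration for PageRank with a non-uniform teleport vector.

    similarity: symmetric matrix (list of lists) of non-negative edge weights.
    preference: non-negative per-node bias, normalised here to sum to 1.
    """
    n = len(similarity)
    total = sum(preference)
    teleport = [p / total for p in preference]
    out_weight = [sum(row) for row in similarity]
    rank = teleport[:]
    for _ in range(iterations):
        # Teleport step: jump to a node in proportion to its preference.
        new_rank = [(1 - damping) * t for t in teleport]
        for i in range(n):
            if out_weight[i] == 0:
                # Isolated node: redistribute its rank along the teleport vector.
                for j in range(n):
                    new_rank[j] += damping * rank[i] * teleport[j]
            else:
                # Walk step: follow edges in proportion to similarity weight.
                for j in range(n):
                    new_rank[j] += damping * rank[i] * similarity[i][j] / out_weight[i]
        rank = new_rank
    return rank

# Made-up example: three equally similar features with different
# (hypothetical) individual ranking performance used as the bias.
similarity = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
preference = [0.6, 0.3, 0.1]
relevance = biased_pagerank(similarity, preference)
```

Because the graph in this toy case is uniform, the resulting relevance scores follow the order of the preference values; on a non-uniform graph the similarity structure also shapes the scores.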


A static analysis approach for Android permission-based malware detection systems

PLoS ONE ◽

10.1371/journal.pone.0257968 ◽

2021 ◽

Vol 16 (9) ◽

pp. e0257968

Author(s):

Juliza Mohamad Arif ◽

Mohd Faizal Ab Razak ◽

Suryanti Awang ◽

Sharfah Ratibah Tuan Mat ◽

Nor Syahidatul Nadiah Ismail ◽

...

Keyword(s):

Static Analysis ◽

Malware Detection ◽

Feature Selection Method ◽

Selection Method ◽

Random Forest Algorithm ◽

Android Malware ◽

Detection Systems ◽

Android Malware Detection ◽

Security Evaluations ◽

Android Permission

The evolution of malware is causing mobile devices to crash with increasing frequency. Therefore, adequate security evaluations that detect Android malware are crucial. Two techniques can be used in this regard: static analysis, which meticulously examines the full code of applications, and dynamic analysis, which monitors malware behaviour. While both perform security evaluations successfully, there is still room for improvement. The goal of this research is to examine the effectiveness of static analysis in detecting Android malware by using permission-based features. This study used machine learning with different sets of classifiers to evaluate Android malware detection. A feature selection method was applied to determine which features were most capable of distinguishing malware. A total of 5,000 Drebin malware samples and 5,000 Androzoo benign samples were utilised. The performances of the different sets of classifiers were then compared. The results indicated that with a TPR value of 91.6%, the Random Forest algorithm achieved the highest level of accuracy in malware detection.
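The result above is reported as a true positive rate (TPR). As a brief reminder of how TPR relates to overall accuracy, the sketch below computes both from confusion-matrix counts. The counts are made up for illustration and are not taken from the study.

```python
def true_positive_rate(tp, fn):
    """Fraction of actual malware samples that were flagged as malware."""
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    """Fraction of all samples, malware and benign, that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Made-up counts for a balanced test set of 100 malware and 100 benign samples.
tp, fn = 90, 10   # malware detected / malware missed
tn, fp = 70, 30   # benign passed / benign wrongly flagged

tpr = true_positive_rate(tp, fn)
acc = accuracy(tp, tn, fp, fn)
```

With these counts the TPR is 0.9 while accuracy is 0.8, since TPR ignores how benign samples are classified.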
