Byte2vec: Malware Representation and Feature Selection for Android

Malware detection based on static features, without disassembling code, is a challenging line of research, and obfuscation makes static analysis harder still. This paper extends static malware detection beyond byte-level $n$-grams and the detection of important strings. We propose a model, Byte2vec, that provides both feature representation of binary files and feature selection for malware detection. Byte2vec embeds the semantic similarity of byte-level codes into a feature vector (byte vector) and also into a context vector. The feature vectors, learned with a skip-gram with negative-sampling topology, are combined with byte-level term frequency (tf) for malware detection. We also show that the distance between a feature vector and its corresponding context vector provides a useful measure for ranking features, and that the top-ranked features can be used successfully for malware detection. We show that this feature selection algorithm is an unsupervised version of mutual information (MI). We test the proposed scheme on four freely available Android malware datasets, including one obfuscated malware dataset. The model is trained only on clean APKs. The results show that the model outperforms MI in a low-dimensional feature space and is competitive with MI and other state-of-the-art models in higher dimensions. In particular, our tests show very promising results on a wide range of obfuscated malware, with a false negative rate of only 0.3% and a false positive rate of 2.0%. The detection results on obfuscated malware show the advantage of the unsupervised feature selection algorithm over the MI-based method.
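
The two ingredients named in the abstract, byte-level term frequency and ranking by the distance between a feature vector and its context vector, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the Euclidean metric, the descending sort order, the function names, and the toy embeddings are all assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter

def byte_tf(data):
    """Relative frequency of each of the 256 byte values in `data`."""
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero on empty input
    return [counts.get(b, 0) / total for b in range(256)]

def rank_by_vector_gap(feature_vecs, context_vecs):
    """Rank feature indices by the distance between their two embeddings.

    Assumptions: Euclidean distance, largest gap first. The abstract
    states neither the metric nor the direction of the ranking.
    """
    gaps = [math.dist(f, c) for f, c in zip(feature_vecs, context_vecs)]
    return sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)

# Toy embeddings for three hypothetical byte features (not learned values).
feature_vecs = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
context_vecs = [[0.0, 1.0], [1.0, 0.0], [3.0, 2.0]]

ranking = rank_by_vector_gap(feature_vecs, context_vecs)
tf = byte_tf(b"\x00\x00\x01\xff")
```

In a real pipeline the two embedding tables would come from a trained skip-gram model; here they are hard-coded only so that the ranking step can be followed by hand.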

Effective Feature Selection for 5G IM Applications Traffic Classification

Mobile Information Systems ◽

10.1155/2017/6805056 ◽

2017 ◽

Vol 2017 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

Muhammad Shafiq ◽

Xiangzhan Yu ◽

Asif Ali Laghari ◽

Dawei Wang

Keyword(s):

Feature Selection ◽

Classification Accuracy ◽

Statistical Test ◽

Traffic Classification ◽

Features Selection ◽

Traffic Flows ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Wrapper Method ◽

Selection For

Machine learning (ML) algorithms have recently been widely applied to Internet traffic classification. However, because of inappropriate feature selection, ML-based classifiers are prone to misclassifying Internet flows as the traffic class that accounts for the majority of flows. To address this problem, we propose a novel feature selection metric named weighted mutual information (WMI). We develop a hybrid feature selection algorithm named WMI_ACC, which first filters out most of the features using the WMI metric and then uses a wrapper method to select features for ML classifiers using the accuracy (ACC) metric. We evaluate our approach using five ML classifiers on traces captured in two different network environments. We also apply the Wilcoxon pairwise statistical test to the results of our algorithm to identify the robust features within the selected set. Experimental results show that our algorithm performs well in terms of classification accuracy, recall, and precision, achieving 99% flow accuracy.
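
The filter-then-wrapper structure described above can be sketched in Python under some loudly stated assumptions. The abstract does not define the WMI weighting, so plain mutual information stands in for it; the majority-vote lookup table is a hypothetical classifier chosen only to keep the example self-contained; and accuracy is measured on the training rows, which would overfit in practice.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def table_accuracy(rows, labels, cols):
    """Training-set accuracy of a majority-vote lookup table on `cols`."""
    votes = {}
    for row, y in zip(rows, labels):
        key = tuple(row[c] for c in cols)
        votes.setdefault(key, Counter())[y] += 1
    return sum(max(v.values()) for v in votes.values()) / len(rows)

def filter_then_wrap(rows, labels, keep):
    """Filter to the `keep` highest-MI columns, then add columns greedily."""
    n_cols = len(rows[0])
    scores = [mutual_information([r[c] for r in rows], labels)
              for c in range(n_cols)]
    candidates = sorted(range(n_cols), key=lambda c: scores[c],
                        reverse=True)[:keep]
    selected, best = [], table_accuracy(rows, labels, [])
    improved = True
    while improved:
        improved, pick = False, None
        for c in candidates:
            if c in selected:
                continue
            acc = table_accuracy(rows, labels, selected + [c])
            if acc > best:  # wrapper step: keep only strict improvements
                best, pick, improved = acc, c, True
        if improved:
            selected.append(pick)
    return selected, best

# Toy data: column 0 equals the label, column 1 is noise, column 2 is constant.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
labels = [0, 0, 1, 1]
selected, best = filter_then_wrap(rows, labels, keep=2)
```

On the toy data the filter discards nothing useful and the wrapper stops after selecting column 0, because no further column raises accuracy.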

An Improved Feature Selection Algorithm Based on Parzen Window and Conditional Mutual Information

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.347-350.2614 ◽

2013 ◽

Vol 347-350 ◽

pp. 2614-2619

Author(s):

Deng Chao He ◽

Wen Ning Hao ◽

Gang Chen ◽

Da Wei Jin

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Probability Density Functions ◽

Density Functions ◽

Continuous Variables ◽

Conditional Mutual Information ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Parzen Window ◽

Selection For

This paper proposes an improved feature selection algorithm based on conditional mutual information and the Parzen window. The algorithm adopts conditional mutual information as the evaluation criterion for feature selection in order to overcome feature redundancy, and uses the Parzen window to estimate the probability density functions and compute the conditional mutual information of continuous variables, thereby achieving feature selection for continuous data.
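
As a rough illustration of how a Parzen window lets mutual information be estimated for continuous variables, the Python sketch below estimates a one-dimensional density with a Gaussian kernel and uses it to approximate I(X;C) between a continuous feature and a discrete class label. It covers only the unconditional case, not the conditional mutual information criterion of the paper, and the Gaussian kernel, the fixed bandwidth `h`, and the function names are assumptions.

```python
import math

def parzen_density(x, samples, h):
    """Parzen-window estimate of p(x) with a Gaussian kernel of bandwidth h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

def parzen_mi(xs, cs, h):
    """Resubstitution estimate of I(X;C) in nats.

    Uses I(X;C) = E[log p(x|c) / p(x)], with p(x|c) estimated from the
    samples of class c and p(x) as the class-prior-weighted mixture.
    """
    n = len(xs)
    by_class = {}
    for x, c in zip(xs, cs):
        by_class.setdefault(c, []).append(x)
    total = 0.0
    for x, c in zip(xs, cs):
        p_x_given_c = parzen_density(x, by_class[c], h)
        p_x = sum(len(v) / n * parzen_density(x, v, h)
                  for v in by_class.values())
        total += math.log(p_x_given_c / p_x)
    return total / n

# Two well-separated classes: the feature determines the class almost exactly.
xs = [0.0, 0.1, 10.0, 10.1]
cs = [0, 0, 1, 1]
mi = parzen_mi(xs, cs, h=0.5)
```

For two equally likely, well-separated classes the estimate approaches log 2 nats, the entropy of the class label, which is the largest value the mutual information can take here.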

AN EVALUATION OF PARALLEL STRATEGIES FOR FEATURE VECTOR CONSTRUCTION IN AUTOMATIC SIGNATURE VERIFICATION SYSTEMS

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001494000358 ◽

1994 ◽

Vol 08 (03) ◽

pp. 661-678 ◽

Cited By ~ 7

Author(s):

M.C. FAIRHURST ◽

P. BRITTAN

Keyword(s):

Feature Selection ◽

Feature Vector ◽

General Purpose ◽

Signature Verification ◽

Generic Model ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Vector Construction ◽

Verification Systems ◽

Handwritten Signature

This paper describes possible strategies for the implementation of a feature selection algorithm particularly suited to the realisation of an efficient automatic handwritten signature verification system in which an active feature vector, optimised with respect to an individual signer, is constructed during an enrollment period. A range of configurations based on transputer arrays are considered and the possible implementational approaches evaluated. The paper demonstrates how the inherent parallelism which exists within a generic model for verification can be exploited to provide an optimised general-purpose framework for verification processing.

Research and implementation of Chinese text feature selection algorithm based on χ² statistics

Computational Intelligence and Industrial Engineering ◽

10.2495/ciie140191 ◽

2014 ◽

Author(s):

Weijiang Wu ◽

Shengkai Wen ◽

Dongmei Xia ◽

Guohe Li

Keyword(s):

Feature Selection ◽

Chinese Text ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Text Feature

BagMeLiF: stable boosting-based hybrid-ensemble feature selection algorithm for high-dimensional data

2020 International Conference on Control, Robotics and Intelligent System ◽

10.1145/3437802.3437835 ◽

2020 ◽

Author(s):

Nikita Pilnenskiy ◽

Ivan Smetannikov

Keyword(s):

Feature Selection ◽

High Dimensional Data ◽

High Dimensional ◽

Selection Algorithm ◽

Feature Selection Algorithm

Hybrid Feature Selection Algorithm Based on Discrete Artificial Bee Colony for Parkinson Diagnosis

ACM Transactions on Internet Technology ◽

10.1145/3397161 ◽

2020 ◽

Cited By ~ 1

Author(s):

Haolun Li ◽

Chi-Man Pun ◽

Feng Xu ◽

Longsheng Pan ◽

Rui Zong ◽

...

Keyword(s):

Feature Selection ◽

Artificial Bee Colony ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Bee Colony

High-Accuracy Power Quality Disturbance Classification Using the Adaptive ABC-PSO as Optimal Feature Selection Algorithm

Energies ◽

10.3390/en14051238 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1238

Author(s):

Supanat Chamchuen ◽

Apirat Siritaratiwat ◽

Pradit Fuangfoo ◽

Puripong Suthisopapan ◽

Pirat Khunkitti

Keyword(s):

Feature Selection ◽

Power Quality ◽

Distribution System ◽

Classification Accuracy ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Electrical Distribution ◽

Power Quality Disturbance ◽

Optimal Feature Selection ◽

Optimal Feature

Power quality disturbance (PQD) is an important issue in electrical distribution systems that must be detected and identified promptly to prevent the degradation of system reliability. This work proposes a PQD classification method that uses a novel algorithm, called “adaptive ABC-PSO” and composed of the artificial bee colony (ABC) and particle swarm optimization (PSO) algorithms, as the feature selection algorithm. The proposed adaptive technique is applied to a combination of the ABC and PSO algorithms, which is then used for feature selection. A discrete wavelet transform is used as the feature extraction method, and a probabilistic neural network is used as the classifier. We found that the highest classification accuracy (99.31%) could be achieved with nine optimally selected features out of the 72 extracted features. Moreover, the proposed PQD classification system demonstrated high performance in a noisy environment as well as in a real distribution system. Compared with previous studies, the classification accuracy obtained with adaptive ABC-PSO as the optimal feature selection algorithm falls in the high range; therefore, the adaptive ABC-PSO algorithm can be used to classify PQD in a practical electrical distribution system.
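
To make the feature extraction step more concrete, the Python sketch below computes detail-coefficient energies from a multi-level discrete wavelet transform, a common way to turn a waveform into a fixed-length feature vector. The abstract names neither the wavelet family nor the 72 features, so the Haar wavelet, the energy features, and the function names are assumptions for illustration only.

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT; len(signal) must be even."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def detail_energies(signal, levels):
    """Energy of the detail coefficients at each decomposition level.

    len(signal) must be divisible by 2 ** levels.
    """
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        energies.append(sum(d * d for d in detail))
    return energies

# A short toy waveform (hypothetical values, not measured data).
energies = detail_energies([4.0, 2.0, 6.0, 8.0], levels=2)
flat = detail_energies([1.0] * 8, levels=3)
```

A constant signal yields zero detail energy at every level, while abrupt changes such as sags or transients concentrate energy in the detail bands, which is what makes these quantities usable as candidate features for a selection algorithm.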
