A study on robustness of malware detection model

Many available malware detection techniques are signature-based. This approach detects known malware very effectively but may fail to detect unknown or zero-day attacks. In this article, the authors propose a malware detection model that uses the operation codes (opcodes) of malicious and benign executables as features. The proposed model uses the opcode extract and count (OPEC) algorithm to prepare the opcode feature vector for the experiment. The most relevant features are selected using the extra-trees classifier feature selection technique and then passed to several supervised learning algorithms (support vector machine, naive Bayes, decision tree, random forest, logistic regression, and k-nearest neighbour) to build classification models for malware detection. The proposed model achieves a detection accuracy of 98.7%, which the authors report as better than many similar works discussed in the literature.
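The opcode-counting step described in this abstract can be illustrated with a minimal sketch. It is not the authors' OPEC algorithm, whose details the abstract does not give. It assumes the executable has already been disassembled into text lines whose first token is the opcode mnemonic, and the names `count_opcodes`, `to_feature_vector`, and `vocabulary` are hypothetical.

```python
from collections import Counter


def count_opcodes(disassembly_lines):
    """Count the opcode mnemonic (first token) of each instruction line.

    Assumption: ';' starts a comment and a line ending in ':' is a label.
    """
    counts = Counter()
    for line in disassembly_lines:
        # Drop trailing comments and surrounding whitespace.
        text = line.split(";", 1)[0].strip()
        if not text or text.endswith(":"):
            continue  # skip blank lines, comment-only lines, and labels
        counts[text.split()[0].lower()] += 1
    return counts


def to_feature_vector(counts, vocabulary):
    """Map opcode counts onto a fixed, ordered opcode vocabulary."""
    return [counts.get(op, 0) for op in vocabulary]


# Hypothetical disassembly fragment, for illustration only.
sample = [
    "start:",
    "push ebp",
    "mov ebp, esp",
    "mov eax, [ebp+8]  ; load argument",
    "call helper",
    "pop ebp",
    "ret",
]
vocabulary = ["mov", "push", "pop", "call", "ret", "xor"]
vector = to_feature_vector(count_opcodes(sample), vocabulary)
print(vector)
```

Vectors of this kind, one per executable, would then be the input to feature selection and to the classifiers listed in the abstract.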

Automatic Benchmark Generation Framework for Malware Detection

Security and Communication Networks ◽

10.1155/2018/4947695 ◽

2018 ◽

Vol 2018 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Guanghui Liang ◽

Jianmin Pang ◽

Zheng Shan ◽

Runqing Yang ◽

Yihang Chen

Keyword(s):

Initial Data ◽

Malware Detection ◽

Detection Methods ◽

Small Data ◽

Security Threats ◽

Improved Genetic Algorithm ◽

Data Set ◽

Detection Model ◽

Manual Selection ◽

Selection Of

To address emerging security threats, various malware detection methods are proposed every year. A small but representative set of malware samples is therefore usually needed for detection models, especially machine-learning-based ones. However, the current manual selection of representative samples from a large collection of unknown files is labor intensive and not scalable. In this paper, we first propose a framework that can automatically generate a small data set for malware detection. With this framework, we extract behavior features from a large initial data set and then use a hierarchical clustering technique to identify different types of malware. An improved genetic algorithm based on roulette wheel sampling is implemented to generate the final test data set. The final data set is only one-eighteenth the volume of the initial data set, and evaluations show that the data set selected by the proposed framework, although much smaller than the original, loses almost none of its semantics.
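Roulette wheel sampling, which the abstract names as the basis of its genetic algorithm, selects individuals with probability proportional to their fitness. The following is a minimal sketch of that selection step only. It does not reproduce the paper's improved genetic algorithm, and the function name, the fitness values, and the sample labels are assumptions made for illustration.

```python
import random


def roulette_wheel_select(population, fitness, k, rng=None):
    """Draw k individuals, with replacement, proportionally to fitness.

    Assumption: fitness values are non-negative and not all zero.
    """
    rng = rng or random.Random()
    total = sum(fitness)
    if total <= 0:
        raise ValueError("fitness values must sum to a positive number")
    chosen = []
    for _ in range(k):
        spin = rng.random() * total  # a point in [0, total)
        running = 0.0
        for individual, score in zip(population, fitness):
            running += score
            if spin < running:
                chosen.append(individual)
                break
        else:
            # Only reachable through floating-point rounding.
            chosen.append(population[-1])
    return chosen


# Hypothetical candidates and fitness scores.
candidates = ["sample_a", "sample_b", "sample_c"]
scores = [1.0, 3.0, 6.0]
picks = roulette_wheel_select(candidates, scores, k=5, rng=random.Random(0))
print(picks)
```

Passing a seeded `random.Random` makes a run repeatable. An individual with zero fitness is never selected, because its slice of the wheel has zero width.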

AIB-SPMDM: A Smartphone Malware Detection Model Based on Artificial Immunology

Communications in Computer and Information Science - Information Computing and Applications ◽

10.1007/978-3-642-34041-3_64 ◽

2012 ◽

pp. 457-465

Author(s):

Min Zhao ◽

Tao Zhang ◽

Jinshuang Wang ◽

Zhijian Yuan

Keyword(s):

Malware Detection ◽

Detection Model ◽

Model Based ◽

Artificial Immunology

URefFlow: A Unified Android Malware Detection Model Based on Reflective Calls

2018 IEEE 37th International Performance Computing and Communications Conference (IPCCC) ◽

10.1109/pccc.2018.8711111 ◽

2018 ◽

Author(s):

Chao Liu ◽

Jianan Li ◽

Min Yu ◽

Gang Li ◽

Bo Luo ◽

...

Keyword(s):

Malware Detection ◽

Android Malware ◽

Detection Model ◽

Model Based ◽

Android Malware Detection

A malware detection model based on a negative selection algorithm with penalty factor

Science China Information Sciences ◽

10.1007/s11432-010-4123-5 ◽

2010 ◽

Vol 53 (12) ◽

pp. 2461-2471 ◽

Cited By ~ 19

Author(s):

PengTao Zhang ◽

Wei Wang ◽

Ying Tan

Keyword(s):

Negative Selection ◽

Malware Detection ◽

Selection Algorithm ◽

Negative Selection Algorithm ◽

Detection Model ◽

Model Based ◽

Penalty Factor

Heterogeneous Graph Convolutional Networks for Android Malware Detection using Callback-Aware Caller-Callee Graphs

10.36227/techrxiv.15072087 ◽

2021 ◽

Author(s):

Vinayaka K V ◽

Jaidhar C D

Keyword(s):

Malware Detection ◽

Application Programming Interface ◽

Extraction Methods ◽

Android Application ◽

Convolutional Network ◽

Android Malware ◽

Detection Model ◽

Convolutional Networks ◽

Android Malware Detection ◽

Ablation Study

The popularity of the Android Operating System in the smartphone market has given rise to a large amount of Android malware. To detect this malware accurately, many existing works use machine learning and deep learning-based methods, in which feature extraction methods are used to extract fixed-size feature vectors from the files present inside the Android Application Package (APK). Recently, Graph Convolutional Network (GCN) based methods applied to the Function Call Graph (FCG) extracted from the APK have been gaining momentum in Android malware detection, as GCNs are effective at learning tasks on variable-sized graphs such as the FCG, and the FCG sufficiently captures the structure and behaviour of an APK. However, the FCG lacks information about callback methods because the Android Application Programming Interface (API) is event-driven. This paper proposes enhancing the FCG to eFCG (enhanced-FCG) using callback information extracted with Android Framework Space Analysis to overcome this limitation. Further, we add permission-API method relationships to the eFCG. The eFCG is reduced using node contraction based on the classes to obtain R-eFCG (Reduced eFCG), which improves the generalisation ability of the Android malware detection model. The eFCG and R-eFCG are then given as inputs to the Heterogeneous GCN models to determine whether the APK file from which they are extracted is malicious or not. To test the effectiveness of eFCG and R-eFCG, we conducted an ablation study by removing their various components. To determine the optimal neighbourhood size for GCN, we experimented with a varying number of GCN layers and found that the Android malware detection model using R-eFCG with all its components and four convolution layers achieved a maximum accuracy of 96.28%.
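Node contraction based on classes, as mentioned in the abstract, merges all methods of one class into a single graph node. The sketch below shows one plausible reading of that step and is not the authors' R-eFCG construction. The `Class->method` identifier format, the dropping of intra-class calls, and the use of call counts as edge weights are all assumptions made for illustration.

```python
from collections import defaultdict


def class_of(method):
    """Return the class part of a 'Class->method' identifier (assumed format)."""
    return method.split("->", 1)[0]


def contract_by_class(call_edges):
    """Merge all methods of a class into one node.

    Returns class-level edges mapped to the number of method-level
    calls they replace. Intra-class calls would become self-loops and
    are dropped here, which is an assumption of this sketch.
    """
    contracted = defaultdict(int)
    for caller, callee in call_edges:
        src, dst = class_of(caller), class_of(callee)
        if src != dst:
            contracted[(src, dst)] += 1
    return dict(contracted)


# Hypothetical method-level call edges, for illustration only.
edges = [
    ("Lcom/app/Main;->onCreate", "Lcom/app/Net;->connect"),
    ("Lcom/app/Main;->onClick", "Lcom/app/Net;->send"),
    ("Lcom/app/Main;->onCreate", "Lcom/app/Main;->init"),
    ("Lcom/app/Net;->send", "Ljava/net/Socket;->getOutputStream"),
]
reduced = contract_by_class(edges)
for (src, dst), calls in reduced.items():
    print(src, "->", dst, calls)
```

In this example, four method-level edges over six methods reduce to two class-level edges over three classes, which shows how contraction shrinks the graph given to the model.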
