scholarly journals Automated Malware Detection in Mobile App Stores Based on Robust Feature Generation

Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 435 ◽  
Author(s):  
Moutaz Alazab

Many Internet of Things (IoT) services are currently tracked and regulated via mobile devices, making them vulnerable to privacy attacks and exploitation by various malicious applications. Current solutions are unable to keep pace with the rapid growth of malware and are limited by low detection accuracy, long discovery time, complex implementation, and high computational costs associated with the processor speed, power, and memory. Therefore, an automated intelligence technique is necessary for detecting apps containing malware and effectively predicting cyberattacks in mobile marketplaces. In this study, a system for classifying mobile marketplaces applications using real-world datasets is proposed, which analyzes the source code to identify malicious apps. A rich feature set of application programming interface (API) calls is proposed to capture the regularities in apps containing malicious content. Two feature-selection methods—Chi-Square and ANOVA—were examined in conjunction with ten supervised machine-learning algorithms. The detection accuracy of each classifier was evaluated to identify the most reliable classifier for malware detection using various feature sets. Chi-Square was found to have a higher detection accuracy as compared to ANOVA. The proposed system achieved a detection accuracy of 98.1% with a classification time of 1.22 s. Furthermore, the proposed system required a reduced number of API calls (500 instead of 9000) to be incorporated as features.

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yubo Song ◽  
Yijin Geng ◽  
Junbo Wang ◽  
Shang Gao ◽  
Wei Shi

Since a growing number of malicious applications attempt to steal users’ private data by illegally invoking permissions, application stores have carried out many malware detection methods based on application permissions. However, most of them ignore specific permission combinations and application categories that affect the detection accuracy. The features they extracted are neither representative enough to distinguish benign and malicious applications. For these problems, an Android malware detection method based on permission sensitivity is proposed. First, for each kind of application categories, the permission features and permission combination features are extracted. The sensitive permission feature set corresponding to each category label is then obtained by the feature selection method based on permission sensitivity. In the following step, the permission call situation of the application to be detected is compared with the sensitive permission feature set, and the weight allocation method is used to quantify this information into numerical features. In the proposed method of malicious application detection, three machine-learning algorithms are selected to construct the classifier model and optimize the parameters. Compared with traditional methods, the proposed method consumed 60.94% less time while still achieving high accuracy of up to 92.17%.


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1777
Author(s):  
Muhammad Ali ◽  
Stavros Shiaeles ◽  
Gueltoum Bendiab ◽  
Bogdan Ghita

Detection and mitigation of modern malware are critical for the normal operation of an organisation. Traditional defence mechanisms are becoming increasingly ineffective due to the techniques used by attackers such as code obfuscation, metamorphism, and polymorphism, which strengthen the resilience of malware. In this context, the development of adaptive, more effective malware detection methods has been identified as an urgent requirement for protecting the IT infrastructure against such threats, and for ensuring security. In this paper, we investigate an alternative method for malware detection that is based on N-grams and machine learning. We use a dynamic analysis technique to extract an Indicator of Compromise (IOC) for malicious files, which are represented using N-grams. The paper also proposes TF-IDF as a novel alternative used to identify the most significant N-grams features for training a machine learning algorithm. Finally, the paper evaluates the proposed technique using various supervised machine-learning algorithms. The results show that Logistic Regression, with a score of 98.4%, provides the best classification accuracy when compared to the other classifiers used.


Android malware have risen exponentially over the past few years, posing several serious threats such as system damage, financial loss, and mobile botnets. Various detection techniques have been proposed in the literature for Android malware detection. Some of the techniques analyze static parameters such as permissions, or intents, whereas, others focus on dynamic parameters such as network traffic or system calls. Static techniques are relatively easier to implement, however, stealthy recent malware evade static detection by virtue of update attacks. Dynamic detection can be used to detect such stealthy malware, however, it increases the computation overhead. Hence, both kinds of techniques have their own advantages and disadvantages. In this paper, we have proposed an innovative hybrid detection model that uses both static and dynamic features for malware analysis and detection. We first rank the static and dynamic parameters according to the information gain and then apply machine learning algorithms in the testing phase. The results indicate that hybrid approach is better than both static and dynamic approaches and the proposed model achieves 98.9% detection accuracy with Decision Tree classifier


2019 ◽  
Vol 9 (2) ◽  
pp. 239 ◽  
Author(s):  
Bruce Ndibanje ◽  
Ki Kim ◽  
Young Kang ◽  
Hyun Kim ◽  
Tae Kim ◽  
...  

Data-driven public security networking and computer systems are always under threat from malicious codes known as malware; therefore, a large amount of research and development is taking place to find effective countermeasures. These countermeasures are mainly based on dynamic and statistical analysis. Because of the obfuscation techniques used by the malware authors, security researchers and the anti-virus industry are facing a colossal issue regarding the extraction of hidden payloads within packed executable extraction. Based on this understanding, we first propose a method to de-obfuscate and unpack the malware samples. Additional, cross-method-based big data analysis to dynamically and statistically extract features from malware has been proposed. The Application Programming Interface (API) call sequences that reflect the malware behavior of its code have been used to detect behavior such as network traffic, modifying a file, writing to stderr or stdout, modifying a registry value, creating a process. Furthermore, we include a similarity analysis and machine learning algorithms to profile and classify malware behaviors. The experimental results of the proposed method show that malware detection accuracy is very useful to discover potential threats and can help the decision-maker to deploy appropriate countermeasures.


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Tianliang Lu ◽  
Yanhui Du ◽  
Li Ouyang ◽  
Qiuyu Chen ◽  
Xirui Wang

In recent years, the number of malware on the Android platform has been increasing, and with the widespread use of code obfuscation technology, the accuracy of antivirus software and traditional detection algorithms is low. Current state-of-the-art research shows that researchers started applying deep learning methods for malware detection. We proposed an Android malware detection algorithm based on a hybrid deep learning model which combines deep belief network (DBN) and gate recurrent unit (GRU). First of all, analyze the Android malware; in addition to extracting static features, dynamic behavioral features with strong antiobfuscation ability are also extracted. Then, build a hybrid deep learning model for Android malware detection. Because the static features are relatively independent, the DBN is used to process the static features. Because the dynamic features have temporal correlation, the GRU is used to process the dynamic feature sequence. Finally, the training results of DBN and GRU are input into the BP neural network, and the final classification results are output. Experimental results show that, compared with the traditional machine learning algorithms, the Android malware detection model based on hybrid deep learning algorithms has a higher detection accuracy, and it also has a better detection effect on obfuscated malware.


2018 ◽  
Vol 2018 ◽  
pp. 1-15 ◽  
Author(s):  
TaeGuen Kim ◽  
BooJoong Kang ◽  
Eul Gyu Im

As the number of Android malware has been increased rapidly over the years, various malware detection methods have been proposed so far. Existing methods can be classified into two categories: static analysis-based methods and dynamic analysis-based methods. Both approaches have some limitations: static analysis-based methods are relatively easy to be avoided through transformation techniques such as junk instruction insertions, code reordering, and so on. However, dynamic analysis-based methods also have some limitations that analysis overheads are relatively high and kernel modification might be required to extract dynamic features. In this paper, we propose a dynamic analysis framework for Android malware detection that overcomes the aforementioned shortcomings. The framework uses a suffix tree that contains API (Application Programming Interface) subtraces and their probabilistic confidence values that are generated using HMMs (Hidden Markov Model) to reduce the malware detection overhead, and we designed the framework with the client-server architecture since the suffix tree is infeasible to be deployed in mobile devices. In addition, an application rewriting technique is used to trace API invocations without any modifications in the Android kernel. In our experiments, we measured the detection accuracy and the computational overheads to evaluate its effectiveness and efficiency of the proposed framework.


Algorithms ◽  
2018 ◽  
Vol 11 (8) ◽  
pp. 124 ◽  
Author(s):  
Yihong Li ◽  
Fangzheng Liu ◽  
Zhenyu Du ◽  
Dubing Zhang

In the malware detection process, obfuscated malicious codes cannot be efficiently and accurately detected solely in the dynamic or static feature space. Aiming at this problem, an integrative feature extraction algorithm based on simhash was proposed, which combines the static information e.g., API (Application Programming Interface) calls and dynamic information (such as file, registry and network behaviors) of malicious samples to form integrative features. The experiment extracts the integrative features of some static information and dynamic information, and then compares the classification, time and obfuscated-detection performance of the static, dynamic and integrated features, respectively, by using several common machine learning algorithms. The results show that the integrative features have better time performance than the static features, and better classification performance than the dynamic features, and almost the same obfuscated-detection performance as the dynamic features. This algorithm can provide some support for feature extraction of malware detection.


2020 ◽  
Vol 2020 ◽  
pp. 1-10 ◽  
Author(s):  
Alper Egitmen ◽  
Irfan Bulut ◽  
R. Can Aygun ◽  
A. Bilge Gunduz ◽  
Omer Seyrekbasan ◽  
...  

Android malware detection is an important research topic in the security area. There are a variety of existing malware detection models based on static and dynamic malware analysis. However, most of these models are not very successful when it comes to evasive malware detection. In this study, we aimed to create a malware detection model based on a natural language model called skip-gram to detect evasive malware with the highest accuracy rate possible. In order to train and test our proposed model, we used an up-to-date malware dataset called Argus Android Malware Dataset (AMD) since the AMD contains various evasive malware families and detailed information about them. Meanwhile, for the benign samples, we used Comodo Android Benign Dataset. Our proposed model starts with extracting skip-gram-based features from instruction sequences of Android applications. Then it applies several machine learning algorithms to classify samples as benign or malware. We tested our proposed model with two different scenarios. In the first scenario, the random forest-based classifier performed with 95.64% detection accuracy on the entire dataset and 95% detection accuracy against evasive only samples. In the second scenario, we created a test dataset that contained zero-day malware samples only. For the training set, we did not use any sample that belongs to the malware families in the test set. The random forest-based model performed with 37.36% accuracy rate against zero-day malware. In addition, we compared our proposed model’s malware detection performance against several commercial antimalware applications using VirusTotal API. Our model outperformed 7 out of 10 antimalware applications and tied with one of them on the same test scenario.


Sign in / Sign up

Export Citation Format

Share Document