Automated Malware Detection in Mobile App Stores Based on Robust Feature Generation

Many Internet of Things (IoT) services are currently tracked and regulated via mobile devices, making them vulnerable to privacy attacks and exploitation by various malicious applications. Current solutions are unable to keep pace with the rapid growth of malware and are limited by low detection accuracy, long discovery time, complex implementation, and high computational costs in terms of processor speed, power, and memory. Therefore, an automated intelligence technique is necessary for detecting apps containing malware and effectively predicting cyberattacks in mobile marketplaces. In this study, a system for classifying mobile marketplace applications using real-world datasets is proposed, which analyzes the source code to identify malicious apps. A rich feature set of application programming interface (API) calls is proposed to capture the regularities in apps containing malicious content. Two feature-selection methods, Chi-Square and ANOVA, were examined in conjunction with ten supervised machine-learning algorithms. The detection accuracy of each classifier was evaluated to identify the most reliable classifier for malware detection using various feature sets. Chi-Square was found to yield a higher detection accuracy than ANOVA. The proposed system achieved a detection accuracy of 98.1% with a classification time of 1.22 s. Furthermore, the proposed system required fewer API calls (500 instead of 9000) to be incorporated as features.
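
As a rough illustration of the Chi-Square selection step mentioned in this abstract, the sketch below scores each API call by the Pearson chi-square statistic of its 2x2 table (call present/absent against malicious/benign) and keeps the top k calls. The function names `chi_square` and `select_top_k`, the toy data in `DEMO_APPS`, and the binary present/absent encoding are assumptions made for illustration; the abstract does not say how the study encoded API calls or computed the statistic.

```python
from typing import List, Set, Tuple

# Each app is (set of API calls found in its code, label), label 1 = malicious.
# Hypothetical toy data, not taken from the study.
App = Tuple[Set[str], int]

DEMO_APPS: List[App] = [
    ({"sendTextMessage", "getDeviceId"}, 1),
    ({"sendTextMessage", "openConnection"}, 1),
    ({"openConnection", "setText"}, 0),
    ({"setText"}, 0),
]


def chi_square(apps: List[App], api: str) -> float:
    """Pearson chi-square of the 2x2 table: API present/absent x malicious/benign."""
    a = sum(1 for calls, y in apps if api in calls and y == 1)      # present, malicious
    b = sum(1 for calls, y in apps if api in calls and y == 0)      # present, benign
    c = sum(1 for calls, y in apps if api not in calls and y == 1)  # absent, malicious
    d = sum(1 for calls, y in apps if api not in calls and y == 0)  # absent, benign
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column of the table is empty, so the call carries no signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(apps: List[App], k: int) -> List[str]:
    """Return the k API calls with the highest chi-square score."""
    vocab = sorted(set().union(*(calls for calls, _ in apps)))
    ranked = sorted(vocab, key=lambda api: chi_square(apps, api), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    for name in select_top_k(DEMO_APPS, 2):
        print(name, chi_square(DEMO_APPS, name))
```

In this toy set, `sendTextMessage` appears only in malicious apps and `setText` only in benign ones, so both receive the maximum score, while `openConnection` is evenly split and scores zero. Reducing a large vocabulary to a few hundred high-scoring calls is the kind of reduction the abstract reports (500 features instead of 9000).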

Permission Sensitivity-Based Malicious Application Detection for Android

Security and Communication Networks ◽ 10.1155/2021/6689486 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-12

Author(s): Yubo Song ◽ Yijin Geng ◽ Junbo Wang ◽ Shang Gao ◽ Wei Shi

Keyword(s): Detection Method ◽ Malware Detection ◽ Feature Selection Method ◽ Machine Learning Algorithms ◽ Detection Methods ◽ Detection Accuracy ◽ Android Malware ◽ Android Malware Detection ◽ Private Data ◽ Weight Allocation

Since a growing number of malicious applications attempt to steal users’ private data by illegally invoking permissions, application stores have adopted many malware detection methods based on application permissions. However, most of these methods ignore the specific permission combinations and application categories that affect detection accuracy, and the features they extract are not representative enough to distinguish benign from malicious applications. To address these problems, an Android malware detection method based on permission sensitivity is proposed. First, the permission features and permission combination features are extracted for each application category. The sensitive permission feature set corresponding to each category label is then obtained by a feature selection method based on permission sensitivity. Next, the permission usage of the application to be detected is compared with the sensitive permission feature set, and a weight allocation method is used to quantify this information into numerical features. Finally, three machine-learning algorithms are selected to construct the classifier model, and their parameters are optimized. Compared with traditional methods, the proposed method consumed 60.94% less time while still achieving an accuracy of up to 92.17%.
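
The abstract does not give the sensitivity formula or the weight allocation rule, so the following is only a sketch under loudly stated assumptions: a permission's sensitivity within a category is taken to be how much more often malicious apps request it than benign apps in that category, and an app's numerical feature is the sum of the sensitivities of the sensitive permissions it requests. The names `sensitive_permissions`, `permission_score`, and `DEMO_SAMPLES` are hypothetical, and permission combinations are left out for brevity.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# (application category, requested permissions, label), label 1 = malicious.
Sample = Tuple[str, Set[str], int]

# Hypothetical toy data, not taken from the paper.
DEMO_SAMPLES: List[Sample] = [
    ("game", {"SEND_SMS", "INTERNET"}, 1),
    ("game", {"SEND_SMS"}, 1),
    ("game", {"INTERNET"}, 0),
    ("game", {"INTERNET", "VIBRATE"}, 0),
]


def sensitive_permissions(samples: Iterable[Sample]) -> Dict[str, Dict[str, float]]:
    """Per category, map each sensitive permission to an assumed sensitivity weight.

    Assumption: sensitivity = request rate among malicious apps minus request
    rate among benign apps; only permissions with a positive difference are kept.
    """
    apps: Counter = Counter()  # (category, label) -> number of apps
    uses: Counter = Counter()  # (category, label, permission) -> number of requests
    for category, perms, label in samples:
        apps[(category, label)] += 1
        for perm in perms:
            uses[(category, label, perm)] += 1

    weights: Dict[str, Dict[str, float]] = {}
    for category, _label, perm in list(uses):
        mal_n, ben_n = apps[(category, 1)], apps[(category, 0)]
        mal_rate = uses[(category, 1, perm)] / mal_n if mal_n else 0.0
        ben_rate = uses[(category, 0, perm)] / ben_n if ben_n else 0.0
        if mal_rate > ben_rate:
            weights.setdefault(category, {})[perm] = mal_rate - ben_rate
    return weights


def permission_score(category: str, perms: Set[str],
                     weights: Dict[str, Dict[str, float]]) -> float:
    """Numerical feature: summed weights of the sensitive permissions requested."""
    table = weights.get(category, {})
    return sum(table.get(perm, 0.0) for perm in perms)


if __name__ == "__main__":
    w = sensitive_permissions(DEMO_SAMPLES)
    print(w)
    print(permission_score("game", {"SEND_SMS", "INTERNET"}, w))
```

Keeping a separate weight table per category reflects the abstract's point that the same permission can be ordinary in one category and suspicious in another.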

MALGRA: Machine Learning and N-Gram Malware Feature Extraction and Detection System

Electronics ◽ 10.3390/electronics9111777 ◽ 2020 ◽ Vol 9 (11) ◽ pp. 1777

Author(s): Muhammad Ali ◽ Stavros Shiaeles ◽ Gueltoum Bendiab ◽ Bogdan Ghita

Keyword(s): Machine Learning ◽ Learning Algorithm ◽ Detection System ◽ Malware Detection ◽ Machine Learning Algorithms ◽ Supervised Machine Learning ◽ Detection Methods ◽ Normal Operation ◽ Analysis Technique ◽ N Gram

Detection and mitigation of modern malware are critical for the normal operation of an organisation. Traditional defence mechanisms are becoming increasingly ineffective due to techniques used by attackers, such as code obfuscation, metamorphism, and polymorphism, which strengthen the resilience of malware. In this context, the development of adaptive, more effective malware detection methods has been identified as an urgent requirement for protecting IT infrastructure against such threats and for ensuring security. In this paper, we investigate an alternative method for malware detection that is based on N-grams and machine learning. We use a dynamic analysis technique to extract Indicators of Compromise (IOCs) for malicious files, which are represented using N-grams. The paper also proposes TF-IDF as a novel alternative for identifying the most significant N-gram features for training a machine learning algorithm. Finally, the paper evaluates the proposed technique using various supervised machine-learning algorithms. The results show that Logistic Regression, with a score of 98.4%, provides the best classification accuracy of the classifiers compared.
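
To make the N-gram and TF-IDF steps concrete, here is a minimal sketch that turns token sequences into N-grams and weights each N-gram by term frequency times inverse document frequency. The token sequences in `DEMO_TRACES` stand in for whatever a dynamic-analysis report would yield and are hypothetical, as are the function names; the TF-IDF variant used (relative frequency times `log(N / df)`) is one common textbook form and may differ from the one in the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]

# Hypothetical event traces, one per analysed file.
DEMO_TRACES: List[List[str]] = [
    ["open", "write", "connect"],
    ["open", "write", "close"],
]


def ngrams(tokens: Sequence[str], n: int) -> List[NGram]:
    """All contiguous runs of n tokens, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs: Sequence[Sequence[str]], n: int) -> List[Dict[NGram, float]]:
    """Per-document TF-IDF weights over N-grams.

    TF is the N-gram's share of the document's N-grams; IDF is log(N / df),
    where df is the number of documents containing the N-gram.
    """
    counts_per_doc = [Counter(ngrams(doc, n)) for doc in docs]
    doc_freq = Counter(gram for counts in counts_per_doc for gram in counts)
    total_docs = len(docs)
    weights = []
    for counts in counts_per_doc:
        total = sum(counts.values())
        weights.append({
            gram: (count / total) * math.log(total_docs / doc_freq[gram])
            for gram, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    for doc_weights in tfidf(DEMO_TRACES, 2):
        print(doc_weights)
```

An N-gram shared by every trace, such as `("open", "write")` here, receives weight zero, while N-grams specific to one trace keep a positive weight; ranking by these weights is one way to pick the most significant features before training a classifier.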

A Hybrid Model for Android Malware Detection

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.k2250.1081219 ◽ 2019 ◽ Vol 8 (12) ◽ pp. 2656-2662

Keyword(s): Malware Detection ◽ Machine Learning Algorithms ◽ Detection Accuracy ◽ Dynamic Parameters ◽ Android Malware ◽ Detection Techniques ◽ Advantages And Disadvantages ◽ Android Malware Detection ◽ Tree Classifier ◽ Hybrid Detection

Android malware has risen exponentially over the past few years, posing several serious threats such as system damage, financial loss, and mobile botnets. Various detection techniques have been proposed in the literature for Android malware detection. Some of the techniques analyze static parameters such as permissions or intents, whereas others focus on dynamic parameters such as network traffic or system calls. Static techniques are relatively easy to implement; however, recent stealthy malware evades static detection by means of update attacks. Dynamic detection can be used to detect such stealthy malware; however, it increases the computational overhead. Hence, both kinds of techniques have their own advantages and disadvantages. In this paper, we propose an innovative hybrid detection model that uses both static and dynamic features for malware analysis and detection. We first rank the static and dynamic parameters according to their information gain and then apply machine learning algorithms in the testing phase. The results indicate that the hybrid approach is better than both the static and dynamic approaches, and the proposed model achieves 98.9% detection accuracy with a Decision Tree classifier.
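
The ranking step can be sketched as follows: compute the information gain of each feature with respect to the malicious/benign label and sort the features by it. The feature names and values in `DEMO_FEATURES` are hypothetical examples of one static and two dynamic parameters, and the function names are illustrative; the abstract does not list the parameters the paper ranked.

```python
import math
from typing import Dict, List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for value in set(labels):
        p = sum(1 for y in labels if y == value) / n
        h -= p * math.log2(p)
    return h


def information_gain(values: Sequence[int], labels: Sequence[int]) -> float:
    """Label entropy minus the entropy left after splitting on the feature."""
    n = len(labels)
    remaining = 0.0
    for value in set(values):
        subset = [y for x, y in zip(values, labels) if x == value]
        remaining += len(subset) / n * entropy(subset)
    return entropy(labels) - remaining


def rank_features(table: Dict[str, List[int]], labels: Sequence[int]) -> List[str]:
    """Feature names ordered from highest to lowest information gain."""
    return sorted(table, key=lambda name: information_gain(table[name], labels),
                  reverse=True)


# Hypothetical toy data: four apps, label 1 = malicious.
DEMO_LABELS = [1, 1, 0, 0]
DEMO_FEATURES = {
    "perm_send_sms": [1, 1, 0, 0],       # static parameter
    "net_traffic_high": [1, 0, 1, 0],    # dynamic parameter
    "syscall_ptrace": [1, 1, 1, 0],      # dynamic parameter
}

if __name__ == "__main__":
    for name in rank_features(DEMO_FEATURES, DEMO_LABELS):
        print(name, round(information_gain(DEMO_FEATURES[name], DEMO_LABELS), 4))
```

Because static and dynamic parameters are scored on the same scale, they can be ranked in one list, and a classifier such as a decision tree can then be trained on the top-ranked ones.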

Mac OS X Malware Detection with Supervised Machine Learning Algorithms

10.1007/978-3-030-74753-4_13 ◽ 2022 ◽ pp. 193-208

Author(s): Samira Eisaloo Gharghasheh ◽ Shahrzad Hadayeghparast

Keyword(s): Machine Learning ◽ Learning Algorithms ◽ Malware Detection ◽ Machine Learning Algorithms ◽ Supervised Machine Learning

Cross-Method-Based Analysis and Classification of Malicious Behavior by API Calls Extraction

Applied Sciences ◽ 10.3390/app9020239 ◽ 2019 ◽ Vol 9 (2) ◽ pp. 239 ◽ Cited By ~ 8

Author(s): Bruce Ndibanje ◽ Ki Kim ◽ Young Kang ◽ Hyun Kim ◽ Tae Kim ◽ ...

Keyword(s): Application Programming Interface ◽ Machine Learning Algorithms ◽ Data Driven ◽ Public Security ◽ Detection Accuracy ◽ Malicious Behavior ◽ Application Programming ◽ Programming Interface ◽ Method Show

Data-driven public security networking and computer systems are always under threat from malicious code known as malware; therefore, a large amount of research and development is taking place to find effective countermeasures. These countermeasures are mainly based on dynamic and statistical analysis. Because of the obfuscation techniques used by malware authors, security researchers and the anti-virus industry face a major challenge in extracting hidden payloads from packed executables. Based on this understanding, we first propose a method to de-obfuscate and unpack malware samples. Additionally, a cross-method-based big data analysis is proposed to dynamically and statistically extract features from malware. The Application Programming Interface (API) call sequences that reflect the behavior of the malware code are used to detect behaviors such as generating network traffic, modifying a file, writing to stderr or stdout, modifying a registry value, and creating a process. Furthermore, we include a similarity analysis and machine learning algorithms to profile and classify malware behaviors. The experimental results show that the detection accuracy of the proposed method makes it useful for discovering potential threats and that it can help decision-makers deploy appropriate countermeasures.

Android Malware Detection Based on a Hybrid Deep Learning Model

Security and Communication Networks ◽ 10.1155/2020/8863617 ◽ 2020 ◽ Vol 2020 ◽ pp. 1-11

Author(s): Tianliang Lu ◽ Yanhui Du ◽ Li Ouyang ◽ Qiuyu Chen ◽ Xirui Wang

Keyword(s): Deep Learning ◽ Learning Algorithms ◽ Malware Detection ◽ Learning Model ◽ Machine Learning Algorithms ◽ Detection Accuracy ◽ Dynamic Feature ◽ Android Malware ◽ Android Malware Detection ◽ Deep Learning Model

In recent years, the amount of malware on the Android platform has been increasing, and with the widespread use of code obfuscation technology, the accuracy of antivirus software and traditional detection algorithms is low. Current state-of-the-art research shows that researchers have started applying deep learning methods to malware detection. We propose an Android malware detection algorithm based on a hybrid deep learning model that combines a deep belief network (DBN) and a gated recurrent unit (GRU). First, the Android malware is analyzed: in addition to static features, dynamic behavioral features with strong antiobfuscation ability are extracted. Then, a hybrid deep learning model for Android malware detection is built. Because the static features are relatively independent, the DBN is used to process them. Because the dynamic features have temporal correlation, the GRU is used to process the dynamic feature sequence. Finally, the training results of the DBN and GRU are input into a BP neural network, which outputs the final classification results. Experimental results show that, compared with traditional machine learning algorithms, the Android malware detection model based on hybrid deep learning has a higher detection accuracy and also detects obfuscated malware more effectively.

Runtime Detection Framework for Android Malware

Mobile Information Systems ◽ 10.1155/2018/8094314 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-15 ◽ Cited By ~ 1

Author(s): TaeGuen Kim ◽ BooJoong Kang ◽ Eul Gyu Im

Keyword(s): Dynamic Analysis ◽ Static Analysis ◽ Suffix Tree ◽ Malware Detection ◽ Application Programming Interface ◽ Detection Methods ◽ Detection Accuracy ◽ Dynamic Features ◽ Android Malware ◽ Android Malware Detection

As the amount of Android malware has increased rapidly over the years, various malware detection methods have been proposed. Existing methods can be classified into two categories: static analysis-based methods and dynamic analysis-based methods. Both approaches have limitations. Static analysis-based methods are relatively easy to evade through transformation techniques such as junk instruction insertion and code reordering. Dynamic analysis-based methods, in turn, incur relatively high analysis overheads and may require kernel modification to extract dynamic features. In this paper, we propose a dynamic analysis framework for Android malware detection that overcomes the aforementioned shortcomings. The framework uses a suffix tree that contains API (Application Programming Interface) subtraces and their probabilistic confidence values, which are generated using hidden Markov models (HMMs), to reduce the malware detection overhead. We designed the framework with a client-server architecture because the suffix tree is infeasible to deploy on mobile devices. In addition, an application rewriting technique is used to trace API invocations without any modifications to the Android kernel. In our experiments, we measured the detection accuracy and the computational overheads to evaluate the effectiveness and efficiency of the proposed framework.

A Simhash-Based Integrative Features Extraction Algorithm for Malware Detection

Algorithms ◽ 10.3390/a11080124 ◽ 2018 ◽ Vol 11 (8) ◽ pp. 124 ◽ Cited By ~ 1

Author(s): Yihong Li ◽ Fangzheng Liu ◽ Zhenyu Du ◽ Dubing Zhang

Keyword(s): Feature Extraction ◽ Malware Detection ◽ Application Programming Interface ◽ Classification Performance ◽ Detection Performance ◽ Machine Learning Algorithms ◽ Dynamic Features ◽ Dynamic Information ◽ Static Information ◽ Extraction Algorithm

In the malware detection process, obfuscated malicious code cannot be efficiently and accurately detected solely in the dynamic or static feature space. To address this problem, an integrative feature extraction algorithm based on simhash is proposed, which combines the static information (e.g., API (Application Programming Interface) calls) and dynamic information (such as file, registry, and network behaviors) of malicious samples to form integrative features. The experiment extracts the integrative features of some static and dynamic information, and then compares the classification, time, and obfuscated-detection performance of the static, dynamic, and integrative features using several common machine learning algorithms. The results show that the integrative features have better time performance than the static features, better classification performance than the dynamic features, and almost the same obfuscated-detection performance as the dynamic features. This algorithm can provide some support for feature extraction in malware detection.
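
For readers unfamiliar with simhash, the sketch below shows the generic fingerprinting technique: every feature token is hashed to 64 bits, each bit position accumulates a +1 or -1 vote, and the sign of each total gives one bit of the fingerprint. How the paper weights or combines static and dynamic information is not stated in the abstract, so the unweighted votes, the `static:`/`dynamic:` token prefixes, the use of SHA-256 as the underlying hash, and the function names are all assumptions.

```python
import hashlib
from typing import Iterable


def simhash(features: Iterable[str], bits: int = 64) -> int:
    """Generic simhash fingerprint of a collection of feature tokens.

    Assumes `bits` is a multiple of 8 and at most 256 (the SHA-256 digest size).
    """
    totals = [0] * bits
    for feature in features:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        value = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            # Each token votes +1 where its hash bit is set, -1 otherwise.
            totals[i] += 1 if (value >> i) & 1 else -1
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical tokens mixing static and dynamic information for one sample.
DEMO_SAMPLE = [
    "static:CreateFileW",
    "static:RegSetValueExW",
    "dynamic:file_write",
    "dynamic:registry_modify",
    "dynamic:network_connect",
]

if __name__ == "__main__":
    fp = simhash(DEMO_SAMPLE)
    variant = simhash(DEMO_SAMPLE[:-1] + ["dynamic:network_listen"])
    print(f"{fp:016x}", hamming(fp, variant))
```

The appeal of this construction is that samples sharing most of their tokens tend to receive fingerprints with a small Hamming distance, so static and dynamic observations can be folded into one fixed-length feature.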

Combat Mobile Evasive Malware via Skip-Gram-Based Malware Detection

Security and Communication Networks ◽ 10.1155/2020/6726147 ◽ 2020 ◽ Vol 2020 ◽ pp. 1-10 ◽ Cited By ~ 1

Author(s): Alper Egitmen ◽ Irfan Bulut ◽ R. Can Aygun ◽ A. Bilge Gunduz ◽ Omer Seyrekbasan ◽ ...

Keyword(s): Random Forest ◽ Malware Detection ◽ Machine Learning Algorithms ◽ Detection Accuracy ◽ Malware Analysis ◽ Test Scenario ◽ Accuracy Rate ◽ Android Malware ◽ Proposed Model ◽ Evasive Malware

Android malware detection is an important research topic in the security area. There are a variety of existing malware detection models based on static and dynamic malware analysis. However, most of these models are not very successful when it comes to evasive malware detection. In this study, we aimed to create a malware detection model based on a natural language model called skip-gram to detect evasive malware with the highest accuracy rate possible. To train and test our proposed model, we used an up-to-date malware dataset called the Argus Android Malware Dataset (AMD), since the AMD contains various evasive malware families and detailed information about them. For the benign samples, we used the Comodo Android Benign Dataset. Our proposed model starts by extracting skip-gram-based features from the instruction sequences of Android applications. It then applies several machine learning algorithms to classify samples as benign or malware. We tested our proposed model in two different scenarios. In the first scenario, the random forest-based classifier achieved 95.64% detection accuracy on the entire dataset and 95% detection accuracy against evasive-only samples. In the second scenario, we created a test dataset that contained zero-day malware samples only. For the training set, we did not use any sample that belonged to the malware families in the test set. The random forest-based model achieved a 37.36% accuracy rate against zero-day malware. In addition, we compared our proposed model’s malware detection performance against several commercial antimalware applications using the VirusTotal API. Our model outperformed 7 out of 10 antimalware applications and tied with one of them on the same test scenario.
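
The skip-gram language model is trained on (center, context) pairs drawn from a sliding window over a token sequence. The sketch below shows only that pair-generation step for a sequence of instruction mnemonics; the window size, the opcode names in `DEMO_INSTRUCTIONS`, and the function name `skipgram_pairs` are assumptions, and the abstract does not describe how the paper turned the trained model into classifier features.

```python
from typing import List, Sequence, Tuple


def skipgram_pairs(tokens: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """(center, context) training pairs for a skip-gram model.

    Every token is paired with each neighbour at most `window` positions away.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical instruction sequence taken from one method of an app.
DEMO_INSTRUCTIONS = ["const-string", "invoke-virtual", "move-result", "if-eqz"]

if __name__ == "__main__":
    for pair in skipgram_pairs(DEMO_INSTRUCTIONS, window=1):
        print(pair)
```

A model trained on such pairs learns similar vectors for instructions that occur in similar contexts, which is the property a skip-gram-based feature extractor relies on.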

Assessment of supervised machine learning algorithms using dynamic API calls for malware detection

International Journal of Computers and Applications ◽ 10.1080/1206212x.2020.1732641 ◽ 2020 ◽ pp. 1-8 ◽ Cited By ~ 1

Author(s): Jagsir Singh ◽ Jaswinder Singh

Keyword(s): Machine Learning ◽ Learning Algorithms ◽ Malware Detection ◽ Machine Learning Algorithms ◽ Supervised Machine Learning
