On the Impact of Sample Duplication in Machine-Learning-Based Android Malware Detection

Malware detection at scale in the Android realm is often carried out using machine learning techniques. State-of-the-art approaches such as DREBIN and MaMaDroid are reported to yield high detection rates when assessed against well-known datasets. Unfortunately, such datasets may include a large portion of duplicated samples, which may bias the recorded experimental results and the insights drawn from them. In this article, we perform extensive experiments to measure the performance gap that occurs when datasets are de-duplicated. Our experimental results reveal that duplication in published datasets has a limited impact on supervised malware classification models. This observation contrasts with the finding of Allamanis on the general case of machine learning bias for big code. Our experiments show, however, that sample duplication affects unsupervised learning models (e.g., malware family clustering) more substantially. We therefore argue that researchers and practitioners should always take sample duplication into consideration when performing machine-learning-based Android malware detection, whether supervised or unsupervised, no matter how significant the impact might be.
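As a rough illustration of what de-duplicating a dataset can involve, the following minimal sketch treats two samples as duplicates when their feature sets are identical. This duplicate criterion, and the helper names `feature_fingerprint`, `deduplicate`, and `duplication_rate`, are assumptions made for illustration; the abstract does not state how the authors define or detect duplicates.

```python
import hashlib


def feature_fingerprint(features):
    """Return a stable, order-insensitive hash of one app's feature strings."""
    canonical = "\n".join(sorted(set(features)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(samples):
    """Keep the first (features, label) pair seen for each distinct feature set.

    Assumption: identical feature sets count as duplicates. Other criteria
    (e.g. identical bytecode) would need a different fingerprint.
    """
    seen = set()
    unique = []
    for features, label in samples:
        fingerprint = feature_fingerprint(features)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append((features, label))
    return unique


def duplication_rate(samples):
    """Fraction of samples that would be removed by deduplicate()."""
    if not samples:
        return 0.0
    return 1.0 - len(deduplicate(samples)) / len(samples)


# Hypothetical toy dataset: the first two apps share the same feature set.
toy_samples = [
    (["SEND_SMS", "INTERNET"], 1),
    (["INTERNET", "SEND_SMS"], 1),
    (["CAMERA"], 0),
]
```

Training and evaluating the same model once on `toy_samples` and once on `deduplicate(toy_samples)` would then give the kind of before/after comparison the abstract describes.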

TFDroid: Android Malware Detection by Topics and Sensitive Data Flows Using Machine Learning Techniques

2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT) ◽

10.1109/infoct.2019.8711179 ◽

2019 ◽

Author(s):

Songhao Lou ◽

Shaoyin Cheng ◽

Jingjing Huang ◽

Fan Jiang

Keyword(s):

Machine Learning ◽

Malware Detection ◽

Machine Learning Techniques ◽

Sensitive Data ◽

Android Malware ◽

Android Malware Detection ◽

Learning Techniques ◽

Data Flows

MLDroid—framework for Android malware detection using machine learning techniques

Neural Computing and Applications ◽

10.1007/s00521-020-05309-4 ◽

2020 ◽

Author(s):

Arvind Mahindru ◽

A. L. Sangal

Keyword(s):

Machine Learning ◽

Malware Detection ◽

Machine Learning Techniques ◽

Android Malware ◽

Android Malware Detection ◽

Learning Techniques

A Static Feature Selection-based Android Malware Detection Using Machine Learning Techniques

2020 International Conference on Smart Electronics and Communication (ICOSEC) ◽

10.1109/icosec49089.2020.9215355 ◽

2020 ◽

Author(s):

Aviral Sangal ◽

Harsh Kumar Verma

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Malware Detection ◽

Machine Learning Techniques ◽

Android Malware ◽

Android Malware Detection ◽

Learning Techniques ◽

Static Feature

Graph Approach for android malware detection using machine learning techniques

Humanitarian and Natural Sciences Journal ◽

10.53796/hnsj21115 ◽

2021 ◽

Vol 2 (11) ◽

Keyword(s):

Machine Learning ◽

Malware Detection ◽

Machine Learning Techniques ◽

Android Malware ◽

Android Malware Detection ◽

Learning Techniques

Android malware detection based on image-based features and machine learning techniques

SN Applied Sciences ◽

10.1007/s42452-020-3132-2 ◽

2020 ◽

Vol 2 (7) ◽

Author(s):

Halil Murat Ünver ◽

Khaled Bakour

Keyword(s):

Machine Learning ◽

Malware Detection ◽

Machine Learning Techniques ◽

Android Malware ◽

Android Malware Detection ◽

Learning Techniques

Android Malware Detection through Machine Learning Techniques: A Review

International Journal of Online and Biomedical Engineering (iJOE) ◽

10.3991/ijoe.v16i02.11549 ◽

2020 ◽

Vol 16 (02) ◽

pp. 14

Author(s):

Abikoye Oluwakemi Christiana ◽

Benjamin Aruwa Gyunka ◽

Akande Noah

Keyword(s):

Machine Learning ◽

Malware Detection ◽

Social Engineering ◽

Machine Learning Techniques ◽

Android Malware ◽

Detection Systems ◽

Android Malware Detection ◽

Android Os ◽

Learning Techniques ◽

The Individual

The open-source nature of the Android operating system has attracted wide adoption by many types of developers. This has in turn fostered an exponential proliferation of devices running Android across different sectors of the economy. Although this development has brought great technological advances and eased business (e-commerce) and social interaction, these devices have also become a strong medium for the uncontrolled rise of cyberattacks and espionage against business infrastructure and individual users. Different cyberattack techniques exist, but attacks through malicious applications have taken the lead over other methods such as social engineering. Android malware has evolved in sophistication and intelligence to the point that it has become highly resistant to existing detection systems, especially signature-based ones. Machine learning techniques have emerged as a more capable choice for combating the sophistication and novelty of emerging Android malware. Models created via machine learning first learn the existing patterns of malware behaviour and then use this knowledge to identify similar behaviour in unknown attacks. This paper provides a comprehensive review of machine learning techniques and their applications in Android malware detection as found in contemporary literature.

Dynamic Permissions based Android Malware Detection using Machine Learning Techniques

Proceedings of the 10th Innovations in Software Engineering Conference on - ISEC '17 ◽

10.1145/3021460.3021485 ◽

2017 ◽

Cited By ~ 15

Author(s):

Arvind Mahindru ◽

Paramvir Singh

Keyword(s):

Machine Learning ◽

Malware Detection ◽

Machine Learning Techniques ◽

Android Malware ◽

Android Malware Detection ◽

Learning Techniques

Evaluation of Advanced Ensemble Learning Techniques for Android Malware Detection

Vietnam Journal of Computer Science ◽

10.1142/s2196888820500086 ◽

2020 ◽

Vol 07 (02) ◽

pp. 145-159 ◽

Cited By ~ 1

Author(s):

Md. Shohel Rana ◽

Andrew H. Sung

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Ensemble Learning ◽

Malware Detection ◽

Learning Systems ◽

Application Framework ◽

Android Malware ◽

Security Controls ◽

Android Malware Detection ◽

Learning Techniques

Android is the most popular mobile operating system, with billions of active users worldwide, which has attracted advertisers, hackers, and cybercriminals who develop malware for various purposes. In recent years, wide-ranging research has been conducted on malware analysis and detection for Android devices, while Android has also implemented various security controls to deal with malware, including a user ID (UID) for each application and system permissions. In this paper, we optimize and evaluate different types of machine learning (ML) by applying ensemble-based learning techniques for detecting Android malware in conjunction with a substring-based feature selection (SBFS) method for the classifiers. In this study, we extend our previous work, in which ensemble-based learning techniques were observed to obtain better results than previously reported results on the DREBIN dataset, and thereby provide a strong basis for building effective tools for Android malware detection.
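To make the idea of ensemble-based learning concrete, here is a minimal sketch of one common ensemble scheme, hard majority voting. It is not the authors' method: the abstract does not say which ensemble techniques or base classifiers were used, and the three rule-based classifiers below (`flags_sms`, `flags_boot`, `flags_many_features`) are hypothetical stand-ins for trained models.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Return the label predicted by the largest number of base classifiers."""
    votes = [classify(sample) for classify in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical base classifiers; each takes a set of feature strings.
def flags_sms(sample):
    return "malware" if "SEND_SMS" in sample else "benign"


def flags_boot(sample):
    return "malware" if "RECEIVE_BOOT_COMPLETED" in sample else "benign"


def flags_many_features(sample):
    return "malware" if len(sample) > 3 else "benign"


ensemble = [flags_sms, flags_boot, flags_many_features]

# Two of the three base classifiers vote "malware" for this sample.
suspicious = {"SEND_SMS", "RECEIVE_BOOT_COMPLETED"}
# All three base classifiers vote "benign" for this sample.
harmless = {"INTERNET"}
```

An odd number of base classifiers avoids ties in the two-class case. In practice each rule would be replaced by a model trained on the selected features.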

A Comprehensive Survey on Machine Learning Techniques for Android Malware Detection

Information ◽

10.3390/info12050185 ◽

2021 ◽

Vol 12 (5) ◽

pp. 185

Author(s):

Vasileios Kouliaridis ◽

Georgios Kambourakis

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Malware Detection ◽

Machine Learning Techniques ◽

Android Malware ◽

Detection Techniques ◽

Android Malware Detection ◽

Mobile Malware ◽

Comprehensive Survey ◽

Mobile Malware Detection

Year after year, mobile malware attacks grow in both sophistication and diffusion. As the open-source Android platform continues to dominate the market, malware writers consider it their preferred target. Almost without exception, state-of-the-art mobile malware detection solutions in the literature rely on machine learning to detect malware. Nevertheless, our findings clearly indicate that the majority of existing works use different metrics and models and employ diverse datasets and classification features stemming from disparate analysis techniques, i.e., static, dynamic, or hybrid. This complicates the cross-comparison of the various proposed detection schemes and may also raise doubts about the derived results. To address this problem, this work surveys the ML-powered malware detection approaches and techniques of the last seven years and organizes them along four axes: the age of the selected dataset, the analysis type used, the employed ML techniques, and the chosen performance metrics. Moreover, based on these axes, we introduce a converging scheme that can guide future Android malware detection techniques and provide a solid baseline for machine learning practices in this field.
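Since differing performance metrics are one reason detection schemes are hard to compare, it can help to see how several widely used metrics derive from the same confusion-matrix counts. The sketch below is illustrative only: the abstract does not list which metrics the survey covers, and the function name `detection_metrics` and the example counts are assumptions.

```python
def detection_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion counts.

    tp: malware correctly flagged      fp: benign apps wrongly flagged
    fn: malware missed                 tn: benign apps correctly passed
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for an imbalanced test set (10 malware, 90 benign).
example = detection_metrics(tp=8, fp=2, fn=2, tn=88)
```

With these counts, accuracy is 0.96 while precision, recall, and F1 are each 0.8, which shows how two papers reporting different metrics on imbalanced data can appear to disagree about the same detector.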

On the Feasibility of Adversarial Sample Creation Using the Android System API

Information ◽

10.3390/info11090433 ◽

2020 ◽

Vol 11 (9) ◽

pp. 433

Author(s):

Fabrizio Cara ◽

Michele Scalas ◽

Giorgio Giacinto ◽

Davide Maiorca

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Random Noise ◽

Malware Detection ◽

Android Malware ◽

Static And Dynamic Analysis ◽

Detection Systems ◽

Android Malware Detection ◽

Expert Analysis ◽

The Impact

Due to its popularity, the Android operating system is a critical target for malware attacks. Multiple security efforts have been made on the design of malware detection systems to identify potentially harmful applications. In this sense, machine learning-based systems, leveraging both static and dynamic analysis, have been increasingly adopted to discriminate between legitimate and malicious samples due to their capability of identifying novel variants of malware samples. At the same time, attackers have been developing several techniques to evade such systems, such as the generation of evasive apps, i.e., carefully perturbed samples that the classifiers label as legitimate. Previous work has shown the vulnerability of detection systems to evasion attacks, including those designed for Android malware detection. However, most works neglected to bring the evasive attacks onto the so-called problem space, i.e., by generating concrete Android adversarial samples, which requires preserving the app’s semantics and being realistic for human expert analysis. In this work, we aim to understand the feasibility of generating adversarial samples specifically through the injection of system API calls, which are typical discriminating characteristics for malware detectors. We perform our analysis on a state-of-the-art ransomware detector that employs the occurrence of system API calls as features of its machine learning algorithm. In particular, we discuss the constraints that are necessary to generate real samples, and we use techniques inherited from interpretability to assess the impact of specific API calls on evasion. We assess the vulnerability of such a detector against mimicry and random noise attacks. Finally, we propose a basic implementation to generate concrete and working adversarial samples. The attained results suggest that injecting system API calls could be a viable strategy for attackers to generate concrete adversarial samples. However, we point out the low suitability of mimicry attacks and the necessity to build more sophisticated evasion attacks.
