Machine Learning for Malware Detection: Beyond Accuracy Rates

Today's world is supported by connected electronic systems, so ensuring their secure operation is essential to our daily lives. A major threat to system security is malware infection, which causes financial and reputational losses to corporate and end users, thus motivating the development of malware detectors. In this scenario, Machine Learning (ML) has been demonstrated to be a powerful technique for developing classifiers able to distinguish malware from goodware samples. However, much ML research on malware detection focuses only on the final detection accuracy rate and overlooks other important aspects of a classifier's implementation and evaluation, such as feature extraction and parameter selection. In this paper, we shed light on these aspects to highlight the challenges and drawbacks of developing ML-based malware classifiers. We trained 25 distinct classification models and applied them to 2,800 real x86 Linux ELF malware binaries. Our results show that: (i) dynamic features outperform static features when the same classifiers are considered; (ii) discrete-bounded features present smaller accuracy variance over time than continuous features, at the cost of some time-localized accuracy loss; (iii) datasets presenting distinct characteristics (e.g., temporal changes) impose generalization challenges on ML models; and (iv) feature analysis can be used as feedback information for malware detection and infection prevention. We expect that our work will help other researchers when developing their ML-based malware classification solutions.
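One way to expose the temporal generalization issue mentioned in point (iii) is to train on older samples and test on newer ones instead of shuffling the dataset. The following is a minimal sketch of such a split; the sample records, the `year` field, and the function name `temporal_split` are assumptions made for illustration and do not come from the paper.

```python
def temporal_split(samples, cutoff_year):
    """Split samples so that training data strictly predates test data."""
    train = [s for s in samples if s["year"] < cutoff_year]
    test = [s for s in samples if s["year"] >= cutoff_year]
    return train, test


# Hypothetical records, made up for this example.
samples = [
    {"year": 2014, "label": "malware"},
    {"year": 2015, "label": "goodware"},
    {"year": 2016, "label": "malware"},
    {"year": 2017, "label": "goodware"},
]

train, test = temporal_split(samples, cutoff_year=2016)
print(len(train), "training samples,", len(test), "test samples")
```

A classifier evaluated this way never sees samples from the test period during training, which is closer to how a deployed detector meets new malware.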


Dynamic detection of mobile malware using smartphone data and machine learning

Digital Threats: Research and Practice ◽

10.1145/3484246 ◽

2021 ◽

Author(s):

Sebastian Panman de Wit ◽

Doina Bucur ◽

Jeroen van der Ham

Keyword(s):

Machine Learning ◽

Malware Detection ◽

Real Life ◽

Detection Methods ◽

Nearest Neighbour ◽

Dynamic Features ◽

Machine Learning Classifiers ◽

CPU Usage ◽

Mobile Malware ◽

Mobile Malware Detection

Mobile malware is malicious software that targets mobile devices. It is an increasing problem, as seen in the rise of detected mobile malware samples per year. The number of active smartphone users is expected to grow, stressing the importance of research on the detection of mobile malware. Detection methods for mobile malware exist but are still limited. In this paper, we propose dynamic malware-detection methods that use device information such as CPU usage, battery usage, and memory usage to detect 10 subtypes of Mobile Trojans on the Android Operating System (OS). We use a real-life sensor dataset containing device and malware data from 47 users over one year (2016) to create multiple mobile malware detection methods. We examine which features, i.e., aspects of a device, are most important to monitor in order to detect (subtypes of) Mobile Trojans. The focus of this paper is on dynamic hardware features. Using these dynamic features, we apply the following machine learning classifiers: Random Forest, K-Nearest Neighbour, and AdaBoost.
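As a rough illustration of how a classifier such as K-Nearest Neighbour could operate on dynamic device features, the sketch below labels a new reading by majority vote among its nearest training readings. The feature values, the labels, and the function name `knn_predict` are invented for this example and are not taken from the paper's dataset; features are also left unscaled for brevity, which a real pipeline would likely not do.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Each training item is (features, label); features are
    # (CPU usage %, battery drain % per hour, memory usage %).
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical readings, made up for this example.
train = [
    ((12.0, 3.0, 35.0), "benign"),
    ((15.0, 4.0, 40.0), "benign"),
    ((10.0, 2.5, 30.0), "benign"),
    ((70.0, 15.0, 80.0), "trojan"),
    ((65.0, 12.0, 75.0), "trojan"),
    ((80.0, 18.0, 85.0), "trojan"),
]

print(knn_predict(train, (68.0, 14.0, 78.0)))
print(knn_predict(train, (11.0, 3.0, 33.0)))
```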


Towards Deep Learning-Based Approach for Detecting Android Malware

Research Anthology on Artificial Intelligence Applications in Security ◽

10.4018/978-1-7998-7705-9.ch096 ◽

2021 ◽

pp. 2193-2219

Author(s):

Jarrett Booz ◽

Josh McGiff ◽

William G. Hatcher ◽

Wei Yu ◽

James Nguyen ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Learning Environment ◽

Malware Detection ◽

Extensive Study ◽

Detection Accuracy ◽

Android Malware ◽

Android Malware Detection ◽

Mobile Malware Detection ◽

Optimal Settings

In this article, the authors implement a deep learning environment and fine-tune parameters to determine the optimal settings for the classification of Android malware from extracted permission data. By determining the optimal settings, the authors demonstrate the potential performance of a deep learning environment for Android malware detection. Specifically, an extensive study is conducted on various hyper-parameters to determine optimal configurations, and then a performance evaluation is carried out on those configurations to compare and maximize detection accuracy in our target networks. The results achieve a detection accuracy of approximately 95%, with an approximate F1 score of 93%. In addition, the evaluation is extended to include other machine learning frameworks, specifically comparing Microsoft Cognitive Toolkit (CNTK) and Theano with TensorFlow. The future needs are discussed in the realm of machine learning for mobile malware detection, including adversarial training, scalability, and the evaluation of additional data and features.


cHybriDroid: A Machine Learning-Based Hybrid Technique for Securing the Edge Computing

Security and Communication Networks ◽

10.1155/2020/8861639 ◽

2020 ◽

Vol 2020 ◽

pp. 1-14

Author(s):

Afifa Maryam ◽

Usman Ahmed ◽

Muhammad Aleem ◽

Jerry Chun-Wei Lin ◽

Muhammad Arshad Islam ◽

...

Keyword(s):

Machine Learning ◽

Hybrid Approach ◽

Malware Detection ◽

Edge Computing ◽

Cloud Services ◽

Hybrid Technique ◽

Dynamic Features ◽

Hybrid Features ◽

Android Malware ◽

Detection Techniques

Smartphones are an integral component of the mobile edge computing (MEC) framework. Securing the data stored on mobile devices is crucial for ensuring the smooth operation of cloud services. The growing number of malicious Android applications demands in-depth investigation of their malicious intent in order to design effective malware detection techniques. The contemporary state of the art suggests that hybrid features combined with machine learning (ML) techniques could play a significant role in Android malware detection. The selection of an application's features is crucial for capturing the behavioural patterns of malware instances that enable a useful classification of mobile applications. In this study, we propose a novel hybrid approach to detecting Android malware, wherein static features are employed in conjunction with dynamic features of smartphone applications. We collect these hybrid features from permissions, intents, and run-time behaviour (such as information leakage, exploitation of cryptography, and network manipulation) to analyse the effectiveness of the employed techniques for malware detection. We conduct experiments using over 5,000 real-world applications. The outcomes of the study reveal that the proposed set of features detects malware threats with a 97% F-measure.
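To make the idea of hybrid features more concrete, the sketch below concatenates binary indicators for static features (permissions and intents) with binary indicators for run-time behaviours into a single vector that a classifier could consume. The vocabularies, the behaviour labels, and the function name `hybrid_vector` are assumptions chosen for illustration; the paper's actual feature set and encoding may differ.

```python
# Hypothetical feature vocabularies; a real system would derive these from its dataset.
STATIC_VOCAB = [
    "android.permission.SEND_SMS",
    "android.permission.INTERNET",
    "android.intent.action.BOOT_COMPLETED",
]
DYNAMIC_VOCAB = ["information_leakage", "crypto_misuse", "network_manipulation"]


def hybrid_vector(static_observed, dynamic_observed):
    """Concatenate binary static and dynamic indicators into one feature vector."""
    static_part = [int(name in static_observed) for name in STATIC_VOCAB]
    dynamic_part = [int(name in dynamic_observed) for name in DYNAMIC_VOCAB]
    return static_part + dynamic_part


# Features observed for one made-up application.
app_static = {"android.permission.SEND_SMS", "android.permission.INTERNET"}
app_dynamic = {"information_leakage"}

vector = hybrid_vector(app_static, app_dynamic)
print(vector)
```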


Towards Deep Learning-Based Approach for Detecting Android Malware

International Journal of Software Innovation ◽

10.4018/ijsi.2019100101 ◽

2019 ◽

Vol 7 (4) ◽

pp. 1-24 ◽

Cited By ~ 1

Author(s):

Jarrett Booz ◽

Josh McGiff ◽

William G. Hatcher ◽

Wei Yu ◽

James Nguyen ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Learning Environment ◽

Malware Detection ◽

Extensive Study ◽

Detection Accuracy ◽

Android Malware ◽

Android Malware Detection ◽

Mobile Malware Detection ◽

Optimal Settings


Mlifdect: Android Malware Detection Based on Parallel Machine Learning and Information Fusion

Security and Communication Networks ◽

10.1155/2017/6451260 ◽

2017 ◽

Vol 2017 ◽

pp. 1-14 ◽

Cited By ~ 8

Author(s):

Xin Wang ◽

Dafang Zhang ◽

Xin Su ◽

Wenjia Li

Keyword(s):

Machine Learning ◽

Information Fusion ◽

Malware Detection ◽

Parallel Machine ◽

Detection Methods ◽

Detection Accuracy ◽

Android Malware ◽

Detection Model ◽

Android Apps ◽

Android Malware Detection

In recent years, Android malware has continued to grow at an alarming rate. More recent malicious apps employ highly sophisticated detection-avoidance techniques, which makes traditional machine learning based malware detection methods far less effective. More specifically, these methods cannot cope with the various types of Android malware and are limited in detection because they rely on a single classification algorithm. To address this limitation, we propose a novel approach, named Mlifdect, that leverages parallel machine learning and information fusion techniques for better Android malware detection. To implement this approach, we first extract eight types of features from static analysis of Android apps and build two kinds of feature sets after feature selection. Then, a parallel machine learning detection model is developed to speed up the classification process. Finally, we investigate information fusion approaches based on probability analysis and on Dempster-Shafer theory, which can effectively obtain the detection results. To validate our method, other state-of-the-art detection works are selected for comparison on real-world Android apps. The experimental results demonstrate that Mlifdect achieves higher detection accuracy as well as remarkable run-time efficiency compared to existing malware detection solutions.
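The Dempster-Shafer fusion step can be illustrated with Dempster's rule of combination, which merges the belief masses of two sources and renormalizes by the amount of conflict between them. The sketch below applies the standard rule to two hypothetical classifiers over the frame {malware, benign}; the mass values and names such as `combine` are assumptions for this example, and the way Mlifdect derives its masses from classifier outputs is not shown here.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions keyed by frozenset."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            # Mass assigned to contradictory hypotheses counts as conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


MALWARE = frozenset({"malware"})
BENIGN = frozenset({"benign"})
EITHER = MALWARE | BENIGN  # mass left on the whole frame expresses uncertainty

# Hypothetical belief masses from two classifiers for the same app.
clf_a = {MALWARE: 0.7, BENIGN: 0.1, EITHER: 0.2}
clf_b = {MALWARE: 0.6, BENIGN: 0.2, EITHER: 0.2}

fused = combine(clf_a, clf_b)
print(round(fused[MALWARE], 2), round(fused[BENIGN], 2), round(fused[EITHER], 2))
```

With these inputs the conflict is 0.20, and the fused mass on "malware" rises to 0.85, higher than either classifier reported alone, because the two sources largely agree.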


Runtime Detection Framework for Android Malware

Mobile Information Systems ◽

10.1155/2018/8094314 ◽

2018 ◽

Vol 2018 ◽

pp. 1-15 ◽

Cited By ~ 1

Author(s):

TaeGuen Kim ◽

BooJoong Kang ◽

Eul Gyu Im

Keyword(s):

Dynamic Analysis ◽

Static Analysis ◽

Suffix Tree ◽

Malware Detection ◽

Application Programming Interface ◽

Detection Methods ◽

Detection Accuracy ◽

Dynamic Features ◽

Android Malware ◽

Android Malware Detection

As the amount of Android malware has increased rapidly over the years, various malware detection methods have been proposed. Existing methods can be classified into two categories: static analysis-based methods and dynamic analysis-based methods. Both approaches have limitations. Static analysis-based methods are relatively easy to evade through transformation techniques such as junk instruction insertion and code reordering. Dynamic analysis-based methods, in turn, incur relatively high analysis overheads and might require kernel modification to extract dynamic features. In this paper, we propose a dynamic analysis framework for Android malware detection that overcomes the aforementioned shortcomings. To reduce the malware detection overhead, the framework uses a suffix tree that contains API (Application Programming Interface) subtraces and their probabilistic confidence values, which are generated using HMMs (Hidden Markov Models). We designed the framework with a client-server architecture, since deploying the suffix tree on mobile devices is infeasible. In addition, an application rewriting technique is used to trace API invocations without any modifications to the Android kernel. In our experiments, we measured the detection accuracy and the computational overheads to evaluate the effectiveness and efficiency of the proposed framework.
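The lookup idea behind storing API subtraces with confidence values can be sketched with a much simpler structure. The example below uses a plain trie as a stand-in for the paper's suffix tree, and the confidence values are made up by hand instead of being produced by HMMs; the class name `TraceTrie`, its methods, and the API call sequences are all assumptions for illustration.

```python
class TraceTrie:
    """Trie of API-call subtraces; a node carries a confidence where a subtrace ends."""

    def __init__(self):
        self.children = {}
        self.confidence = None  # set only where a stored subtrace ends

    def insert(self, subtrace, confidence):
        node = self
        for call in subtrace:
            node = node.children.setdefault(call, TraceTrie())
        node.confidence = confidence

    def best_match(self, trace):
        """Return the highest confidence of any stored subtrace occurring in `trace`."""
        best = 0.0
        for start in range(len(trace)):
            node = self
            for call in trace[start:]:
                node = node.children.get(call)
                if node is None:
                    break
                if node.confidence is not None:
                    best = max(best, node.confidence)
        return best


# Hypothetical subtraces and confidence values, made up for this example.
trie = TraceTrie()
trie.insert(["getDeviceId", "openConnection", "write"], 0.9)
trie.insert(["sendTextMessage"], 0.6)

suspicious = ["onCreate", "getDeviceId", "openConnection", "write", "close"]
harmless = ["onCreate", "setContentView"]
print(trie.best_match(suspicious), trie.best_match(harmless))
```

A server could compare the returned confidence against a threshold to flag a trace, which keeps the per-trace work proportional to the trace length times the longest stored subtrace.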


A Survey of Attacks Against Twitter Spam Detectors in an Adversarial Environment

Robotics ◽

10.3390/robotics8030050 ◽

2019 ◽

Vol 8 (3) ◽

pp. 50 ◽

Cited By ~ 2

Author(s):

Niddal H. Imam ◽

Vassilios G. Vassilakis

Keyword(s):

Machine Learning ◽

Social Networks ◽

Online Social Networks ◽

Defense Mechanism ◽

Malware Detection ◽

Test Phase ◽

Design Stage ◽

Daily Lives ◽

Prediction Test ◽

New Type

Online Social Networks (OSNs), such as Facebook and Twitter, have become a very important part of many people’s daily lives. Unfortunately, the high popularity of these platforms makes them very attractive to spammers. Machine learning (ML) techniques have been widely used as a tool to address many cybersecurity application problems (such as spam and malware detection). However, most of the proposed approaches do not consider the presence of adversaries that target the defense mechanism itself. Adversaries can launch sophisticated attacks to undermine deployed spam detectors either during training or the prediction (test) phase. Not considering these adversarial activities at the design stage makes OSNs’ spam detectors vulnerable to a range of adversarial attacks. Thus, this paper surveys the attacks against Twitter spam detectors in an adversarial environment, and a general taxonomy of potential adversarial attacks is presented using common frameworks from the literature. Examples of adversarial activities on Twitter that were discovered after observing Arabic trending hashtags are discussed in detail. A new type of spam tweet (adversarial spam tweet), which can be used to undermine a deployed classifier, is examined. In addition, possible countermeasures that could increase the robustness of Twitter spam detectors to such attacks are investigated.


Comparison of malware detection techniques using machine learning algorithm

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v16.i1.pp435-440 ◽

2019 ◽

Vol 16 (1) ◽

pp. 435 ◽

Cited By ~ 1

Author(s):

Nur Syuhada Selamat ◽

Fakariah Hani Mohd Ali

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

False Positive Rate ◽

Malware Detection ◽

Virus Detection ◽

Detection Accuracy ◽

Decision Tree Algorithm ◽

Security Threat ◽

Detection Techniques ◽

Positive Rate

The volume of malware currently grows faster each year and poses a serious global security threat. The amount of malware developed increased at an alarming rate in the 1990s as computers became interconnected. This growth led to many protections being built to fight malware. Unfortunately, current technology is no longer effective against more advanced malware, because malware authors have made their creations more difficult for anti-virus software to detect. In current research, Machine Learning (ML) techniques have become more popular among researchers for malware detection. In this paper, the researchers propose a defense system that compares three ML algorithms and selects among them based on malware detection accuracy. The results indicate that the Decision Tree algorithm achieves the best detection accuracy of the compared classifiers, with 99% accuracy and a 0.021% False Positive Rate (FPR) on a relatively small dataset.
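The two metrics reported above follow directly from a confusion matrix: accuracy is the share of correct predictions, and the False Positive Rate is the share of benign samples wrongly flagged as malware. The sketch below computes both for a tiny hand-made label set; the labels and the function name `accuracy_and_fpr` are assumptions for illustration and are unrelated to the paper's data.

```python
def accuracy_and_fpr(y_true, y_pred):
    """Return (accuracy, false positive rate) for binary labels, 1 = malware."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    accuracy = (tp + tn) / len(pairs)
    # FPR is undefined without benign samples; report 0.0 in that case.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Hypothetical ground truth and predictions, made up for this example.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

accuracy, fpr = accuracy_and_fpr(y_true, y_pred)
print(accuracy, fpr)
```

Reporting the two values together matters because a detector can reach high accuracy on an imbalanced dataset while still raising an unacceptable number of false alarms.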


A Survey of Attacks Against Twitter Spam Detectors in an Adversarial Environment

10.20944/preprints201905.0141.v1 ◽

2019 ◽

Author(s):

Niddal Imam

Keyword(s):

Machine Learning ◽

Social Networks ◽

Online Social Networks ◽

Defense Mechanism ◽

Malware Detection ◽

Test Phase ◽

Design Stage ◽

Daily Lives ◽

Prediction Test ◽

New Type

Online Social Networks (OSNs), such as Facebook and Twitter, have become a very important part of many people’s daily lives. Unfortunately, the high popularity of these platforms makes them very attractive to spammers. Machine learning (ML) techniques have been widely used as a tool to address many cybersecurity application problems (such as spam and malware detection). However, most of the proposed approaches do not consider the presence of adversaries that target the defense mechanism itself. Adversaries can launch sophisticated attacks to undermine deployed spam detectors during either the training or the prediction (test) phase. Not considering these adversarial activities at the design stage makes OSNs’ spam detectors prone to a range of adversarial attacks. This paper thus surveys the attacks against Twitter spam detectors in an adversarial environment. In addition, a general taxonomy of potential adversarial attacks is proposed by applying common frameworks from the literature. Examples of adversarial activities on Twitter are provided, based on observations of Arabic trending hashtags. A new type of spam tweet (adversarial spam tweet), which can be used to undermine a deployed classifier, was found. In addition, possible countermeasures that could increase the robustness of Twitter spam detectors against such attacks are investigated.


Automatically Detecting Excavator Anomalies Based on Machine Learning

Symmetry ◽

10.3390/sym11080957 ◽

2019 ◽

Vol 11 (8) ◽

pp. 957 ◽

Cited By ~ 1

Author(s):

Qingqing Zhou ◽

Guo Chen ◽

Wenjun Jiang ◽

Kenli Li ◽

Keqin Li

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Expert Systems ◽

Statistical Models ◽

Working Condition ◽

Research Work ◽

Support Vector ◽

Detection Accuracy ◽

Learning Methods ◽

Machine Learning Methods

Excavators are among the most frequently used pieces of equipment in large-scale construction projects. They are closely related to the construction speed and total cost of the entire project. Therefore, it is very important to effectively monitor their operating status and detect abnormal conditions. Previous research mainly relied on expert systems and traditional statistical models to detect excavator anomalies. However, these methods are not particularly suitable for modern sophisticated excavators. In this paper, we take the first step and explore the use of machine learning methods to automatically detect excavator anomalies by mining their working-condition data collected from multiple sensors. The excavators we studied are from Sany Group, the largest construction machinery manufacturer in China. We have collected 40 days of working-condition data from 107 Sany excavators. In addition, we worked with six excavator operators and engineers for more than a month to clean the original data and mark the anomalous samples. Using the processed data, we have designed three anomaly detection schemes based on machine learning methods: support vector machine (SVM), back propagation (BP) neural network, and decision tree algorithms, respectively. We have carried out a comprehensive evaluation on the real excavator data. The results show that the anomaly detection accuracy is as high as 99.88%, which is clearly superior to the previous methods based on expert systems and traditional statistical models.
