scholarly journals Applied machine learning in recognition of DGA domain names

Author(s):  
Miroslav Stampar ◽  
Kresimir Fertalj

Recognition of domain names generated by domain generation algorithms (DGAs) is the essential part of malware detection by inspection of network traffic. Besides basic heuristics (HE) and limited detection based on blacklists, the most promising course seems to be machine learning (ML). There is a lack of studies that extensively compare different ML models in the field of DGA binary classification, including both conventional and deep learning (DL) representatives. Also, those few that exist are either focused on a small set of models, use a poor set of features in ML models or fail to secure unbiased independence between training and evaluation samples. To overcome these limitations, we engineered a robust feature set, and accordingly trained and evaluated 14 ML, 9 DL, and 2 comparative models on two independent datasets. Results show that if ML features are properly engineered, there is a marginal difference in overall score between top ML and DL representatives. This paper represents the first attempt to neutrally compare the performance of many different models for the recognition of DGA domain names, where the best models perform as well as the top representatives from the literature.

2020 ◽  
Vol 14 ◽  
Author(s):  
Meghna Dhalaria ◽  
Ekta Gandotra

Purpose: This paper provides the basics of Android malware, its evolution and tools and techniques for malware analysis. Its main aim is to present a review of the literature on Android malware detection using machine learning and deep learning and identify the research gaps. It provides the insights obtained through literature and future research directions which could help researchers to come up with robust and accurate techniques for classification of Android malware. Design/Methodology/Approach: This paper provides a review of the basics of Android malware, its evolution timeline and detection techniques. It includes the tools and techniques for analyzing the Android malware statically and dynamically for extracting features and finally classifying these using machine learning and deep learning algorithms. Findings: The number of Android users is expanding very fast due to the popularity of Android devices. As a result, there are more risks to Android users due to the exponential growth of Android malware. On-going research aims to overcome the constraints of earlier approaches for malware detection. As the evolving malware are complex and sophisticated, earlier approaches like signature based and machine learning based are not able to identify these timely and accurately. The findings from the review shows various limitations of earlier techniques i.e. requires more detection time, high false positive and false negative rate, low accuracy in detecting sophisticated malware and less flexible. Originality/value: This paper provides a systematic and comprehensive review on the tools and techniques being employed for analysis, classification and identification of Android malicious applications. It includes the timeline of Android malware evolution, tools and techniques for analyzing these statically and dynamically for the purpose of extracting features and finally using these features for their detection and classification using machine learning and deep learning algorithms. On the basis of the detailed literature review, various research gaps are listed. The paper also provides future research directions and insights which could help researchers to come up with innovative and robust techniques for detecting and classifying the Android malware.


Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1550
Author(s):  
Alexandros Liapis ◽  
Evanthia Faliagka ◽  
Christos P. Antonopoulos ◽  
Georgios Keramidas ◽  
Nikolaos Voros

Physiological measurements have been widely used by researchers and practitioners in order to address the stress detection challenge. So far, various datasets for stress detection have been recorded and are available to the research community for testing and benchmarking. The majority of the stress-related available datasets have been recorded while users were exposed to intense stressors, such as songs, movie clips, major hardware/software failures, image datasets, and gaming scenarios. However, it remains an open research question if such datasets can be used for creating models that will effectively detect stress in different contexts. This paper investigates the performance of the publicly available physiological dataset named WESAD (wearable stress and affect detection) in the context of user experience (UX) evaluation. More specifically, electrodermal activity (EDA) and skin temperature (ST) signals from WESAD were used in order to train three traditional machine learning classifiers and a simple feed forward deep learning artificial neural network combining continues variables and entity embeddings. Regarding the binary classification problem (stress vs. no stress), high accuracy (up to 97.4%), for both training approaches (deep-learning, machine learning), was achieved. Regarding the stress detection effectiveness of the created models in another context, such as user experience (UX) evaluation, the results were quite impressive. More specifically, the deep-learning model achieved a rather high agreement when a user-annotated dataset was used for validation.


2021 ◽  
Vol 2 (2) ◽  
Author(s):  
Kate Highnam ◽  
Domenic Puzio ◽  
Song Luo ◽  
Nicholas R. Jennings

AbstractBotnets and malware continue to avoid detection by static rule engines when using domain generation algorithms (DGAs) for callouts to unique, dynamically generated web addresses. Common DGA detection techniques fail to reliably detect DGA variants that combine random dictionary words to create domain names that closely mirror legitimate domains. To combat this, we created a novel hybrid neural network, Bilbo the “bagging” model, that analyses domains and scores the likelihood they are generated by such algorithms and therefore are potentially malicious. Bilbo is the first parallel usage of a convolutional neural network (CNN) and a long short-term memory (LSTM) network for DGA detection. Our unique architecture is found to be the most consistent in performance in terms of AUC, $$F_1$$ F 1 score, and accuracy when generalising across different dictionary DGA classification tasks compared to current state-of-the-art deep learning architectures. We validate using reverse-engineered dictionary DGA domains and detail our real-time implementation strategy for scoring real-world network logs within a large enterprise. In 4 h of actual network traffic, the model discovered at least five potential command-and-control networks that commercial vendor tools did not flag.


Author(s):  
Jarrett Booz ◽  
Josh McGiff ◽  
William G. Hatcher ◽  
Wei Yu ◽  
James Nguyen ◽  
...  

In this article, the authors implement a deep learning environment and fine-tune parameters to determine the optimal settings for the classification of Android malware from extracted permission data. By determining the optimal settings, the authors demonstrate the potential performance of a deep learning environment for Android malware detection. Specifically, an extensive study is conducted on various hyper-parameters to determine optimal configurations, and then a performance evaluation is carried out on those configurations to compare and maximize detection accuracy in our target networks. The results achieve a detection accuracy of approximately 95%, with an approximate F1 score of 93%. In addition, the evaluation is extended to include other machine learning frameworks, specifically comparing Microsoft Cognitive Toolkit (CNTK) and Theano with TensorFlow. The future needs are discussed in the realm of machine learning for mobile malware detection, including adversarial training, scalability, and the evaluation of additional data and features.


2020 ◽  
Vol 39 (3) ◽  
pp. 4785-4801
Author(s):  
Cho Do Xuan ◽  
Mai Hoang Dao ◽  
Hoa Dinh Nguyen

Advanced Persistent Threat (APT) attacks are a form of malicious, intentionally and clearly targeted attack. This attack technique is growing in both the number of recorded attacks and the extent of its dangers to organizations, businesses and governments. Therefore, the task of detecting and warning APT attacks in the real system is very necessary today. One of the most effective approaches to APT attack detection is to apply machine learning or deep learning to analyze network traffic. There have been a number of studies and recommendations to analyze network traffic into network flows and then combine with some classification or clustering methods to look for signs of APT attacks. In particular, recent studies often apply machine learning algorithms to spot the present of APT attacks based on network flow. In this paper, a new method based on deep learning to detect APT attacks using network flow is proposed. Accordingly, in our research, network traffic is analyzed into IP-based network flows, then the IP information is reconstructed from flow, and finally deep learning models are used to extract features for detecting APT attack IPs from other IPs. Additionally, a combined deep learning model using Bidirectional Long Short-Term Memory (BiLSTM) and Graph Convolutional Networks (GCN) is introduced. The new detection model is evaluated and compared with some traditional machine learning models, i.e. Multi-layer perceptron (MLP) and single GCN models, in the experiments. Experimental results show that BiLSTM-GCN model has the best performance in all evaluation scores. This not only shows that deep learning application on flow network analysis to detect APT attacks is a good decision but also suggests a new direction for network intrusion detection techniques based on deep learning.


2021 ◽  
Vol 11 (21) ◽  
pp. 10249
Author(s):  
Chien-Nguyen Nhu ◽  
Minho Park

Cloud computing is currently considered the most cost-effective platform for offering business and consumer IT services over the Internet. However, it is prone to new vulnerabilities. A new type of attack called an economic denial of sustainability (EDoS) attack exploits the pay-per-use model to scale up the resource usage over time to the extent that the cloud user has to pay for the unexpected usage charge. To prevent EDoS attacks, a few solutions have been proposed, including hard-threshold and machine learning-based solutions. Among them, long short-term memory (LSTM)-based solutions achieve much higher accuracy and false-alarm rates than hard-threshold and other machine learning-based solutions. However, LSTM requires a long sequence length of the input data, leading to a degraded performance owing to increases in the calculations, the detection time, and consuming a large number of computing resources of the defense system. We, therefore, propose a two-phase deep learning-based EDoS detection scheme that uses an LSTM model to detect each abnormal flow in network traffic; however, the LSTM model requires only a short sequence length of five of the input data. Thus, the proposed scheme can take advantage of the efficiency of the LSTM algorithm in detecting each abnormal flow in network traffic, while reducing the required sequence length of the input data. A comprehensive performance evaluation shows that our proposed scheme outperforms the existing solutions in terms of accuracy and resource consumption.


2019 ◽  
Vol 7 (4) ◽  
pp. 1-24 ◽  
Author(s):  
Jarrett Booz ◽  
Josh McGiff ◽  
William G. Hatcher ◽  
Wei Yu ◽  
James Nguyen ◽  
...  

In this article, the authors implement a deep learning environment and fine-tune parameters to determine the optimal settings for the classification of Android malware from extracted permission data. By determining the optimal settings, the authors demonstrate the potential performance of a deep learning environment for Android malware detection. Specifically, an extensive study is conducted on various hyper-parameters to determine optimal configurations, and then a performance evaluation is carried out on those configurations to compare and maximize detection accuracy in our target networks. The results achieve a detection accuracy of approximately 95%, with an approximate F1 score of 93%. In addition, the evaluation is extended to include other machine learning frameworks, specifically comparing Microsoft Cognitive Toolkit (CNTK) and Theano with TensorFlow. The future needs are discussed in the realm of machine learning for mobile malware detection, including adversarial training, scalability, and the evaluation of additional data and features.


Sign in / Sign up

Export Citation Format

Share Document