Heterogeneous Graph Matching Networks for Unknown Malware Detection

Information systems have long been a frequent target of malware attacks. Traditional signature-based malicious program detection algorithms can only detect known malware and are prone to evasion techniques such as binary obfuscation, while behavior-based approaches rely heavily on malware training samples and incur prohibitively high training costs. To address the limitations of existing techniques, we propose MatchGNet, a heterogeneous Graph Matching Network model that learns the graph representation and similarity metric simultaneously, based on invariant graph modeling of the program's execution behaviors. We conduct a systematic evaluation of our model and show that it is accurate in detecting malicious program behavior and can help detect malware attacks with fewer false positives. MatchGNet outperforms the state-of-the-art algorithms in malware detection by generating 50% fewer false positives while keeping zero false negatives.

Getting ahead of the Arms Race: Hothousing the Coevolution of VirusTotal with a Packer

Entropy ◽

10.3390/e23040395 ◽

2021 ◽

Vol 23 (4) ◽

pp. 395

Author(s):

Héctor D. Menéndez ◽

David Clark ◽

Earl T. Barr

Keyword(s):

Machine Learning ◽

Evolutionary Computation ◽

Malware Detection ◽

Cloud Service ◽

False Positives ◽

Arms Race ◽

System Calls ◽

A New Technique ◽

Coevolutionary Arms Race

Malware detection is in a coevolutionary arms race where the attackers and defenders are constantly seeking advantage. This arms race is asymmetric: detection is harder and more expensive than evasion. White hats must be conservative to avoid false positives when searching for malicious behaviour. We seek to redress this imbalance. Most of the time, black hats need only make incremental changes to evade detection. On occasion, white hats make a disruptive move and find a new technique that forces black hats to work harder. Examples include system calls, signatures and machine learning. We present a method, called Hothouse, that combines simulation and search to accelerate the white hat’s ability to counter the black hat’s incremental moves, thereby forcing black hats to perform disruptive moves more often. To realise Hothouse, we evolve EEE, an entropy-based polymorphic packer for Windows executables. Playing the role of a black hat, EEE uses evolutionary computation to disrupt the creation of malware signatures. We enter EEE into the detection arms race with VirusTotal, the most prominent cloud service for running anti-virus tools on software. During our 6-month study, we continually improved EEE in response to VirusTotal, eventually learning a packer that produces packed malware whose evasiveness goes from an initial 51.8% median to 19.6%. We report both how well VirusTotal learns to detect EEE-packed binaries and how well VirusTotal forgets in order to reduce false positives. VirusTotal’s tools learn and forget fast, in about 3 days. We also show where VirusTotal focuses its detection efforts, by analysing EEE’s variants.
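
The abstract describes EEE as entropy-based but does not say how entropy is used. As background only, the sketch below computes the Shannon entropy of a byte string, a quantity often used to characterise packed or encrypted data. The function name and the sample inputs are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H = sum over byte values of p * log2(1 / p), with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical inputs: every byte value once versus one repeated value.
uniform = bytes(range(256))
constant = b"\x00" * 256

print(shannon_entropy(uniform))   # maximal for byte data
print(shannon_entropy(constant))  # minimal
```

Values close to 8 bits per byte indicate near-uniform byte distributions, while repetitive data scores close to 0.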

Explaining Bad Forecasts in Global Time Series Models

Applied Sciences ◽

10.3390/app11199243 ◽

2021 ◽

Vol 11 (19) ◽

pp. 9243

Author(s):

Jože Rožanec ◽

Elena Trajkova ◽

Klemen Kenda ◽

Blaž Fortuna ◽

Dunja Mladenić

Keyword(s):

Time Series ◽

Model Performance ◽

Time Series Forecasting ◽

Modular Architecture ◽

Detection Algorithms ◽

Training Samples ◽

Forecasting Performance ◽

Global Time ◽

Real World Datasets ◽

Value Changes

While increasing empirical evidence suggests that global time series forecasting models can achieve better forecasting performance than local ones, there is a research void regarding when and why the global models fail to provide a good forecast. This paper uses anomaly detection algorithms and explainable artificial intelligence (XAI) to answer when and why a forecast should not be trusted. To address this issue, a dashboard was built to inform the user regarding (i) the relevance of the features for that particular forecast, (ii) which training samples most likely influenced the forecast outcome, (iii) why the forecast is considered an outlier, and (iv) a range of counterfactual examples that show how value changes in the feature vector can lead to a different outcome. Moreover, a modular architecture and a methodology were developed to iteratively remove noisy data instances from the training set, to enhance the overall global time series forecasting model performance. Finally, the proposed approach was validated on two publicly available real-world datasets to test its effectiveness.

Feature Reduction and Optimization of Malware Detection System Using Ant Colony Optimization and Rough Sets

International Journal of Information Security and Privacy ◽

10.4018/ijisp.2020070106 ◽

2020 ◽

Vol 14 (3) ◽

pp. 95-114

Author(s):

Ravi Kiran Varma Penmatsa ◽

Akhila Kalidindi ◽

S. Kumar Reddy Mallidi

Keyword(s):

Ant Colony Optimization ◽

Detection System ◽

Malware Detection ◽

Feature Reduction ◽

Ant Colony ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Feature Significance ◽

Malware Detection And Classification ◽

Malicious Program

Malware is a malicious program that can cause a security breach of a system. Malware detection and classification is one of the burning topics of research in information security. Executable files are the major source of input for static malware detection. Machine learning techniques are very efficient in behavior-based malware detection and need a dataset of malware with different features. On Windows, malware can be detected by analyzing the portable executable (PE) files. This work contributes to identifying the minimum feature set for malware detection, employing a rough-set dependency-based feature significance combined with Ant Colony Optimization (ACO) as the heuristic search technique. A malware dataset named claMP, with both integrated features and raw features, was considered as the benchmark dataset for this work. The analytical results show that 97.15% and 92.8% data size optimization has been achieved with a minimum loss of accuracy for the claMP integrated and raw datasets, respectively.
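
The feature significance mentioned above rests on the standard rough-set dependency degree, gamma_B(D) = |POS_B(D)| / |U|: the fraction of samples whose values on feature subset B determine the decision unambiguously. The sketch below is a minimal illustration of that definition only. It does not implement the ACO search, and the feature names and toy rows are hypothetical rather than drawn from the claMP dataset.

```python
from collections import defaultdict


def dependency_degree(rows, features, decision):
    """Fraction of rows lying in the positive region of `decision` w.r.t. `features`."""
    if not rows:
        return 0.0
    # Group rows into equivalence classes by their values on the chosen features.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in features)
        classes[key].append(row[decision])
    # A class belongs to the positive region if all of its rows share one label.
    positive = sum(len(labels) for labels in classes.values()
                   if len(set(labels)) == 1)
    return positive / len(rows)


def significance(rows, features, feature, decision):
    """Drop in dependency degree when `feature` is removed from `features`."""
    reduced = [f for f in features if f != feature]
    return (dependency_degree(rows, features, decision)
            - dependency_degree(rows, reduced, decision))


# Hypothetical toy data; "packed" and "has_tls" are made-up feature names.
rows = [
    {"packed": 1, "has_tls": 0, "label": "malware"},
    {"packed": 1, "has_tls": 1, "label": "malware"},
    {"packed": 0, "has_tls": 0, "label": "benign"},
    {"packed": 0, "has_tls": 1, "label": "malware"},
]

print(dependency_degree(rows, ["packed", "has_tls"], "label"))
print(significance(rows, ["packed", "has_tls"], "has_tls", "label"))
```

A search procedure such as ACO would then look for the smallest feature subset whose dependency degree matches that of the full feature set.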

Long-Term Loop Closure Detection through Visual-Spatial Information Preserving Multi-Order Graph Matching

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i06.6604 ◽

2020 ◽

Vol 34 (06) ◽

pp. 10369-10376

Author(s):

Peng Gao ◽

Hao Zhang

Keyword(s):

Spatial Information ◽

Graph Matching ◽

Fundamental Problem ◽

Graph Representation ◽

Spatial Relationships ◽

Matching Problem ◽

Loop Closure ◽

Loop Closure Detection ◽

Visual Spatial

Loop closure detection is a fundamental problem for simultaneous localization and mapping (SLAM) in robotics. Most of the previous methods only consider one type of information, based on either visual appearances or spatial relationships of landmarks. In this paper, we introduce a novel visual-spatial information preserving multi-order graph matching approach for long-term loop closure detection. Our approach constructs a graph representation of a place from an input image to integrate visual-spatial information, including visual appearances of the landmarks and the background environment, as well as the second and third-order spatial relationships between two and three landmarks, respectively. Furthermore, we introduce a new formulation that formulates loop closure detection as a multi-order graph matching problem to compute a similarity score directly from the graph representations of the query and template images, instead of performing conventional vector-based image matching. We evaluate the proposed multi-order graph matching approach based on two public long-term loop closure detection benchmark datasets, including the St. Lucia and CMU-VL datasets. Experimental results have shown that our approach is effective for long-term loop closure detection and it outperforms the previous state-of-the-art methods.

Real-Time Detection of Infusion Site Failures in a Closed-Loop Artificial Pancreas

Journal of Diabetes Science and Technology ◽

10.1177/1932296818755173 ◽

2018 ◽

Vol 12 (3) ◽

pp. 599-607 ◽

Cited By ~ 11

Author(s):

Daniel P. Howsmon ◽

Nihat Baysal ◽

Bruce A. Buckingham ◽

Gregory P. Forlenza ◽

Trang T. Ly ◽

...

Keyword(s):

Real Time ◽

Artificial Pancreas ◽

Intervention Strategy ◽

Failure Detection ◽

Detection Algorithm ◽

Outpatient Setting ◽

False Positives ◽

Zone Model ◽

Infusion Site ◽

Detection Algorithms

Background: As evidence emerges that artificial pancreas systems improve clinical outcomes for patients with type 1 diabetes, the burden of this disease will hopefully begin to be alleviated for many patients and caregivers. However, reliance on automated insulin delivery potentially means patients will be slower to act when devices stop functioning appropriately. One such scenario involves an insulin infusion site failure, where the insulin that is recorded as delivered fails to affect the patient’s glucose as expected. Alerting patients to these events in real time would potentially reduce hyperglycemia and ketosis associated with infusion site failures. Methods: An infusion site failure detection algorithm was deployed in a randomized crossover study with artificial pancreas and sensor-augmented pump arms in an outpatient setting. Each arm lasted two weeks. Nineteen participants wore infusion sets for up to 7 days. Clinicians contacted patients to confirm infusion site failures detected by the algorithm and instructed on set replacement if failure was confirmed. Results: In real time and under zone model predictive control, the infusion site failure detection algorithm achieved a sensitivity of 88.0% (n = 25) while issuing only 0.22 false positives per day, compared with a sensitivity of 73.3% (n = 15) and 0.27 false positives per day in the SAP arm (as indicated by retrospective analysis). No association between intervention strategy and duration of infusion sets was observed (P = .58). Conclusions: As patient burden is reduced by each generation of advanced diabetes technology, fault detection algorithms will help ensure that patients are alerted when they need to manually intervene. Clinical Trial Identifier: www.clinicaltrials.gov, NCT02773875
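
For readers checking the reported figures, sensitivity is TP / (TP + FN). The sketch below reproduces the two percentages under the assumption that n is the number of confirmed failures in each arm. The counts of 22 and 11 detected failures are back-calculated from the reported percentages and are not stated in the abstract.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Sensitivity (recall) = TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no positive cases to evaluate")
    return true_positives / total


# Assumed counts, inferred from 88.0% of n = 25 and 73.3% of n = 15.
ap_arm = sensitivity(true_positives=22, false_negatives=3)
sap_arm = sensitivity(true_positives=11, false_negatives=4)

print(f"{ap_arm:.1%}")
print(f"{sap_arm:.1%}")
```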

How Might Wet Retroreflective Pavement Markings Enable More Robust Machine Vision?

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/0361198119847620 ◽

2019 ◽

Vol 2673 (11) ◽

pp. 361-366 ◽

Cited By ~ 1

Author(s):

Adam M. Pike ◽

Jordan Whitney ◽

Thomas Hedblom ◽

Susannah Clear

Keyword(s):

Machine Vision ◽

Feature Detection ◽

Preliminary Investigation ◽

Image Data ◽

Ccd Camera ◽

False Positives ◽

Lane Detection ◽

Pavement Markings ◽

Detection Algorithms ◽

Night And Day

This study is a preliminary investigation of the effects of levels of wet retroreflectivity of pavement markings on factors that determine robust feature detection in machine vision and light detection and ranging (LiDAR) systems in continuously wet road conditions. Luminance and Weber contrast of a range of pavement markings were characterized as functions of wet retroreflectivity and distance based on calibrated charge-coupled device (CCD) camera measurements. Both were found to trend with wet retroreflectivity over the range of distances considered in this study. Artifacts arising from glare sources in wet conditions and their intensities relative to pavement markings of different wet retroreflectivity levels were demonstrated. Image data suggest that markings with high wet retroreflectivity may help to mitigate identification of these artifacts as false positives in lane awareness/lane detection algorithms. As LiDAR presents a viable sensor fusion approach to identifying and avoiding these false positives and artifacts in both nighttime wet and daytime wet road conditions, LiDAR return was characterized on pavement markings comprising both optics designed only for dry retroreflectivity and optics designed to be retroreflective in both dry and wet conditions. Preliminary results suggest that for common pavement marking constructions based on exposed beaded optics that might be completely immersed by a rainstorm or puddling, incorporation of high index (n~2.4) wet retroreflective beaded optics is likely to be advantageous to both visible machine vision systems and LiDAR for detection of those retroreflective markings in both night and day.
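
Weber contrast, one of the two quantities characterized above, is conventionally defined as (L_target - L_background) / L_background. The sketch below applies that standard definition. The luminance values are hypothetical and not taken from the study.

```python
def weber_contrast(target_luminance: float, background_luminance: float) -> float:
    """Weber contrast of a target against its background (same luminance units)."""
    if background_luminance <= 0:
        raise ValueError("background luminance must be positive")
    return (target_luminance - background_luminance) / background_luminance


# Hypothetical luminances in cd/m^2 for a marking and the adjacent wet pavement.
print(weber_contrast(target_luminance=3.0, background_luminance=1.0))
```

A contrast of 0 means the marking is indistinguishable from the pavement by luminance alone, and larger positive values mean the marking stands out more.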

Enhanced Android Malware Detection and Family Classification, using Conversation-level Network Traffic Features

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/4a/4 ◽

2020 ◽

Vol 17 (4A) ◽

pp. 607-614

Author(s):

Mohammad Abuthawabeh ◽

Khaled Mahmoud

Keyword(s):

Real World ◽

Network Traffic ◽

Malware Detection ◽

The Other ◽

Android Malware ◽

Detection Algorithms ◽

Android Malware Detection ◽

Learning Technique ◽

Massive Number ◽

Family Classification

Signature-based malware detection algorithms struggle to cope with the massive number of threats in the Android environment. In this paper, conversation-level network traffic features are extracted and used in a supervised model. This model was used to enhance the process of Android malware detection, categorization, and family classification. The model employs an ensemble learning technique to select the most useful of the extracted features. A real-world dataset called CICAndMal2017 was used in this paper. The results show that the Extra-Trees classifier achieved the highest weighted accuracy among the classifiers: 87.75%, 79.97%, and 66.71% for malware detection, malware categorization, and malware family classification, respectively. A comparison with another study that uses the same dataset was made. This study achieved a significant enhancement in malware family classification and malware categorization. For malware family classification, the enhancement was 39.71% for precision and 41.09% for recall. The rate of enhancement for Android malware categorization was 30.2% and 31.14% for precision and recall, respectively.

SmartMal: A Service-Oriented Behavioral Malware Detection Framework for Mobile Devices

The Scientific World JOURNAL ◽

10.1155/2014/101986 ◽

2014 ◽

Vol 2014 ◽

pp. 1-11 ◽

Cited By ~ 1

Author(s):

Chao Wang ◽

Zhizhong Wu ◽

Xi Li ◽

Xuehai Zhou ◽

Aili Wang ◽

...

Keyword(s):

Anomaly Detection ◽

Mobile Devices ◽

Service Oriented Architecture ◽

Malware Detection ◽

Detection Algorithm ◽

Main Task ◽

Detection Algorithms ◽

Usage Patterns ◽

Service Oriented ◽

And Behavior

This paper presents SmartMal, a novel service-oriented behavioral malware detection framework for vehicular and mobile devices. The highlight of SmartMal is the introduction of service-oriented architecture (SOA) concepts and behavior analysis into the malware detection paradigm. The proposed framework relies on a client-server architecture: the client continuously extracts various features and transfers them to the server, and the server’s main task is to detect anomalies using state-of-the-art detection algorithms. Multiple distributed servers simultaneously analyze the feature vector using various detectors, and information fusion is used to combine the detectors' results. We also propose a cycle-based statistical approach for mobile device anomaly detection. We accomplish this by analyzing the users’ regular usage patterns. Empirical results suggest that the proposed framework and novel anomaly detection algorithm are highly effective in detecting malware on Android devices.
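
The abstract does not give the details of the cycle-based statistical approach. One plausible reading, shown here purely as an assumption, is to learn a per-hour-of-day baseline from a user's regular usage and flag values that deviate strongly from it. The 24-hour cycle, the z-score threshold, the function names, and the sample data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_profile(history):
    """Map each hour of day to (mean, std) of the values observed at that hour.

    `history` is an iterable of (hour_of_day, value) pairs.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour % 24].append(value)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}


def is_anomalous(profile, hour, value, threshold=3.0):
    """Flag `value` if it is more than `threshold` std devs from the hourly mean."""
    slot = hour % 24
    if slot not in profile:
        return True  # assumption: activity at a never-seen hour is suspicious
    mu, sigma = profile[slot]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical usage metric (e.g. messages sent) sampled at 09:00 and 03:00.
history = [(9, 10), (9, 12), (9, 11), (9, 9), (3, 0), (3, 1), (3, 0), (3, 1)]
profile = build_profile(history)

print(is_anomalous(profile, 9, 11))  # typical morning activity
print(is_anomalous(profile, 3, 40))  # burst of activity at night
```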

Optimized Zero False Positives Perceptron Training for Malware Detection

2012 14th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing ◽

10.1109/synasc.2012.34 ◽

2012 ◽

Cited By ~ 18

Author(s):

Dragos Gavrilut ◽

Razvan Benchea ◽

Cristina Vatamanu

Keyword(s):

Malware Detection ◽

False Positives

Database Tuning using Natural Language Processing

ACM SIGMOD Record ◽

10.1145/3503780.3503788 ◽

2021 ◽

Vol 50 (3) ◽

pp. 27-28

Author(s):

Immanuel Trummer

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Training Data ◽

Language Models ◽

Learning Approaches ◽

Training Samples ◽

Starting Point ◽

Training Cost ◽

Transformer Model

Introduction. We have seen significant advances in the state of the art in natural language processing (NLP) over the past few years [20]. These advances have been driven by new neural network architectures, in particular the Transformer model [19], as well as the successful application of transfer learning approaches to NLP [13]. Typically, training for specific NLP tasks starts from large language models that have been pre-trained on generic tasks (e.g., predicting obfuscated words in text [5]) for which large amounts of training data are available. Using such models as a starting point reduces task-specific training cost as well as the number of required training samples by orders of magnitude [7]. These advances motivate new use cases for NLP methods in the context of databases.
