Adversarial EXEmples

2021 ◽  
Vol 24 (4) ◽  
pp. 1-31
Author(s):  
Luca Demetrio ◽  
Scott E. Coull ◽  
Battista Biggio ◽  
Giovanni Lagorio ◽  
Alessandro Armando ◽  
...  

Recent work has shown that adversarial Windows malware samples, referred to as adversarial EXEmples in this article, can bypass machine learning-based detection relying on static code analysis by perturbing relatively few input bytes. To preserve malicious functionality, previous attacks either add bytes to existing non-functional areas of the file, potentially limiting their effectiveness, or require running computationally demanding validation steps to discard malware variants that do not correctly execute in sandbox environments. In this work, we overcome these limitations by developing a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks based on practical, functionality-preserving manipulations to the Windows Portable Executable file format. These attacks, named Full DOS, Extend, and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section. Our experimental results show that these attacks outperform existing ones in both white-box and black-box scenarios, achieving a better tradeoff in terms of evasion rate and size of the injected payload, while also enabling evasion of models that have been shown to be robust to previous attacks. To facilitate reproducibility of our findings, we open-source our framework and all the corresponding attack implementations as part of the secml-malware Python library. We conclude this work by discussing the limitations of current machine learning-based malware detectors, along with potential mitigation strategies based on embedding domain knowledge coming from subject-matter experts directly into the learning process.
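To make the header manipulations concrete, the following is a minimal Python sketch, ours rather than the authors' secml-malware implementation, of which DOS-header bytes an attack in the spirit of Full DOS could overwrite without affecting execution; the helper name and the exact editable region are assumptions.

```python
import struct

def dos_header_payload_offsets(pe_bytes: bytes) -> list[int]:
    """Hypothetical helper: list byte offsets usable as adversarial payload.

    The Windows loader only reads the 'MZ' magic (offsets 0-1) and the
    4-byte e_lfanew field at offset 0x3C, which points to the PE header.
    The remaining DOS-header bytes are ignored at run time, so perturbing
    them does not alter program behaviour.
    """
    assert pe_bytes[:2] == b"MZ", "not a valid PE file"
    e_lfanew = struct.unpack_from("<I", pe_bytes, 0x3C)[0]
    editable = list(range(2, 0x3C))
    # The DOS stub between the header end (0x40) and e_lfanew is likewise
    # unused by the loader; including it here is our assumption.
    editable += list(range(0x40, e_lfanew))
    return editable
```

The Extend and Shift attacks enlarge this unused region instead of merely reusing it, by moving the PE header or the first section's content, which is why they can carry larger payloads.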

Author(s):  
Kuo-Wei Hsu ◽  
Yung-Chang Ko

Although its theoretical foundation is well understood by researchers, a blast furnace is like a black box in practice because its behavior is not always as expected. It is a complex reactor in which multiple reactions and multiple phases are involved, and its operation relies heavily on the operators' experience. To help the operators gain insights into the operation, the authors do not use traditional metallurgy models but instead apply machine learning methods to the data associated with the operation performance of a blast furnace. They analyze the variables connected to the economic and technical performance indices by combining domain knowledge with the results of two fundamental feature-selection methods, and they propose a classification algorithm to train classifiers that predict operation performance. The findings could assist the operators in reviewing and improving the operating guidelines.
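The abstract does not name the two feature-selection methods or the classification algorithm; purely as a rough illustration of the described workflow, the sketch below uses synthetic data and stand-in method choices: two standard filters are combined and a classifier is trained on the variables they agree on.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.model_selection import cross_val_score

# Hypothetical operation data: rows are furnace operating records, columns are
# process variables, and the label is a binarized performance index.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Two feature-selection filters (stand-ins for those used in the study):
# an ANOVA F-test filter and a mutual-information filter.
anova = SelectKBest(f_classif, k=10).fit(X, y)
mi = SelectKBest(mutual_info_classif, k=10).fit(X, y)
selected = sorted(set(np.flatnonzero(anova.get_support())) &
                  set(np.flatnonzero(mi.get_support())))

# Train a classifier on the agreed-upon variables to predict performance.
clf = RandomForestClassifier(random_state=0)
print("selected variables:", selected)
print("CV accuracy:", cross_val_score(clf, X[:, selected], y, cv=5).mean())
```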


Author(s):  
Marco Pistoia ◽  
Omer Tripp ◽  
David Lubensky

Mobile devices have revolutionized many aspects of our lives. Without realizing it, we often run on them programs that access and transmit private information over the network. Integrity concerns arise when mobile applications use untrusted data as input to security-sensitive computations. Program-analysis tools for integrity and confidentiality enforcement have become a necessity. Static-analysis tools are particularly attractive because they do not require installing and executing the program, and they have the potential of never missing a vulnerability. Nevertheless, such tools often have high false-positive rates. To reduce the number of false positives, static analysis has to be very precise, but this conflicts with the analysis's performance and scalability, since it requires a more refined model of the application. This chapter proposes Phoenix, a novel solution that combines static analysis with machine learning to identify programs exhibiting suspicious operations. This approach has been widely applied to mobile applications, obtaining impressive results.
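Phoenix's actual features and learning model are not detailed in this abstract; purely as a toy illustration of the general idea of scoring static-analysis findings with a learned classifier, the sketch below assumes invented features, labels, and model choice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical features extracted statically from a reported taint flow:
# path length, number of sanitizer calls on the path, whether the sink is
# network-facing, and the number of branching points along the flow.
X_train = np.array([
    [3, 0, 1, 2],    # short, unsanitized flow into a network sink
    [12, 2, 0, 9],   # long, sanitized flow into a local sink
    [5, 0, 1, 4],
    [9, 1, 0, 7],
])
y_train = np.array([1, 0, 1, 0])  # 1 = true vulnerability, 0 = false positive

# A classifier trained on labelled findings can rank new static-analysis
# reports and filter likely false positives before they reach a developer.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)
print(clf.predict_proba([[4, 0, 1, 3]])[:, 1])  # probability a new finding is real
```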


Author(s):  
Danilo Nikolic ◽  
Darko Stefanovic ◽  
Dusanka Dakic ◽  
Srdan Sladojevic ◽  
Sonja Ristic

2021 ◽  
Author(s):  
Junjie Shi ◽  
Jiang Bian ◽  
Jakob Richter ◽  
Kuan-Hsun Chen ◽  
Jörg Rahnenführer ◽  
...  

The predictive performance of a machine learning model highly depends on the corresponding hyper-parameter setting. Hence, hyper-parameter tuning is often indispensable. Normally, such tuning requires the dedicated machine learning model to be trained and evaluated on centralized data to obtain a performance estimate. However, in a distributed machine learning scenario, it is not always possible to collect all the data from all nodes due to privacy concerns or storage limitations. Moreover, if data has to be transferred through low-bandwidth connections, the time available for tuning is reduced. Model-Based Optimization (MBO) is one state-of-the-art method for tuning hyper-parameters, but its application to distributed machine learning models or federated learning lacks research. This work proposes a framework, MODES, that allows MBO to be deployed on resource-constrained distributed embedded systems. Each node trains an individual model based on its local data, and the goal is to optimize the combined prediction accuracy. The presented framework offers two optimization modes: (1) MODES-B considers the whole ensemble as a single black box and optimizes the hyper-parameters of each individual model jointly, and (2) MODES-I considers all models as clones of the same black box, which allows it to efficiently parallelize the optimization in a distributed setting. We evaluate MODES by conducting experiments on the optimization of the hyper-parameters of a random forest and a multi-layer perceptron. The experimental results demonstrate that, with an improvement in terms of mean accuracy (MODES-B), run-time efficiency (MODES-I), and statistical stability for both modes, MODES outperforms the baseline, i.e., carrying out tuning with MBO on each node individually with its local sub-data set.
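MODES itself targets distributed embedded systems; purely as a single-node illustration of the underlying model-based optimization loop, the sketch below fits a Gaussian-process surrogate and picks the next configuration by expected improvement. The dataset, the tuned hyper-parameter, the surrogate, and the budget are all assumptions, not the MODES setup.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score

# Toy local data standing in for one node's private sub-data set.
X, y = load_digits(return_X_y=True)

def objective(log2_n_trees: float) -> float:
    """Negative CV accuracy of a random forest for a given hyper-parameter."""
    model = RandomForestClassifier(n_estimators=int(round(2 ** log2_n_trees)),
                                   random_state=0)
    return -cross_val_score(model, X, y, cv=3).mean()

# Sequential model-based optimization: fit a GP surrogate on evaluated
# configurations and choose the next one by expected improvement.
rng = np.random.default_rng(0)
bounds = (1.0, 8.0)                      # search n_estimators in [2, 256]
configs = list(rng.uniform(*bounds, 3))  # small initial design
scores = [objective(c) for c in configs]

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(np.array(configs).reshape(-1, 1), scores)
    cand = np.linspace(*bounds, 256).reshape(-1, 1)
    mu, sigma = gp.predict(cand, return_std=True)
    imp = min(scores) - mu
    ei = imp * norm.cdf(imp / (sigma + 1e-9)) + sigma * norm.pdf(imp / (sigma + 1e-9))
    nxt = float(cand[np.argmax(ei)])
    configs.append(nxt)
    scores.append(objective(nxt))

print("best n_estimators:", int(round(2 ** configs[int(np.argmin(scores))])))
```

In the MODES-B spirit this loop would optimize the hyper-parameters of all nodes' models jointly against the ensemble accuracy, while MODES-I would treat every node as a clone of the same black box and evaluate candidate configurations in parallel across nodes.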


Author(s):  
Charles-Henry Bertrand Van Ouytsel ◽  
Olivier Bronchain ◽  
Gaëtan Cassiers ◽  
François-Xavier Standaert
