Opportunistic Use of Crowdsourced Workers for Online Relabeling of Potential Adversarial Examples

2021 ◽  
Author(s):  
Shawqi Al-Maliki ◽  
Faissal El Bouanani ◽  
Kashif Ahmad ◽  
Mohamed Abdallah ◽  
Dinh Hoang ◽  
...  

Deep Neural Networks (DNNs) have achieved tremendous success in handling various Machine Learning (ML) tasks, such as speech recognition, natural language processing, and image classification. However, they have shown vulnerability to well-designed inputs called adversarial examples. Researchers in industry and academia have proposed many adversarial example defense techniques, but none provides complete robustness: the cutting-edge defenses offer only partial reliability. Thus, complementing them with another layer of protection is a must, especially for mission-critical applications. This paper proposes a novel Online Selection and Relabeling Algorithm (OSRA) that opportunistically utilizes a limited number of crowdsourced workers (budget-constrained crowdsourcing) to maximize the ML system’s robustness. OSRA strives to use crowdsourced workers effectively by selecting the most suspicious inputs (the potential adversarial examples) and forwarding them to the crowdsourced workers to be validated and corrected (relabeled). As a result, the impact of adversarial examples is reduced, and accordingly, the ML system becomes more robust. We also propose a heuristic threshold selection method that contributes to enhancing the prediction system’s reliability. We empirically validated the proposed algorithm and found that it efficiently and optimally utilizes the allocated crowdsourcing budget. It also integrates effectively with a state-of-the-art black-box (transfer-based) defense technique, resulting in a more robust system. Simulation results show that OSRA can outperform a random selection algorithm by 60% and achieve comparable performance to an optimal offline selection benchmark. They also show that OSRA’s performance is positively correlated with system robustness.
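
The abstract does not spell out OSRA's selection rule, so the following is only a minimal sketch of the general idea it describes, assuming suspicion is scored by the classifier's top-class confidence against a heuristic threshold and that a fixed crowdsourcing budget is available; `crowd_relabel` and the threshold value are hypothetical placeholders, not the paper's method.

```python
# Minimal sketch of budget-constrained online selection and relabeling (assumption:
# suspicion is scored by the classifier's top-class confidence; OSRA's actual
# selection rule and threshold heuristic are not given in this abstract).
import numpy as np

def osra_stream(confidences, labels_pred, crowd_relabel, budget, threshold=0.6):
    """Process predictions one by one; send low-confidence (suspicious) inputs
    to crowd workers for relabeling until the budget is exhausted.

    confidences   : iterable of top-class probabilities from the classifier
    labels_pred   : iterable of predicted labels
    crowd_relabel : callable(index) -> label returned by crowd workers (hypothetical)
    budget        : maximum number of crowdsourcing queries
    threshold     : heuristic suspicion threshold on confidence
    """
    final_labels, spent = [], 0
    for i, (conf, pred) in enumerate(zip(confidences, labels_pred)):
        if conf < threshold and spent < budget:
            final_labels.append(crowd_relabel(i))  # validated/corrected label
            spent += 1
        else:
            final_labels.append(pred)              # trust the model's prediction
    return final_labels, spent


# Toy usage with a simulated crowd that simply echoes the model's prediction.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    conf = rng.uniform(0.3, 1.0, size=20)
    preds = rng.integers(0, 10, size=20)
    labels, used = osra_stream(conf, preds, crowd_relabel=lambda i: int(preds[i]), budget=5)
    print(f"crowd queries used: {used}/5")
```

An offline benchmark would instead rank the whole batch by suspicion before spending the budget, which is the kind of optimal baseline the online rule is compared against.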


2020 ◽  
Vol 10 (22) ◽  
pp. 8079
Author(s):  
Sanglee Park ◽  
Jungmin So

State-of-the-art neural network models are actively used in various fields, but it is well known that they are vulnerable to adversarial example attacks. Despite many efforts, making models robust against adversarial example attacks has proven to be a very difficult task. While many defense approaches have been shown to be ineffective, adversarial training remains one of the promising methods. In adversarial training, the training data are augmented with “adversarial” samples generated using an attack algorithm. If the attacker uses a similar attack algorithm to generate adversarial examples, the adversarially trained network can be quite robust to the attack. However, there are numerous ways of creating adversarial examples, and the defender does not know which algorithm the attacker may use. A natural question is: can we use adversarial training to train a model robust to multiple types of attack? Previous work has shown that, when a network is trained with adversarial examples generated from multiple attack methods, the network is still vulnerable to white-box attacks where the attacker has complete access to the model parameters. In this paper, we study this question in the context of black-box attacks, which can be a more realistic assumption for practical applications. Experiments with the MNIST dataset show that adversarially training a network with one attack method helps defend against that particular attack method but has limited effect against other attack methods. In addition, even if the defender trains a network with multiple types of adversarial examples and the attacker attacks with one of those methods, the network can still lose accuracy if the attacker uses a different data augmentation strategy on the target network. These results show that it is very difficult to make a robust network using adversarial training, even in black-box settings where the attacker has restricted information about the target network.
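
For context on the recipe evaluated above, here is a minimal PyTorch sketch of adversarial training with FGSM-augmented batches; the paper's actual architectures, attack parameters, and multi-attack training mixtures are not reproduced, and the tiny classifier below is purely illustrative.

```python
# Minimal sketch of adversarial training: augment each batch with FGSM examples.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Generate FGSM adversarial examples (one gradient-sign step)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y, eps=0.3):
    """One training step on a batch mixed with its FGSM counterparts."""
    model.train()
    x_adv = fgsm(model, x, y, eps)
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage on random MNIST-shaped data with a small illustrative classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.rand(32, 1, 28, 28), torch.randint(0, 10, (32,))
print(adversarial_training_step(model, opt, x, y))
```

Training with several attack methods amounts to generating the augmented batch with a mixture of attacks instead of FGSM alone, which is the multi-attack setting the experiments probe.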


Symmetry ◽  
2018 ◽  
Vol 10 (12) ◽  
pp. 738 ◽  
Author(s):  
Hyun Kwon ◽  
Yongchul Kim ◽  
Hyunsoo Yoon ◽  
Daeseon Choi

Deep neural networks (DNNs) have demonstrated remarkable performance in machine learning areas such as image recognition, speech recognition, intrusion detection, and pattern analysis. However, DNNs have been shown to be vulnerable to adversarial examples, which are created by adding a small amount of noise to an original sample in order to cause misclassification by the DNN. Such adversarial examples can lead to fatal accidents in applications such as autonomous vehicles and disease diagnostics. Thus, the generation of adversarial examples has recently attracted extensive research attention. An adversarial example is categorized as targeted or untargeted. In this paper, we focus on the untargeted scenario because it requires less generation time and distortion than the targeted one. However, untargeted adversarial examples suffer from a pattern vulnerability: because of the similarity between the original class and certain specific classes, the defending system may be able to determine the original class by analyzing the output classes of the untargeted adversarial examples. To overcome this problem, we propose a new method for generating untargeted adversarial examples that uses an arbitrary class in the generation process. Moreover, we show that our proposed scheme can be applied to steganography. Through experiments, we show that our proposed scheme achieves a 100% attack success rate with minimum distortion (1.99 and 42.32 on the MNIST and CIFAR10 datasets, respectively) and without the pattern vulnerability. Using a steganography test, we show that our proposed scheme can fool humans: the probability of their detecting the hidden classes is equal to that of random selection.
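
The abstract only names the idea of using an arbitrary class during generation, so the sketch below is one plausible, heavily simplified reading: pick a random class different from the original label and take gradient-sign steps toward it until the input is no longer classified as the original class. The step size, iteration count, and stopping rule are assumptions, not the paper's procedure.

```python
# Hedged sketch: untargeted misclassification achieved by stepping toward an
# arbitrarily chosen class (illustrative only; not the paper's exact algorithm).
import torch
import torch.nn.functional as F

def arbitrary_class_attack(model, x, y_true, num_classes, eps=0.01, steps=50):
    # Choose an arbitrary class guaranteed to differ from the original label.
    y_arb = (y_true + torch.randint(1, num_classes, y_true.shape)) % num_classes
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_arb)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Descend the loss of the arbitrary class (targeted-style step).
        x_adv = (x_adv - eps * grad.sign()).clamp(0, 1).detach()
        if (model(x_adv).argmax(dim=1) != y_true).all():
            break  # untargeted success: inputs no longer map to the original class
    return x_adv
```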


Author(s):  
Chaowei Xiao ◽  
Bo Li ◽  
Jun-yan Zhu ◽  
Warren He ◽  
Mingyan Liu ◽  
...  

Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but producing them with high perceptual quality and efficiency requires further research. In this paper, we propose AdvGAN, which generates adversarial examples with generative adversarial networks (GANs) that can learn and approximate the distribution of original instances. Once the AdvGAN generator is trained, it can generate perturbations efficiently for any instance, potentially accelerating adversarial training as a defense. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models have a high attack success rate under state-of-the-art defenses compared to other attacks. Our attack placed first, with 92.76% accuracy, on a public MNIST black-box attack challenge.
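
A simplified sketch of the AdvGAN idea in PyTorch: a feed-forward generator maps an input to a bounded perturbation and is trained so that the perturbed input fools a fixed target classifier. The GAN discriminator, the distillation step used in the black-box setting, and the exact loss weights are omitted or assumed here.

```python
# Simplified AdvGAN-style generator training (discriminator and distillation omitted).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerturbationGenerator(nn.Module):
    def __init__(self, dim, eps=0.3):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, dim), nn.Tanh())

    def forward(self, x):
        flat = x.flatten(1)
        delta = self.eps * self.net(flat).view_as(x)   # bounded perturbation
        return (x + delta).clamp(0, 1)

def advgan_step(gen, target_model, optimizer, x, y, c=10.0):
    """One generator update: encourage misclassification, penalize large perturbations."""
    x_adv = gen(x)
    logits = target_model(x_adv)
    # CW-style untargeted loss: push the true-class logit below the best other logit.
    true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
    other_logit = logits.scatter(1, y.unsqueeze(1), float("-inf")).max(dim=1).values
    adv_loss = F.relu(true_logit - other_logit).mean()
    pert_loss = (x_adv - x).flatten(1).norm(dim=1).mean()
    loss = c * adv_loss + pert_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Once trained, the generator produces a perturbation for any new input in a single forward pass, which is what makes the approach attractive for accelerating adversarial training.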


2021 ◽  
Vol 13 (11) ◽  
pp. 288
Author(s):  
Li Fan ◽  
Wei Li ◽  
Xiaohui Cui

Many deepfake-image forensic detectors have been proposed and improved due to the development of synthetic techniques. However, recent studies show that most of these detectors are not immune to adversarial example attacks. Therefore, understanding the impact of adversarial examples on their performance is an important step towards improving deepfake-image detectors. This study presents an anti-forensics case study of two popular general-purpose deepfake detectors, examining their accuracy and generalization. Herein, we propose Poisson noise DeepFool (PNDF), an improved iterative adversarial example generation method. This method can simply and effectively attack forensics detectors by adding perturbations to images in different directions. Our attacks reduce the detector's AUC from 0.9999 to 0.0331, and its detection accuracy on deepfake images from 0.9997 to 0.0731. Compared with state-of-the-art studies, our work points to an important defense direction for future research on deepfake-image detectors: focusing on the generalization performance of detectors and their resistance to adversarial example attacks.
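
PNDF itself is not detailed in this abstract, so the following is only a hedged illustration of the ingredients it names: an iterative, DeepFool-style attack whose per-step direction is jittered with Poisson-distributed noise so perturbations are pushed in varying directions. The noise scaling, step size, and stopping rule are arbitrary choices for illustration.

```python
# Hedged illustration: iterative attack with Poisson jitter on the step direction.
import torch
import torch.nn.functional as F

def noisy_iterative_attack(model, x, y, eps=0.005, steps=30, lam=1.0):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Zero-centred Poisson jitter added to the ascent direction (illustrative choice).
        noise = torch.poisson(torch.full_like(grad, lam)) - lam
        direction = (grad + 0.1 * noise * grad.abs()).sign()
        x_adv = (x_adv + eps * direction).clamp(0, 1).detach()
        if (model(x_adv).argmax(dim=1) != y).all():
            break  # all inputs misclassified by the detector/classifier
    return x_adv
```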


2021 ◽  
Author(s):  
Yidong Chai ◽  
Ruicheng Liang ◽  
Hongyi Zhu ◽  
Sagar Samtani ◽  
Meng Wang ◽  
...  

Deep learning models have significantly advanced various natural language processing tasks. However, they are strikingly vulnerable to adversarial text attacks, even in the black-box setting where no model knowledge is accessible to hackers. Such attacks are conducted with a two-phase framework: 1) a sensitivity estimation phase to evaluate each element’s sensitivity to the target model’s prediction, and 2) a perturbation execution phase to craft the adversarial examples based on the estimated element sensitivity. This study explores the connections between local post-hoc explanation methods for deep learning and black-box adversarial text attacks, and proposes a novel eXplanation-based method for crafting Adversarial Text Attacks (XATA). XATA leverages local post-hoc explanation methods (e.g., LIME or SHAP) to measure input elements’ sensitivity and adopts a word-replacement perturbation strategy to craft adversarial examples. We evaluated the attack performance of the proposed XATA on three commonly used text datasets: IMDB Movie Review, Yelp Reviews-Polarity, and Amazon Reviews-Polarity. The proposed XATA outperformed existing baselines against various target models, including LSTM, GRU, CNN, and BERT. Moreover, we found that better local post-hoc explanation methods (e.g., SHAP) lead to more effective adversarial attacks. These findings show that as researchers advance the explainability of deep learning models with local post-hoc methods, they also provide hackers with weapons to craft more targeted and dangerous adversarial attacks.
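
A hedged sketch of the two-phase framework described above: score each word's sensitivity, then replace the most sensitive words first. The paper uses LIME or SHAP for the sensitivity estimation phase; this sketch substitutes a simple leave-one-word-out proxy, and `synonyms()` is a hypothetical candidate provider (e.g., a WordNet or embedding-based lookup).

```python
# Hedged sketch of explanation-guided word replacement (leave-one-out stands in
# for LIME/SHAP; predict_proba is assumed to return a class-probability vector).

def word_sensitivity(predict_proba, words, label):
    """Drop each word in turn and measure how much the target-class score falls."""
    base = predict_proba(" ".join(words))[label]
    return [base - predict_proba(" ".join(words[:i] + words[i + 1:]))[label]
            for i in range(len(words))]

def explanation_guided_attack(predict_proba, text, label, synonyms, max_edits=3):
    words = text.split()
    scores = word_sensitivity(predict_proba, words, label)
    # Perturb the most sensitive words first (phase 2: perturbation execution).
    for i in sorted(range(len(words)), key=lambda j: scores[j], reverse=True)[:max_edits]:
        best, best_score = words[i], predict_proba(" ".join(words))[label]
        for cand in synonyms(words[i]):          # hypothetical synonym provider
            trial = words[:i] + [cand] + words[i + 1:]
            score = predict_proba(" ".join(trial))[label]
            if score < best_score:
                best, best_score = cand, score
        words[i] = best
    return " ".join(words)
```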


2021 ◽  
Vol 72 ◽  
pp. 1-37
Author(s):  
Mike Wu ◽  
Sonali Parbhoo ◽  
Michael C. Hughes ◽  
Volker Roth ◽  
Finale Doshi-Velez

Deep models have advanced prediction in many domains, but their lack of interpretability remains a key barrier to adoption in many real-world applications. There exists a large body of work aiming to help humans understand these black-box functions to varying levels of granularity, for example through distillation, gradients, or adversarial examples. These methods, however, all tackle interpretability as a separate process after training. In this work, we take a different approach and explicitly regularize deep models so that they are well approximated by processes that humans can step through in little time. Specifically, we train several families of deep neural networks to resemble compact, axis-aligned decision trees without significant compromises in accuracy. The resulting axis-aligned decision functions make tree-regularized models uniquely easy for humans to interpret. Moreover, for situations in which a single, global tree is a poor estimator, we introduce a regional tree regularizer that encourages the deep model to resemble a compact, axis-aligned decision tree in predefined, human-interpretable contexts. Using intuitive toy examples, benchmark image datasets, and medical tasks for patients in critical care and with HIV, we demonstrate that this new family of tree regularizers yields models that are easier for humans to simulate than L1 or L2 penalties, without sacrificing predictive power.
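
A hedged sketch of the quantity tree regularization is built around: fit an axis-aligned decision tree to the deep model's own predictions and measure its average decision-path length, a proxy for how quickly a human can simulate the model. The paper makes this penalty trainable through a learned differentiable surrogate, which is omitted here; only the complexity metric itself is shown, and the stand-in "deep model" below is a toy rule.

```python
# Hedged sketch: average decision-path length of a tree that mimics the model.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def average_path_length(X, model_predict, max_depth=None):
    """Average number of decision nodes traversed by the tree mimicking the model."""
    y_hat = model_predict(X)                      # labels predicted by the deep model
    tree = DecisionTreeClassifier(max_depth=max_depth).fit(X, y_hat)
    paths = tree.decision_path(X)                 # sparse indicator of visited nodes
    return paths.sum(axis=1).mean()               # smaller => easier to simulate by hand

# Toy usage with a stand-in "deep model": a noisy threshold rule on 2-D inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
print(average_path_length(X, lambda X: (X[:, 0] + 0.1 * X[:, 1] > 0).astype(int)))
```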


2020 ◽  
Vol 39 (5) ◽  
pp. 7085-7095
Author(s):  
Shuqi Liu ◽  
Mingwen Shao ◽  
Xinping Liu

In recent years, deep neural networks have made significant progress in image classification, object detection, and face recognition. However, they still suffer from misclassification when facing adversarial examples. To address this security issue and improve the robustness of neural networks, we propose a novel defense network based on a generative adversarial network (GAN). The distributions of clean and adversarial examples are matched to solve this problem: the network is guided to remove the invisible noise accurately and restore the adversarial example to a clean example, achieving the effect of defense. In addition, to maintain the classification accuracy of clean examples and improve the fidelity of the neural network, we also feed clean examples into the proposed network for denoising. Our method effectively removes the noise of adversarial examples, so that the denoised adversarial examples can be correctly classified. In this paper, extensive experiments are conducted on five benchmark datasets, namely MNIST, Fashion-MNIST, CIFAR10, CIFAR100, and ImageNet. Moreover, six mainstream attack methods are adopted to test the robustness of our defense, including FGSM, PGD, MIM, JSMA, CW, and DeepFool. Results show that our method has strong defensive capabilities against the tested attack methods, which confirms the effectiveness of the proposed method.
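
A simplified sketch of the denoising step described above: a small residual network is trained to map both adversarial and clean inputs back to the clean image, so clean examples pass through largely unchanged. The paper's GAN-based distribution matching (discriminator and adversarial loss) is omitted for brevity, and the architecture and loss weighting here are assumptions.

```python
# Simplified denoising defense: reconstruct clean images from adversarial inputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Denoiser(nn.Module):
    def __init__(self, channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1))

    def forward(self, x):
        # Predict the noise residual and subtract it from the input.
        return (x - self.net(x)).clamp(0, 1)

def denoiser_step(denoiser, optimizer, x_clean, x_adv):
    """Train the denoiser to reconstruct clean images from both adversarial and clean inputs."""
    optimizer.zero_grad()
    loss = F.mse_loss(denoiser(x_adv), x_clean) + F.mse_loss(denoiser(x_clean), x_clean)
    loss.backward()
    optimizer.step()
    return loss.item()

# At inference time, the classifier is applied to denoiser(x) instead of x.
```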

