Attention‐guided transformation‐invariant attack for black‐box adversarial examples

A New Ensemble Adversarial Attack Powered by Long-Term Gradient Memories

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5743 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3405-3413

Author(s):

Zhaohui Che ◽

Ali Borji ◽

Guangtao Zhai ◽

Suiyi Ling ◽

Jing Li ◽

...

Keyword(s):

Broad Class ◽

Black Box ◽

Security Threat ◽

Source Models ◽

Adversarial Examples ◽

Adversarial Attack ◽

Prediction Systems ◽

Attack And Defense ◽

Decision Boundaries

Deep neural networks are vulnerable to adversarial attacks. More importantly, some adversarial examples crafted against an ensemble of pre-trained source models can transfer to other new target models, thus pose a security threat to black-box applications (when the attackers have no access to the target models). Despite adopting diverse architectures and parameters, source and target models often share similar decision boundaries. Therefore, if an adversary is capable of fooling several source models concurrently, it can potentially capture intrinsic transferable adversarial information that may allow it to fool a broad class of other black-box target models. Current ensemble attacks, however, only consider a limited number of source models to craft an adversary, and obtain poor transferability. In this paper, we propose a novel black-box attack, dubbed Serial-Mini-Batch-Ensemble-Attack (SMBEA). SMBEA divides a large number of pre-trained source models into several mini-batches. For each single batch, we design 3 new ensemble strategies to improve the intra-batch transferability. Besides, we propose a new algorithm that recursively accumulates the “long-term” gradient memories of the previous batch to the following batch. This way, the learned adversarial information can be preserved and the inter-batch transferability can be improved. Experiments indicate that our method outperforms state-of-the-art ensemble attacks over multiple pixel-to-pixel vision tasks including image translation and salient region prediction. Our method successfully fools two online black-box saliency prediction systems including DeepGaze-II (Kummerer 2017) and SALICON (Huang et al. 2017). Finally, we also contribute a new repository to promote the research on adversarial attack and defense over pixel-to-pixel tasks: https://github.com/CZHQuality/AAA-Pix2pix.

Download Full-text

A Black-Box Approach to Generate Adversarial Examples Against Deep Neural Networks for High Dimensional Input

2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC) ◽

10.1109/dsc.2019.00078 ◽

2019 ◽

Author(s):

Chengru Song ◽

Changqiao Xu ◽

Shujie Yang ◽

Zan Zhou ◽

Changhui Gong

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Black Box ◽

High Dimensional ◽

Adversarial Examples

Download Full-text

Enhancing adversarial attack transferability with multi-scale feature attack

International Journal of Wavelets Multiresolution and Information Processing ◽

10.1142/s0219691320500769 ◽

2020 ◽

pp. 2050076

Author(s):

Caixia Sun ◽

Lian Zou ◽

Cien Fan ◽

Yu Shi ◽

Yifeng Liu

Keyword(s):

Internal Representation ◽

Feature Space ◽

Source Model ◽

Black Box ◽

Space Representation ◽

Scale Feature ◽

Multi Scale ◽

Adversarial Examples ◽

Fast Gradient ◽

Sign Method

Deep neural networks are vulnerable to adversarial examples, which can fool models by adding carefully designed perturbations. An intriguing phenomenon is that adversarial examples often exhibit transferability, thus making black-box attacks effective in real-world applications. However, the adversarial examples generated by existing methods typically overfit the structure and feature representation of the source model, resulting in a low success rate in a black-box manner. To address this issue, we propose the multi-scale feature attack to boost attack transferability, which adjusts the internal feature space representation of the adversarial image to get far to the internal representation of the original image. We show that we can select a low-level layer and a high-level layer of the source model to conduct the perturbations, and the crafted adversarial examples are confused with original images, not just in the class but also in the feature space representations. To further improve the transferability of adversarial examples, we apply reverse cross-entropy loss to reduce the overfitting further and show that it is effective for attacking adversarially trained models with strong defensive ability. Extensive experiments show that the proposed methods consistently outperform the iterative fast gradient sign method (IFGSM) and momentum iterative fast gradient sign method (MIFGSM) under the challenging black-box setting.

Download Full-text

On the Effectiveness of Adversarial Training in Defending against Adversarial Example Attacks for Image Classification

Applied Sciences ◽

10.3390/app10228079 ◽

2020 ◽

Vol 10 (22) ◽

pp. 8079

Author(s):

Sanglee Park ◽

Jungmin So

Keyword(s):

Data Augmentation ◽

Black Box ◽

Training Data ◽

Model Parameters ◽

Neural Network Models ◽

Practical Applications ◽

Target Network ◽

Adversarial Examples ◽

Adversarial Training ◽

Adversarial Example

State-of-the-art neural network models are actively used in various fields, but it is well-known that they are vulnerable to adversarial example attacks. Throughout the efforts to make the models robust against adversarial example attacks, it has been found to be a very difficult task. While many defense approaches were shown to be not effective, adversarial training remains as one of the promising methods. In adversarial training, the training data are augmented by “adversarial” samples generated using an attack algorithm. If the attacker uses a similar attack algorithm to generate adversarial examples, the adversarially trained network can be quite robust to the attack. However, there are numerous ways of creating adversarial examples, and the defender does not know what algorithm the attacker may use. A natural question is: Can we use adversarial training to train a model robust to multiple types of attack? Previous work have shown that, when a network is trained with adversarial examples generated from multiple attack methods, the network is still vulnerable to white-box attacks where the attacker has complete access to the model parameters. In this paper, we study this question in the context of black-box attacks, which can be a more realistic assumption for practical applications. Experiments with the MNIST dataset show that adversarially training a network with an attack method helps defending against that particular attack method, but has limited effect for other attack methods. In addition, even if the defender trains a network with multiple types of adversarial examples and the attacker attacks with one of the methods, the network could lose accuracy to the attack if the attacker uses a different data augmentation strategy on the target network. These results show that it is very difficult to make a robust network using adversarial training, even for black-box settings where the attacker has restricted information on the target network.

Download Full-text

Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model

Machine Learning and Knowledge Discovery in Databases - Lecture Notes in Computer Science ◽

10.1007/978-3-030-46147-8_43 ◽

2020 ◽

pp. 711-726

Author(s):

Prashanth Vijayaraghavan ◽

Deb Roy

Keyword(s):

Black Box ◽

Text Classifiers ◽

Adversarial Examples

Download Full-text

Generating Adversarial Examples with Adversarial Networks

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/543 ◽

2018 ◽

Cited By ~ 65

Author(s):

Chaowei Xiao ◽

Bo Li ◽

Jun-yan Zhu ◽

Warren He ◽

Mingyan Liu ◽

...

Keyword(s):

Deep Neural Networks ◽

State Of The Art ◽

Black Box ◽

Generative Adversarial Networks ◽

Perceptual Quality ◽

Small Magnitude ◽

Adversarial Networks ◽

Original Target ◽

Adversarial Examples ◽

Adversarial Training

Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them with high perceptual quality and more efficiently requires more research efforts. In this paper, we propose AdvGAN to generate adversarial exam- ples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. For AdvGAN, once the generator is trained, it can generate perturbations efficiently for any instance, so as to potentially accelerate adversarial training as defenses. We apply Adv- GAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models have high attack success rate under state-of-the-art defenses compared to other attacks. Our attack has placed the first with 92.76% accuracy on a public MNIST black-box attack challenge.

Download Full-text

Beware the Black-Box: On the Robustness of Recent Defenses to Adversarial Examples

Entropy ◽

10.3390/e23101359 ◽

2021 ◽

Vol 23 (10) ◽

pp. 1359

Author(s):

Kaleel Mahmood ◽

Deniz Gurevin ◽

Marten van van Dijk ◽

Phuoung Ha Nguyen

Keyword(s):

Large Scale ◽

Error Correcting Codes ◽

Black Box ◽

Clear Picture ◽

Buffer Zones ◽

Adversarial Models ◽

Adversarial Examples ◽

Winner Take All ◽

The Relationship ◽

Take All

Many defenses have recently been proposed at venues like NIPS, ICML, ICLR and CVPR. These defenses are mainly focused on mitigating white-box attacks. They do not properly examine black-box attacks. In this paper, we expand upon the analyses of these defenses to include adaptive black-box adversaries. Our evaluation is done on nine defenses including Barrage of Random Transforms, ComDefend, Ensemble Diversity, Feature Distillation, The Odds are Odd, Error Correcting Codes, Distribution Classifier Defense, K-Winner Take All and Buffer Zones. Our investigation is done using two black-box adversarial models and six widely studied adversarial attacks for CIFAR-10 and Fashion-MNIST datasets. Our analyses show most recent defenses (7 out of 9) provide only marginal improvements in security (<25%), as compared to undefended networks. For every defense, we also show the relationship between the amount of data the adversary has at their disposal, and the effectiveness of adaptive black-box attacks. Overall, our results paint a clear picture: defenses need both thorough white-box and black-box analyses to be considered secure. We provide this large scale study and analyses to motivate the field to move towards the development of more robust black-box defenses.

Download Full-text

Local Post-hoc Explainable Methods for Adversarial Text Attacks

10.36227/techrxiv.17185568.v1 ◽

2021 ◽

Author(s):

Yidong Chai ◽

Ruicheng Liang ◽

Hongyi Zhu ◽

Sagar Samtani ◽

Meng Wang ◽

...

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Language Processing ◽

Black Box ◽

Learning Models ◽

Two Phase ◽

Sensitivity Estimation ◽

Execution Phase ◽

Adversarial Examples ◽

Post Hoc

Deep learning models have significantly advanced various natural language processing tasks. However, they are strikingly vulnerable to adversarial text attacks, even in the black-box setting where no model knowledge is accessible to hackers. Such attacks are conducted with a two-phase framework: 1) a sensitivity estimation phase to evaluate each element’s sensitivity to the target model’s prediction, and 2) a perturbation execution phase to craft the adversarial examples based on estimated element sensitivity. This study explored the connections between the local post-hoc explainable methods for deep learning and black-box adversarial text attacks and proposed a novel eXplanation-based method for crafting Adversarial Text Attacks (XATA). XATA leverages local post-hoc explainable methods (e.g., LIME or SHAP) to measure input elements’ sensitivity and adopts the word replacement perturbation strategy to craft adversarial examples. We evaluated the attack performance of the proposed XATA on three commonly used text-based datasets: IMDB Movie Review, Yelp Reviews-Polarity, and Amazon Reviews-Polarity. The proposed XATA outperformed existing baselines in various target models, including LSTM, GRU, CNN, and BERT. Moreover, we found that improved local post-hoc explainable methods (e.g., SHAP) lead to more effective adversarial attacks. These findings showed that when researchers constantly advance the explainability of deep learning models with local post-hoc methods, they also provide hackers with weapons to craft more targeted and dangerous adversarial attacks.

Download Full-text

Optimizing for Interpretability in Deep Neural Networks with Tree Regularization

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12558 ◽

2021 ◽

Vol 72 ◽

pp. 1-37

Author(s):

Mike Wu ◽

Sonali Parbhoo ◽

Michael C. Hughes ◽

Volker Roth ◽

Finale Doshi-Velez

Keyword(s):

Neural Networks ◽

Predictive Power ◽

Deep Neural Networks ◽

Large Body ◽

Black Box ◽

New Family ◽

Deep Model ◽

Real World Applications ◽

Adversarial Examples ◽

Key Barrier

Deep models have advanced prediction in many domains, but their lack of interpretability remains a key barrier to the adoption in many real world applications. There exists a large body of work aiming to help humans understand these black box functions to varying levels of granularity – for example, through distillation, gradients, or adversarial examples. These methods however, all tackle interpretability as a separate process after training. In this work, we take a different approach and explicitly regularize deep models so that they are well-approximated by processes that humans can step through in little time. Specifically, we train several families of deep neural networks to resemble compact, axis-aligned decision trees without significant compromises in accuracy. The resulting axis-aligned decision functions uniquely make tree regularized models easy for humans to interpret. Moreover, for situations in which a single, global tree is a poor estimator, we introduce a regional tree regularizer that encourages the deep model to resemble a compact, axis-aligned decision tree in predefined, human-interpretable contexts. Using intuitive toy examples, benchmark image datasets, and medical tasks for patients in critical care and with HIV, we demonstrate that this new family of tree regularizers yield models that are easier for humans to simulate than L1 or L2 penalties without sacrificing predictive power.

Download Full-text

Adv-Makeup: A New Imperceptible and Transferable Attack on Face Recognition

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/173 ◽

2021 ◽

Author(s):

Bangjie Yin ◽

Wenxuan Wang ◽

Taiping Yao ◽

Junfeng Guo ◽

Zelun Kong ◽

...

Keyword(s):

Face Recognition ◽

Black Box ◽

Fine Grained ◽

Box Models ◽

Meta Learning ◽

Adversarial Examples ◽

Adversarial Attack ◽

Face Generation ◽

Orbital Region ◽

Black Box Models

Deep neural networks, particularly face recognition models, have been shown to be vulnerable to both digital and physical adversarial examples. However, existing adversarial examples against face recognition systems either lack transferability to black-box models, or fail to be implemented in practice. In this paper, we propose a unified adversarial face generation method - Adv-Makeup, which can realize imperceptible and transferable attack under the black-box setting. Adv-Makeup develops a task-driven makeup generation method with the blending module to synthesize imperceptible eye shadow over the orbital region on faces. And to achieve transferability, Adv-Makeup implements a fine-grained meta-learning based adversarial attack strategy to learn more vulnerable or sensitive features from various models. Compared to existing techniques, sufficient visualization results demonstrate that Adv-Makeup is capable to generate much more imperceptible attacks under both digital and physical scenarios. Meanwhile, extensive quantitative experiments show that Adv-Makeup can significantly improve the attack success rate under black-box setting, even attacking commercial systems.

Download Full-text