Random Transformation of Image Brightness for Adversarial Attack

Deep neural networks (DNNs) are vulnerable to adversarial examples, which are crafted by adding small, human-imperceptible perturbations to original images yet cause the model to output incorrect predictions. Adversarial attacks can therefore serve as an important tool for evaluating and selecting robust models before DNNs are deployed in safety-critical applications. However, in the challenging black-box setting, the attack success rate, i.e., the transferability of adversarial examples, still needs to be improved. Building on image augmentation methods, this paper finds that randomly transforming image brightness can eliminate overfitting in the generation of adversarial examples and improve their transferability. Based on this observation, this paper proposes an adversarial example generation method that can be integrated with Fast Gradient Sign Method (FGSM)-related methods to build a more robust gradient-based attack and generate adversarial examples with better transferability. Extensive experiments on the ImageNet dataset demonstrate the effectiveness of the proposed method. On both normally and adversarially trained networks, our method achieves a higher black-box attack success rate than other attack methods based on data augmentation. We hope this method can help evaluate and improve the robustness of models.
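
The core idea, averaging attack gradients over randomly brightened copies of the current adversarial image before each FGSM-style sign step, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the "model" is a hypothetical toy logistic regression over a flat list of pixel values in [0, 1], brightness is assumed to be a multiplicative scale factor drawn uniformly from [0.7, 1.3], and the names `brightness_ifgsm`, `adjust_brightness`, and the `copies` parameter are illustrative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    # Returns 1, 0 or -1, like the sign operator in FGSM.
    return (v > 0) - (v < 0)


def input_grad(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input of a toy logistic model."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def adjust_brightness(x, factor):
    # Hypothetical brightness transform: scale pixel values, then clip to [0, 1].
    return [min(1.0, max(0.0, factor * xi)) for xi in x]


def brightness_ifgsm(x, y, w, b, eps=0.1, steps=10, copies=5,
                     low=0.7, high=1.3, seed=0):
    """I-FGSM whose gradient is averaged over randomly brightened copies."""
    rng = random.Random(seed)
    alpha = eps / steps
    x_adv = list(x)
    for _ in range(steps):
        grad = [0.0] * len(x)
        for _ in range(copies):
            factor = rng.uniform(low, high)
            g = input_grad(adjust_brightness(x_adv, factor), y, w, b)
            for i, xi in enumerate(x_adv):
                # Chain rule through clip(factor * x): slope is factor inside (0, 1), else 0.
                if 0.0 < factor * xi < 1.0:
                    grad[i] += factor * g[i] / copies
        # Sign step that increases the loss, then project into the eps-ball and [0, 1].
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, grad)]
        x_adv = [min(xo + eps, max(xo - eps, xa)) for xo, xa in zip(x, x_adv)]
        x_adv = [min(1.0, max(0.0, xa)) for xa in x_adv]
    return x_adv
```

With `x = [0.5, 0.5, 0.5, 0.5]`, weights `w = [1.0, -1.0, 0.5, 2.0]`, and true label 1, each pixel moves against the sign of its weight and lowers the toy model's logit; averaging over brightness factors is meant to keep each step from depending on one particular rendering of the image.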

Boosting Adversarial Attacks on Neural Networks with Better Optimizer

Security and Communication Networks ◽

10.1155/2021/9983309 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Heng Yin ◽

Hengwei Zhang ◽

Jindong Wang ◽

Ruiyu Dou

Keyword(s):

Neural Networks ◽

Success Rate ◽

Gradient Descent ◽

State Of The Art ◽

Black Box ◽

Security Threats ◽

Gradient Descent Algorithm ◽

Gradient Based ◽

Adversarial Examples ◽

Fast Gradient

Convolutional neural networks have outperformed humans in image recognition tasks, but they remain vulnerable to adversarial examples. Because these examples are crafted by adding imperceptible noise to normal images, their existence poses potential security threats to deep learning systems. Sophisticated adversarial examples with strong attack performance can also be used as a tool to evaluate the robustness of a model. However, the success rate of adversarial attacks in black-box environments leaves room for improvement. Therefore, this study combines a modified Adam gradient descent algorithm with an iterative gradient-based attack method, yielding the Adam iterative fast gradient method, which improves the transferability of adversarial examples. Extensive experiments on ImageNet show that the proposed method achieves a higher attack success rate than existing iterative methods. By extending our method, we achieve a state-of-the-art attack success rate of 95.0% on defense models.
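
A rough sketch of plugging Adam-style moment estimates into iterative FGSM follows. The abstract does not describe the authors' modification of Adam, so this uses the standard Adam update (bias-corrected first and second moments) only to choose the sign direction; the toy logistic model, the name `adam_ifgsm`, and the defaults `beta1=0.9`, `beta2=0.999` are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def input_grad(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input of a toy logistic model."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def adam_ifgsm(x, y, w, b, eps=0.1, steps=10, beta1=0.9, beta2=0.999, tiny=1e-8):
    """I-FGSM whose step direction comes from standard (unmodified) Adam moments."""
    alpha = eps / steps
    m = [0.0] * len(x)  # first-moment (mean) estimate of the gradient
    v = [0.0] * len(x)  # second-moment (uncentered variance) estimate
    x_adv = list(x)
    for t in range(1, steps + 1):
        g = input_grad(x_adv, y, w, b)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        # Bias-corrected Adam direction, then the FGSM-style sign step.
        direction = [(mi / (1 - beta1 ** t)) / (math.sqrt(vi / (1 - beta2 ** t)) + tiny)
                     for mi, vi in zip(m, v)]
        x_adv = [xa + alpha * sign(d) for xa, d in zip(x_adv, direction)]
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = [min(xo + eps, max(xo - eps, xa)) for xo, xa in zip(x, x_adv)]
        x_adv = [min(1.0, max(0.0, xa)) for xa in x_adv]
    return x_adv
```

Because the sign operator discards magnitude, Adam's per-coordinate scaling only matters where the accumulated moment changes sign, which is how a momentum-like optimizer can stabilize the update direction across iterations.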

Improving the Transferability of Adversarial Examples With a Noise Data Enhancement Framework and Random Erasing

Frontiers in Neurorobotics ◽

10.3389/fnbot.2021.784053 ◽

2021 ◽

Vol 15 ◽

Author(s):

Pengfei Xie ◽

Shuhao Shi ◽

Shuai Yang ◽

Kai Qiao ◽

Ningning Liang ◽

...

Keyword(s):

Success Rate ◽

Deep Neural Networks ◽

Black Box ◽

Excellent Performance ◽

Training Models ◽

Noise Data ◽

Adversarial Examples ◽

Adversarial Training ◽

Fast Gradient ◽

Sign Method

Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples. Black-box transfer attacks, which require no access to the target model, pose a massive threat to AI applications. At present, the most effective black-box attack methods mainly adopt data enhancement techniques such as input transformation. Previous data enhancement frameworks only work for input transformations that satisfy accuracy or loss invariance; they do not work for transformations that violate these conditions, such as transformations that lose information. To solve this problem, we propose a new noise data enhancement framework (NDEF), which transforms only the adversarial perturbation and thereby avoids these issues. In addition, we introduce random erasing under this framework to prevent overfitting of adversarial examples. Experimental results show that the black-box attack success rate of our method, the Random Erasing Iterative Fast Gradient Sign Method (REI-FGSM), is 4.2% higher than that of DI-FGSM on average across six models and 6.6% higher across three defense models. REI-FGSM can be combined with other methods to achieve excellent performance: when combined with REI-FGSM, the attack performance of SI-FGSM improves by 22.9% on average. In addition, our combination with DI-TI-MI-FGSM, i.e., DI-TI-MI-REI-FGSM, achieves an average attack success rate of 97.0% against three ensemble adversarially trained models, exceeding current gradient-based iterative attack methods. We also introduce Gaussian blur to demonstrate the compatibility of our framework.
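
The distinguishing feature of the noise data enhancement framework is that the transformation is applied to the perturbation rather than to the whole image. A minimal sketch of that idea with random erasing follows, assuming a hypothetical toy logistic model, a 1-D random erasing that zeroes one contiguous window of the perturbation (standing in for a 2-D rectangle), and illustrative names such as `erase_mask` and `rei_fgsm_sketch`; it is not the published REI-FGSM implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def input_grad(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input of a toy logistic model."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def erase_mask(n, frac, rng):
    """Hypothetical 1-D random erasing: zero out one random contiguous window."""
    width = max(1, int(frac * n))
    start = rng.randrange(0, n - width + 1)
    return [0.0 if start <= i < start + width else 1.0 for i in range(n)]


def rei_fgsm_sketch(x, y, w, b, eps=0.1, steps=10, copies=4, frac=0.25, seed=0):
    """Iterative FGSM where only the perturbation (not the image) is randomly erased."""
    rng = random.Random(seed)
    alpha = eps / steps
    delta = [0.0] * len(x)
    for _ in range(steps):
        grad = [0.0] * len(x)
        for _ in range(copies):
            mask = erase_mask(len(x), frac, rng)
            # Evaluate the loss at x + T(delta); the clean image x is left untouched.
            x_t = [xi + mi * di for xi, mi, di in zip(x, mask, delta)]
            g = input_grad(x_t, y, w, b)
            # d/d(delta) of the loss at x + mask * delta is mask * g.
            grad = [gi + mi * gti for gi, mi, gti in zip(grad, mask, g)]
        delta = [d + alpha * sign(gi) for d, gi in zip(delta, grad)]
        # Keep delta in the eps-ball and x + delta inside [0, 1].
        delta = [max(-eps, min(eps, d)) for d in delta]
        delta = [min(1.0, max(0.0, xi + d)) - xi for xi, d in zip(x, delta)]
    return [xi + d for xi, d in zip(x, delta)]
```

Since the clean image is never erased, the transformation cannot destroy the image content the classifier relies on, which is the abstract's argument for why information-losing transforms become usable inside this framework.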

Enhancing adversarial attack transferability with multi-scale feature attack

International Journal of Wavelets, Multiresolution and Information Processing ◽

10.1142/s0219691320500769 ◽

2020 ◽

pp. 2050076

Author(s):

Caixia Sun ◽

Lian Zou ◽

Cien Fan ◽

Yu Shi ◽

Yifeng Liu

Keyword(s):

Internal Representation ◽

Feature Space ◽

Source Model ◽

Black Box ◽

Space Representation ◽

Scale Feature ◽

Multi Scale ◽

Adversarial Examples ◽

Fast Gradient ◽

Sign Method

Deep neural networks are vulnerable to adversarial examples, which can fool models through carefully designed perturbations. An intriguing phenomenon is that adversarial examples often exhibit transferability, which makes black-box attacks effective in real-world applications. However, adversarial examples generated by existing methods typically overfit the structure and feature representation of the source model, resulting in a low black-box success rate. To address this issue, we propose the multi-scale feature attack to boost attack transferability; it adjusts the internal feature-space representation of the adversarial image to move it far from the internal representation of the original image. We show that a low-level layer and a high-level layer of the source model can be selected to conduct the perturbations, so that the crafted adversarial examples differ from the original images not only in class but also in their feature-space representations. To improve transferability further, we apply a reverse cross-entropy loss that reduces overfitting and show that it is effective for attacking adversarially trained models with strong defensive ability. Extensive experiments show that the proposed methods consistently outperform the iterative fast gradient sign method (IFGSM) and the momentum iterative fast gradient sign method (MIFGSM) in the challenging black-box setting.
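
The feature-space part of the multi-scale attack can be illustrated by maximizing the distance between adversarial and original representations at one low-level and one high-level layer. This sketch assumes two hypothetical linear layers (`low`, then `high`), a random start (the gradient of the distance is exactly zero at the clean image), and an iterative sign-ascent update; it omits the classification loss and the reverse cross-entropy term described in the abstract, and all names are illustrative.

```python
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matTvec(m, u):
    # Computes m^T u for a matrix stored as a list of rows.
    return [sum(m[r][c] * u[r] for r in range(len(m))) for c in range(len(m[0]))]


def sign(v):
    return (v > 0) - (v < 0)


def feature_distance(x_adv, x, low, high):
    """Squared distance at a low-level layer h1 = low @ x and a high-level layer h2 = high @ h1."""
    d = [a - o for a, o in zip(x_adv, x)]
    h1 = matvec(low, d)    # linear layers, so h1(x_adv) - h1(x) = low @ d
    h2 = matvec(high, h1)  # likewise for the high-level layer
    return sum(v * v for v in h1) + sum(v * v for v in h2), h1, h2


def multi_scale_attack(x, low, high, eps=0.1, steps=10, seed=0):
    rng = random.Random(seed)
    alpha = eps / steps
    # Random start: at x_adv == x the feature-distance gradient is exactly zero.
    x_adv = [xi + rng.uniform(-eps / 2, eps / 2) for xi in x]
    for _ in range(steps):
        _, h1, h2 = feature_distance(x_adv, x, low, high)
        # Gradient of |h1|^2 + |h2|^2 w.r.t. x_adv: 2 * low^T (h1 + high^T h2).
        back = [a + c for a, c in zip(h1, matTvec(high, h2))]
        grad = [2 * g for g in matTvec(low, back)]
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        x_adv = [min(xo + eps, max(xo - eps, xa)) for xo, xa in zip(x, x_adv)]
        x_adv = [min(1.0, max(0.0, xa)) for xa in x_adv]
    return x_adv
```

Summing the distances from two depths is one simple way to make the perturbation disturb both low-level and high-level features rather than only the final logits.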

On the Effectiveness of Adversarial Training in Defending against Adversarial Example Attacks for Image Classification

Applied Sciences ◽

10.3390/app10228079 ◽

2020 ◽

Vol 10 (22) ◽

pp. 8079

Author(s):

Sanglee Park ◽

Jungmin So

Keyword(s):

Data Augmentation ◽

Black Box ◽

Training Data ◽

Model Parameters ◽

Neural Network Models ◽

Practical Applications ◽

Target Network ◽

Adversarial Examples ◽

Adversarial Training ◽

Adversarial Example

State-of-the-art neural network models are actively used in various fields, but they are well known to be vulnerable to adversarial example attacks. Efforts to make models robust against such attacks have shown this to be a very difficult task. While many defense approaches have been shown to be ineffective, adversarial training remains a promising method. In adversarial training, the training data are augmented with “adversarial” samples generated by an attack algorithm. If the attacker uses a similar attack algorithm to generate adversarial examples, the adversarially trained network can be quite robust to the attack. However, there are numerous ways of creating adversarial examples, and the defender does not know which algorithm the attacker may use. A natural question is: can adversarial training produce a model robust to multiple types of attack? Previous work has shown that, when a network is trained with adversarial examples generated by multiple attack methods, it is still vulnerable to white-box attacks, in which the attacker has complete access to the model parameters. In this paper, we study this question in the context of black-box attacks, which can be a more realistic assumption for practical applications. Experiments on the MNIST dataset show that adversarially training a network with one attack method helps defend against that particular method but has limited effect against other attack methods. In addition, even if the defender trains a network with multiple types of adversarial examples and the attacker uses one of those methods, the network can still lose accuracy if the attacker applies a different data augmentation strategy to the target network. These results show that it is very difficult to make a robust network using adversarial training, even in black-box settings where the attacker has restricted information about the target network.
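
The adversarial training loop described above, in which each batch is augmented with adversarial samples crafted against the current model, can be sketched on a toy problem. Assumptions: a hypothetical full-batch logistic regression on 2-D points, one-step FGSM as the single attack, and illustrative hyperparameters (`eps=0.3`, `lr=0.5`, 100 epochs); the MNIST experiments in the paper are not reproduced here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def fgsm(x, y, w, b, eps):
    """One-step FGSM against a logistic model: move eps along the sign of the input gradient."""
    p = sigmoid(dot(w, x) + b)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]


def train(data, adversarial=True, eps=0.3, epochs=100, lr=0.5):
    """Full-batch logistic regression; optionally augment each batch with FGSM samples."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        batch = list(data)
        if adversarial:
            # Adversarial samples are crafted against the current model and keep their label.
            batch += [(fgsm(x, y, w, b, eps), y) for x, y in data]
        gw = [0.0] * len(w)
        gb = 0.0
        for x, y in batch:
            err = sigmoid(dot(w, x) + b) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(batch) for wi, g in zip(w, gw)]
        b -= lr * gb / len(batch)
    return w, b


def predict(x, w, b):
    return 1 if dot(w, x) + b > 0 else 0
```

Training with samples from a single attack mirrors the setting the abstract warns about: the model is fitted to that attack's perturbations, which by itself says little about robustness to other attack methods.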

A Hybrid Adversarial Attack for Different Application Scenarios

Applied Sciences ◽

10.3390/app10103559 ◽

2020 ◽

Vol 10 (10) ◽

pp. 3559

Author(s):

Xiaohu Du ◽

Jie Yu ◽

Zibo Yi ◽

Shasha Li ◽

Jun Ma ◽

...

Keyword(s):

Deep Learning ◽

Success Rate ◽

Black Box ◽

De Algorithm ◽

Word Level ◽

Text Readability ◽

Gradient Based ◽

Adversarial Examples ◽

Adversarial Attack ◽

Cosine Distance

Adversarial attacks against natural language have become a hot topic in artificial intelligence security in recent years. Research in this area mainly studies methods for generating adversarial examples, with the aim of better addressing the vulnerability and security of deep learning systems. Depending on whether the attacker knows the structure of the deep learning model, adversarial attacks are divided into black-box and white-box attacks. In this paper, we propose a hybrid adversarial attack for different application scenarios. First, we propose a novel black-box attack based on the differential evolution (DE) algorithm that generates semantically and syntactically similar adversarial examples to fool a word-level sentiment classifier. Compared with existing genetic-algorithm-based adversarial attacks, our algorithm achieves a higher attack success rate while maintaining a lower word replacement rate: at a 10% word substitution threshold, it increases the attack success rate from 58.5% to 63%. Second, for the case where the model architecture and parameters are known, we propose a white-box attack with gradient-based perturbation against the same sentiment classifier. In this attack, we use a metric combining Euclidean distance and cosine distance to find the most semantically and syntactically similar substitution, and we introduce a coefficient of variation (CV) factor to control the dispersion of the modified words in the adversarial examples. More dispersed modifications are less perceptible to humans and keep the text more readable. Compared with the existing global attack, our attack achieves a higher attack success rate and spreads the modification positions in generated examples more widely, raising the global search success rate from 75.8% to 85.8%.
Finally, these two attack methods together cover different application scenarios: whether or not the internal structure and parameters of the model are known, we can generate effective adversarial examples.
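
Two ingredients of the white-box attack, a combined Euclidean/cosine distance for choosing a substitute word and a coefficient of variation (CV) for measuring how dispersed the modified positions are, can be sketched as below. The abstract gives neither formula, so the equal weighting `lam=0.5`, the choice to compute the CV over gaps between modified positions, and the function names are assumptions.

```python
import math
import statistics


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def combined_distance(u, v, lam=0.5):
    # Hypothetical weighting; the abstract does not state how the two metrics are combined.
    return lam * math.dist(u, v) + (1 - lam) * cosine_distance(u, v)


def best_substitute(target, candidates, lam=0.5):
    """Pick the candidate word whose embedding is closest to the target under the combined metric."""
    return min(candidates, key=lambda word: combined_distance(target, candidates[word], lam))


def gap_cv(positions):
    """Coefficient of variation (std / mean) of the gaps between modified word positions."""
    pos = sorted(set(positions))
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    if len(gaps) < 2:
        return 0.0
    return statistics.pstdev(gaps) / statistics.mean(gaps)
```

Under this reading, evenly spaced edits give a CV of 0 while clustered edits give a large CV, so penalizing the CV would push modifications apart.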

Generate Adversarial Examples by Nesterov-momentum Iterative Fast Gradient Sign Method

2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS) ◽

10.1109/icsess49938.2020.9237700 ◽

2020 ◽

Author(s):

Jin Xu

Keyword(s):

Adversarial Examples ◽

Fast Gradient ◽

Sign Method

Argot: Generating Adversarial Readable Chinese Texts

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/351 ◽

2020 ◽

Author(s):

Zihan Zhang ◽

Mingxuan Liu ◽

Chao Zhang ◽

Yiming Zhang ◽

Zhou Li ◽

...

Keyword(s):

Image Processing ◽

Natural Language Processing ◽

Success Rate ◽

Language Processing ◽

Black Box ◽

Essential Step ◽

Chinese Characteristics ◽

Adversarial Examples ◽

Chinese Texts ◽

Chinese And English

Natural language processing (NLP) models are known to be vulnerable to adversarial examples, as are image processing models. Studying adversarial texts is an essential step toward improving the robustness of NLP models. However, existing studies mainly focus on analyzing English texts and generating adversarial examples for them. No prior work studies the feasibility and effect of transferring this approach to another language, e.g., Chinese. In this paper, we analyze the differences between Chinese and English and explore how to adapt existing English adversarial generation methods to Chinese. We propose Argot, a novel black-box solution for generating adversarial Chinese texts that combines methods for generating adversarial English samples with several novel methods based on characteristics of Chinese. Argot can effectively and efficiently generate adversarial Chinese texts with good readability. Furthermore, Argot can automatically generate targeted Chinese adversarial texts, achieving a high success rate while preserving readability.

Generating adversarial examples without specifying a target model

PeerJ Computer Science ◽

10.7717/peerj-cs.702 ◽

2021 ◽

Vol 7 ◽

pp. e702

Author(s):

Gaoming Yang ◽

Mingwei Li ◽

Xianjing Fang ◽

Ji Zhang ◽

Xingzhu Liang

Keyword(s):

Deep Learning ◽

Success Rate ◽

Black Box ◽

Time Cost ◽

Learning Models ◽

Security Threat ◽

Practical Situation ◽

Data Set ◽

Target Model ◽

Adversarial Examples

Adversarial examples are regarded as a security threat to deep learning models, and there are many ways to generate them. However, most existing methods require query access to the target model while they run. In practice, an attacker who issues too many queries is easily detected, and this problem is especially pronounced in the black-box setting. To solve this problem, we propose the Attack Without a Target Model (AWTM). Our algorithm does not specify any target model when generating adversarial examples, so it does not need to query the target. Experimental results show that it achieves a maximum attack success rate of 81.78% on the MNIST dataset and 87.99% on the CIFAR-10 dataset. In addition, as a GAN-based method, it has a low time cost.

Joint Character-Level Word Embedding and Adversarial Stability Training to Defend Adversarial Text

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6356 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8384-8391

Author(s):

Hui Liu ◽

Yongzheng Zhang ◽

Yipeng Wang ◽

Zheng Lin ◽

Yige Chen

Keyword(s):

Language Processing ◽

Text Classification ◽

State Of The Art ◽

Word Embedding ◽

Data Sets ◽

Basic Task ◽

Gradient Based ◽

Adversarial Examples ◽

Stability Training ◽

Adversarial Example

Text classification is a basic task in natural language processing, but small character perturbations in words can greatly reduce the effectiveness of text classification models; this is known as a character-level adversarial example attack. Defending against character-level adversarial examples faces two main challenges: out-of-vocabulary words in the word embedding model, and the distribution difference between training and inference. Both challenges make character-level adversarial examples difficult to defend against. In this paper, we propose a framework that jointly uses character embedding and adversarial stability training to overcome these two challenges. Experimental results on five text classification datasets show that models based on our framework can effectively defend against character-level adversarial examples: they defend against 93.19% of gradient-based adversarial examples and 94.83% of natural adversarial examples, outperforming state-of-the-art defense models.
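
One way to see why character-level embeddings help with out-of-vocabulary misspellings is to build a word vector from character n-grams, so that "terrible" and "terrib1e" share most of their components. This sketch is in the spirit of subword embeddings and is not necessarily the paper's architecture: the trigram size, the deterministic pseudo-random n-gram vectors (standing in for learned character embeddings), and the `stability_penalty` helper, a hypothetical stand-in for the stability-training term, are all assumptions.

```python
import math
import random
import zlib


def char_ngrams(word, n=3):
    padded = f"<{word}>"  # boundary markers so prefixes and suffixes get their own n-grams
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_vector(gram, dim):
    # Deterministic pseudo-random vector per n-gram (stand-in for learned character embeddings).
    rng = random.Random(zlib.crc32(gram.encode("utf-8")))
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def word_vector(word, dim=256):
    """Embed any word, including out-of-vocabulary misspellings, as the mean of its n-gram vectors."""
    grams = char_ngrams(word)
    total = [0.0] * dim
    for gram in grams:
        total = [t + g for t, g in zip(total, ngram_vector(gram, dim))]
    return [t / len(grams) for t in total]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def stability_penalty(clean, perturbed):
    """Squared distance between embeddings of a clean word and its perturbed variant."""
    return sum((a - b) ** 2 for a, b in zip(word_vector(clean), word_vector(perturbed)))
```

"terrible" and "terrib1e" share five of their eight trigrams, so their vectors stay close even though "terrib1e" would be out of vocabulary for a word-level embedding table; a stability term of this kind would additionally push the classifier to treat such pairs alike.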

Opportunistic Use of Crowdsourced Workers for Online Relabeling of Potential Adversarial Examples

10.36227/techrxiv.17088941.v1 ◽

2021 ◽

Author(s):

Shawqi Al-Maliki ◽

Faissal El Bouanani ◽

Kashif Ahmad ◽

Mohamed Abdallah ◽

Dinh Hoang ◽

...

Keyword(s):

Language Processing ◽

Deep Neural Networks ◽

Black Box ◽

Threshold Selection ◽

Selection Algorithm ◽

Comparable Performance ◽

Adversarial Examples ◽

System Robustness ◽

Adversarial Example ◽

The Impact

Deep Neural Networks (DNNs) have achieved tremendous success in handling various Machine Learning (ML) tasks, such as speech recognition, Natural Language Processing, and image classification. However, they are vulnerable to well-designed inputs called adversarial examples. Researchers in industry and academia have proposed many adversarial example defense techniques, but none provides complete robustness; even cutting-edge defense techniques offer only partial reliability. Thus, complementing them with another layer of protection is a must, especially for mission-critical applications. This paper proposes a novel Online Selection and Relabeling Algorithm (OSRA) that opportunistically uses a limited number of crowdsourced workers (budget-constrained crowdsourcing) to maximize the ML system’s robustness. OSRA strives to use crowdsourced workers effectively by selecting the most suspicious inputs (the potential adversarial examples) and forwarding them to the crowdsourced workers to be validated and corrected (relabeled). As a result, the impact of adversarial examples is reduced, and the ML system becomes more robust accordingly. We also propose a heuristic threshold selection method that helps enhance the prediction system’s reliability. We empirically validated our proposed algorithm and found that it can efficiently and optimally utilize the allocated budget for crowdsourcing. It can also be effectively integrated with a state-of-the-art black-box (transfer-based) defense technique, resulting in a more robust system. Simulation results show that OSRA can outperform a random selection algorithm by 60% and achieve performance comparable to an optimal offline selection benchmark. They also show that OSRA’s performance is positively correlated with system robustness.
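
OSRA's online, budget-constrained selection can be sketched as a single pass over a stream of suspicion scores. The abstract does not specify how scores or thresholds are computed, so the warm-up-quantile heuristic in `heuristic_threshold` and both function names are hypothetical; the scores are assumed to come from some upstream detector.

```python
def heuristic_threshold(warmup_scores, budget, expected_stream_len):
    """Hypothetical heuristic: pick the score that roughly budget / stream_len of warm-up inputs reach."""
    ranked = sorted(warmup_scores, reverse=True)
    # Integer ceiling of budget * len(ranked) / expected_stream_len, at least 1.
    k = max(1, -((-budget * len(ranked)) // expected_stream_len))
    return ranked[min(k, len(ranked)) - 1]


def online_select(stream_scores, budget, threshold):
    """Forward an input to crowd workers when its suspicion score passes the threshold and budget remains."""
    selected = []
    for index, score in enumerate(stream_scores):
        if len(selected) == budget:
            break  # budget exhausted: remaining inputs keep the model's own prediction
        if score >= threshold:
            selected.append(index)
    return selected
```

For example, with a warm-up of ten scores from 0.1 to 1.0, a budget of 2, and an expected stream of 10 inputs, the heuristic sets the threshold at 0.9 so that about two inputs are expected to qualify; the online pass then commits to each decision as inputs arrive, which is what separates it from an offline benchmark that can see the whole stream first.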
