Altering Deep Neural Networks: The One Pixel Attack Explained

Small Perturbations to Fool Deep Neural Networks

Recent research has revealed that the output of deep neural networks (DNNs) can easily be altered by adding relatively small perturbations to the input vector.

In this article, we analyse an attack in an extremely limited scenario where only one pixel can be modified. For this purpose, a novel method for generating one-pixel adversarial perturbations based on differential evolution (DE) was proposed. It requires less adversarial information (a black-box attack) and can fool more types of networks due to the inherent features of DE.
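To make the idea concrete, the following is a minimal sketch of how such a DE-based one-pixel attack could be set up with SciPy's generic differential evolution optimiser. It assumes a Keras-like classifier whose `predict` method returns class probabilities for 32x32 RGB images with pixel values in [0, 255]; the hyper-parameters are illustrative, not the authors' exact implementation.

```python
# Minimal sketch of a DE-driven one-pixel attack (illustrative, not the paper's code).
import numpy as np
from scipy.optimize import differential_evolution

def perturb(image, candidate):
    """Apply a one-pixel perturbation encoded as (x, y, r, g, b)."""
    x, y, r, g, b = candidate
    adv = image.copy()
    adv[int(x), int(y)] = (r, g, b)
    return adv

def one_pixel_attack(model, image, true_label):
    # Each DE candidate is a 5-D vector: pixel coordinates plus RGB values.
    bounds = [(0, 31), (0, 31), (0, 255), (0, 255), (0, 255)]

    def fitness(candidate):
        # Non-targeted objective: DE minimises the probability of the true class.
        probs = model.predict(perturb(image, candidate)[np.newaxis, ...])[0]
        return probs[true_label]

    # popsize is a multiplier on the number of parameters (5), i.e. 80 * 5 = 400 candidates.
    result = differential_evolution(fitness, bounds, maxiter=100,
                                    popsize=80, recombination=1.0)
    return perturb(image, result.x)
```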

Results of Vulnerability

The results show that 67.97% of the natural images in Kaggle CIFAR-10 test dataset and 16.04% of the ImageNet (ILSVRC 2012) test images can be perturbed to at least one target class by modifying just one pixel with 74.03% and 22.91% confidence on average. We also show the same vulnerability on the original CIFAR-10 dataset.

One Pixel Attack

Several studies have revealed that artificial perturbations on natural images can easily make DNN misclassify and accordingly proposed effective algorithms for generating such samples called “adversarial images”. A common idea for creating adversarial images is adding a tiny amount of well-tuned additive perturbation, which is expected to be imperceptible to human eyes, to a correctly classified natural image. Such modification can cause the classifier to label the modified image as a completely different class.
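As a rough illustration of this idea, the sketch below checks whether an additive perturbation flips a classifier's prediction. `model` is assumed to be a Keras-like classifier returning class probabilities, and `delta` a perturbation with the same shape as the image; both are hypothetical placeholders.

```python
import numpy as np

def is_adversarial(model, image, delta):
    """Return True if adding `delta` changes the model's predicted class."""
    clean_label = np.argmax(model.predict(image[np.newaxis, ...])[0])
    adv = np.clip(image + delta, 0, 255)            # keep pixel values in the valid range
    adv_label = np.argmax(model.predict(adv[np.newaxis, ...])[0])
    return adv_label != clean_label
```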

In much of the prior work, the perturbation is spread over about 4% of the total pixels and can be obvious to human eyes. Since adversarial pixel perturbation has become a common way of generating adversarial images, such abnormal "noise" might be recognised by an expert eye.

Advantages of the Approach

Effectiveness

On the Kaggle CIFAR-10 dataset, non-targeted attacks can be launched by modifying only one pixel on three common deep neural network structures, with success rates of 68.71%, 71.66% and 63.53% (higher than on the original CIFAR-10 dataset and on ImageNet). In addition, each natural image can be perturbed to 1.8, 2.1 and 1.5 other classes on average.

Semi Black-Box Attack

This approach only requires black-box feedback (probability labels). It does not require any inner information of target DNNs such as gradients and network structures.
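The sketch below illustrates what "black-box feedback" means in practice: the objective optimised by DE only consumes the probability vector returned by the model. `query_probs` is a hypothetical wrapper around the target model's prediction call, and `perturb` is the one-pixel perturbation from the earlier sketch; this is not the authors' exact code.

```python
def targeted_fitness(candidate, image, target_class, query_probs):
    # Black-box objective: only probability labels are used, no gradients or weights.
    probs = query_probs(perturb(image, candidate))
    return -probs[target_class]   # DE minimises, so negate to maximise the target probability
```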

Flexibility

This approach can be used to target more types of DNNs.

Kaggle CIFAR-10

The Kaggle CIFAR-10 test dataset was used. The dataset contains 300,000 CIFAR-10 images which can be visually inspected to have the following modifications: duplication, rotation, clipping, blurring, adding a few random bad pixels and so on.

Three common types of networks were trained as target image classifiers on the CIFAR-10 dataset: the All Convolutional Network (AllConv), Network in Network (NiN) and VGG16.

The network settings were kept as similar as possible to the original implementations, with a few modifications made to obtain the highest classification accuracy.

Both targeted and non-targeted attack scenarios are considered. For each attack on the three types of neural networks, 500 natural images are randomly selected from the Kaggle CIFAR-10 test dataset.

An additional experiment is conducted on the All Convolutional Network by generating 500 adversarial images with three- and five-pixel modifications. The objective is to compare the one-pixel attack with three- and five-pixel attacks.
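For reference, the same candidate encoding extends naturally to multi-pixel attacks: a candidate simply concatenates one (x, y, r, g, b) tuple per modified pixel. The sketch below uses hypothetical helper names and is only meant to show the generalisation.

```python
import numpy as np

def perturb_k(image, candidate):
    """Apply a k-pixel perturbation encoded as k concatenated (x, y, r, g, b) tuples."""
    adv = image.copy()
    for x, y, r, g, b in np.asarray(candidate).reshape(-1, 5):
        adv[int(x), int(y)] = (r, g, b)
    return adv

bounds_3px = [(0, 31), (0, 31), (0, 255), (0, 255), (0, 255)] * 3   # three-pixel attack
```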

For each natural image, nine targeted attacks are launched, trying to perturb it to each of the other nine classes. Overall, this leads to a total of 36,000 adversarial images.
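A sketch of this evaluation loop, with hypothetical `attack_targeted`, `images` and `labels` placeholders (where `attack_targeted` could wrap the DE routine with the targeted fitness above), might look as follows.

```python
def run_targeted_attacks(images, labels, attack_targeted, num_classes=10):
    adversarial_images = []
    for image, true_label in zip(images, labels):      # e.g. 500 sampled test images
        for target in range(num_classes):              # CIFAR-10 has 10 classes
            if target == true_label:
                continue                               # only the other nine classes are targets
            adversarial_images.append(attack_targeted(image, target))
    return adversarial_images
```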

Success Rates and Adversarial Probability Labels

Kaggle CIFAR-10

On Kaggle CIFAR-10, the success rates of one-pixel attacks on the three types of networks show the generalised effectiveness of the proposed attack across different network structures. On average, each image can be perturbed to about two target classes for each network.

In addition, by increasing the number of pixels that can be modified to three and five, the number of target classes that can be reached also increases significantly.

By dividing the adversarial probability labels by the success rates, confidence values of 79.39%, 79.17% and 77.09% are obtained for one-, three- and five-pixel attacks respectively.
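Assuming this confidence is the average probability assigned to the adversarial class over successful attacks (one reading of "dividing the adversarial probability labels by the success rates"), a minimal sketch of the computation would be:

```python
def mean_confidence(results):
    """`results` is a hypothetical list of (succeeded, adversarial_probability) pairs."""
    winning_probs = [p for succeeded, p in results if succeeded]
    return sum(winning_probs) / len(winning_probs)   # average target-class probability over successes
```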

ImageNet

On ImageNet, the results show that the one-pixel attack generalises well to large images and fools the corresponding neural networks. There is a 16.04% chance that an arbitrary ImageNet test image can be perturbed to a target class with 22.91% confidence.

In each successful attack, the probability label of the target class is the highest. Therefore, although the confidence of 22.91% is relatively low, it tells us that the remaining 999 classes receive even lower probabilities, close to a uniform soft-label distribution.

The low confidence is caused by the fact that a non-targeted evaluation was used, one that only focuses on decreasing the probability of the true class. Other fitness functions should give different results.

Number of Target Classes

The results also show that a fair number of natural images can be perturbed to two, three or four target classes with only a one-pixel modification. When the number of modified pixels is increased, perturbation to more target classes becomes highly probable.

This suggests that all three networks (AllConv network, NiN and VGG16) are vulnerable to this type of attack.

Implications for the Future

The DE utilised in this research belongs to a broad class of algorithms called evolutionary strategies, which includes other variants such as adaptive DE and the covariance matrix adaptation evolution strategy (CMA-ES).

There are a couple of recent developments in evolutionary strategies and related areas that could further improve the current method, allowing for more efficient and accurate attacks. Furthermore, evolutionary computation also provides some promising approaches to solve adversarial machine learning related vulnerabilities.

In addition, the one-pixel attack could potentially be extended to other domains such as natural language processing and speech recognition, which is also left for future work.

References

Source: Su, J., Vargas, D. V., & Sakurai, K. (2019). Attacking convolutional neural network using differential evolution. IPSJ Transactions on Computer Vision and Applications, 11(1), Article 1.
