Abstract

We show that adversarial examples, i.e., visually imperceptible perturbations that cause Convolutional Neural Networks (CNNs) to fail, can be alleviated with a mechanism based on foveations, that is, applying the CNN to a different image region. To see this, we first report results on ImageNet that lead to a revision of the hypothesis that adversarial perturbations are a consequence of CNNs acting too linearly: a CNN acts locally linearly only with respect to changes in the receptive fields that contain objects recognized by the CNN; otherwise, the CNN acts non-linearly. Then, we corroborate the hypothesis that, when the neural responses are in the linear region, applying the foveation mechanism to the adversarial example tends to reduce the effect of the perturbation. This is because CNNs on ImageNet are robust to the changes produced by the foveation (scale and translation of the recognized objects), but this property does not generalize to transformations of the perturbation.

Resources

Paper

Yan Luo, Xavier Boix, Gemma Roig, Tomaso A. Poggio and Qi Zhao, "Foveation-based Mechanisms Alleviate Adversarial Examples," ICLR 2016, under review. [pdf] [bib]

Example of different CNNs' minimum perturbations

Each row corresponds to a different CNN. (a) The BFGS perturbation, (b) its corresponding adversarial example, and (c) the adversarial example with the perturbation multiplied by 10; (d) the Sign perturbation, and (e) and (f) the same as (b) and (c), respectively, for Sign. For details of the BFGS and Sign perturbations, please refer to the paper.

Review of the properties of adversarial examples

Accuracy when Varying the $L_{1}$ Norm per Pixel of the Perturbation. Accuracy for the three CNNs we evaluate. We denote the perturbation as X Y, where X is the network that generated the perturbation (ALX for AlexNet, GNT for GoogLeNet, or VGG) and Y indicates whether the perturbation is BFGS or Sign.
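
As a rough illustration of the quantity being varied, the sketch below computes an $L_{1}$ norm per pixel and rescales a perturbation to a chosen value. It assumes that "per pixel" means the mean absolute value over all pixels and channels, and that Sign denotes a perturbation proportional to the sign of a gradient; both are assumptions rather than the paper's definitions, and all function names are hypothetical.

```python
import numpy as np


def l1_norm_per_pixel(perturbation):
    """Mean absolute value over all entries (assumed meaning of 'per pixel')."""
    return float(np.abs(perturbation).mean())


def rescale_to_norm(perturbation, target):
    """Scale a perturbation so that its L1 norm per pixel equals `target`."""
    current = l1_norm_per_pixel(perturbation)
    if current == 0.0:
        raise ValueError("cannot rescale an all-zero perturbation")
    return perturbation * (target / current)


def sign_perturbation(gradient, target):
    """Sign-style perturbation (assumed form): each entry is +/- `target`.

    Entries where the gradient is exactly zero stay at zero, so the norm
    per pixel equals `target` only when the gradient has no zero entries.
    """
    return target * np.sign(gradient)
```

Sweeping `target` over a range of values and measuring accuracy at each one would give a curve of the kind described in the caption.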

Role of the target object location in the perturbation

Accuracy of the masked perturbations. Accuracy for the three CNNs we evaluate, when varying the norm of BFGS.
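
One way to picture a masked perturbation is to keep the perturbation only inside, or only outside, the bounding box of the target object. The sketch below assumes that form of masking, a height x width x channels array layout, and a hypothetical `(top, left, bottom, right)` box format; the exact masking used in the experiments is described in the paper.

```python
import numpy as np


def mask_perturbation(perturbation, box, keep_inside=True):
    """Zero a (H, W, C) perturbation outside (or inside) a bounding box.

    `box` is (top, left, bottom, right) with exclusive bottom/right;
    this format is an assumption made for the example.
    """
    top, left, bottom, right = box
    mask = np.zeros(perturbation.shape[:2], dtype=bool)
    mask[top:bottom, left:right] = True
    if not keep_inside:
        mask = ~mask
    # Broadcast the 2-D mask over the channel axis.
    return perturbation * mask[..., None]
```

The two masked versions sum back to the original perturbation, so comparing their accuracies separates the contribution of the object region from that of the background.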

CNNs act locally linearly on the location of a recognized object

Local Linearity of the CNNs. Classification score of the ground truth object category for the image in the aforementioned figure. $w'$ is the transpose of the weights, $\epsilon$ is the perturbation ( $\epsilon^{\star}$ is the optimal perturbation), $x$ is an image, and $f$ is a CNN model. We can see that, as we increase the norm of the perturbation, the effect of the perturbation on the final classification score, $f(x+\epsilon^{\star})-f(x)$ , stops increasing at the same linear pace, because more ReLUs return a 0 value than before the norm was increased.
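
The behaviour described in this caption can be reproduced on a toy example. The sketch below is not a CNN: it uses a hypothetical two-unit ReLU network, $f(x) = v'\,\mathrm{ReLU}(Wx)$, chosen only so that the numbers are easy to follow. While the set of active ReLUs is unchanged, $f(x+\epsilon)-f(x)$ equals $w'\epsilon$ exactly; once a ReLU switches off, the actual change falls short of the linear prediction.

```python
import numpy as np


def relu_net(x, W, v):
    """Toy score function f(x) = v' ReLU(W x); a stand-in for a CNN."""
    return float(v @ np.maximum(W @ x, 0.0))


def local_linear_weights(x, W, v):
    """Weights w of the linear map that f follows around x."""
    active = (W @ x > 0).astype(float)  # which ReLUs currently pass signal
    return W.T @ (v * active)


# Hypothetical toy values, chosen for easy arithmetic.
W = np.array([[1.0, 0.0], [0.0, 1.0]])
v = np.array([1.0, 1.0])
x = np.array([1.0, 2.0])

w = local_linear_weights(x, W, v)
direction = -w  # lowers the score fastest under the linear model


def score_change(t):
    """Actual change f(x + t * direction) - f(x)."""
    return relu_net(x + t * direction, W, v) - relu_net(x, W, v)


def linear_prediction(t):
    """Change predicted by the local linear model, w' (t * direction)."""
    return float(w @ (t * direction))
```

With these values, a scale of `t = 0.5` leaves both ReLUs active and the two quantities agree, whereas at `t = 1.5` the first ReLU returns 0 and the actual change is smaller in magnitude than the linear prediction.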

Foveation-based mechanisms alleviate adversarial examples

Accuracy of the Foveation Mechanisms. Accuracy for the three CNNs we evaluate, when varying the $L_{1}$ norm per pixel of BFGS. In the following figure, we report the accuracy for different values of the norm of the perturbation before the foveation. All the foveation mechanisms we introduce improve the accuracy by $30\%\sim 40\%$ . For details of the experimental set-up, please refer to the paper.
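
In its simplest form, a foveation can be thought of as cropping an image region and rescaling it to the input size of the network before classification. The sketch below illustrates that idea only; it does not reproduce any of the specific mechanisms evaluated here. The box format, the nearest-neighbour resizing, and the `model` callable are all assumptions made for the example.

```python
import numpy as np


def foveate(image, box, out_size):
    """Crop `box` from the image and resize it to out_size x out_size.

    `box` is (top, left, bottom, right) with exclusive bottom/right
    (a hypothetical format). Resizing is nearest neighbour, chosen only
    to keep the example dependency-free.
    """
    top, left, bottom, right = box
    crop = image[top:bottom, left:right]
    h, w = crop.shape[:2]
    rows = np.arange(out_size) * h // out_size  # source row per output row
    cols = np.arange(out_size) * w // out_size  # source column per output column
    return crop[rows][:, cols]


def foveated_predict(model, image, box, out_size):
    """Apply a classifier (`model`, any callable on an image) to the foveated region."""
    return model(foveate(image, box, out_size))
```

Because the crop rescales and translates both the object and the perturbation, comparing `model(image)` with `foveated_predict(model, image, box, out_size)` on an adversarial example is one way to probe how much of the perturbation's effect survives the foveation.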

Evaluation of the Foveation Mechanisms. Increase in the norm of the perturbation needed to produce misclassification after applying the foveation mechanisms. Only the images that are correctly classified before and after the foveation are included.

Effect of the foveation $T(\cdot)$ on the classification scores of the perturbation and the target object.

Effect of the Foveation on the Perturbations. Cumulative histogram of the number of images with a change in classification score smaller than the value indicated on the horizontal axis. $T_{B}(\cdot)$ is the foveation with Object Crop MP, and $T_{S}(\cdot)$ is the foveation with 1 Crop MP-Object.