# Computing Flux.gradient changes the model

I don’t speak English well, so sorry for any mistakes.
I have an exercise on differentiable programming.
I have an image taken from MNIST and a fully connected NN that classifies that image correctly.
I then want to modify that image with differentiable programming until it is classified as 0.

``````using Flux
using BSON
using ForwardDiff

# model loading (the start of this function was cut off in the original post;
# it reads the pretrained network from a BSON file on disk)
function load_model()
    # ... BSON.load call elided ...
    m[:modello]
end
model = load_model()

function load_data_mnist()
    images = hcat([Float32.(i)[:] for i in Flux.Data.MNIST.images()]...)  # 784*60_000
    labels = Flux.onehotbatch(Flux.Data.MNIST.labels(), 0:9) .|> Float32  # 10 *60_000
    images, labels
end

function accuracy(a, b)
    prev(x) = argmax(x) - 1
    L = size(a, 2)  # number of columns (samples); size(a) alone returns a tuple
    right = 0
    wrong = 0
    for i in 1:L
        @inbounds if prev(a[:, i]) == prev(b[:, i])
            right += 1
        else
            wrong += 1
        end
    end
    100right / (wrong + right)
end

images, labels = load_data_mnist() # 784*60_000, 10 *60_000, don't need train/dev/test
model_accuracy = accuracy(model(images), labels) # 96.7%

index_image = 10
z = images[:, index_image:index_image] # image that will be classified as zero (4->0)

# check if the model classifies it correctly
labels[:, index_image] |> println # 4
model(z)               |> println # the model classifies the image as 4

zero_label = reshape([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], (10, 1))

loss(x) = Flux.mse(model(x), zero_label)
loss(z) |> println

opt = Descent() # optimiser (its definition was omitted from the snippet above)

for i in 1:100
    ∂loss = ForwardDiff.gradient(loss, z) # this works
    Flux.update!(opt, z, ∂loss)
end

model(z) |> println # the model classifies the modified image as 0
``````
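As a quick sanity check of the `accuracy` helper above, here is a standalone snippet (plain Julia, no Flux needed; the 2-class toy matrices are made up for illustration, and the definition is repeated with `size(a, 2)` so the loop bound is an integer):

```julia
# accuracy repeated here, slightly simplified, so the snippet is self-contained
function accuracy(a, b)
    prev(x) = argmax(x) - 1
    L = size(a, 2)
    right = 0
    wrong = 0
    for i in 1:L
        if prev(a[:, i]) == prev(b[:, i])
            right += 1
        else
            wrong += 1
        end
    end
    100right / (wrong + right)
end

a = [0.9 0.8; 0.1 0.2]  # predicted scores: both columns favour class 0
b = [1.0 0.0; 0.0 1.0]  # true labels: class 0, then class 1
println(accuracy(a, b)) # one of two columns matches -> 50.0
```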

If I use `Flux.gradient` the program does not work, and after the first call to `Flux.gradient` the model is broken:

``````for i in 1:100
    ∂loss = Flux.gradient(loss, z)  # this does not work and changes the model
    Flux.update!(opt, z, ∂loss)
end

# if I test the model accuracy I get
accuracy(model(images), labels) |> println # 9.8%
loss(z) |> println # NaN32
model(images) |> println # a matrix full of NaN; the model is broken
``````

I did not expect this.

This should be `∂loss = Flux.gradient(loss, z)` (no indexing).

Recall that `Flux.gradient` returns a “bag” of gradients (a `Grads` struct) when you pass it a bag of model parameters (`Params`, which is what `Flux.params` returns). This can only be indexed using the original parameter arrays because it uses object ids as keys. Writing `∂loss` doesn’t make any sense because it’s indexing into the params at some garbage key that may or may not exist. It’s like reading from a chunk of uninitialized memory: you may get something back, but it could be/do anything.

TL;DR `Flux.update!` already handles everything for you, never index into the result of `gradient` with some arbitrary value if you use `params`.
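To make the keying behaviour concrete, here is a plain-Julia sketch (no Flux required) of why a `Grads` object can only be indexed with the original parameter arrays: conceptually it is an `IdDict`, keyed by object identity rather than by value.

```julia
# Plain-Julia stand-in for a Grads object: an IdDict keyed by object identity.
W = rand(2, 2)                     # pretend this is a parameter array
grads = IdDict(W => ones(2, 2))    # "gradient" stored under W's objectid

println(haskey(grads, W))          # true: the exact same array is a valid key
println(haskey(grads, copy(W)))    # false: an equal copy has a different objectid
```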


Thanks for the answer, but I did not understand.
In general, if I have a function and I call `Flux.gradient` on that function, I get back a tuple containing the partial derivatives with respect to the parameters of the function. Is that correct?
If I remove the index the code is broken.

However, if I change the code in this way (like the code here: Optimisers · Flux):

``````θ = Flux.params(z)

for i in 1:100
    grads = Flux.gradient(() -> loss(z), θ)
    Flux.update!(opt, θ, grads)
end
``````

the model is broken after

grads = Flux.gradient(() -> loss(z), θ)
``````

Why does this happen?

Ah, I misread part of the original code. `Flux.gradient(loss, z)` was only returning gradients with respect to the input `z`, whereas you want them in terms of the parameters `θ`.

This is incorrect. You’re still taking gradients with respect to the input `z` instead of the model parameters. All this will do is update `z` 100 times to some nonsense value while leaving the model parameters unchanged.

If you look further down on that page, there’s a snippet of how to correctly call `update!` with `Params`. Adapted slightly:

``````θ = Flux.params(model) # note: the model's parameters, not z
Flux.update!(opt, θ, grads) # note: pass θ, not z
``````

Not only does it work, it’s less code to write!


Also, your code broke the model on my PC. I don’t want to modify the model.

Recap:
I have one image from MNIST and a fully connected model, saved on disk, that classifies that image correctly.
I want to modify that image so that the model misclassifies it as a 0.
I compute the gradient of the loss with respect to the image and then update the image itself.

I would like to do this with Flux:

``````index_image = 10
z = images[:, index_image:index_image] # a 4 classified as 4 by the model
zero_label = reshape([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], (10, 1))

loss(x) = Flux.mse(model(x), zero_label)

for i in 1:100
    ∂loss = ForwardDiff.gradient(loss, z) # compute the gradient of the loss with respect to the image "z"
    Flux.update!(opt, z, ∂loss)           # update the image itself until the model classifies it as 0
end

model(z) |> println # now classified as 0 despite being a modified 4
``````

How can I do it?

I don’t speak English well and I don’t know if my English is understandable; sorry for the mistakes.

Your English is fine, but you’re going to have to provide some more information to properly troubleshoot this. We can’t do anything with just “your code broke the model on my pc”.

1. The error and stacktrace you get (i.e. what Julia outputs when the model is “broken”)
2. The version of Flux you’re using
3. The structure of your model. This should be as easy as copying the output of `println(model)`. You probably won’t have to change it, but it’s hard to debug without actually seeing what it is.

I don’t get an error from Julia. If I run the program, it terminates without any error.

I noticed that with ForwardDiff.gradient the image (in this case a 4) is modified correctly until it is classified as 0.

``````for i in 1:100
    ∂loss = ForwardDiff.gradient(loss, z)
    Flux.update!(opt, z, ∂loss)
end

model(z) |> println # now the model classifies the modified image as a 0
``````

If I put a `loss(z) |> println` in the for loop, I see that the loss decreases.

If I try to do the same thing with Flux.gradient I get NaN32 from the loss (after the first evaluation of the gradient).

``````for i in 1:100
    ∂loss = Flux.gradient(loss, z)  # the gradient of the loss with respect to the first argument (the image z)
    loss(z) |> println # NaN32 after the first evaluation of ∂loss
    Flux.update!(opt, z, ∂loss)
end
``````

If I put this at the end of the code:

``````model(images) |> println  # matrix full of NaN32
accuracy(model(images), labels) |> println # 9.8%; it was 96.7% before
loss(z) |> println  # NaN32
``````

For some reason, evaluating the gradient of the loss with respect to the image z changes the NN.

I’m using Julia 1.5.3.
The model saved on disk is a very simple pretrained fully connected NN:

``````Chain(Dense(784, 200),
      BatchNorm(200, λ = relu),
      Dropout(0.3),
      Dense(200, 10),
      softmax)
``````

I’ve experienced a similar issue (while computing gradient with respect to input), and in my case I think it was due to this bug: calling `softmax` (on a `CuArray`) updates its argument in place.

I’m not sure it’s exactly the same problem, but maybe you’re hitting a similar problem. Could you try and see if things work as expected on the CPU?

I don’t have an Nvidia GPU.
Everything is running on the CPU (Ryzen 5 3500U).

edit: I have tried another model without softmax, and I got the same problem.
The new model is: `Chain(Dense(784, 100), BatchNorm(100, λ = relu), Dropout(0.3), Dense(100, 10), my_activation)`

where my_activation is:

``````function my_activation(x)
    k = abs.(x)
    k ./ sum(k, dims=1)
end
``````
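As a quick standalone check (plain Julia, independent of Flux; the input matrix is made up), `my_activation` does produce non-negative columns that sum to 1:

```julia
function my_activation(x)   # definition repeated so the snippet runs standalone
    k = abs.(x)
    k ./ sum(k, dims=1)
end

y = my_activation([-2.0 1.0; 2.0 3.0])
println(sum(y, dims=1))     # each column sums to 1.0
println(all(y .>= 0))       # true: every entry is non-negative
```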

After the first evaluation of the gradient with respect to the image z, the model is broken and `loss(z)` is NaN32.

If I put this instruction after loading the model (Model Reference · Flux):

``````Flux.testmode!(model)
``````

this finally works:

``````for i in 1:100
    ∂loss = Flux.gradient(loss, z)
    Flux.update!(opt, z, ∂loss)
end
``````

By default, `BatchNorm` will update its internal statistics when called in the context of `gradient`. As you noted, `Flux.testmode!` fixes this because it freezes all normalization layers and prevents this updating even while gradients are being taken.
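The side effect can be illustrated with a toy normalization layer in plain Julia (this is NOT Flux’s actual `BatchNorm` implementation, just a sketch of the behaviour): in “train” mode the layer mutates its running mean on every call, so even a forward pass done only to compute a gradient changes the layer; freezing it, as `Flux.testmode!` does, removes the mutation.

```julia
# Toy stand-in for a normalization layer with running statistics.
# NOT Flux's real BatchNorm -- just an illustration of the mutation.
mutable struct ToyNorm
    μ::Float64       # running mean
    active::Bool     # true = train mode, false = test mode
end

function (n::ToyNorm)(x)
    if n.active
        # side effect: the running mean is updated on every call in train mode
        n.μ = 0.9 * n.μ + 0.1 * (sum(x) / length(x))
    end
    x .- n.μ
end

testmode!(n::ToyNorm) = (n.active = false; n)

n = ToyNorm(0.0, true)
n([1.0, 2.0, 3.0])        # even a "read-only" forward pass mutates n.μ
println(n.μ)              # no longer 0.0

testmode!(n)
μ_frozen = n.μ
n([4.0, 5.0, 6.0])        # in test mode the statistics stay fixed
println(n.μ == μ_frozen)  # true
```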