Computing Flux.gradient changes the model

I don’t speak English well, so sorry for the mistakes.
I have an exercise in differentiable programming.
I have an image taken from MNIST and a fully connected NN that classifies that image correctly.
I want to modify that image with differentiable programming until it is classified as 0.

The program works with ForwardDiff.gradient:

using Flux
using BSON
using ForwardDiff

function load_model(path_to_model)
    m = BSON.load(path_to_model)
    m[:modello]
end

function load_data_mnist()
    images = hcat([Float32.(i)[:] for i in Flux.Data.MNIST.images()]...)  # 784*60_000
    labels = Flux.onehotbatch(Flux.Data.MNIST.labels(), 0:9) .|> Float32  # 10 *60_000
    images, labels
end

function accuracy(a, b)
    prev(x) = argmax(x) - 1
    L = size(a)[2]
    right = 0
    wrong = 0
    for i in 1:L
        @inbounds if prev(a[:, i]) == prev(b[:, i])
            right += 1
        else
            wrong += 1
        end
    end
    100right / (wrong + right)
end

model = load_model("model.bson")
images, labels = load_data_mnist() # 784*60_000, 10 *60_000, don't need train/dev/test
model_accuracy = accuracy(model(images), labels) # 96.7%

index_image = 10
z = images[:, index_image:index_image] # image that will be classified as zero (4->0)

# check if the model classifies it correctly
labels[:, index_image] |> println # 4
model(z)               |> println # the model classifies the image as 4

zero_label = reshape([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], (10, 1))

opt = ADAM()
loss(x) = Flux.mse(model(x), zero_label)
loss(z) |> println

for i in 1:100
    ∂loss = ForwardDiff.gradient(loss, z) # this works
    Flux.update!(opt, z, ∂loss)
end

model(z) |> println # the model classifies the modified image as 0 

If I use Flux.gradient the program does not work, and after the first call to Flux.gradient(loss, z)[1] the model is broken:

for i in 1:100
    ∂loss = Flux.gradient(loss, z)[1]  # this does not work and changes the model
    Flux.update!(opt, z, ∂loss)
end

# if I test the model accuracy I get
accuracy(model(images), labels) |> println # 9.8%
loss(z) |> println # NaN32
model(images) |> println # a matrix full of NaN, the model is broken

Why does Flux.gradient change the model?
I did not expect that.

This should be ∂loss = Flux.gradient(loss, z) (no indexing).

Recall that Flux.gradient returns a “bag” of gradients (a Grads struct) when you pass it a bag of model parameters (a Params, which is what Flux.params returns). This can only be indexed using the original parameter arrays, because it uses object ids as keys. Writing ∂loss[1] doesn’t make sense, because it’s indexing into the params at some garbage key that may or may not exist. It’s like reading from a chunk of uninitialized memory: you may get something back, but it could be/do anything.

TL;DR: Flux.update! already handles everything for you; never index into the result of gradient with an arbitrary value if you use params.
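
To illustrate: as far as I understand it, Grads stores gradients keyed by the parameter arrays themselves (object identity), much like Base’s IdDict. A minimal base-Julia sketch of why an integer “index” finds nothing (the Flux/Zygote internals are an assumption here; this only illustrates the keying):

```julia
# Pretend these are model parameters and their gradients.
W = [1.0 2.0; 3.0 4.0]
b = [0.5, 0.5]

# Gradients keyed by the arrays themselves, as Grads does internally.
grads = IdDict{Any,Any}(W => ones(2, 2), b => ones(2))

grads[W]          # works: the array W itself is the key
haskey(grads, 1)  # false: the integer 1 was never a key, so grads[1] throws
```

That is why grads[z] works (z is the array you differentiated with respect to) while grads[1] does not.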


Thanks for the answer, but I did not understand.
In general, if I have a function and I call Flux.gradient on that function, I get back a tuple containing the partial derivatives with respect to the parameters of the function. Is that correct?
If I remove the index, the code breaks.

However, if I change the code in this way (like the code here: Optimisers · Flux):

θ = Flux.params(z)
grads = Flux.gradient(() -> loss(z), θ)

for i in 1:100
    Flux.update!(opt, z, grads[z])
end

the model is broken after

grads = Flux.gradient(() -> loss(z), θ)

Why does this happen?

Ah, I misread part of the original code. Flux.gradient(loss, z) was only returning gradients with respect to the input z, whereas you want them in terms of the parameters θ.

This is incorrect. You’re still taking gradients with respect to the input z instead of the model parameters. Note also that the gradient is computed once, outside the loop, so all this will do is update z 100 times with that same stale gradient to some nonsense value while leaving the model parameters unchanged.

If you look further down on that page, there’s a snippet of how to correctly call update! with Params. Adapted slightly:

θ = Flux.params(model) # note: not z, model
grads = Flux.gradient(() -> loss(z), θ)
Flux.update!(opt, θ, grads) # note: not z, θ

Not only does it work, it’s less code to write!


Also, your code broke the model on my PC. I don’t want to modify the model.

Recap:
I have one image from MNIST and a fully connected model, saved on disk, that classifies that image correctly.
I want to modify that image so that the model misclassifies it as a 0.
I compute the gradient of the loss with respect to the image and then update the image itself.

I would like to do this with Flux

index_image = 10
z = images[:, index_image:index_image] # a 4 classified as 4 by the model
zero_label = reshape([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], (10, 1))

loss(x) = Flux.mse(model(x), zero_label)

for i in 1:100
    ∂loss = ForwardDiff.gradient(loss, z) # compute the gradient of the loss with respect to the image z
    Flux.update!(opt, z, ∂loss)           # update the image itself until the model classifies it as 0
end

model(z) |> println # now classified as 0 despite being a modified 4

How can I do it?

I don’t speak English well and I don’t know if my English is understandable; sorry for the mistakes.

Your English is fine, but you’re going to have to provide some more information to properly troubleshoot this :slight_smile: . We can’t do anything with just “your code broke the model on my pc”.

See PSA: make it easier to help you. Specifically, you’ll need to provide:

  1. The error and stacktrace you get (i.e. what julia outputs when the model is “broken”)
  2. The version of Flux you’re using
  3. The structure of your model. This should be as easy as copying the output of println(model). You probably won’t have to change it, but it’s hard to debug without actually seeing what it is.

I don’t get an error from julia. If I run the program, it terminates without any error.

I noticed that with ForwardDiff.gradient the image (in this case a 4) is modified correctly until it is classified as 0.

for i in 1:100
    ∂loss = ForwardDiff.gradient(loss, z)
    Flux.update!(opt, z, ∂loss)
end

model(z) |> println # now the model classifies the modified image as a 0

If I put a loss(z) |> println in the for loop, I see that the loss decreases.

If I try to do the same thing with Flux.gradient, I get NaN32 from the loss (after the first evaluation of the gradient).

for i in 1:100
    ∂loss = Flux.gradient(loss, z)[1]  # the gradient of the loss with respect to the first argument (the image z)
    loss(z) |> println # NaN32 after first evaluation of ∂loss
    Flux.update!(opt, z, ∂loss)
end

If I put this at the end of the code:

model(images) |> println  # matrix full of NaN32
accuracy(model(images), labels) |> println # 9.8%, before was 96.7%
loss(z) |> println  # NaN32

For some reason, evaluating the gradient of the loss with respect to the image z changes the NN.

I’m using julia 1.5.3.
The model saved on disk is a very simple pretrained fully connected NN:

Chain(Dense(784, 200),
      BatchNorm(200, λ = relu),
      Dropout(0.3),
      Dense(200, 10),
      softmax)

I’ve experienced a similar issue (while computing gradient with respect to input), and in my case I think it was due to this bug: calling softmax (on a CuArray) updates its argument in place.

I’m not sure it’s exactly the same problem, but maybe you’re hitting a similar problem. Could you try and see if things work as expected on the CPU?

I don’t have an Nvidia GPU.
Everything is running on the CPU (Ryzen 5 3500U).

Edit: I have tried another model, without softmax, and I got the same problem.
The new model is: Chain(Dense(784, 100), BatchNorm(100, λ = relu), Dropout(0.3), Dense(100, 10), my_activation)

where my_activation is:

function my_activation(x)
    k = abs.(x)
    k ./ sum(k, dims=1)
end
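
(As a quick sanity check in plain Julia, no Flux needed: like softmax, my_activation maps each column to nonnegative values that sum to 1, just using absolute values instead of exponentials.)

```julia
function my_activation(x)
    k = abs.(x)
    k ./ sum(k, dims=1)
end

x = [1.0 -2.0; 3.0 4.0]   # one column per sample, as in the MNIST code
y = my_activation(x)
sum(y, dims=1)            # each column sums to (approximately) 1
```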

After the first evaluation of the gradient with respect to the image z, the model is broken and loss(z) is NaN32.

If I put this instruction after loading the model (Model Reference · Flux):

Flux.testmode!(model)

this finally works:

for i in 1:100
    ∂loss = Flux.gradient(loss, z)[1]
    Flux.update!(opt, z, ∂loss)                 
end

This is why it’s important to show the model!

By default, BatchNorm will update its internal running statistics whenever it is called in the context of gradient. As you noted, Flux.testmode! fixes this because it will freeze all normalization layers (and disable Dropout) and prevent this updating even when gradients are being taken.
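
To make the mechanism concrete, here is a toy, Flux-free sketch (ToyNorm is a made-up stand-in, not the real BatchNorm) of a layer that, like BatchNorm in train mode, mutates its own state as a side effect of being called:

```julia
# A made-up "ToyNorm" layer: in train mode, calling it updates a running
# mean as a side effect, just like BatchNorm updates its statistics.
mutable struct ToyNorm
    μ::Float64       # running mean
    training::Bool
end

function (n::ToyNorm)(x)
    if n.training
        # side effect: the forward pass mutates the layer's own state
        n.μ = 0.9 * n.μ + 0.1 * (sum(x) / length(x))
    end
    x .- n.μ
end

n = ToyNorm(0.0, true)
n([1.0, 2.0, 3.0])       # train mode: n.μ changes from 0.0 to 0.2

n.training = false        # the analogue of Flux.testmode!(model)
μ_before = n.μ
n([10.0, 20.0, 30.0])     # test mode: n.μ stays frozen
```

Calling loss(z) inside Flux.gradient runs the model's forward pass, so without testmode! the real BatchNorm statistics get overwritten by single-image batches, which is presumably how the pretrained model drifted and its accuracy collapsed.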