# Gradient of NN not changing with different inputs

While playing around with some neural networks (NN), I noticed the following:

```
using Flux
using Random
using Zygote
using ForwardDiff

Random.seed!(9120)

n = 1       # output dimension
m = 5       # input dimension
hidden = 10

x, y = rand(m), rand(n) # some data
model = Flux.Chain(Flux.Dense(m, hidden), Flux.Dense(hidden, n))

# ForwardDiff.gradient needs a scalar-valued function, so reduce the
# 1-element output with sum
g = z -> ForwardDiff.gradient(w -> sum(model(w)), z)

# Getting the weights of the model as a flat array
ps, re = Flux.destructure(model)

display(g(x))       # Checking with original data
display(g(ps[1:m])) # Checking with the first m weights as input

gs = Zygote.gradient(w -> sum(model(w)), rand(m)) # Notice a different random vector
display(gs)
gs = Zygote.gradient(w -> sum(model(w)), zeros(m)) # Now with zeros
display(gs)
```

Now, the gradient is always the same, no matter which input I pass:

```
5-element Array{Float64,1}:
-0.7097914769304869
-0.14294694831147323
-0.04312831631528913
0.2866390831645096
-0.4046597463981584
5-element Array{Float32,1}:
-0.7097915
-0.14294694
-0.043128345
0.2866391
-0.40465972
(Float32[-0.7097915, -0.14294693, -0.04312831, 0.28663906, -0.40465975],)
(Float32[-0.7097915, -0.14294693, -0.04312831, 0.28663906, -0.40465975],)
```

Is there a reason why this is the case? I'm inclined to believe the input is
not being evaluated at all.
Does this mean that the gradient is taken with respect to the weights of the model instead?


To get a nonlinear model you need to pass an activation function to `Dense` as the third argument.
Otherwise it defaults to `identity`, and your chain is the affine map `W2*(W1*x .+ b1) .+ b2`, whose Jacobian with respect to the input is the constant matrix `W2*W1`. The input *is* being evaluated; the derivative just doesn't depend on it, so you see the same gradient everywhere.
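A minimal way to see this without Flux or AD at all — plain Julia with hand-rolled random matrices standing in for the `Dense` weights, and central finite differences in place of `ForwardDiff`/`Zygote` (all names here are made up for the sketch):

```
using Random
Random.seed!(1)

m, hidden = 5, 10
W1 = randn(hidden, m); b1 = randn(hidden)
W2 = randn(1, hidden); b2 = randn(1)

linear(x)    = sum(W2 * (W1 * x .+ b1) .+ b2)       # no activation: affine
nonlinear(x) = sum(W2 * tanh.(W1 * x .+ b1) .+ b2)  # with a tanh activation

# Central finite-difference gradient of a scalar function f at x
function fdgrad(f, x; h = 1e-6)
    g = similar(x)
    for i in eachindex(x)
        e = zeros(length(x)); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2h)
    end
    return g
end

x1, x2 = rand(m), rand(m)
# Linear model: the gradient is vec(W2*W1) at every point, so these match.
@show fdgrad(linear, x1) ≈ fdgrad(linear, x2)
# Nonlinear model: the gradient depends on where it is evaluated.
@show fdgrad(nonlinear, x1) ≈ fdgrad(nonlinear, x2)
```

The first comparison comes out `true` and the second (generically) `false`, which is exactly the difference an activation function makes.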