I’m having a hard time figuring out why this works:
using Zygote
W, b = rand(2, 3), rand(2)
predict(x) = W*x .+ b
g = gradient(() -> sum(predict([1,2,3])), Params([W, b]))
g[W], g[b]
but this doesn’t:
using Zygote
a = 2
x = 2
f(x) = x^a
gp = gradient(() -> f(x), Params(a))
gp[a]
I get the error:
ERROR: Only reference types can be differentiated with `Params`.
Can anyone help?
I think I figured it out. Everything passed to Params has to be an array (a reference type), and the output of the function has to be a scalar, hence the sum. So this now works:
using Zygote
a = [2]
x = [2]
f(x) = x.^a[1]
gp = gradient(() -> sum(f(x)), Params([a]))
gp[a]
Your second version works, but using arrays is quite a drag on performance:
julia> using BenchmarkTools  # provides @btime
# Integer version
julia> @btime $(Ref(2))[]^$(Ref(2))[]
3.369 ns (0 allocations: 0 bytes)
4
# Array version
julia> @btime [2].^[2][1];
82.958 ns (3 allocations: 288 bytes)
The easiest way to get what you want is obviously gradient(a->2^a, 2), but I am assuming there are other considerations that led you to the approach you proposed above. We might be able to help further if you share more details.
Thanks for offering. This is what I am trying to accomplish:
I have a function that I would normally write as
F(L1, L2; a1=1,a2=1) = a1*(L1^a2 + L2^a2)^(1/a2)
a1 and a2 are parameters. I’m trying to get the derivatives with respect to a1 and a2. I could write it using arrays:
F(L; a=ones(1,2)) = a[1]*(L[1]^a[2] + L[2]^a[2])^(1/a[2])
I was trying to find out why the following does not output an answer:
grads = gradient(() -> F(L), Params([a]))
grads[a]
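A likely reason grads[a] comes back empty: implicit Params only records gradients for arrays that the closure actually reads. Because F takes a from a keyword default, every call builds a fresh ones(1,2) and never touches the global a, so Zygote has nothing to report for it. A minimal sketch of the fix, assuming hypothetical values for L and a, is to pass the global array in explicitly:

```julia
using Zygote

# Array version of the function from the question.
F(L; a=ones(1, 2)) = a[1] * (L[1]^a[2] + L[2]^a[2])^(1 / a[2])

L = [1.0, 2.0]   # hypothetical data, chosen only for illustration
a = [1.0, 1.0]   # global parameter array that Params should track

# F falls back to its own default `a`, so the global `a` is never read
# and no gradient is recorded for it.
grads_default = gradient(() -> F(L), Params([a]))
grads_default[a]   # nothing

# Passing the global array explicitly lets Zygote track it.
grads = gradient(() -> F(L; a=a), Params([a]))
grads[a]           # ≈ [3.0, -1.91]
```

The second entry is 2log(2) - 3log(3), which you can confirm by differentiating a1*(L1^a2 + L2^a2)^(1/a2) by hand at these values.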
Edit: Let me provide a little more context. Consider L as “data” and a as parameters to be estimated later. The derivatives with respect to L have theoretical importance; the derivatives with respect to a are to be used in an optimization procedure later.
gradient((a1,a2)->F(L1,L2; a1=a1,a2=a2), a1,a2) should do the trick, no?
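Spelled out with concrete numbers, the same explicit-argument approach also gives the derivatives with respect to the data. A minimal sketch, assuming hypothetical values for L1, L2, a1 and a2:

```julia
using Zygote

# Scalar version of the function from the question.
F(L1, L2; a1=1, a2=1) = a1 * (L1^a2 + L2^a2)^(1 / a2)

# Hypothetical values, chosen only for illustration.
L1, L2 = 1.0, 2.0
a1, a2 = 1.0, 1.0

# Derivatives with respect to the parameters (for the optimisation step).
da1, da2 = gradient((a1, a2) -> F(L1, L2; a1=a1, a2=a2), a1, a2)

# Derivatives with respect to the data, holding the parameters fixed.
dL1, dL2 = gradient((L1, L2) -> F(L1, L2; a1=a1, a2=a2), L1, L2)
```

With a2 = 1 the function reduces to a1*(L1 + L2), so both data derivatives should equal a1 = 1, which makes for an easy sanity check.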
That works. So now I’m unsure when it is necessary to use Params. Could you explain a little about that?
I’m no expert, but I would say the main occasion when Params comes in handy is if the function to differentiate consists of many nested, parametrised functions (e.g. a deep neural network). In this case, it can be annoying to explicitly pass the parameters through the call stack, and Params provides a means to avoid that. However, note that the Zygote documentation lists at least two other ways to achieve this and actually recommends against using Params:
However, implicit parameters exist mainly for compatibility with Flux’s current AD; it’s recommended to use the other approaches unless you need this.
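For comparison, here is a minimal sketch of the explicit style, rewriting the first example in this thread. Instead of implicit Params, the parameters are bundled into a NamedTuple (my choice for illustration; a struct would work too) and passed as an ordinary argument, so Zygote returns a gradient with the same field names. The two-argument predict and the name ps are hypothetical:

```julia
using Zygote

W, b = rand(2, 3), rand(2)
ps = (W = W, b = b)   # parameters bundled explicitly

# Hypothetical two-argument version of `predict` that receives its parameters.
predict(p, x) = p.W * x .+ p.b

# gradient returns one entry per argument; here that entry is a
# NamedTuple with fields W and b.
g, = gradient(p -> sum(predict(p, [1, 2, 3])), ps)
g.W, g.b
```

Because the output is a plain sum, every row of g.W equals the input [1, 2, 3] and g.b is all ones, matching what g[W] and g[b] give in the Params version.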