I’m having a hard time figuring out why this works:
using Zygote
W, b = rand(2, 3), rand(2)
predict(x) = W*x .+ b
g = gradient(() -> sum(predict([1,2,3])), Params([W, b]))
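Here is a self-contained version of the working snippet. It is a sketch that assumes a recent Zygote release, and it shows what the call returns: gradient with Params gives back a Grads object, which is indexed by the parameter arrays themselves.

```julia
using Zygote

W, b = rand(2, 3), rand(2)
predict(x) = W*x .+ b

# Implicit parameters: the closure takes no arguments. W and b are globals,
# and Zygote tracks them by identity because they are listed in Params.
g = gradient(() -> sum(predict([1, 2, 3])), Params([W, b]))

# The result is indexed by the arrays themselves, not by their names.
g[W]  # d/dW of sum(W*x .+ b): every row equals x' = [1 2 3]
g[b]  # d/db: a vector of ones
```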
but this doesn’t:
a = 2
x = 2
f(x) = x^a
gp = gradient(() -> f(x), Params(a))
I get the error:
ERROR: Only reference types can be differentiated with `Params`.
Can anyone help?
I think I figured it out. Everything has to be an array except the output of the function. So this now works:
a = [2]
x = [2]
f(x) = x.^a
gp = gradient(() -> sum(f(x)), Params([a]))
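Below is a runnable sketch of the array workaround. As an assumption, it uses Float64 values [2.0] instead of [2] so that the exponent gradient is unambiguously floating point. The gradient of sum(x .^ a) with respect to a is x .^ a .* log.(x), which is 4 log 2 here.

```julia
using Zygote

# Params only accepts reference types, so both values are wrapped in 1-element arrays.
a = [2.0]
x = [2.0]
f(x) = x .^ a

# The function passed to gradient must return a scalar, hence the sum.
gp = gradient(() -> sum(f(x)), Params([a]))

gp[a]  # ≈ [4 * log(2)], since d/da x^a = x^a * log(x)
```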
Your second snippet works, but using arrays is quite a drag on performance:
# Integer version
julia> @btime $(Ref(2))^$(Ref(2))
3.369 ns (0 allocations: 0 bytes)
# Array version
julia> @btime [2].^[2];
82.958 ns (3 allocations: 288 bytes)
The easiest way to get what you want is obviously gradient(a -> 2^a, 2), but I am assuming there are other considerations that led you to the approach you proposed above. We might be able to help further if you share more details.
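The following short sketch illustrates this explicit style. No Params are needed because the quantity to differentiate is passed as an ordinary argument, and gradient returns one entry per argument. Floating-point inputs are an assumption made here to sidestep integer-typed edge cases.

```julia
using Zygote

# Derivative of 2^a with respect to a, at a = 2: 2^a * log(2) = 4 log 2
da = gradient(a -> 2.0^a, 2.0)[1]

# Several arguments at once: partial derivatives of x^a with respect to x and a,
# i.e. (a*x^(a-1), x^a*log(x)) = (4.0, 4 log 2) at x = a = 2
dx, dA = gradient((x, a) -> x^a, 2.0, 2.0)
```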
Thanks for offering. This is what I am trying to accomplish:
I have a function that I would normally write as
F(L1, L2; a1=1,a2=1) = a1*(L1^a2 + L2^a2)^(1/a2)
Here a1 and a2 are parameters. I’m trying to get the derivatives with respect to a1 and a2. I could write it using arrays:
F(L; a=ones(1,2)) = a[1]*(L[1]^a[2] + L[2]^a[2])^(1/a[2])
I was trying to find out why the following does not output an answer:
grads = gradient(() -> F(L), Params([a]))
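Here is a sketch of what I believe is going on with the array version. Params tracks the global array a by identity, but F(L) never touches that array: the keyword default a=ones(1,2) builds a fresh array on every call. Zygote therefore has nothing to report for the global a and returns nothing for it. Passing the global explicitly as F(L; a=a) lets the gradient flow. The data values L = [1.0, 2.0] are hypothetical.

```julia
using Zygote

F(L; a=ones(1, 2)) = a[1]*(L[1]^a[2] + L[2]^a[2])^(1/a[2])

L = [1.0, 2.0]   # hypothetical data
a = ones(1, 2)   # the parameters we want gradients for

# F(L) uses its own default array, not this global one, so no gradient reaches it
g_bad = gradient(() -> F(L), Params([a]))
g_bad[a]   # nothing

# Passing the global array through the keyword makes it part of the computation
g_ok = gradient(() -> F(L; a=a), Params([a]))
g_ok[a]    # 1×2 matrix holding ∂F/∂a[1] and ∂F/∂a[2]
```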
Edit: Let me provide a little more context. Consider L as “data” and a as parameters to be estimated later. The derivatives with respect to L have theoretical importance. The derivatives with respect to a are to be used in an optimization procedure later.
gradient((a1,a2)->F(L1,L2; a1=a1,a2=a2), a1,a2)
should do the trick, no?
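Below is a self-contained sketch of that suggestion, using the hypothetical values L1 = 1.0, L2 = 2.0 and a1 = a2 = 1.0. Because the explicit form simply differentiates with respect to positional arguments, the same call can also return the derivatives with respect to the data L1 and L2, which were mentioned as being of theoretical interest.

```julia
using Zygote

F(L1, L2; a1=1, a2=1) = a1*(L1^a2 + L2^a2)^(1/a2)

L1, L2 = 1.0, 2.0   # hypothetical data
a1, a2 = 1.0, 1.0   # hypothetical parameter values

# Derivatives with respect to the parameters only
da1, da2 = gradient((a1, a2) -> F(L1, L2; a1=a1, a2=a2), a1, a2)

# Derivatives with respect to the data and the parameters in a single call
dL1, dL2, ea1, ea2 = gradient((L1, L2, a1, a2) -> F(L1, L2; a1=a1, a2=a2), L1, L2, a1, a2)
```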
That works. Then I’m unsure when it is necessary to employ Params. Could you explain a little about that?
I’m no expert, but I would say the main occasion when Params comes in handy is when the function to differentiate consists of many nested, parametrised functions (e.g. a deep neural network). In that case it can be annoying to pass the parameters explicitly through the call stack, and Params provides a way to avoid that. However, note that the Zygote documentation lists at least two other ways to achieve this and actually recommends against using Params:
However, implicit parameters exist mainly for compatibility with Flux’s current AD; it’s recommended to use the other approaches unless you need this.
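As I understand the documentation, one of those alternatives is to pass the parameters explicitly, bundled in a structure such as a NamedTuple. In that case, gradient returns a NamedTuple with the same field names. The sketch below redoes the first example from this thread in that style. The names loss and ps are hypothetical.

```julia
using Zygote

W, b = rand(2, 3), rand(2)
x = [1, 2, 3]

# The parameters travel as an ordinary argument instead of living in globals
loss(p, x) = sum(p.W * x .+ p.b)

ps = (W = W, b = b)
gs = gradient(p -> loss(p, x), ps)[1]   # a NamedTuple with fields W and b

gs.W   # every row equals x', as with the Params version
gs.b   # a vector of ones
```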