Help with Zygote and parameters

amrods · July 1, 2020, 1:22am

I’m having a hard time figuring out why this works:

using Zygote
W, b = rand(2, 3), rand(2)
predict(x) = W*x .+ b
g = gradient(() -> sum(predict([1,2,3])), Params([W, b]))
g[W], g[b]

but this doesn’t:

using Zygote
a = 2
x = 2
f(x) = x^a
gp = gradient(() -> f(x), Params(a))
gp[a]

I get the error:

ERROR: Only reference types can be differentiated with `Params`.

Can anyone help?

amrods · July 1, 2020, 2:18am

I think I figured it out. Everything has to be an array except the output of the function. So this now works:

using Zygote
a = [2]
x = [2]
f(x) = x.^a[1]
gp = gradient(() -> sum(f(x)), Params([a]))
gp[a]

ettersi · July 1, 2020, 3:08am

Your second code works, but using arrays is quite a drag on performance:

# Requires BenchmarkTools.jl for @btime
julia> using BenchmarkTools

# Integer version
julia> @btime $(Ref(2))[]^$(Ref(2))[]
  3.369 ns (0 allocations: 0 bytes)
4

# Array version
julia> @btime [2].^[2][1];
  82.958 ns (3 allocations: 288 bytes)

The easiest way to get what you want is obviously gradient(a->2^a, 2), but I am assuming there are other considerations that led you to the approach you proposed above. We might be able to help further if you share more details.
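For the second example in the question, the explicit approach might look like the following sketch. The two-argument `f(x, a)` is a hypothetical variant of the thread's `f(x) = x^a`. Floating-point inputs are used so that the derivative with respect to the exponent, x^a * log(x), is a plain Float64.

```julia
using Zygote

# Hypothetical two-argument variant of the thread's f(x) = x^a:
# the exponent is an ordinary argument instead of a global.
f(x, a) = x^a

x, a = 2.0, 2.0

# Derivative with respect to the exponent only: d/da x^a = x^a * log(x)
(da,) = gradient(a -> f(x, a), a)

# Both partial derivatives at once: (a * x^(a - 1), x^a * log(x))
dx, da_both = gradient(f, x, a)
```

No `Params` object is involved, so `a` does not need to be wrapped in an array.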

amrods · July 1, 2020, 3:26am

Thanks for offering. This is what I am trying to accomplish:
I have a function that I would normally write as

F(L1, L2; a1=1,a2=1) = a1*(L1^a2 + L2^a2)^(1/a2)

a1 and a2 are parameters. I’m trying to get the derivatives with respect to a1 and a2. I could write it using arrays:

F(L; a=ones(1,2)) = a[1]*(L[1]^a[2] + L[2]^a[2])^(1/a[2])

I was trying to find out why the following does not output an answer:

grads = gradient(() -> F(L), Params([a]))
grads[a]

Edit: Let me provide a little more context. Consider L as “data”, and a as parameters to be estimated later. The derivatives with respect to L have theoretical importance. The derivatives with respect to a are to be used in an optimization procedure later.
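A likely explanation, offered as a sketch: in `F(L)` the keyword `a` falls back to its default `ones(1,2)`, which builds a fresh array on every call. The global `a` registered in `Params` therefore never takes part in the computation, and `grads[a]` has nothing to report. Passing the global explicitly as `F(L; a=a)` should let Zygote track it. The values `L = [3.0, 4.0]` and `a = [1.0 2.0]` are hypothetical, chosen so that F = (3^2 + 4^2)^(1/2) = 5.

```julia
using Zygote

# Array-parameter version of F from the thread
F(L; a=ones(1, 2)) = a[1] * (L[1]^a[2] + L[2]^a[2])^(1 / a[2])

L = [3.0, 4.0]   # hypothetical "data"
a = [1.0 2.0]    # hypothetical parameters a1 = 1, a2 = 2, stored as a 1x2 matrix

# F(L) alone would use the default ones(1, 2), a new array unrelated to the
# global `a`; passing `a` explicitly makes the tracked array part of the computation.
grads = gradient(() -> F(L; a=a), Params([a]))
ga = grads[a]    # 1x2 matrix holding [dF/da1  dF/da2]
```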

ettersi · July 1, 2020, 8:42am

gradient((a1,a2)->F(L1,L2; a1=a1,a2=a2), a1,a2)

should do the trick, no?
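A runnable sketch of this suggestion, using hypothetical values L1 = 3, L2 = 4, a1 = 1, a2 = 2 (so that F = 5). Since the derivatives with respect to L1 and L2 were also said to matter, the second call differentiates with respect to all four arguments at once.

```julia
using Zygote

F(L1, L2; a1=1, a2=1) = a1 * (L1^a2 + L2^a2)^(1 / a2)

# Hypothetical example values: F(3, 4; a1=1, a2=2) = sqrt(9 + 16) = 5
L1, L2, a1, a2 = 3.0, 4.0, 1.0, 2.0

# Derivatives with respect to the parameters only (L1, L2 held fixed)
da1, da2 = gradient((a1, a2) -> F(L1, L2; a1=a1, a2=a2), a1, a2)

# Derivatives with respect to the "data" and the parameters in one call
dL1, dL2, da1_all, da2_all =
    gradient((l1, l2, p1, p2) -> F(l1, l2; a1=p1, a2=p2), L1, L2, a1, a2)
```

With a1 = 1 and a2 = 2, F is the Euclidean norm of (L1, L2), so dL1 = 3/5 and dL2 = 4/5, which gives a quick sanity check.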

amrods · July 1, 2020, 9:05am

That works. Then I’m unsure when it is necessary to employ Params. Could you explain a little about that?

ettersi · July 1, 2020, 9:39am

I’m no expert, but I would say the main occasion when Params comes in handy is if the function to differentiate consists of many nested, parametrised functions (e.g. a deep neural network). In this case, it can be annoying to explicitly pass the parameters through the callstack, and Params provides a means to avoid that. However, note that the Zygote documentation lists at least two other ways to achieve this and actually recommends against using Params:

However, implicit parameters exist mainly for compatibility with Flux’s current AD; it’s recommended to use the other approaches unless you need this.
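For comparison, here is a minimal sketch of the first example in this thread rewritten in the explicit style, where `W` and `b` are passed as ordinary arguments instead of being collected in `Params`. The three-argument `predict(W, b, x)` is a hypothetical reworking of the original `predict(x)`.

```julia
using Zygote

W, b = rand(2, 3), rand(2)
x = [1, 2, 3]

# Parameters are ordinary arguments, so no Params object is needed
predict(W, b, x) = W * x .+ b

gW, gb = gradient((W, b) -> sum(predict(W, b, x)), W, b)
```

Because sum(W*x .+ b) is linear in W and b, the gradients do not depend on the random values: each row of `gW` equals `x'` and `gb` is a vector of ones.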
