Confused about how Zygote pullback handles tuple-valued loss functions

bgroenks · December 20, 2022, 2:48pm

I noticed something odd (or at least that I don’t understand) with the Zygote.pullback method.

Let’s consider a very simple test case:

using Zygote

losses, back = Zygote.pullback(x -> (sum(x), sum(x)), [1.0,1.0])

Here we have a tuple-valued loss function, for which I would expect back to handle each entry separately, i.e. $B_1 = \frac{\partial y_1}{\partial \ell_1}\frac{\partial \ell_1}{\partial p}$, $B_2 = \frac{\partial y_2}{\partial \ell_2}\frac{\partial \ell_2}{\partial p}$.

However, that isn’t what happens. Instead, it seems like back treats the loss tuple as a vector and then sums over the results.

grads = back((1.0,1.0))
# output
(Fill(2.0, 2),)

And where does the Fill come from?

Am I missing something? Or is this a bug?

Thanks!

Tomas_Pevny · December 20, 2022, 2:54pm

Zygote can only handle scalar loss functions. Period. If you would like to take the gradient of a function with two outputs, you are effectively computing a Jacobian, which means that you need to take the gradient with respect to the first item, and then with respect to the second.
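
To spell out the summing behaviour seen in the question: a pullback computes a vector-Jacobian product, so the seed passed to back weights each output's gradient and the results are added. The symbol $v = (v_1, v_2)$ for the seed is notation introduced here for illustration; $\ell_1, \ell_2$ and $p$ follow the question.

$$
\text{back}(v) = v_1 \frac{\partial \ell_1}{\partial p} + v_2 \frac{\partial \ell_2}{\partial p}
$$

With $\ell_1 = \ell_2 = \sum_i p_i$, each gradient is $(1, 1)$, so the seed $v = (1, 1)$ gives

$$
\text{back}\big((1, 1)\big) = 1 \cdot (1, 1) + 1 \cdot (1, 1) = (2, 2),
$$

which matches the reported output.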

bgroenks · December 20, 2022, 3:05pm

I guess I was just thinking of it as taking the gradient with respect to two separate scalar losses (e.g. in a case where you have multiple additive loss terms), but yes, you’re right, it could also be a Jacobian. So then I guess Zygote.jacobian would be the appropriate function to use here.
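
For reference, here is a minimal sketch of both ways to get the per-entry gradients, assuming the same toy function as in the question. The names f, g1, g2, gsum and J are made up for this example. Note that Zygote.jacobian expects the function to return an array or a scalar, so the sketch returns a vector instead of a tuple in that call.

```julia
using Zygote

x = [1.0, 1.0]
f(x) = (sum(x), sum(x))   # same tuple-valued function as in the question

losses, back = Zygote.pullback(f, x)

# Seeding with a one-hot cotangent selects a single output.
g1 = back((1.0, 0.0))[1]   # gradient of the first entry
g2 = back((0.0, 1.0))[1]   # gradient of the second entry

# Seeding with all ones adds the two gradients, as observed in the question.
gsum = back((1.0, 1.0))[1]

# Zygote.jacobian wants an array (or scalar) output, so return a vector here.
J = Zygote.jacobian(x -> [sum(x), sum(x)], x)[1]
```

Each row of J should correspond to the gradient of one entry. As for the Fill in the original output: as far as I can tell it is a lazy constant-array type from FillArrays.jl that the pullback of sum returned at the time, and collect turns it into an ordinary Vector if needed.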
