# Why does calculating gradients from Params differ from doing it directly?

I’ve encountered a case where calculating gradients with two different methods yields different results.

Here is an excerpt from my code, excluding the actual loss function.

``````
julia> x, y=get_batch(4)
([("1c944ea1a3ae18a8df9f411d3c66827c8f26a0c76835deefe5c27acaf37f510d", "092fe98b5be67c2a426f8dc8b6242f339f4701099ec6cf44257b2828804d5854"), ("69ec244c61b4edc0118f6c5dc115dc0911729774d9848e81f4e903b824321d29", "2d19a989f1097ffe35458c49a4c48d3257148e84860961cec4482cdde87681a9"), ("7c79e8c4f90532b8754b7cb6499a070fa70aad5211266043847a0ec593061753", "bbf8173740f461e201356d30d1fe0b0a2fb07ff67c829124bc2702dd189bd855"), ("2b4971006d4b0b546b923503e5b7962e9f7b2531f9580ce9062b38b034b10c80", "5afaddd9c7c2671bdf549dcda7123921da83e0c0e68fbfd7f0cd6c7402a2c7f6")], [2, 2, 1, 1])

julia> θ=Params([weight_pars, sigma_pars])  # Our model
Params([[0.11504418023074425, 0.28379663653151455], [10.88, 54.4]])

julia> loss(x, y, weight_pars, sigma_pars) # the loss
0.4343795875304357

julia> gs = gradient(() -> loss(x, y, weight_pars, sigma_pars), θ)

julia> gs[weight_pars],gs[sigma_pars] # One version of gradients
([-0.1021944543969813, 0.4054565903538764], [0.028360913853822473, 0.020073262552095503])

julia> gradient((w, s) -> loss(x, y, w, s), weight_pars, sigma_pars) # The other version of gradients
([-0.04534501818695343, 0.2169181270035139], [0.014180456926911236, 0.010036631276047751])
``````

The gradients w.r.t. `sigma_pars` differ by a factor of 2, while the gradients w.r.t. `weight_pars` differ less systematically.

Shouldn’t I get the same gradients regardless of the syntax? Why do I get different values?

I could post my loss function if that would help.

I use `[e88e6eb3] Zygote v0.4.17` and `[587475ba] Flux v0.10.4` on Julia 1.4.1.

The most likely answer is that the loss function is “double dipping”: it accesses the parameters both through the arguments of `loss` and directly from global scope. `Params` accounts for the parameters wherever they appear, whereas the explicit `gradient` call only considers the argument versions of `weight_pars` and `sigma_pars` as contributing to the gradient; any use of those arrays from global scope is treated as a constant.
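Here is a minimal sketch of that failure mode (a toy loss, not the original one): the loss below reads `sigma_pars` both from its argument `s` and from global scope, which reproduces exactly the kind of factor-2 mismatch seen above.

``````
using Zygote

weight_pars = [0.1, 0.2]
sigma_pars  = [10.0, 50.0]

# Toy loss that "double dips": `w` and `s` arrive as arguments, but the
# body also reads the global `sigma_pars` directly.
loss(w, s) = sum(w) + sum(s .+ sigma_pars)

θ  = Params([weight_pars, sigma_pars])
gs = gradient(() -> loss(weight_pars, sigma_pars), θ)
gs[sigma_pars]  # [2.0, 2.0]: both the argument and the global use count

gradient(loss, weight_pars, sigma_pars)[2]  # [1.0, 1.0]: only the argument counts
``````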


Can you please elaborate a bit more? Which version do you think is correct?

If I define a fake loss function like

``````
loss(x, y, weight_pars, sigma_pars) = log(sum(weight_pars .* weight_pars) + sum(sigma_pars .* sigma_pars))
``````

I find that both ways of computing the gradients agree.
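For completeness, here is a self-contained version of that check (parameter values made up; `x` and `y` are dummies, since this fake loss ignores them):

``````
using Zygote

weight_pars = [0.1, 0.2]; sigma_pars = [10.0, 50.0]
x, y = nothing, nothing   # unused by the fake loss

fake_loss(x, y, w, s) = log(sum(w .* w) + sum(s .* s))

θ  = Params([weight_pars, sigma_pars])
gs = gradient(() -> fake_loss(x, y, weight_pars, sigma_pars), θ)
g  = gradient((w, s) -> fake_loss(x, y, w, s), weight_pars, sigma_pars)

gs[weight_pars] ≈ g[1] && gs[sigma_pars] ≈ g[2]  # true: the loss never reads globals
``````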

For example, I think you’re doing something like

``````
x = 2
f(y) = x * y
``````

“Correct” is a definitional issue here, since there are two different ways to think about the function’s input. If we tweak `x` itself, so that both occurrences become `x = 2 + ϵ`, we get a gradient of `2x`. If we tweak only the argument passed to `f`, then `f(x + ϵ) = x(x + ϵ)` and the gradient is `x = 2`. `Params` asks for the former and plain `gradient` asks for the latter. `Params` is probably correct in the sense that it’s what you intended here.
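The same toy example can be run through Zygote directly; one caveat of the sketch below is that `Params` only tracks arrays, so a one-element vector stands in for `x = 2`:

``````
using Zygote

x = [2.0]           # a 1-element vector, since Params tracks arrays
f(y) = sum(x .* y)  # uses the global x *and* the argument y

# Tweak x itself (both occurrences): d/dx x² = 2x = 4.
gradient(() -> f(x), Params([x]))[x]  # [4.0]

# Tweak only the argument: f(x + ϵ) = x(x + ϵ), so the gradient is x = 2.
gradient(f, x)  # ([2.0],)
``````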

This obviously only comes up because `x` is both a global variable and a function argument. So the easiest fix, assuming this is the issue, is to make sure your code consistently uses `weight_pars` and co either via global scope or via an explicit function argument, but not both ways.
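A minimal sketch of that fix, with a made-up `predict` standing in for the real model (only the shape of the code matters: the parameters flow in through arguments everywhere, and nothing reads them from global scope):

``````
using Zygote

weight_pars = [0.1, 0.2]; sigma_pars = [10.0, 50.0]
xs = [1.0, 2.0, 3.0];     ys = [1.5, 2.5, 3.5]

# Hypothetical helper: everything it needs arrives as an argument.
predict(el, w, s) = w[1] * el + s[1]

function loss(xs, ys, w, s)
    preds = map(el -> predict(el, w, s), xs)  # w, s come only from the arguments
    return sum(abs2.(preds .- ys))
end

# Both styles now differentiate the same function of (w, s) and must agree.
gradient((w, s) -> loss(xs, ys, w, s), weight_pars, sigma_pars)
``````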


After a lot of poking around in my code, I think I found the problem. My loss function used a comprehension:

``````
function loss(x, y, pars)
    # ...
    return sum([el_prediction(el, pars) for el in x])
end
``````

If I replace the comprehension with an explicit loop, the discrepancy goes away:

``````
function loss(x, y, pars)
    # ...
    mysum::Float64 = 0
    for el in x
        mysum += el_prediction(el, pars)
    end
    return mysum
end
``````

I wish such behaviour were documented, or at least threw an error…

If it’s not what I suggested, and you can fix this just by changing a `sum` over a comprehension to a loop, then you should open an issue: that clearly shouldn’t make a difference, and it suggests a bug in Zygote’s adjoints somewhere (probably not passing `Params` through when differentiating a generator, or something similar).
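For reference, a minimal reproduction for such an issue might look like the following (with a stand-in `el_pred`, since the real `el_prediction` isn’t shown). All four gradients should be identical; any mismatch would be worth reporting:

``````
using Zygote

pars = [1.0, 2.0]
xs   = [0.5, 1.5, 2.5]

el_pred(el, p) = el * sum(p)   # stand-in for the real el_prediction

loss_comp(xs, p) = sum([el_pred(el, p) for el in xs])

function loss_loop(xs, p)
    s = 0.0
    for el in xs
        s += el_pred(el, p)
    end
    return s
end

θ  = Params([pars])
g1 = gradient(() -> loss_comp(xs, pars), θ)[pars]  # implicit, comprehension
g2 = gradient(() -> loss_loop(xs, pars), θ)[pars]  # implicit, loop
g3 = gradient(p -> loss_comp(xs, p), pars)[1]      # explicit, comprehension
g4 = gradient(p -> loss_loop(xs, p), pars)[1]      # explicit, loop
``````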