I’ve encountered a case where calculating gradients with two methods yields different results.
Here is an excerpt from my code, excluding the actual loss function.
julia> x, y=get_batch(4)
([("1c944ea1a3ae18a8df9f411d3c66827c8f26a0c76835deefe5c27acaf37f510d", "092fe98b5be67c2a426f8dc8b6242f339f4701099ec6cf44257b2828804d5854"), ("69ec244c61b4edc0118f6c5dc115dc0911729774d9848e81f4e903b824321d29", "2d19a989f1097ffe35458c49a4c48d3257148e84860961cec4482cdde87681a9"), ("7c79e8c4f90532b8754b7cb6499a070fa70aad5211266043847a0ec593061753", "bbf8173740f461e201356d30d1fe0b0a2fb07ff67c829124bc2702dd189bd855"), ("2b4971006d4b0b546b923503e5b7962e9f7b2531f9580ce9062b38b034b10c80", "5afaddd9c7c2671bdf549dcda7123921da83e0c0e68fbfd7f0cd6c7402a2c7f6")], [2, 2, 1, 1])
julia> θ=Params([weight_pars, sigma_pars]) # Our model
Params([[0.11504418023074425, 0.28379663653151455], [10.88, 54.4]])
julia> loss(x, y, weight_pars, sigma_pars) # the loss
0.4343795875304357
julia> gs = gradient(() -> loss(x, y, weight_pars, sigma_pars), θ)
Grads(...)
julia> gs[weight_pars],gs[sigma_pars] # One version of gradients
([-0.1021944543969813, 0.4054565903538764], [0.028360913853822473, 0.020073262552095503])
julia> gradient((w, s) -> loss(x, y, w, s), weight_pars, sigma_pars) # The other version of gradients
([-0.04534501818695343, 0.2169181270035139], [0.014180456926911236, 0.010036631276047751])
The gradients w.r.t. sigma_pars differ by a factor of 2. The gradients w.r.t. weight_pars also differ, but not by a constant factor (the ratios are roughly 2.25 and 1.87).
Shouldn’t I get the same gradient values irrespective of the syntax? Why do I get different values?
I could post my loss function if that would help.
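For reference, here is a minimal sketch of the same comparison on a toy problem. The names toy_loss, w, and s are hypothetical stand-ins and not part of the real code; the toy loss is an assumption chosen only because its gradients are easy to check by hand (2w + s with respect to w, and w with respect to s).

```julia
using Zygote

# Hypothetical toy loss, standing in for the real loss that is not shown here.
toy_loss(w, s) = sum(w .^ 2) + sum(w .* s)

w = [0.1, 0.2]
s = [10.0, 50.0]

# Style 1: implicit parameters, gradients looked up by array identity.
gs = gradient(() -> toy_loss(w, s), Zygote.Params([w, s]))
implicit = (gs[w], gs[s])

# Style 2: explicit arguments, gradients returned as a tuple.
explicit = gradient((a, b) -> toy_loss(a, b), w, s)

# Compare the two styles element-wise.
println(implicit[1] ≈ explicit[1])
println(implicit[2] ≈ explicit[2])
```

If both styles agree on a toy loss like this but not on the real one, that would suggest the discrepancy comes from something inside the loss function rather than from the calling syntax.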
I use [e88e6eb3] Zygote v0.4.17 and [587475ba] Flux v0.10.4 on Julia 1.4.1.