If you boil this down a little, your two loss functions are doing something like this (you could even delete the sqrt(3^x) term here; the NaN comes from the sqrt(0^x) part alone):
julia> using Zygote
julia> withgradient(x -> sqrt(0^x) + sqrt(3^x), 4)
(val = 9.0, grad = (NaN,))
julia> withgradient(x -> 0^(x/2) + 3^(x/2), 4)
(val = 9.0, grad = (4.943755299006494,))
julia> let x = 4.001
sqrt(0^x) + sqrt(3^x)
end
9.004945113365242 # supports the 2nd answer
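That finite difference comes out at roughly (9.004945 - 9)/0.001 ≈ 4.945. For reference, the analytic slope of 3^(x/2) at x = 4 is (log 3)/2 * 3^2, which is exactly the second gradient:
julia> log(3)/2 * 3^(4/2)   # exact derivative of 3^(x/2) at x = 4
4.943755299006494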
The reason you get NaN is that the slope of sqrt at zero is infinite. By the chain rule, that infinity multiplies the slope of 0^x at x = 4, which is zero, and Inf * 0 is NaN. With the 0^(x/2) version there is no sqrt of zero involved, so the slope is simply zero.
For an AD system to do better, I suppose it would need to keep track of how big an infinity the gradient of sqrt is… has anyone made such a thing?
julia> using ForwardDiff
julia> ForwardDiff.derivative(x -> sqrt(0^x) + sqrt(3^x), 4)
NaN
julia> ForwardDiff.derivative(x -> 0^(x/2) + 3^(x/2), 4)
4.943755299006494
julia> gradient(sqrt, 0) # Zygote
(Inf,)
julia> ForwardDiff.derivative(sqrt, 0) # often used in Zygote's broadcasting
Inf
julia> gradient(x -> 0^x, 4)
(0.0,)
julia> ForwardDiff.derivative(x -> 0^x, 4)
0
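Putting those pieces together, the chain rule ends up multiplying the infinite slope of sqrt at zero by the zero slope of 0^x, and in floating point that product is exactly the NaN above:
julia> 1 / (2 * sqrt(0.0))   # slope of sqrt at zero
Inf
julia> Inf * 0.0             # times the zero slope of 0^x: the chain-rule product
NaN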