Unrecognized gradient using Zygote for AD with Universal Differential Equations

mcabbott · October 13, 2021, 2:02pm

Some even simpler examples might be:

julia> using Zygote

julia> f(x) = inv(inv(x));
julia> g(x) = cbrt(x)^3;

julia> all(f(x)==x for x in -10:10)
true
julia> all(g(x)≈x for x in -10:10)
true

julia> gradient(f, 0)
(NaN,)
julia> gradient(g, 0)
(NaN,)

If the chain of functions (or rather, the chain of their derivatives) contains singularities at intermediate steps, then the final gradient will tend to be NaN, even if it’s obvious to a human that things ought to cancel.
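To see where the NaN comes from, here is a rough sketch that multiplies the derivative factors by hand instead of calling an AD package. The helper names (`dinv`, `dcbrt`, `dcube`, `chain_grad_f`, `chain_grad_g`) are made up for this illustration and are not Zygote's internals; they just write out the chain rule for the two functions above.

```julia
# Hand-written derivative rules for the pieces used above (plain Base Julia).
dinv(x)  = -1 / x^2               # d/dx inv(x)
dcbrt(x) = 1 / (3 * cbrt(x)^2)    # d/dx cbrt(x)
dcube(y) = 3 * y^2                # d/dy y^3

# Chain rule for f(x) = inv(inv(x)): outer derivative at y = inv(x), times inner.
chain_grad_f(x) = dinv(inv(x)) * dinv(x)

# Chain rule for g(x) = cbrt(x)^3: outer derivative at y = cbrt(x), times inner.
chain_grad_g(x) = dcube(cbrt(x)) * dcbrt(x)

chain_grad_f(2.0)   # away from the singularity the factors cancel to 1.0
chain_grad_g(2.0)   # approximately 1.0

chain_grad_f(0.0)   # (-0.0) * (-Inf), which is NaN
chain_grad_g(0.0)   # 0.0 * Inf, which is NaN
```

Each factor is the right derivative of its own piece; it is only the product of a zero and an infinity that floating point cannot resolve.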

The individual components all seem correct here. I think these examples could be fixed by returning incorrect gradients near singularities, e.g. replacing the gradient at x==0 with one taken slightly away from it, like gradient(cbrt, 0 + eps()). But this may have horrible consequences elsewhere; I’m not sure.
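As a quick experiment with that idea, here is a hand-rolled sketch, assuming the nudge is applied only to the derivative of `cbrt` while the forward value `cbrt(0)` stays exactly zero. The names `dcbrt_nudged` and `grad_g_nudged` are hypothetical and nothing here reflects how Zygote or ChainRules define their rules.

```julia
# Ordinary hand-written derivative rules (plain Base Julia).
dcbrt(x) = 1 / (3 * cbrt(x)^2)    # d/dx cbrt(x), Inf at x == 0
dcube(y) = 3 * y^2                # d/dy y^3

# Hypothetical "nudged" rule: at x == 0, evaluate the derivative at 0 + eps() instead.
dcbrt_nudged(x) = x == 0 ? dcbrt(x + eps()) : dcbrt(x)

# Chain rule for g(x) = cbrt(x)^3 using the nudged inner derivative.
grad_g_nudged(x) = dcube(cbrt(x)) * dcbrt_nudged(x)

dcbrt_nudged(0.0)    # large but finite
grad_g_nudged(0.0)   # 0.0 * finite, which is 0.0: no NaN, but not 1.0 either
grad_g_nudged(2.0)   # unchanged away from zero, approximately 1.0
```

So in this toy version the NaN goes away, but the answer at zero is 0.0 rather than the 1.0 that the cancellation would suggest, which is one concrete way the "incorrect gradients" could show up.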
