Is there a good way to check gradients calculated by Zygote.jl?

Hi there,
I’d like to know whether there is a good way to check the gradients of custom matrix functions calculated by Zygote.jl.

For example, a function like this one:

function my_custom_matrix_func(m)
    return sum(m * m')
end

I know about FiniteDifferences.jl, but I am wondering whether it can apply finite differences to a custom matrix function and return the gradient.

Thanks for any reply.

It would be nice to have some functions which took care of everything automatically. … Something with the convenience of gradient/params but for FiniteDiff and which does the check automatically.

If this exists, I would like to know.

Maybe I’m missing something but doesn’t FiniteDifferences already do this? E.g.:

julia> using FiniteDifferences

julia> grad(central_fdm(3,1), my_custom_matrix_func, [1. 2; 3 4])[1]
2×2 Matrix{Float64}:
 8.0  12.0
 8.0  12.0
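To turn that into an actual check, the finite-difference result can be compared against the Zygote gradient. This is a minimal sketch, assuming both Zygote and FiniteDifferences are installed; the 5-point stencil, the tolerance `rtol = 1e-6`, and the variable names `g_ad`, `g_fd`, and `gradients_agree` are illustrative choices, not something prescribed by either package.

```julia
using FiniteDifferences  # provides central_fdm and grad
using Zygote             # provides gradient

my_custom_matrix_func(m) = sum(m * m')

m = [1.0 2.0; 3.0 4.0]

# Gradient from reverse-mode AD
g_ad = Zygote.gradient(my_custom_matrix_func, m)[1]

# Gradient from a 5-point central finite-difference method (first derivative)
g_fd = grad(central_fdm(5, 1), my_custom_matrix_func, m)[1]

# Compare with a relative tolerance; the right tolerance is a judgment call
gradients_agree = isapprox(g_ad, g_fd; rtol = 1e-6)
@show gradients_agree
```

The same comparison can go straight into a test suite, for example as `@test g_ad ≈ g_fd rtol=1e-6` with the `Test` standard library.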

I’m not sure whether there is an official way to check a Zygote gradient, but I usually check it like this:

gs = gradient(...)  # whatever gradient you get

# check parameter `p` and corresponding gradient `g` in gradient
for (p, g) in pairs(gs)
  @info(p)  # print out parameter
  @info(g)  # print out corresponding gradient
end

That’s not checking the numerical values.

It would be nice to have something that automatically works with the params interface.

Alright, maybe we’re talking about different things. I thought you were trying to check gradients by eye. I also just found another useful utility, @showgrad, which can help with debugging gradients.

If you just want to confirm that your computed gradients are correct, IMO you should always run the gradient test. Take your function $f : \mathbb{R}^n \to \mathbb{R}$, a random point $x \in \mathbb{R}^n$, the (computed) gradient $\nabla f(x) \in \mathbb{R}^n$, and a random direction $\Delta \in \mathbb{R}^n$. Then, as $h \to 0$, the following should hold:

$$
\begin{aligned}
|f(x+h\Delta) - f(x)| &= O(h) \\
|f(x+h\Delta) - f(x) - h \langle \nabla f(x), \Delta \rangle| &= O(h^2)
\end{aligned}
$$
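For the example function from this thread, the expansion can be worked out by hand, which shows exactly where the second-order remainder comes from. This is a sketch under my own notation: $M$ stands for the matrix argument and $\mathbf{1}$ for the all-ones vector, neither of which appears in the original posts.

```latex
\begin{aligned}
f(M) &= \operatorname{sum}(M M^\top) = \mathbf{1}^\top M M^\top \mathbf{1} = \|M^\top \mathbf{1}\|_2^2, \\
\nabla f(M) &= 2\,\mathbf{1}\mathbf{1}^\top M, \\
f(M + h\Delta) &= f(M) + h \langle \nabla f(M), \Delta \rangle + h^2 \|\Delta^\top \mathbf{1}\|_2^2 .
\end{aligned}
```

Because this $f$ is quadratic, the first-order error equals $h^2 \|\Delta^\top \mathbf{1}\|_2^2$ exactly. For $M = [1\ 2;\ 3\ 4]$ the gradient formula gives $[8\ 12;\ 8\ 12]$, which matches the finite-difference output quoted earlier in the thread.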

In Julia code, for your matrix case, this would be something along the lines of

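using LinearAlgebra  # dot
using Statistics     # median
using Zygote         # only needed if the gradient comes from Zygote

n = 1000
x = randn(n, n)
delta = randn(n, n)
f = my_custom_matrix_func
f0 = f(x)
gradf = Zygote.gradient(f, x)[1]  # or a gradient computed by any other means
df = dot(gradf, delta)
h = 10 .^ (-6.0:0.0)  # step sizes 1e-6, 1e-5, ..., 1
err_zeroth_order = zeros(length(h))
err_first_order = zeros(length(h))
for (i, hi) in enumerate(h)
    f1 = f(x + hi * delta)
    err_zeroth_order[i] = abs(f1 - f0)
    err_first_order[i] = abs(f1 - f0 - hi * df)
end
# Steps grow by a factor of 10, so each diff of log10 errors is a slope estimate
h0 = median(diff(log10.(err_zeroth_order)))  # should be ~1
h1 = median(diff(log10.(err_first_order)))   # should be ~2 if your gradient is computed correctly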

The O(h^2) behaviour won’t hold exactly for very small h, where floating-point round-off dominates the truncation error. There are lots of edge cases to consider here, but I hope this gets the basic idea across. It’s a good test to add to a test suite!
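For use in a test suite, the same idea can be wrapped in a small function. The following is only a sketch using the standard library: `gradient_test_orders` and `analytic_grad` are hypothetical names made up for this example, the step range `1e-6` to `1e-1` is an arbitrary choice, and the gradient is hand-derived for this particular function so that the snippet runs without any AD package.

```julia
using LinearAlgebra  # dot
using Statistics     # median

# Hypothetical helper (not part of any package): estimates the observed
# convergence orders of the zeroth- and first-order Taylor errors.
function gradient_test_orders(f, gradf, x; h = 10.0 .^ (-6:-1))
    delta = randn(size(x)...)          # random direction
    f0 = f(x)
    df = dot(gradf, delta)             # directional derivative <gradf, delta>
    err0 = [abs(f(x + hi * delta) - f0) for hi in h]
    err1 = [abs(f(x + hi * delta) - f0 - hi * df) for hi in h]
    # Slopes in log-log space; the median damps round-off outliers
    order0 = median(diff(log10.(err0)) ./ diff(log10.(h)))
    order1 = median(diff(log10.(err1)) ./ diff(log10.(h)))
    return order0, order1
end

my_custom_matrix_func(m) = sum(m * m')

# Hand-derived gradient of sum(m * m'): every row holds twice the column sums of m
analytic_grad(m) = 2 .* (ones(size(m, 1)) * sum(m, dims = 1))

x = randn(20, 20)
order0, order1 = gradient_test_orders(my_custom_matrix_func, analytic_grad(x), x)
@show order0 order1
```

Passing a Zygote gradient instead of `analytic_grad(x)` checks the AD result in the same way. A first-order slope near 2 suggests the gradient is right, while a slope near 1 suggests it is not.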

In 99% of cases you don’t want to implement your own FD code for testing, but use something robust like

with higher order algorithms and stepsize adaptation.