How to deal with Zygote sometimes "pirating" its own adjoints with worse ones?

marius311 · December 22, 2019, 3:02pm

The situation I’m in is that I have some custom arrays and some custom implementation of multiplication, e.g.

function (A::CustomMatrix * b::CustomVector)
    some_specialized_code()
end

Now if I use Zygote to differentiate through code that calls A*b somewhere, Zygote uses its own generic adjoint for matrix-vector multiplication, which is:

@adjoint function(A::AbstractMatrix * x::AbstractVector)
  return A * x, Δ::AbstractVector->(Δ * x', A' * Δ)
end
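For reference, a short derivation of where this generic rule comes from, writing $\Delta$ for the incoming cotangent of $y = A x$ and $L$ for a scalar loss (this notation is an assumption for illustration, not part of Zygote):

```latex
\begin{aligned}
dL &= \Delta^{\top}\, dy = \Delta^{\top}\,(dA\, x + A\, dx) \\
   &= \operatorname{tr}\!\left(x\, \Delta^{\top}\, dA\right) + \left(A^{\top} \Delta\right)^{\top} dx \\
\Rightarrow\quad \bar{A} &= \Delta\, x^{\top}, \qquad \bar{x} = A^{\top} \Delta
\end{aligned}
```

Note that $\bar{A} = \Delta x^{\top}$ is a full outer product, which is presumably one reason the generic rule can be wasteful for a structured matrix type.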

But in this particular case, the resulting adjoint is extremely inefficient compared to what Zygote would get by simply differentiating through my custom implementation in some_specialized_code().

My question is whether there is a way to keep Zygote from “pirating” its own adjoint like this, or whether there is an easy way to write a custom adjoint that forwards the adjoint to some_specialized_code() instead? Thanks.

ChrisRackauckas · December 22, 2019, 3:23pm

function (A::CustomMatrix * b::CustomVector)
    some_specialized_code(A,b)
end

@adjoint function (A::CustomMatrix * b::CustomVector)
    Zygote.pullback(some_specialized_code,A,b)
end
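A minimal runnable sketch of the same forwarding pattern, using plain vectors instead of custom array types. The names `specialized_mul` and `mymul` are hypothetical stand-ins for `some_specialized_code` and the custom `*` method; only `Zygote.@adjoint`, `Zygote.pullback` and `Zygote.gradient` are real Zygote API here.

```julia
using Zygote

# Hypothetical stand-in for `some_specialized_code`: an elementwise product.
specialized_mul(a, b) = a .* b

# Hypothetical wrapper playing the role of `A * b` on the custom types.
mymul(a, b) = specialized_mul(a, b)

# Forward the adjoint: `Zygote.pullback` returns `(value, back)`, which is the
# pair an `@adjoint` definition is expected to return.
Zygote.@adjoint mymul(a, b) = Zygote.pullback(specialized_mul, a, b)

a = [1.0, 2.0]
b = [3.0, 4.0]

# Gradient of sum(a .* b): with respect to `a` it is `b`, and vice versa.
ga, gb = Zygote.gradient((a, b) -> sum(mymul(a, b)), a, b)
```

Because the `@adjoint` method is defined on the more specific signature, it should take precedence over any generic rule for the same function, so Zygote differentiates through the specialized code rather than through the generic adjoint.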

marius311 · December 22, 2019, 3:35pm

Perfect, thanks!

marius311 · December 24, 2019, 3:14pm

Follow-up question: is there an even more automatic way to do this that doesn’t refer to some_specialized_code()? I ask because in some cases I don’t actually have access to this function, as it’s just a function in Base. This happens e.g. for b' * A. If Zygote differentiated through Base’s definition, which is (A'*b)', things would work fine, since A' isa CustomMatrix and so the call would fall through to the adjoint definition suggested by Chris above. Instead, though, it’s using Zygote’s own adjoint for (::AbstractMatrix * ::AbstractMatrix), which is again inefficient.

I guess ideally I need something like:

@adjoint function (b_adj::Adjoint{<:Any, CustomVector} * A::CustomMatrix)
    # teach Zygote that for this call, explicitly derive 
    # through the code that `b_adj * A` actually calls, rather 
    # than using the rule for adjoint of `b_adj * A`
end

Is anything like this possible? Or maybe there’s a better design that side-steps the need for this? Thanks.
