Zygote dozens* of times slower than manually written function

I think for many physical functions it’s best to derive the gradient by hand, if possible. Then optimize it for the computer (as DNF did: avoid divisions, repeated calculations, and non-integer exponents) and write a ChainRules rrule for it. That way Zygote can work with it and it will be relatively fast.

I’m not sure the following is 100% correct, but it improves the timing from 4.3 μs to 735 ns (gradient(K, $x) gives 920 ns).

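# define before first call to gradient
using ChainRulesCore

# assumes G and its hand-written gradient ∇G are already defined
function ChainRulesCore.rrule(::typeof(G), x)
    # NoTangent() is the tangent with respect to the function G itself
    pullback(Δy) = (NoTangent(), ∇G(x) * Δy)
    return G(x), pullback
end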

Maybe someone more familiar with Zygote + ChainRules can optimize this even more.
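
For anyone who wants to try the pattern in isolation, here is a minimal self-contained sketch of the same recipe (hand-derived gradient, rewritten to avoid the non-integer exponent and extra divisions, then registered as an rrule). The function f and its gradient ∇f are hypothetical stand-ins I made up for illustration; they are not the G, H, or K from this thread, and this assumes a real scalar input.

```julia
using ChainRulesCore

# Hypothetical example function: f(x) = x^2.5 / (1 + x^2), for x >= 0.
# x^2.5 is written as x^2 * sqrt(x) to avoid a non-integer exponent.
function f(x)
    x2 = x * x
    return x2 * sqrt(x) / (1 + x2)
end

# Hand-derived gradient: f'(x) = x^1.5 * (2.5 + 0.5x^2) / (1 + x^2)^2.
# The single division is computed once and reused.
function ∇f(x)
    x2 = x * x
    inv_d = 1 / (1 + x2)
    return x * sqrt(x) * (2.5 + 0.5 * x2) * inv_d * inv_d
end

# Define before the first call to gradient, so the AD system sees the rule.
function ChainRulesCore.rrule(::typeof(f), x)
    f_pullback(Δy) = (NoTangent(), ∇f(x) * Δy)
    return f(x), f_pullback
end

# Call the rule directly to check it without any AD package loaded.
y, back = ChainRulesCore.rrule(f, 2.0)
dx = back(1.0)[2]
```

With Zygote loaded, gradient(f, 2.0) should pick up this rule instead of differentiating through the body of f. Comparing dx against a finite difference is a cheap sanity check that the hand derivation is right.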

Oh, and while marius311’s answer above involving ForwardDiff is super fast, it changes the definition of H. If that is not an option and you still want to use ForwardDiff explicitly, you can use the freshly announced ForwardDiffPullbacks package:

using ForwardDiffPullbacks
gradient(fwddiff(G), x)  # 133ns vs 730ns on my machine, 0 allocs