Keno recently gave a talk on the new compiler-based AD that should come out soon, Diffractor.jl, and it focused on efficient higher-order AD. Why? Because we know Zygote has major performance problems with nesting. Essentially, nesting stacks two codegen passes together before any optimization runs, and that seems to generate code large enough that the compiler's optimization heuristics can fail. In other words, Zygote is just slow in the case you're looking at here, it's sad, and the solution is described in the video below but won't be ready right now.
If this is for physics-informed neural networks, then note from the video that PINNs are precisely what drove this line of R&D, and the optics formulation is a much better answer than what we had before. FWIW, doing mixed-mode AD, i.e. computing the PDE derivatives in forward mode while the loss gradient is taken in reverse mode, is an asymptotically good strategy anyway, so that's the workaround I'd recommend for now.
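To make the mixed-mode workaround concrete, here's a minimal sketch of the idea: the inner PDE derivative is taken with ForwardDiff (forward mode) and the outer parameter gradient with Zygote (reverse mode). The "network" `u`, the residual `u_xx + u`, and all names here are toy illustrations I'm making up for the example, not anything from the talk; a real PINN would use an actual neural network and its own PDE residual.

```julia
using ForwardDiff, Zygote

# Toy stand-in for a neural network: u(x; p) = p₁ · sin(p₂ · x).
u(x, p) = p[1] * sin(p[2] * x)

# Second spatial derivative u_xx via nested *forward-mode* AD.
uxx(x, p) = ForwardDiff.derivative(
    y -> ForwardDiff.derivative(z -> u(z, p), y), x)

# Toy PINN-style residual loss for the (made-up) PDE u_xx + u = 0,
# summed over collocation points.
loss(p, xs) = sum(abs2, uxx.(xs, Ref(p)) .+ u.(xs, Ref(p)))

# Outer gradient w.r.t. parameters via *reverse-mode* AD (Zygote),
# so reverse mode never has to nest over itself.
xs = range(0.0, 1.0; length = 10)
g = Zygote.gradient(p -> loss(p, xs), [1.0, 2.0])[1]
```

The point of the split is that the inner derivatives are with respect to a low-dimensional input (here, scalar `x`), where forward mode is cheap, while reverse mode is reserved for the single scalar loss over many parameters, which is where it shines. That avoids the Zygote-over-Zygote nesting that triggers the slowdown above.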