Picking an AD Backend and Enzyme Errors

I’m working on an optimization problem for a paper, of the form
cost(θ) → scalar, where θ is potentially large, maybe 100-200 dimensional.

I was under the assumption that reverse-mode AD is appropriate for this task; however, I benchmarked a few backends (a minimal sketch of the harness follows the results below):

For θ ∈ ℝ^100:

For forward mode:

  • ForwardDiff takes ~200ms
  • PolyesterForwardDiff takes ~20ms (with 10 chunks)

For reverse mode:

  • ReverseDiff takes ~2000ms with significantly more allocations
  • Zygote fails because I do quite a lot of mutation to avoid allocations
  • Diffractor gives some type error that I don’t understand
  • Enzyme gives this, which…I’m really not sure what to do with

For completeness, I opened an Enzyme issue here.
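
Here’s that minimal sketch of the benchmark harness; cost below is a hypothetical stand-in (the real objective mutates preallocated buffers, which is what trips up Zygote):

using BenchmarkTools, ForwardDiff, PolyesterForwardDiff, ReverseDiff

cost(θ) = sum(abs2, θ) + sum(sin, θ)   # stand-in for the real objective
θ = randn(100)
g = similar(θ)

@btime ForwardDiff.gradient!($g, $cost, $θ)
@btime PolyesterForwardDiff.threaded_gradient!($cost, $g, $θ, ForwardDiff.Chunk(10))
@btime ReverseDiff.gradient!($g, $cost, $θ)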

So, what is the best approach here? ForwardDiff works, and PolyesterForwardDiff gives a nice 10x speedup, but the latter seems unsupported as an AD backend for SciML’s Optimization. And only ReverseDiff works in reverse-land.

I’d appreciate any insight!

For more context, the problem setup is here

Ah, also, my mistake: AutoPolyesterForwardDiff is indeed a part of Optimization. It just hasn’t made it to the docs yet.

EDIT: However, I could not get it to work.

Can you post a reproducible example of your code?

Replied on GitHub re Enzyme error.

Looks like Julia’s complex sqrt has a bit hack. We should just define a rule for it. That should fix it, but I won’t have time to work on it for roughly a week.

If you or others are interested in adding it lmk.

Doing so properly requires writing the complex sqrt rule here (Enzyme/enzyme/Enzyme/InstructionDerivatives.td at 0b621884bc531329095d202f042f6599a86614ec · EnzymeAD/Enzyme · GitHub; this is the complex 1/z rule), and, for Julia, here (Enzyme.jl/src/compiler/interpreter.jl at 8784d1f79cf9e84028bc04c7455493d1b9dcbd31 · EnzymeAD/Enzyme.jl · GitHub) and here (Enzyme.jl/src/compiler.jl at 8784d1f79cf9e84028bc04c7455493d1b9dcbd31 · EnzymeAD/Enzyme.jl · GitHub).
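
For reference, the rule to implement is just the holomorphic derivative d/dz sqrt(z) = 1/(2 sqrt(z)). A quick plain-Julia sanity check, with arbitrary example values:

z = 1.0 + 2.0im
h = 1e-8
fd = (sqrt(z + h) - sqrt(z)) / h    # finite-difference approximation
exact = 1 / (2 * sqrt(z))           # analytic derivative
@assert isapprox(fd, exact; atol = 1e-6)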


For performant ReverseDiff you would want to compile the tape (tape API · ReverseDiff.jl).
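
A minimal sketch of that workflow, with a stand-in objective (note that compiled tapes require the control flow not to depend on the input values):

using ReverseDiff

cost(θ) = sum(abs2, θ)                      # stand-in for the real objective
θ = randn(100)
g = similar(θ)

tape  = ReverseDiff.GradientTape(cost, θ)   # record the operations once
ctape = ReverseDiff.compile(tape)           # compile the tape for reuse
ReverseDiff.gradient!(g, ctape, θ)          # cheap repeated gradient calls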


For completeness: at the moment Diffractor.jl is only a forward-mode backend, and it is still rather experimental.


Is there a reason you think reverse mode is more appropriate for this task? We have done many detailed measurements across different domains; for example, A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation Solutions | IEEE Conference Publication | IEEE Xplore mentions that, due to various limitations of reverse mode, around 100 parameters is the cutoff at which you should start considering reverse-mode AD.

  • So anything below 100: definitely forward.
  • Way above 100: definitely reverse.
  • Around 100? Murky waters.

And that’s without considering PolyesterForwardDiff, which, using parallelism on a high-core-count machine, definitely pushes that range up a bit. Newer versions of Enzyme definitely push that range back down a bit, though, so 100 is still about where I would expect it to be.

All I mean to say is: our benchmarks continually tell us the same story, so I don’t know why you would expect miracles from reverse mode here. PolyesterForwardDiff is likely to be the fastest; I would not expect even compiled-tape ReverseDiff to beat it. Enzyme is the only thing that has a chance, but that depends on many factors (and it’s battling uphill without a multithreaded form), so you’re likely not to see a real gain after doing a bunch of work to get reverse mode working better. I would set the expectations there and consider this an academic exercise to re-verify it. If you were talking about 1000 I’d say differently, but 100 is about where the cutoff is, and the multithreaded versions really make that size not “big”.
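
As a rough back-of-envelope version of that argument (a cost model, not a measurement):

n, chunk = 100, 10
forward_sweeps = cld(n, chunk)   # ForwardDiff needs ceil(n/chunk) = 10 dual-number passes
# Reverse mode costs a small constant factor over one primal evaluation
# (plus taping overhead), so at n ≈ 100 the two land in the same ballpark;
# hence the murky waters. At n ≈ 1000, the ~100 forward passes would lose.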


Thanks for your input, Chris! What you say makes total sense.

Is there a reason you think reverse mode is more appropriate for this task?

Well, that’s why I benchmarked it 🙂.

And that’s without considering PolyesterForwardDiff, which, using parallelism on a high-core-count machine, definitely pushes that range up a bit. Newer versions of Enzyme definitely push that range back down a bit, though, so 100 is still about where I would expect it to be.

I’m excited to see how Enzyme will perform once we fix the complex sqrt bug from above, but otherwise I’m quite satisfied with the performance of PolyesterForwardDiff. However, I haven’t quite gotten it to work in Optimization: nothing in the docs mentions PolyesterForwardDiff (OptimizationFunction · Optimization.jl), although AutoPolyesterForwardDiff seems to exist.

I’m not sure if this warrants opening an issue, but as an example:

using Optimization, OptimizationOptimJL, PolyesterForwardDiff
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
u0 = zeros(2)
p = [1.0, 100.0]
optf = OptimizationFunction(rosenbrock, Optimization.AutoPolyesterForwardDiff())
prob = OptimizationProblem(optf, u0, p, lb = [-1.0, -1.0], ub = [1.0, 1.0])
sol = solve(prob, LBFGS())   # the error is thrown once the problem is actually solved

gives

ArgumentError: The passed automatic differentiation backend choice is not available. Please load the corresponding AD package PolyesterForwardDiff.

even though it is clearly loaded.
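
In the meantime, a possible workaround sketch (untested): skip the backend dispatch and hand-wire the gradient. threaded_gradient! is from PolyesterForwardDiff’s README; rosen_grad! is a hypothetical helper and the chunk size here is arbitrary.

using Optimization, OptimizationOptimJL, ForwardDiff, PolyesterForwardDiff
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
# Supply the gradient manually instead of relying on AutoPolyesterForwardDiff:
rosen_grad!(g, u, p) = PolyesterForwardDiff.threaded_gradient!(x -> rosenbrock(x, p), g, u, ForwardDiff.Chunk(2))
optf = OptimizationFunction(rosenbrock; grad = rosen_grad!)
prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0], lb = [-1.0, -1.0], ub = [1.0, 1.0])
sol = solve(prob, LBFGS())   # assuming the box constraints get wrapped via Fminbox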
