Why does Optim.jl not allow for backwards mode autodifferentiation?

I apologise if this is a silly question.

Why does Optim.jl not allow for backwards mode autodifferentiation? If I understand things correctly, backwards mode is better when the output dimension is smaller than the input dimension. This is the case in optimization problems. In fact, if I understand correctly, optimization problems are the best case for it, as the output is just a scalar.
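To make the dimension argument concrete, here is the standard cost comparison, as a sketch that ignores constant factors and ForwardDiff-style chunking. For $f:\mathbb{R}^n \to \mathbb{R}^m$ with Jacobian $J \in \mathbb{R}^{m \times n}$, one forward pass pushes a tangent $\dot{x}$ through $f$, and one reverse pass pulls a cotangent $\bar{y}$ back:

```latex
\begin{aligned}
\text{forward (JVP):}\quad & \dot{y} = J\,\dot{x}
  && \text{full } J \text{ needs } n \text{ passes},\\
\text{reverse (VJP):}\quad & \bar{x}^{\top} = \bar{y}^{\top} J
  && \text{full } J \text{ needs } m \text{ passes},\\
m = 1,\ \bar{y} = 1:\quad & \nabla f(x) = \bar{x} = J^{\top}
  && \text{one reverse pass, cost } O(\operatorname{cost}(f)).
\end{aligned}
```

So for a scalar objective with $n$ parameters, forward mode needs on the order of $n$ passes for the gradient, while reverse mode needs one (at the price of extra bookkeeping per pass).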

Is there something I am missing/getting wrong?

Thank you in advance!

It does. See:

Some (but not all) of the choices are reverse mode AD.

Indeed. For large optimization problems, reverse-mode methods are the most efficient. If you check the recommendations I linked above, they only recommend forward mode for small problems. This is because reverse mode has more overhead, so for a small enough optimization problem, computing the gradient with forward mode can still be faster. The cutoff point is problem-dependent and keeps shifting as the tools improve, but by around ~100 parameters you definitely want to have switched to reverse mode.
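Since the cutoff is problem-dependent, one way to find it for your own objective is to time both modes directly. The sketch below assumes ForwardDiff.jl and ReverseDiff.jl are installed; the objective `f` and the helper `time_gradient` are hypothetical stand-ins, and `@elapsed` is only a rough timer (BenchmarkTools.jl gives more reliable numbers).

```julia
using ForwardDiff, ReverseDiff

# Hypothetical objective; substitute your real one to find your own cutoff
f(x) = sum(abs2, x) + sum(sin, x)

# Hypothetical helper: average wall time of one in-place gradient call
function time_gradient(grad!, G, x; reps = 100)
    grad!(G, f, x)                 # warm-up call so compilation is not timed
    t = @elapsed for _ in 1:reps
        grad!(G, f, x)             # writes the gradient of f at x into G
    end
    return t / reps
end

for n in (10, 100, 1_000)
    x = randn(n)
    G = similar(x)
    tf = time_gradient(ForwardDiff.gradient!, G, x)
    tr = time_gradient(ReverseDiff.gradient!, G, x)
    println("n = $n: forward $(tf) s, reverse $(tr) s")
end
```

Both calls compute the same gradient; only the cost differs, so the crossover you observe applies to this `f` and this machine only.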


Maybe there was some confusion. I meant Optim.jl, not Optimization.jl. See here: Optim.jl.

But thank you. The package you linked solves my problems.

If you want to use reverse mode with Optim directly, you need to pass the gradient function yourself as the second argument to Optim.optimize.
IIUC, Optimization.jl is a front-end that does this automatically.
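For comparison, a minimal sketch of the Optimization.jl route, assuming the Optimization.jl, OptimizationOptimJL.jl and ReverseDiff.jl packages are installed. The AD backend is chosen with an ADTypes object (`Optimization.AutoReverseDiff()` here), and the front-end builds the gradient for Optim's `LBFGS()`; exact package layout may differ between releases.

```julia
using Optimization, OptimizationOptimJL, ReverseDiff

# Optimization.jl objectives take (x, p); the parameters p are unused here
f(x, p) = sum(abs2, x)

# Ask the front-end to generate the gradient with ReverseDiff (reverse mode)
optf = OptimizationFunction(f, Optimization.AutoReverseDiff())
prob = OptimizationProblem(optf, randn(1000))

# LBFGS from Optim.jl, made available through OptimizationOptimJL
sol = solve(prob, LBFGS())
println(sol.objective)
```

Swapping the backend (for example to `Optimization.AutoForwardDiff()` for a small problem) only changes the second argument of `OptimizationFunction`.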


A minimum working example of how to do that (it’s easy!):

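```julia
using Optim, ReverseDiff

f(x) = sum(abs2, x) # objective function
# in-place gradient: overwrites G with the gradient of f at x (reverse mode)
g!(G, x) = ReverseDiff.gradient!(G, f, x)

x0 = randn(1000)
# gradient passed as the second argument; with no method given, Optim uses LBFGS
optimize(f, g!, x0)
```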
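If `f` is called many times, ReverseDiff can record its operations once and replay a compiled tape on every call. A sketch of that variant using ReverseDiff's `GradientTape` and `compile`; note the assumption that `f` has no branches depending on the values of `x`, since the tape only records the path taken at `x0`.

```julia
using Optim, ReverseDiff

f(x) = sum(abs2, x) # objective function
x0 = randn(1000)

# Record f once at x0 and compile the tape (valid only without value-dependent branches)
const tape = ReverseDiff.compile(ReverseDiff.GradientTape(f, x0))

# Reuse the compiled tape for every gradient evaluation
g!(G, x) = ReverseDiff.gradient!(G, tape, x)

res = optimize(f, g!, x0, LBFGS())
println(Optim.minimum(res))
```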

Yes, of course! I was just confused about why forward mode is supported natively, while for backwards mode you need to provide the gradient yourself.
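For reference, the native forward-mode path looks like this: no gradient is passed, and Optim builds one with ForwardDiff.jl. This sketch uses the `autodiff = :forward` keyword as documented for Optim.jl 1.x; newer releases may prefer an ADTypes object, so check the docs for your installed version. The problem size is kept small, where forward mode is the recommended choice.

```julia
using Optim

f(x) = sum(abs2, x) # objective function
x0 = randn(10)      # small problem: forward mode is a good fit

# No gradient supplied: Optim differentiates f with ForwardDiff
res = optimize(f, x0, LBFGS(); autodiff = :forward)
println(Optim.minimizer(res))
```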

Historical reasons – forward mode was correct and stable long before backward mode was.