Why does Optim.jl not allow for backwards mode autodifferentiation?

Devetak · June 14, 2023, 4:30pm

I apologise if this is a silly question.

Why does Optim.jl not allow for backwards mode autodifferentiation? If I understand things correctly, backwards mode is better when the output dimension is smaller than the input dimension, which is the case in optimization problems. In fact, if I understand correctly, optimization problems are the best case of this, as the output is just a scalar.

Is there something I am missing/getting wrong?

Thank you in advance!

ChrisRackauckas · June 14, 2023, 4:40pm

It does. See:

Some (but not all) of the choices are reverse mode AD.

Indeed. For large optimization problems, the reverse-mode methods are the most efficient. If you check the recommendations I linked above, they recommend forward mode only for small problems. This is because reverse mode can have more overhead, so for a small enough optimization problem the gradients via forward mode can still be faster. That cutoff point is problem-dependent and always changing, but by around ~100 optimization variables you definitely want to have made the swap.
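
As a rough illustration of the two modes being compared here, the sketch below computes the same gradient with ForwardDiff.jl and ReverseDiff.jl. The objective `f` and the input size are placeholders chosen for illustration and are not taken from the linked recommendations.

```julia
using ForwardDiff, ReverseDiff

# Illustrative scalar-valued objective, R^n -> R (an assumption, not from the thread's link)
f(x) = sum(abs2, x)

x = randn(1000)

# Forward mode: work grows with the number of inputs, which are swept in chunks
g_fwd = ForwardDiff.gradient(f, x)

# Reverse mode: one forward pass to record the operations, one reverse pass for all partials
g_rev = ReverseDiff.gradient(f, x)

# Both modes compute the same gradient; only the cost profile differs
@assert g_fwd ≈ g_rev
```

Timing both calls at a few input sizes (for example with BenchmarkTools.jl) is one way to locate the cutoff for a specific problem.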

Devetak · June 15, 2023, 7:10am

Maybe there was some confusion. I meant Optim.jl, not Optimization.jl. See here: Optim.jl.

But thank you. The package you linked solves my problems.

simsurace · June 15, 2023, 7:52am

If you want to use reverse mode with Optim directly, you need to pass the gradient function yourself as the second argument to Optim.optimize.
IIUC, Optimization.jl is a front-end that does this automatically.
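
A minimal sketch of the Optimization.jl route, assuming the `OptimizationFunction` / `OptimizationProblem` / `solve` interface and the `Optimization.AutoReverseDiff()` backend selector as they existed around the time of this thread. Exact names may differ between package versions, and the objective is only a placeholder.

```julia
using Optimization, OptimizationOptimJL, ReverseDiff

# Optimization.jl objectives take (x, p); p carries fixed parameters and is unused here
f(x, p) = sum(abs2, x)

# Ask Optimization.jl to generate the gradient with ReverseDiff
optf = OptimizationFunction(f, Optimization.AutoReverseDiff())
prob = OptimizationProblem(optf, randn(100))

# Solve with Optim's L-BFGS through the OptimizationOptimJL wrapper
sol = solve(prob, Optim.LBFGS())
```

The minimizer is then available as `sol.u`.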

ElOceanografo · June 15, 2023, 8:47am

A minimum working example of how to do that (it’s easy!):

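```julia
using Optim, ReverseDiff

f(x) = sum(abs2, x)  # objective function
g!(G, x) = ReverseDiff.gradient!(G, f, x)  # writes the reverse-mode gradient into G in place

x0 = randn(1000)
optimize(f, g!, x0)  # with a gradient supplied, Optim defaults to L-BFGS
```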

Devetak · June 15, 2023, 11:12am

Yes, of course! I was just confused about why forward mode is supported natively while for backwards mode you need to provide the gradient yourself.
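
For comparison, this is roughly what the natively supported forward-mode path looked like in Optim.jl around the time of this thread, assuming the `autodiff = :forward` keyword (newer Optim.jl releases may accept other values for it). The objective and starting point are illustrative only.

```julia
using Optim

f(x) = sum(abs2, x)  # illustrative objective
x0 = randn(10)       # small problem, where forward mode is a reasonable choice

# No gradient function is passed; Optim builds one internally via forward-mode AD
res = optimize(f, x0, LBFGS(); autodiff = :forward)

xmin = Optim.minimizer(res)  # location of the minimum found
```

Without the `autodiff` keyword, Optim falls back to finite differences for gradient-based methods.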

johnmyleswhite · June 15, 2023, 12:27pm

Historical reasons – forward mode was correct and stable long before backward mode was.
