This warning appears because Optimization.jl uses my package DifferentiationInterface.jl (DI) under the hood to perform AD. When you only provide a first-order backend like AutoZygote(), Optimization.jl makes an informed choice about what to use for second-order AD. Computing a Hessian requires two nested AD calls, and these can be performed with two different backends combined into a struct called DifferentiationInterface.SecondOrder. While you could make both calls with the same backend, it is usually better to combine a forward-mode and a reverse-mode backend. Thus, to get good performance by default, Optimization.jl computes Hessians with SecondOrder(AutoForwardDiff(), AutoZygote()) even though you didn't ask for it explicitly.
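To make this concrete, here is a minimal sketch of what that default amounts to, using DI directly (the function f and the input x are just illustrative, not from your code):

```julia
using DifferentiationInterface  # exports hessian, SecondOrder, and the ADTypes backends
import ForwardDiff, Zygote      # load the packages so both backends are functional

f(x) = sum(abs2, x)  # simple quadratic test function

# Outer pass in forward mode, inner pass in reverse mode:
# this is the combination Optimization.jl picks for you by default.
backend = SecondOrder(AutoForwardDiff(), AutoZygote())

H = hessian(f, backend, [1.0, 2.0, 3.0])  # the Hessian of this quadratic is 2I
```

Forward-over-reverse is the standard choice here: the inner reverse pass computes the gradient efficiently, and the outer forward pass differentiates it one direction at a time with cheap Dual numbers.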
This is more worrying, and it could be linked to the following DI issue:
Tagging is a bit tricky for second-order AD, and a minimal working example would help tremendously here. Can you at least share the stack trace and what your Dual overloads look like, even if the code isn't fully runnable?