Odd warning and optimization issues only inside Optimization.jl; Zygote.hessian works fine

Greetings, everyone.

Apologies if the query is too basic, especially as in one case it does not even impede optimization. I recently graduated from my PhD program and took a job at a company that gives me freedom in which programming language I use for my projects. I chose Julia both out of passion for the language and because its thorough integration of AD methods lets me compute derivatives of general Julia code rather easily*.

In my program the number of parameters is rather small and their range constrained, so a second-order method such as IPNewton with constraints has worked quite well in one case. My code was tested most extensively against Zygote and also has definitions for ForwardDiff duals, so there should be no issue there. Calling Zygote.hessian directly produces no warnings and is really fast.
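For context, here is a minimal sketch of the kind of setup I am describing. The loss, parameters, bounds, and initial point are stand-ins, not my actual model:

```julia
using Optimization, OptimizationOptimJL  # IPNewton comes from Optim.jl via OptimizationOptimJL
using ADTypes: AutoZygote
import Zygote

# Stand-in objective; the real model is more involved
loss(u, p) = sum(abs2, u .- p)

optf = OptimizationFunction(loss, AutoZygote())
prob = OptimizationProblem(optf, [0.5, 0.5], [0.2, 0.8];
                           lb = [0.0, 0.0], ub = [1.0, 1.0])
sol = solve(prob, IPNewton())  # second-order method with box constraints
```

Running a call like this is what produces the warning below.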

However, whenever I run the optimization I get this warning, and I cannot find any documentation for what it means:

Warning: The selected optimization algorithm requires second order derivatives, but `AutoZygote` ADtype was provided. So a `SecondOrder` with `AutoZygote` for inner and `AutoForwardDiff` for outer will be created, for choosing another pair an explicit `SecondOrder` ADType is recommended.

I looked over the Optimization.jl documentation and didn’t quite understand this; searching for “SecondOrder” there returns no results either. This would not be much of a problem if I could optimize consistently in all cases, but sometimes the code will not even run, because I get this error:

Error: Cannot determine ordering of Dual tags Nothing and ForwardDiff.Tag{DifferentiationInterfaceForwardDiffExt.ForwardDiffOverSomethingHVPWrapper{typeof(loss)}, Float64}

The stack trace is particularly unhelpful, since the line of code causing trouble in the failing case does not affect the working one. Zygote.hessian still computes the Hessian without issue in both cases, so I am confused as to why the error shows up at all. The code is written very generically, with every differentiated parameter annotated as ::Number or ::AbstractArray{<:Number}, and the Dual implementations explicitly copy the input tag.

I cannot disclose the specifics of my model due to a clause in my contract, and I can’t seem to isolate the issue into an MWE, since outside the optimization loop the Hessian calls resolve fine.


This warning appears because Optimization.jl uses my package DifferentiationInterface.jl (DI) under the hood to perform AD. When you only provide a first-order backend like AutoZygote(), Optimization.jl makes an informed decision about what to use for second-order AD. Second-order AD requires two nested AD calls, and these can be performed with two different backends combined into a struct called DifferentiationInterface.SecondOrder. While you could make both calls with the same backend, it is usually a better idea to combine a forward-mode and a reverse-mode backend. Thus, to get good performance by default, Optimization.jl computes Hessians with SecondOrder(AutoForwardDiff(), AutoZygote()) even though you didn’t ask for it explicitly.[1]
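To silence the warning by making the choice explicit, something like the following sketch should work (the loss here is a stand-in, and SecondOrder takes the outer backend first, the inner backend second):

```julia
using Optimization
using DifferentiationInterface: SecondOrder
using ADTypes: AutoForwardDiff, AutoZygote
import Zygote, ForwardDiff

loss(u, p) = sum(abs2, u)  # stand-in objective

# Outer ForwardDiff over inner Zygote: forward-over-reverse Hessians,
# the same pairing the warning constructs for you
adtype = SecondOrder(AutoForwardDiff(), AutoZygote())
optf = OptimizationFunction(loss, adtype)
```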

The second error is more worrying, and it could be linked to the following DI issue:

Tagging is a bit tricky for second-order AD, and a minimum working example would help tremendously here. Can you at least share the stack trace and what your Dual overloads look like, even if it’s not a fully runnable code?

Ping @Vaibhavdixit02


  1. Note that DI actually uses forward-over-reverse for Hessians with pure AutoZygote() as well, because that is what Zygote.hessian chooses to do. ↩︎


Thank you for the assistance. I ran into trouble invoking SecondOrder, resulting in this stack trace:

ERROR: MethodError: kwcall(::@NamedTuple{…}, ::typeof(OptimizationBase.instantiate_function), ::OptimizationFunction{…}, ::OptimizationBase.ReInitCache{…}, ::SecondOrder{…}, ::Int64) is ambiguous.

Candidates:
  kwcall(::NamedTuple, ::typeof(OptimizationBase.instantiate_function), f::OptimizationFunction{true}, cache::OptimizationBase.ReInitCache, adtype::AbstractADType, num_cons)
    @ OptimizationBase ~/.julia/packages/OptimizationBase/ghGuk/src/OptimizationDIExt.jl:267
  kwcall(::NamedTuple, ::typeof(OptimizationBase.instantiate_function), f::OptimizationFunction{true}, x, adtype::Union{AutoZygote, SecondOrder{<:AbstractADType, <:AutoZygote}}, p)
    @ OptimizationZygoteExt ~/.julia/packages/OptimizationBase/ghGuk/ext/OptimizationZygoteExt.jl:21

Possible fix, define
  kwcall(::NamedTuple, ::typeof(OptimizationBase.instantiate_function), ::OptimizationFunction{…}, ::OptimizationBase.ReInitCache, ::Union{…}, ::Any)

Stacktrace:
  [1] OptimizationCache(prob::OptimizationProblem{…}, opt::IPNewton{…}; callback::Function, maxiters::Nothing, maxtime::Nothing, abstol::Nothing, reltol::Nothing, progress::Bool, structural_analysis::Bool, manifold::Nothing, kwargs::@Kwargs{})
    @ OptimizationBase ~/.julia/packages/OptimizationBase/ghGuk/src/cache.jl:60
  [2] __init(prob::OptimizationProblem{…}, opt::IPNewton{…}; callback::Function, maxiters::Nothing, maxtime::Nothing, abstol::Nothing, reltol::Nothing, progress::Bool, kwargs::@Kwargs{})
    @ OptimizationOptimJL ~/.julia/packages/OptimizationOptimJL/e3bUa/src/OptimizationOptimJL.jl:110
  [3] __init(prob::OptimizationProblem{…}, opt::IPNewton{…})
    @ OptimizationOptimJL ~/.julia/packages/OptimizationOptimJL/e3bUa/src/OptimizationOptimJL.jl:81
  [4] init(::OptimizationProblem{…}, ::IPNewton{…}; kwargs::@Kwargs{})
    @ SciMLBase ~/.julia/packages/SciMLBase/tey0W/src/solve.jl:172
  [5] init(::OptimizationProblem{…}, ::IPNewton{…})
    @ SciMLBase ~/.julia/packages/SciMLBase/tey0W/src/solve.jl:170
  [6] solve(::OptimizationProblem{…}, ::IPNewton{…}; kwargs::@Kwargs{})
    @ SciMLBase ~/.julia/packages/SciMLBase/tey0W/src/solve.jl:94
  [7] solve(::OptimizationProblem{…}, ::IPNewton{…})
    @ SciMLBase ~/.julia/packages/SciMLBase/tey0W/src/solve.jl:91
  [8] macro expansion
    @ ~/Desktop/main/lib/Main.jl:253 [inlined]
  [9] macro expansion
    @ ~/.julia/packages/ProgressMeter/kVZZH/src/ProgressMeter.jl:1008 [inlined]
 [10] top-level scope
    @ ~/Desktop/main/lib/Main.jl:241
Some type information was truncated. Use `show(err)` to see complete types.

but at least knowing about DifferentiationInterface gives me a place to look when debugging.

The second issue involves problems on my end as well. I have a series of functions that depend on two parameters: a real number x and an integer order n. I define their derivatives and the differentiation rules using the derivative function. While first-order differentiation through Zygote worked perfectly, second-order derivatives relying on ForwardDiff would fail, so I supplemented the definitions with Dual overloads just to get the code to run.

An example of what my implementations would look like is this:

function f(x::Real, n::Integer)
    # -- Code --
end

function df(x::Real, n::Integer)
    # -- Derivative code --
end

function f(xdx::Dual{Tag}, ndn::Dual) where {Tag}
    x = xdx.value
    dx = xdx.partials
    n = ndn.value
    return Dual{Tag}(f(x, n), df(x, n) * dx)
end

I am surely doing something wrong with the Dual on the order argument in particular, but without the overload the Hessian calls kept failing, saying there was no implementation of f(::Dual, ::Dual).

The generated stack trace is:

ERROR: Cannot determine ordering of Dual tags Nothing and ForwardDiff.Tag{DifferentiationInterfaceForwardDiffExt.ForwardDiffOverSomethingHVPWrapper{var"#loss#13"{Matrix{Float64}, ComponentVector{Float64, Vector{Float64}, Tuple{Axis{(α = ViewAxis(1:1, ShapedAxis(())), β = ViewAxis(2:2, ShapedAxis(())))}}}, Vector{ComplexF64}}}, Float64}
Stacktrace:
  [1] ≺(a::Type, b::Type)
    @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/dual.jl:54
  [2] *
    @ ~/.julia/packages/ForwardDiff/PcZ48/src/dual.jl:140 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/ForwardDiff/PcZ48/src/partials.jl:157 [inlined]
  [4] scale_tuple
    @ ~/.julia/packages/ForwardDiff/PcZ48/src/partials.jl:200 [inlined]
  [5] *
    @ ~/.julia/packages/ForwardDiff/PcZ48/src/partials.jl:111 [inlined]
  [6] *
    @ ~/.julia/packages/ForwardDiff/PcZ48/src/partials.jl:84 [inlined]
  [7] jn(xdx::ForwardDiff.Dual{Nothing, ForwardDiff.Dual{…}, 2}, ndn::ForwardDiff.Dual{Nothing, Int64, 2})
    @ Main.Library ~/Desktop/main/lib/Implementations.jl:192
  [8] (::Zygote.var"#1410#1411"{typeof(jn)})(::ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 3}, ::Int64)
    @ Zygote ~/.julia/packages/Zygote/NRp5C/src/lib/broadcast.jl:276
  [9] _broadcast_getindex_evalf
    @ ./broadcast.jl:673 [inlined]
 [10] _broadcast_getindex
    @ ./broadcast.jl:646 [inlined]
 [11] getindex
    @ ./broadcast.jl:605 [inlined]
 [12] copy
    @ ./broadcast.jl:906 [inlined]
 [13] materialize
    @ ./broadcast.jl:867 [inlined]
 [14] broadcast_forward
    @ ~/.julia/packages/Zygote/NRp5C/src/lib/broadcast.jl:282 [inlined]
 [15] _broadcast_generic
    @ ~/.julia/packages/Zygote/NRp5C/src/lib/broadcast.jl:212 [inlined]
 [16] adjoint
    @ ~/.julia/packages/Zygote/NRp5C/src/lib/broadcast.jl:205 [inlined]
 [17] _pullback(::Zygote.Context{…}, ::typeof(Base.Broadcast.broadcasted), ::Base.Broadcast.DefaultArrayStyle{…}, ::typeof(jn), ::Matrix{…}, ::Base.ReshapedArray{…})
    @ Zygote ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:67
 [18] _apply(::Function, ::Vararg{Any})
    @ Core ./boot.jl:946
 [19] adjoint
    @ ~/.julia/packages/Zygote/NRp5C/src/lib/lib.jl:203 [inlined]
 [20] _pullback
    @ ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:67 [inlined]
 [21] broadcasted
    @ ./broadcast.jl:1326 [inlined]

I think the first issue is a faulty dispatch inside Optimization.jl.

For the second one, you shouldn’t actually need to define overloads with n being a Dual, because you’re never going to differentiate with respect to n. Presumably it’s this second, unneeded Dual which causes the tag confusion.

A shame about the faulty dispatch; it seems I will have to live with the warnings. Since I can suppress the warning, this part is solved.

I agree that I shouldn’t pass it as a Dual argument. However, regardless of whether I make the definition or not, ForwardDiff still tries to evaluate it whenever I broadcast over that argument. I ran into ForwardDiff trying to evaluate f(::Dual, ::Dual) and breaking the code, hence the overload. This one I can easily create an MWE for:

using Zygote

Ords = 0:5

function g(x::Number, n::Integer)
    return x*n
end

Zygote.hessian(x->sum(g.(x, Ords)), 0.0)

which generates

ERROR: MethodError: no method matching g(::ForwardDiff.Dual{Nothing, ForwardDiff.Dual{…}, 2}, ::ForwardDiff.Dual{Nothing, Int64, 2})
The function `g` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  g(::Number, ::Integer)
   @ Main REPL[3]:1

Stacktrace:
  [1] (::Zygote.var"#1410#1411"{typeof(g)})(::ForwardDiff.Dual{ForwardDiff.Tag{Zygote.var"#130#131"{var"#1#2"}, Float64}, Float64, 1}, ::Int64)
    @ Zygote ~/.julia/packages/Zygote/NRp5C/src/lib/broadcast.jl:276
  [2] _broadcast_getindex_evalf
    @ ./broadcast.jl:673 [inlined]
  [3] _broadcast_getindex
    @ ./broadcast.jl:646 [inlined]
  [4] getindex
    @ ./broadcast.jl:605 [inlined]
  [5] copy
    @ ./broadcast.jl:906 [inlined]
  [6] materialize
    @ ./broadcast.jl:867 [inlined]
  [7] broadcast_forward
    @ ~/.julia/packages/Zygote/NRp5C/src/lib/broadcast.jl:282 [inlined]
  [8] _broadcast_generic
    @ ~/.julia/packages/Zygote/NRp5C/src/lib/broadcast.jl:212 [inlined]
  [9] adjoint
    @ ~/.julia/packages/Zygote/NRp5C/src/lib/broadcast.jl:205 [inlined]
 [10] _pullback(::Zygote.Context{false}, ::typeof(Base.Broadcast.broadcasted), ::Base.Broadcast.DefaultArrayStyle{1}, ::typeof(g), ::ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 1}, ::UnitRange{Int64})
    @ Zygote ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:67
 [11] _apply(::Function, ::Vararg{Any})
    @ Core ./boot.jl:946
 [12] adjoint
    @ ~/.julia/packages/Zygote/NRp5C/src/lib/lib.jl:203 [inlined]
 [13] _pullback
    @ ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:67 [inlined]
 [14] broadcasted
    @ ./broadcast.jl:1326 [inlined]
 [15] _pullback(::Zygote.Context{false}, ::typeof(Base.Broadcast.broadcasted), ::typeof(g), ::ForwardDiff.Dual{ForwardDiff.Tag{Zygote.var"#130#131"{var"#1#2"}, Float64}, Float64, 1}, ::UnitRange{Int64})
    @ Zygote ~/.julia/packages/Zygote/NRp5C/src/compiler/interface2.jl:0
 [16] #1
    @ ./REPL[4]:1 [inlined]
 [17] _pullback(ctx::Zygote.Context{false}, f::var"#1#2", args::ForwardDiff.Dual{ForwardDiff.Tag{Zygote.var"#130#131"{var"#1#2"}, Float64}, Float64, 1})
    @ Zygote ~/.julia/packages/Zygote/NRp5C/src/compiler/interface2.jl:0
 [18] pullback(f::Function, cx::Zygote.Context{false}, args::ForwardDiff.Dual{ForwardDiff.Tag{Zygote.var"#130#131"{var"#1#2"}, Float64}, Float64, 1})
    @ Zygote ~/.julia/packages/Zygote/NRp5C/src/compiler/interface.jl:90
 [19] pullback
    @ ~/.julia/packages/Zygote/NRp5C/src/compiler/interface.jl:88 [inlined]
 [20] gradient(f::Function, args::ForwardDiff.Dual{ForwardDiff.Tag{Zygote.var"#130#131"{var"#1#2"}, Float64}, Float64, 1})
    @ Zygote ~/.julia/packages/Zygote/NRp5C/src/compiler/interface.jl:147
 [21] #130
    @ ~/.julia/packages/Zygote/NRp5C/src/lib/grad.jl:66 [inlined]
 [22] derivative(f::Zygote.var"#130#131"{var"#1#2"}, x::Float64)
    @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/derivative.jl:14
 [23] hessian_dual(f::Function, x::Float64)
    @ Zygote ~/.julia/packages/Zygote/NRp5C/src/lib/grad.jl:66
 [24] hessian(f::Function, x::Float64)
    @ Zygote ~/.julia/packages/Zygote/NRp5C/src/lib/grad.jl:62
 [25] top-level scope
    @ REPL[4]:1
Some type information was truncated. Use `show(err)` to see complete types.

EDIT: defining n as a keyword argument and wrapping the call in two anonymous functions seems to work around this, but I cannot imagine this is the intended solution:

Zygote.hessian(x->sum((ord-> g(x, n=ord)).(Ords)), 1.0)
0.0

Maybe you can just relax the ::Integer constraint on n?

The problem is that in my actual code, unlike the toy example, the derivative with respect to n isn’t simple. It is calculable for non-integer n, but it is not very tractable.
I suppose that since I will never differentiate with respect to that parameter, it should be fine to make its gradient always 0? I finally made the errors stop by defining a scalar rule for the Zygote gradient and exhaustively covering every case relevant to me on the ForwardDiff side. Now the Hessian is computed without issue. It also benchmarks reasonably close to the gradient, so I think this is the best I can do for now.
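For reference, the shape of what I ended up with looks roughly like this. Here f and df are simple stand-ins for my actual functions, and the real rules are more involved:

```julia
using ChainRulesCore
using ForwardDiff: Dual, value, partials

f(x::Real, n::Integer) = x^n             # stand-in for the real function
df(x::Real, n::Integer) = n * x^(n - 1)  # its analytical x-derivative

# Zygote side: a scalar rule with a zero derivative in n,
# since n is never a differentiation target
ChainRulesCore.@scalar_rule f(x, n) (df(x, n), ZeroTangent())

# ForwardDiff side: propagate the dual through x only; n stays plain
f(xdx::Dual{T}, n::Integer) where {T} =
    Dual{T}(f(value(xdx), n), df(value(xdx), n) * partials(xdx))
```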

Marking your first reply as the solution, thank you very much for the assistance.

I think that is not an issue, because ForwardDiff will implicitly create a Dual whose value has the same type as the input, here an integer. Watch:

julia> function g(x, n)
           @show n
           return x * n
       end
g (generic function with 1 method)

julia> Zygote.hessian(x -> sum(g.(x, Ords)), 0.0)
n = Dual{Nothing}(0,0,1)
n = Dual{Nothing}(1,0,1)
n = Dual{Nothing}(2,0,1)
n = Dual{Nothing}(3,0,1)
n = Dual{Nothing}(4,0,1)
n = Dual{Nothing}(5,0,1)
0.0

No need to define a custom Dual overload here, and no need to call g(x, n) for non-integer n either.

The overload is unfortunately necessary for other reasons; the toy example was only meant to show what was causing the problems. The Dual overload is needed because, in the actual code, f calls a function that is differentiable on paper but whose implementation has no method for Dual.
