Solving nonlinear equations using dual numbers

How can I use automated differentiation with NonlinearSolve?

I thought it is sufficient to use initial values that are dual numbers. This is not working:

if DUAL
    state_vec = ForwardDiff.Dual[Dual(state_vec[1]), Dual(state_vec[2]), Dual(state_vec[3])]
end

How can I convert a state vector of Float64 values to a vector of dual numbers to feed the solver with this as initial state?

Is it a problem that the first element is nothing?

julia> state_vec
3-element Vector{ForwardDiff.Dual}:
      Dual{Nothing}(0.3217505543966422)
     Dual{Nothing}(-0.3062773691696694)
 Dual{Nothing}(160941.38441492728)

Note that implicit differentiation should be much more efficient that trying to propagate the chain rule through the steps of a nonlinear-solver algorithm, even if you get the latter to work.

That is, if you are solving f(x, y)=0 for a root x, which is an implicit function x(y), then the Jacobian x'(y) is given by x'(y) = -(\partial f / \partial x)^{-1} (\partial f / \partial y). You can presumably compute the Jacobians \partial f / \partial x and \partial f / \partial y (both evaluated at the root x) by AD once you have found the root.

It seems NonlinearSolve.jl has dual number overloads, and I would assume that they are here precisely to perform implicit differentiation? At least that’s what their paper suggests.

Why create the vector of dual numbers manually? If you use something like ForwardDiff.derivative or ForwardDiff.gradient with your solve, it will be created for you.

This is already done, the user shouldn’t do anything directly.

These dual numbers don’t have a valid tag, so they won’t work will with some deeper parts of ForwardDiff. Define them with a proper tag. I assume what you’re trying to do is:

The problem is, it does not work. Test:

using BenchmarkTools, ForwardDiff

DUAL = true

# Read the initial conditions from a .mat file
state_vec, kite_pos, kite_vel, wind_vel, tether_length, settings = get_initial_conditions("test/data/input_basic_test.mat")
if DUAL
    state_vec = ForwardDiff.Dual[Dual(state_vec[1]), Dual(state_vec[2]), Dual(state_vec[3])]
end
simulate_tether(state_vec, kite_pos, kite_vel, wind_vel, tether_length, settings; prn=true, dual=DUAL)

Simulation function:

function simulate_tether(state_vec, kite_pos, kite_vel, wind_vel, tether_length, settings; prn=false, dual=false)
    Ns = size(wind_vel, 2)
    if dual
        buffers= [zeros(Dual, 3, Ns), zeros(Dual, 3, Ns), zeros(Dual, 3, Ns), zeros(Dual, 3, Ns), zeros(Dual, 3, Ns)]
    else
        buffers= [zeros(3, Ns), zeros(3, Ns), zeros(3, Ns), zeros(3, Ns), zeros(3, Ns)]
    end
    
    # Pack parameters in param named tuple - false sets res! for in-place solution
    param = (kite_pos=kite_pos, kite_vel=kite_vel, wind_vel=wind_vel, 
         tether_length=tether_length, settings=settings, buffers=buffers, 
         returnFlag=false)
    # Define the nonlinear problem
    prob = NonlinearProblem(res!, state_vec, param)
    # Solve the problem with TrustRegion method
    if dual
        sol = solve(prob, TrustRegion(); show_trace = Val(false))
    else
        sol = solve(prob, TrustRegion(autodiff=AutoFiniteDiff()); show_trace = Val(false))
    end 

    iterations = sol.stats.nsteps  # Field name may vary; verify with `propertynames(sol)`
    state_vec = sol.u
    if prn
        println("Iterations: ", iterations)
    end

    # Set the returnFlag to true so that res! returns outputs
    param = (; param..., returnFlag=true)
    res = zeros(3)
    res, force_kite, tether_pos, p0 = res!(res, state_vec, param)

    force_gnd = state_vec[3]
    state_vec, tether_pos, force_gnd, force_kite, p0
end

Callback function:

function res!(res, state_vec, param)
    function cross1(a::Dual, b)
        cross(a, b)
    end
    function cross1(a, b::Dual)
        cross(a, b)
    end
    function cross1(a, b)
        cross(MVec3(a), MVec3(b))
    end
    kite_pos, kite_vel, wind_vel, tether_length, settings, buffers, returnFlag = param
    base_type = typeof(state_vec[1])
    g = abs(settings.g_earth[3])
    Ns = size(wind_vel, 2)
    Ls = tether_length / (Ns + 1)
    mj = settings.rho_tether * Ls
    drag_coeff = -0.5 * settings.rho * Ls * settings.d_tether * settings.cd_tether
    A = Ο€/4 * (settings.d_tether/1000)^2
    E = settings.c_spring / A

    # Preallocate arrays
    FT = buffers[1]
    Fd = buffers[2]
    pj = buffers[3]
    vj = buffers[4]
    aj = buffers[5]

    # Unpack state variables
    ΞΈ, Ο†, Tn = state_vec[1], state_vec[2], state_vec[3]

    # Precompute common values
    sinΞΈ = sin(ΞΈ)
    cosΞΈ = cos(ΞΈ)
    sinφ = sin(φ)
    cosφ = cos(φ)
    norm_p = norm(kite_pos)
    p_unit = kite_pos ./ norm_p
    v_parallel = dot(kite_vel, p_unit)
    
    # First element calculations
    FT[1, Ns] = Tn * sinθ * cosφ
    FT[2, Ns] = Tn * sinφ
    FT[3, Ns] = Tn * cosθ * cosφ

    pj[1, Ns] = Ls * sinθ * cosφ
    pj[2, Ns] = Ls * sinφ
    pj[3, Ns] = Ls * cosθ * cosφ

    # Velocity and acceleration calculations
    Ο‰ = cross1(kite_pos / norm_p^2, kite_vel) # 3 alloc
    a = cross1(Ο‰, @view(pj[:, Ns]))         # 3 alloc
    b = cross1(Ο‰, cross1(Ο‰, @view(pj[:, Ns])))
    vj[:, Ns] .= v_parallel * p_unit + a
    aj[:, Ns] .= b

    # Drag calculation for first element
    v_a_p1 = vj[1, Ns] - wind_vel[1, Ns]
    v_a_p2 = vj[2, Ns] - wind_vel[2, Ns]
    v_a_p3 = vj[3, Ns] - wind_vel[3, Ns]

    if all(x -> abs(x) < 1e-3, (v_a_p1, v_a_p2, v_a_p3))
        Fd[:, Ns] .= zero(base_type)
    else
        dir1, dir2, dir3 = pj[1, Ns]/Ls, pj[2, Ns]/Ls, pj[3, Ns]/Ls
        v_dot_dir = v_a_p1*dir1 + v_a_p2*dir2 + v_a_p3*dir3
        v_a_p_t1 = v_dot_dir * dir1
        v_a_p_t2 = v_dot_dir * dir2
        v_a_p_t3 = v_dot_dir * dir3

        v_a_p_n1 = v_a_p1 - v_a_p_t1
        v_a_p_n2 = v_a_p2 - v_a_p_t2
        v_a_p_n3 = v_a_p3 - v_a_p_t3

        norm_v_a_p_n = sqrt(v_a_p_n1^2 + v_a_p_n2^2 + v_a_p_n3^2)
        coeff = drag_coeff * norm_v_a_p_n

        Fd[1, Ns] = coeff * v_a_p_n1
        Fd[2, Ns] = coeff * v_a_p_n2
        Fd[3, Ns] = coeff * v_a_p_n3
    end

    # Process other segments
    @inbounds for ii in Ns:-1:2
        # Tension force calculations
        if ii == Ns
            mj_total = 1.5mj
            g_term = mj_total * g
        else
            mj_total = mj
            g_term = mj * g
        end

        for k in 1:3
            FT[k, ii-1] = mj_total * aj[k, ii] + FT[k, ii] - Fd[k, ii]
        end
        FT[3, ii-1] += g_term

        # Position calculations
        ft_norm = sqrt(FT[1, ii-1]^2 + FT[2, ii-1]^2 + FT[3, ii-1]^2)
        l_i_1 = (ft_norm/(E*A) + 1) * Ls
        ft_dir = FT[1, ii-1]/ft_norm, FT[2, ii-1]/ft_norm, FT[3, ii-1]/ft_norm

        pj[1, ii-1] = pj[1, ii] + l_i_1 * ft_dir[1]
        pj[2, ii-1] = pj[2, ii] + l_i_1 * ft_dir[2]
        pj[3, ii-1] = pj[3, ii] + l_i_1 * ft_dir[3]

        # Velocity and acceleration
        a = cross1(Ο‰, @view(pj[:, ii-1]))           # 28 allocations
        b = cross1(Ο‰, cross1(Ο‰, @view(pj[:, ii-1]))) # 28 allocations
        vj[:, ii-1] .= v_parallel * p_unit + a
        aj[:, ii-1] .= b

        # Drag calculations
        v_a_p1 = vj[1, ii] - wind_vel[1, ii]
        v_a_p2 = vj[2, ii] - wind_vel[2, ii]
        v_a_p3 = vj[3, ii] - wind_vel[3, ii]

        if all(x -> abs(x) < 1e-3, (v_a_p1, v_a_p2, v_a_p3))
            Fd[:, ii-1] .= zero(base_type)
        else
            dx = pj[1, ii-1] - pj[1, ii]
            dy = pj[2, ii-1] - pj[2, ii]
            dz = pj[3, ii-1] - pj[3, ii]
            segment_norm = sqrt(dx^2 + dy^2 + dz^2)
            dir1 = dx/segment_norm
            dir2 = dy/segment_norm
            dir3 = dz/segment_norm

            v_dot_dir = v_a_p1*dir1 + v_a_p2*dir2 + v_a_p3*dir3
            v_a_p_t1 = v_dot_dir * dir1
            v_a_p_t2 = v_dot_dir * dir2
            v_a_p_t3 = v_dot_dir * dir3

            v_a_p_n1 = v_a_p1 - v_a_p_t1
            v_a_p_n2 = v_a_p2 - v_a_p_t2
            v_a_p_n3 = v_a_p3 - v_a_p_t3

            norm_v_a_p_n = sqrt(v_a_p_n1^2 + v_a_p_n2^2 + v_a_p_n3^2)
            coeff = drag_coeff * norm_v_a_p_n

            Fd[1, ii-1] = coeff * v_a_p_n1
            Fd[2, ii-1] = coeff * v_a_p_n2
            Fd[3, ii-1] = coeff * v_a_p_n3
        end
    end

    # Final ground connection calculations
    T0_1 = 1.5mj*aj[1,1] + FT[1,1] - Fd[1,1]
    T0_2 = 1.5mj*aj[2,1] + FT[2,1] - Fd[2,1]
    T0_3 = 1.5mj*aj[3,1] + FT[3,1] - Fd[3,1] + 1.5mj*g
    T0_norm = sqrt(T0_1^2 + T0_2^2 + T0_3^2)
    
    l_i_1 = (T0_norm/(E*A) + 1) * Ls
    T0_dir1 = T0_1/T0_norm
    T0_dir2 = T0_2/T0_norm
    T0_dir3 = T0_3/T0_norm

    p0 = [pj[1,1] + l_i_1*T0_dir1, 
          pj[2,1] + l_i_1*T0_dir2,
          pj[3,1] + l_i_1*T0_dir3]

    res .= kite_pos - p0
    if returnFlag
        return res, MVector(T0_1, T0_2, T0_3), pj, p0
    else
        nothing
    end
end

Error message:

julia> include("examples_quasistatic/benchmark_qsm.jl")
ERROR: LoadError: MethodError: no method matching Float64(::ForwardDiff.Dual{Nothing, Float64, 0})
The type `Float64` exists, but no method is defined for this combination of argument types when trying to construct it.

Stack trace:

Stacktrace:
  [1] convert(::Type{Float64}, x::ForwardDiff.Dual{Nothing, Float64, 0})
    @ Base ./number.jl:7
  [2] macro expansion
    @ ~/.julia/packages/StaticArraysCore/7xxEJ/src/StaticArraysCore.jl:88 [inlined]
  [3] convert_ntuple
    @ ~/.julia/packages/StaticArraysCore/7xxEJ/src/StaticArraysCore.jl:84 [inlined]
  [4] MVector{3, Float64}(x::Tuple{ForwardDiff.Dual{…}, ForwardDiff.Dual{…}, ForwardDiff.Dual{…}})
    @ StaticArraysCore ~/.julia/packages/StaticArraysCore/7xxEJ/src/StaticArraysCore.jl:192
  [5] convert
    @ ~/.julia/packages/StaticArrays/LSPcF/src/convert.jl:212 [inlined]
  [6] StaticArray
    @ ~/.julia/packages/StaticArrays/LSPcF/src/convert.jl:182 [inlined]
  [7] (::var"#cross1#22")(a::MVector{3, Float64}, b::SubArray{ForwardDiff.Dual, 1, Matrix{…}, Tuple{…}, true})
    @ Main ~/repos/Tethers.jl/src/Tether_quasistatic.jl:129
  [8] res!(res::Vector{…}, state_vec::Vector{…}, param::@NamedTuple{…})
    @ Main ~/repos/Tethers.jl/src/Tether_quasistatic.jl:171
  [9] (::NonlinearFunction{…})(::Vector{…}, ::Vararg{…})
    @ SciMLBase ~/.julia/packages/SciMLBase/sYmAV/src/scimlfunctions.jl:2469
 [10] evaluate_f(prob::NonlinearProblem{…}, u::Vector{…})
    @ NonlinearSolve ~/.julia/packages/NonlinearSolve/sBl1H/src/internal/helpers.jl:6
 [11] __init(::NonlinearProblem{…}, ::GeneralizedFirstOrderAlgorithm{…}; stats::SciMLBase.NLStats, alias_u0::Bool, maxiters::Int64, abstol::Nothing, reltol::Nothing, maxtime::Nothing, termination_condition::Nothing, internalnorm::Function, linsolve_kwargs::@NamedTuple{}, kwargs::@Kwargs{…})
    @ NonlinearSolve ~/.julia/packages/NonlinearSolve/sBl1H/src/core/generalized_first_order.jl:166
 [12] __solve(::NonlinearProblem{…}, ::GeneralizedFirstOrderAlgorithm{…}; stats::SciMLBase.NLStats, kwargs::@Kwargs{…})
    @ NonlinearSolve ~/.julia/packages/NonlinearSolve/sBl1H/src/core/generic.jl:3
 [13] solve_call(_prob::NonlinearProblem{…}, args::GeneralizedFirstOrderAlgorithm{…}; merge_callbacks::Bool, kwargshandle::Nothing, kwargs::@Kwargs{…})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/PbBEl/src/solve.jl:635
 [14] solve_up(prob::NonlinearProblem{…}, sensealg::Nothing, u0::Vector{…}, p::@NamedTuple{…}, args::GeneralizedFirstOrderAlgorithm{…}; kwargs::@Kwargs{…})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/PbBEl/src/solve.jl:1112
 [15] solve(prob::NonlinearProblem{…}, args::GeneralizedFirstOrderAlgorithm{…}; sensealg::Nothing, u0::Nothing, p::Nothing, wrap::Val{…}, kwargs::@Kwargs{…})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/PbBEl/src/solve.jl:1100
 [16] simulate_tether(state_vec::Vector{…}, kite_pos::MVector{…}, kite_vel::MVector{…}, wind_vel::Matrix{…}, tether_length::Float64, settings::Settings; prn::Bool, dual::Bool)
    @ Main ~/repos/Tethers.jl/src/Tether_quasistatic.jl:67
 [17] top-level scope
    @ ~/repos/Tethers.jl/examples_quasistatic/benchmark_qsm.jl:12
 [18] include(fname::String)
    @ Main ./sysimg.jl:38
 [19] top-level scope
    @ REPL[7]:1
in expression starting at /home/ufechner/repos/Tethers.jl/examples_quasistatic/benchmark_qsm.jl:12
Some type information was truncated. Use `show(err)` to see complete types.

The stack trace does not really help. It does not tell me where in the callback function an error is happening.

The stack trace suggests that the third method of cross ([7] (::var"#cross1#22")(a::MVector{3, Float64}, ...) is hit and the MVec3 call ([4] MVector{3, Float64}(x::Tuple{ForwardDiff.Dual{…}, ...) therein fails.
Is MVec3 your function? How is it defined, i.e., does it work for types besides Float64?

Thanks! I can read the stack traces now better.

But I still don’t understand how to formulate a NonLinear problem such that I can use automated differentiation.

I have a residual function with the following signature:

function res!(res, state_vec, param)

At the end of the function the argument res is updated using:

res .= kite_pos - p0

The function returns nothing.

It is used when defining the problem:

    # Define the nonlinear problem
    prob = NonlinearProblem(res!, state_vec, param)
    # Solve the problem with TrustRegion method
    sol = solve(prob, TrustRegion(); show_trace = Val(false))

If the state vector is initialized with zeros(3), then the last line of the res! function fails with:

julia> include("examples_quasistatic/benchmark_qsm_dual.jl")
β”Œ Warning: Using arrays or dicts to store parameters of different types can hurt performance.
β”‚ Consider using tuples instead.
β”” @ SciMLBase ~/.julia/packages/SciMLBase/sYmAV/src/performance_warnings.jl:33
ERROR: LoadError: MethodError: no method matching Float64(::ForwardDiff.Dual{Nothing, Float64, 0})
The type `Float64` exists, but no method is defined for this combination of argument types when trying to construct it.

Closest candidates are:
  (::Type{T})(::Real, ::RoundingMode) where T<:AbstractFloat
   @ Base rounding.jl:265
  (::Type{T})(::T) where T<:Number
   @ Core boot.jl:900
  Float64(::UInt8)
   @ Base float.jl:245
  ...

Stacktrace:
  [1] convert(::Type{Float64}, x::ForwardDiff.Dual{Nothing, Float64, 0})
    @ Base ./number.jl:7
  [2] setindex!
    @ ~/.julia/packages/StaticArrays/LSPcF/src/MArray.jl:35 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/StaticArrays/LSPcF/src/broadcast.jl:159 [inlined]
  [4] _broadcast!
    @ ~/.julia/packages/StaticArrays/LSPcF/src/broadcast.jl:143 [inlined]
  [5] _copyto!
    @ ~/.julia/packages/StaticArrays/LSPcF/src/broadcast.jl:70 [inlined]
  [6] copyto!
    @ ~/.julia/packages/StaticArrays/LSPcF/src/broadcast.jl:63 [inlined]
  [7] materialize!
    @ ./broadcast.jl:883 [inlined]
  [8] materialize!(dest::MVector{…}, bc::Base.Broadcast.Broadcasted{…})
    @ Base.Broadcast ./broadcast.jl:880
  [9] res!(res::MVector{…}, state_vec::MVector{…}, param::@NamedTuple{…})
    @ Main ~/repos/Tethers.jl/src/Tether_qsm_dual.jl:265
 [10] NonlinearFunction
    @ ~/.julia/packages/SciMLBase/sYmAV/src/scimlfunctions.jl:2469 [inlined]
 [11] evaluate_f
    @ ~/.julia/packages/NonlinearSolve/sBl1H/src/internal/helpers.jl:6 [inlined]
 [12] __init(::NonlinearProblem{…}, ::GeneralizedFirstOrderAlgorithm{…}; stats::SciMLBase.NLStats, alias_u0::Bool, maxiters::Int64, abstol::Nothing, reltol::Nothing, maxtime::Nothing, termination_condition::Nothing, internalnorm::Function, linsolve_kwargs::@NamedTuple{}, kwargs::@Kwargs{…})
    @ NonlinearSolve ~/.julia/packages/NonlinearSolve/sBl1H/src/core/generalized_first_order.jl:166
 [13] __init
    @ ~/.julia/packages/NonlinearSolve/sBl1H/src/core/generalized_first_order.jl:156 [inlined]
 [14] #__solve#130
    @ ~/.julia/packages/NonlinearSolve/sBl1H/src/core/generic.jl:3 [inlined]
 [15] __solve
    @ ~/.julia/packages/NonlinearSolve/sBl1H/src/core/generic.jl:1 [inlined]
 [16] #solve_call#35
    @ ~/.julia/packages/DiffEqBase/PbBEl/src/solve.jl:635 [inlined]
 [17] solve_call
    @ ~/.julia/packages/DiffEqBase/PbBEl/src/solve.jl:592 [inlined]
 [18] #solve_up#44
    @ ~/.julia/packages/DiffEqBase/PbBEl/src/solve.jl:1112 [inlined]
 [19] solve_up
    @ ~/.julia/packages/DiffEqBase/PbBEl/src/solve.jl:1106 [inlined]
 [20] #solve#43
    @ ~/.julia/packages/DiffEqBase/PbBEl/src/solve.jl:1100 [inlined]
 [21] simulate_tether(state_vec::MVector{…}, kite_pos::MVector{…}, kite_vel::MVector{…}, wind_vel::Matrix{…}, tether_length::Float64, settings::Settings; prn::Bool)
    @ Main ~/repos/Tethers.jl/src/Tether_qsm_dual.jl:63
 [22] top-level scope
    @ ~/repos/Tethers.jl/examples_quasistatic/benchmark_qsm_dual.jl:7
 [23] include(fname::String)
    @ Main ./sysimg.jl:38
 [24] top-level scope
    @ REPL[1]:1
in expression starting at /home/ufechner/repos/Tethers.jl/examples_quasistatic/benchmark_qsm_dual.jl:7
Some type information was truncated. Use `show(err)` to see complete types.

You can find the complete source code here: Tethers.jl/src/Tether_qsm_dual.jl at andrea_quasistatic Β· ufechner7/Tethers.jl Β· GitHub

How can I initialize the state vector such that this error is avoided?

I use now:

    # Define the nonlinear problem
    dual_vec = Dual.(state_vec, 0)
    prob = NonlinearProblem(res!, dual_vec, param)
    # Solve the problem with TrustRegion method
    sol = solve(prob, TrustRegion(); show_trace = Val(false)) 

and get a different error:

julia> include("examples_quasistatic/benchmark_qsm_dual.jl")
ERROR: LoadError: Cannot determine ordering of Dual tags Nothing and ForwardDiff.Tag{DifferentiationInterface.FixTail{NonlinearFunction{true, SciMLBase.FullSpecialize, typeof(res!), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing}, Tuple{@NamedTuple{kite_pos::MVector{3, Float64}, kite_vel::MVector{3, Float64}, wind_vel::Matrix{Float64}, tether_length::Float64, settings::Settings, buffers::Vector{Matrix{ForwardDiff.Dual}}, returnFlag::Bool}}}, ForwardDiff.Dual{Nothing, Float64, 1}}
Stacktrace:
  [1] β‰Ί(a::Type, b::Type)
    @ ForwardDiff ~/.julia/packages/ForwardDiff/UBbGT/src/dual.jl:54
  [2] -(x::ForwardDiff.Dual{ForwardDiff.Tag{…}, ForwardDiff.Dual{…}, 3}, y::ForwardDiff.Dual{Nothing, Float64, 0})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/UBbGT/src/dual.jl:144
  [3] res!(res::MVector{…}, state_vec::MVector{…}, param::@NamedTuple{…})
    @ Main ~/repos/Tethers.jl/src/Tether_qsm_dual.jl:198
  [4] NonlinearFunction
    @ ~/.julia/packages/SciMLBase/sYmAV/src/scimlfunctions.jl:2469 [inlined]
  [5] FixTail
    @ ~/.julia/packages/DifferentiationInterface/qrWdQ/src/utils/context.jl:7 [inlined]
  [6] vector_mode_dual_eval!
    @ ~/.julia/packages/ForwardDiff/UBbGT/src/apiutils.jl:31 [inlined]
  [7] vector_mode_jacobian(f!::DifferentiationInterface.FixTail{…}, y::MVector{…}, x::MVector{…}, cfg::ForwardDiff.JacobianConfig{…})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/UBbGT/src/jacobian.jl:139
  [8] jacobian
    @ ~/.julia/packages/ForwardDiff/UBbGT/src/jacobian.jl:40 [inlined]
  [9] jacobian
    @ ~/.julia/packages/DifferentiationInterface/qrWdQ/ext/DifferentiationInterfaceForwardDiffExt/twoarg.jl:433 [inlined]
 [10] #JacobianCache#83
    @ ~/.julia/packages/NonlinearSolve/sBl1H/src/internal/jacobian.jl:89 [inlined]
 [11] JacobianCache
    @ ~/.julia/packages/NonlinearSolve/sBl1H/src/internal/jacobian.jl:47 [inlined]
 [12] __init(::NonlinearProblem{…}, ::GeneralizedFirstOrderAlgorithm{…}; stats::SciMLBase.NLStats, alias_u0::Bool, maxiters::Int64, abstol::Nothing, reltol::Nothing, maxtime::Nothing, termination_condition::Nothing, internalnorm::Function, linsolve_kwargs::@NamedTuple{}, kwargs::@Kwargs{…})
    @ NonlinearSolve ~/.julia/packages/NonlinearSolve/sBl1H/src/core/generalized_first_order.jl:175
 [13] __init
    @ ~/.julia/packages/NonlinearSolve/sBl1H/src/core/generalized_first_order.jl:156 [inlined]
 [14] #__solve#130
    @ ~/.julia/packages/NonlinearSolve/sBl1H/src/core/generic.jl:3 [inlined]
 [15] __solve
    @ ~/.julia/packages/NonlinearSolve/sBl1H/src/core/generic.jl:1 [inlined]
 [16] #solve_call#35
    @ ~/.julia/packages/DiffEqBase/PbBEl/src/solve.jl:635 [inlined]
 [17] solve_call
    @ ~/.julia/packages/DiffEqBase/PbBEl/src/solve.jl:592 [inlined]
 [18] #solve_up#44
    @ ~/.julia/packages/DiffEqBase/PbBEl/src/solve.jl:1112 [inlined]
 [19] solve_up
    @ ~/.julia/packages/DiffEqBase/PbBEl/src/solve.jl:1106 [inlined]
 [20] #solve#43
    @ ~/.julia/packages/DiffEqBase/PbBEl/src/solve.jl:1100 [inlined]
 [21] simulate_tether(state_vec::MVector{…}, kite_pos::MVector{…}, kite_vel::MVector{…}, wind_vel::Matrix{…}, tether_length::Float64, settings::Settings; prn::Bool)
    @ Main ~/repos/Tethers.jl/src/Tether_qsm_dual.jl:64
 [22] top-level scope
    @ ~/repos/Tethers.jl/examples_quasistatic/benchmark_qsm_dual.jl:7
 [23] include(fname::String)
    @ Main ./sysimg.jl:38
 [24] top-level scope
    @ REPL[10]:1
in expression starting at /home/ufechner/repos/Tethers.jl/examples_quasistatic/benchmark_qsm_dual.jl:7
Some type information was truncated. Use `show(err)` to see complete types.

That’s the point of NonlinearSolve.jl: you usually don’t have to do anything specific. If you have a function y = f(x) and at some point along the function you solve a linear system, you can just call e.g. ForwardDiff.jacobian(f, x) and it should work. As a rule of thumb, if you create Dual numbers manually, you’re probably doing something unnecessary.
That being said, you may run into trouble when your function copies data from the input x into some pre-allocated caches, because during evaluation its Float64 entries are replaced by Dual numbers. For such situations, use PreallocationTools.jl to create caches that can automatically adapt to the element type.

2 Likes

This is just not true. If you write your callback function in a non-allocating way you will always need to initialize your state with a vector of dual numbers. I got no suggestion or example so far how to do that.

This is the suggestion on how to do that :slight_smile:

2 Likes

Ok, got an example from Perplexity: https://www.perplexity.ai/search/can-you-give-an-example-how-to-0O3RQGECSEOdC_WrEgYFTw

using NonlinearSolve, PreallocationTools

# Create cache for temporary arrays using DiffCache
const cache = DiffCache(zeros(3))  # Preallocate for 3-element vector

function nonlinear_system!(f, u, p)
    # Retrieve type-stable temporary buffer
    tmp = get_tmp(cache, u)
    
    # Use temporary array for intermediate calculations
    tmp[1] = u[1] * u[2] - p[1]
    tmp[2] = u[2]^2 - u[3]
    tmp[3] = exp(u[1]) - u[3]
    
    # Perform in-place operations using temporary buffer
    f[1] = tmp[1] + tmp[2] - 2.5
    f[2] = tmp[3] * tmp[1] - 1.0
    f[3] = sum(tmp) - 3.0
end

# Parameter and initial guess
p = [2.0]
u0 = ones(3)

# Create problem with cache as parameter
prob = NonlinearProblem{true}(nonlinear_system!, u0, p)
sol = solve(prob, NewtonRaphson())

println("Solution: ", sol.u)

Interesting: The initial guess can be a vector of Float64. I did not expect that.

Judging by the PreallocationTools.jl README, declaring DiffCaches as const (as Perplexity β€œsuggests”) is not necessary, they can be passed as parameters instead. If it is true for ODEs, I assume it also holds for nonlinear systems.

I have a working version now: Tethers.jl/src/Tether_qsm_dual.jl at andrea_quasistatic Β· ufechner7/Tethers.jl Β· GitHub

@gdalle Thank you so much!

Open question:
I use

 buffers = [DiffCache(zeros(3, Ns)), DiffCache(zeros(3, Ns)), DiffCache(zeros(3, Ns)), DiffCache(zeros(3, Ns)), DiffCache(zeros(3, Ns)), ]

to create the cache, and then

    # Get preallocated arrays from the cache
    FT = get_tmp(buffers[1], state_vec)
    Fd = get_tmp(buffers[2], state_vec)
    pj = get_tmp(buffers[3], state_vec)
    vj = get_tmp(buffers[4], state_vec)
    aj = get_tmp(buffers[5], state_vec)

to access it. Is it needed to create five caches for this purpose, or could I also just use one cache and call five times the get function?

While it works now, the version that uses finite differences and MVectors is still much faster than the version that uses dual numbers:

This version, using automated differentiation:

# Iterations: 36
# BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
# Range (min … max):  115.619 ΞΌs …   8.848 ms  β”Š GC (min … max): 0.00% … 97.10%
# Time  (median):     153.630 ΞΌs               β”Š GC (median):    0.00%
# Time  (mean Β± Οƒ):   162.871 ΞΌs Β± 168.648 ΞΌs  β”Š GC (mean Β± Οƒ):  6.08% Β±  6.58%

#     ▁           β–β–ƒβ–ƒβ–„β–„β–‡β–‡β–ˆβ–ˆβ–†β–‡β–†β–„β–…β–„β–…β–ƒβ–ƒβ–                              
#   β–‚β–†β–ˆβ–‡β–…β–ƒβ–ƒβ–„β–„β–…β–…β–†β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–…β–„β–ƒβ–‚β–‚β–‚β–‚β–‚β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β– β–„
#   121 ΞΌs           Histogram: frequency by time          233 ΞΌs <

#  Memory estimate: 202.27 KiB, allocs estimate: 3687.

Version using FiniteDifferences and MVectors:

# Iterations: 36
# BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
#  Range (min … max):   95.169 ΞΌs …  3.417 ms  β”Š GC (min … max): 0.00% … 95.50%
#  Time  (median):     103.330 ΞΌs              β”Š GC (median):    0.00%
#  Time  (mean Β± Οƒ):   106.198 ΞΌs Β± 59.810 ΞΌs  β”Š GC (mean Β± Οƒ):  1.10% Β±  1.90%

#     β–ƒβ–ˆβ–†β–‚                                                        
#   β–β–„β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–†β–†β–†β–†β–†β–†β–…β–…β–†β–…β–…β–…β–…β–„β–„β–„β–„β–„β–…β–…β–…β–…β–†β–…β–†β–…β–…β–„β–„β–„β–ƒβ–ƒβ–ƒβ–‚β–ƒβ–‚β–ƒβ–‚β–ƒβ–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–β–β–β–β–β– β–ƒ
#   95.2 ΞΌs         Histogram: frequency by time          124 ΞΌs <

#  Memory estimate: 46.53 KiB, allocs estimate: 1118.

Is this to be expected?

Why not also use MVectors in your DiffCaches?

While this give a small improvement in performance it also requires to hardcode the size of the arrays which I would prefer to avoid:

    Ns = size(wind_vel, 2)
    buffers = (DiffCache(MMatrix{3, 15}(zeros(3, Ns))), DiffCache(MMatrix{3, 15}(zeros(3, Ns))), DiffCache(MMatrix{3, 15}(zeros(3, Ns))), DiffCache(MMatrix{3, 15}(zeros(3, Ns))),
               DiffCache(MMatrix{3, 15}(zeros(3, Ns))), DiffCache(MVec3(zeros(3))), DiffCache(MVec3(zeros(3))), DiffCache(MVec3(zeros(3))))

I cannot use Ns as second parameter of the MMatrix size.

This results in:

Iterations: 36
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  107.320 ΞΌs …   3.380 ms  β”Š GC (min … max):  0.00% … 93.13%
 Time  (median):     129.770 ΞΌs               β”Š GC (median):     0.00%
 Time  (mean Β± Οƒ):   152.710 ΞΌs Β± 157.742 ΞΌs  β”Š GC (mean Β± Οƒ):  10.33% Β±  9.29%

  β–ˆβ–„                                                             
  β–ˆβ–ˆβ–ƒβ–„β–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–β–‚β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–‚β–‚ β–‚
  107 ΞΌs           Histogram: frequency by time         1.49 ms <

 Memory estimate: 328.67 KiB, allocs estimate: 2292.

In the code that uses finite differences I use normal arrays for the buffers.