Resolving Julia Warning: dt(2.27e-13) <= dtmin(2.27e-13) in implementation of dual numbers

Hi Julia Community,

I’ve been trying to make a glacier SIA2D forward model compatible with dual numbers using PreallocationTools.jl, however I run into the following warning using the solver RDPK3Sp35:

┌ Warning: dt(2.2737367544323206e-13) <= dtmin(2.2737367544323206e-13) at t=2010.097388613536, and step error estimate = 52082.68508677296. Aborting. There is either an error in your model specification or the true solution is unstable.
└ @ SciMLBase ~/.julia/packages/SciMLBase/8XHkk/src/integrator_interface.jl:599

I am using RDPK3Sp35 because with just float numbers this is benchmarked to be the best one for this problem. However this issue also arises when using other solvers.

Specifically when applying Rosenbrock23() as a solver method I get the following:

ERROR: First call to automatic differentiation for the Jacobian
failed. This means that the user `f` function is not compatible
with automatic differentiation. Methods to fix this include:

1. Turn off automatic differentiation (e.g. Rosenbrock23() becomes
   Rosenbrock23(autodiff=false)). More details can befound at
   https://docs.sciml.ai/DiffEqDocs/stable/features/performance_overloads/
2. Improving the compatibility of `f` with ForwardDiff.jl automatic 
   differentiation (using tools like PreallocationTools.jl). More details
   can be found at https://docs.sciml.ai/DiffEqDocs/stable/basics/faq/#Autodifferentiation-and-Dual-Numbers
3. Defining analytical Jacobians. More details can be
   found at https://docs.sciml.ai/DiffEqDocs/stable/types/ode_types/#SciMLBase.ODEFunction

Note: turning off automatic differentiation tends to have a very minimal
performance impact (for this use case, because it's forward mode for a
square Jacobian. This is different from optimization gradient scenarios).
However, one should be careful as some methods are more sensitive to
accurate gradients than others. Specifically, Rodas methods like `Rodas4`
and `Rodas5P` require accurate Jacobians in order to have good convergence,
while many other methods like BDF (`QNDF`, `FBDF`), SDIRK (`KenCarp4`),
and Rosenbrock-W (`Rosenbrock23`) do not. Thus if using an algorithm which
is sensitive to autodiff and solving at a low tolerance, please change the
algorithm as well.

And the following warning:

┌ Warning: The supplied DiffCache was too small and was enlarged. This incurs allocations
│     on the first call to `get_tmp`. If few calls to `get_tmp` occur and optimal performance is essential,
│     consider changing 'N'/chunk size of this DiffCache to 25.
└ @ PreallocationTools ~/.julia/packages/PreallocationTools/mJSsc/src/PreallocationTools.jl:151

This leads me to believe that my implementation is not correct. I did the following, to my Forward model:

iceflow_caches = (
        H̄ = DiffCache(model.iceflow.H̄),
        S = DiffCache(model.iceflow.S),
        D = DiffCache(model.iceflow.D),
        Dx = DiffCache(model.iceflow.Dx),
        Dy = DiffCache(model.iceflow.Dy),
        ∇S = DiffCache(model.iceflow.∇S),
        ∇Sx = DiffCache(model.iceflow.∇Sx),
        ∇Sy = DiffCache(model.iceflow.∇Sy),
        Fx = DiffCache(model.iceflow.Fx),
        Fy = DiffCache(model.iceflow.Fy),
        Fxx = DiffCache(model.iceflow.Fxx),
        Fyy = DiffCache(model.iceflow.Fyy),
        dSdx = DiffCache(model.iceflow.dSdx),
        dSdy = DiffCache(model.iceflow.dSdy),
        dSdx_edges = DiffCache(model.iceflow.dSdx_edges),
        dSdy_edges = DiffCache(model.iceflow.dSdy_edges)
    )

p = (simulation,iceflow_caches)

# Define problem to be solved
iceflow_prob = ODEProblem{true,SciMLBase.FullSpecialize}(du, model.iceflow.H, params.simulation.tspan, tstops=params.solver.tstops, p)
iceflow_sol = solve(iceflow_prob, 
                        params.solver.solver, 
                        callback=cb, 
                        tstops=params.solver.tstops, 
                        reltol=params.solver.reltol, 
                        save_everystep=params.solver.save_everystep, 
                        progress=params.solver.progress, 
                        progress_steps=params.solver.progress_steps)

function SIA2D!(dH::Matrix{<:Real}, H::Matrix{<:Real}, p, t::F) where {F <: Real}

    simulation = p[1]
    caches = p[2]
    
    # Retrieve parameters
    SIA2D_model::SIA2Dmodel = simulation.model.iceflow
    glacier::Sleipnir.Glacier2D = simulation.glaciers[simulation.model.iceflow.glacier_idx[]]
    params::Sleipnir.Parameters = simulation.parameters
        int_type = simulation.parameters.simulation.int_type
    H̄ = get_tmp(caches.H̄,H)
    A = SIA2D_model.A
    n = SIA2D_model.n
    B = glacier.B
    S = get_tmp(caches.S,H)
    dSdx = get_tmp(caches.dSdx,H)
    dSdy = get_tmp(caches.dSdy,H)
    D = get_tmp(caches.D,H)
    Dx = get_tmp(caches.Dx,H)
    Dy = get_tmp(caches.Dy,H)
    dSdx_edges = get_tmp(caches.dSdx_edges,H)
    dSdy_edges = get_tmp(caches.dSdy_edges,H)
    ∇S = get_tmp(caches.∇S,H)
    ∇Sx = get_tmp(caches.∇Sx,H)
    ∇Sy = get_tmp(caches.∇Sy,H)
    Fx = get_tmp(caches.Fx,H)
    Fy = get_tmp(caches.Fy,H)
    Fxx = get_tmp(caches.Fxx,H)
    Fyy = get_tmp(caches.Fyy,H)
    Δx::F = glacier.Δx
    Δy::F = glacier.Δy
    Γ = SIA2D_model.Γ
    ρ::F = simulation.parameters.physical.ρ
    g::F = simulation.parameters.physical.g

    # First, enforce values to be positive
    map!(x -> ifelse(x>0.0,x,0.0), H, H)
    
    # Update glacier surface altimetry
    S .= B .+ H

    # All grid variables computed in a staggered grid
    # Compute surface gradients on edges
    diff_x!(dSdx, S, Δx)  
    diff_y!(dSdy, S, Δy) 
    avg_y!(∇Sx, dSdx)
    avg_x!(∇Sy, dSdy)
    ∇S .= (∇Sx.^2 .+ ∇Sy.^2).^((n[] - 1)/2) 

    # @infiltrate
    avg!(H̄, H)
    Γ[] = 2.0 * A[] * (ρ * g)^n[] / (n[]+2) # 1 / m^3 s 
    D .= Γ[] .* H̄.^(n[] + 2) .* ∇S

    # Compute flux components
    @views diff_x!(dSdx_edges, S[:,2:end - 1], Δx)
    @views diff_y!(dSdy_edges, S[2:end - 1,:], Δy)

    # Cap surface elevaton differences with the upstream ice thickness to 
    # imporse boundary condition of the SIA equation
    η₀ = params.physical.η₀
    dSdx_edges .= @views @. min(dSdx_edges,  η₀ * H[2:end, 2:end-1]/Δx)
    dSdx_edges .= @views @. max(dSdx_edges, -η₀ * H[1:end-1, 2:end-1]/Δx)
    dSdy_edges .= @views @. min(dSdy_edges,  η₀ * H[2:end-1, 2:end]/Δy)
    dSdy_edges .= @views @. max(dSdy_edges, -η₀ * H[2:end-1, 1:end-1]/Δy)

    avg_y!(Dx, D)
    avg_x!(Dy, D)
    Fx .= .-Dx .* dSdx_edges
    Fy .= .-Dy .* dSdy_edges 

    #  Flux divergence
    diff_x!(Fxx, Fx, Δx)
    diff_y!(Fyy, Fy, Δy)
    inn(dH) .= .-(Fxx .+ Fyy) 
end

Any idea what I could be doing wrong?

For completeness some of the extra functions shown.

# Helper functions for the staggered grid

####  Non-allocating functions  ####

diff_x!(O, I, Δx) = @views @. O = (I[begin+1:end,:] - I[1:end-1,:]) / Δx

diff_y!(O, I, Δy) = @views @. O = (I[:,begin+1:end] - I[:,1:end - 1]) / Δy

avg!(O, I) = @views @. O = (I[1:end-1,1:end-1] + I[2:end,1:end-1] + I[1:end-1,2:end] + I[2:end,2:end]) * 0.25

avg_x!(O, I) = @views @. O = (I[1:end-1,:] + I[2:end,:]) * 0.5

avg_y!(O, I) = @views @. O = (I[:,1:end-1] + I[:,2:end]) * 0.5

####  Allocating functions  ####

"""
    avg(A)

4-point average of a matrix
"""
@views avg(A) = 0.25 .* ( A[1:end-1,1:end-1] .+ A[2:end,1:end-1] .+ A[1:end-1,2:end] .+ A[2:end,2:end] )


"""
    avg_x(A)

2-point average of a matrix's X axis
"""
@views avg_x(A) = 0.5 .* ( A[1:end-1,:] .+ A[2:end,:] )

"""
    avg_y(A)

2-point average of a matrix's Y axis
"""
@views avg_y(A) = 0.5 .* ( A[:,1:end-1] .+ A[:,2:end] )

"""
    diff_x(A)

2-point differential of a matrix's X axis
"""
@views diff_x(A) = (A[begin + 1:end, :] .- A[1:end - 1, :])

"""
    diff_y(A)

2-point differential of a matrix's Y axis
"""
@views diff_y(A) = (A[:, begin + 1:end] .- A[:, 1:end - 1])

"""
    inn(A)

Access the inner part of the matrix (-2,-2)
"""
@views inn(A) = A[2:end-1,2:end-1]

"""
    inn1(A)

Access the inner part of the matrix (-1,-1)
"""
@views inn1(A) = A[1:end-1,1:end-1]

Okay I have solved this issue by utilizing out of place code and properly applying DiffCache (I think).

using OrdinaryDiffEq, PreallocationTools, Optimization, OptimizationOptimJL, Huginn, Statistics, Infiltrator,Tullio,Plots,ForwardDiff

 #setting up parameters
    working_dir = joinpath(dirname(Base.current_project()), "data")
    if !ispath(working_dir)
        mkdir("data")
    end


    rgi_ids = ["RGI60-11.01450"]
    params = Parameters(OGGM = OGGMparameters(working_dir=working_dir,
                                                    multiprocessing=false,
                                                    workers=1,
                                                    ice_thickness_source = "Farinotti19"),
                                simulation = SimulationParameters(use_MB=true,
                                                                use_iceflow = true,
                                                                velocities = true,
                                                                use_glathida_data = true,
                                                                tspan=(2010.0, 2015.0),
                                                                working_dir = working_dir,
                                                                multiprocessing=false,
                                                                workers=1),
                                solver = SolverParameters(reltol=1e-10))

    glaciers = initialize_glaciers(rgi_ids, params)
    model = Model(
                iceflow = SIA2Dmodel(params), 
                mass_balance = TImodel1(params),
        )
        
    simulation = Prediction(model, glaciers, params)

    glacier_idx=1
    glacier = simulation.glaciers[glacier_idx]
        
    nx, ny = glacier.nx, glacier.ny 
    
    model.iceflow.n=glacier.n
    model.iceflow.glacier_idx = Ref{Int}(glacier_idx)
    model.iceflow.H₀ = deepcopy(glacier.H₀)
    model.iceflow.H  = deepcopy(glacier.H₀)
    model.iceflow.MB = zeros(Float64,nx,ny)
    model.iceflow.MB_mask= zeros(Float64,nx,ny)
    model.iceflow.MB_total = zeros(Float64,nx,ny)

    params.solver.tstops = Huginn.define_callback_steps(params.simulation.tspan, params.solver.step)
    stop_condition(u,t,integrator) = Sleipnir.stop_condition_tstops(u,t,integrator, params.solver.tstops) #closure
    
    function action!(integrator)
        if params.simulation.use_MB 
            # Compute mass balance
            MB_timestep!(model, glacier, params.solver.step, integrator.t)
            apply_MB_mask!(integrator.u, glacier, model.iceflow)
        end
    end

    cb_MB = DiscreteCallback(stop_condition, action!)

#forwardmodel
    function SIA2D(H, p, t)
        simulation=p[1]
        x=p[2]
    
        # function SIA2D(H, SIA2Dmodel)
            # Retrieve parameters
            SIA2D_model = simulation.model.iceflow
            glacier = simulation.glaciers[simulation.model.iceflow.glacier_idx[]]
            params = simulation.parameters

            # Retrieve parameters
            B = glacier.B
            Δx = glacier.Δx
            Δy = glacier.Δy
            A = x[1]
            n = SIA2D_model.n
            ρ = params.physical.ρ
            g = params.physical.g

            # Update glacier surface altimetry
            S = B .+ H
        
            # All grid variables computed in a staggered grid
            # Compute surface gradients on edges
            dSdx = Huginn.diff_x(S) ./ Δx
            dSdy = Huginn.diff_y(S) ./ Δy
            ∇S = (Huginn.avg_y(dSdx).^2 .+ Huginn.avg_x(dSdy).^2).^((n[] - 1)/2) 
            
            Γ = 2.0 * A[] * (ρ * g)^n[] / (n[]+2) # 1 / m^3 s 
            D = Γ .* Huginn.avg(H).^(n[] + 2) .* ∇S
        
            # Compute flux components
            @views dSdx_edges = Huginn.diff_x(S[:,2:end - 1]) ./ Δx
            @views dSdy_edges = Huginn.diff_y(S[2:end - 1,:]) ./ Δy
        

            η₀ = 1.0
            dSdx_edges .= @views @. min(dSdx_edges,  η₀ * H[2:end, 2:end-1]/Δx)
            dSdx_edges .= @views @. max(dSdx_edges, -η₀ * H[1:end-1, 2:end-1]/Δx)
            dSdy_edges .= @views @. min(dSdy_edges,  η₀ * H[2:end-1, 2:end]/Δy)
            dSdy_edges .= @views @. max(dSdy_edges, -η₀ * H[2:end-1, 1:end-1]/Δy)
        
            Fx = .-Huginn.avg_y(D) .* dSdx_edges
            Fy = .-Huginn.avg_x(D) .* dSdy_edges 
        
            #  Flux divergence
            @tullio dH[i,j] := -(Huginn.diff_x(Fx)[pad(i-1,1,1),pad(j-1,1,1)] / Δx + Huginn.diff_y(Fy)[pad(i-1,1,1),pad(j-1,1,1)] / Δy) 
        
            return dH
    end


    realsol = glacier.H_glathida

#optimization
    function objfun(x, simulation, realsol, cache)
        
        #model.iceflow.A[]=x[1]
        #model.iceflow.n[]=x[2]
        
        iceflow_prob = ODEProblem{false}(SIA2D, get_tmp(cache,x[1]), params.simulation.tspan, tstops=params.solver.tstops, (simulation,x))
        iceflow_sol = solve(iceflow_prob, 
                        params.solver.solver, 
                        callback=cb_MB, 
                        tstops=params.solver.tstops, 
                        reltol=params.solver.reltol, 
                        abstol=params.solver.reltol,
                        save_everystep=params.solver.save_everystep, 
                        progress=params.solver.progress, 
                        progress_steps=params.solver.progress_steps)


        sol = iceflow_sol.u[end]
        println(typeof(sol))
        
        ofv = 0.0
        if any((s.retcode != :Success for s in iceflow_sol))
            ofv = 1e12
        else
            ofv = sum((sol .- realsol) .^ 2)
        end
        return ofv
    end

    fn(x, p) = objfun(x, p[1], p[2],p[3])

    cache = DiffCache(model.iceflow.H,levels=3)
    lower_bound = [8.5e-20]    
    upper_bound = [8.0e-17] 
    optfun = OptimizationFunction(fn, Optimization.AutoForwardDiff())
    optprob = OptimizationProblem(optfun, [4.9e-17], (simulation, realsol, cache),lb = lower_bound,ub = upper_bound)
    solve(optprob, BFGS())
    


But I am experiencing a new issue: the optimizer does not optimize. For example when I setup a real solution for which i know A=4.0e-17 and an initial condition of 4.9e-17, the value after optimizing is still equal to the initial condition.

Two things i noticed:

  • printing solutions within the optimization process, shows me an alternation of float and dual numbers matrices:
Matrix{ForwardDiff.Dual{ForwardDiff.Tag{OptimizationForwardDiffExt.var"#37#55"{OptimizationFunction{true, AutoForwardDiff{nothing, Nothing}, var"#fn#103"{var"#objfun#101"{var"#SIA2D#95", DiscreteCallback{var"#stop_condition#93", var"#action!#94"{Sleipnir.Model{SIA2Dmodel{Float64, Int64}, TImodel1{Float64}, Nothing}}, typeof(SciMLBase.INITIALIZE_DEFAULT), typeof(SciMLBase.FINALIZE_DEFAULT)}}}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, Optimization.ReInitCache{Vector{Float64}, Tuple{Prediction, Matrix{Float64}, DiffCache{Matrix{Float64}, Vector{Float64}}}}}, Float64}, Float64, 1}}
Matrix{Float64}
Matrix{Float64}
Matrix{ForwardDiff.Dual{ForwardDiff.Tag{var"#92#104"{DiffCache{Matrix{Float64}, Vector{Float64}}, var"#objfun#101"{var"#SIA2D#95", DiscreteCallback{var"#stop_condition#93", var"#action!#94"{Sleipnir.Model{SIA2Dmodel{Float64, Int64}, TImodel1{Float64}, Nothing}}, typeof(SciMLBase.INITIALIZE_DEFAULT), typeof(SciMLBase.FINALIZE_DEFAULT)}}, Matrix{Float64}}, Float64}, Float64, 1}}
  • ForwardDiff.gradient((x) -> objfun(x, simulation, realsol, cache), [4.9e-17]) gives 0.0 as an output.

Any clue or idea on how to solve this? Also I am wondering if the failure of derivative propagation is due to the mathematical model or due to ForwardDiff.

You might want to rescale your A units by 1e16 so that A goes from 0.4 to 0.49. In general, optimization algorithms work best if the parameters are all rescaled to have comparable magnitudes of order \sim 1.

(In an ideal world, this would not be the case, but in practice many optimization algorithms do things like steepest-descent steps or spherical trust regions, which are dependent on the scaling of the data. In addition, many implementations have heuristic “magic constants” inside them that implicitly assume magnitudes of order \sim 1. I actually just gave a lecture for our Matrix Calculus class on the relationship of steepest-descent to the choice of inner product and norm, and why for dimensionful data there is an implicit or explicit choice of rescaling in order to define steepest descent.)

Thanks a lot for the tip about rescaling units! I tried it out, and while it didn’t fix my main problem at the moment, I am certain that it will make me avoid issues later on.

Okay, I think I have narrowed the problem down and i think it lies with the solver, here is why:

  • Forward SIA2D model, when used with dual numbers propagates derivative values

  • Objective function gives zero derivative value when used with dual numbers

  • Printing and plotting the solution of the solver (ice thickness H) within the objective function, gives a zero valued array when used with Dual Numbers, for both normal and derivative values.

  • Calling upon the objective function with just float numbers provides a plot just as how I expect it to be.

Does anyone have specific tips and/or guidelines that are needed to be followed when using ODE solver with Dual Numbers?