 Optimizing performance of 2D nonlinear diffusion UDE

I have a UDE based on a 2D nonlinear ice flow diffusion PDE. It works correctly both for the forward and backwards pass, but despite using a semi-implicit solver the memory usage for backpropagation is just too big for even very short simulations, resulting in the process being killed. So far my model is implemented to parallelize multiple simulations for each training batch using pmap (not shown), but this doesn’t help, since the memory issues occur within each of the simulations of each batch.

I’m still very new to Zygote, so I was wondering: what would be the best way to optimize Zygote in that case in order to circumvent memory issues?

Here’s a rough overview of what I’m doing:

# Loss function
function loss(H, glacier_ref, UA, p, t, t₁)

H = iceflow!(H, UA, p,t,t₁)
l_H = sqrt(Flux.Losses.mse(H[H .!= 0.0], glacier_ref["H"][end][H.!= 0.0]; agg=sum))

return l_H
end

# Ice flow UDE
function iceflow!(H, UA, p,t,t₁)

# Retrieve input variables
let
current_year = 0
total_iter = 0
t_step = 0
temps = p

# Forward scheme implementation
while t < t₁
let
iter = 1
err = 2 * tolnl

Hold = copy(H)
dHdt = zeros(nx, ny)

# Get current year for MB and ELA
year = floor(Int, t) + 1

if year != current_year

# Predict value of `A`
temp = [temps[year]]'

ŶA = predict_A̅(UA, temp)

## Unpack and repack tuple with updated A value
Δx, Δy, Γ, A, B, temps, C, α = p
p = (Δx, Δy, Γ, ŶA, B, temps, C, α)
current_year = year
end

while err > tolnl && iter < itMax+1

Err = copy(H)

# Compute the Shallow Ice Approximation in a staggered grid
F, dτ = SIA(H, p)

# Compute the residual ice thickness for the inertia

dHdt_ = copy(dHdt)
@tullio dHdt[i,j] := dHdt_[i,j]*damp + ResH[i,j]

# We keep local copies for tullio
H_ = copy(H)

# Update the ice thickness
@tullio H[i,j] := max(0.0, H_[i,j] + dHdt[i,j]*dτ)

iter += 1
total_iter += 1

end

t += Δt
t_step += 1

end # let
end

return H

end   # let

end

# Shallow Ice Approximation
function SIA(H, p)
Δx, Δy, Γ, A, B, temps, C, α = p

# Update glacier surface altimetry
S = B .+ H

# All grid variables computed in a staggered grid
# Compute surface gradients on edges
dSdx  = diff(S, dims=1) / Δx
dSdy  = diff(S, dims=2) / Δy
∇S² = avg_y(dSdx).^2 .+ avg_x(dSdy).^2

Γ = 2 * A * (ρ * g)^n / (n+2) # 1 / m^3 s

D = Γ .* avg(H).^(n + 2) .* ∇S².^((n - 1)/2)

# Compute flux components
dSdx_edges = diff(S[:,2:end - 1], dims=1) / Δx
dSdy_edges = diff(S[2:end - 1,:], dims=2) / Δy
Fx = .-avg_y(D) .* dSdx_edges
Fy = .-avg_x(D) .* dSdy_edges
#  Flux divergence
F = .-(diff(Fx, dims=1) / Δx .+ diff(Fy, dims=2) / Δy) # MB to be added here

# Compute dτ for the implicit method
#dτ = dτsc * min.( 10.0 , 1.0./(1.0/Δt .+ 1.0./(cfl./(ϵ .+ avg(D)))))
D_max = 3000000
current_D_max = maximum(D)
if D_max < current_D_max
error("Increase Maximum diffusivity. Required value must be larger than \$current_D_max")
end
dτ = dτsc * min( 10.0 , 1.0/(1.0/Δt + 1.0/(cfl/(ϵ + D_max))))

return F,  dτ

end

# Compute pullback for ice flow UDE
loss_UA, back_UA = Zygote.pullback(() -> loss(H, glacier_ref, UA, p, t, t₁), θ)

What are the appropriate parameters to test this?

That’s a very inefficient time stepping method. Doing direct AD of an semi-implicit Euler method is going to be both compute and memory inefficient. Why not use something like KenCarp4 instead? Then in the sensealg you want to turn on checkpointing to reduce the memory usage. That would also not do the direct AD through the implicit solver, which of course can be handled O(1) instead of O(n) like is done here.

1 Like

@goerch I don’t have any MWE at hand. But I was just looking for general advice on strategies, so no real need to run my code.

Thanks @ChrisRackauckas for the advice. I’ve spent some time re-writing my code to fit inside the DifferentialEquations.jl solvers, and I must say it was worth it. Now my code is cleaner, and I can test a whole bunch of different solvers.

So far I have focused on optimizing the forward model. KenCarp4 doesn’t seem to work, it just doesn’t show any progress and I don’t get any errors or warnings. BS3 and Vern7 have turned out to be the best ones so far (though I should still test more I guess). The next thing I’ve tried to do is to follow your advice on how to correctly optimize the functions to be passed to solvers. What I realized, is that despite using @views, I still can see a lot of memory allocations slowing down the code. Here’s the new version of my code with DifferentialEquations.jl:

function iceflow!(dH, H, p,t)
# Retrieve parameters
#A, B, S, dSdx, dSdy, D, temps, dSdx_edges, dSdy_edges, ∇S, Fx, Fy, Vx, Vy, V, C, α, current_year
current_year = @view p[end]
A = @view p

# Get current year for MB and ELA
year = floor(Int, t) + 1
if year != current_year && year <= t₁

temp = @view p[year]
A .= A_fake(temp)

# Unpack and repack tuple with updated A value
current_year .= year
end

# Compute the Shallow Ice Approximation in a staggered grid
SIA!(dH, H, p)
end

function SIA!(dH, H, p)
# Retrieve parameters
#A, B, S, dSdx, dSdy, D, temps, dSdx_edges, dSdy_edges, ∇S, Fx, Fy, Vx, Vy, V, C, α, current_year

A = @view p
B = @view p[:,:]
S = @view p[:,:]
dSdx = @view p[:,:]
dSdy = @view p[:,:]
D = @view p[:,:]
dSdx_edges = @view p[:,:]
dSdy_edges = @view p[:,:]
∇S = @view p[:,:]
Fx = @view p[:,:]
Fy = @view p[:,:]
Vx = @view p[:,:]
Vy = @view p[:,:]

# Update glacier surface altimetry
S .= B .+ H

# All grid variables computed in a staggered grid
# Compute surface gradients on edges
dSdx .= diff(S, dims=1) ./ Δx
dSdy .= diff(S, dims=2) / Δy
∇S .= (avg_y(dSdx).^2 .+ avg_x(dSdy).^2).^((n - 1)/2)

Γ = 2 * A * (ρ * g)^n / (n+2) # 1 / m^3 s

D .= Γ .* avg(H).^(n + 2) .* ∇S

# Compute flux components
dSdx_edges .= diff(S[:,2:end - 1], dims=1) / Δx
dSdy_edges .= diff(S[2:end - 1,:], dims=2) / Δy
Fx .= .-avg_y(D) .* dSdx_edges
Fy .= .-avg_x(D) .* dSdy_edges

#  Flux divergence
inn(dH) .= .-(diff(Fx, dims=1) / Δx .+ diff(Fy, dims=2) / Δy) # MB to be added here

# Compute velocities
Vx .= -D./(avg(H) .+ ϵ).*avg_y(dSdx)
Vy .= -D./(avg(H) .+ ϵ).*avg_x(dSdy)
end

@views avg(A) = 0.25 .* ( A[1:end-1,1:end-1] .+ A[2:end,1:end-1] .+ A[1:end-1,2:end] .+ A[2:end,2:end] )
@views avg_x(A) = 0.5 .* ( A[1:end-1,:] .+ A[2:end,:] )
@views avg_y(A) = 0.5 .* ( A[:,1:end-1] .+ A[:,2:end] )

# Initialize all matrices for the solver
S, dSdx, dSdy = zeros(Float32,nx,ny),zeros(Float32,nx-1,ny),zeros(Float32,nx,ny-1)
dSdx_edges, dSdy_edges, ∇S = zeros(Float32,nx-1,ny-2),zeros(Float32,nx-2,ny-1),zeros(Float32,nx-1,ny-1)
D, dH, Fx, Fy = zeros(Float32,nx-1,ny-1),zeros(Float32,nx-2,ny-2),zeros(Float32,nx-1,ny-2),zeros(Float32,nx-2,ny-1)
V, Vx, Vy = zeros(Float32,nx-1,ny-1),zeros(Float32,nx-1,ny-1),zeros(Float32,nx-1,ny-1)

# Gather simulation parameters
current_year = 0
p = [A, B, S, dSdx, dSdy, D, temps, dSdx_edges, dSdy_edges, ∇S, Fx, Fy, Vx, Vy, V, C, α, current_year]

# Perform reference simulation with forward model
iceflow_prob = ODEProblem(iceflow!,H,(0.0,t₁),p)
iceflow_sol = solve(iceflow_prob, BS3(), progress=true, progress_steps = 1)

When I check memory allocations with @allocated or @time, I can see memory being allocated in operations like dSdx .= diff(S, dims=1) ./ Δx. However, things like S .= B .+ H have 0 memory allocated. For a single forward run with solve I can see up to some GBs being allocated. What is going on? What am I doing wrong? Once I have something optimized working in forward mode I’ll try to add AD for a UDE.

Then in the sensealg you want to turn on checkpointing to reduce the memory usage. That would also not do the direct AD through the implicit solver, which of course can be handled O(1) instead of O(n) like is done here.

What do you mean by the sensealg?

1 Like

diff(S, dims=1) allocates a new array. Instead, you could try something like:

dSdx .= @views (S[begin + 1:end, :] .- S[1:end - 1, :]) ./ Δx

Thanks for your reply. The thing is that even operations like this seem to be allocating memory:

@views avg_y(A) = 0.5 .* ( A[:,1:end-1] .+ A[:,2:end] )

Fx .= .-avg_y(D) .* dSdx_edges

So it’s not a matter of just the diff function, but something happening even with @views and indexing. The only operation that didn’t seem to allocate any memory was a pure sum of matrix @views: S .= B .+ H.

It’s still the same problem. Your avg_y function has the very same problem of diff, in that it materialises the result into a new array before returning.

One workaround would be to allow the avg_y function to accept an output array:

function avg_y!(u, A)
u .= @views 0.5 .* ( A[:,1:end-1] .+ A[:,2:end] )
u
end

and then to allocate such an output array before this function is called.

2 Likes

How many ODEs? Most likely if it’s huge it’s just stuck doing matrix inverses and you’ll want to swap out to an iterative solver like GMRES. Otherwise, you might want to try modelingtoolkitize to generate the analytical sparse Jacobian (and parallelize it), or just run sparsity detection to get the sparsity pattern. Another thing is to try CVODE_BDF(linear_solver=:GMRES) and see what happens. If this PDE generally has real eigenvalues, I’d suggest trying ROCK2() or ROCK4(). Also if it’s huge try QNDF or FBDF instead of KenCarp.

Since the solver needs to work well for the pullback, I have moved on to the implementation of the full UDE using DiffEqFlux.jl before spending more time testing different solvers and checking the memory allocation issues above.

I’m currently struggling to find a way to pass all my PDE matrices (p) to the loss function and the solver wrapped by sciml_train. All the examples I’ve come across are optimized only based on the NN parameters (θ), but my model requires the full p array with all the matrices required for the staggered grid. I’m sure it must be something quite straightforward, so could someone please indicate the best way to do this? This is what I have so far:

function loss_iceflow(p)
H = predict_iceflow(p)
H_ref = @view p[:,:]
l_H = sqrt(Flux.Losses.mse(H[H .!= 0.0], H_ref[H.!= 0.0]; agg=sum))
return l_H
end

function predict_iceflow(p)
H = @view p[:,:]
tspan = (0.0,t₁)
iceflow_prob = ODEProblem(iceflow_UDE!,H,tspan,p)
H_pred = solve(iceflow_prob, BS3(), u0=H, p=p, saveat=1.0,
progress=true, progress_steps = 1)

return H_pred[end]
end

function iceflow_UDE!(dH, H, p,t)
# Unpack parameters
#A, B, S, dSdx, dSdy, D, norm_temps, dSdx_edges, dSdy_edges, ∇S, Fx, Fy, Vx, Vy, V, C, α, current_year, H_ref, H, UA, θ
current_year = p
A = @view p
UA = p
θ = p

# Get current year for MB and ELA
year = floor(Int, t) + 1
if year != current_year && year <= t₁
A = predict_A̅(UA, θ, [temp]) # FastChain prediction requires explicit parameters
current_year = year
end

# Compute the Shallow Ice Approximation in a staggered grid
SIA!(dH, H, p)

end

loss(θ) = loss_iceflow(p)
iceflow_trained = DiffEqFlux.sciml_train(loss, θ, RMSProp(hyparams.η), cb=callback, maxiters = 10)

and this crashes with:

ERROR: LoadError: MethodError: no method matching (::var"#loss#10"{Vector{Any}})(::Vector{Float32})
Closest candidates are:
(::var"#loss#10")() at /Users/Bolib001/Desktop/Jordi/Julia/odinn_toy_model/scripts/helpers/iceflow_DiffEqs.jl:101
Stacktrace:
 macro expansion
@ ~/.julia/packages/Zygote/AlLTp/src/compiler/interface2.jl:0 [inlined]
 _pullback(ctx::Zygote.Context, f::var"#loss#10"{Vector{Any}}, args::Vector{Float32})
@ Zygote ~/.julia/packages/Zygote/AlLTp/src/compiler/interface2.jl:9
 _pullback
@ ~/.julia/packages/DiffEqFlux/jpIWG/src/train.jl:84 [inlined]
 _pullback(::Zygote.Context, ::WARNING: both Flux and Iterators export "flatten"; uses of it in module DiffEqFlux must be qualified
WARNING: both Flux and Distributions export "params"; uses of it in module DiffEqFlux must be qualified
DiffEqFlux.var"#82#87"{var"#loss#10"{Vector{Any}}}, ::Vector{Float32}, ::Nothing)
@ Zygote ~/.julia/packages/Zygote/AlLTp/src/compiler/interface2.jl:0
 _apply(::Function, ::Vararg{Any, N} where N)
@ Core ./boot.jl:804
@ ~/.julia/packages/Zygote/AlLTp/src/lib/lib.jl:200 [inlined]
 _pullback
 _pullback
@ ~/.julia/packages/SciMLBase/7GnZA/src/problems/basic_problems.jl:107 [inlined]
 _pullback(::Zygote.Context, ::OptimizationFunction{true, GalacticOptim.AutoZygote, DiffEqFlux.var"#82#87"{var"#loss#10"{Vector{Any}}}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, ::Vector{Float32}, ::Nothing)
@ Zygote ~/.julia/packages/Zygote/AlLTp/src/compiler/interface2.jl:0
 _apply(::Function, ::Vararg{Any, N} where N)
@ Core ./boot.jl:804
@ ~/.julia/packages/Zygote/AlLTp/src/lib/lib.jl:200 [inlined]
 _pullback
 _pullback
@ ~/.julia/packages/GalacticOptim/DHxE0/src/function/zygote.jl:6 [inlined]
 _pullback(ctx::Zygote.Context, f::GalacticOptim.var"#260#270"{OptimizationFunction{true, GalacticOptim.AutoZygote, DiffEqFlux.var"#82#87"{var"#loss#10"{Vector{Any}}}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, Nothing}, args::Vector{Float32})
@ Zygote ~/.julia/packages/Zygote/AlLTp/src/compiler/interface2.jl:0
 _apply(::Function, ::Vararg{Any, N} where N)
@ Core ./boot.jl:804
@ ~/.julia/packages/Zygote/AlLTp/src/lib/lib.jl:200 [inlined]
 _pullback
 _pullback
@ ~/.julia/packages/GalacticOptim/DHxE0/src/function/zygote.jl:8 [inlined]
 _pullback(ctx::Zygote.Context, f::GalacticOptim.var"#263#273"{Tuple{}, GalacticOptim.var"#260#270"{OptimizationFunction{true, GalacticOptim.AutoZygote, DiffEqFlux.var"#82#87"{var"#loss#10"{Vector{Any}}}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, Nothing}}, args::Vector{Float32})
@ Zygote ~/.julia/packages/Zygote/AlLTp/src/compiler/interface2.jl:0
 _pullback(f::Function, args::Vector{Float32})
@ Zygote ~/.julia/packages/Zygote/AlLTp/src/compiler/interface.jl:34
 pullback(f::Function, args::Vector{Float32})
@ Zygote ~/.julia/packages/Zygote/AlLTp/src/compiler/interface.jl:40
@ Zygote ~/.julia/packages/Zygote/AlLTp/src/compiler/interface.jl:75
 (::GalacticOptim.var"#261#271"{GalacticOptim.var"#260#270"{OptimizationFunction{true, GalacticOptim.AutoZygote, DiffEqFlux.var"#82#87"{var"#loss#10"{Vector{Any}}}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, Nothing}})(::Vector{Float32}, ::Vector{Float32})
@ GalacticOptim ~/.julia/packages/GalacticOptim/DHxE0/src/function/zygote.jl:8
 macro expansion
@ ~/.julia/packages/GalacticOptim/DHxE0/src/solve/flux.jl:41 [inlined]
 macro expansion
@ ~/.julia/packages/GalacticOptim/DHxE0/src/utils.jl:35 [inlined]
 __solve(prob::OptimizationProblem{false, OptimizationFunction{false, GalacticOptim.AutoZygote, OptimizationFunction{true, GalacticOptim.AutoZygote, DiffEqFlux.var"#82#87"{var"#loss#10"{Vector{Any}}}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, GalacticOptim.var"#261#271"{GalacticOptim.var"#260#270"{OptimizationFunction{true, GalacticOptim.AutoZygote, DiffEqFlux.var"#82#87"{var"#loss#10"{Vector{Any}}}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, Nothing}}, GalacticOptim.var"#264#274"{GalacticOptim.var"#260#270"{OptimizationFunction{true, GalacticOptim.AutoZygote, DiffEqFlux.var"#82#87"{var"#loss#10"{Vector{Any}}}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, Nothing}}, GalacticOptim.var"#269#279", Nothing, Nothing, Nothing}, Vector{Float32}, SciMLBase.NullParameters, Nothing, Nothing, Nothing, Nothing, Base.Iterators.Pairs{Symbol, typeof(callback), Tuple{Symbol}, NamedTuple{(:cb,), Tuple{typeof(callback)}}}}, opt::RMSProp, data::Base.Iterators.Cycle{Tuple{GalacticOptim.NullData}}; maxiters::Int64, cb::Function, progress::Bool, save_best::Bool, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ GalacticOptim ~/.julia/packages/GalacticOptim/DHxE0/src/solve/flux.jl:39
 #solve#476
@ ~/.julia/packages/SciMLBase/7GnZA/src/solve.jl:3 [inlined]
 sciml_train(::var"#loss#10"{Vector{Any}}, ::Vector{Float32}, ::RMSProp, ::Nothing; lower_bounds::Nothing, upper_bounds::Nothing, maxiters::Int64, kwargs::Base.Iterators.Pairs{Symbol, typeof(callback), Tuple{Symbol}, NamedTuple{(:cb,), Tuple{typeof(callback)}}})
@ DiffEqFlux ~/.julia/packages/DiffEqFlux/jpIWG/src/train.jl:89
 train_iceflow_UDE(H₀::Matrix{Float32}, UA::Function, H_refs::Vector{Matrix{Float32}}, temp_series::Vector{Any}, hyparams::Hyperparameters, idx::Int64)
@ Main ~/Desktop/Jordi/Julia/odinn_toy_model/scripts/helpers/iceflow_DiffEqs.jl:103
 #8
@ ~/Desktop/Jordi/Julia/odinn_toy_model/scripts/helpers/iceflow_DiffEqs.jl:72 [inlined]
 iterate
@ ./generator.jl:47 [inlined]
 _collect(c::Vector{Int64}, itr::Base.Generator{Vector{Int64}, var"#8#9"{Matrix{Float32}, FastChain{Tuple{FastDense{var"#12#20", DiffEqFlux.var"#initial_params#90"{Vector{Float32}}}, FastDense{var"#13#21", DiffEqFlux.var"#initial_params#90"{Vector{Float32}}}, FastDense{var"#14#22", DiffEqFlux.var"#initial_params#90"{Vector{Float32}}}, FastDense{var"#sigmoid_A#17"{Int64, Float64}, DiffEqFlux.var"#initial_params#90"{Vector{Float32}}}}}, Vector{Matrix{Float32}}, Vector{Any}, Hyperparameters}}, #unused#::Base.EltypeUnknown, isz::Base.HasShape{1})
@ Base ./array.jl:691
 collect_similar(cont::Vector{Int64}, itr::Base.Generator{Vector{Int64}, var"#8#9"{Matrix{Float32}, FastChain{Tuple{FastDense{var"#12#20", DiffEqFlux.var"#initial_params#90"{Vector{Float32}}}, FastDense{var"#13#21", DiffEqFlux.var"#initial_params#90"{Vector{Float32}}}, FastDense{var"#14#22", DiffEqFlux.var"#initial_params#90"{Vector{Float32}}}, FastDense{var"#sigmoid_A#17"{Int64, Float64}, DiffEqFlux.var"#initial_params#90"{Vector{Float32}}}}}, Vector{Matrix{Float32}}, Vector{Any}, Hyperparameters}})
@ Base ./array.jl:606
 map(f::Function, A::Vector{Int64})
@ Base ./abstractarray.jl:2294
 train_batch_iceflow_UDE(H₀::Matrix{Float32}, UA::Function, H_refs::Vector{Matrix{Float32}}, temp_series::Vector{Any}, hyparams::Hyperparameters, idxs::Vector{Int64})
@ Main ~/Desktop/Jordi/Julia/odinn_toy_model/scripts/helpers/iceflow_DiffEqs.jl:72
 macro expansion
@ ./timing.jl:210 [inlined]
 top-level scope
@ ~/Desktop/Jordi/Julia/odinn_toy_model/scripts/ice_dynamics_DiffEqs.jl:218
in expression starting at /Users/Bolib001/Desktop/Jordi/Julia/odinn_toy_model/scripts/ice_dynamics_DiffEqs.jl:190

For my previous manual implementation with Zygote.jl (which I’m trying to implement with DiffEqFlux.jl), I just did:

loss_UA, back_UA = Zygote.pullback(() -> loss(H, H_ref, UA, p, t, t₁), θ)

My guess looking at the error is that sciml_train is passing θ to loss by default, which is not the required input, since I need p instead. What is the right way to pass more parameters aside from θ to the loss function? Once I get my head around this I’ll try to make a small PR in order to clarify this in the docs. Thanks in advance!

what’s the full current code to copy/paste and test?

Hi Chris, I believe I have managed to work around the issue of passing all the matrices using closures (see MWE below). Now the forward run works correctly, but I’m getting the classic “Mutating arrays is not allowed” from Zygote. Here is a MWE to reproduce the issue:

using Statistics
using LinearAlgebra
using Random
using OrdinaryDiffEq
using DiffEqFlux
using Flux

const t₁ = 10                 # number of simulation years
const ρ = 900                     # Ice density [kg / m^3]
const g = 9.81                    # Gravitational acceleration [m / s^2]
const n = 3                       # Glen's flow law exponent
const maxA = 8e-16
const minA = 3e-17
const maxT = 1
const minT = -25
A = 1.3e-24 #2e-16  1 / Pa^3 s
A *= 60 * 60 * 24 * 365.25 # [1 / Pa^3 yr]
C = 0
α = 0

@views avg(A) = 0.25 .* ( A[1:end-1,1:end-1] .+ A[2:end,1:end-1] .+ A[1:end-1,2:end] .+ A[2:end,2:end] )
@views avg_x(A) = 0.5 .* ( A[1:end-1,:] .+ A[2:end,:] )
@views avg_y(A) = 0.5 .* ( A[:,1:end-1] .+ A[:,2:end] )
@views diff_x(A) = (A[begin + 1:end, :] .- A[1:end - 1, :])
@views diff_y(A) = (A[:, begin + 1:end] .- A[:, 1:end - 1])
@views inn(A) = A[2:end-1,2:end-1]

function ref_glacier(temps, H₀)

H = deepcopy(H₀)

# Initialize all matrices for the solver
S, dSdx, dSdy = zeros(Float32,nx,ny),zeros(Float32,nx-1,ny),zeros(Float32,nx,ny-1)
dSdx_edges, dSdy_edges, ∇S = zeros(Float32,nx-1,ny-2),zeros(Float32,nx-2,ny-1),zeros(Float32,nx-1,ny-1)
D, dH, Fx, Fy = zeros(Float32,nx-1,ny-1),zeros(Float32,nx-2,ny-2),zeros(Float32,nx-1,ny-2),zeros(Float32,nx-2,ny-1)
V, Vx, Vy = zeros(Float32,nx-1,ny-1),zeros(Float32,nx-1,ny-1),zeros(Float32,nx-1,ny-1)

# Gather simulation parameters
current_year = 0
p = [A, B, S, dSdx, dSdy, D, temps, dSdx_edges, dSdy_edges, ∇S, Fx, Fy, Vx, Vy, V, C, α, current_year]

# Perform reference simulation with forward model
println("Running forward PDE ice flow model...\n")
iceflow_prob = ODEProblem(iceflow!,H,(0.0,t₁),p)
iceflow_sol = solve(iceflow_prob, BS3(), progress=true, saveat=1.0, progress_steps = 1)

return iceflow_sol[end]
end

function iceflow!(dH, H, p,t)
# Unpack parameters
#A, B, S, dSdx, dSdy, D, temps, dSdx_edges, dSdy_edges, ∇S, Fx, Fy, Vx, Vy, V, C, α, current_year
current_year = @view p
A = @view p

# Get current year for MB and ELA
year = floor(Int, t) + 1
if year != current_year && year <= t₁
temp = @view p[year]
A .= A_fake(temp)
current_year .= year
end

# Compute the Shallow Ice Approximation in a staggered grid
SIA!(dH, H, p)
end

function train_iceflow_UDE(H₀, UA, H_ref, temps)

# Gather simulation parameters
H = deepcopy(H₀)

# Initialize all matrices for the solver
S, dSdx, dSdy = zeros(Float32,nx,ny),zeros(Float32,nx-1,ny),zeros(Float32,nx,ny-1)
dSdx_edges, dSdy_edges, ∇S = zeros(Float32,nx-1,ny-2),zeros(Float32,nx-2,ny-1),zeros(Float32,nx-1,ny-1)
D, dH, Fx, Fy = zeros(Float32,nx-1,ny-1),zeros(Float32,nx-2,ny-2),zeros(Float32,nx-1,ny-2),zeros(Float32,nx-2,ny-1)
V, Vx, Vy = zeros(Float32,nx-1,ny-1),zeros(Float32,nx-1,ny-1),zeros(Float32,nx-1,ny-1)

# Gather simulation parameters
current_year = 0
θ = initial_params(UA)
context = [A, B, S, dSdx, dSdy, D, temps, dSdx_edges, dSdy_edges, ∇S, Fx, Fy, Vx, Vy, V, C, α, current_year, H_ref, H, UA, θ]
loss(θ) = loss_iceflow(UA, θ, H, context) # closure

iceflow_trained = DiffEqFlux.sciml_train(loss, θ, RMSProp(0.01), maxiters = 100)

return iceflow_trained
end

function loss_iceflow(UA, θ, H, context)

H = predict_iceflow(UA, θ, H, context)

H_ref = @view context[:,:]
l_H = sqrt(Flux.Losses.mse(H[H .!= 0.0], H_ref[H.!= 0.0]; agg=sum))

return l_H
end

function predict_iceflow(UA, θ, H, context)

iceflow_UDE!(dH, H, θ, t) = iceflow_NN!(dH, H, θ, t, context) # closure
tspan = (0.0,t₁)
iceflow_prob = ODEProblem(iceflow_UDE!,H,tspan,θ)
H_pred = solve(iceflow_prob, BS3(), u0=H, p=θ, saveat=1.0,
progress=true, progress_steps = 1)

return H_pred[end]
end

function iceflow_NN!(dH, H, θ, t, context)

# Unpack parameters
#A, B, S, dSdx, dSdy, D, norm_temps, dSdx_edges, dSdy_edges, ∇S, Fx, Fy, Vx, Vy, V, C, α, current_year, H_ref, H, UA, θ
current_year = context
A = @view context
UA = context
dH = @view context[:,:]

# Get current year for MB and ELA
year = floor(Int, t) + 1
if year != current_year && year <= t₁
temp = context[year]
A .= predict_A̅(UA, θ, [temp]) # FastChain prediction requires explicit parameters
current_year = year
end

# Compute the Shallow Ice Approximation in a staggered grid
SIA!(dH, H, context)

end

"""
SIA(H, p)

Compute a step of the Shallow Ice Approximation PDE in a forward model
"""

function SIA!(dH, H, context)

# Retrieve parameters
#A, B, S, dSdx, dSdy, D, norm_temps, dSdx_edges, dSdy_edges, ∇S, Fx, Fy, Vx, Vy, V, C, α, current_year, H_ref, H, UA, θ
A = Ref{Float32}(context)
B = @view context[:,:]
S = @view context[:,:]
dSdx = @view context[:,:]
dSdy = @view context[:,:]
D = @view context[:,:]
dSdx_edges = @view context[:,:]
dSdy_edges = @view context[:,:]
∇S = @view context[:,:]
Fx = @view context[:,:]
Fy = @view context[:,:]
Vx = @view context[:,:]
Vy = @view context[:,:]

# Update glacier surface altimetry
S .= B .+ H

# All grid variables computed in a staggered grid
# Compute surface gradients on edges
dSdx .= diff_x(S) / Δx
dSdy .= diff_y(S) / Δy
∇S .= (avg_y(dSdx).^2 .+ avg_x(dSdy).^2).^((n - 1)/2)

Γ = 2 * A[] * (ρ * g)^n / (n+2) # 1 / m^3 s
D .= Γ .* avg(H).^(n + 2) .* ∇S

# Compute flux components
dSdx_edges .= diff(S[:,2:end - 1], dims=1) / Δx
dSdy_edges .= diff(S[2:end - 1,:], dims=2) / Δy
Fx .= .-avg_y(D) .* dSdx_edges
Fy .= .-avg_x(D) .* dSdy_edges

#  Flux divergence
inn(dH) .= .-(diff(Fx, dims=1) / Δx .+ diff(Fy, dims=2) / Δy) # MB to be added here

end

function A_fake(temp)
return @. minA + (maxA - minA) * ((temp-minT)/(maxT-minT) )^2
end

predict_A̅(UA, θ, temp) = UA(temp, θ) .* 1e-16

#### Generate reference dataset ####
nx = ny = 100
B = zeros(Float64, (nx, ny))
σ = 1000
H₀ = [ 250 * exp( - ( (i - nx/2)^2 + (j - ny/2)^2 ) / σ ) for i in 1:nx, j in 1:ny ]
Δx = Δy = 50 #m

temps = [0.0, -0.5, -0.2, -0.1, -0.3, -0.1, -0.2, -0.3, -0.4, -0.1]
H_ref = ref_glacier(temps, H₀)

# Train UDE
minA_out = 0.3
maxA_out = 8
sigmoid_A(x) = minA_out + (maxA_out - minA_out) / ( 1 + exp(-x) )
UA = FastChain(
FastDense(1,3, x->tanh.(x)),
FastDense(3,10, x->tanh.(x)),
FastDense(10,3, x->tanh.(x)),
FastDense(3,1, sigmoid_A)
)

iceflow_trained = train_iceflow_UDE(H₀, UA, H_ref, temps)

In my manual implementation of this with Zygote I managed to circumvent this by copying the matrices at each step and by using intermediate buffers in the operations that required updating a given matrix. That is of course very inefficient. Do I still need to do the same with DiffEqFlux.jl or is there any way to pass everything by reference using @views as I implemented in the MWE above? Thanks again for your help!

What does EnzymeVJP() spit out? Zygote won’t work here, but in theory Enzyme can.

With sensealg = BacksolveAdjoint(autojacvec=EnzymeVJP()) it gives a very lengthy low-level error. The only meaningful parts I could see were at the very end with:

could not deduce type of integer   %.repack31 = getelementptr inbounds { {} addrspace(10)*, [1 x i64], i64, i64 }, { {} addrspace(10)*, [1 x i64], i64, i64 } addrspace(10)* %52, i64 0, i32 2, !dbg !104 num:8 q:{[]:Pointer}
warning: /Users/Bolib001/Desktop/Jordi/Julia/odinn_toy_model/scripts/examples/MWE_iceflow.jl:128:0: failed to deduce type of value   %.repack31 = getelementptr inbounds { {} addrspace(10)*, [1 x i64], i64, i64 }, { {} addrspace(10)*, [1 x i64], i64, i64 } addrspace(10)* %52, i64 0, i32 2, !dbg !104
Assertion failed: (0 && "could not deduce type of integer"), function firstPointer, file /workspace/srcdir/Enzyme/enzyme/Enzyme/TypeAnalysis/TypeAnalysis.cpp, line 4724.

signal (6): Abort trap: 6
in expression starting at /Users/Bolib001/Desktop/Jordi/Julia/odinn_toy_model/scripts/examples/MWE_iceflow.jl:212
Allocations: 356126791 (Pool: 356060610; Big: 66181); GC: 170
zsh: abort      julia

Looks like that’s from a type-instability. If you check the stability of your f function do you get inference issues?

Hmm, apparently yes:

return type SubArray{Float32, 2, Matrix{Float32}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false} does not match inferred return type Any

I don’t quite know how to interpret that, since the iceflow! and SIA! functions don’t have a return statement, everything is passed by reference with @views.

What if you return nothing at the end of them?

I get a very similar error:

could not deduce type of integer   %.repack31 = getelementptr inbounds { {} addrspace(10)*, [1 x i64], i64, i64 }, { {} addrspace(10)*, [1 x i64], i64, i64 } addrspace(10)* %52, i64 0, i32 2, !dbg !105 num:8 q:{[]:Pointer}
warning: /Users/Bolib001/Desktop/Jordi/Julia/odinn_toy_model/scripts/examples/MWE_iceflow.jl:129:0: failed to deduce type of value   %.repack31 = getelementptr inbounds { {} addrspace(10)*, [1 x i64], i64, i64 }, { {} addrspace(10)*, [1 x i64], i64, i64 } addrspace(10)* %52, i64 0, i32 2, !dbg !105
Assertion failed: (0 && "could not deduce type of integer"), function firstPointer, file /workspace/srcdir/Enzyme/enzyme/Enzyme/TypeAnalysis/TypeAnalysis.cpp, line 4724.

signal (6): Abort trap: 6
in expression starting at /Users/Bolib001/Desktop/Jordi/Julia/odinn_toy_model/scripts/examples/MWE_iceflow.jl:217