NonlinearSolve stalls when Jacobian is specified

I’m trying to solve a nonlinear system, but I get a stalled retcode when I specify a Jacobian.

Specifically, I’m trying to solve a discretized PDE using a home-made backward Euler stepper. The spatial domain is 1D and I have 4 variables. The system is a DAE, so only 2 of the 4 variables carry “previous” time-step values. I call the four equations A, B, C, D, and there are n of each. The code below just solves a single time step. Ultimately I’m sure it would be better to run this with DifferentialEquations, but I was running into some issues there and wanted to first compare against some existing MATLAB code.

using LinearAlgebra, SparseArrays, BenchmarkTools
using Optimization, NonlinearSolve, SparseConnectivityTracer
using CairoMakie, ColorSchemes

### Define parameters:
ν_ = 0.01
δ_ = 1e-3
κ0m = ν_^2*1.25
κ0h = ν_^2
s_ = 1.
A_ = 2.
tail_ = false
Tdef_ = -1.
dt = 0.0001
eps = 1e-4
u∞ = 0.
θb = -1.

ncell_ = 100 # number of cells
nedge_ = ncell_
zmax = 0.5

### grid setup
dz = zmax/(ncell_-0.5)
zcell_ = dz*(0.5.+collect(0:(ncell_-1)))
dzcell_ = diff(zcell_)
zedge_ = [0.; (zcell_[1:end-1]+zcell_[2:end])./2.]
dzedge_ = diff(zedge_)

### Operators: ∇ differences cell values onto edges (row 1 left zero; that row is set by the BC equations),
### and div differences edge values back onto cells (last row left zero, same reason)
∇ = sparse(diagm(0 => [0;ones(ncell_-1)./dzcell_], -1=>-ones(ncell_-1)./dzcell_));
div = spdiagm(0 => [-ones(nedge_-1)./dzedge_;0], 1=>ones(nedge_-1)./dzedge_);

### Initial conditions:
uprev = zeros(ncell_) ### Very simple ICs
θprev = [-1; zeros(ncell_-1)] ### Very simple ICs

u_z = ∇*uprev
θ_z = ∇*θprev
Ri = θ_z ./ (u_z.^2 .+ eps.^2)
f = max.(0.,1. .- Ri./ν_).^2

qprev = - ((zedge_.+δ_).^2 .* abs.(u_z) .* f .+ κ0h).*θ_z/ν_
τprev = - ((zedge_.+δ_).^2 .* abs.(u_z) .* f .+ κ0m).*u_z

y_in = [uprev;θprev;τprev;qprev]


p = ν_, δ_, κ0m, κ0h, s_, θb, u∞, eps, dt, ncell_, nedge_, 
dz, zcell_, dzcell_, zedge_, dzedge_, ∇, div,uprev,θprev


function FF(dy,y_in,p)
    ν_, δ_, κ0m, κ0h, s_, θb, u∞, eps, dt, ncell_, nedge_, 
    dz, zcell_, dzcell_, zedge_, dzedge_, ∇, div,uprev,θprev = p
#     ncell_, nedge_ = Int(ncell_), Int(nedge_)

    u = y_in[1:ncell_]
    θ = y_in[ncell_.+(1:ncell_)]
    τ = y_in[2*ncell_.+(1:ncell_)]
    q = y_in[3*ncell_.+(1:ncell_)]
    
    ### Eq1
    A = (u.-uprev).*ν_./dt .- div*τ .+ θ.*s_
    A[ncell_] = u[ncell_] - u∞

    ### Eq2
    B = (θ.-θprev).*ν_./dt .+ div*q
    B[ncell_] = θ[ncell_]
    
    ### Intermediate vars
    u_z = ∇*u
    θ_z = ∇*θ
    Ri = θ_z ./ (u_z.^2 .+ eps.^2)
    ### Max not smoothed
#     f = max.(0.,1. .-Ri./ν_).^2 
    ### Max smoothed
    ε_smooth = 1e-8
    x = 1 .- Ri./ν_
    g = (x .+ sqrt.(x.^2 .+ ε_smooth)) ./ 2.
    f = g.^2
    
    Km = (zedge_.+δ_).^2 .* abs.(u_z) .* f .+ κ0m
    Kh = (zedge_.+δ_).^2 .* abs.(u_z) .* f .+ κ0h
    
    
    ### Eq3
    C = τ .- Km.*u_z
    C[1] = u[1]
    
    ### Eq4
    D = ν_.*q .+ Kh.*θ_z
    D[1] = θ[1] - θb
    
    
    dy .= [A;B;C;D]
#     return [A;B;C;D]
end


function FF_jac(J_out,y_in,p)
    ν_, δ_, κ0m, κ0h, s_, θb, u∞, eps, dt, ncell_, nedge_, 
    dz, zcell_, dzcell_, zedge_, dzedge_, ∇, div,uprev,θprev = p
#     ncell_, nedge_ = Int(ncell_), Int(nedge_)

    u = y_in[1:ncell_]
    θ = y_in[ncell_.+(1:ncell_)]
    τ = y_in[2*ncell_.+(1:ncell_)]
    q = y_in[3*ncell_.+(1:ncell_)]
    
    
    ### Eq1
#     A = (u.-uprev).*ν_./dt .- div*τ .+ θ.*s_
#     A[ncell_] = u[ncell_] - u∞
    dA_du = spdiagm(0=>ones(ncell_) * ν_ / dt)
    dA_du[ncell_,:] .= 0. ### Holdover, probably not needed
    dA_du[ncell_,ncell_] = 1.
    dA_dθ = spdiagm(0=>ones(ncell_) * s_)
    dA_dθ[ncell_,:] .= 0. ### Can replace with line below
#     dA_dθ[ncell_,ncell_] = 0.
    dA_dτ = -div
    dA_dτ[ncell_,:] .= 0.
    dA_dq = spzeros(ncell_,ncell_)

    
    ### Eq2
#     B = (θ.-θprev).*ν_./dt .+ div*q
#     B[ncell_] = θ[ncell_]
    dB_du = spzeros(ncell_,ncell_)
    dB_dθ = spdiagm(0=>ones(ncell_) / dt)
    dB_dθ[ncell_,:] .= 0. ### Can replace with line below
    dB_dθ[ncell_, ncell_] = 1.
    dB_dτ = spzeros(ncell_,ncell_)
    dB_dq = copy(div) ### copy so zeroing the row below doesn't mutate the shared div operator
    dB_dq[ncell_,:] .= 0.
    
    
    ### Intermediate variables
    u_z = ∇*u
    θ_z = ∇*θ
    Ri = θ_z ./ (u_z.^2 .+ eps.^2)
    dRi_dθ = spdiagm(0=>1. ./(u_z.^2 .+ eps.^2)) * ∇
#     dRi_du = spdiagm(0=>-2. * θ_z ./(u_z.^2 .+ eps.^2)) * ∇ ### old
    dRi_du = spdiagm(0 => -2. * u_z .* θ_z ./ ((u_z.^2 .+ eps.^2).^2)) * ∇
    
    ### Max not smoothed
#     f = max.(0.,1. .-Ri./ν_).^2 
#     df_dRi = spdiagm(0 => (-2/ν_) .* max.(0, 1 .- Ri./ν_))
    ### Max smoothed
    ε_smooth = 1e-8
    x = 1 .- Ri./ν_
    g = (x .+ sqrt.(x.^2 .+ ε_smooth)) ./ 2.
    f = g.^2
    dg_dRi = - (1 ./(2*ν_)) .* (1 .+ x ./ sqrt.(x.^2 .+ ε_smooth))
    df_dRi = spdiagm(0 => 2 .* g .* dg_dRi)
    
    
    df_du = df_dRi * dRi_du
    df_dθ = df_dRi * dRi_dθ
    
    Km = (zedge_.+δ_).^2 .* abs.(u_z) .* f .+ κ0m
    dKm_du = spdiagm(0=>(zedge_.+δ_).^2 .* sign.(u_z) .* f)*∇ + spdiagm(0=>(zedge_.+δ_).^2 .* abs.(u_z))*df_du
    dKm_dθ = spdiagm(0=>(zedge_.+δ_).^2 .* abs.(u_z))*df_dθ
    Kh = (zedge_.+δ_).^2 .* abs.(u_z) .* f .+ κ0h
    dKh_du = dKm_du
    dKh_dθ = dKm_dθ
    
    
    ### Eq3
#     C = τ .- Km.*u_z
#     C[1] = u[1]
    dC_du = spdiagm(0=>-Km) * ∇ + spdiagm(0=>-u_z) * dKm_du
    dC_du[1,:] .= 0. ### Don't think this is needed
    dC_du[1,1] = 1.
    dC_dθ = spdiagm(0=>-u_z) * dKm_dθ
    dC_dθ[1,:] .= 0.
    dC_dτ = spdiagm(0=>ones(nedge_))
    dC_dτ[1,:] .= 0. 
    dC_dq = spzeros(nedge_,nedge_)
    
    
    
    ### Eq4
#     D = ν_.*q .+ Kh.*θ_z
#     D[1] = θ[1] - θb
    dD_du = spdiagm(0=>θ_z) * dKh_du
    dD_du[1,:] .= 0.
    dD_dθ = spdiagm(0=>Kh) * ∇ + spdiagm(0=>θ_z) * dKh_dθ
    dD_dθ[1,:] .= 0. ### Don't think this is needed
    dD_dθ[1,1] = 1.
    dD_dτ = spzeros(nedge_,nedge_)
    dD_dq = spdiagm(0=>ν_*ones(nedge_))
    dD_dq[1,:] .= 0.


    J_out .= [dA_du  dA_dθ  dA_dτ  dA_dq;
           dB_du  dB_dθ  dB_dτ  dB_dq;
           dC_du  dC_dθ  dC_dτ  dC_dq;
           dD_du  dD_dθ  dD_dτ  dD_dq]
end

I know the code isn’t optimized to reduce allocations (especially the Jacobian calculation – I haven’t preallocated anything) – I just want something that runs for now. The problem I’m encountering: when I run this with just a Jacobian sparsity pattern (but no analytic Jacobian), it solves, but it gets really slow at later time steps as the system develops sharp gradients.

prob_sparsity_pattern = NonlinearProblem(NonlinearFunction(FF; sparsity = TracerSparsityDetector()), y_in, p; 
                        abstol = 1e-10, reltol = 1e-10)
sol_sparsity_pattern = solve(prob_sparsity_pattern, NewtonRaphson(),maxiters=Int(1e4))
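For context, the outer backward-Euler loop isn’t shown here; each step just re-solves this problem with uprev/θprev replaced by the previous solution. A rough sketch (nsteps is a stand-in, not a variable from the code above):

### Rough sketch of the outer time loop (not part of the MWE):
y = copy(y_in)
for step in 1:nsteps
    pstep = (ν_, δ_, κ0m, κ0h, s_, θb, u∞, eps, dt, ncell_, nedge_,
             dz, zcell_, dzcell_, zedge_, dzedge_, ∇, div,
             y[1:ncell_], y[ncell_ .+ (1:ncell_)]) ### updated uprev, θprev
    prob = remake(prob_sparsity_pattern; u0 = y, p = pstep)
    y = solve(prob, NewtonRaphson(), maxiters = Int(1e4)).u
end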

Since the Jacobian is easy enough to specify analytically, I figured supplying it would help:

prob_jac = NonlinearProblem(NonlinearFunction(FF; jac = FF_jac, sparsity = TracerSparsityDetector()), y_in, p; 
                        abstol = 1e-10, reltol = 1e-10)
sol_jac = solve(prob_jac, NewtonRaphson(),maxiters=Int(1e4))

I now get a “stalled” retcode – even though the solutions are very close. That is, maximum(abs.(sol_sparsity_pattern.u - sol_jac.u)) returns a number on the order of 1e-9. Changing the tolerances of prob_jac never seems to fix this issue.
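For reference, this is roughly how I’m comparing the two runs and checking the residual at the “stalled” solution:

### Compare retcodes and solutions from the two solves:
@show sol_sparsity_pattern.retcode sol_jac.retcode
@show maximum(abs.(sol_sparsity_pattern.u .- sol_jac.u)) ### ~1e-9

### Evaluate the residual at the "stalled" solution directly:
res = similar(y_in)
FF(res, sol_jac.u, p)
@show maximum(abs.(res))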

My first thought was that the max(0,x) function might be causing problems. I’ve commented it out and replaced it with (x + sqrt(x^2 + epsilon))/2 (along with the corresponding derivatives in the Jacobian), but the results are unchanged. I just can’t get the version with the Jacobian to give me a solved retcode.

Any help would be appreciated!

What is the condition number of the Jacobian?

Thanks, Chris – an embarrassing realization: I had a typo in the Jacobian (despite combing over it many times), which made the condition number very large (~1e6).

Fixing that helped when ncell_ is low. However, the condition number grows as I increase the system size, and the solve with the Jacobian gets increasingly slow. At ncell_ = 5000 the condition number is again on the order of 1e6, and solving with just the sparsity detection is ~100x faster, even though it allocates more (benchmarks below).
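For reference, I’m estimating the condition number by materializing the Jacobian at the initial guess and densifying it (cond needs a dense matrix, so this is only practical at moderate sizes):

### Estimate the condition number of the analytic Jacobian at y_in:
J = zeros(4*ncell_, 4*ncell_)
FF_jac(J, y_in, p)
@show cond(J) ### ~1e6 at ncell_ = 5000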

I’ve gone through the tutorial on solving large ill-conditioned systems, but I’m running into issues with KLUFactorization (which has worked dozens of times for me before) and with setting up preconditioners. That tutorial doesn’t define the sparse Jacobian analytically, so it’s possible I’m doing something wrong.

using LinearAlgebra, SparseArrays, BenchmarkTools
using Optimization, NonlinearSolve, SparseConnectivityTracer
using CairoMakie, ColorSchemes

### Define parameters:
ν_ = 0.01
δ_ = 1e-3
κ0m = ν_^2*1.25
κ0h = ν_^2
s_ = 1.
A_ = 2.
tail_ = false
Tdef_ = -1.
dt = 0.0001
eps = 1e-4
u∞ = 0.
θb = -1.

ncell_ = 5000 # number of cells
nedge_ = ncell_
zmax = 0.5

### grid setup
dz = zmax/(ncell_-0.5)
zcell_ = dz*(0.5.+collect(0:(ncell_-1)))
dzcell_ = diff(zcell_)
zedge_ = [0.; (zcell_[1:end-1]+zcell_[2:end])./2.]
dzedge_ = diff(zedge_)

### Operators
∇ = sparse(diagm(0 => [0;ones(ncell_-1)./dzcell_], -1=>-ones(ncell_-1)./dzcell_));
div = spdiagm(0 => [-ones(nedge_-1)./dzedge_;0], 1=>ones(nedge_-1)./dzedge_);

### Initial conditions:
uprev = zeros(ncell_) ### Very simple ICs
θprev = [-1; zeros(ncell_-1)] ### Very simple ICs

u_z = ∇*uprev
θ_z = ∇*θprev
Ri = θ_z ./ (u_z.^2 .+ eps.^2)
f = max.(0.,1. .- Ri./ν_).^2

qprev = - ((zedge_.+δ_).^2 .* abs.(u_z) .* f .+ κ0h).*θ_z/ν_
τprev = - ((zedge_.+δ_).^2 .* abs.(u_z) .* f .+ κ0m).*u_z

y_in = [uprev;θprev;τprev;qprev]

p = ν_, δ_, κ0m, κ0h, s_, θb, u∞, eps, dt, ncell_, nedge_, 
dz, zcell_, dzcell_, zedge_, dzedge_, ∇, div,uprev,θprev


function FF(dy,y_in,p)
    ν_, δ_, κ0m, κ0h, s_, θb, u∞, eps, dt, ncell_, nedge_, 
    dz, zcell_, dzcell_, zedge_, dzedge_, ∇, div,uprev,θprev = p
#     ncell_, nedge_ = Int(ncell_), Int(nedge_)

    u = y_in[1:ncell_]
    θ = y_in[ncell_.+(1:ncell_)]
    τ = y_in[2*ncell_.+(1:ncell_)]
    q = y_in[3*ncell_.+(1:ncell_)]
    
    ### Eq1
    A = (u.-uprev).*ν_./dt .- div*τ .+ θ.*s_
    A[ncell_] = u[ncell_] - u∞

    ### Eq2
    B = (θ.-θprev)./dt .+ div*q
    B[ncell_] = θ[ncell_]
    
    ### Intermediate vars
    u_z = ∇*u
    θ_z = ∇*θ
    Ri = θ_z ./ (u_z.^2 .+ eps.^2)
    ### Max not smoothed
#     f = max.(0.,1. .-Ri./ν_).^2 
    ### Max smoothed
    ε_smooth = 1e-8
    x = 1 .- Ri./ν_
    g = (x .+ sqrt.(x.^2 .+ ε_smooth)) ./ 2.
    f = g.^2
    
    Km = (zedge_.+δ_).^2 .* abs.(u_z) .* f .+ κ0m
    Kh = (zedge_.+δ_).^2 .* abs.(u_z) .* f .+ κ0h
    
    
    ### Eq3
    C = τ .- Km.*u_z
    C[1] = u[1]
    
    ### Eq4
    D = ν_.*q .+ Kh.*θ_z
    D[1] = θ[1] - θb
    
    
    dy .= [A;B;C;D]
#     return [A;B;C;D]
end


function FF_jac(J_out,y_in,p)
    ν_, δ_, κ0m, κ0h, s_, θb, u∞, eps, dt, ncell_, nedge_, 
    dz, zcell_, dzcell_, zedge_, dzedge_, ∇, div,uprev,θprev = p
#     ncell_, nedge_ = Int(ncell_), Int(nedge_)

    u = y_in[1:ncell_]
    θ = y_in[ncell_.+(1:ncell_)]
    τ = y_in[2*ncell_.+(1:ncell_)]
    q = y_in[3*ncell_.+(1:ncell_)]
    
    
    ### Eq1
#     A = (u.-uprev).*ν_./dt .- div*τ .+ θ.*s_
#     A[ncell_] = u[ncell_] - u∞
    dA_du = spdiagm(0=>ones(ncell_) * ν_ / dt)
    dA_du[ncell_,:] .= 0. ### Holdover, probably not needed
    dA_du[ncell_,ncell_] = 1.
    dA_dθ = spdiagm(0=>ones(ncell_) * s_)
    dA_dθ[ncell_,:] .= 0. ### Can replace with line below
#     dA_dθ[ncell_,ncell_] = 0.
    dA_dτ = -div
    dA_dτ[ncell_,:] .= 0.
    dA_dq = spzeros(ncell_,ncell_)

    
    ### Eq2
#     B = (θ.-θprev)./dt .+ div*q
#     B[ncell_] = θ[ncell_]
    dB_du = spzeros(ncell_,ncell_)
    dB_dθ = spdiagm(0=>ones(ncell_) / dt)
    dB_dθ[ncell_,:] .= 0. ### Can replace with line below
    dB_dθ[ncell_, ncell_] = 1.
    dB_dτ = spzeros(ncell_,ncell_)
    dB_dq = copy(div) ### copy so zeroing the row below doesn't mutate the shared div operator
    dB_dq[ncell_,:] .= 0.
    
    
    ### Intermediate variables
    u_z = ∇*u
    θ_z = ∇*θ
    Ri = θ_z ./ (u_z.^2 .+ eps.^2)
    dRi_dθ = spdiagm(0=>1. ./(u_z.^2 .+ eps.^2)) * ∇
#     dRi_du = spdiagm(0=>-2. * θ_z ./(u_z.^2 .+ eps.^2)) * ∇ ### old
    dRi_du = spdiagm(0 => -2. * u_z .* θ_z ./ ((u_z.^2 .+ eps.^2).^2)) * ∇
    
    ### Max not smoothed
#     f = max.(0.,1. .-Ri./ν_).^2 
#     df_dRi = spdiagm(0 => (-2/ν_) .* max.(0, 1 .- Ri./ν_))
    ### Max smoothed
    ε_smooth = 1e-8
    x = 1 .- Ri./ν_
    g = (x .+ sqrt.(x.^2 .+ ε_smooth)) ./ 2.
    f = g.^2
    dg_dRi = - (1 ./(2*ν_)) .* (1 .+ x ./ sqrt.(x.^2 .+ ε_smooth))
    df_dRi = spdiagm(0 => 2 .* g .* dg_dRi)
    
    
    df_du = df_dRi * dRi_du
    df_dθ = df_dRi * dRi_dθ
    
    Km = (zedge_.+δ_).^2 .* abs.(u_z) .* f .+ κ0m
    dKm_du = spdiagm(0=>(zedge_.+δ_).^2 .* sign.(u_z) .* f)*∇ + spdiagm(0=>(zedge_.+δ_).^2 .* abs.(u_z))*df_du
    dKm_dθ = spdiagm(0=>(zedge_.+δ_).^2 .* abs.(u_z)) * df_dθ
    Kh = (zedge_.+δ_).^2 .* abs.(u_z) .* f .+ κ0h
    dKh_du = dKm_du
    dKh_dθ = dKm_dθ
    
    
    ### Eq3
#     C = τ .- Km.*u_z
#     C[1] = u[1]
    dC_du = spdiagm(0=>-Km) * ∇ + spdiagm(0=>-u_z) * dKm_du
    dC_du[1,:] .= 0. ### Don't think this is needed
    dC_du[1,1] = 1.
    dC_dθ = spdiagm(0=>-u_z) * dKm_dθ
    dC_dθ[1,:] .= 0.
    dC_dτ = spdiagm(0=>ones(nedge_))
    dC_dτ[1,:] .= 0. 
    dC_dq = spzeros(nedge_,nedge_)
    
    
    
    ### Eq4
#     D = ν_.*q .+ Kh.*θ_z
#     D[1] = θ[1] - θb
    dD_du = spdiagm(0=>θ_z) * dKh_du
    dD_du[1,:] .= 0.
    dD_dθ = spdiagm(0=>Kh) * ∇ + spdiagm(0=>θ_z) * dKh_dθ
    dD_dθ[1,:] .= 0. ### Don't think this is needed
    dD_dθ[1,1] = 1.
    dD_dτ = spzeros(nedge_,nedge_)
    dD_dq = spdiagm(0=>ν_*ones(nedge_))
    dD_dq[1,:] .= 0.


    J_out .= [dA_du  dA_dθ  dA_dτ  dA_dq;
           dB_du  dB_dθ  dB_dτ  dB_dq;
           dC_du  dC_dθ  dC_dτ  dC_dq;
           dD_du  dD_dθ  dD_dτ  dD_dq]
end

prob_sparsity_pattern = NonlinearProblem(NonlinearFunction(FF; sparsity = TracerSparsityDetector()), y_in, p; 
                        abstol = 1e-10, reltol = 1e-10)
sol_sparsity_pattern = solve(prob_sparsity_pattern, NewtonRaphson(),maxiters=Int(1e4))

prob_jac = NonlinearProblem(NonlinearFunction(FF; jac = FF_jac), y_in, p; 
                        abstol = 1e-10, reltol = 1e-10)
sol_jac = solve(prob_jac, NewtonRaphson(concrete_jac = true),maxiters=Int(1e4))

Timing these two solves,

@btime solve($prob_sparsity_pattern, NewtonRaphson(),maxiters=Int(1e4))
> 351.875 ms (431071 allocations: 3.14 GiB)

@btime solve($prob_jac, NewtonRaphson(concrete_jac = true),maxiters=Int(1e4))
> 54.501 s (1442 allocations: 5.98 GiB)

I know my code isn’t allocation-free at this stage, but I’m surprised to see so few (though very large) allocations when solving with the Jacobian.
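One back-of-the-envelope check on that 5.98 GiB: if the solver is materializing the Jacobian densely, two 20000×20000 Float64 buffers (say, J plus its factorization) land almost exactly on the reported number:

n = 4*ncell_                     ### 20000 unknowns at ncell_ = 5000
n^2 * sizeof(Float64) / 2^30     ### ≈ 2.98 GiB per dense Jacobian-sized buffer
2 * n^2 * sizeof(Float64) / 2^30 ### ≈ 5.96 GiB for two of them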

I also tried KLUFactorization, but get an error. This surprises me as I’m pretty certain everything in my Jacobian is sparse (and it works fine on prob_sparsity_pattern, so it’s definitely related to the Jacobian).

solve(prob_jac, NewtonRaphson(linsolve = KLUFactorization(); concrete_jac = true),maxiters=Int(1e4))
> MethodError: no method matching nonzeros(::Matrix{Float64})
The function `nonzeros` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  nonzeros(::SparseArrays.FixedSparseVector)
   @ SparseArrays /Applications/Julia-1.11.app/Contents/Resources/julia/share/julia/stdlib/v1.11/SparseArrays/src/sparsevector.jl:134

When I try running with AlgebraicMultigrid from the tutorial, I get a similar MethodError,

MethodError: no method matching ruge_stuben(::Matrix{Float64})

so I think there’s an issue with something in my Jacobian being non-sparse… There are a few intermediate vectors I build in the Jacobian that are dense (e.g. u_z, Ri, etc.), but those will generally have nonzero elements everywhere anyway.
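If the dense thing is actually the J_out buffer the solver preallocates (rather than my blocks), then maybe passing a sparse jac_prototype keeps everything sparse. A sketch of what I’d try next – not yet verified:

### Untested sketch: give the solver a sparse prototype with the right pattern,
### so J_out is a SparseMatrixCSC instead of a dense Matrix.
J_proto = spzeros(4*ncell_, 4*ncell_)
FF_jac(J_proto, y_in, p) ### fill once to establish the sparsity pattern
prob_jac_sparse = NonlinearProblem(
    NonlinearFunction(FF; jac = FF_jac, jac_prototype = J_proto), y_in, p;
    abstol = 1e-10, reltol = 1e-10)
solve(prob_jac_sparse, NewtonRaphson(linsolve = KLUFactorization()), maxiters = Int(1e4))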

(edited to add the line to define prob_jac)

Are all of the elements the same between the two ways to calculate the Jacobian on a random point?

To make sure I understand you correctly – you mean comparing my analytic Jacobian to an AD Jacobian?

If so: comparing maximum(abs.(jac_AD - jac_analytic)) for a random y_in gives me a difference of about 1e-7. If I set f = 1, the difference vanishes, so it’s almost certainly coming from the evaluation of the max function (and the smoothed and non-smoothed versions give similar errors).
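The comparison itself looks roughly like this, densifying both Jacobians at a random point (ForwardDiff is just my choice for the AD side here):

### Sketch of the AD-vs-analytic Jacobian comparison (dense, so small ncell_ only):
using ForwardDiff
y_rand = randn(4*ncell_)
FF_oop(y) = (dy = similar(y); FF(dy, y, p); dy) ### out-of-place wrapper for ForwardDiff
jac_AD = ForwardDiff.jacobian(FF_oop, y_rand)
jac_analytic = zeros(4*ncell_, 4*ncell_)
FF_jac(jac_analytic, y_rand, p)
@show maximum(abs.(jac_AD .- jac_analytic))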

However, the MethodError from above persists whether I use max as-is or the smoothed version (both are included in the code above). I also tried smoothing the max with log1p(exp(x/epsilon))*epsilon, but x/epsilon can be very large and positive, so that overflows and didn’t work as well.
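For completeness, the overflow in that softplus form can be avoided with the standard stable rewrite (I haven’t re-run the solve with it):

### Numerically stable softplus smoothing of max(0, x):
### ε*log1p(exp(x/ε)) == max(x, 0) + ε*log1p(exp(-abs(x)/ε)), which never overflows.
smoothmax0(x, ε) = max(x, zero(x)) + ε * log1p(exp(-abs(x) / ε))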

Edit: I just tried some other random y_in vectors and the difference gets up to 1e-3 – still stemming from the max function.