DifferentiationInterface+ForwardDiff unable to diff ldiv! (any other options to solve linear system w/o allocations and differentiate?)

r0uv3n · July 26, 2025, 1:14pm

I have a function where both a matrix g as well as a vector q depend on some vector input p. On the way I need to solve for g^{-1} q. I want to use DifferentiationInterface+AutoForwardDiff to take derivatives of this function and also avoid any allocations on the way. The dimension of p and q won’t be ever be bigger than 10 or so. Consider e.g. the following MWE
(ignore the specific computations, just wanted to have a nontrivial example)

using LinearAlgebra, DifferentiationInterface, StaticArrays
import ForwardDiff
function want_to_diff!(q::MVector{n, T}, p::StaticVector{n,T}, g_buffer::MMatrix{n,n,T}) where {n<:Any, T<:Real}
    fill!(g_buffer, zero(T))
    for i in 1:n
        g_buffer[i, i] = p[i]
        g_buffer[i, 1] = p[i]
        g_buffer[1, i] = p[i]
    end
    ldiv!(q, g_buffer, p)
    return nothing
end

g_buffer = zero(MMatrix{3,3,Float32})
p = @MVector [1f0, 2f0, 3f0]
q = MVector{3, Float32}(undef)

jac = MMatrix{3,3,Float32}(undef)
jacobian!(want_to_diff!, q, jac, AutoForwardDiff(), p, Cache(g_buffer))

This raises no method matching ldiv!(::MMatrix{3, 3, ForwardDiff.Dual{…}, 9}, ::MVector{3, ForwardDiff.Dual{…}}).

I attempted manually specifying the factorization object by using FastLapackInterface’s BunchKaufmanWs, but even when manuallly trying to define DifferentiationInterface’s recursive_similar for these workspaces I got similar errors. If I understand Solve linear system repeatedly without allocation - #4 by PeterSimon correctly, then this is expected since FastLapackInterface does not apparently work with FastForward.

If a non-allocating inv! function existed, I would gladly use it, even though factorization would of course be preferable. But since I deal with low-dimensional vectors and matrices, I assume this wouldn’t be much of a problem.

Do I need to use PreallocationTools or something like that? Or is there a better way to accomplish these kind of calculations without allocations and in DifferentiationInterface-friendly manner? Am I just missing something small to get ldiv! to works here?

Edit: This does not seem to be specific to ForwardDiff. I get the same error using AutoEnzyme(; mode=Enzyme.Forward).

yolhan_mannes · July 26, 2025, 6:40pm

Don’t have my pc right now but if it is a method definition problem you can try LinearSolve.jl which may implement derivatives in some cases or at least have generic methods in case method isn’t implemented. Also fully inplace case should be implemented too

r0uv3n · July 26, 2025, 7:49pm

Yeah I can see fully inplace is doable with linsolve = LinearSolve.init(...). But I get errors when trying to form Cache(linsolve). Do I just need to define this manually for LinearSolve.LinearCache? Or can I somehow use PreallocationTools? Manually just defining recursive_similar for the LinearCache struct (probably by just calling it on every field and reconstructing) seems inelegant / needlessly complicated, is there a nicer way of dealing with this?

Edit: What I tried was

using LinearAlgebra, DifferentiationInterface, StaticArrays, LinearSolve
using RecursiveFactorization
import ForwardDiff
function want_to_diff!(q::MVector{n, T}, p::StaticVector{n,T}, g_buffer::MMatrix{n,n,T}, linsolve::LinearSolve.LinearCache) where {n<:Any, T<:Real}
    fill!(g_buffer, zero(T))
    for i in 1:n
        g_buffer[i, i] = p[i]
        g_buffer[i, 1] = p[i]
        g_buffer[1, i] = p[i]
    end
    linsolve.A = Symmetric(g_buffer)
    linsolve.b = p
    solve!(linsolve)
    copy!(q, linsolve.u)
    return nothing
end

function DifferentiationInterface.recursive_similar(
    linsolve::LinearSolve.LinearCache, t
) 
    return LinearSolve.LinearCache
end

g_buffer = zero(MMatrix{3,3,Float32})
p = @MVector [1f0, 2f0, 3f0]
q = MVector{3, Float32}(undef)

prob = LinearProblem(Symmetric(g_buffer), p)
linsolve = LinearSolve.init(prob, LinearSolve.BunchKaufmanFactorization())

jac = MMatrix{3,3,Float32}(undef)
jacobian!(want_to_diff!, q, jac, AutoForwardDiff(), p, Cache(g_buffer), Cache(linsolve))

Edit2: Ah wait, I must have made a mistake somewhere, here I already get an error for
want_to_solve!(q, p, g_buffer, linsolve), namely TypeError: in setfield!, expected Nothing, got a value of type BunchKaufman{Float32, MMatrix{3, 3, Float32, 9}, Vector{Int64}}. Will have to figure this out first probably.

r0uv3n · July 26, 2025, 8:22pm

Simplifying somewhat, I have

using LinearAlgebra, DifferentiationInterface, StaticArrays, LinearSolve
using RecursiveFactorization
import ForwardDiff
function want_to_diff!(q::MVector{n, T}, p::StaticVector{n,T}, g_buffer::MMatrix{n,n,T}, linsolve) where {n<:Any, T<:Real}
    fill!(g_buffer, zero(T))
    for i in 1:n
        g_buffer[i, i] = p[i]
        g_buffer[i, 1] = p[i]
        g_buffer[1, i] = p[i]
    end
    linsolve.A = g_buffer
    linsolve.b = p
    solve!(linsolve)
    copy!(q, linsolve.u)
    return nothing
end

g_buffer = zero(MMatrix{3,3,Float32})
p = @MVector [1f0, 2f0, 3f0]
q = MVector{3, Float32}(undef)

prob = LinearProblem(g_buffer, p)
linsolve = LinearSolve.init(prob)

jac = MMatrix{3,3,Float32}(undef)
jacobian!(want_to_diff!, q, jac, AutoForwardDiff(), p, Cache(g_buffer), Cache(linsolve))

With this setup, want_to_diff!(q, p, g_buffer, linsolve) works just fine, but jacobian!(want_to_diff!, q, jac, AutoForwardDiff(), p, Cache(g_buffer), Cache(linsolve)) raises

ERROR: setfield! fields of Types should not be changed

yolhan_mannes · July 26, 2025, 8:24pm

Yeah it really doesn’t want you to do it, I tried 100 things too, also funy thing this is false :

function want_to_diff!(q::MVector{n, T}, p::StaticVector{n,T}, g_buffer::MMatrix{n,n,T}) where {n<:Any, T<:Real}
    fill!(g_buffer, zero(T))
    for i in 1:n
        g_buffer[i, i] = p[i]
        g_buffer[i, 1] = p[i]
        g_buffer[1, i] = p[i]
    end
    q .= (g_buffer \ p)
    return q
end

g_buffer = zero(MMatrix{3,3,Float32})
p = @MVector [1f0, 2f0, 3f0]
q = MVector{3, Float32}(undef)
want_to_diff!(q, p, g_buffer)
grad = MMatrix{3,3,Float32}(undef)
jacobian!(want_to_diff!, q, grad, AutoForwardDiff(), p, Cache(g_buffer))

gives

3×3 MMatrix{3, 3, Float32, 9} with indices SOneTo(3)×SOneTo(3):
  0.0  0.0  0.0
 -0.0  0.0  0.0
 -0.0  0.0  0.0

btw your “grad” should be a 3x3 matrix because you want a jacobian

r0uv3n · July 26, 2025, 8:28pm

btw your “grad” should be a 3x3 matrix because you want a jacobian

Ah, of course, changed the setup in the process of writing the original post, I’ll fix it in the above code snippets

yolhan_mannes · July 26, 2025, 8:37pm

found the original issue this is how we use ldiv :

using LinearAlgebra, DifferentiationInterface, StaticArrays
import ForwardDiff
function want_to_diff!(q::MVector{n, T}, p::StaticVector{n,T}, g_buffer::MMatrix{n,n,T}) where {n<:Any, T<:Real}
    fill!(g_buffer, zero(T))
    for i in 1:n
        g_buffer[i, i] = p[i]
        g_buffer[i, 1] = p[i]
        g_buffer[1, i] = p[i]
    end
    ldiv!(q,lu!(g_buffer) ,p)
    return q
end

g_buffer = zero(MMatrix{3,3,Float32})
p = @MVector [1f0, 2f0, 3f0]
q = MVector{3, Float32}(undef)


want_to_diff!(q, p, g_buffer)
grad = MMatrix{3,3,Float32}(undef)
jacobian!(want_to_diff!, q, grad, AutoForwardDiff(), p, Cache(g_buffer))

still weird about LinearSolve.jl though but that should be another thread maybe

r0uv3n · July 26, 2025, 8:38pm

Oh, I just noticed, that might be because I accidentally made a function that always evaluates to [1,0,0].

With a fixed

function want_to_diff!(q::MVector{n, T}, p::StaticVector{n,T}, g_buffer::MMatrix{n,n,T}) where {n<:Any, T<:Real}
    fill!(g_buffer, zero(T))
    for i in 1:n
        g_buffer[i, i] = p[i]
        g_buffer[i, 1] += sin(p[i])
        g_buffer[1, i] += sin(p[i])
    end
    q .= (g_buffer \ p)
    return nothing
end

the jacobian in your example returns something nonzero.

yolhan_mannes · July 26, 2025, 8:39pm

lol true, yeah f32 precision around 1e-8

julia> eps(Float32)
1.1920929f-7

Note that if the matrix what a SMatrix your life would be so much easier :') Maybe you can rethink your code a little to make that possible ?

r0uv3n · July 26, 2025, 8:45pm

Sadly, this allocates, but plain q .= (g_buffer \ p) seems to work fine

r0uv3n · July 26, 2025, 8:48pm

Wait, why would SMatrix over MMatrix for g make my life easier here?

r0uv3n · July 26, 2025, 9:00pm

Ah damn, that’s only true up to dimension 3. Getting allocations starting with n=4. Is there an alternative factorization instead of lu! that can work without allocations (and still works with DifferentiationInterface etc.)?

yolhan_mannes · July 26, 2025, 9:02pm

like this maybe I may have done something bad

using LinearAlgebra, DifferentiationInterface, StaticArrays, LinearSolve
import ForwardDiff

function want_to_diff( p::SVector{n,T}) where {n<:Any, T<:Real}
    g_buffer = SMatrix{n,n,T}(
        ntuple( n*n ) do I
            j,i = divrem(I-1, n) .+ 1
            (i==j || j==1 || i==1) ?  p[i] : zero(T)
        end
        )
    q = g_buffer \ p
    return q
end
p = SA[1f0, 2f0, 3f0]
want_to_diff(p)
using BenchmarkTools
prep = prepare_jacobian(want_to_diff, AutoForwardDiff(), p)
@btime jacobian(want_to_diff,$prep, AutoForwardDiff(), $p)

leads to 0 alloc and 22ns only I agree its quite hard to read but it works, you could make an helper function for this crazyness btw

SMatrix{n,n,T}(
        ntuple( n*n ) do I
            j,i = divrem(I-1, n) .+ 1
            (i==j || j==1 || i==1) ?  p[i] : zero(T)
        end
        )

Static Vectors ect leads to allocs after around 16 elements which explains it

r0uv3n · July 26, 2025, 9:13pm

Ah alright then, I think I originally thought I had to stop using SMatrices and SVectors because Enzyme kept complaining whenever I tried to use it, but that may have been due to stuff like type instability and allocations etc., maybe I’ll try to switch back to SMatrices.

Also I thought I couldn’t use prepare_jacobian because it wouldn’t be thread-safe, but I only just now realized I could just allocate one for each thread and be fine.

Ah so does that mean such a static approach would not work if I want to go up to n=9 or something? I won’t go much higher than that, but that’s presumably out of this allocation free zone for static arrays? From some quick tests it seems that already for n=4 you’re building of g_buffer is allocating.

yolhan_mannes · July 26, 2025, 9:14pm

yeah remember to use the new OncePerThread and not theadid

yolhan_mannes · July 26, 2025, 9:17pm

no you would get a 9*9=81 element SMatrix thats far too big for the compiler to handle

r0uv3n · July 26, 2025, 9:20pm

So then staying with MMatrix is best for that use case? Also do you have any recommendation for how to avoid the allocations that creep in at higher dimensions for lu! or g_buffer \ p? That’s why I initially went to FastLapackInterface, but that seemed to cause problems with DifferentiationInterface

yolhan_mannes · July 26, 2025, 9:32pm

not really there is probably no way actually beside iterative solvers maybe ? maybe you can make your own alg the textbook one is really great for so small matricies something like mylu!(c,A,b,cache) also there may be blas function allowing to give buffer but not sure at all, this is all because you want to update the matrix if you only wanted to change the rhs it would be ok but here its quite hard

r0uv3n · July 26, 2025, 11:55pm

You mean because g depends on p and I want to assign it in a non-allocating manner? The background for me is that p is a vector on a somewhat low-dimensional manifold M, g will be a more or less user defined (but still autodifferentiable) metric and then I’ll need to act with g^{-1} on a gradient (on a relatively arbitrary, but autodiffable function u on M) at p. I have a CPU-only ModelingToolkit+NeuralPDE (or rather my own updated version of the Sophon fork of NeuralPDE) solution using Manifolds.jl for the metric stuff, but found the performance not sufficient and after fighting for a few weeks did not manage to get the entire thing running on a GPU. So now I’m trying to manually roll most of the solution stack, and part of that is for example something so simple as finding an efficient way to compute the Christoffel symbols for a metric in a nonallocating manner and still being able to autodiff them (e.g. for curvature tensors).

I only recount all this extra info because I feel that my problem setup pretty much leaves me with only one path forward, and I am unsure whether I made a basic error somewhere on the way by not considering a more out-of-the-box solution.

I’ll think before trying to handroll mylu! or anything I’ll try to continue looking for wayt to get LinearSolver.jl and DifferentiationInterface.jl to interop nicely.

jClugstor · July 27, 2025, 1:37am

~~What version of LinearSolve? Recent versions of LinearSolve actually use a DualLinearCache if A or b contain Dual numbers.~~ Actually that’s probably not relevant, since you’re passing the linsolve directly.

Topic		Replies	Views
Solving linear system without allocating Numerics	19	494	August 9, 2025
Solve linear system repeatedly without allocation General Usage question , linearalgebra , memory-allocation	11	1213	July 9, 2022
My function works on it's own but not in NonLinearSolve General Usage forwarddiff , nonlinearsolve	10	491	April 22, 2024
ForwardDiff with Matrix of Vector as input argument Specific Domains question	37	230	October 17, 2024
Using PreallocationTools with LinearSolve Modelling & Simulations	3	291	August 12, 2024

DifferentiationInterface+ForwardDiff unable to diff ldiv! (any other options to solve linear system w/o allocations and differentiate?)

Related topics