DifferentiationInterface+ForwardDiff unable to diff ldiv! (any other options to solve linear system w/o allocations and differentiate?)

The GenericLUFactorization is LinearAlgebra's generic lu!, which does not have pivoting. It's only for small cases. I'd really recommend just using RecursiveFactorization.jl (i.e. the RFLUFactorization algorithm) if you need a generic algorithm.
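For reference, a minimal sketch of what that looks like (assuming RecursiveFactorization.jl is installed; loading it is what makes the RFLUFactorization algorithm usable):

using LinearSolve, RecursiveFactorization

# RFLUFactorization uses RecursiveFactorization.jl's pure-Julia LU
A = rand(4, 4)
b = rand(4)
sol = solve(LinearProblem(A, b), RFLUFactorization())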

Though as Jadon mentioned, the latest versions of LinearSolve have overloads for ForwardDiff, Zygote, and Enzyme reverse mode, so those won't differentiate through the solver internals. So if it's dual numbers, you can even use MKL/CUDA just fine.
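For example, something like this should work (untested sketch on a toy diagonal problem, assuming a recent LinearSolve with ForwardDiff loaded):

using LinearSolve, LinearAlgebra
import ForwardDiff

# Differentiate straight through solve: the extension handles the
# Dual-valued problem rather than differentiating the factorization.
g(p) = solve(LinearProblem(diagm(sin.(p)), p)).u
ForwardDiff.jacobian(g, rand(3))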

But static arrays hit a specific static-arrays-only dispatch in LinearSolve (LinearSolve.jl/src/common.jl at main · SciML/LinearSolve.jl · GitHub), because static arrays have special overloads that are non-allocating and don't allow mutation. As for the ForwardDiff overloads (LinearSolve.jl/ext/LinearSolveForwardDiffExt.jl at main · SciML/LinearSolve.jl · GitHub), I don't think we tested whether those will be hit in the context of static arrays, and my guess is that they might not. We should make sure to test and cover that case.


If they’re passing a LinearCache to solve!, even if A or b are Dual, it won’t go through the overloads. The overloads are only hit if a DualLinearCache is passed to solve!.

StaticArrays should work: the primal problem that actually goes through the solver will be a StaticLinearProblem. But yes, it's untested, so I'll add some tests.

I'm not an expert on the Cache API for DifferentiationInterface, but I think Cache(::LinearCache) might not be valid. See the warning in the docs here: API · DifferentiationInterface.jl.

" Some backends require any Cache context to be an AbstractArray , others accept nested (named) tuples of AbstractArray s."


Note that I am not currently using immutable static arrays (but instead MVector and co.), so from what I can see I should not hit those methods.

I have tried RFLUFactorization, but as with other LU-based methods I have not found a way to pass the LinearSolve.init cache to DifferentiationInterface.jacobian!. In particular, even if I manually (though in the most naive way) define recursive_similar via type piracy for the involved structs (very much going against the warning from the docs that JClugstor mentioned), I still get errors of the form

MethodError: no method matching lu!(::MMatrix{3, 3, ForwardDiff.Dual{…}, 9}, ::Vector{ForwardDiff.Dual{…}}, ::Val{true}, ::Val{true}; check::Bool)

The DualLinearCache (and in general the machinery defined in LinearSolveForwardDiffExt.jl) you both mentioned seems interesting. Is this something I might want to wrangle manually (using something like Base.get_extension(LinearSolve, :LinearSolveForwardDiffExt).DualLinearCache)? I assume all of this would require me to abandon DifferentiationInterface anyway.

Maybe ImplicitDifferentiation.jl (GitHub - JuliaDecisionFocusedLearning/ImplicitDifferentiation.jl: Automatic differentiation of implicit functions)?

The use of DI.Cache is semantically incorrect here because SciML doesn’t have the same concept of cache. For DI, a cache is a purely storage-oriented space, where the initial values inside don’t matter. If your SciML or factorization cache is not just an erasable scratch space, at the moment my suggestion would be to use DI.ConstantOrCache as a context wrapper and preallocate the content with PreallocationTools.jl.
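To illustrate the semantics: with DI, a Cache is meant to be used like the scratch buffer in this toy example (not SciML-related), where the initial contents are irrelevant because every call fully overwrites it.

using DifferentiationInterface
import ForwardDiff

# tmp is pure scratch: fully overwritten on every call, so DI is free to
# replace it with a Dual-valued buffer of the same shape.
function h!(y, x, tmp)
    tmp .= x .^ 2
    y .= tmp .+ 1
    return nothing
end

x, y, tmp = rand(3), zeros(3), zeros(3)
jacobian(h!, y, AutoForwardDiff(), x, Cache(tmp))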

It is that though, so what part of the interface is violated? Is the interface documented?

The interface is documented here in the docstring API · DifferentiationInterface.jl and explained again in the tutorial Advanced features · DifferentiationInterface.jl.

I don't know what's inside the SciML caches, but I assumed some aspects cannot be safely overwritten with arbitrary values. For a simpler case, I took a Quick Look™ at FastLapackInterface.jl's Bunch-Kaufman workspace for instance, and it seems to contain a mixture of scratch space (which can safely be erased) and pivot indices (which should definitely not be erased): FastLapackInterface.jl/src/bunch_kaufman.jl at 606c30813ce70d50321763217834214757813ffe · DynareJulia/FastLapackInterface.jl · GitHub. That's when DI.ConstantOrCache with PreallocationTools.jl becomes necessary.
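In its simplest form that combination looks something like this (a toy sketch with a DiffCache; for a whole LinearCache struct you would need something like GeneralLazyBufferCache instead):

using DifferentiationInterface, PreallocationTools
import ForwardDiff

# The DiffCache is passed through untouched by DI (ConstantOrCache), and
# get_tmp hands back a buffer matching the (possibly Dual) eltype of x.
function g!(y, x, dc)
    tmp = get_tmp(dc, x)
    tmp .= sin.(x)
    y .= 2 .* tmp
    return nothing
end

x, y = rand(3), zeros(3)
dc = DiffCache(zeros(3))
jacobian(g!, y, AutoForwardDiff(), x, ConstantOrCache(dc))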


Just to clarify, I think an MWE of what we tried is:

using DifferentiationInterface, LinearSolve
import ForwardDiff

function f!(c, b, A, linsolve)
    A[1, 1] = b[1]              # A depends on the differentiation input b
    linsolve.A = A
    linsolve.b = b
    c .= solve!(linsolve).u
    return c
end

A = rand(3, 3)
b = rand(3)
c = rand(3)
prob = LinearProblem(A, b)
linsolve = init(prob)
jacobian(f!, c, AutoForwardDiff(), b, Cache(A), ConstantOrCache(linsolve))

Neither Constant nor Cache nor ConstantOrCache works here. And yes, we tried with MMatrix, but that's something else I think. Maybe adding an option to init to allow Dual? That may be type-unstable though.
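For what it's worth, a hedged sketch of what "allowing Dual in init" could look like, assuming the LinearSolveForwardDiffExt machinery mentioned above already intercepts problems whose A and b carry duals:

using LinearSolve
import ForwardDiff

# Build the problem from Dual-valued inputs so that init can return a cache
# whose A and b are dual carriers (presumably a DualLinearCache).
Adual = ForwardDiff.Dual.(rand(3, 3), 1.0)
bdual = ForwardDiff.Dual.(rand(3), 1.0)
linsolve = init(LinearProblem(Adual, bdual))
solve!(linsolve)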


I see. Yes, most SciML caches should be a DI.ConstantOrCache, as there are certain algorithm-specific constants that shouldn't be zeroed (I presume this has to do with make_zero! behavior).

Open an issue on LinearSolve and I can dig into it (though it's the post-JuliaCon rush, so it might be a bit delayed). But I think it's at least pretty clear that you're building the init with a non-dual value, so there's no way you're going to ForwardDiff through that: linsolve.A and linsolve.b aren't dual carriers. The easiest solution is likely to just use a PreallocationTools.LazyBufferCache. Hopefully that's a good enough hint to get it done; if not, open an issue and I should answer by the end of the week with something more detailed.


A quick question on the side, which may not be related: when trying relatively naive code like

using LinearAlgebra, DifferentiationInterface, StaticArrays, LinearSolve
using RecursiveFactorization
import ForwardDiff

function want_to_diff!(c::MVector{n,T}, b::StaticVector{n,T}, A::MMatrix{n,n,T}, linsolve) where {n, T<:Real}
    fill!(A, zero(T))
    for i in 1:n
        A[i, i] = sin(b[i])
    end
    linsolve.A = SMatrix(A)
    linsolve.b = SVector(b)
    c .= solve!(linsolve).u
    return c
end

n = 4
A = one(MMatrix{n,n,Float32})
b = rand(MVector{n,Float32})
c = MVector{n,Float32}(undef)

prob = LinearProblem(SMatrix(A), SVector(c))
linsolve = LinearSolve.init(prob, RFLUFactorization())
want_to_diff!(c, b, A, linsolve)

(I have switched to using static arrays for the LinearProblem, as I otherwise got errors of the form
TypeError: in setfield!, expected Tuple{StaticArrays.LU{LowerTriangular{Float32, SMatrix{4, 4, Float32, 16}}, UpperTriangular{Float32, SMatrix{4, 4, Float32, 16}}, SVector{4, Int64}}, Vector{Int64}}, got a value of type Tuple{LU{Float32, MMatrix{4, 4, Float32, 16}, Vector{Int64}}, Vector{Int64}})
Running the above still yields

ERROR: TypeError: in setfield!, expected Tuple{StaticArrays.LU{LowerTriangular{Float32, SMatrix{4, 4, Float32, 16}}, UpperTriangular{Float32, SMatrix{4, 4, Float32, 16}}, SVector{4, Int64}}, Vector{Int64}}, got a value of type Tuple{LU{Float32, SMatrix{4, 4, Float32, 16}, Vector{Int64}}, Vector{Int64}}

Am I using this interface wrong? Or is this something I should raise an issue about over on GitHub at SciML/LinearSolve.jl?

Trying this with the reduced MWE by yolhan_mannes, I have

using DifferentiationInterface, LinearSolve, PreallocationTools, LinearAlgebra
import ForwardDiff

function f!(c, b, linsolve_cache::GeneralLazyBufferCache)
    linsolve = linsolve_cache[b]    # lazily build/fetch a cache keyed on typeof(b)
    linsolve.A = LinearAlgebra.diagm(sin.(b))
    linsolve.b = b
    c .= solve!(linsolve).u
    return c
end

n = 5
b = rand(n)
c = rand(n)

function get_cache(p)
    A = diagm(p)
    prob = LinearProblem(A, p)
    return LinearSolve.init(prob)
end

jacobian(f!, c, AutoForwardDiff(), b, ConstantOrCache(GeneralLazyBufferCache(get_cache)))

gives an error

ERROR: MethodError: no method matching has_sys(::Nothing)
The function `has_sys` exists, but no method is defined for this combination of argument types.

I think the code in LinearSolveForwardDiffExt.jl assumes that a SymbolicLinearInterface is set? It seems that a bit of type piracy, defining SciMLBase.has_sys(::Nothing) = false, fixes the issue, but am I breaking any assumptions by doing this, or is this just a small oversight?

One further question still: The allocating version now works quite well, but trying to use StaticArrays like this

using LinearAlgebra, DifferentiationInterface, StaticArrays, LinearSolve, PreallocationTools, RecursiveFactorization
import ForwardDiff
import SciMLBase

function f!(c::StaticVector{n,T}, b::StaticVector{n,T}, linsolve_cache::GeneralLazyBufferCache) where {n, T<:Real}
    linsolve = linsolve_cache[b]
    linsolve.A = diagm(sin.(b))
    linsolve.b = b
    c .= solve!(linsolve).u
    return nothing
end

function get_cache(b::StaticVector{n,T}) where {n, T<:Real}
    prob = LinearProblem(one(MMatrix{n,n,T}), b)
    return LinearSolve.init(prob)
end

n = 4
b = rand(MVector{n,Float32})
c = MVector{n,Float32}(undef)

linsolve_cache = GeneralLazyBufferCache(get_cache)

jac = MMatrix{n,n,Float32}(undef)

SciMLBase.has_sys(::Nothing) = false    # type-piracy workaround from above

jacobian!(f!, c, jac, AutoForwardDiff(), b, ConstantOrCache(linsolve_cache))

results in

ERROR: TypeError: in setfield!, expected MVector{4, Float32}, got a value of type SizedVector{4, Float32, Vector{Float32}}
Stacktrace:
  [1] setproperty!(cache::LinearSolve.LinearCache{…}, name::Symbol, x::SizedVector{…})
    @ LinearSolve ~/.julia/packages/LinearSolve/wu92d/src/common.jl:101
  [2] #linearsolve_forwarddiff_solve#1
    @ ~/.julia/packages/LinearSolve/wu92d/ext/LinearSolveForwardDiffExt.jl:68 [inlined]
...

Is this some incompatibility between LinearSolveForwardDiffExt and StaticArrays?

Hmm, I don’t seem to get that error when I run it

jacobian(f!,c,AutoForwardDiff(),b,ConstantOrCache(GeneralLazyBufferCache(get_cache)))

5×5 Matrix{Float64}:
 0.121715  0.0        0.0        0.0        0.0
 0.0       0.0720122  0.0        0.0        0.0
 0.0       0.0        0.0431511  0.0        0.0
 0.0       0.0        0.0        0.0658451  0.0
 0.0       0.0        0.0        0.0        0.31498
(LinearSolveDiffIssue) pkg> st
Status `~/Documents/Work/dev/SensitivityIssues/LinearSolveDiffIssue/Project.toml`
  [a0c0ee7d] DifferentiationInterface v0.7.3
  [f6369f11] ForwardDiff v1.0.1
  [7ed4a6bd] LinearSolve v3.23.0
  [f2c3362d] RecursiveFactorization v0.2.23
  [90137ffa] StaticArrays v1.9.14

Could be, I'll look into it.


Ah, right, I was on DifferentiationInterface v0.6.54 for some reason; in a new temporary environment this works quite well.

Sadly, I still have not found a version that is entirely non-allocating in dimensions bigger than 3. Even something very simple like

using LinearSolve, BenchmarkTools, StaticArrays, LinearAlgebra

n = 4
A = rand(SMatrix{n,n,Float32})
b = rand(SVector{n,Float32})
linsolve = LinearSolve.init(LinearProblem(A, b))

function f!(linsolve::LinearSolve.LinearCache)
    linsolve.A = rand(SMatrix{n,n,Float32,n*n})   # note: n is a non-const global here
    linsolve.b = rand(SVector{n,Float32})
    solve!(linsolve)
    return linsolve.u
end

@btime f!($linsolve)

still shows allocations.

You may want to pass n as a type parameter instead of a global variable.
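For instance, a sketch of that suggestion using Val to carry n as a compile-time constant:

function f!(linsolve::LinearSolve.LinearCache, ::Val{n}) where {n}
    linsolve.A = rand(SMatrix{n,n,Float32,n*n})   # n is now known to the compiler
    linsolve.b = rand(SVector{n,Float32})
    solve!(linsolve)
    return linsolve.u
end

@btime f!($linsolve, Val(4))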


Sorry, I thought I had already tried that, but yeah, that works. In my original code I definitely do have n type-annotated, so I'll try to produce a new MWE.

Edit: Ah, alright, the trick was to fully specify the type of linsolve upon getting it from the GeneralLazyBufferCache, I think? I assume that without that, some type instability causes the allocations? Is this just due to me using GeneralLazyBufferCache instead of LazyBufferCache? I thought GeneralLazyBufferCache would fit better, since LazyBufferCache seems very much intended for arrays and the docs specifically use GeneralLazyBufferCache for structs, but is there a better way to avoid this type instability using LazyBufferCache?

I thought I could simplify the above by using function barriers, but with something simple such as

DualLinearCache = Base.get_extension(LinearSolve, :LinearSolveForwardDiffExt).DualLinearCache

function f!(c::MVector{n,T}, b::StaticVector{n,T}, linsolve::Union{DualLinearCache,LinearSolve.LinearCache}) where {n, T<:Real}
    linsolve.A = diagm(sin.(b))
    linsolve.b = b
    c .= solve!(linsolve).u
    return c
end

# function barrier: fetch the cache, then dispatch on its concrete type
function f!(c::MVector{n,S}, b::StaticVector{n,S}, linsolve_cache::GeneralLazyBufferCache) where {n, S<:Real}
    linsolve = linsolve_cache[b]
    return f!(c, b, linsolve)
end

I still get a single allocation when calling f! with a linsolve_cache. That same allocation disappears when I call linsolve = linsolve_cache[b] manually and then call f! with linsolve. Manually writing two different f! methods, one for S<:ForwardDiff.Dual and one for S some float, while fully specifying the entire type of linsolve (for example like

...
linsolve::Base.get_extension(LinearSolve,:LinearSolveForwardDiffExt).DualLinearCache{MatrixMutableVectorLinearCache{n,T,nn}, DataType, SMatrix{n, n, ForwardDiff.Partials{n, T}, 16}, MVector{n, ForwardDiff.Partials{n, T}}, MVector{n, ForwardDiff.Partials{n, T}}, SMatrix{n, n, S, 16}, MVector{n, S}, Vector{S}} = linsolve_cache[b]
...

for the dual case) eliminates the allocations, but that seems, as mentioned, somewhat inelegant.

Also, while trying to use StaticArrays everywhere, I noticed that prepare_jacobian (and probably jacobian itself as well, when called without preparation) has xdual (i.e. prep.config.duals[2]) as an MVector{n, ForwardDiff.Dual} even when the input for f! was specified as an SVector. In particular, f! gets called with an MVector, and I think (?) this causes some type instability and ultimately allocations. Even for a fully non-allocating f!, jacobian! still remains allocating, though maybe this could also be caused by the SizedVector vs. StaticVector issue in LinearSolveForwardDiffExt (see the quote below for context).

I am currently working around that with some type piracy on Base.setproperty!(cache::LinearSolve.LinearCache, name::Symbol, x::SizedVector{n,T}), manually converting to the right StaticVector in there, but this also seems very hacky.
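For concreteness, the workaround amounts to something like this (type piracy on LinearSolve internals, so very much use-at-your-own-risk; the MVector conversion target is my assumption based on the error above):

using LinearSolve, StaticArrays

# Convert the SizedVector produced by LinearSolveForwardDiffExt back into an
# MVector before it reaches the LinearCache's typed field (pirated method).
function Base.setproperty!(cache::LinearSolve.LinearCache, name::Symbol,
                           x::SizedVector{n,T}) where {n,T}
    return setproperty!(cache, name, MVector{n,T}(x))
end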