Sparse matrix error with ForwardDiff

Hi all. I have the following MWE: I get a StackOverflowError when using a sparse matrix in a function whose gradient I am computing with ForwardDiff. When using a dense matrix (i.e. changing W1s to W1d in the anonymous function), there is no error.

It seems that using dual numbers in the sparse matrix is somehow causing an error. Does anyone have a suggestion on how to get the gradient of a function where I will need to use sparse arrays internally?

using LinearAlgebra, SparseArrays, ForwardDiff

W1s = sparse([1,2,3,5,5],[2,1,2,4,5],ones(Int,5))
W1d = Matrix(W1s)

ForwardDiff.gradient(x -> begin

    W1a = LinearAlgebra.I(5) - W1s*prod(x)
    ladetW1a = logabsdet(W1a)[1]

    return ladetW1a
end, [0.1,0.2])

I pasted the error below (on Julia 1.9.4).

ERROR: StackOverflowError:
Stacktrace:
  [1] SparseMatrixCSC{ForwardDiff.Dual{ForwardDiff.Tag{var"#7#8", Float64}, Float64, 2}, Int64}(m::Int64, n::Int64, colptr::Vector{Int64}, rowval::Vector{Int64}, nzval::Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#7#8", Float64}, Float64, 2}})
    @ SparseArrays ~/.julia/juliaup/julia-1.9.4+0.x64.linux.gnu/share/julia/stdlib/v1.9/SparseArrays/src/sparsematrix.jl:26
  [2] SparseMatrixCSC(m::Int64, n::Int64, colptr::Vector{Int64}, rowval::Vector{Int64}, nzval::Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#7#8", Float64}, Float64, 2}})
    @ SparseArrays ~/.julia/juliaup/julia-1.9.4+0.x64.linux.gnu/share/julia/stdlib/v1.9/SparseArrays/src/sparsematrix.jl:44
  [3] float(S::SparseMatrixCSC{ForwardDiff.Dual{ForwardDiff.Tag{var"#7#8", Float64}, Float64, 2}, Int64})
    @ SparseArrays ~/.julia/juliaup/julia-1.9.4+0.x64.linux.gnu/share/julia/stdlib/v1.9/SparseArrays/src/sparsematrix.jl:935
  [4] lu(A::SparseMatrixCSC{ForwardDiff.Dual{ForwardDiff.Tag{var"#7#8", Float64}, Float64, 2}, Int64}; check::Bool) (repeats 21635 times)
    @ SparseArrays.UMFPACK ~/.julia/juliaup/julia-1.9.4+0.x64.linux.gnu/share/julia/stdlib/v1.9/SparseArrays/src/solvers/umfpack.jl:398
  [5] logabsdet(A::SparseMatrixCSC{ForwardDiff.Dual{ForwardDiff.Tag{var"#7#8", Float64}, Float64, 2}, Int64})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.9.4+0.x64.linux.gnu/share/julia/stdlib/v1.9/LinearAlgebra/src/generic.jl:1663
  [6] (::var"#7#8")(x::Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#7#8", Float64}, Float64, 2}})
    @ Main ./REPL[4]:4
  [7] vector_mode_dual_eval!
    @ ~/.julia/packages/ForwardDiff/PcZ48/src/apiutils.jl:24 [inlined]
  [8] vector_mode_gradient(f::var"#7#8", x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#7#8", Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#7#8", Float64}, Float64, 2}}})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/gradient.jl:89
  [9] gradient(f::Function, x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#7#8", Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#7#8", Float64}, Float64, 2}}}, ::Val{true})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/gradient.jl:19
 [10] gradient(f::Function, x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#7#8", Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#7#8", Float64}, Float64, 2}}})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/gradient.jl:17
 [11] gradient(f::Function, x::Vector{Float64})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/gradient.jl:17
 [12] top-level scope
    @ REPL[4]:1

It seems that the problem is specifically the use of the determinant with the sparse matrix; other operations on the same sparse expression differentiate fine:


julia> ForwardDiff.gradient(x -> begin
           W1a = x'*(LinearAlgebra.I(5) - W1s)*x
           return W1a
       end, [0.1,0.2,3,4,5])
5-element Vector{Float64}:
 -0.2
 -2.8
  5.8
  3.0
 -4.0

julia> ForwardDiff.gradient(x -> begin

           W1a = LinearAlgebra.I(5) - W1s*prod(x)
           ladetW1a = sum(W1a)
           return ladetW1a
       end, [0.1,0.2])
2-element Vector{Float64}:
 -1.0
 -0.5

Thanks for helping narrow down the problem. Yes, I now see that too:

ForwardDiff.gradient(x -> begin
    return logabsdet(W1s*prod(x))[1]    
end, [0.1,0.2])

Does anyone have any suggestions on workarounds?

The problem is that logabsdet calls an LU factorization, but LU factorization for sparse matrices in Julia uses UMFPACK from SuiteSparse, an external library that only supports the standard hardware floating-point types. It won’t work for Dual numbers, hence it won’t work with ForwardDiff.

(In particular, this line is calling float to convert the matrix to a floating-point type, but this hits a dispatch loop because the Dual numbers already are floating-point, and it overflows the stack. It really should be fixed to give a more comprehensible error message.)

In general, I find that automatic differentiation through sparse-matrix computations is almost always problematic, and one often has to write derivative rules out by hand. (See also this discussion.) Fortunately, it is quite straightforward to take the derivative of a log determinant, as explained in these notes and this lecture from our Matrix Calculus course at MIT.
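
For the small matrix in this MWE, a minimal workaround sketch (just an illustration of the idea) is to densify inside the function before the factorization, so that the generic dense LU, which does accept Dual numbers, is used:

using LinearAlgebra, SparseArrays, ForwardDiff

W1s = sparse([1,2,3,5,5], [2,1,2,4,5], ones(Int,5))

ForwardDiff.gradient(x -> begin
    # convert to a dense Matrix so logabsdet uses the generic dense LU, not UMFPACK
    W1a = Matrix(LinearAlgebra.I(5) - W1s*prod(x))
    return logabsdet(W1a)[1]
end, [0.1,0.2])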

3 Likes

Thanks for the detailed answer.

I tried using the fact that gradient(det(M(x))) = det(M(x)) * trace(inv(M(x)) * gradient(M(x))), but got the following error:

julia> gfx(M,x)=det(M(x))*trace(inv(M(x))*ForwardDiff.gradient(x->M(x),x))
gfx (generic function with 2 methods)

julia> W1s
5×5 SparseMatrixCSC{Int64, Int64} with 5 stored entries:
 ⋅  1  ⋅  ⋅  ⋅
 1  ⋅  ⋅  ⋅  ⋅
 ⋅  1  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  1  1

julia> gfx(x->LinearAlgebra.I(5) - W1s*prod(x), [0.1,0.2]) 
ERROR: The inverse of a sparse matrix can often be dense and can cause the computer to run out of memory. If you are sure you have enough memory, please either convert your matrix to a dense matrix, e.g. by calling `Matrix` or if `A` can be factorized, use `\` on the dense identity matrix, e.g. `A \ Matrix{eltype(A)}(I, size(A)...)` restrictions of `\` on sparse lhs applies. Altenatively, `A\b` is generally preferable to `inv(A)*b`

So the problem seems to ultimately lie in calculating the inverse of the sparse matrix.

Just as a proof of concept, redefining a special det for sparse matrices using the Laplace expansion (along the first column) gives something workable:

# Laplace (cofactor) expansion along the first column, visiting only the stored nonzeros
mydet(M) = size(M,1) > 1 ? sum((-1)^(i+1)*v*mydet(M[filter(!=(i),axes(M,1)),axes(M,2)[2:end]]) for (i,v) in zip(rowvals(M[:,1]),nonzeros(M[:,1])); init = 0) : M[1,1]
mylogabsdet(M) = log(abs(mydet(M)))

and now:

julia> ForwardDiff.gradient(x -> begin

           W1a = LinearAlgebra.I(5) - W1s*prod(x)
           ladetW1a = mylogabsdet(W1a)[1]

           return ladetW1a
       end, [0.1,0.2])
2-element Vector{Float64}:
 -0.21208483393357344
 -0.10604241696678672

The actual implementation of mydet needs to be optimized. But doing a column expansion on CSC sparse matrices can theoretically be made fast using clever ‘views’ of sparse matrices.

The “gradient of a matrix” is not a thing, as I understand the gradient; this is not how the chain rule works with matrices. (ForwardDiff agrees with me: it throws “DimensionMismatch: gradient(f, x) expects that f(x) is a real number.” if you try to compute ForwardDiff.gradient(x -> M(x), x) … your code just didn’t get that far.)

A more explicit version that is correct is:

\frac{\partial \log \det M(x)}{\partial x_k} = \mathrm{trace} \left[ M^{-1} \frac{\partial M}{\partial x_k} \right]

Yes, this still involves M^{-1} applied to the derivative of M, which will in general be dense. But there are various things that you can do depending on the structure of M. (e.g. if \partial M / \partial x_k is sufficiently sparse, or perhaps low rank, you may only have to apply M \ ... to a few vectors. Or you can use iterative trace-estimation techniques. Or …) When matrices get big and sparse, you often need to think about the structure of your specific problem carefully (which is one of the difficulties with making AD fully automatic for large sparse-matrix calculations).
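
As a rough, unoptimized sketch of this formula applied to the MWE above (the grad_logabsdet helper below is purely illustrative; it assumes M(x) = I - W1s*prod(x), so that ∂M/∂x_k is -W1s times the product of the other entries of x), one can factorize M once and then solve only for the columns of ∂M/∂x_k that contain stored nonzeros:

using LinearAlgebra, SparseArrays

W1s = sparse([1,2,3,5,5], [2,1,2,4,5], ones(Int,5))
M(x) = I - W1s*prod(x)

function grad_logabsdet(x)
    F = lu(M(x))              # one sparse LU factorization, reused for every solve
    g = zeros(length(x))
    for k in eachindex(x)
        dM = -W1s * prod(x[j] for j in eachindex(x) if j != k)   # ∂M/∂x_k
        # trace(M⁻¹ * dM) = Σ_j (M \ dM[:,j])[j]; zero columns of dM contribute nothing
        t = 0.0
        for j in axes(dM, 2)
            col = Vector(dM[:, j])
            iszero(col) && continue
            t += (F \ col)[j]
        end
        g[k] = t
    end
    return g
end

grad_logabsdet([0.1, 0.2])   # ≈ [-0.21208, -0.10604], matching the mylogabsdet result above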

(I just ran into a 2019 preprint specifically on efficient computation of the gradient of the log-determinant for large, sparse matrices that may be worth looking over.)

But that has worse than exponential (factorial) complexity. If you are using sparse matrices, I’m assuming your matrices are huge. If you have small-ish matrices, you should use dense matrices anyway, and ChainRules.jl knows how to differentiate through log-determinants of dense matrices.
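
For example, a minimal sketch with the dense version of the matrix from this thread (assuming Zygote picks up the ChainRules rules for logabsdet of a dense matrix):

using LinearAlgebra, Zygote

W1d = [0 1 0 0 0; 1 0 0 0 0; 0 1 0 0 0; 0 0 0 0 0; 0 0 0 1 1]  # dense copy of W1s
id5 = Matrix{Float64}(I, 5, 5)

Zygote.gradient(x -> logabsdet(id5 - W1d*prod(x))[1], [0.1, 0.2])   # ≈ ([-0.21208, -0.10604],)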

3 Likes

I had seen the formula for a scalar variable x and tried to apply it to the vector x:

d(det(M(x)))/dx = det(M(x))*trace(inv(M(x))*d(M(x))/dx)

The idea was something like this:

gradient(det(M(x))) = det(M(x))*trace(inv(M(x))*gradient.(M(x)))

but it turned out (very) badly.

You can’t (reliably) pattern-match calculus formulas like this when going to higher-dimensional objects. See my talk So You Think You Know How To Take Derivatives?.

3 Likes

Trying to apply the following formula to the case in question, calculating the partial derivatives of the function by hand (in this simple case [prod(x)] it is just a matter of putting a 1 in the position of the differentiated variable), while following the warnings about computing the inverse:

invM=inv(M([0.1,0.2]))
# ERROR: The inverse of a sparse matrix can often be dense and can cause the computer to run out of memory. If you are sure you have enough memory, please either convert your matrix to a dense matrix, e.g. by calling `Matrix` or if `A` can be factorized, use `\` on the dense identity matrix, e.g. `A \ Matrix{eltype(A)}(I, size(A)...)` restrictions of `\` on sparse lhs applies. Altenatively, `A\b` is generally preferable to `inv(A)*b`

#d(det(M(x)))/dx = det(M(x))*trace(inv(M(x))*d(M(x))/dx)

i5=LinearAlgebra.I(5)
M(x)= i5- W1s*prod(x)
detm=det(M([0.1,0.2]))

inp(v)=map(j->ntuple(i->i!=j ? v[i] : 1, length(v)), eachindex(v))

dW1a(x) = - W1s*prod(x)

res=dW1a.(inp([0.1,0.2]))

m=M([0.1,0.2])
colr(i)=1+(i-1)*5:5+(i-1)*5
map(ir->sum(map(i->(m \ res[ir][colr(i)])[i], 1:5))*detm, eachindex(res))

We always end up with more or less the same problem:

julia> map(ir->sum(map(i->(m \ res[ir][colr(i)])[i], 1:5))*detm, eachindex(res))
ERROR: MethodError: no method matching ldiv!(::SparseArrays.UMFPACK.UmfpackLU{Float64, Int64}, ::SparseVector{Float64, Int64})

I had found the LinearAlgebraX package, which has a detx() function that I believe does something similar.

1 Like

See Sparse A\B

To do this efficiently you may need to think carefully about your particular problem, as I noted above.

julia> using LinearAlgebra, SparseArrays, ForwardDiff

julia> W1s = sparse([1,2,3,5,5],[2,1,2,4,5],ones(Int,5))
5×5 SparseMatrixCSC{Int64, Int64} with 5 stored entries:
 ⋅  1  ⋅  ⋅  ⋅
 1  ⋅  ⋅  ⋅  ⋅
 ⋅  1  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  1  1

julia> W1d = Matrix(W1s)
5×5 Matrix{Int64}:
 0  1  0  0  0
 1  0  0  0  0
 0  1  0  0  0
 0  0  0  0  0
 0  0  0  1  1

julia> using Pkg; Pkg.add(url="https://github.com/sloisel/SparseSparse")
    Updating git-repo `https://github.com/sloisel/SparseSparse`
   Resolving package versions...

julia> using SparseSparse
julia> function grdet(W1s,x)
          i5=LinearAlgebra.I(5);
           M(v)= i5- W1s*prod(v)
           m=M(x);
           detm=det(m);
           # taking the gradient of x -> (const - w*prod(x)) at [x1,x2,...] is the same as evaluating x -> (-w*prod(x)) at [[1,x2,x3,...], [x1,1,x3,...], ...]
           # inp(v)=map(j->ntuple(i->i!=j ? v[i] : 1, length(v)), eachindex(v))
           dW1a(x) = map(i-> - W1s*prod(@views [x[1:i-1];x[i+1:end]]), eachindex(x))
           res = dW1a(x);
           colr(i) = 1+(i-1)*5:5+(i-1)*5
           map(ir->sum(map(i->(m \ res[ir][colr(i)])[i], 1:5))*detm, eachindex(res))

       end
grdet (generic function with 1 method)

julia> @time grdet(W1s,[0.1,0.2])
  0.078955 seconds (89.41 k allocations: 6.034 MiB, 99.24% compilation time)
2-element Vector{Float64}:
 -0.20776000000000006
 -0.10388000000000003

julia> @time grdet(W1s,[0.1,0.2])
  0.000221 seconds (2.35 k allocations: 289.094 KiB)
2-element Vector{Float64}:
 -0.20776000000000006
 -0.10388000000000003

julia> @time ForwardDiff.gradient(x -> begin

           W1a = LinearAlgebra.I(5) - W1d*prod(x)
           ladetW1a = det(W1a)

           return ladetW1a
       end, [0.1,0.2])
  0.561397 seconds (1.43 M allocations: 98.015 MiB, 5.04% gc time, 99.91% compilation time)
2-element Vector{Float64}:
 -0.20776000000000003
 -0.10388000000000001

but …

julia> using BenchmarkTools

julia> @btime grdet(W1s,[0.1,0.2])
  103.400 μs (2349 allocations: 289.09 KiB)
2-element Vector{Float64}:
 -0.20776000000000006
 -0.10388000000000003

julia> @btime ForwardDiff.gradient(x -> begin
           W1a = LinearAlgebra.I(5) - W1d*prod(x)
           ladetW1a = det(W1a)
           return ladetW1a
       end, [0.1,0.2])
  672.667 ns (10 allocations: 2.48 KiB)
2-element Vector{Float64}:
 -0.20776000000000003
 -0.10388000000000001

So how do things stand?

Thanks for this explanation, I ran into the same stack overflow using ReverseDiff and could not figure out what was going on.

I’m still confused, though, because there are now rrules defined for determinants of sparse matrices. They use Takahashi’s sparse inverse subset algorithm to avoid inverting the whole sparse matrix, which is the trick used in the 2019 preprint you linked above. (There is not a similar frule yet, but it wouldn’t be hard to define since it can also use the sparse inverse subset trick.) These rrules work for Zygote, but not for ReverseDiff:

using SparseArrays, LinearAlgebra, Zygote, ReverseDiff

Zygote.gradient(x -> logdet(spdiagm(x)), [0.1,0.2]) # works
ReverseDiff.gradient(x -> logdet(spdiagm(x)), [0.1,0.2]) # StackOverflowError

Why isn’t ReverseDiff finding the correct rrule here?

ReverseDiff doesn’t use ChainRules without manual intervention.

1 Like

Thanks again, I had missed that part of the docs. This is maybe a different thread at this point, but could you provide an example of how to use that macro in this case? The following still produces the same StackOverflowError:

using LinearAlgebra, SparseArrays, ReverseDiff, ChainRules
ReverseDiff.@grad_from_chainrules LinearAlgebra.logdet(x::ReverseDiff.TrackedArray)
ReverseDiff.gradient(x -> logdet(spdiagm(x)), [0.1,0.2])