Sparse jacobians of matrix models

Hi all,

I would like to use sparsity to speed up an optimization problem.

I have a NxM matrix of data Z and a model m(A) = A - k*sum(A, dims=2)*sum(A,dims=1) where A is a matrix of the same dimensions of Z, sum(A, dims=2) an Nx1 vector and sum(A,dims=1) an 1xM vector so that k*sum(A, dims=2)sum(A,dims=1) is also an NxM matrix.

I want to compute the jacobian J of the model m(A) with respect to A (I’m thinking automatic differentiation). The jacobian J will be a (NxM) x (NxM) sparse metric.

I will need then to solve a linear system Jv=w, where both v and w are (NxM)x1 vectors.
(Basically for a Gauss-Newton update of A starting from some initial condition for A, until the model m(A) matches the data Z. There are some positivity constraints on A to make the problem interesting, but that’s not relevant at this stage.)

I would like to use the fact that J is sparse to speed up my computations. What’s the best approach in Julia, can you give me some pointers and maybe provide some basic examples?

1 Like

Hi!

First of all, in this simple case you may not need to use autodiff at all, since the Jacobian looks quite easy to compute. You can even express it as a (functional) linear operator instead of a matrix, and then use packages such as IterativeSolvers.jl, LinearSolve.jl, Krylov.jl or KrylovKit.jl to solve your linear system.

I don’t think Zygote and the like natively output sparse Jacobians, but I just discovered there are tools out there to help you do just that: maybe SparseDiffTools.jl or SparsityDetection.jl can be useful?

2 Likes

@Vaibhavdixit02 just added a bunch of sparsity support to Optimization.jl. @Vaibhavdixit02 , do you think you can get a tutorial up to demonstrate it for the OP here? That’s probably a good starting point, and there might be some back and forth required from that to get the solution, but at least the tools exist but need docs right now.

2 Likes

Thank you. I’ll get back to you soon hopefully. I was aware of SparseDiffTools.jl and quickly played with it. I believe the automatic differentiation example with forwarddiff_color_jacobian was failing in my case (might be a trivial issue, not sure). I definitely want to use automatic differentiation, since the goal is to have a tool for quick model exploration and characterization that can be more complex than the example I showed. In general I prefer to have full control of the optimization algorithm, but I’m happy to try Optimization.jl if there are some examples (is there a link?).

What do you mean by “failing in my case”? MWE?

If A and m(A) are sparse, then each of them is represented internally by a (dense) vector of non-zero values, and a simple data structure describing the index corresponding to each non-zero value (the sparsity structure) (cf findnz).

For a given A , compute m(A) - you now have the sparsity structures of A and m(A).

Create a function which, given a vector of values of length nnz(A), builds A, computes m(A) and extract its vector of non-zero values.

This function transforms a dense vector into a dense vector. Using automatic differentiation, compute its Jacobian, a matrix of size (nnz(A),nnz(m(A))) (or was it transposed…).

Using the sparsity structures of A and m(A) you can compute the sparsity structure of your 4th order Jacobian array.

I think…?

Hi @ChrisRackauckas , I apologize for the delay.

I have a small Pluto notebook with a MWE, where the forwarddiff_color_jacobian!(autoJac, s, d, colorvec = colors) will generate a BoundsError.

I can’t seem to be able to attach it here in any form (I can’t attach the html, I can’t add it as preformatted text as it’s too long, more than 32000 lines due to the included package dependency tree). Any suggestions on how to share the notebook for people to look at it?




I guess the only relevant piece of code is:

	const N1 = 8
	const N2 = 6
	scall = 0
	function s(m,a)
		global scall += 1
		for i = 1:size(a)[1]
			for j = 1:size(a)[2]
				m[i,j] = a[i,j] - 0.1*sum(a[i,:])*sum(a[:,j])
			end
		end
	end
	function reset_scall()
		global scall = 0
    end

and the question basically boils down to creating a working example for forwarddiff_color_jacobian! along the lines of the example in GitHub - JuliaDiff/SparseDiffTools.jl: Fast jacobian computation through sparsity exploitation and matrix coloring for a 2D matrix input.

Just vec the input?

If I vec(d) the input then the definition of the function s doesn’t make sense and you get another error.

I guess one could manually vectorize and reshape the input of the function s inside the function definition? But the point was to avoid it and keep working with a matrix.

Also that didn’t work:

going from

function s(m,a)

to

function s(m,b)
a = reshape(b,N1,N2)

and passing a vectorized input still fails.

Can you try to build an example?

Here it is, you should be able to open it as Pluto notebook or just as a Julia script:

### A Pluto.jl notebook ###
# v0.19.9

using Markdown
using InteractiveUtils

# ╔═╡ 690f6163-7a99-4b3e-bb70-47775b14dfb4
using SparsityDetection, SparseArrays

# ╔═╡ e8b4b6c5-1871-4675-b66a-97008f1b23c8
using SparseDiffTools

# ╔═╡ 55afd5f5-504c-4574-bd03-58fa9a5e09cc
using Plots

# ╔═╡ bfb90258-de91-4955-adcf-7c93f79fdefe
using FiniteDiff

# ╔═╡ 526d04e1-e3d0-4b06-b3b2-d55e2fc73bec
using IterativeSolvers

# ╔═╡ 5feaede9-0bd3-4e66-9063-ce828eaf6f7e
begin
	const N1 = 8
	const N2 = 6
end

# ╔═╡ 59a70fac-01e1-429f-81bf-06ab74b1c3a7
begin
	scall = 0
	function s(m,b)
		a = reshape(b,N1,N2)
		global scall += 1
		for i = 1:size(a)[1]
			for j = 1:size(a)[2]
				m[i,j] = a[i,j] - 0.1*sum(a[i,:])*sum(a[:,j])
			end
		end
	end
	function reset_scall()
		global scall = 0
    end
end

# ╔═╡ 38b541da-e906-4af4-8736-42e3c65e79cf
d = rand(N1,N2)

# ╔═╡ cd74031b-ebd1-4e18-89d4-e8f68f101a89
k = reshape(d,N1,N2)

# ╔═╡ 4b4f9102-ea56-4c92-9655-34771e32a591
o = similar(d);

# ╔═╡ 8494a376-222f-4ca7-abe4-233ec037b3ba
sparsity_pattern = jacobian_sparsity(s,o,d)

# ╔═╡ e47c9391-0e27-4c54-8c3a-2d58f2935759
jac = Float64.(sparse(sparsity_pattern))

# ╔═╡ 0234b949-3520-4853-9018-5f3b96c803fd
colors = matrix_colors(jac)

# ╔═╡ c5aa54b7-e9bd-45c6-9673-44e9bb215f9a
begin
	reset_scall()
	FiniteDiff.finite_difference_jacobian!(jac, s, d, colorvec=colors)
	@show(scall)
end

# ╔═╡ e4116e71-d638-424a-a319-0b8cd084bade
jac\vec(d)

# ╔═╡ ab015e88-631d-4cfe-bf09-400bedb25f20
j = vec(jac)

# ╔═╡ d4ab28fb-9124-420d-8f99-3b099c73f6ed
scatter(j)

# ╔═╡ d80b1f58-8dd0-4a87-a6a2-3038e4f3da2a
heatmap(Matrix(jac))

# ╔═╡ 867a6b3a-9a6d-4c94-94d9-5a63a642b028
autoJac = zeros(N1*N2,N1*N2)

# ╔═╡ f20c670a-9f86-4def-97e1-79e8a2f63bf6
vec(d)

# ╔═╡ 3be2e06f-e005-487d-921e-d2d56702599a
dv = deepcopy(vec(d))

# ╔═╡ 6741b47d-4513-4dcb-aec8-186c03c24ccc
forwarddiff_color_jacobian!(autoJac, s, dv, colorvec = colors)

# ╔═╡ 00000000-0000-0000-0000-000000000001
PLUTO_PROJECT_TOML_CONTENTS = """
[deps]
FiniteDiff = "6a86dc24-6348-571c-b903-95158fe2bd41"
IterativeSolvers = "42fd0dbc-a981-5370-80f2-aaf504508153"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
SparseDiffTools = "47a9eef4-7e08-11e9-0b38-333d64bd3804"
SparsityDetection = "684fba80-ace3-11e9-3d08-3bc7ed6f96df"

[compat]
FiniteDiff = "~2.12.1"
IterativeSolvers = "~0.9.2"
Plots = "~1.29.1"
SparseDiffTools = "~1.23.0"
SparsityDetection = "~0.3.4"
"""

# ╔═╡ Cell order:
# ╠═690f6163-7a99-4b3e-bb70-47775b14dfb4
# ╠═e8b4b6c5-1871-4675-b66a-97008f1b23c8
# ╠═55afd5f5-504c-4574-bd03-58fa9a5e09cc
# ╠═5feaede9-0bd3-4e66-9063-ce828eaf6f7e
# ╠═59a70fac-01e1-429f-81bf-06ab74b1c3a7
# ╠═38b541da-e906-4af4-8736-42e3c65e79cf
# ╠═cd74031b-ebd1-4e18-89d4-e8f68f101a89
# ╠═4b4f9102-ea56-4c92-9655-34771e32a591
# ╠═8494a376-222f-4ca7-abe4-233ec037b3ba
# ╠═e47c9391-0e27-4c54-8c3a-2d58f2935759
# ╠═0234b949-3520-4853-9018-5f3b96c803fd
# ╠═bfb90258-de91-4955-adcf-7c93f79fdefe
# ╠═c5aa54b7-e9bd-45c6-9673-44e9bb215f9a
# ╠═526d04e1-e3d0-4b06-b3b2-d55e2fc73bec
# ╠═e4116e71-d638-424a-a319-0b8cd084bade
# ╠═ab015e88-631d-4cfe-bf09-400bedb25f20
# ╠═d4ab28fb-9124-420d-8f99-3b099c73f6ed
# ╠═d80b1f58-8dd0-4a87-a6a2-3038e4f3da2a
# ╠═867a6b3a-9a6d-4c94-94d9-5a63a642b028
# ╠═f20c670a-9f86-4def-97e1-79e8a2f63bf6
# ╠═3be2e06f-e005-487d-921e-d2d56702599a
# ╠═6741b47d-4513-4dcb-aec8-186c03c24ccc
# ╟─00000000-0000-0000-0000-000000000001
# ╟─00000000-0000-0000-0000-000000000002

As a side note, in the definition of s(), if I replace the nested loops with the equivalent

m = a - 0.1*sum(a,dims=2) * sum(a,dims=1)

I get an earlier error:

The issue there it looks like you’re using the deprecated SparsityDetection instead of Symbolics, is that on purpose?

Not on purpose. I literally just copy pasted the examples from the README.md in GitHub - JuliaDiff/SparseDiffTools.jl: Fast jacobian computation through sparsity exploitation and matrix coloring to get something quick going to see if I could speed up some experiments. This is why I was asking for guidance and examples and I tagged the question also as “documentation”.

OK. Using sparsity detection from Symbolic is faster than the one from the deprecated SparsityDetection for the original nested loops s() definition. With Symbolics the error goes away if I use

m = a - 0.1*sum(a,dims=2) * sum(a,dims=1)

instead of the nested loops, but then it just returns an empty (N1N2)x(N1N2) sparse metric

maybe this is expected? Non sure.

In any case, nested loops are fine if needed, my main interest is still to use the sparsity pattern to speed up the auto-diff and use the sparse autodiff Jacobian in a linear system to generate a Gauss-Newton step as described in the original question. A small snippet extracting the Jacobian J and using it in a w = J\v operation (that takes advantage of sparsity) would super helpful as a reference.

Try the latest NonconvexUtils.jl

using NonconvexUtils, Zygote, ForwardDiff, LinearAlgebra

# Pre-process

m(A) = A - sum(A, dims=2) * sum(A, dims=1)
x = rand(5, 5)
sparse_m = sparsify(m, x)

# Calculate the Jacobian

x = rand(5, 5)
Zygote.pullback(sparse_m, x) # triggers the computation of the Jacobian
J = sparse_m.flat_f.jac

# Test

norm(ForwardDiff.jacobian(x -> vec(m(reshape(x, 5, 5))), vec(x)) - J)

Thank you for the suggestion @mohamed82008 . It looks like the Zygote jacobian extraction is orders of magnitude slower than the non sparse ForwardDiff, I’m not sure if that difference can be recovered by a sparse linear solve. I’m thinking about 60x60 matrices, and so 60x60x60x60 jacobians. That’s why I was curious about

forwarddiff_color_jacobian!(autoJac, s, dv, colorvec = colors)

PS I’m assuming this is intrinsic, I don’t see any cashing related improvements if I rerun the Zygote jacobian extraction multiple times, unless I’m missing something obvious.

How sparse is your Jacobian matrix? It’s possible my implementation + Zygote are adding some overhead. If the same slowdown persists for different larger matrix sizes, then it’s an intrinsic slowdown because I don’t think the overhead scales with the matrix size. If the slowdown disappears for larger matrices, then we can’t conclude if it’s intrinsic or not for smaller matrices.