Sparse Jacobians of matrix models

You can see it from your script :slight_smile:
Just plug in 60×50 matrices and you will see that you get:

3000×3000 SparseArrays.SparseMatrixCSC{Float64, Int64} with 327000 stored entries:

so about 4 out of 100 entries are different from zero. The Zygote call takes many seconds, but the norm computation in your #Test is very fast, because ForwardDiff is much faster.
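
In case it helps to reproduce that count, here is a minimal sketch (assuming the m from your script, i.e. m(A) = A - sum(A, dims=2) * sum(A, dims=1)) that builds the dense ForwardDiff Jacobian and sparsifies it to count the nonzeros:

using ForwardDiff, SparseArrays

# the model applied to a 60×50 input, flattened so the Jacobian is 3000×3000
m(A) = A - sum(A, dims=2) * sum(A, dims=1)
A = rand(60, 50)
J = ForwardDiff.jacobian(x -> vec(m(reshape(x, size(A)))), vec(A))

Jsp = sparse(J)           # drop the exact zeros
nnz(Jsp) / length(Jsp)    # ≈ 0.036, i.e. roughly 4 nonzero entries per 100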

More important than the sparsity of the matrix is the sparsity pattern. It could be that the sparsity pattern in this case does not enable SparseDiffTools to be faster than ForwardDiff. This is the pattern I am seeing:

⣿⣿⣿⣿⣿⣾⣾⣾⣾⣮⡻⣿⣿⣷⣷⣷⣷⣷⣝⢿⣿⣿⣾⣾⣾⣾⣮⡻⣻⣿⣷⣷⣷⣷⣷⣕⣝⢿⣿⣿⣾⣾⣾⣾⣮⡻⣿⣿⣷⣷⣷⣷⣷⠄
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣾⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡫⣻⣿⣿⣿⣿⣿⣷⣵⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⠅
⣻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡫⣿⣿⣿⣿⣿⣿⣷⣵⢝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⡅
⣺⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡪⡿⣿⣿⣿⣿⣿⣷⣷⢝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⡇
⡺⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡺⡿⣿⣿⣿⣿⣿⣿⣗⢝⢿⣿⣿⣿⣿⣿⣿⣮⡃
⣿⣮⡺⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡺⡻⣿⣿⣿⣿⣿⣿⣗⣝⢿⣿⣿⣿⣿⣿⡂
⢿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣻⣿⣿⣿⣿⣿⣿⣕⣝⢿⣿⣿⣿⡂
⢽⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡫⣻⣿⣿⣿⣿⣿⣷⣵⣝⢿⣿⡇
⢽⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡪⣿⣿⣿⣿⣿⣿⣷⣵⢝⠇
⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡪⡿⣿⣿⣿⣿⣿⣷⠅
⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡺⡿⣿⣿⣿⣿⠅
⣺⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣵⣿⣿⣿⣿⣿⣿⣿⣯⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⡻⣿⣿⡅
⣺⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣻⡇
⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡃
⣿⣾⡮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⡂
⢽⣿⣿⣾⡮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡫⣿⣿⣿⣿⣿⣿⣿⣟⣽⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⡂
⢽⣿⣿⣿⣿⣿⡪⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⡇
⢝⢿⣿⣿⣿⣿⣿⣯⣪⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⠇
⣷⣝⢝⣿⣿⣿⣿⣿⣿⣯⣪⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⠅
⣿⣿⣷⣝⢝⣿⣿⣿⣿⣿⣿⣮⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⠅
⣺⣿⣿⣿⣷⣕⢽⣿⣿⣿⣿⣿⣿⣾⡮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⡇
⣺⣿⣿⣿⣿⣿⣷⣕⢿⢿⣿⣿⣿⣿⣿⣾⡪⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣪⡻⡇
⣮⡻⣿⣿⣿⣿⣿⣿⣷⣕⢿⢿⣿⣿⣿⣿⣿⣿⡪⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡂
⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢟⢿⣿⣿⣿⣿⣿⣯⣪⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡂
⢽⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢝⣿⣿⣿⣿⣿⣿⣯⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡆
⢽⣿⣿⣿⣿⣿⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢝⣿⣿⣿⣿⣿⣿⣮⣮⡻⣿⣿⣿⣿⣿⣿⣷⣝⢿⣿⣿⣿⣿⣿⣿⣮⡻⣻⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇
⠙⠟⠟⠟⠟⠿⠿⠿⠮⠻⠻⠻⠻⠻⠿⠿⠷⠕⠝⠟⠟⠟⠟⠿⠿⠾⠮⠻⠻⠻⠻⠻⠿⠿⠷⠝⠟⠟⠟⠟⠿⠿⠿⠮⠻⠻⠻⠻⠻⠿⠿⠿⠿⠇

You might already be familiar with the theory, but here is a very brief intro: Sparse Jacobian or Hessian · Nonconvex.jl. There is also a lecture by Chris Rackauckas on YouTube about the same topic.


“to speed up an optimization problem”

What’s the full formulation you’re trying to achieve?

Did you consider using JuMP?

@mohamed82008 that’s a plotting artifact. Here is the sparsity pattern:

The colour vector sparse_m.flat_f.jac_colors shows that the SparseDiffTools colouring algorithm could not find any better splitting than giving each variable its own colour: sparse_m.flat_f.jac_colors == 1:3000 is true. This is the worst-case scenario for SparseDiffTools.
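
For anyone following along, here is a small sketch of how one can check that conclusion directly with SparseDiffTools, assuming Jsp is the 3000×3000 sparse Jacobian pattern from earlier in the thread:

using SparseDiffTools

colors = matrix_colors(Jsp)       # greedy distance-1 colouring of the columns
maximum(colors)                   # 3000 here: every column needs its own colour
maximum(colors) == size(Jsp, 2)   # true, so sparsity buys nothing for AD in this case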

This info may not be that relevant, and it might be common knowledge among the members here, but I’d like to point out that if you are targeting speed with sparse matrices, do take a look at renumbering methods before doing solves. Something like running METIS’s nested dissection or SuiteSparse’s COLAMD/SYMAMD can go a long way. (Maybe this is handled internally by the aforementioned packages; if not, I would argue that renumbering methods should be incorporated.) This may also only be applicable to solves, i.e. reducing fill-in; for reducing bandwidth/profile, Cuthill-McKee is used.

I don’t know whether computing the Jacobian itself would be faster for renumbered matrices; just thought I’d chime in with something that could be helpful if the structure of the sparsity matters in the aforementioned tools.
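
For reference, here is a rough sketch of what such a renumbering looks like in Julia, assuming Metis.jl is installed (Metis.permutation computes a nested-dissection fill-reducing ordering). Note this targets fill-in during factorization, which is separate from the colouring used for sparse differentiation:

using Metis, SparseArrays

S = sprand(1000, 1000, 0.01)
A = S + S'                           # Metis expects a symmetric sparsity pattern

perm, iperm = Metis.permutation(A)   # fill-reducing nested-dissection ordering
Aperm = A[perm, perm]                # renumbered matrix; factorizations typically see less fill-in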


Thank you @odow , I didn’t consider JuMP yet because I prefer reconstructing the optimization routine from scratch if I can. I tend to use Julia more to solve the “my first high-level language is too slow” problem than the “two-language problem”. I might try it in the future; the timescale of my quick experiment was too short to dig deep into something like JuMP (I might be wrong).

Thank you @mohamed82008 . So are we saying that we should not expect to extract much extra speed-up from sparsity in this case, or is it still worth double-checking whether one could run

forwarddiff_color_jacobian!(autoJac, s, dv, colorvec = colors)

without errors?

I call that function internally to compute the Jacobian, so it’s not needed on its own unless you suspect the correctness of the implementation, which you are more than welcome to verify. I would be more inclined to try different colouring algorithms from SparseDiffTools, besides the column renumbering or permutation described by @acxz (that should probably be part of the colour-finding algorithm in SparseDiffTools, if it’s not already there). Feel free to explore the SparseDiffTools implementation of colouring algorithms and possibly even improve it. The goal is to have as few colours as possible.
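
In case it is useful, this is roughly what trying another colouring algorithm looks like through matrix_colors. A tiny tridiagonal pattern is used only to illustrate the API, and the algorithm type names (e.g. GreedyStar2Color) are an assumption about what the installed SparseDiffTools version exports:

using SparseDiffTools, SparseArrays, LinearAlgebra

Jsmall = sparse(Tridiagonal(ones(9), ones(10), ones(9)))   # small banded pattern

maximum(matrix_colors(Jsmall))                       # default greedy distance-1 colouring → 3 colours
maximum(matrix_colors(Jsmall, GreedyStar2Color()))   # an alternative colouring algorithm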

On a shorter timescale, any suggestions on matching JAX performance on the same Jacobian computation?

As a reference point these are the results with JAX (CPU and GPU):

CPU

In [1]: import jax.numpy as jnp
   ...: from jax import grad, jit, vmap
   ...: from jax import random
   ...: from jax import jacfwd, jacrev
   ...:
   ...: key = random.PRNGKey(0)
   ...: a = random.normal(key,(60,60))
   ...: b = jnp.ones_like(a)
   ...: def m(a):
   ...:   return a - jnp.sum(a, 1, keepdims=True) @ jnp.sum(a, 0, keepdims=True)
   ...: jb = jit(jacrev(m))
   ...: jm = jit(jacfwd(m))
   ...: %timeit jb(a)
   ...: %timeit jm(a)
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
15.7 ms ± 226 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
36 ms ± 449 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

And GPU

In [3]: import jax.numpy as jnp
   ...: from jax import grad, jit, vmap
   ...: from jax import random
   ...: from jax import jacfwd, jacrev
   ...:
   ...: key = random.PRNGKey(0)
   ...: a = random.normal(key,(60,60))
   ...: b = jnp.ones_like(a)
   ...: def m(a):
   ...:   return a - jnp.sum(a, 1, keepdims=True) @ jnp.sum(a, 0, keepdims=True)
   ...: jb = jit(jacrev(m))
   ...: jm = jit(jacfwd(m))
   ...: %timeit jb(a)
   ...: %timeit jm(a)
287 µs ± 781 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
566 µs ± 47.3 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

ForwardDiff is around 400 ms on my machine, still roughly 20 times slower than the JAX CPU version. It might be that JAX is using multi-threading automatically, but it would be nice to get close.

With 32-bit floats I remember it being less than 400 ms, but still in the 200 ms range.
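
For reference, a minimal sketch of how the ForwardDiff timings above can be reproduced, assuming BenchmarkTools and the 60×60 case from the JAX snippet:

using ForwardDiff, BenchmarkTools

m(A) = A - sum(A, dims=2) * sum(A, dims=1)
A = rand(60, 60)

@btime ForwardDiff.jacobian($m, $A);               # Float64 case, ~400 ms reported above
@btime ForwardDiff.jacobian($m, $(Float32.(A)));   # Float32 case, reported above as roughly 200 ms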

PS Feel free to split the performance comparison with JAX into another thread if appropriate.

Yes, so @gianmariomanca this doesn’t have a sparsity pattern that sparse automatic differentiation can reduce. Sparse symbolic differentiation would be able to handle it well, so if you want to try using Symbolics.jl to generate it, that would be as fast as all hell.
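
Not necessarily what Chris has in mind, but as a rough sketch of the direct Symbolics.jl route (kept at a small size, since building the symbolic Jacobian becomes expensive as the matrix grows):

using Symbolics, SparseArrays

m(A) = A - sum(A, dims=2) * sum(A, dims=1)

n = 10
@variables A[1:n, 1:n]
Asym = collect(A)                                          # Matrix{Num} of symbolic entries

Jsym = Symbolics.sparsejacobian(vec(m(Asym)), vec(Asym))   # symbolic sparse Jacobian
jac_oop, jac_ip = build_function(Jsym, vec(Asym), expression = Val(false))

x = rand(n, n)
J = jac_oop(vec(x))                                        # numeric sparse Jacobian at x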

It looks like the SparseDiffTools.jl README is just old. I’ll get that updated.

Using a global like that will slow code down by more than 20x. Don’t use a global if you don’t need it. Also, for code that mutates like that, Enzyme will be a much faster AD than Zygote (I’m surprised Zygote even worked?).

Note that this only affects the cost of constructing the sparse Jacobian, and it only affects it if maximum(colors) is sufficiently smaller than length(s). As @mohamed82008 mentioned, maximum(colors) == length(s) in your case, so no sparsity simplification occurs here.

But more directly, @gianmariomanca you should post a profile of your code. I recommend using

to share an interactive flamegraph. The real question is: what is taking all of the time during the Jacobian construction?


Thank you @ChrisRackauckas , I’ll definitely check the Symbolics approach at some point in the future, I’ve heard good things :slight_smile: Even though it might not be trivial in this particular case.

Just to avoid misunderstanding: I agree about globals, but for clarity, I didn’t benchmark or use the version of the function that depends on a global variable, just the basic function:

m(A) = A - sum(A, dims=2) * sum(A, dims=1)

I’ll try to share the profiling results, but I’m not sure how, since attachments are not allowed and one needs to share the HTML or some other interactive format for the report to be useful.
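
In the meantime, here is a minimal sketch of collecting a profile with the Profile standard library; turning it into an interactive flamegraph is then a separate step (e.g. with ProfileView.jl or StatProfilerHTML.jl):

using Profile, ForwardDiff

m(A) = A - sum(A, dims=2) * sum(A, dims=1)
A = rand(60, 60)

ForwardDiff.jacobian(m, A)        # run once first so compilation is excluded
Profile.clear()
@profile ForwardDiff.jacobian(m, A)
Profile.print(format = :flat)     # text summary; flamegraph viewers consume the same data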

Try

using NonconvexUtils, LinearAlgebra

# Preprocess

m(A) = A - sum(A, dims=2) * sum(A, dims=1)
x = rand(5, 5)
sym_m = symbolify(m, x, sparse = true)

# Compute the jacobian

x = rand(5, 5)
J = sym_m.flat_f.g(vec(x))

which uses Symbolics under the hood.

julia> sym_m = symbolify(m, x, sparse = true)
ERROR: UndefVarError: symbolify not defined

Make sure you are on the latest NonconvexUtils and that no other loaded package exports symbolify. If two packages export the same name, you need to qualify the name with the package name, e.g. NonconvexUtils.symbolify, to tell Julia which symbolify you are talking about.


Thank you. It does run in the REPL with the latest packages. It seems to be doing fine for moderate sizes (~15×15), but I could not really run anything above ~20×20, which is quite far from ~60×60.
