How to optimize ForwardDiff.jacobian for a large system of equations?


I am trying to use ForwardDiff to compute the Jacobian of a system of equations that I anticipate will be very large. Below is a much simplified working example that illustrates what I am trying to do.

When I run the code below I get the following as output from @time:

20.089338 seconds (128.02 k allocations: 46.572 GiB, 4.09% gc time)

Since I am new to Julia, it is likely that I am doing something wrong. Could someone help with the following questions:

  1. Would you expect ForwardDiff.jacobian to take 20 s for a case with 50K equations?

  2. Could you let me know if there is anything I could do to optimize the code below?

Thanks in advance for the help.

using ForwardDiff
using ForwardDiff: Chunk

function ResidualEquations( X::AbstractArray )
    # body elided in the original post; the replies below note that for
    # this simplified example the Jacobian is simply J(x) = I
    return X
end

p0 = rand(50000)

f(x) = ResidualEquations(x)

println("\n \t ... Creating Config ... ")
jconfig = ForwardDiff.JacobianConfig( f , p0, Chunk{2}())

println("\n \t ... Evaluating jacobian ... ")
@time jac = ForwardDiff.jacobian( f, p0, jconfig )::Array{Float64,2}

Is your Jacobian sparse? If yes, you should try SparseDiffTools.jl.


Thanks for the reply. I think my Jacobian is sparse, but it is not clear from your answer how using SparseDiffTools could help. Could you please provide an example? Thanks in advance!

Use SparseDiffTools.jl to get the sparsity pattern, then use its matrix coloring to get a color vector, and use SparseDiffTools.jl again for the forward color AD. See the SparseDiffTools.jl README for an example.
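A minimal sketch of that workflow, assuming an in-place residual `f!(dx, x)` with a known tridiagonal sparsity pattern (the toy residual and sizes here are illustrative stand-ins, not the original poster's system):

```julia
using SparseArrays, SparseDiffTools

n = 1000

# Toy residual: each equation only couples x[i-1], x[i], x[i+1]
function f!(dx, x)
    dx[1] = x[1]
    for i in 2:n-1
        dx[i] = x[i-1] - 2x[i] + x[i+1]
    end
    dx[n] = x[n]
    return dx
end

# 1. The known sparsity pattern, stored as a sparse prototype
jac = spdiagm(-1 => ones(n - 1), 0 => ones(n), 1 => ones(n - 1))

# 2. Matrix coloring: columns sharing a color can be differentiated together
colors = matrix_colors(jac)   # a tridiagonal pattern needs only a few colors

# 3. Forward-mode AD with coloring: roughly `maximum(colors)` dual
#    evaluations of f! instead of ~n
forwarddiff_color_jacobian!(jac, f!, rand(n), colorvec = colors)
```

The key point is step 3: the cost scales with the number of colors (here a small constant), not with the number of unknowns.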


I’m not sure I can help but I’ll chime in.

You should definitely figure out if it is sparse. Are most \frac{\partial f_i(x)}{\partial x_j} entries of the Jacobian equal to zero? If yes, then you should think about better strategies than just calculating all the entries, which I think is what ForwardDiff does in your example (not very smart there, since in your example the Jacobian is simply J(x) = I).
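One cheap way to check is to compute the dense Jacobian of a scaled-down version of the system and count the nonzero entries. A sketch (the toy residual and size are assumptions for illustration, not the original poster's system):

```julia
using ForwardDiff

# Toy stand-in for the residual at a small size: each equation
# touches exactly two unknowns, x[i] and its cyclic neighbor
f(x) = x .^ 2 .+ circshift(x, 1)

n = 100
J = ForwardDiff.jacobian(f, rand(n))

# Fraction of nonzero entries: 2 per row here, i.e. density 2/n
density = count(!iszero, J) / length(J)
println("Jacobian density: ", density)
```

If the density stays small as n grows, the Jacobian is sparse and the coloring approach below pays off.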

If you know it is sparse, I would also suggest decomposing your function into simpler pieces if possible, and think about what the Jacobian of each piece is. For example,

  • if f(x) = M * x, then the Jacobian is J(x) = M so you should already have it,
  • if f is nonlinear but local, i.e., f(x) = g.(x) for some nonlinear scalar function g, then J(x) = Diagonal(Dg.(x)) where Dg(x) = ForwardDiff.derivative(g, x), which should cost you the equivalent of a single call to f,
  • if f is something like f(x) = M * g.(x), then J(x) = M * Diagonal(Dg.(x)) should also cost you the equivalent of 1 call to f,
  • and so on…
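To make the decomposition concrete, here is a small sketch of the third case, checking the hand-assembled Jacobian against the one ForwardDiff computes (M and g are illustrative choices, not from the thread):

```julia
using ForwardDiff, LinearAlgebra

n = 50
M = randn(n, n)                          # known linear part
g(x) = tanh(x)                           # scalar nonlinearity, applied elementwise
Dg(x) = ForwardDiff.derivative(g, x)     # its scalar derivative

f(x) = M * g.(x)

x0 = randn(n)

# Jacobian assembled by hand: roughly one call-equivalent of work
J_hand = M * Diagonal(Dg.(x0))

# Full AD Jacobian for comparison: ~n/chunk dual evaluations of f
J_ad = ForwardDiff.jacobian(f, x0)

@assert J_hand ≈ J_ad
```

The hand-assembled version avoids differentiating through the (already known) linear part entirely.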

In any case, if the Jacobian is sparse, then you should follow @matthieu's advice and use SparseDiffTools to at least reduce the number of calls to f. I believe there is enough info in the README to get you started there.
