I am trying to use ForwardDiff to compute the jacobian of a system of equations that I anticipate will be very large. Below is a very simplified working code that illustrate what I am trying to do.

When I run the code below I get the following as output from @time:

20.089338 seconds (128.02 k allocations: 46.572 GiB, 4.09% gc time)

Since I am new in Julia, it is likely that I am doing something wrong. Could someone help with the following questions:

Would you expect that ForwardDiff.jacobian takes 20s to for the case of 50K equations ?

Could you let me know if there is anything that I could do to optimize the code below ?

Thanks in advance for the help.

using ForwardDiff
using ForwardDiff: Chunk
function ResidualEquations( X::AbstractArray )
1*X
end
p0 = rand(50000)
f(x) = ResidualEquations(x)
println("\n \t ... Creating Config ... ")
jconfig = ForwardDiff.JacobianConfig( f , p0, Chunk{2}())
println("\n \t ... Evaluating jacobian ... ")
@time jac = ForwardDiff.jacobian( f, p0, jconfig )::Array{Float64,2}

Thanks for reply. I think my Jacobian is sparse, but it is not clear from your answer how using SparseDiffTools could help ? Could you please provide an example ? Thanks in advance !

You should definitely figure out if it is sparse. Are most \frac{\partial f_i(x)}{\partial x_j} entries of the Jacobian equal to zero? If yes, then you should think about better strategies than just calculating all the entries, which I think is what ForwardDiff does in your example (which is not very smart because in your example the Jacobian is simply J(x) = I).

If you know it is sparse, I would also suggest decomposing your function into simpler pieces if possible, and think about what the Jacobian of each piece is. For example,

if f(x) = M * x, then the Jacobian is J(x) = M so you should already have it,

if f is nonlinear but local, i.e., f(x) = g.(x) for some nonlinear scalar function g, then J(x) = Diagonal(Dg.(x)) where Dg(x) = ForwardDiff(g, x), which should cost you the equivalent of a single call to f,

if f is something like f(x) = M * g.(x), then J(x) = M * Diagonal(Dg.(x)) should also cost you the equivalent of 1 call to f,

and so on…

In any case if f is sparse, then you should follow @matthieu’s advice and use SparseDiffTools to at least reduce the number of calls to f. I believe there is enough info on the ReadMe to get you started there.