ForwardDiff with Matrix of Vector as input argument

Greetings. Consider the small function

function foo(v)
    return [v; v] 
end 

Then the following works

using ForwardDiff

v0 = [1;1]
out = ForwardDiff.jacobian(v -> foo(v), v0)

The following works as well

A0 = [1 2; 3 4]
out = ForwardDiff.jacobian(A -> foo(A), A0)

The following, however, results in a MethodError

T0 = [ [1.,1.] for i=1:2, j=1:2]
out = ForwardDiff.jacobian(T -> foo(T), T0)

How can I extend the method definitions to treat the latter case?

Thx!

Hi @Domenico_Lahaye! ForwardDiff expects the input and output of your function to be an array of numbers. You could use stack to make sure of that, but you have to be careful with the shapes and what they mean.
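
For instance, a minimal sketch of the stack route for the failing case above (the unstack helper is something you would write yourself, not a library function; stack needs Julia 1.9 or later):

using ForwardDiff

T0 = [[1.0, 1.0] for i in 1:2, j in 1:2]
X0 = stack(T0)  # 2×2×2 Array{Float64,3}; X0[:, i, j] == T0[i, j]

# Rebuild the Matrix-of-Vectors inside the closure so that foo still
# sees the structure it expects.
unstack(X) = [X[:, i, j] for i in axes(X, 2), j in axes(X, 3)]

out = ForwardDiff.jacobian(X -> stack(foo(unstack(X))), X0)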

Clear, thx.

Essentially, I am trying to solve a vector-valued PDE. I wish to form a Jacobian in which the elements are 2-by-2 block matrices.

I will check how far I get by stacking the input for ForwardDiff and un-stacking its output.

More here later. Many thanks for now.


An alternative is ComponentArrays.jl, but honestly I would just recommend stacking everything into a 3-dimensional tensor. Note that the jacobian returned by ForwardDiff will always be a 2d matrix though; it essentially flattens the input and output into vectors.
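
A quick sketch of what I mean by the flattening (the doubling function is just a stand-in):

using ForwardDiff

# Input and output are 2×2×2 tensors (8 entries each),
# so the Jacobian comes back as an 8×8 matrix.
X0 = ones(2, 2, 2)
J = ForwardDiff.jacobian(X -> 2 .* X, X0)
size(J)  # (8, 8)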

Or merely do

T01 = [ 1 1; 1 1]
T02 = [ 1 1; 1 1]
out1 = ForwardDiff.jacobian(T -> foo(T), T01)
out2 = ForwardDiff.jacobian(T -> foo(T), T02)
[out1 out2]

Thank you so much!

You can also just call

ForwardDiff.jacobian(foo, T01)

True. That, however, no longer works in the application I target, as the function foo() has more than one input argument.
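
So I keep the closure form, e.g. (a sketch with a made-up second argument p):

foo2(T, p) = p .* [T; T]
p0 = 2.0
out = ForwardDiff.jacobian(T -> foo2(T, p0), T01)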


Final remarks: if performance is a bottleneck for you, you may want to look into:

  • StaticArrays.jl if your input is small (see the sketch just below)
  • preallocation of a JacobianConfig
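
For the first point, a minimal sketch (the size-2 SVector is an arbitrary choice):

using ForwardDiff, StaticArrays

v0 = SVector(1.0, 1.0)
ForwardDiff.jacobian(v -> vcat(v, v), v0)  # static 4×2 Jacobian, no heap allocations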

I do hate to disappoint you. Developments are too premature for remarks to be final :grinning:

1/ Splitting T0 into T01 and T02 does away with StaticArrays.jl in the original approach. There might be a way to bring StaticArrays.jl back in;

2/ Yes, definitely, I do need to look into config.

Note that when you use SArrays you don’t need a configuration; jacobian as-is works best.
Maybe you don’t need to split and can create a single SArray of dimension 3?
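
Something like this, for instance (a sketch, where 2 .* T is a stand-in for your actual function):

using ForwardDiff, StaticArrays

T0s = @SArray ones(2, 2, 2)
ForwardDiff.jacobian(T -> 2 .* T, T0s)  # 8×8 static matrix, no config needed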

How do you make StaticArrays work with dual numbers?

They should work out of the box. Do you have a MWE?

Good to know that it should work out of the box.

No MWE directly available. Will make one in case troubles persist.

Thx!

Good news: I have now succeeded in differentiating the entire code!

Now debugging and optimization can start.


I discovered that

using ForwardDiff, StaticArrays

function foo(x,y)
    SVector{3,Float64}(x,y,0) # observe type annotation 
end 
x0 = 1.
y0 = 2. 
ForwardDiff.derivative(x -> foo(x,y0),x0) 

results in a MethodError, and that

function foo(x,y)
    SVector{3}(x,y,0) # observe absence of type annotation 
end 
x0 = 1.
y0 = 2. 
ForwardDiff.derivative(x -> foo(x,y0),x0) 

works.

Is this known / generally accepted?

Did I misread the documentation?

Yes, it is known, because your first function coerces the inputs into a Float64 type, whereas ForwardDiff needs more generality to function; see Limitations of ForwardDiff · ForwardDiff. You need to allow arbitrary real types to flow through your function, including any containers you generate.


Thx!

Can you please point me to examples using ForwardDiff.JacobianConfig()?

I don’t have any in mind apart from the API reference, but if you’re working solely with SArrays then configuration should be unnecessary. Do you have a complete MWE that I could maybe help you optimize?

Thx.

StaticArrays are used in lower levels of the code only.

Here is a MWE.

Given two functions with deliberately dense Jacobians

function foo(x)
    return sum(x)*ones(length(x))
end 

function gnu(x)
    return sum(x.*x)*ones(length(x))
end

and given three input vectors of deliberately larger dimension

x1 = ones(10)
x2 = ones(100)
x3 = ones(1000)

the following two questions arise:

Q1: How do I avoid an increasing number of allocations with increasing problem size?

Here is what I currently see:

using BenchmarkTools

@btime ForwardDiff.jacobian(x -> foo(x), x1);
840.726 ns (6 allocations: 4.58 KiB)

@btime ForwardDiff.jacobian(x -> foo(x), x2);
26.125 μs (23 allocations: 191.45 KiB)

@btime ForwardDiff.jacobian(x -> foo(x), x3);
2.445 ms (258 allocations: 16.72 MiB)

Q2: How do I reuse the buffer created for the Jacobian of the function foo() in order to compute the Jacobian of the function gnu()?

Is my understanding correct that the buffer can be shared between the functions foo() and gnu()?

Thx!

As for Q1: you can’t, because your functions both allocate new vectors. ForwardDiff computes Jacobians by running the function once per input dimension[1], so the number of allocations will necessarily scale with the input.

Still, you can greatly increase efficiency by:

  1. Writing your functions in-place: f!(y, x) instead of y = f(x)
  2. Pre-allocating a Jacobian matrix J
  3. Pre-allocating a JacobianConfig and passing it to jacobian! along with J

The idea is that steps 2 and 3 only need to be done once, and then you can compute many derivatives reusing the same memory.
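
Here is a minimal sketch of all three steps, with an assumed in-place variant foo! of your foo:

using ForwardDiff

# In-place variant of foo: write into y, allocate nothing.
function foo!(y, x)
    y .= sum(x)
    return y
end

x = ones(100)
y = similar(x)                                # output buffer
J = Matrix{Float64}(undef, 100, 100)          # pre-allocated Jacobian
cfg = ForwardDiff.JacobianConfig(foo!, y, x)  # pre-allocated config

ForwardDiff.jacobian!(J, foo!, y, x, cfg)     # reuses all the buffers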

As for Q2: you can’t, because ForwardDiff uses a custom tag for each function to identify where the Dual numbers came from. You can disable it, but I would advise against it.
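
If you really do want one shared config, I believe you can construct it without a target function, which disables the tag checking:

cfg = ForwardDiff.JacobianConfig(nothing, x1)  # untagged config, usable with any function
ForwardDiff.jacobian(foo, x1, cfg)
ForwardDiff.jacobian(gnu, x1, cfg)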


  1. Not entirely true, the chunking mechanism means you can actually divide the input dimension by the chunk size. ↩︎
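
For instance, to pick the chunk size explicitly (10 here is an arbitrary choice):

cfg = ForwardDiff.JacobianConfig(foo, x2, ForwardDiff.Chunk{10}())
ForwardDiff.jacobian(foo, x2, cfg)  # roughly 10 passes for the 100-dimensional input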
