Also note that my package DifferentiationInterface.jl is meant to make your life easier in such situations, by hiding away the details of preallocation and performance optimization. Take a look at the tutorial if you’re interested.
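For reference, a minimal sketch of what using it looks like (assuming a recent DifferentiationInterface version, where backend objects like AutoForwardDiff come from ADTypes and are re-exported, and preparation precedes the backend in the argument list):

```julia
using DifferentiationInterface
import ForwardDiff

f(x) = sum(x) .* ones(length(x))  # same toy function as below, out-of-place

backend = AutoForwardDiff()
x = ones(10)

# one-shot Jacobian
J = jacobian(f, backend, x)

# preparation hides the JacobianConfig / preallocation details for repeated calls
prep = prepare_jacobian(f, backend, x)
J2 = jacobian(f, prep, backend, x)
```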
My first tests with foo!() instead of foo() fail: the resulting Jacobian is a zero matrix.
Should ForwardDiff.jacobian() be replaced by ForwardDiff.jacobian!() as well?
Can you provide an MWE?
Here is an attempt:
function foo!(y, x)
    y = sum(x) * ones(length(x))
    return y
end
x1 = ones(10); y1 = zeros(length(x1))
y1 = foo!(y1,x1)
ForwardDiff.jacobian(foo!, y1, x1)
which yields the output
10×10 Matrix{Float64}:
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
The issue here is that you’re not modifying the argument y passed to the function in place. You’re creating a new object (which happens to have the same name) and filling it with the desired output. What you need to do is mutate the very y that the function receives.
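To see the difference in isolation, here is a tiny standalone demonstration (the names rebind! and mutate! are just for illustration):

```julia
function rebind!(y, x)
    y = fill(sum(x), length(x))  # rebinds the local name y; the caller's array is untouched
    return y
end

function mutate!(y, x)
    y .= sum(x)  # writes into the caller's array
    return y
end

x = ones(3)
y = zeros(3)
rebind!(y, x)
# y is still [0.0, 0.0, 0.0]
mutate!(y, x)
# y is now [3.0, 3.0, 3.0]
```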
A simple fix is the following:
function foo!(y, x)
    # notice the dot to denote elementwise assignment
    y .= sum(x) * ones(length(x))
    return y
end
but a more clever one would be
function foo!(y, x)
    y .= sum(x)
end
You may want to take a look at the docs page on performance tips to improve such in-place functions and perhaps avoid allocations altogether.
Thx again.
I am now obtaining the correct Jacobian with in-place functions, but the improvement in memory allocations remains marginal.
Version 1 - Using jacobian (no bang)
function foo!(y, x)
    # need to mutate the very y that the function receives
    # notice the dot to denote elementwise assignment
    y .= sum(x) * ones(length(x))
    return y
end
x1 = ones(10); y1 = zeros(length(x1))
y1 = foo!(y1,x1)
cfg = ForwardDiff.JacobianConfig(foo!,y1,x1)
@btime ForwardDiff.jacobian(foo!, y1, x1, cfg)
resulting in
528.361 ns (4 allocations: 2.88 KiB)
Version 2 - Using jacobian! (with bang)
x1 = ones(10); y1 = zeros(length(x1))
y1 = foo!(y1,x1)
cfg = ForwardDiff.JacobianConfig(foo!,y1,x1)
@btime ForwardDiff.jacobian!(DiffResults.JacobianResult(y1,x1),foo!,y1,x1,cfg)
resulting in
600.000 ns (6 allocations: 3.05 KiB)
More soon.
The call to ones allocates a new array. It is unnecessary here, as broadcasting the assignment takes care of writing into all of y anyway:
function baz!(y, x)
    y .= sum(x)
end
# Better to interpolate global variables when benchmarking to avoid spurious allocations
julia> @btime foo!($y1,$x1);
143.241 ns (2 allocations: 288 bytes)
julia> @btime baz!($y1,$x1);
16.323 ns (0 allocations: 0 bytes)
Thx!
That drastically reduces the number of allocations (hurray!).
The CPU time, however, remains roughly equal to that of the earlier, more naive implementations (see Post #18 above).
function foo!(y, x)
    # need to mutate the very y that the function receives
    # notice the dot to denote elementwise assignment
    y .= sum(x)
    return y
end
x1 = ones(10); y1 = zeros(length(x1))
y1 = foo!(y1,x1)
cfg = ForwardDiff.JacobianConfig(foo!,y1,x1)
@btime ForwardDiff.jacobian(foo!, $y1, $x1, cfg)
432.161 ns (2 allocations: 1.75 KiB)
x1 = ones(100); y1 = zeros(length(x1))
y1 = foo!(y1,x1)
cfg = ForwardDiff.JacobianConfig(foo!,y1,x1)
@btime ForwardDiff.jacobian(foo!, $y1, $x1, cfg)
24.792 μs (5 allocations: 80.30 KiB)
x1 = ones(1000); y1 = zeros(length(x1))
y1 = foo!(y1,x1)
cfg = ForwardDiff.JacobianConfig(foo!,y1,x1)
@btime ForwardDiff.jacobian(foo!, $y1, $x1, cfg)
2.425 ms (3 allocations: 7.63 MiB)
Try it again while interpolating (putting a $ on) the cfg object too, since it is also a global variable.
Hardly any change:
287.826 ns (1 allocation: 896 bytes) # problem size 10
24.666 μs (4 allocations: 79.08 KiB) # problem size 100
2.399 ms (2 allocations: 7.63 MiB) # problem size 1000
For optimal performance, you also want to pre-allocate the output matrix:
using ForwardDiff, BenchmarkTools
foo!(y, x) = y .= sum(x)
function benchmark_trivial_jacobian(n)
    x = ones(n)
    y = similar(x)
    J = similar(y, length(y), length(x))
    cfg = ForwardDiff.JacobianConfig(foo!, y, x)
    @btime ForwardDiff.jacobian(foo!, $y, $x, $cfg)
    @btime ForwardDiff.jacobian!($J, foo!, $y, $x, $cfg)
    return nothing
end
But it doesn’t change much for larger problems, except for memory use:
julia> benchmark_trivial_jacobian(10)
370.061 ns (1 allocation: 896 bytes)
304.813 ns (0 allocations: 0 bytes)
julia> benchmark_trivial_jacobian(100)
27.706 μs (19 allocations: 89.31 KiB)
26.636 μs (17 allocations: 11.14 KiB)
julia> benchmark_trivial_jacobian(1000)
3.515 ms (169 allocations: 7.73 MiB)
3.423 ms (167 allocations: 104.89 KiB)
Thanks. I much appreciate all your input.
I would like to better understand why the version with all the bells and whistles (buffered input, preallocated output) is not faster than the more naive implementation.
Stated differently, I wonder why the reduction in the number of allocations does not result in a reduction of CPU time.
Is this due to the Jacobian being dense? Is looking into a MWE with a sparse Jacobian worthwhile here?
At some point you’re limited by
- the time it takes your function to run
- the overhead caused by autodiff
If you know for a fact that your Jacobian is sparse, then yes you can accelerate it significantly by making use of that knowledge. See the DifferentiationInterface sparse tutorial for an example.
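As a rough sketch of the sparse approach, following the pattern from that tutorial (it assumes the SparseConnectivityTracer and SparseMatrixColorings companion packages for sparsity detection and coloring):

```julia
using DifferentiationInterface
import ForwardDiff
using SparseConnectivityTracer, SparseMatrixColorings

# sparse backend: detect the sparsity pattern, then color columns
# so that several Jacobian columns are computed per dual-number pass
backend = AutoSparse(
    AutoForwardDiff();
    sparsity_detector = TracerSparsityDetector(),
    coloring_algorithm = GreedyColoringAlgorithm(),
)

f(x) = x .^ 2  # toy function with a diagonal (hence very sparse) Jacobian
x = ones(4)
J = jacobian(f, backend, x)  # returned as a sparse matrix
```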
It may also be worth pondering whether you need a full Jacobian at all. For instance, if you compute J only to take its products with some vector, there are faster ways to achieve this.
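For example, with ForwardDiff alone, the product J(x) * v is a directional derivative, computable in a single dual-number pass without ever materializing J (the function f below is just a placeholder):

```julia
using ForwardDiff

f(x) = [sum(x), prod(x)]  # placeholder function

x = [1.0, 2.0, 3.0]
v = [1.0, 0.0, 0.0]

# J(x) * v equals d/dt f(x + t*v) evaluated at t = 0
jvp = ForwardDiff.derivative(t -> f(x .+ t .* v), 0.0)
```

This costs one dual-number evaluation of f, versus on the order of length(x) such evaluations (in chunks) for the full Jacobian.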
Thanks again.
The Jacobian in the application I target is dense, and the application requires solving a linear system with it.
Does using ForwardDiff on a function that calls hcubature with a buffer as an argument cause particular challenges?
Without the buffer, things work fine; with it, I get a type conversion error.
You’re gonna need PreallocationTools.jl to automatically generate buffers with the right element type.
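A minimal sketch of how that could look (bar! and the buffer size are hypothetical, just to illustrate the DiffCache/get_tmp pattern):

```julia
using ForwardDiff, PreallocationTools

# hypothetical function that needs a scratch buffer
function bar!(y, x, cache)
    buf = get_tmp(cache, x)  # returns a buffer whose eltype matches x (Dual-aware)
    buf .= 2 .* x
    y .= sum(buf)
    return y
end

x = ones(5)
y = zeros(5)
cache = DiffCache(zeros(5))  # preallocates both a Float64 and a Dual buffer

J = ForwardDiff.jacobian((y, x) -> bar!(y, x, cache), y, x)
```

The same cache then works for both the plain evaluation (Float64 input) and the differentiated call (Dual input), without the type conversion error.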
Aha! Will check it out. Thx again.