ForwardDiff and GradientConfig memory usage


I noticed the following issue when trying to minimize memory allocation while using ForwardDiff.gradient!. My example looks like:

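using ForwardDiff
# On Julia 0.7 and later, `dot` also needs: using LinearAlgebra

# Compute the gradient of f at x n times, reusing the output buffer and config.
function run0(x, f, n)
    out = similar(x)
    cfg = ForwardDiff.GradientConfig(f, x)
    for i = 1:n
        ForwardDiff.gradient!(out, f, x, cfg)
    end
end

function V(x)
    result = (dot(x, x) - 1)^2
    return result
end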

If I now compute:

julia> x100 = rand(100);
julia> @time run0(x100, V, 10);
  0.586337 seconds (350.17 k allocations: 19.781 MiB, 1.96% gc time)
julia> @time run0(x100, V, 10);
  0.000659 seconds (7 allocations: 10.656 KiB)

julia> @time run0(x100, V, 100);
  0.005968 seconds (7 allocations: 10.656 KiB)

julia> @time run0(x100, V, 1000);
  0.026512 seconds (7 allocations: 10.656 KiB)

and you see that the memory usage is insensitive to the number of iterations.

If, however, I use a small vector,

julia> x5=rand(5);

julia> @time run0(x5, V, 10);
  0.116469 seconds (68.13 k allocations: 3.728 MiB)

julia> @time run0(x5, V, 10);
  0.000157 seconds (17 allocations: 1.453 KiB)

julia> @time run0(x5, V, 20);
  0.000167 seconds (27 allocations: 2.078 KiB)

julia> @time run0(x5, V, 100);
  0.000385 seconds (107 allocations: 7.078 KiB)

julia> @time run0(x5, V, 1000);
  0.002894 seconds (1.01 k allocations: 63.328 KiB)

I see growth in memory usage with the number of iterations.


bumping this - would love to know as well what is going on here.


Ok, so funnily enough this is exactly the problem we discussed before regarding not parameterizing on function arguments. An unmerged branch that addresses it removes the allocations (and makes it a bit faster):

julia> x5=rand(5);

julia> @time run0(x5, V, 10);
  0.000094 seconds (19 allocations: 1.703 KiB)

julia> @time run0(x5, V, 10);
  0.000047 seconds (7 allocations: 848 bytes)

julia> @time run0(x5, V, 20);
  0.000039 seconds (7 allocations: 848 bytes)

julia> @time run0(x5, V, 100);
  0.000104 seconds (7 allocations: 848 bytes)
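
For readers unfamiliar with the phrase "parameterizing on function arguments", here is a minimal sketch of the general Julia idiom, assuming that is what is meant here. Julia may skip specializing a method on a function argument that is only passed through to another call; adding a type parameter forces the specialization. The names accumulate!, run_passthrough and run_specialized are made up for illustration and are not part of ForwardDiff, and this is not a claim about what the branch itself changes.

```julia
# Inner kernel that actually calls f on each element.
function accumulate!(out, f, x)
    for i in eachindex(x)
        out[i] += f(x[i])
    end
    return out
end

# f is only passed through here, so Julia may choose not to
# specialize this method on typeof(f).
function run_passthrough(out, f, x, n)
    for _ in 1:n
        accumulate!(out, f, x)
    end
    return out
end

# The type parameter F forces specialization on the concrete function type.
function run_specialized(out, f::F, x, n) where {F}
    for _ in 1:n
        accumulate!(out, f, x)
    end
    return out
end

x = [0.1, 0.2, 0.3]
out_a = run_passthrough(zeros(3), sin, x, 4)
out_b = run_specialized(zeros(3), sin, x, 4)
```

Both versions compute the same result; comparing them with @time for increasing n is one way to check whether the pass-through version allocates per iteration on your Julia version.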


So what do I need to do (I’m still a bit of a novice)? I tried Pkg.checkout("ForwardDiff") and then running with that, but I’m still having the same issues with my code.


You need to check out that branch by name, since it isn’t merged. Pkg.checkout("ForwardDiff", "kc/f_spec") probably.


Got it, thanks.


I’m noticing (possibly) related issues with the following function:

function V(x)
    result = 0.0
    for i in 1:length(x)
        result += (x[i]^2 - 1)^2
    end
    return result
end

The setup is otherwise the same, though I am now using your suggested pull. But using this V(x), I get linear growth in allocations. However, if I swap result=0.0 for result=zero(eltype(x)), everything is well behaved again. Is this just an example of a need for explicit typing, or is there something else at work here?


Yes, this is just a classic type instability: the eltype of x is a ForwardDiff.Dual, so in your code result changes from a Float64 to a ForwardDiff.Dual inside the loop. It’s the same kind of situation, but with a different set of types.
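
To make the difference concrete, here is a small sketch that puts the two variants side by side. The names V_unstable and V_stable are only for illustration. Both return the same gradient; the difference is that in V_unstable the variable result starts as a Float64 and becomes a Dual after the first iteration.

```julia
using ForwardDiff

# result starts as Float64 and turns into a Dual once a Dual is added to it.
function V_unstable(x)
    result = 0.0
    for i in 1:length(x)
        result += (x[i]^2 - 1)^2
    end
    return result
end

# result has the element type of x from the start.
function V_stable(x)
    result = zero(eltype(x))
    for i in 1:length(x)
        result += (x[i]^2 - 1)^2
    end
    return result
end

x = rand(5)
g_unstable = ForwardDiff.gradient(V_unstable, x)
g_stable = ForwardDiff.gradient(V_stable, x)
```

The results agree, so the instability costs performance rather than correctness. Running @code_warntype on a call with a vector of Dual numbers is one way to see the non-concrete type of result.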


I am a bit surprised that 0.0 is inadequate for ensuring that it was interpreted as a floating point number. Regardless, is the use of eltype in this example the stylistically favored solution?


0.0 is a Float64. That is different from a Dual number. Using eltype works well, or you can take the element type from the signature:

function V(x::Vector{T}) where {T}
    result = zero(T)
    # ... rest of the function as before ...
end
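
Written out in full, the parametric version might look like the sketch below. The name V_param is hypothetical, and widening the signature to AbstractVector is my own assumption, made so that views and other array types are accepted as well.

```julia
# The element type T is taken from the signature, so result starts
# with the same type as the elements of x.
function V_param(x::AbstractVector{T}) where {T}
    result = zero(T)
    for i in eachindex(x)
        result += (x[i]^2 - 1)^2
    end
    return result
end

v_float = V_param([0.0, 2.0])      # Float64 in, Float64 out
v_ratio = V_param([1//1, 2//1])    # Rational in, Rational out
```

The same method body works for any element type that supports zero, multiplication and addition, which is why it also works for Dual numbers.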


0.0 is sufficient to ensure that result starts out as a Float64. The problem is that when you use ForwardDiff, your function is called not with a Float64 argument but with a special ForwardDiff.Dual number. Try printing eltype(x) inside your function when you compute its gradient to see that.

That’s why the recommendation is to use zero(eltype(x)) which will just do the right thing (Float64 for Float64, Dual for Dual, etc.) at no additional cost.
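
A quick way to try the suggestion above is the following sketch. It records eltype(x) on every call instead of printing it, so the types can be inspected afterwards. V_logged and seen_eltypes are hypothetical names for this illustration.

```julia
using ForwardDiff

# Collects the element type seen on each call (illustration only).
const seen_eltypes = Type[]

function V_logged(x)
    push!(seen_eltypes, eltype(x))
    result = zero(eltype(x))
    for i in 1:length(x)
        result += (x[i]^2 - 1)^2
    end
    return result
end

V_logged(rand(3))                        # plain call
ForwardDiff.gradient(V_logged, rand(3))  # call through ForwardDiff

println(seen_eltypes)
```

The first entry is Float64 and the later ones are ForwardDiff.Dual types, which is why a hard-coded 0.0 does not match the element type during differentiation.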