ForwardDiff and GradientConfig memory usage

gideonsimpson · April 3, 2018, 11:55pm

I noticed the following issue while trying to minimize memory allocation when using ForwardDiff.gradient!. My example looks like this:

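using ForwardDiff
using LinearAlgebra  # provides dot on Julia 0.7 and later (it was in Base on 0.6)

# Call gradient! n times, reusing one output buffer and one GradientConfig
function run0(x, f, n)
    out = similar(x)
    cfg = ForwardDiff.GradientConfig(f, x)
    for i = 1:n
        ForwardDiff.gradient!(out, f, x, cfg)
    end
end

function V(x)
    result = (dot(x, x) - 1)^2
    return result
end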

If I now compute:

julia> x100 = rand(100);
julia> @time run0(x100, V, 10);
  0.586337 seconds (350.17 k allocations: 19.781 MiB, 1.96% gc time)
julia> @time run0(x100, V, 10);
  0.000659 seconds (7 allocations: 10.656 KiB)

julia> @time run0(x100, V, 100);
  0.005968 seconds (7 allocations: 10.656 KiB)

julia> @time run0(x100, V, 1000);
  0.026512 seconds (7 allocations: 10.656 KiB)

As you can see, the allocations (7 allocations, 10.656 KiB) do not depend on the number of iterations.

If, however, I use a small vector,

julia> x5=rand(5);

julia> @time run0(x5, V, 10);
  0.116469 seconds (68.13 k allocations: 3.728 MiB)

julia> @time run0(x5, V, 10);
  0.000157 seconds (17 allocations: 1.453 KiB)

julia> @time run0(x5, V, 20);
  0.000167 seconds (27 allocations: 2.078 KiB)

julia> @time run0(x5, V, 100);
  0.000385 seconds (107 allocations: 7.078 KiB)

julia> @time run0(x5, V, 1000);
  0.002894 seconds (1.01 k allocations: 63.328 KiB)

Here the allocations grow linearly with the number of iterations: roughly one extra allocation per call to gradient!.
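To put a number on that growth without reading it off @time output, one option is to difference two @allocated measurements. This is only a sketch: run0 and V are taken from the post above (with a return value added so the result can be checked), while bytes_10, bytes_1000 and per_iteration are names made up for illustration. The value printed will depend on the Julia and ForwardDiff versions installed.

```julia
using ForwardDiff
using LinearAlgebra  # for dot

V(x) = (dot(x, x) - 1)^2

# Same loop as run0 in the post, but returning the gradient buffer.
function run0(x, f, n)
    out = similar(x)
    cfg = ForwardDiff.GradientConfig(f, x)
    for i = 1:n
        ForwardDiff.gradient!(out, f, x, cfg)
    end
    return out
end

x5 = rand(5)
run0(x5, V, 1)  # warm-up call so compilation is not counted

# Bytes allocated for two different iteration counts.
bytes_10 = @allocated run0(x5, V, 10)
bytes_1000 = @allocated run0(x5, V, 1000)

# The fixed setup cost cancels in the difference.
per_iteration = (bytes_1000 - bytes_10) / 990
println("approximate bytes allocated per gradient! call: ", per_iteration)
```

A value near zero would mean the loop body itself is allocation-free and only the setup (out and cfg) allocates.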

cortner · April 4, 2018, 7:11pm

Bumping this. I would love to know what is going on here as well.

kristoffer.carlsson · April 4, 2018, 8:04pm

Ok, so funnily enough this is exactly the problem we discussed before regarding not parameterizing on function arguments.

https://github.com/JuliaDiff/ForwardDiff.jl/pull/315 removes the allocations (and makes it a bit faster).

julia> x5=rand(5);

julia> @time run0(x5, V, 10);
  0.000094 seconds (19 allocations: 1.703 KiB)

julia> @time run0(x5, V, 10);
  0.000047 seconds (7 allocations: 848 bytes)

julia> @time run0(x5, V, 20);
  0.000039 seconds (7 allocations: 848 bytes)

julia> @time run0(x5, V, 100);
  0.000104 seconds (7 allocations: 848 bytes)
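For background on "parameterizing on function arguments": Julia's performance tips note that the compiler may avoid specializing a method on a Function argument that is only passed through to other calls, and that a type parameter forces specialization. The sketch below applies that general technique to the caller. It is not a description of what the pull request changes inside ForwardDiff, and run_spec is a hypothetical name.

```julia
using ForwardDiff
using LinearAlgebra  # for dot

V(x) = (dot(x, x) - 1)^2

# f::F with a type parameter asks the compiler to specialize this method
# on the concrete type of the function that is passed in.
function run_spec(x, f::F, n) where {F}
    out = similar(x)
    cfg = ForwardDiff.GradientConfig(f, x)
    for i = 1:n
        ForwardDiff.gradient!(out, f, x, cfg)
    end
    return out
end

x5 = rand(5)
out = run_spec(x5, V, 10)

# Compare with the analytic gradient 4 (x⋅x - 1) x.
println(out ≈ 4 .* (dot(x5, x5) - 1) .* x5)
```

Whether this alone removes the per-iteration allocation depends on the ForwardDiff version in use, so it is worth checking with @time or @allocated.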

gideonsimpson · April 4, 2018, 9:41pm

So what do I need to do (I’m still a bit of a novice)? I tried Pkg.checkout("ForwardDiff") and then running with that, but I’m still having the same issues with my code.

kristoffer.carlsson · April 4, 2018, 9:43pm

You need to check out the name of that branch since it isn’t merged. Pkg.checkout("ForwardDiff", "kc/f_spec") probably.

gideonsimpson · April 4, 2018, 9:48pm

Got it, thanks.

gideonsimpson · April 5, 2018, 1:35am

I’m noticing (possibly) related issues with the following function:

function V(x)
       result = 0.0;
       for i in 1:length(x)
              result+= (x[i]^2-1)^2
       end
       return result
end

The setup is otherwise the same, though I am now using the branch from your pull request. With this V(x), I get linear growth in allocations. However, if I replace result = 0.0 with result = zero(eltype(x)), everything is well behaved again. Is this just an example of a need for explicit typing, or is there something else at work here?

rdeits · April 5, 2018, 2:13am

Yes, this is just a classic type instability: the eltype of x is a ForwardDiff.Dual, so in your code result changes from a Float64 to a ForwardDiff.Dual. It’s the same situation as https://docs.julialang.org/en/stable/manual/performance-tips/#Avoid-changing-the-type-of-a-variable-1 but with a different set of types.
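One way to see the instability without running a benchmark is to ask the compiler what return type it infers for each variant when the input holds dual numbers. This is a sketch: V_literal and V_eltype are hypothetical names for the two versions discussed, and D is a Dual type built by hand (with the default tag), not the exact type ForwardDiff.gradient! would construct.

```julia
using ForwardDiff

# Version from the post: the accumulator starts as a Float64 literal.
function V_literal(x)
    result = 0.0
    for i in 1:length(x)
        result += (x[i]^2 - 1)^2
    end
    return result
end

# Same loop, but the accumulator starts as a zero of the element type.
function V_eltype(x)
    result = zero(eltype(x))
    for i in 1:length(x)
        result += (x[i]^2 - 1)^2
    end
    return result
end

# A dual number type with one partial, standing in for what ForwardDiff passes.
D = typeof(ForwardDiff.Dual(1.0, 1.0))

# Inferred return types for a Vector{D} argument.
rt_literal = first(Base.return_types(V_literal, (Vector{D},)))
rt_eltype = first(Base.return_types(V_eltype, (Vector{D},)))

println("0.0 accumulator:             ", rt_literal, " (concrete: ", isconcretetype(rt_literal), ")")
println("zero(eltype(x)) accumulator: ", rt_eltype, " (concrete: ", isconcretetype(rt_eltype), ")")
```

A non-concrete inferred type for the first variant reflects that result may be either a Float64 or a Dual depending on whether the loop has run.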

gideonsimpson · April 5, 2018, 4:33pm

I am a bit surprised that 0.0 is inadequate for ensuring that result is interpreted as a floating point number. Regardless, is the use of eltype in this example the stylistically favored solution?

kristoffer.carlsson · April 5, 2018, 4:36pm

0.0 is a Float64, which is a different type from a Dual number. Using eltype works well, or you can parameterize the method on the element type:

function V(x::Vector{T}) where {T}
    result = zero(T)
    ....
end
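For reference, here is one way the elided body could be filled in, assuming it is the same loop as in the earlier V. The gradient is compared against the hand-derived expression 4 x_i (x_i^2 - 1); the names g and g_exact and the test vector are illustrative only.

```julia
using ForwardDiff

# T is Float64 for a plain call and a Dual type when ForwardDiff
# evaluates the function, so zero(T) always matches the elements of x.
function V(x::Vector{T}) where {T}
    result = zero(T)
    for i in 1:length(x)
        result += (x[i]^2 - 1)^2
    end
    return result
end

x = [0.5, 1.0, 2.0]
g = ForwardDiff.gradient(V, x)

# Analytic gradient of sum((x_i^2 - 1)^2), for comparison.
g_exact = 4 .* x .* (x .^ 2 .- 1)
println(g ≈ g_exact)
```

Annotating the argument as Vector{T} restricts the method to plain vectors; AbstractVector{T} would be a looser alternative if other array types need to be accepted.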

rdeits · April 5, 2018, 4:46pm

0.0 is sufficient to ensure that result starts out as a Float64. The problem is that when you use ForwardDiff, your function is called not with a Float64 argument but with a special ForwardDiff.Dual number. Try printing eltype(x) inside your function when you compute its gradient to see that.

That’s why the recommendation is to use zero(eltype(x)) which will just do the right thing (Float64 for Float64, Dual for Dual, etc.) at no additional cost.
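The suggestion to look at eltype(x) can be tried with a small sketch like the following. V_logged and seen_eltypes are hypothetical names; the function records the element type it receives instead of printing it, so the plain call and the ForwardDiff call can be compared afterwards.

```julia
using ForwardDiff

# Element types observed on each call, in call order.
const seen_eltypes = DataType[]

function V_logged(x)
    push!(seen_eltypes, eltype(x))
    result = zero(eltype(x))  # matches whatever element type arrives
    for i in 1:length(x)
        result += (x[i]^2 - 1)^2
    end
    return result
end

x = rand(5)
V_logged(x)                        # plain call with Float64 elements
ForwardDiff.gradient(V_logged, x)  # call made through ForwardDiff

for T in seen_eltypes
    println(T, "  =>  zero has type ", typeof(zero(T)))
end
```

The first recorded type should be Float64 and the later ones a ForwardDiff.Dual type, with zero returning a value of the same type in each case.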
