… and types.
Why? I thought that was not normally needed. My understanding here was that functions are a bit special because of this compiler optimisation? What am I missing?
I see - thank you!
I agree that it is confusing right now. We typically say that function argument types are only used for dispatch and have no effect on performance. This is an exception to that.
It would be good to have clear documentation about this. I guess the issue linked above is about that?
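For context, a small sketch of the specialization behaviour being discussed (apply_twice is a made-up example, not code from this thread). When a function argument is only passed on to another call, Julia may compile a single non-specialized method body; annotating the argument with a type parameter forces a specialization per concrete function type:

# `f` is only forwarded to `map`, so Julia may not specialize on its type:
apply_twice(f, xs) = map(f, map(f, xs))

# The type parameter forces one compiled specialization per function `f`:
apply_twice_spec(f::F, xs) where {F} = map(f, map(f, xs))

apply_twice(sqrt, [1.0, 4.0])        # both behave the same; only codegen differs
apply_twice_spec(sqrt, [1.0, 4.0])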
I wanted to revisit this topic with a similar piece of code. The following is a minimal example from a stochastic sampling method that uses the gradient of the potential, V(x), in its computation. I tried using the where annotation and obtained rather disappointing results:
using LinearAlgebra

function run0(x0, gradV::gradVf, niterations) where {gradVf}
    X0 = similar(x0);
    gradV0 = similar(x0);
    @. X0 = x0; # copy over initial condition
    for j = 1:niterations
        gradV0 .= gradV(X0);
    end
end
function gradV(x)
    gradV = 4 * (dot(x, x) - 1) * x;
    return gradV
end
niterations = 10^6;
X0 = .5 * ones(2);
@time run0(X0, gradV, niterations);
After the second run of this, I get:
julia> @time run0(X0, gradV, niterations);
0.070732 seconds (1.00 M allocations: 91.553 MiB, 9.93% gc time)
And the memory usage scales linearly with the number of iterations. The only difference between this and the other example is that gradV is an \mathbb{R}^d \to \mathbb{R}^d function while V was an \mathbb{R}^d \to \mathbb{R} function.
But this is just because you are allocating a new vector inside the function in every iteration?
Shouldn’t preallocating the gradV0 vector (gradV0 = similar(x0);) and then using the .= assignment operation avoid creating a new vector each time? If not, how would I code this to avoid the creation of a new vector at each iteration?
gradV(X0) allocates a vector, which is then copied into gradV0. You can pass gradV0 into the function and assign to it there.
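For illustration, a minimal sketch of that idea (gradV! and run0_inplace are made-up names for mutating variants of the functions above):

using LinearAlgebra

# Mutating variant: writes the gradient into a preallocated output vector
# instead of allocating a new array on every call.
function gradV!(out, x)
    c = 4 * (dot(x, x) - 1)
    @. out = c * x   # broadcasted assignment, no new vector
    return out
end

function run0_inplace(x0, gradV!::G, niterations) where {G}
    X0 = similar(x0)
    gradV0 = similar(x0)
    @. X0 = x0
    for j = 1:niterations
        gradV!(gradV0, X0)   # fills the preallocated gradV0 in place
    end
end

With this, the loop itself should not allocate, since the only arrays involved are the two preallocated ones.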
So the C-style solution of passing a pointer to the array that is being populated with new values is the recommended approach for this scenario?
Well, if you want to assign in place to a vector then you need to have a handle to that vector.
Consider this version of the problem:
using ForwardDiff
using LinearAlgebra

function run1(x0, V::Vf, niterations) where {Vf}
    X0 = similar(x0);
    gradV0 = similar(x0);
    @. X0 = x0;
    for j = 1:niterations
        ForwardDiff.gradient!(gradV0, V, X0);
    end
end
function V(x)
    @fastmath V = (dot(x, x) - 1)^2;
    return V
end
niterations = 10^4;
X0 = .5 * ones(2);
With this I get:
julia> @time run1(X0, V, niterations);
0.059611 seconds (60.01 k allocations: 2.900 MiB, 8.54% gc time)
and when I increase the number of iterations:
julia> niterations = 10^6;
julia> @time run1(X0, V, niterations);
5.537944 seconds (6.00 M allocations: 289.917 MiB, 0.43% gc time)
Though perhaps this is a problem with ForwardDiff.
Maybe try to write an in-place gradient gradV! that doesn’t allocate?
Or use StaticArrays
was about to send the same link
That doesn’t really seem to improve things. With
function run1(x0, V::Vf, niterations) where {Vf}
    X0 = similar(x0);
    gradV0 = similar(x0);
    @. X0 = x0;
    gradcfg = ForwardDiff.GradientConfig(V, X0);
    for j = 1:niterations
        ForwardDiff.gradient!(gradV0, V, X0, gradcfg);
    end
end
function V(x)
    @fastmath V = (dot(x, x) - 1)^2;
    return V
end
niterations = 10^6;
X0 = .5 * ones(2);
I get:
julia> @time run1(X0, V, niterations);
0.477692 seconds (4.00 M allocations: 122.071 MiB, 1.46% gc time)
Removing the @fastmath macro makes the allocations go down to 1 per iteration.
To improve the run time slightly we can either create the GradientConfig with an explicit chunk size like:
gradcfg = ForwardDiff.GradientConfig(V, X0, ForwardDiff.Chunk{2}())
or put the creation of the config behind a function-barrier.
julia> @time run1(X0, V, niterations);
0.300106 seconds (1.00 M allocations: 30.518 MiB, 0.93% gc time)
I’m not sure where the last allocation in each iteration comes from.
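For completeness, a rough sketch of the function-barrier approach mentioned above (the inner helper name _gradient_loop! is made up). The concrete type of the GradientConfig is only determined at run time, so the hot loop is moved into a separate function that gets compiled specifically for that type:

using ForwardDiff, LinearAlgebra

function run1_barrier(x0, V::Vf, niterations) where {Vf}
    X0 = similar(x0)
    gradV0 = similar(x0)
    @. X0 = x0
    gradcfg = ForwardDiff.GradientConfig(V, X0)           # type not inferrable here
    _gradient_loop!(gradV0, V, X0, gradcfg, niterations)  # function barrier
end

# Inner kernel, specialized on the concrete config type.
function _gradient_loop!(gradV0, V::Vf, X0, gradcfg, niterations) where {Vf}
    for j = 1:niterations
        ForwardDiff.gradient!(gradV0, V, X0, gradcfg)
    end
    return gradV0
end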
Using a StaticArray which doesn’t need any preallocation or configs:
julia> @time run1(rand(SVector{2}), V, niterations);
0.008152 seconds (17 allocations: 928 bytes)
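For reference, a sketch of how the StaticArrays variant might look (run_static is a made-up name; the exact run1 used for the timing above is not shown). With an SVector input, ForwardDiff.gradient returns a new stack-allocated SVector, so no output buffer or GradientConfig needs to be set up:

using ForwardDiff, LinearAlgebra, StaticArrays

V(x) = (dot(x, x) - 1)^2

function run_static(x0, V::Vf, niterations) where {Vf}
    X0 = x0
    g = zero(x0)
    for j = 1:niterations
        g = ForwardDiff.gradient(V, X0)   # new SVector, no heap allocation
    end
    return g
end

@time run_static(rand(SVector{2}), V, 10^6)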