Hello,
I have two snippets that do exactly the same thing, shown below. The second one uses a slice of a multi-dimensional array instead of x. The first is substantially faster than the second and makes fewer allocations. What is the reason for this?
Thanks in advance.
—
x = [0.76,-1.13,1.33];
@time for i=1:450000
dot(x,[1.,2.,3.])
end
0.034094 seconds (900.00 k allocations: 96.130 MB, 11.11% gc time)
—
X = zeros(3,3,3,100);
X[:,1,1,1] = x;
@time for i=1:450000
dot(X[:,1,1,1],[1.,2.,3.])
end
0.226584 seconds (2.70 M allocations: 130.463 MB, 3.27% gc time)
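This might be because indexing with a slice such as X[:,1,1,1] allocates memory for a copy on every iteration, but it's hard to tell for sure since you seem to be benchmarking in global scope, where the compiler cannot infer the types of non-constant globals and the timings become unreliable.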
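As a rough sketch of what moving the loop out of global scope could look like: this assumes Julia 0.7 or later, where dot lives in LinearAlgebra, and sumdots is a hypothetical name used only for illustration. The slice inside the loop still makes a copy, so this isolates the global-scope effect rather than removing the allocation.

```julia
using LinearAlgebra  # provides dot on Julia 0.7 and later

# Hypothetical helper: running the loop inside a function lets the compiler
# see the concrete types of X and y instead of untyped globals.
function sumdots(X, y, n)
    s = 0.0
    for i in 1:n
        s += dot(X[:, 1, 1, 1], y)  # the slice still copies 3 elements into a new Vector
    end
    return s
end

X = zeros(3, 3, 3, 100)
X[:, 1, 1, 1] = [0.76, -1.13, 1.33]
y = [1.0, 2.0, 3.0]

sumdots(X, y, 1)              # first call triggers compilation
@time sumdots(X, y, 450_000)  # second call measures run time only
```

Note that y is created once outside the loop here, so the literal [1.,2.,3.] is not reallocated on every iteration as in the original snippets.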
To add to what John said, I recommend checking out the BenchmarkTools.jl package for a more comprehensive (and easy to use!) approach to benchmarking.
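using LinearAlgebra  # provides dot on Julia 0.7 and later
using BenchmarkTools
# foo and bar slice X, which copies the 3 elements into a new array on each call
foo(x,y) = dot(x[:,1,1,1], y)
bar(x,y) = @inbounds dot(x[:,1,1,1], y)
# baz receives the vector directly, so nothing is copied
baz(x,y) = dot(x, y)
x = [0.76,-1.13,1.33];
X = zeros(3,3,3,100);
X[:,1,1,1] = x;
y = [1.,2,3]
# $ interpolates the globals so the benchmark sees their concrete types
@benchmark foo($X,$y)
@benchmark bar($X,$y)
@benchmark baz($x,$y)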
julia> @benchmark foo($X,$y)
BenchmarkTools.Trial:
memory estimate: 128.00 bytes
allocs estimate: 2
--------------
minimum time: 52.000 ns (0.00% GC)
median time: 58.526 ns (0.00% GC)
mean time: 64.345 ns (8.17% GC)
maximum time: 1.080 μs (93.56% GC)
--------------
samples: 10000
evals/sample: 984
time tolerance: 5.00%
memory tolerance: 1.00%
julia> @benchmark bar($X,$y)
BenchmarkTools.Trial:
memory estimate: 128.00 bytes
allocs estimate: 2
--------------
minimum time: 51.291 ns (0.00% GC)
median time: 55.050 ns (0.00% GC)
mean time: 61.033 ns (9.23% GC)
maximum time: 1.188 μs (92.17% GC)
--------------
samples: 10000
evals/sample: 984
time tolerance: 5.00%
memory tolerance: 1.00%
julia> @benchmark baz($x,$y)
BenchmarkTools.Trial:
memory estimate: 0.00 bytes
allocs estimate: 0
--------------
minimum time: 14.059 ns (0.00% GC)
median time: 15.109 ns (0.00% GC)
mean time: 15.060 ns (0.00% GC)
maximum time: 29.657 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 998
time tolerance: 5.00%
memory tolerance: 1.00%
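The allocations reported for foo and bar most likely come from the slice x[:,1,1,1], which builds a new array on each call. One way to avoid that copy is a view. The following is only a sketch, assuming Julia 0.7 or later; qux is a hypothetical name, and I have not timed it against the numbers above.

```julia
using LinearAlgebra  # provides dot on Julia 0.7 and later

# Hypothetical variant of foo: @view wraps the slice in a SubArray that
# refers to the parent array's memory instead of copying the elements.
qux(x, y) = dot(@view(x[:, 1, 1, 1]), y)

X = zeros(3, 3, 3, 100)
X[:, 1, 1, 1] = [0.76, -1.13, 1.33]
y = [1.0, 2.0, 3.0]

# Same value as the copying version, computed without materializing the slice.
qux(X, y) ≈ dot(X[:, 1, 1, 1], y)
```

It can be benchmarked the same way as the others, with the arguments interpolated using $.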