I’m doing micro-optimization of my code, and the current bottleneck is a place where I call `mean` millions of times on small 2d arrays, which leads to a huge number of allocations. Simply re-implementing `mean` removes them all. A minimal example is the following:

```
using Statistics
using BenchmarkTools

function base_mean(xs::Array{Matrix{Float64}, 1})
    tm = [0.0, 0.0]
    for x in xs
        tm .= vec(mean(x, dims=1))
    end
    return tm
end

function custom_mean(xs::Array{Matrix{Float64}, 1})
    tm = [0.0, 0.0]
    for x in xs
        tm .= 0.0
        for j in 1:size(x, 2)
            for i in 1:size(x, 1)
                tm[j] += x[i, j]
            end
            tm[j] /= size(x, 1)
        end
    end
    return tm
end

t_xs = [rand(20, 2) for i in 1:10000];
println(@benchmark base_mean(t_xs))
println(@benchmark custom_mean(t_xs))
```

Results for `base_mean`:

```
BenchmarkTools.Trial:
memory estimate: 1.68 MiB
allocs estimate: 30001
--------------
minimum time: 1.491 ms (0.00% GC)
median time: 1.529 ms (0.00% GC)
mean time: 2.193 ms (29.34% GC)
maximum time: 18.634 ms (91.35% GC)
--------------
samples: 2273
evals/sample: 1
```

For `custom_mean`:

```
BenchmarkTools.Trial:
memory estimate: 96 bytes
allocs estimate: 1
--------------
minimum time: 347.485 μs (0.00% GC)
median time: 352.084 μs (0.00% GC)
mean time: 356.737 μs (0.00% GC)
maximum time: 20.283 ms (0.00% GC)
--------------
samples: 10000
evals/sample: 1
```
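The allocation counts line up with at least one fresh array per loop iteration. As a sanity check (a sketch, using `@allocated` from Base), each `mean(x, dims=1)` call allocates its own result array, which is exactly what the hand-written loop avoids:

```
using Statistics

# Sketch: each call to `mean(x, dims=1)` returns a freshly allocated
# 1×2 array, so the loop in `base_mean` pays for a new array (plus the
# `vec` wrapper) on every element of `xs`.
x = rand(20, 2)
mean(x, dims=1)                         # warm up to exclude compilation
@assert @allocated(mean(x, dims=1)) > 0 # a new result array every call
```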

So, it’s 5 times faster and uses no memory. According to the manual, `.=` should be optimized to avoid allocations. I had a hypothesis that the problem is that `mean` returns a 2d array, which I convert into 1d with `vec`, but first, keeping the mean 2d doesn’t solve the problem (benchmark below), and second, I need a 1d result…

```
function base_mean_2d(xs::Array{Matrix{Float64}, 1})
    tm = [0.0 0.0]
    for x in xs
        tm .= mean(x, dims=1)
    end
    return tm
end

@benchmark base_mean_2d(t_xs)
```

```
BenchmarkTools.Trial:
memory estimate: 937.59 KiB
allocs estimate: 10001
--------------
minimum time: 1.046 ms (0.00% GC)
median time: 1.058 ms (0.00% GC)
mean time: 1.393 ms (22.87% GC)
maximum time: 22.613 ms (94.87% GC)
--------------
samples: 3583
evals/sample: 1
```
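For completeness, one direction that might sidestep the per-call result array (a sketch, assuming `Statistics.mean!` behaves as documented, i.e. it reduces over the singleton dimensions of a preallocated destination):

```
using Statistics

# Sketch: reuse a 1×2 buffer via `mean!`, which writes the column means
# (the reduction over the dimension where the destination has size 1)
# in place instead of allocating a new result array on every call.
function inplace_mean(xs::Array{Matrix{Float64}, 1})
    tm = [0.0 0.0]       # preallocated 1×2 buffer
    for x in xs
        mean!(tm, x)     # in-place column means of x
    end
    return vec(tm)       # reshape to 1d at the end (one small allocation)
end
```

`mean!(tm, x)` should match `vec(mean(x, dims=1))` element-wise, so the final `vec` gives the same 1d result as `custom_mean`.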