Why is `mean` with the `dims` argument so slow?

Consider:

using BenchmarkTools, Statistics
A = randn(4,3,2);
@btime mean($A; dims=1);
#  90.184 ns (1 allocation: 128 bytes)
@btime mean($A);
#  9.891 ns (0 allocations: 0 bytes)

That’s nearly a 10x difference! Is there a way to improve on this, a faster way to compute the mean along a dimension?

Consider:

julia> A = randn(400,300,200);
julia> using BenchmarkTools
julia> @btime mean($A; dims=1);
  9.917 ms (2 allocations: 468.83 KiB)
julia> @btime mean($A);
  9.615 ms (0 allocations: 0 bytes)

Nanosecond measurements are bullshit.


Perhaps it would be a bit more accurate to say that nanosecond measurements are very delicate, because you can very easily end up measuring something other than what you intended to measure. I believe this is what is happening in the benchmark reported by the OP: mean(A) returns a scalar and so it is allocation-free, while mean(A, dims=1) returns a 1×3×2 array and so it must perform at least one allocation, and that allocation ends up dominating the overall runtime.
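
As a rough illustration that the allocation itself dominates at this size (exact numbers will vary by machine and Julia version), you can time just the allocation of an output array of the same shape:

using BenchmarkTools, Statistics
A = randn(4, 3, 2);
# Allocating a 1×3×2 array by itself already costs a large share of the
# time reported for mean(A; dims=1) on such a tiny input.
@btime similar($A, 1, 3, 2);
# For comparison: the reducing form allocates that output, the scalar form does not.
@btime mean($A; dims=1);
@btime mean($A);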


Well, it’s an 80 ns difference. You can avoid some of it by pre-allocating the output in the shape you want:

julia> using BenchmarkTools, Statistics

julia> A = randn(4,3,2);

julia> @btime mean($A; dims=1);
  101.318 ns (1 allocation: 128 bytes)

julia> @btime mean($A);
  10.427 ns (0 allocations: 0 bytes)

julia> m = zeros(1, 3, 2);

julia> @btime mean!($m, $A);
  62.747 ns (0 allocations: 0 bytes)
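One situation where the pre-allocated form pays off is when the same reduction runs many times over arrays of the same shape, so the output buffer is reused instead of reallocated on every call. Here is a sketch of that pattern (the loop is just a placeholder for repeated work):

using Statistics
A = randn(4, 3, 2)
m = zeros(1, 3, 2)   # reusable output buffer with the reduced shape
for _ in 1:1_000
    # mean! overwrites m in place, so no per-iteration allocation
    mean!(m, A)
end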

Oh, I should have tested on larger matrices!