# Why is `mean` with the `dims` argument so slow?

Consider:

``````julia
using BenchmarkTools, Statistics
A = randn(4,3,2);
@btime mean($A; dims=1);
#  90.184 ns (1 allocation: 128 bytes)
@btime mean($A);
#  9.891 ns (0 allocations: 0 bytes)
``````

That’s a 10x difference! Is there a way to improve on this? A faster way to compute `mean` along a dimension?
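If the goal is just a faster per-column mean, one option is to write the reduction by hand. Below is a minimal sketch (the helper name `mean_dim1` is made up for illustration, not part of Statistics) that sums down the first dimension of a 3-D array in an explicit loop; it should agree with `mean(A; dims=1)`:

``````julia
using Statistics

# Hypothetical helper: mean along dimension 1 of a 3-D array,
# written as an explicit loop over each (j, k) column.
function mean_dim1(A::AbstractArray{T,3}) where {T}
    n1, n2, n3 = size(A)
    out = zeros(float(T), 1, n2, n3)
    @inbounds for k in 1:n3, j in 1:n2
        s = zero(float(T))
        for i in 1:n1
            s += A[i, j, k]
        end
        out[1, j, k] = s / n1
    end
    return out
end

A = randn(4, 3, 2)
mean_dim1(A) ≈ mean(A; dims=1)   # true: matches the library result
``````

Whether this beats the library version depends on sizes and on the allocation of `out`, so it is worth benchmarking on your actual data.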

Consider:

``````julia
julia> A = randn(400,300,200);

julia> using BenchmarkTools

julia> @btime mean($A; dims=1);
  9.917 ms (2 allocations: 468.83 KiB)

julia> @btime mean($A);
  9.615 ms (0 allocations: 0 bytes)
``````

Nanosecond measurements are bullshit.


Perhaps it would be a bit more accurate to say that nanosecond measurements are very delicate because you can very easily end up measuring something other than what you intended to measure. I believe this is what is happening in the benchmark reported by the OP: `mean(A)` returns a scalar and so it is allocation-free, while `mean(A, dims=1)` returns a vector and so it must perform at least one allocation, and this allocation ends up dominating the overall runtime.
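One way to see the asymmetry is to look at the return types: the scalar result needs no heap memory, while the `dims` result is a freshly allocated array, however small. A quick check:

``````julia
using Statistics

A = randn(4, 3, 2)

typeof(mean(A))            # Float64, a plain scalar, no allocation needed
typeof(mean(A; dims=1))    # Array{Float64, 3}, heap-allocated on every call
size(mean(A; dims=1))      # (1, 3, 2), the reduced dimension is kept with size 1
``````

At this array size, that single 128-byte allocation is a large fraction of the total work, which is why the gap shrinks to almost nothing on the 400×300×200 example above.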


Well, it’s an 80 ns difference. You can avoid some of it by pre-allocating the output in the shape you want:

``````julia
julia> using BenchmarkTools, Statistics

julia> A = randn(4,3,2);

julia> @btime mean($A; dims=1);
  101.318 ns (1 allocation: 128 bytes)

julia> @btime mean($A);
  10.427 ns (0 allocations: 0 bytes)

julia> m = zeros(1, 3, 2);

julia> @btime mean!($m, $A);
  62.747 ns (0 allocations: 0 bytes)
``````
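Preallocation pays off when the same reduction runs many times, e.g. inside a hot loop. A small sketch of that pattern, reusing one buffer across iterations:

``````julia
using Statistics

A = randn(4, 3, 2)
m = zeros(1, 3, 2)   # output buffer, shaped like mean(A; dims=1)

# Reuse the same buffer on every iteration instead of allocating
# a fresh 1×3×2 array each time:
for _ in 1:1000
    mean!(m, A)      # overwrites m in place with the dim-1 means
end

m ≈ mean(A; dims=1)  # true
``````

Note that `mean!` infers which dimensions to reduce over from the singleton dimensions of the output buffer, so the shape of `m` is what selects `dims=1` here.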

Oh, I should have tested on larger matrices!