Perhaps it would be more accurate to say that nanosecond-scale measurements are delicate because you can easily end up measuring something other than what you intended. I believe that is what is happening in the benchmark reported by the OP: `mean(A)` returns a scalar and is therefore allocation-free, while `mean(A, dims=1)` returns an array and so must perform at least one allocation, and that allocation ends up dominating the overall runtime.
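
Here is a minimal sketch of how to see this with BenchmarkTools (the matrix size is an illustrative assumption, not the OP's actual data):

```julia
# Minimal sketch: compare allocations of mean(A) vs mean(A, dims=1).
# The 100×100 matrix is an arbitrary assumption, not the OP's data.
using Statistics, BenchmarkTools

A = rand(100, 100)

@btime mean($A)          # scalar result: 0 allocations
@btime mean($A, dims=1)  # 1×100 result array: at least 1 allocation
```

Note the `$A` interpolation: benchmarking a non-constant global without it is itself a classic way to end up measuring something other than what you intended.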