Mean of integers overflows - bug or expected behaviour?

The error you reported here is a relative error of one ulp, which is quite reasonable. Since mean is giving a floating-point output, the normal expectation would be to have some rounding error — the exact answer will generally not be representable.

(Furthermore, Float64 addition is exact for integers up to 2^53, so you will only see rounding errors for integers so big that you will be risking Int64 overflow anyway, and 32-bit machines are even worse.)

For algorithms that compute a floating-point result, the normal expectation is to avoid spurious overflow due to intermediate calculations if it is at all practical. For example, we give the correct answer for functions like hypot and norm for integer inputs:

julia> abs(typemax(Int) + typemax(Int)*im)
1.3043817825332783e19

julia> using LinearAlgebra

julia> norm(fill(typemax(Int), 1000))
2.916686334356758e20

Indeed, norm is a classic case where one accepts a performance price to avoid spurious overflow. On my machine we are paying a factor of 3 even for Float64 inputs:

julia> unsafenorm(x) = sqrt(sum(abs2, x))

julia> x = rand(10^4);

julia> @btime norm($x);
  2.621 μs (0 allocations: 0 bytes)

julia> @btime unsafenorm($x);
  923.531 ns (0 allocations: 0 bytes)

We should certainly try to make a common function like mean as fast as possible, but not at the expense of correctness, given that we are already returning a floating-point result. Start with a correct implementation (e.g. this one), then make it faster it if we can.

(Honestly, I’m skeptical that many applications will even notice a constant-factor slowdown of mean, since I doubt it is the performance-critical step in almost any nontrivial problem.)

9 Likes