All methods of / I know of (in Base & standard libraries) are type stable. If you are aware of one that isn’t, that may be a bug.
The error you reported here is a relative error of one ulp, which is quite reasonable. Since mean gives a floating-point output, the normal expectation would be to have some rounding error; the exact answer will generally not be representable.
(Furthermore, Float64 addition is exact for integers up to 2^53, so you will only see rounding errors for integers so big that you will be risking Int64 overflow anyway, and 32-bit machines are even worse off.)
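To see the 2^53 threshold concretely, a quick check along these lines (illustrative, not part of the original discussion) shows where Float64 integer addition stops being exact:

# Below 2^53 every integer is exactly representable, so sums of small integers are exact;
# at 2^53 the spacing between consecutive Float64 values becomes 2.
2.0^52 + 1 == 2.0^52   # false: the sum is still exact
2.0^53 + 1 == 2.0^53   # true: 2^53 + 1 is not representable, so it rounds to 2^53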
For algorithms that compute a floating-point result, the normal expectation is to avoid spurious overflow due to intermediate calculations if it is at all practical. For example, we give the correct answer for functions like hypot and norm for integer inputs:
julia> abs(typemax(Int) + typemax(Int)*im)
1.3043817825332783e19
julia> using LinearAlgebra
julia> norm(fill(typemax(Int), 1000))
2.916686334356758e20
Indeed, norm is a classic case where one accepts a performance price to avoid spurious overflow. On my machine we are paying a factor of 3 even for Float64 inputs:
julia> unsafenorm(x) = sqrt(sum(abs2, x))
julia> x = rand(10^4);
julia> @btime norm($x);
2.621 μs (0 allocations: 0 bytes)
julia> @btime unsafenorm($x);
923.531 ns (0 allocations: 0 bytes)
We should certainly try to make a common function like mean as fast as possible, but not at the expense of correctness, given that we are already returning a floating-point result. Start with a correct implementation (e.g. this one), then make it faster if we can.
(Honestly, I’m skeptical that many applications will even notice a constant-factor slowdown of mean, since I doubt it is the performance-critical step in almost any nontrivial problem.)
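For concreteness, one minimal sketch of an overflow-avoiding implementation (a hypothetical helper, not necessarily the one linked above) converts each element to Float64 before accumulating:

# Hypothetical sketch: accumulate in Float64 so the intermediate sums cannot
# overflow, even though the inputs are Int.
float_mean(v::AbstractVector{<:Integer}) = sum(Float64, v) / length(v)

float_mean([typemax(Int), typemax(Int)])  # ≈ 9.223372036854776e18, no wraparound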
I don’t mean to be pedantic here, but mean is “correct” in the sense of being equal to sum(v) / length(v), where sum is “correct” in the sense of being equal to reduce(+, v).
These of course fail for overflow because Julia chose native integer arithmetic. This was a decision made with various trade-offs in mind, but it has pervasive implications for the language.
I think your suggestion is a reasonable alternative, but the current implementation is no more “incorrect” than +(::Int, ::Int) is. It is not at all obvious to me that mathematical correctness should always be the goal at the expense of speed in Julia.
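To spell out the overflow in question, this is what the sum(v)/length(v) definition does with native Int arithmetic (an illustrative example, not a quote from the issue):

v = [typemax(Int), typemax(Int)]
sum(v)              # -2: the Int64 sum wraps around
sum(v) / length(v)  # -1.0, which is what the sum(v)/length(v) definition yields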
This seems like a circular argument; that’s an algorithm, and you could assert it as a definition, but I would say that it is not the conventional definition. You could just as well “define” norm(x) = sqrt(sum(abs2, x)) and say that Inf is therefore the “correct” result for norm([1e300]). Put another way, if we change the implementation of sum in a way that changes the roundoff error for floating-point inputs, are we changing the “definition” of mean(::Vector{Float64})?
For a floating-point function, the ordinary expectation is to define “correctness” and “accuracy” with respect to the function as computed with infinite precision (i.e. over ℝ). I think that expectation applies here, since mean(::Vector{Int}) returns a Float64.
(Matters are different for arithmetic functions whose inputs and outputs are a finite-precision integer type.)
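As an illustration of measuring accuracy against the infinite-precision answer (a hypothetical check, using BigFloat as the reference; none of this is from the original posts):

v = [typemax(Int), typemax(Int) - 1]
exact  = Float64(sum(big.(v)) / big(length(v)))  # reference mean computed in BigFloat, then rounded once
approx = sum(Float64, v) / length(v)             # one possible floating-point algorithm
abs(approx - exact) / eps(exact)                 # error measured in ulps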
I’m fairly confident we can avoid a slowdown, with tricks like Per’s. Just as with division, mean converts to floats, and we can and should define the operation to work without overflow. I could look into the performance issue if it turns out to be real.
Off-topic:
Yes, I think so, and likewise for DACs. I recall that on the Acorn Archimedes, 8-bit sound was logarithmic, which made it sound better (and made volume control simpler: just subtraction, which was fast on the ARM since it had no DIV instruction back then). Are logarithmic DACs/ADCs outdated now?
I’m sure it’s real, and I doubt most of the slowdown is avoidable:
julia> function mysum1(x)
           s = 0
           for xᵢ ∈ x
               s += xᵢ
           end
           s
       end
mysum1 (generic function with 1 method)
julia> function mysum2(x)
           s = zero(Int128)
           for xᵢ ∈ x
               s += xᵢ
           end
           s % Int64
       end
mysum2 (generic function with 1 method)
julia> function mysum3(x)
           s = 0.0
           @fastmath for xᵢ ∈ x
               s += xᵢ
           end
           s
       end
mysum3 (generic function with 1 method)
julia> using BenchmarkTools
julia> x = rand(-100:100, 200);
julia> @btime mysum1($x)
17.941 ns (0 allocations: 0 bytes)
-115
julia> @btime mysum2($x)
131.092 ns (0 allocations: 0 bytes)
-115
julia> @btime mysum3($x)
158.761 ns (0 allocations: 0 bytes)
-115.0
julia> @btime mysum1($x)
17.935 ns (0 allocations: 0 bytes)
-115
This is much better with AVX512, where the conversion to Float64 can be vectorized (testing on AWS):
julia> function mysum1(x)
           s = 0
           for xᵢ ∈ x
               s += xᵢ
           end
           s
       end
mysum1 (generic function with 1 method)
julia> function mysum2(x)
           s = zero(Int128)
           for xᵢ ∈ x
               s += xᵢ
           end
           s % Int64
       end
mysum2 (generic function with 1 method)
julia> function mysum3(x)
           s = 0.0
           @fastmath for xᵢ ∈ x
               s += xᵢ
           end
           s
       end
mysum3 (generic function with 1 method)
julia> using BenchmarkTools
julia> x = rand(-100:100, 200);
julia> @btime mysum1($x)
14.732 ns (0 allocations: 0 bytes)
1008
julia> @btime mysum2($x)
227.857 ns (0 allocations: 0 bytes)
1008
julia> @btime mysum3($x)
38.964 ns (0 allocations: 0 bytes)
1008.0
julia> @btime mysum1($x)
14.715 ns (0 allocations: 0 bytes)
1008
That’s nearly a 9x penalty with AVX2, and about a 2.5x penalty with AVX512.
Whether more users are likely to complain about overflowing integer means or about a constant-factor performance hit is a valid question.
On the AVX2 computer, we can improve performance (over the 9x penalty) via:
julia> mysum4(x) = maximum(x)*Float64(length(x)) < typemax(Int) ? Float64(mysum1(x)) : mysum3(x)
mysum4 (generic function with 1 method)
julia> @btime mysum4($x)
69.165 ns (0 allocations: 0 bytes)
489.0
More elaborate techniques following this approach will probably be able to improve things further.
mysum4 is just a smidge slower than mysum3 on the AVX512 computer, so a slight improvement is all it would take to be faster than mysum3 on both systems.
However, for longer vectors, where the summation becomes memory bound, mysum3 may approach being 2x faster than mysum4.
EDIT: I just realized it should be maximum(abs, x), which is much slower:
julia> mysum4(x) = maximum(abs,x)*Float64(length(x)) < typemax(Int) ? Float64(mysum1(x)) : mysum3(x)
mysum4 (generic function with 1 method)
julia> @btime mysum4($x)
93.891 ns (0 allocations: 0 bytes)
489.0
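If someone wants to probe the memory-bound claim above, a straightforward (hypothetical) follow-up would be to rerun the same benchmarks with a vector long enough to fall out of cache:

# Assumes BenchmarkTools and the mysum* definitions from the session above.
x = rand(-100:100, 10^7);
@btime mysum3($x);
@btime mysum4($x);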
Converting to float first is not strictly more correct. As has been shown, there are cases where integers overflow, which leads to one kind of error, but there are also cases where floats give the wrong answer because they lose precision and are not associative. It would be very surprising to me if computing mean(v) did not behave the same as sum(v)/length(v). That means that when the values are integers, I expect overflow as a possibility and do not expect lack of associativity.
Converting to float first is not strictly more correct. As has been shown, there are cases where integers overflow, which leads to one kind of error, but there are also cases where floats give the wrong answer because they lose precision and are not associative.
The former kind of error is catastrophic, that is, the answer has no valid digits. The latter isn’t: a loss of accuracy in a floating-point sum is not unexpected.
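A small illustration of the difference between the two kinds of error (values chosen only for demonstration):

v = fill(10^18, 10)
sum(v)           # -8446744073709551616: the Int64 sum wraps, so no digit is correct
sum(Float64, v)  # 1.0e19: here the float sum happens to be exact; in general it is off only by rounding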
It would be very surprising to me if computing mean(v) did not behave the same as sum(v)/length(v).
That looks like an argument in favour of making the result of mean() on integers an integer as well. That might actually be an improvement over the present situation.
It’s not, because division of integers does not produce an integer.
The point that some of us are trying to make here is that one may not expect the sum of integers to be performed in floating point.
I fully understand why some users would prefer that mean protects them from “surprises” like that. However, the right question in Julia is usually not what you can improve for a very specific set of argument types, but what the API can/should promise for generic types.
For example, mean converting an AbstractVector{Int} to Float64 elementwise before doing the calculation is pretty useless unless we guarantee that something similar also happens for all subtypes of Integer, because unless we do this, it is just a special case that does not help generic code, so it is hard to rely on.
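For what it’s worth, a type-generic variant along these lines could promote through float() rather than special-casing Int (a hypothetical sketch, not a proposal for the actual implementation):

# Hypothetical: float(x) picks the appropriate float type for any Integer
# subtype (Int and Int8 -> Float64, BigInt -> BigFloat), so the same code
# path works generically instead of only for AbstractVector{Int}.
genericmean(v) = sum(float, v) / length(v)

genericmean(Int8[100, 100, 100])           # 100.0
genericmean([typemax(Int), typemax(Int)])  # ≈ 9.223372036854776e18, no wraparound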
A really nice general solution would be something like propagating @fastmath to all of the functions called, so that the user can choose either the correct but slower computation or the almost-correct but much faster one.
My suggested solution does just that in a type-generic way: https://github.com/JuliaLang/Statistics.jl/issues/22#issuecomment-587119323
I don’t think the principle here is very difficult to understand: if the outputs are floating point, then you want a computation that is numerically stable and reasonably accurate in the floating-point sense, even if the inputs are integers. (Example: the norm of an integer array.) The current mean algorithm, which can give a large error for integer inputs regardless of the floating-point precision, is numerically unstable according to the formal definition.
The point that some of us are trying to make here is that one may not expect the sum of integers to be performed in floating point.
We are not discussing computation of the sum of integers, though.
It’s not, because division of integers does not produce an integer.
Sometimes it does:
julia> typeof(div(1,2))
Int32
Sure I guess if you call a totally different function?
It’s a “division of integers” function. Another one is //.
Yes, there are many forms of integer division. I don’t really get what that shows. The only division operation that generally computes the mean in this case is the one that produces a float. sum(v)/length(v) is also the most obvious definition. Deviation from that, e.g. by converting to float first, could be opted into, but my point is that while you’re surprised that this overflows, other people (like myself) would be surprised that it suffers from floating-point issues, given that it’s apparently an operation on integers. It’s not just the loss of precision; it’s the fact that for integers the answers are perfectly reproducible, whereas for floats they’re not: they depend on the architecture of the machine, the details of the summation algorithm, and the number of threads.
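The non-associativity in question is easy to demonstrate (a standard example, not specific to mean):

(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)  # false
(0.1 + 0.2) + 0.3                        # 0.6000000000000001
0.1 + (0.2 + 0.3)                        # 0.6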
It gives a floating-point answer, so I don’t understand why you’d be surprised that the algorithm might involve floating-point calculations, or that it would be designed to give an accurate answer in the floating-point sense (as opposed to the two’s-complement sense).
Are you unpleasantly surprised that norm(x::Vector{Int}) does not give the same answer as sqrt(sum(abs2, x))? Would you rather have norm([10^13]) give a DomainError (as the naive algorithm would produce)?
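For reference, the failure mode of the naive algorithm on this input looks like the following (an illustrative session; LinearAlgebra loaded for norm):

using LinearAlgebra
abs2(10^13) < 0           # true: (10^13)^2 = 10^26 overflows Int64 and wraps to a negative value
sqrt(sum(abs2, [10^13]))  # throws DomainError: square root of a negative number
norm([10^13])             # 1.0e13, the accurate answer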
No, of course it would be good to give the numerically correct answer more of the time. My point is that what counts as a surprising result depends on your background and expectations. Doing integer summation followed by float division is obvious and fast in this case. If we can figure out a way to avoid overflow while preserving performance and accuracy, that would be ideal.
The naive algorithm for norm is almost 20× faster than the algorithm we are currently using:
julia> x = rand(1:100, 1000);
julia> naivenorm(x) = sqrt(sum(abs2, x))
naivenorm (generic function with 1 method)
julia> @btime norm($x); @btime naivenorm($x);
3.397 μs (0 allocations: 0 bytes)
189.975 ns (0 allocations: 0 bytes)
We made the tradeoff here because it was more important to give an accurate answer that avoids spurious overflow catastrophes, and I think the same consideration applies to mean.
(Neither norm nor mean seems likely to be the performance-critical step in real applications, which is an additional reason to prefer accuracy and robustness over performance.)