Inconsistent behavior of `sum`, `mean` (and probably others) on different collection types

I’ve discovered a lot of inconsistent behavior today with the sum and mean functions (while investigating a question asked here on Discourse), some of which are regressions compared to v0.4.x.
For example, on v0.4.x:

julia> sum([0xff,0xff])
0x00000000000001fe
julia> sum((0xff,0xff))
0xfe
julia> sum([0xff for i = 1:65536*256*8])
0x00000007f8000000

julia> mean([0xff,0xff])
255.0
julia> mean((0xff,0xff))
127.0
julia> mean([0xff for i = 1:65536*256*8])
255.0

On v0.5.0:

julia> sum([0xff,0xff])
0x000001fe
julia> sum((0xff,0xff))
0xfe
julia> sum([0xff for i = 1:65536*256*8])
0xf8000000

julia> mean([0xff,0xff])
255.0
julia> mean((0xff,0xff))
127.0
julia> mean([0xff for i = 1:65536*256*8])
31.0

So, collections that aren’t AbstractArray get incorrect results in both versions, but v0.5.0 gets incorrect results when summing or calculating the mean of larger collections, because it promotes only to a UInt32, instead of a UInt64.
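(For reference, the tuple result is just UInt8 arithmetic wrapping around, and the v0.5.0 result for the large array is the true sum truncated to 32 bits:)

julia> 0xff + 0xff
0xfe

julia> 0x00000007f8000000 % UInt32
0xf8000000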
This makes me worry that even more functions are broken in v0.5 and later because they depend on the previous promotion rules (and, in any case, some methods on generic collections appear to have always been broken).

The possibility of overflow in Julia’s arithmetic is not a bug, it was an intentional design decision that has been extensively discussed elsewhere. The question is, by how much should the type be widened for a reduction?

This was changed to widen to 32 bits (instead of 64 bits on 64-bit machines) in https://github.com/JuliaLang/julia/pull/14414 for the reasons described in that PR. Again, a design decision, not a bug. (In fact, you responded “LGTM” to that PR. :stuck_out_tongue_winking_eye:)

If you want to use Int, you can instead do reduce(+, 0, array).
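For example, on v0.5 (where reduce takes the initial value as its second argument), the Int zero forces Int accumulation:

julia> reduce(+, 0, [0xff, 0xff])
510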

4 Likes

Though sum((0xff,0xff)) is clearly inconsistent with the equivalent method for arrays, and should probably also promote to UInt32. (I would also find it more logical for all cases to promote to UInt16 even if it might be slower, as was discussed on the PR.)

I think you missed what I feel is the most important part of what I saw: an operation on a set of values should not give different results depending on which type of collection holds the values (as long as iterating over the collections produces the values in the same order; otherwise there are potential issues because the operations may not be associative, such as addition on fixed-precision floating point).
Yes, I believe that promoting to Int32 and UInt32, rather than to Int and UInt, is the better behavior when widening Int8/Int16 (and their unsigned counterparts), but I think that is a separate issue. I noticed the change in sum because I was looking at the problems with inconsistent results from mean.
For certain types, mean is implemented with sum, but for others, it does the addition itself, using the promotion of val + zero(typeof(val)).
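Concretely, on v0.5 the array method widens the accumulator while the generic iterable method does not:

julia> typeof(sum([0xff, 0xff]))
UInt32

julia> typeof(sum((0xff, 0xff)))
UInt8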
Is there anything that requires sum to use a particular promotion (and to not check for overflow)?
There is a special case for sum of Bool, for example, so why not special promotion rules for other types?
That is especially true for mean, where the result is not even of the same type (or widened type).
Also, I think there are other ways of implementing mean that can avoid getting totally incorrect results, while still being performant.
I thought I’d read in one of the long discussions that overflow was not checked for simple arithmetic operations, for performance reasons, but that for more expensive operations Julia tended toward trying to return the correct result or throw an exception. In those cases, silently getting the wrong answer fast doesn’t seem all that useful to me.
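(For reference, the special case for Bool mentioned above, on a 64-bit machine:)

julia> sum([true, true, true])
3

julia> typeof(sum([true, true, true]))
Int64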

1 Like

sum could certainly be implemented to use Int64 promotion for smaller integer types, but then it would be slow on 32-bit machines (and be a bit slower on 64-bit machines). And checking for overflow would be (a) much slower and (b) type-unstable if you widen rather than throwing an exception.

(One possibility would be to use Float64 accumulation, which is exact up to 53-bit integers and is fast on all architectures, but then of course there is the question of what integer type to convert the final sum result to and how to handle cases that overflow 53 bits, incurring roundoff errors. Float64 accumulation makes even more sense for mean, since the result is returned in floating-point form anyway.)
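A minimal sketch of what Float64 accumulation for the mean of an integer array could look like (a hypothetical helper, not the Base implementation):

function mean_f64(a::AbstractArray)
    s = 0.0
    for x in a
        s += x          # accumulate in Float64: exact for integer sums below 2^53
    end
    return s / length(a)
end

With this, mean_f64([0xff for i = 1:65536*256*8]) would return 255.0, since the exact sum (0x7f8000000) is well below 2^53.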

Still, the current behavior is not a crazy tradeoff: if you want the space efficiency of working with e.g. 8-bit or 16-bit integers, the price is that you have to think a little bit more carefully about overflow (e.g. by calling reduce if you want to specify Int64 or Float64 accumulation).

Prior to the PR I mentioned, the behavior was to promote to Int, which is fast on both 32-bit and 64-bit machines, but then you get different results when you run on the different architectures. The behavior of widen is not an entirely “separate issue” from the behavior of sum — the considerations here are very similar to those in that PR.

Hmm, I just noticed that sum widens Int32 to Int, which means that the results are still architecture-dependent anyway. Given that, it is not crazy to widen Int16 and Int8 to Int too.

(More simply, just sum(Int, array).)
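For example:

julia> sum(Int, [0xff, 0xff])
510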

You are still focusing on the issues of promotion. What about the issue of inconsistent results from different container types (even when the iterators produce the exact same values in the same order)?

The question of Tuple handling is separate; I tend to agree with @nalimilan that sum should probably treat arrays and tuples similarly with respect to promotion.

A tip that you’ve heard before: raising multiple independent issues in a single thread confuses the discussion. Nor does calling design decisions you disagree with (or the absence of features you would like) “bugs” or “broken” facilitate productive discourse.

1 Like

I am simply trying to raise an issue about some related cases that I found where some Julia functions give unexpected, inconsistent results. I have not said anything at all here about disagreeing with any design issues or features that are lacking from Julia.

Do you agree with the principle that if you have a function that operates on a set of values in a collection, that function should return the same result regardless of the collection type (as long as the values are iterated over in the same order)?
In this particular case, it is strictly an implementation detail that mean (in only some cases!) uses sum to calculate its result, which ends up giving inconsistent, incorrect results, depending on the type of collection the values are in.
I can already think of a number of different ways that mean can be implemented that would avoid that happening (and without checking for overflow for every addition either).

1 Like

You’ve raised at least four separate issues: whether sum should promote to a wider type than Int32 for small integer types (or somehow check for overflow), whether mean should use a different algorithm (not sum/length) to make it less susceptible to overflow for integer collections, whether the algorithms of all functions should be independent of container type, and behavior changes from 0.4 to 0.5.

I agree that the algorithm of mean could be made less susceptible to overflow for integer arrays, most simply by using floating-point summation (although there are other possibilities).

I don’t necessarily agree that f(collection::T) should necessarily be independent of T for all f and T. For one thing, this precludes many improved algorithms that are applicable only to particular container types. (e.g. summation of floating-point values can be made vastly more accurate for arrays allowing random access than for iterators that only expose sequential access, unless one is willing to slow down the iterator case by a lot.) For another thing, this constraint is impossible to enforce in a multiple-dispatch language. That being said, in the specific case of sum, I agree that it would be reasonable to change sum(::Tuple) to use the same algorithm as for Array.
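To illustrate the accuracy point: Julia’s sum uses pairwise summation for arrays, while a strict left-to-right fold accumulates far more rounding error (example on a 64-bit machine; foldl here stands in for a sequential-only iterator):

julia> a = fill(1.0f0, 2*10^7);

julia> sum(a)
2.0f7

julia> foldl(+, a)
1.6777216f7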

2 Likes

it would not only be notably slower, but would also break the duck typing the method now has. for example, now you can use complex-valued functions and get a complex result. you can of course solve this particular situation by using float(), but what about a type that does not support float(), but does support +() and zero(), and whose +() result supports /(::T, ::Integer)?
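for example, complex values already just work and give a complex result:

julia> mean([1+2im, 3+4im])
2.0 + 3.0im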

edit: i kinda lost track which topic i’m in. this particular method of mean came up in another topic. but the principle is the same: if you force an intermediate float, you lose generality.

Throwing a few ideas out in the form of rhetorical-ish questions:

Is wraparound ever desirable in practical situations? Wouldn’t you always want to throw an overflow error or return NaN? (Can integer overflow even produce a NaN?)

Shouldn’t it be the user’s responsibility to cater for potential overflow errors, rather than the developer having to contort Julia’s internals? It’s reasonable to expect some level of engineering skill…

As noted, the user can specify sum(T, [foo, bar]) to manually force promotion to T. Isn’t this one of the Zen of Python rules: “Explicit is better than implicit”?

It must be difficult to implement consistent, automatic, selective overflow checking internally: if checking is not present for +, how can it be present for sum, which is just a bunch of + operations in a loop?
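(In fairness, a user who wants checking can already opt in explicitly, at some speed cost, by folding with the checked-arithmetic primitives, e.g.:)

reduce(Base.checked_add, [0xff, 0xff])   # throws OverflowError instead of wrapping to 0xfe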

Overflow checking is clearly at odds with performance. I wonder if it might make sense, rather than having Julia make difficult trade-off decisions, to have some global overflow_check flag that is on by default but can be turned off. By turning the flag off, the user is signing on the dotted line that they know what they’re doing. And maybe the @fastmath macro could disable overflow checking with tmp = overflow_check; overflow_check = false; ...; overflow_check = tmp.

Speed is valuable, but a clean, consistent, simple interface is priceless. Machines get faster each year, and 32-bit systems are arguably on the way out. So I’m not sold on the arguments for promotion to Int32 rather than Int64. If the user wishes to take advantage of a specific architecture, let them supply the promotions manually. Let them optimise the code. But keep the internals of Julia clean as a whistle!

I.e., they can specify (a+b)::T, but is that the same as +(T, a, b) or +(promote(T, a, b)...)? Or does Julia first do a+b and subsequently convert to T?

I’ve just bumped into a similar promotion issue elsewhere (https://github.com/JuliaLang/julia/issues/19714)

I wonder if rather than just a check_overflow flag, maybe Julia could also have an overflow policy – on overflow:

  • NaN
  • warn
  • error

… and a promotion policy:

  • use the biggest input type (so 8+8 → 8, 8+16 → 16, etc.)
  • next power of two (so 8+8 → 16, 8+16 → 32, etc.)
  • 32-bit optimised: promote <32 to 32 and <64 to 64 (so 8+8 → 32, 8+16 → 32, 16+32 → 64, etc.)
  • 64-bit: promote everything to 64 (so 8+8 → 64, etc.)

@pitao Please don’t make this issue broader than it is (and apologizing about the signal-to-noise ratio as a preamble doesn’t really help). Deciding on the best behavior for sum and mean is already enough for one thread. Checking for overflow everywhere is a hard question; see for example this issue.

@pint float works for Complex arguments; it returns a Complex{<:AbstractFloat} value. Wouldn’t that be generic enough for a mean implementation?

@ScottPJones Do you think you could prepare a branch to fix the inconsistent behavior for tuples? Then I can open a PR for you.

you mean the float() function. yes, i was acknowledging that, but it is already different from the proposal of using a float as the accumulator. however, as i said, what about types that do not support float()? and don’t ask me to give an example. being general is a good thing, well, in general :slight_smile:.

but you know what? here is an example: now i can compute the mean of rationals. if you float-ize the accumulator, mean of rationals will be float.
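e.g. with the current generic accumulation the result stays exact and rational:

julia> mean([1//2, 1//3, 1//6])
1//3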

Right. I guess the most general and most correct approach would be to accumulate using the type of (one(T)+one(T))/one(T), i.e. the return type of mean. The only drawback is that it could be slower than preserving the input type (notably for small integers), but it’s also the best way of preventing silent overflow in most cases.
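For instance, that rule picks Float64 for small integers but preserves Rational for rationals:

julia> typeof((one(UInt8) + one(UInt8)) / one(UInt8))
Float64

julia> typeof((one(Rational{Int}) + one(Rational{Int})) / one(Rational{Int}))
Rational{Int64}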

I would only use this method for a set of types known to work: small-integer types T and Complex{T}. There’s no way to make mean immune to overflow for arbitrary user-defined types. I agree that there is a significant performance tradeoff here, and it is debatable whether it is worth it. (For integers < 32 bits, another option would be to do the sums in sufficiently small blocks, as sketched below.)
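A rough sketch of the blocked idea for a UInt8 array (a hypothetical helper, not Base code, assuming 1-based indexing): accumulate within blocks small enough that a UInt32 partial sum cannot overflow, and widen only once per block.

function blocked_sum(a::AbstractVector{UInt8})
    blocksize = 2^23                  # 2^23 * 0xff < typemax(UInt32), so a block cannot overflow
    total = UInt64(0)
    n = length(a)
    for start in 1:blocksize:n
        stop = min(start + blocksize - 1, n)
        partial = UInt32(0)
        for k in start:stop
            partial += a[k]           # UInt32 accumulation within the block
        end
        total += partial              # widen to UInt64 only once per block
    end
    return total
end

A mean built on this would then be blocked_sum(a) / length(a).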

(There are lots of cases in Base where we have multiple methods for the “same” function, to take advantage of properties of individual types. e.g. mean(a) has 3 methods already.)

The basic decision in Julia was to make arithmetic on fixed-width integer types fast, at the price that the programmer has the responsibility to worry about overflow. We can and should mitigate this where it is practical to do so, and Julia will no doubt improve incrementally in this regard, but the basic tradeoff isn’t going to disappear.

@nalimilan Yes (although it’s not just tuples; it seems to be anything that is not an AbstractArray). Thanks, that would be great. I’m only trying to improve the overall quality of Julia here, and I’m not saying that the overall quality is bad at all (I would never use it for production code if I felt that way!); it’s just that nothing is perfect (even if we all feel that Julia is closer to perfection than any other language we know!).
@stevengj I said that the results for sum were inconsistent between v0.4.x and later versions, not that that was necessarily a bug. However, where is it written that sum must use the widen promotion? (There is no such rule for Bool, for example, where it promotes to Int.) That again seems to be an undocumented implementation detail. I think an actionable item here would be to at least add to the documentation for sum an explanation of what promotion is used and how to work around the issue by using sum(Int, ...) or sum(UInt, ...) (I can make a branch for that as well, if some nice person can then make a PR out of it).

Remember that the problem with mean is the inconsistency based on collection type.
The result type for mean on an AbstractArray of an Integer type T is typeof(one(widen(T))/1) (note the widen, i.e. not one(T)); for Bool it is typeof(1/1); and for other iterables it is typeof((zero(T)+zero(T))/1).
I’d also be rather concerned about the performance consequences of using a Float type here to do the summation.

2 posts were split to a new topic: Was: Inconsistent behavior of sum and mean