Discussion about integer overflow

I really have to push back on this. There are so many ways that R makes it easy to inadvertently write incorrect code and not get any indication that anything is wrong. The designers have chosen over and over again to give anything that’s syntactically valid some meaning for the sake of occasional convenience even if that meaning is quite surprising most of the time.

For example, what happens if you accidentally do vectorized operations on two vectors that don’t have the same length? Think you get an error? You do not. R “recycles” the shorter one for you, hiding your programming error. This was such a footgun that they finally added a warning if the length of the longer one isn’t a multiple of the length of the shorter one, but this still throws no error:

> c(1, 2, 3, 4) + c(5, 6)
[1] 6  8  8 10

This is even worse if you do logical indexing because then you don’t get a warning even if the length of the index vector doesn’t divide the length of the vector being indexed:

> v = c(1:5)
> x = c(T, F, T)
> v[x]
[1] 1 3 4

Look at this example very carefully to understand why 4 is in the result vector. What if you’re computing an index and you accidentally get the sign of the index wrong? Let’s say you write i <- a-b instead of i <- b-a somewhere. Suppose i was supposed to be 2 but ends up being -2 instead. Then you use that index to index into a vector. Do you think you get an error? Nope. You get all the elements of the vector except the one at index 2! Fun. Of course, there’s some code where that will throw an error, but far from all code:

> v = c(5, 4)
> i = -2 # should have been 2
> v / v[i]
[1] 1.0 0.8

Oops, the correct answer was c(1.25, 1.00). Same shape, wrong values, so you’ll be none the wiser that you made a mistake. This example combines the last two “let’s not bother the user, let’s just guess what they meant” gotchas into one silent failure. Here’s another fun one:

> x <- 10.2 * 100
> x
[1] 1020
> as.integer(x)
[1] 1019

Famously, when doing a join on factors in data frames, the join is done on the internal integer representation of the factors in each of the data frames. Why is that bad? Because there’s no reason those numbers would have anything to do with each other: the same string in each data frame will typically have a totally different factor index. That’s right: the join isn’t done on the string values of the factors but the indices that happen to be used to represent them internally. So the result will look reasonable but be complete and utter nonsense. Why is this famous? Because this feature pissed Wes McKinney off so much that he ditched R entirely and created Pandas.

All of these gotcha examples don’t even get into R’s unusual and surprising scoping and evaluation semantics. If you want a really great rundown of those, I cannot recommend Jan Vitek’s talk on trying to make R run fast highly enough. (A talk about strange programming language features and optimizing compilers? It doesn’t get any better than that for me.) The money quote from this talk:

I haven’t seen a language that is as hard to optimize as R. Somebody should give the R designer that particular prize as probably the hardest language to optimize, at least of the ones I’ve encountered and played with. Compared to R, JavaScript is a beauty of elegance and good design.

But the real question is why is R so hard to optimize? The answer is that it has features that make it basically impossible for a compiler to analyze code and understand what it does. And if a compiler can’t understand what R code does, then what chance do humans have?

Now I’m sure some people will read this and accuse me of bashing R, and to some extent I am, but only because of this repeated insistence that R is clearly much more reliable than Julia. But like Jan right after that quote, I want to acknowledge that R is an incredibly useful and successful language, and one that I’ve used myself to great effect. It does, however, have a deeply YOLO design philosophy: when presented with situations where the user might have meant different things, rather than raise an error and ask them to be more explicit about what they meant, R will invariably pick some behavior because it might be convenient some of the time. If you keep doing that when designing a language, you are not going to end up with a language that can be used to write reliable programs.

In contrast, Julia takes a very different stance in such situations: if you don’t know what the user intended, then don’t guess—raise an error and make them choose. That is a fundamental philosophical stance of the language. Libraries may do their own thing, of course, but I think we’ve managed to impart this philosophy to the ecosystem pretty successfully. Julia may not throw errors at compile time, but we don’t play around with guessing what someone meant when it’s ambiguous.

To bring this back to the subject at hand: yes, Julia has a bit of an ambiguity in that we haven’t forced people to be explicit about whether they intend for integer operations to wrap or not. But it’s also really easy to avoid integer overflow on 64-bit systems—basically every example where it happens involves ^. It’s also the case that since native integers do modular arithmetic, there are strong guarantees on the correctness of integer results: if the correct result is in the range of values that Int can represent, then the answer you get will be exactly correct, even if there is intervening overflow or underflow (which is pretty cool!). That means that unless you are computing astronomically large values, you’ll be fine. And if you’re doing that, use floats. So yes, this is a bit of an R-like YOLO choice, but it’s a very small, contained one. Especially when compared with how hard it is to be sure that numerical computations using floating-point are correct.
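
To illustrate that guarantee: here the intermediate sum wraps around, but since the true result is representable, the final answer is exact:

julia> x = typemax(Int)
9223372036854775807

julia> (x + x) - x   # x + x wraps to -2, yet the result comes out exact
9223372036854775807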

44 Likes

That would be breaking though.

2 Likes

Yes. It’s a compiler pass that takes advantage of inlining to allow it to edit code you didn’t write. More details in this SO answer: bounds checker - @inbounds propagation rules in Julia - Stack Overflow. I don’t think the same sort of implementation would work for this application. And if it didn’t depend on inlining, then the compiler would need to keep track of two separate compiled bodies for all specializations of all methods of all functions.

3 Likes

And Hadley Wickham doubled down on this. I’m here in the Juliaverse specifically because you guys are making the clearly correct decisions to make your language have clear semantics (and the result is a hugely interoperable ecosystem).

Everything that is good about Julia comes from the fact that both the compiler and the user can figure out what the heck code means. When the compiler can figure it out, that makes stuff fast, and when the user can figure it out, that makes it easier to write correct code.

IMHO the whole thread comes off like “there is only one true behavior for numbers and that’s IEEE floating point”, and that’s just wrong at every level. Integer arithmetic mod 2^n is a really important kind of arithmetic, and people using Julia should just get used to the idea that the integer 2 is not the same thing as the float 2.0 and you have to tell Julia which one you meant.
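
One place where mod-2^n arithmetic is load-bearing, for instance, is hash mixing; this is the splitmix64 finalizer (constants from the published algorithm), which depends on * wrapping:

# splitmix64 finalizer: the multiplications are *meant* to be mod 2^64
function mix64(z::UInt64)
    z = (z ⊻ (z >> 30)) * 0xbf58476d1ce4e5b9
    z = (z ⊻ (z >> 27)) * 0x94d049bb133111eb
    return z ⊻ (z >> 31)
end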

23 Likes

It would be useful if we had a rewrite script that users would be expected to run when upgrading, like Abseil does for C++.

Ok, but that’s not how Julia upgrades work. We don’t break publicly documented APIs and + is definitely that.

7 Likes

So, I was curious what was possible. :slight_smile:

https://github.com/BioTurboNick/OverflowContexts.jl

using OverflowContexts
x = 2^63 - 1 # 9223372036854775807
x + 1 # -9223372036854775808

@checked
x + 1 # ERROR: OverflowError: 9223372036854775807 + 1 overflowed for type Int64
@unchecked
x + 1 # -9223372036854775808

d() = x + 1; c() = d(); b() = c(); a() = b();

a() # -9223372036854775808

@checked
a() # ERROR: OverflowError: 9223372036854775807 + 1 overflowed for type Int64
@unchecked

a() # -9223372036854775808

It only works at top-level though. And I’m sure there are plenty of landmines. But for some use cases, maybe a global switch, in a script or on the REPL, is all that would be needed?

Then you’d just need a per-expression variant of @unchecked to rewrite the expression to use the overflow-allowed versions in those exceptional cases that depend on overflow. Perhaps https://github.com/JuliaMath/CheckedArithmetic.jl could serve as a template.

I probably won’t be active on it, but if those who expressed interest in the idea here want to run with it, go ahead!

6 Likes

Wouldn’t that be sufficient in most cases? I like the global option for all integers, but it’s been shown to break Julia. Until then, this other option of mapping only literals to a SafeInteger type should be enough, at least as a first step.

It would, e.g., fix the power operator ^ automatically (though it, along with everything else, would be slower until optimized to opt out of checking). The “hope” is that it would work for all the functions and operators in the standard library that I can think of (and, for the same reason, for all packages?).

Parsing, with its negligible run-time impact, isn’t the problem (though this seems like an ugly hack in the parser). Without slowdown, it would only work for x^p where both x and p are literals, which is rare in practice(?). I suppose it could be done, but we want to solve the more common and general case where neither needs to be a literal.

When neither is a literal, the “literal option” would rely on x or p having been computed earlier from an expression with at least one integer literal, propagating the checked integer type downstream. That seems likely.
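
A minimal sketch of what that propagation could look like (SafeInt is hypothetical, purely illustrative; the Base.checked_* functions are real):

# Hypothetical wrapper: if integer literals lowered to a checked type,
# checking would propagate through ordinary arithmetic downstream.
struct SafeInt
    x::Int
end
Base.:+(a::SafeInt, b::SafeInt) = SafeInt(Base.checked_add(a.x, b.x))
Base.:-(a::SafeInt, b::SafeInt) = SafeInt(Base.checked_sub(a.x, b.x))
Base.:*(a::SafeInt, b::SafeInt) = SafeInt(Base.checked_mul(a.x, b.x))

x = SafeInt(4) * SafeInt(10^9)   # fine: 4_000_000_000 is in range
x * x                            # throws OverflowError instead of wrapping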

Indexing into arrays, e.g. x = A[1], wouldn’t be too slow, since it’s only an operation on one element, but what we want to be fast is A[i:j]. Since i:j gives a range, it could be special-cased to switch to unchecked machine integers whenever a range is used for array indexing. Possibly the common begin + i and end - j could also be forced to unchecked machine integers. I can’t take typeof(begin), but the begin/end, or rather the plus/minus operators there, could in that context use the type system for this.

If we only look at power, then R seems to have made a valid choice there:

Defaulting to floats for power doesn’t seem to imply type instability any more than it does for division (floats are logical for the latter, and not obviously wrong for the former), and it seems like a valid choice even when both x and p are machine integers. @Oscar_Smith is working on overflow checks for power (and I know he implemented float^float with integers for speed). It still seems like a valid option to calculate int^int as float^int, though the ship seems to have sailed on that until Julia 2.0. It would be ugly to have bigint^int return a float, or to special-case it to return BigInt or BigFloat; still, it would be type stable, and different types could have different rules. I’m not sure what that means for autodifferentiation; would it then be ok?

Is power ever used in cryptography?

Integer powers are the basis of RSA. Making Int^Int return a Float64 honestly wouldn’t be crazy… It would remove the need for the literal_pow hack that we use to make literals like 5^-10 work, and it would solve the overflow issues. It actually seems like a kind of intelligent change (though it can’t be reconsidered until 2.0).

3 Likes

Exactly, and they are used in all kinds of other places. An integer raised to a positive power shouldn’t return a float. I mean, for example, in R how do you get the integer 2^17? You don’t; it can’t be done (at least not very discoverably… you can’t find it in the manuals). Getting that behavior is trivial with a macro, though: just convert a^b into convert(Float64, a)^b. Just like the @. macro converts everything to broadcast, this is a perfectly reasonable macro for people who want this behavior, and then it becomes explicitly called out by the usage of the macro:

@floatpows begin
   x = (3^60 - 3^59)
   y = x^2
   y/2
end

I can’t imagine that macro requiring more than 10-15 lines of code, especially with MacroTools.jl (though I haven’t written it).

You’re right, ^ is already type-unstable:

julia> typeof(5^-10)
Float64

julia> typeof(5^10)
Int64

julia> typeof(big(5)^10)
BigInt

julia> typeof(big(5)^-10)
BigFloat

Could it be considered a bug and changed to return Float64 in all cases, like division does (and possibly always BigFloat for BigInt)? For the very few who need an Int or BigInt result, we could have a function for that (the old one that currently implements the operator in Base). I’m not sure how much this would affect software; we should run PkgEval on it. Crypto is a red herring, and I’m not sure it even applies to RSA, which is already implemented in C and wouldn’t break.

It’s not type unstable. Julia does special hacks for literal negative powers.

2 Likes

Something like this:

julia> using MacroTools

julia> macro floatpows(exp)
           MacroTools.postwalk(exp) do x
               MacroTools.@capture(x, a_^b_) || return x
               :(convert(Float64, $a)^$b)
           end
       end
@floatpows (macro with 1 method)

julia> @macroexpand @floatpows begin
           a = (3^60 - 3^59)
           b = a^2
           a/2
       end
quote
    #= REPL[38]:2 =#
    var"#46#a" = Main.convert(Main.Float64, 3) ^ 60 - Main.convert(Main.Float64, 3) ^ 59
    #= REPL[38]:3 =#
    var"#47#b" = Main.convert(Main.Float64, var"#46#a") ^ 2
    #= REPL[38]:4 =#
    var"#46#a" / 2
end

julia> @floatpows begin
           a = (3^60 - 3^59)
           b = a^2
           a/2
       end
1.4130386091738735e28

Not that rare. It seems quite common to write constants as 6.022 * 10^23 in some programming languages where it works. (This evaluates to approximately 1.207e18 in Julia.)
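
Concretely, the power wraps modulo 2^64 before the float multiplication ever happens:

10^23          # wraps to 200376420520689664
6.022 * 10^23  # ≈ 1.2067e18, nowhere near 6.022e23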

3 Likes

Since you’ve said this a couple of times now, it’s worth pointing out that it’s only true for certain operations, namely anything writable in terms of ring operations +, -, and *. This includes ^.

There are other Int-to-Int operations this isn’t true for, such as fld: the exact result of fld(2^63, 2) is 2^62, which fits in Int, yet the wrapped computation gives -2^62.
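
A quick REPL check of that failure (2^63 wraps to typemin(Int) before fld ever runs):

julia> fld(2^63, 2)
-4611686018427387904

julia> 2^62
4611686018427387904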

Good point (it “works” in Julia, just differently), but this would work correctly with the proposed change to using floats, and so far that’s one of the best arguments for the change.

It’s not a small difference, and smaller differences (mixing up meters and feet) have doomed space missions (so far I had sort of been on Karpinski’s side, that a returned 0 is not too bad since it’s obviously an error):

julia> (6.022 * big(10)^23) - 6.022 * 10^23
6.0219879333319564856523701584340631961822509765625e+23

RSA already doesn’t use Int, i.e. Int64, so I’m not worried about crypto at all.

I had forgotten, and was informed, that Julia does a hack for negative powers at lowering time. It seems like a massive hack, but it could alternatively have been done for positive powers.

With floats there would be no need to special-case at runtime, and no possibility of errors being thrown (which are misleading if you learned from an example x^p where x and p were literals):

julia> x = -4;

julia> 4^x  # we would still need to think of the rational case, as shown in the error message "(x//1)^-4", to still return rationals...
ERROR: DomainError with -4:
Cannot raise an integer x to a negative power -4.
Make x or -4 a float by adding a zero decimal (e.g., 2.0^-4 or 2^-4.0 instead of 2^-4), or write 1/x^4, float(x)^-4, x^float(-4) or (x//1)^-4

julia> big(4)^x  # There is no y in the expression... the error message could be improved and made consistent with the one above
ERROR: DomainError with -4:
`y` cannot be negative.

Now the downside (not really): for 2^17 you would need to use 2<<16, which is very common (or your @floatpows macro idea in reverse, as @intpow). I think that’s more common for people who do bit-twiddling, where this is useful; CompSci people would know such stuff. Science people may be more ignorant of that and might actually rather like floats being used, e.g. for consistency with R.
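
For reference, the shift gives the same value today:

julia> 2 << 16
131072

julia> 2^17
131072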

If we really wanted, we could special-case x^p where both p and x are literals to still return integers. It would be a little ugly, but wouldn’t really lead to type instability. I’m not sure the negative-p hack works in reverse for that; it may rely on x not being a literal? Potentially the parser could do this, however. Another option would be to continue having x^Unsigned(p) return integers…

Yes, that’s an excellent point. You need to stick with ring operations for this guarantee to hold (including << and >>> but not >>). Of course, even with ring operations floats offer no such guarantee, since floats only approximate a ring: they actually fail at very basic things like associativity.
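
For example:

julia> (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)
false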

1 Like

A digression: if that range is too small for your integer ring computations, you can extend it without going past Int64 if you can perform your computations multiple times, changing your modulus between the computations. When you have the results modulo (2^64, m_1, m_2, …, m_n), you can extend the range to their least common multiple using the Chinese Remainder Theorem. The moduli are preferably chosen relatively prime and as large as possible below 2^64. Obviously you don’t get the full machine speed for moduli other than 2^64, but if you are memory constrained this can be a huge win over bignums.
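
A minimal sketch of the recombination step (the crt helper here is mine, not from a package; it assumes pairwise-coprime moduli):

# Recombine residues r_i mod pairwise-coprime m_i into the unique value
# mod prod(m_i), by the standard Chinese Remainder Theorem construction.
function crt(residues, moduli)
    M = prod(big.(moduli))
    x = big(0)
    for (r, m) in zip(residues, moduli)
        Mi = M ÷ m
        x += r * invmod(Mi, m) * Mi   # invmod is in Base
    end
    return mod(x, M)
end

# Example: recover 3^50 (too big for Int64) from two machine-word residues.
m = (2^61 - 1, 2^31 - 1)                        # coprime Mersenne-prime moduli
r = (powermod(3, 50, m[1]), powermod(3, 50, m[2]))
crt(r, m) == big(3)^50                          # true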

4 Likes

:open_mouth: Really good example to teach and warn about.

One issue is that people want to be able to write x^2 and have it mean x*x, which it currently does. Not insurmountable, but a consideration.
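
For reference, that works through the literal_pow mechanism: a literal exponent lowers to a call to Base.literal_pow, which has a specialization defining the square as x*x:

julia> Base.literal_pow(^, 3.0, Val(2))   # what x^2 lowers to
9.0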

2 Likes