Discussion about integer overflow

I haven’t participated in this thread, but I think that if you find that some people have been venting their frustrations, you should perhaps consider if there is something in the tone of your posts that have rubbed people the wrong way.

I know that several of your posts have nearly made me reply with a bit of annoyance, and that post was perhaps the last straw.

I don’t think this sort of smug condescension is appropriate.

10 Likes

I would like to introduce a diversion, or another potential benefit of a “check-overflow” mode. I am a mathematician using julia, and use integers or rationals most of the time and almost never floating point numbers. I am very well aware of the question of overflow, since about every computation I do must be checked for overflow to make sure the result is correct. I am very happy that Rationals are overflow checked, since just using Rational ensures safety (do you think Rationals would be must faster if not overflow checked? I don’t think so). I use very often SafeInt and in my experience using them slows things down usually 5% to 15% compared to Int (they could slow things down more if they prevent vectorization so probably my code does not vectorize too well). But it is quite difficult sometimes to make sure that code uses SafeInts throughout, specially code I have not written myself. So it would be very useful to me to have a mode which says "use SafeInts everywhere instead of Ints ".

By the way, I think this safemode should use SafeInts rather than BigInts since currently in julia using BigInts slows down the code by a factor of 50 (which is progress compared to julia 1.0 where the factor was 250) and in the best systems I know (where BigInts are internally Int when they represent small integers) the overhead is still a factor of 5.

8 Likes

The key reason why we can use overflow checks in rationals without causing problems is that rational arithmetic involves gcd calls, which slows stuff down enough that the overflow checks are relatively cheaper.

9 Likes

I didn’t see this specifically followed up on from above. If someone really wants something like this, is it possible to have a “checked context” such that the Julia compiler replaces the normal unchecked operations with checked operations while within that context?

C#, for example, does this: checked and unchecked statements - C# reference | Microsoft Docs.

Doing a full replacement via a command line switch does seem likely to break many things, but if someone built the feature out from a macro block, would that allow the feature to be built over time?

I will say it would be nice for testing purposes to be able to turn overflow checks on.

1 Like

A macro (as in ChangePrecision.jl) does something fundamentally more limited — it rewrites a block of code that you give it (or perhaps include), but cannot affect arbitrary functions called from that code (except by changing the argument types passed to those functions). Basically, you could rewrite literal integers to another type and hope that this is enough to propagate the safer type “downstream” to all called functions.

4 Likes

BigInt everywhere” could be an interesting research project: it would require a special type that can switch between machine integers of different sizes, and fairly aggressive range propagation analysis (probably beyond what LLVM can provide).

4 Likes

It’s fairly simple to just redefine + to be checked arithmetic… but you do need to rebuild Julia to avoid stack-overflows from checked_add itself using +:

Diff to Julia itself
shell> git diff
diff --git a/base/checked.jl b/base/checked.jl
index ad92a44e1e..6c925c2d83 100644
--- a/base/checked.jl
+++ b/base/checked.jl
@@ -135,7 +135,7 @@ add_with_overflow(x::Bool, y::Bool) = (x+y, false)

 if BrokenSignedInt != Union{}
 function add_with_overflow(x::T, y::T) where T<:BrokenSignedInt
-    r = x + y
+    r = add_int(x, y)
     # x and y have the same sign, and the result has a different sign
     f = (x<0) == (y<0) != (r<0)
     r, f
@@ -145,7 +145,7 @@ if BrokenUnsignedInt != Union{}
 function add_with_overflow(x::T, y::T) where T<:BrokenUnsignedInt
     # x + y > typemax(T)
     # Note: ~y == -y-1
-    x + y, x > ~y
+    add_int(x, y), x > ~y
 end
 end

diff --git a/base/tuple.jl b/base/tuple.jl
index b45e1efc91..5356bb693a 100644
--- a/base/tuple.jl
+++ b/base/tuple.jl
@@ -85,8 +85,8 @@ end

 # this allows partial evaluation of bounded sequences of next() calls on tuples,
 # while reducing to plain next() for arbitrary iterables.
-indexed_iterate(t::Tuple, i::Int, state=1) = (@inline; (getfield(t, i), i+1))
-indexed_iterate(a::Array, i::Int, state=1) = (@inline; (a[i], i+1))
+indexed_iterate(t::Tuple, i::Int, state=1) = (@inline; (getfield(t, i), add_int(i,1)))
+indexed_iterate(a::Array, i::Int, state=1) = (@inline; (a[i], add_int(i,1)))
 function indexed_iterate(I, i)
     x = iterate(I)
     x === nothing && throw(BoundsError(I, i))

At that point, you can just redefine +:

julia> Base.:+(x::T, y::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} = Base.checked_add(x, y)

but things break pretty quickly:

julia> ┌ Error: Error in the keymap
│   exception =
│    OverflowError: 15057391676429522544 + 8207575013956623489 overflowed for type UInt64
│    Stacktrace:
│      [1] throw_overflowerr_binaryop(op::Symbol, x::UInt64, y::UInt64)
│        @ Base.Checked ./checked.jl:154
│      [2] checked_add
│        @ ./checked.jl:166 [inlined]
│      [3] +
│        @ ./REPL[1]:1 [inlined]
│      [4] hash
│        @ ./hashing.jl:107 [inlined]
3 Likes

A simple @withfloatliterals macro could rewrite all integer literals as floats, and give @viraltux his preferred behavior for at least the code he or his colleagues write. (note I wouldn’t recommend this in general, but the nice thing about a macro language is you can get what you want).

1 Like

This is the core reason this would be hard. It’s easy to implement overflow checking but there is a significant amount of code both in Julia itself and in packages everywhere that correctly and intentionally use the fact that native integer operations are defined to overflow. Those all have to be changed. In order to make a --check-overflow flag that can actually run anything non-trivial without failing (like booting into a REPL prompt), all those usages would have to be changed to use operators like the ones I proposed that explicitly do wrapped integer arithmetic. That’s certainly doable, but it’s a slog. If someone feels that it’s worth doing, I would certainly encourage them to tackle this project. Here’s what would have to be done:

  1. Add the --check-overflow flag. This is the easy part.
  2. Add the %+% and other wrapping arithmetic operators. This is harder because modifying the parser is fiddly and you have to be careful not to accidentally introduce breaking changes.
  3. Run Julia and fix places where + et al. are used that fail tests and replace them with %+% et al. as appropriate. Repeat until Julia boots. Then run the tests and repeat until all the tests pass. This is straightforward but tedious and time-consuming.
  4. Now do the same thing with every registered package and keep making pull requests to fix places where packages use + (et al.) with intentional wrapping by replacing them with %+% (et al.).
6 Likes

That seems like a little breaking change from my POV. /s

The job would be much less labor-intensive by using tooling to rewrite all existing usages of + to %+%. Then nothing would be broken and new code could use + or %+% as appropriate, and existing usages could be incrementally rewritten manually back to +.

Thank god I don’t have to do all that.

I really do think that @viraltux could do two things and get a lot of what he wants.

  1. create a @withfloatliterals macro that rewrites at least literals that are multiplied or exponentiated as floats… so 2^3 would become 2.0^3 and 2*thing would become 2.0*thing and I’d recommend to ignore ranges such as 1:end or 1:400
  2. create a function “includewfloats” that basically reads a file and applies the macro and then evals the result.

voila, now people get “R like” behavior for literals.

(another option would be to rewrite a^b as some function like floatpow(a,b) where

floatpow(a,b) = Float64(a)^b

and a*b as floattimes(a,b)

floattimes(a,b) = Float64(a)*Float64(b)
1 Like

The problem doing this is that, effectively nothing would be broken as it is now… but anything that is broken now would be broken there. If it is desired to really change the status quo then only the manual process is guaranteed to success (mostly).

2 Likes

It would be opt-in: unless someone runs Julia with the --check-overflow option, nothing would change. Therefore not breaking.

I can’t tell if this is serious. That would be fine if + was not a beloved notation that everyone knows whereas %+% is pretty ugly and long. While you’re right that what you proposed would technically work, it would be much more work to change every introduced instance of %+% back to + than it would be to change usages of + to %+% where errors are encountered. When changing a %+% back to a +, how do you know if it’s ok to change? The most likely way to do it is to change it and then try running tests to see if they break. At which point, you’re doing what I suggested but with extra steps.

3 Likes

What fraction of the 10,163 + usages in julia/ would you estimate are intended to permit overflow and would therefore be rewritten to %+% after testing?

One syntax option that’s a little less ugly and is already supported by the parser with correct precedences would be +ₘ, *ₘ, etc., in the tradition of writing modulo arithmetic with a subscripted number.

julia> +ₘ(x, y) = x+y
+ₘ (generic function with 1 method)

julia> 1 +ₘ 2
3
5 Likes

Maybe +₂? Given that is modulus 2^{128|64|32} :stuck_out_tongue:

I agree my rewrite-everything suggestion is much less useful if --check-overflow is added; it would be more useful if checked overflow were the default and --unchecked-overflow were added — rewriting to +ₘ would be nonbreaking and effective.

I think the syntactic difference between + and +ₘ (thanks @mbauman!) is manageable and could even be shortened to a one-character operator like (\dotplus) if desired.

The documentation for @inbounds suggests the compiler maintains a context that determines whether bounds checks are included, and this can be propagated. This requires an explicit @propagate_inbounds to allow, but it could also be made to propagate by default, yes? (EDIT: Or does that require the called code to be inlined to work?)

What’s stopping the creation of a similar context hint to apply to overflow checks? Is there something fundamentally different about bounds checking that couldn’t apply to overflow checking?

These are superficially “just” macros, but if you call macroexpand you’ll see that they actually expand into lower-level non-syntactic hooks like Expr(:inbounds, true) that are specially handled by the compiler. To propagate integer bounds checking, you would need to do something similar — a macro by itself is not enough.

(And even bounds checking only propagates into function calls that are specially annotated with @propagate_inbounds, which causes them to inline as you noted.)

4 Likes