A plea for int overflow checking as the default

Hello JuliaLang Community!

I am a social scientist making his way from the Python ecosystem (mostly numpy & pandas) to Julia, and have in general been delighted by the language – concise, generally quite readable, and performance without the need to mix in numba and Cython code all the time. But I have discovered one aspect of Julia that makes me exceedingly uncomfortable, both as a researcher and an instructor: the lack of int overflow checking.

As a researcher, my view is that a good language should (a) always give me the right answer, and (b) do so quickly. But crucially, speed should never come at the cost of accuracy, and I think the lack of int overflow checking violates that principle, and introduces two very distinct problems.

Packages

The main problem with a lack of int overflow checking, in my view, is that it means I don’t know that I can trust third-party packages. Int overflows occur because of an interaction between the data being used and the operations being executed. In my own code, I can control both, and thus manage the problem. But the idea of packages is that I don’t know all the details of how everything’s being implemented, and at the same time, the package writer can’t know all the details of the data I’m working with. As such, I don’t know how I can possibly be sure that int overflows aren’t happening somewhere in packages I use. And if I can’t trust that, I don’t know how I can trust Julia.

In software development, this isn’t necessarily a problem – if the program works, the program works. But in scientific computing, it’s entirely possible for code to run and generate answers that are just wrong in non-obvious ways. After all, if we knew what the result of our calculations should be in advance, we wouldn’t be doing them.

New Users

My second concern with the lack of int overflows stems from my experience as an instructor. The beauty of Julia is that it offers C-like performance to users who only need to understand Python-like syntax, basic type concepts & type stability, and a few basic principles (put everything in a function).

Right now, I think this community is full of C programmers and early adopters who are accustomed to thinking about things like integer overflow, but as Julia grows, the language will have more and more users who don’t even know what integer overflow means (indeed, this has been brought up elsewhere). Having that giant pitfall – one that may silently result in Julia returning answers that are just wrong, with no warnings – feels like a huge disservice to the community the language could potentially support.

Relevance

A last thought: I imagine that one reason int overflows aren’t checked by default is that 2^63 is pretty big, so overflow rarely comes up. In general computing, I think this is true, but scientific computing is precisely the context in which people work with weird numbers (both mathematicians doing simulations and people working with exceptionally large data). Moreover, I know I often downcast my data to Int32 or Int16 to conserve memory, which is often the binding constraint with the datasets I work with (and I imagine this may be true for many Julia users).

Indeed, it looks like this comes up not just for downcast data, but also for people working on 32-bit machines and in situations where Julia deliberately uses Int32.

Performance Hit

I will not pretend to have the expertise to speak to the size of the performance hit, and I’m sure it’s non-trivial, though from what I’ve read maybe not that bad. But at the same time, for a language written for research, it seems like being right should always be the first criterion, and only after that criterion has been met should we be optimizing for speed.

Moreover, I’m all for offering unchecked integer operations as an option. But by making it non-default, hopefully people who invoke it will do so deliberately and with some serious thought about what problems the decision may cause.

4 Likes

The discussion of performance in the linked page is almost irrelevant. The branch itself shouldn’t be too expensive (though having a lot of them can add up); the biggest cost comes from disabling many other optimizations.

FWIW, we are very different from C since we have defined semantics for all integer overflow.

3 Likes

I understand your feeling (as a social scientist myself), but that ship has sailed long ago. As you say, on 64 bit systems Int is large enough for most purposes. Smaller integer types promote to Int when doing operations on them, and conversion between types checks for overflow, so you should generally be safe. Also, floating point types are often a better choice for numeric data, as they provide some protection against overflow (though loss of precision can happen silently).
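
For example, a quick illustration of those last two points (the conversion check and the floating-point tradeoff):

Int16(40000)   # throws InexactError – the conversion refuses to silently truncate
2^70           # 0: the Int64 result has silently wrapped all the way around
2.0^70         # 1.1805916207174113e21 – Float64 copes with the magnitude (precision, not range, is the limit)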

3 Likes

And FWIW, we do have checked versions, and they are documented: https://docs.julialang.org/en/latest/search.html?q=overflow
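
For example, the functions in Base.Checked throw instead of wrapping:

typemax(Int) + 1 == typemin(Int)            # true: unchecked addition wraps silently
Base.Checked.checked_add(typemax(Int), 1)   # throws OverflowError
Base.Checked.checked_mul(10^10, 10^10)      # throws OverflowError instead of wrapping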

This is arguable. What we are basically doing here is exposing hardware behavior and limitations instead of pretending they don’t exist. You can make the same argument about floating point, and then we would need a single BigFloat type for all floating point numbers, without Float16, Float32 or Float64 etc. The loss of precision is even more common there.
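
For example, the silent loss of precision is easy to demonstrate:

Float32(16777216) + Float32(1) == Float32(16777216)   # true: 2^24 + 1 is not representable in Float32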

Also, I don’t see how this is any different from any other bug. In some sense, I wouldn’t trust any code, anywhere, to be bug free. A package obviously doesn’t control its input, but it should be able to handle all of it. If integer overflow is something the package should care about, then it should handle it using the tools we provide. If not, then either it’s a bug that needs to be fixed in the package, or it should be documented that the user must check their data before passing it to the package – no different from any other bug or piece of documentation.

For new users, it also shouldn’t be different from the many other aspects of the language that differ from other languages and that we don’t warn about. Better tooling should be the right way to help with this, e.g. by providing an option to force checked arithmetic, similar to how we have an option to disable bounds-check elision.

9 Likes

@yuyichao And FWIW, we do have checked versions. And they are documented

Thank you for those links. I have also found this open issue which seems like it would at least offer the option to convert all integer operations into operations that check overflow, albeit not as a default. If that gets integrated (which I’d love to see), honestly I’d probably leave that on in most cases, or at least use it for testing code to make sure it’s not susceptible to overflow problems.

@yuyichao You can make the same argument about floating point, and then we would need a single BigFloat type for all floating point numbers, without Float16, Float32 or Float64 etc. The loss of precision is even more common there.

That’s true, but floating point imprecision is always there (it doesn’t just pop up once in a blue moon to corrupt your results), and while BigFloat would decrease its magnitude, the fact that floats are inherently imprecise is unavoidable. By contrast, integer overflow is categorically solvable.

@yuyichao This is arguable. What we are basically doing here is exposing hardware behavior and limitations instead of pretending they don’t exist.

For new users, it also shouldn’t be different from the many other aspects of the language that differ from other languages and that we don’t warn about.

I think the big distinction is that this requires a special kind of knowledge about how computers work that most scientists don’t have. I understand that in a developer community this may seem strange, but integer overflow is a really weird concept to most non-programmers, and I think requiring users to understand it ups the bar on the required knowledge for working with Julia. Or, more accurately, I fear it’s knowledge many users won’t have and won’t know they need to have because it’s so subtle and almost never exposed. And integer overflow defies our normal logic of how math works.

Also, I don’t see how this is any different from any other bugs. In some sense, I wouldn’t trust any code, anywhere, to be bug free.

I think my main concern is that it can be a silent bug that corrupts results without drawing attention to itself (unlike, say, a syntax error). Also, it’s one that can be universally fixed and prevented with a single change (adding integer overflow checking) – if there were such a fix for mis-specification of a formula, I’d ask for that too!

Unit tests and continuous integration testing are required for registered packages. You should always check the tests to decide whether to trust a package. This is true whether or not checked arithmetic is used.

Also, by default the tests should run on AppVeyor, which does both 32-bit Windows and 64-bit Windows. If you have overflow, it will definitely be found if you have proper unit and integration tests.

2 Likes

Some time back, I had suggested that we might be able to get away with a single overflow check per function. I.e. we’d throw an error if overflow occurred at some point during the function, but not necessarily right when it occurs. I still feel like a weaker form of overflow checking like that might be practical without destroying performance completely. But yeah, it’s certainly a non-trivial challenge to implement correctly.
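
To make the semantics concrete, here is a rough user-level sketch of that “check once at the end” idea for a summation loop, using Base.Checked.add_with_overflow. This is only an illustration of the deferred-error semantics, not how a compiler pass would actually implement it:

function deferred_checked_sum(arr)
    acc = zero(eltype(arr))
    saw_overflow = false
    for x in arr
        acc, flag = Base.Checked.add_with_overflow(acc, x)
        saw_overflow |= flag   # record that an overflow happened, but keep going
    end
    # the error is raised once, after the loop, rather than at the offending addition
    saw_overflow && throw(OverflowError("integer overflow during sum"))
    return acc
end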

2 Likes

@Keno I think that’d be amazing, and a great compromise! I’d be super happy with that.

1 Like

@Keno Wouldn’t overflow checks disable SIMD anyway, even if the error is thrown only after the fact?

1 Like

Yes, at least x86 doesn’t set the flags bit for packed integer arithmetic. However, most of the performance loss from doing the checking is because you can no longer vectorize any loops, due to overflow checking on the index range (which in the check-once scheme can be done well, just once). I’m not saying it could be implemented at zero cost (it can’t), but it might be implementable at a low enough cost that doing it by default, plus @inbounds-style annotations, might be acceptable. Again, the details are complicated, and I don’t have data to suggest it is possible, but it’s at least plausible that such a weaker form could be implemented. Also, if we become popular enough we can convince Intel to put some overflow-checked SIMD integer arithmetic into their ISA ;).

4 Likes

@ChrisRackauckas Unit tests and continuous integration testing are required for registered packages. You should always check the tests to decide whether to trust a package.

I appreciate that, and am grateful Julia has built in those safeguards. But tests are an imperfect instrument, and (often for ease, speed, or because the auto-testing machines have limited memory) they tend to be run on relatively simple and small examples of test data. Overflow is a problem that’s especially likely in “big” runs – either huge numbers of iterations or huge datasets, neither of which are generally covered in tests. So I think this is an area where tests tend to offer especially limited protection.

1 Like

You are also welcome to use the BigInt type to store your integers (e.g. x=BigInt(5)), and no overflow will ever occur. Almost all functions that work on Ints, both in base Julia and in packages, will also work on BigInt.
The speed of operations will be enormously (orders of magnitude) slower, though.
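
For example:

2^100        # 0: the Int64 computation has wrapped all the way around
big(2)^100   # 1267650600228229401496703205376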

3 Likes

With 64-bit integers, you are never going to overflow if the integers are counts of iterations or bytes or anything “real”, no matter how “big” the run is. On the spectrum of bugs to worry about, 64-bit integer overflow is pretty far down on the list of priorities in my opinion, and not worth sacrificing an order of magnitude in performance for.

You will easily overflow 64 bits if you are doing number-theory or combinatorics calculations, but that is a different kettle of fish — in such circumstances it is usually obvious from the beginning that you need to either use bignums or take some other special precautions.
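
For example, Base already forces the choice for factorials of machine integers:

factorial(21)        # throws OverflowError (21! does not fit in Int64)
factorial(big(21))   # 51090942171709440000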

10 Likes

Do you have a specific example where you have had a problem with 64 bit integer overflow? As @stevengj points out, it’s a pretty unlikely scenario.

1 Like

@dpsanders It’s a fair question. I have not personally run into an Int64 overflow issue. I think I’m more concerned with Int32 issues – I’ve had plenty of issues with Int32 overflows, and I think many people working with very large datasets downcast to conserve memory.

With that said, my work is only in the lower-middle of the “big data” spectrum, so I guess I was imagining others may hit it. For example, I regularly work with arrays of half a billion observations with relatively large integer values (on the order of 2^32). Integers don’t have to be that big to overflow if you sum half a billion of them.

I think we all agree that it would be beneficial to have the option to turn on checked arithmetic globally. If that flag should also extend into base Julia code, one might have to make it a build-time rather than a runtime option, so that it propagates into the system image and functions like sum also use checked arithmetic. In the end it depends on somebody investing the time and effort to make that possible.

With regard to the performance impact:

using BenchmarkTools

checked_sum(arr) = mapreduce(identity, Base.Checked.checked_add, arr)
arr = abs.(rand(Int32, 1_000_000_000))

@btime sum($arr)
  242.659 ms (0 allocations: 0 bytes)
1073724548707593956

@btime checked_sum($arr)
  1.387 s (0 allocations: 0 bytes)
1073724548707593956

I just noticed that the result of the addition, 1073724548707593956, is far greater than typemax(Int32), and that is due to the addition between two Int32 values being promoted to Int64.

1 Like

Wouldn’t it be feasible to have an Integer type that checks for you, something like:

struct SafeInt{T <: Integer} <: Integer
    val::T
end

import Base: +

# checked addition: throws OverflowError instead of silently wrapping
+(x::SafeInt, y::SafeInt) = SafeInt(Base.checked_add(x.val, y.val))

f(x, y) = x + y
f(2^62, 2^62)                    # -9223372036854775808 (silent wraparound)

f(SafeInt(2^62), SafeInt(2^62))  # throws OverflowError

So if you are suspicious/paranoid about overflow you could just convert your data into SafeInt.

5 Likes

A “safe” type will solve the problem as long as we only worry about overflow that happens due to the data itself. There may still be issues, for example, if somebody assumes that the number of elements in a vector will fit into a UInt16, and you feed their function a vector of 65537 SafeInt elements…

Something that I’d really like to have, are user-definable “hooks” that you can attach to any number of different events. Integer overflow, floating point exceptions, memory allocation, object destruction, module loading, garbage collection, etc, etc. I believe that would solve the problem discussed in this thread, as well as many other problems.

10x?! Jeez, that hurts – I was expecting something closer to 2x. That’s certainly a strong case for not always checking as a default…

@Keno would the single check per function call you suggested have similar performance hits?

1 Like

While it might be possible to add these kinds of hooks (for all the Julia-specific ones it certainly is), there are others, like floating point exceptions and integer overflow, where the hardware does not always provide the information. Even in the case of integer overflow, where the information is exposed for scalar operations, you lose that feature once you are doing SIMD computations with SSE or AVX (see the PADD instruction). In general, any kind of instrumentation will always have some kind of overhead.

I think that is exactly the right idea – it lets you choose to activate overflow checking for your own data – and a package like that would be welcome.