A plea for int overflow checking as the default

Neither + nor Base.checked_add promotes to Int64 when the input arguments are Int32, but the results above are still Int64 because functions like sum, reduce, and mapreduce “widen” certain input types, and will thus call the 64-bit versions of + and checked_add in this case.

(As a side note, it would be handy to be able to turn this undocumented “feature” off, e.g. when one wants to benefit from SIMD optimizations. As it is, reduce throws a MethodError when given an operator defined only on Int32 together with an array of Int32.)
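For concreteness, here is a quick REPL illustration of the difference (on a 64-bit machine; the widening mechanism has changed across Julia versions, so details may vary):

julia> x = Int32[typemax(Int32), Int32(1)];

julia> x[1] + x[2]        # plain + stays in Int32 and wraps silently
-2147483648

julia> sum(x)             # sum widens Int32 to Int64 internally
2147483648

julia> typeof(sum(x))
Int64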

1 Like

That behaviour does sound counter to what I would have expected. You can define Base.r_promote(::typeof(op), x::Int32) = x (or Base.r_promote_type(::typeof(op), ::Type{Int32}) = Int32 in 0.6) to opt out of that behaviour, but nevertheless the fallback should maybe not make those assumptions (do you care enough to open an issue on GitHub?).

I opened an issue: https://github.com/JuliaLang/julia/issues/21523

1 Like

I’d just like to contribute my two cents to this discussion: coming from high-energy physics, where quantities frequently span many, many orders of magnitude, I have nevertheless found this to be a complete non-issue (for 64-bit integers). The main reason, I suppose, and as others have pointed out, is that typically data with such large magnitudes are handled as floats.

This is sort of an interesting information-theoretic/thermodynamic question to think about. The only reason to use ints instead of floats (I think) is precision, and it seems to me there are some deep, fundamental reasons (i.e. related to the renormalization group) why you wouldn’t want the type of precision offered by ints in situations where you have values spanning so many orders of magnitude. I suppose there might be some special cases where you are dealing with accumulating lots of small values to make a bigger value (if you have some sort of “∞ ⋅ ε” situation) where this might be relevant, but I would think that in such a case the right thing to do would be to use BigFloat or something like it. Even then, it’s pretty hard for me to contrive a good example (I’ve written quite a lot of numerical integration code in my lifetime, and still, this has never been an issue).

(Come to think of it, if I ever did see any code where this was relevant in a physics context, it would serve as a red flag to me that whatever the code is doing involves a theory that is badly fine-tuned. Of course, sometimes fine-tuning is something we just have to live with, probably because we don’t understand what’s really going on (ironically, it’s a problem that I’m much more tolerant of than a lot of people are), but on purely theoretical grounds, I would find any application of code that needs this type of overflow checking a bit suspicious.)

I suppose it’s possible that I’m completely missing the point and there is some (completely different) application of this sort of thing that I’m just not thinking of.

That said, I’m a bit curious, is there some specific application where you expect this to be particularly relevant?

Edit: The one major exception to the above discussion I can think of is pure number theory (and by extension cryptography, which seems like a very relevant application). However, these seem like really special cases, and I don’t think it’d be unreasonable to expect number-theory/crypto people to use some sort of overflow-checking macros where appropriate.

1 Like

The reason is not complicated. Floating-point values offer uniform relative accuracy (i.e. a fixed number of significant digits) over the whole representable range. Integers and other fixed-point values do not.
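A quick REPL illustration of “uniform relative accuracy”: the absolute spacing between adjacent Float64 values grows with magnitude, but the relative spacing stays essentially constant:

julia> eps(1.0)
2.220446049250313e-16

julia> eps(1.0e10)              # absolute spacing is much larger here...
1.9073486328125e-6

julia> eps(1.0e10) / 1.0e10     # ...but relative spacing is the same order
1.9073486328125e-16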

In number theory and cryptography, you wouldn’t use fixed-precision integers at all. Such applications use arbitrary-precision integers (BigInt and similar).
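Base’s factorial is one example of this pattern: the fixed-size version checks and throws rather than silently wrapping, and BigInt removes the limit entirely (error text abbreviated, and may differ between versions):

julia> factorial(21)              # result would not fit in Int64
ERROR: OverflowError: ...

julia> factorial(big(21))         # BigInt: exact, no overflow
51090942171709440000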

1 Like

Yes. My point was that there might be some physics or information theory reasons why “uniform relative accuracy” is all you could ever really want.

One big one for social data is numerical identifiers, like phone numbers, case numbers, etc.

I’m pretty sure those aren’t close to overflowing with 64-bit integers.

julia> typeof(19999999999)
Int64

julia> typeof(99999999999)
Int64

julia> typemax(Int64)
9223372036854775807

1 Like

You wouldn’t add two phone numbers, so how does this trigger overflow?

4 Likes

Much has already been said here, but I’d like to add two comments.

First, there are two situations that cause integer overflows: 1) a number gets bigger than the developer of the software expected, or 2) the code contains a bug. Many have argued that 1) shouldn’t be an issue with Int64, but that leaves 2). Overflow checking can catch bugs that would otherwise go unnoticed if subsequent computations bring values back into the realm of the possible. For that reason, I’d always want an easy way to run some code with overflow checking enabled; not necessarily by default, but without having to modify the code itself, i.e. via a compiler or runtime option.
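To make the contrast concrete: with default arithmetic the bug is silent, with checked arithmetic it surfaces immediately (error text abbreviated):

julia> typemax(Int64) + 1                      # silent wraparound
-9223372036854775808

julia> Base.checked_add(typemax(Int64), 1)     # the same bug, caught
ERROR: OverflowError: ...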

Second, overflow of the first kind, i.e. data becoming bigger than expected at development time, is an extremely frequent source of bugs, in particular in code that is used over a long time. Perhaps the most spectacular overflow bug was the one that crashed an Ariane 5 about 20 years ago. A more recent example is this story about YouTube. In between there was the year-2000 bug, which caused much anxiety and cost a lot of money.

All the arguments that overflow shouldn’t happen with Int64 are perfectly reasonable, but so were arguments about storing years as two-digit numbers in the not-too-distant past. If you write single-use research code, such arguments are OK, but for general library code, always consider that it may be around for much longer than you think. Again, an option for easily running with overflow checking provides some basic insurance.

And, of course, the mere fact that Int16 and Int32 still exist shows that there are good reasons to use them, so the overflow discussion cannot stop at declaring Int64 safe for general use.

6 Likes

It is not exactly important or anything, but as a piece of fun trivia I can’t resist replying: the Ariane 5 accident was not an addition overflow bug but a conversion bug, and it was caused precisely by overflow checking. Had they done a blind truncation, everything would have been perfectly fine. But they did check, an exception was raised, and there was nobody around to read the resulting error dump. As a sort of moral of the story: if you have no idea what to do in an error condition, you might as well not check for it.

7 Likes

Not that it matters much for the subject at hand, but there are two reasons for using ints that are unrelated to precision:

  • speed
  • exact arithmetic

Exact arithmetic may look like precision, but it is more. Integer arithmetic has all the mathematical properties we learn in school, such as associativity of addition and multiplication, which float arithmetic lacks due to the implicit round-off operation. As a practical consequence, integer arithmetic can be parallelized without any change in the results, which is not the case for float arithmetic.
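For example:

julia> (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)   # floats: rounding breaks associativity
false

julia> (1 + 2) + 3 == 1 + (2 + 3)               # ints: exact and associative (barring overflow)
true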

1 Like

What about a flag or environment variable such that the Julia parser will make any Base integer computation use the safe form, and maybe a macro so you could easily apply it to code blocks?
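The macro half of this is expressible today. Here is a minimal, hypothetical sketch that rewrites binary + and * inside an expression to their checked counterparts; a real version would also need to handle subtraction, n-ary calls like a + b + c, broadcasting, and so on:

macro checked(ex)
    swap(x) = x   # leave literals and symbols alone
    function swap(e::Expr)
        args = map(swap, e.args)                 # recurse into subexpressions
        if e.head == :call && args[1] === :+
            args[1] = :(Base.checked_add)
        elseif e.head == :call && args[1] === :*
            args[1] = :(Base.checked_mul)
        end
        Expr(e.head, args...)
    end
    esc(swap(ex))
end

julia> @checked typemax(Int) + 1
ERROR: OverflowError: ...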

3 Likes

That should work, assuming that overflow checks are done whenever at least one argument is made “safe”. In that case, enabling overflow checks when compiling the top level of the code is sufficient to trigger overflow checks in library code that is being used.

1 Like

One often does mathematical manipulations of numeric identifiers. Identifiers are quite often concatenations of different codes – leading digits in phone numbers, for example, can signify the phone carrier or whether a phone line is a mobile phone or land line. I’ve personally run into int32 overflows while trying to add country prefixes to phone numbers.

Not with Int64. I will concede the point that Int64 overflows are unlikely with numerical identifiers, but it does happen with Int32.
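A hypothetical version of the Int32 failure mode (made-up numbers, but the same shape as the country-prefix problem):

julia> area, seven_digits = Int32(415), Int32(555_1234);

julia> area * Int32(10_000_000) + seven_digits    # Int32: silent wraparound
-139416062

julia> Int64(area) * 10_000_000 + seven_digits    # Int64: the intended 10-digit number
4155551234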

Int64 versus Int32

I think there’s a tendency to ignore types like Int32 or Int16, but I’d argue that memory footprint is as much a performance issue as pure computation speed. In a lot of big-data applications, the ability to store data in smaller types can mean the difference between doing all your calculations in core memory, or having to use out-of-core tools or (god forbid) distributed platforms. So while it is perhaps true that Int64 overflows are somewhat unlikely (though certainly not impossible – note below), I think we still need to wrestle with Int32 and Int16.

Floats versus Ints

I think my main concern here is that while floats may be the best choice for arrays of large numbers, naive researchers may not always realize that and use ints instead. And in those situations, I’m not comfortable with the idea that Julia’s response is to “punish” the user for their incorrect type choice with wrong answers.

One of the things I find most appealing about Julia as a potential language for science is that it’s forgiving. If you screw up something like type-stability, Julia still works, it’s just slower; in effect, you basically get Python (or something close to it). But this is categorically different: if a user makes what I’d say is a pretty reasonable mistake for a non-CS person and stores their big numbers in ints, then instead of just running more slowly, they get the wrong answer. And that seems problematic.

1 Like

This is true in my experience as well. Take an IPv4 network where you have a 32-bit value that’s partitioned into an arbitrary number of network and host bits. My IPNets.jl package uses math to determine start and end points of these networks, and we run into issues in IPv6 when we hit the limits of UInt128.
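As a sketch of where that limit bites (hypothetical helper, not the IPNets.jl API): the number of addresses in an IPv6 network of prefix length p is 2^(128 - p), and at p == 0 the value no longer fits in UInt128:

julia> n_addresses(p) = UInt128(2)^(128 - p);

julia> n_addresses(1)                 # 2^127, still representable
0x80000000000000000000000000000000

julia> n_addresses(0)                 # 2^128 wraps silently to zero
0x00000000000000000000000000000000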

2 Likes

What about a flag or environment variable such that the Julia parser will make any Base integer computation use the safe form, and maybe a macro so you could easily apply it to code blocks?

I think that’d be really nice.

1 Like

I’ve been away for a while (I think this is maybe my first post to Discourse :sweat_smile:), but one of the things I love about Julia is that if you want something, you can build a type for it and everything just works.

If you’d like Int64 arithmetic with overflow checking, you can write a package: CheckedInt64. The rest of your code should still work with little or no modification.
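A minimal sketch of what such a type could look like (hypothetical code: the name CheckedInt64 and everything below are made up for illustration, and a real package would also need promotion rules, conversions, and many more operators):

struct CheckedInt64 <: Integer
    x::Int64
end

# Arithmetic delegates to the checked primitives in Base
Base.:+(a::CheckedInt64, b::CheckedInt64) = CheckedInt64(Base.checked_add(a.x, b.x))
Base.:-(a::CheckedInt64, b::CheckedInt64) = CheckedInt64(Base.checked_sub(a.x, b.x))
Base.:*(a::CheckedInt64, b::CheckedInt64) = CheckedInt64(Base.checked_mul(a.x, b.x))
Base.show(io::IO, a::CheckedInt64) = print(io, a.x)

julia> CheckedInt64(typemax(Int64)) + CheckedInt64(1)
ERROR: OverflowError: ...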

This is usually the way to go with any proposed changes to base Julia.

5 Likes

Not only that, but with macros and method extensions, you can (at least in principle) implement your new type in an existing package without really changing any code. This still kind of blows my mind (it was completely inconceivable in my Fortran, C++, Python days).

4 Likes