Discussion about integer overflow

Yes, it is a limitation but I would not call it big since we could use BigInt if Int64 is not enough and an overflow is something we can catch.

I find this more worrying:

> julia> 1 / 10^20000
> Inf

I’d rather have an overflow here than Inf, or at least an Infinitesimal value… Maybe something we could ask for future versions of Julia.

2 Likes
> julia> 1 / 10^20000
> Inf

This example seems messy since it’s conflating integer overflow and FP overflow:

  1. 10^20_000 evaluates to exactly 0.
  2. So the expression is really 1 / 0, where Inf seems like the right answer.
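The two steps above can be checked in a REPL. A minimal sketch, assuming a 64-bit Julia where `Int` is `Int64`; the variable names are only for illustration:

```julia
# Int64 arithmetic wraps modulo 2^64. 10^20_000 contains the factor 2^20_000,
# so it is a multiple of 2^64 and wraps to exactly zero.
x = 10^20_000
println(x)        # 0

# `/` on integers converts both operands to Float64 first, so this is 1.0 / 0.0.
y = 1 / x
println(y)        # Inf

# The same wrap-to-zero already happens at 10^64, the smallest power of ten
# that is divisible by 2^64.
println(10^64)    # 0
```

So the Inf comes from the floating-point division by zero, not from the power itself.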

Of course, one can say that you’d use BigInt everywhere, but there’s a major tradeoff in performance then.

5 Likes

Well, then the problem is that 10^20_000 should evaluate to Inf. I think the Julia language should review these overflow scenarios.

Integer overflow has been thought about since at least 2012, is documented in the FAQ, has been discussed repeatedly, has been implemented in a package, and finally Julia is not at that stage of development anymore.

25 Likes

Very thorough answer! Thank you very much for the links @GunnarFarneback

I will respectfully disagree with this design decision, though. I would say people doing analytics would much rather have Julia throw an overflow error than get incorrect results in their calculations.

How do those people feel about for loops being slow? Why is that relevant, you ask? Because for loops are implemented using integer arithmetic. You slow down integer arithmetic and you slow down the entire language.

14 Likes

I would say those people might expect that in a high level language intended for numerical analysis correct results are above performance and they might expect not to have to worry about C++ like considerations when handling numbers. I would also say that, if they want performance, they would also like to have a package like FastIntegers.jl (just an example) knowing the kind of problems they might run into if they use it.

However, if I understood you correctly, it seems that integer arithmetic is so intertwined with Julia that any safeguard would cause a general slowdown. I have a question, if I may: would just throwing an error also slow down the entire language?

I am asking because there might be strong regulatory implications; people’s lives might depend on the results of some analytics.

1 Like

Yes. Throwing an error is a massive slowdown. To do so, you have to check for overflow on every operation, and the possibility of throwing errors means that the compiler can’t optimize your code, since things like vectorization rely on arithmetic not having side effects.
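To make the difference concrete, here is a small sketch using `Base.Checked.checked_add` from Julia's standard library. `try_checked_add` is a hypothetical helper written only for this illustration, and the sketch assumes a 64-bit Julia:

```julia
using Base.Checked: checked_add

# Default behaviour: the addition wraps silently, with no branch and no side effect.
wrapped = typemax(Int64) + 1
println(wrapped == typemin(Int64))            # true

# Checked variant: every call tests for overflow and may throw.
# Hypothetical helper that turns the exception into `nothing`.
function try_checked_add(a, b)
    try
        return checked_add(a, b)
    catch err
        err isa OverflowError || rethrow()
        return nothing
    end
end

println(try_checked_add(1, 2))                # 3
println(try_checked_add(typemax(Int64), 1))   # nothing
```

The extra test-and-branch on every operation is the cost being discussed; the unchecked version has nothing to test.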

Also, note that Julia has a library, SaferIntegers.jl, which you can use if you want errors on overflow.

23 Likes

There is also https://github.com/milankl/Sherlogs.jl, which is nice for checking for some floating point errors, especially with Float32/Float16.

5 Likes

Thank you Oscar for confirming this point. Okay then, it is the way it is, and certainly some regulatory bodies might require SaferIntegers.jl to accept calculations, something good to keep in mind.

Thank you all for your help, I’ve learned something today!

3 Likes

At this point I’ve been involved in a lot of discussions with customers using Julia in a variety of heavily regulated industries—finance, pharma, medicine, insurance, aviation, aerospace, etc. While they have many concerns, I can’t say that anyone in any of those industries has ever expressed concern about integer overflow as a regulatory issue. If it was a big issue, it would rule out the use of many languages in those industries, including Fortran, C, C++, Java and C#. Moreover since every fast language has integer overflow, even if you use Python, most of your computation is still happening in a fast language that has integers that overflow. So while I agree that in an ideal world we wouldn’t have to make this trade off, we live in a world where checking every integer operation is prohibitively costly, and if we want Julia to be a language in which you can write code that’s as fast as possible (which we do), this is the call we’re forced to make.

41 Likes

Regarding Python: the standard int type does not overflow and is thus comparable to BigInt in Julia. But it is also very slow.
If you want to speed up Python using NumPy, Numba or Cython, machine-type integers are used, which do overflow without warning.

8 Likes

1/10^20000 = 1/0 should be Inf; this is about integer arithmetic (10^20000 = 0).
Perhaps you meant 1/10.0^20000 = 1/Inf = 0.0.
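A quick sketch of the floating-point variant next to the integer one, assuming default `Float64` and `Int64` literals; `big_power` is just an illustrative name:

```julia
# With a floating-point base, the power overflows to Inf instead of wrapping.
big_power = 10.0^20_000
println(big_power)         # Inf
println(1 / big_power)     # 0.0

# With an integer base, the power wraps to 0 and the quotient becomes Inf.
println(1 / 10^20_000)     # Inf
```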

1 Like

Well, now you have found the first one. I work for Big Pharma, and in some of my past projects I worked closely with the Pharmacokinetics and Pharmacodynamics crowd. These people need to have their models approved by regulatory bodies. How do you think they will react when they find out that Julia, by design, accepts 1 + 1 = 2.12?

julia> 1/(1-10^49/10^63) + 1/(1-10^49/10^63) # ~1 + ~1
2.123792857605807
R>     1/(1-10^49/10^63) + 1/(1-10^49/10^63) # ~1 + ~1
[1] 2

They might ask the following question, and rightly so: wait a minute, you are telling me that my model can be mathematically correct but that I cannot expect mathematically correct results? If we now mention Julia’s speed, the only thing they are going to hear is how fast Julia fails.

I am using R language as an example because R is widely utilized in Pharma when regulations matter and this is one language they use to have their models approved.

Regulated industries are not regulated in every step of their way, for instance, I have also been involved in Biomedical Imaging projects, in those projects we could use anything we want, any language, any hardware, anything, and that’s because when it comes to research we don’t have any technological regulatory constraints.

However, if you need your research to be approved, that’s a different world; for instance, even though R is accepted by regulatory bodies in Pharma, it is not all of R: not all versions and not all packages are accepted.

There is not one single language intended for numerical analysis that I know of (SAS, R, S, Matlab, Mathematica, Maxima, Octave, SPSS,… a few others), not one, that allows for incorrect arithmetic of the kind I showed you above. Not one except Julia now.

Obviously, these languages might all be using C++ or Fortran under the hood, but they all still guarantee that a correct formula returns correct results.

Stefan, I believe that when we try hard often we can find ways to have the best of both worlds…

For instance, how about implementing a flag for a safe Julia (e.g. julia --safe)? This way people worried about regulations could develop their models in the standard fast julia, but have them approved by running the very same models in julia --safe mode.

This would make Julia not only the fastest language, but the safest too since, in safe mode, all kinds of fancy safety measures could be put in place with no concerns for speed… Just an idea.

3 Likes

Regarding floating point arithmetic I’ve found LLVM Language Reference Manual — LLVM 16.0.0git documentation. Are these supported by Julia?

Regarding checked integer arithmetic, clang/llvm seem to recommend using sanitizers: UndefinedBehaviorSanitizer — Clang 16.0.0git documentation. There is no equivalent for Julia AFAIU?

Could you write it as exp10(49)/exp10(63) ?

I could, but how would you know if the developers of a package you are using did?

You’re calculating numbers way outside the domain of integers, so why not go with floating point numbers in the first place?

julia> 1/(1-10^49.0/10^63.0) + 1/(1-10^49.0/10^63.0)
2.00000000000002

This also shows that R is lying to you - ~1 + ~1 is only ~2 after all and not exactly 2.

Interestingly, I believe the majority of languages you’ve quoted do actually have the same “problem” - they just mask it by showing fewer decimals than would be required to accurately represent the true number. For example, both Matlab and Mathematica lie to you in the same way. See here for Matlab and here for Mathematica:

By default, the inputs 0.1 and 0.2 in the example are taken to have MachinePrecision. At a common MachinePrecision of 15.9546 digits, 0.1 + 0.2 actually has a FullForm of 0.30000000000000004, but is printed as 0.3.

Arguably, your example is hitting a different failure mode (integer overflow) than just floating point imprecision, so I guess that’s a point to be made? For that though, SaferIntegers.jl has been suggested and would show you the problem right away.
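To see that the integer powers are what goes wrong, one can compare them against BigInt and recompute them with checked multiplication. A rough sketch using only `Base.Checked` from the standard library; `checked_power` is a hypothetical helper written for this post, and a 64-bit Julia is assumed:

```julia
using Base.Checked: checked_mul

# Hypothetical helper: integer power by repeated checked multiplication.
# Slow but simple; throws OverflowError as soon as a product leaves Int64.
function checked_power(base::Int64, exponent::Integer)
    result = one(Int64)
    for _ in 1:exponent
        result = checked_mul(result, base)
    end
    return result
end

# The wrapped Int64 value is not the true value computed with BigInt.
println(10^49 == big(10)^49)                        # false

# 10^18 still fits in Int64, so the checked version agrees with `^`.
println(checked_power(Int64(10), 18) == 10^18)      # true

# 10^49 does not fit, and the checked version reports it.
overflowed = try
    checked_power(Int64(10), 49)
    false
catch err
    err isa OverflowError
end
println(overflowed)                                 # true
```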

14 Likes

Unfortunately your proposed safe mode would actually change the semantics of Julia - the overflow behaviour of Int64 and friends is part of the language.

This is all documented. Ints should only be used for “small” integers (such as for counting things). For large or continuous quantities there are plenty of alternatives: floats; big ints; checked integers.
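A short sketch of the three alternatives applied to the ratio 10^49/10^63 from the example, assuming a 64-bit Julia; all variable names are illustrative:

```julia
using Base.Checked: checked_mul

# 1. Floats: approximate, but the magnitude is right.
ratio_float = 10.0^49 / 10.0^63
println(isapprox(ratio_float, 1e-14))        # true

# 2. Big integers: exact, at a performance cost.
ratio_big = big(10)^49 // big(10)^63         # exact Rational{BigInt}
println(ratio_big == 1 // big(10)^14)        # true

# 3. Checked integers: an error instead of a silently wrapped value.
threw = try
    checked_mul(Int64(10)^18, Int64(10))     # 10^19 does not fit in Int64
    false
catch err
    err isa OverflowError
end
println(threw)                               # true
```

Which one fits depends on whether you need speed, exactness, or an early failure.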

In your example, replace 10 with 10.0 to get

julia> 1/(1-10.0^49/10.0^63) + 1/(1-10.0^49/10.0^63)
2.00000000000002

which is actually more correct than what R prints.

3 Likes

How well do you know those languages?

octave:2> x
x = 128
octave:3> x + x
ans = 255
15 Likes
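
The Octave session above does not show how x was created; the output is what one would see if x were a uint8, whose arithmetic saturates at 255 (that is an assumption here). For comparison, a sketch of what Julia does with the same 8-bit values; `saturating_add` is a hypothetical helper, not a Base function:

```julia
x = UInt8(128)

# Julia's fixed-width integers wrap: 128 + 128 = 256, which is 0 modulo 2^8.
println(Int(x + x))                   # 0

# Hypothetical helper: widen to Int, add, then clamp back into the UInt8 range.
saturating_add(a::UInt8, b::UInt8) = UInt8(min(Int(a) + Int(b), Int(typemax(UInt8))))

println(Int(saturating_add(x, x)))    # 255
```

Neither result is the mathematical 256; wrapping and saturating are just two different ways of staying inside 8 bits.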