Discussion about integer overflow

There is this package:

https://docs.juliahub.com/SafeREPL/vNvbl/0.1.0/

Yet, perhaps in any language it is too much to ask that computer-precision arithmetic will behave as exact arithmetic in any case. Probably one can always build some corner case example where the result is not what one would expect. Dealing explicitly with such limitations of the representation of the number in computers is probably important if the application is critical.

3 Likes

In this case we are intentionally telling the language to behave in such a way; it is no accident. Even in these cases Matlab/Octave return their maximum values, which would be the equivalent of Inf.

octave:1> x = uint8(128)
x = 128
octave:2> x^100000
ans = 255

Your example does not show what you think it does. R actually treats those numbers as floats from the very beginning, so it is doing floating point arithmetic:

> typeof(10)
[1] "double"

> typeof(10^2)
[1] "double"

> typeof(10^63)
[1] "double"

> typeof(1/(1-10^49/10^63) + 1/(1-10^49/10^63))
[1] "double"

> print(1/(1-10^49/10^63) + 1/(1-10^49/10^63), digits=22)
[1] 2.000000000000019984014
6 Likes

Well, let’s look at it in a different way: what mathematically correct calculation would you type in R that would return something wrong?

If using double under the hood is the trick to getting correct results, then perhaps this is what a hypothetical --save flag in Julia could do.

What this shows is the following:

  • @GunnarFarneback’s example shows that integers in Octave do overflow (by wrapping around)—operations with native integer types do this just as they do in Julia.
  • Your exponentiation example doesn’t contradict that, rather it shows that Octave implements x^y as floating-point exponentiation, even if the arguments are integers.
  • There is no way to do exact integer exponentiation in Octave without having floating-point round-off.
  • Julia, on the other hand, supports actual integer ^ which has no roundoff but does wrap around for native integer types.

What you’ve found here is not that Julia is less accurate than Octave (or Matlab or R), but that Octave forces floating-point operations, whereas Julia gives you these choices:

  1. Native integer operations, which never lose precision and are very fast but can wrap around
  2. Big integer operations, which never lose precision and never overflow but are very slow
  3. Floating-point operations, which lose precision and are fast and don’t wrap around
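
To make the three choices concrete, here is a minimal Julia sketch using the 3^40 computation discussed in this thread. The variable names are only illustrative, and `Int64` is spelled out so the result does not depend on the machine's word size.

```julia
# The same power computed with each of the three kinds of arithmetic.
native = Int64(3)^40   # native integer: fast, exact modulo 2^64, so it wraps around
exact  = big(3)^40     # BigInt: never overflows, but much slower
approx = 3.0^40        # Float64: right magnitude, low digits rounded away

println(native)        # -6289078614652622815
println(exact)         # 12157665459056928801
println(approx)

# The wrapped value and the exact value differ by a multiple of 2^64.
println(mod(exact - native, big(2)^64) == 0)   # true
```

Which of the three is "right" depends on whether the computation needs the magnitude, the low bits, or both.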

Octave, Matlab and R only give you the last option. Note that whether this is more correct depends on whether you care about the high bits or low bits of the computation. You do have to understand these things and use the appropriate types for what you’re doing and understand their failure modes. But that’s true of floating-point as well—if someone is using floats in any of these languages naively, there are many examples of how they can get completely wrong answers. If you’re writing code where accuracy is important, then you need to use the right tools, understand those tools and what they guarantee and what they don’t, and most of all, you need to test, test, test.

29 Likes

No, I left it out deliberately. I don’t find it particularly interesting what type you get from literals; you could as easily have received it as a function argument, read it from a file, or obtained it from a large array whose storage someone had optimized behind your back.

My point is that there are traps out there if you are unwary and that it isn’t something specific to Julia. In fact I find the saturating integer arithmetic of Matlab and Octave rather more insidious than the wraparound of Julia. It’s about as likely to give an incorrect result, but with a higher risk that you don’t notice.

Not only integers are tricky. Floating point numbers of course have their own quirks, such as this classic:

octave:1> mean(single(1.2 * ones(10000000, 1)))
ans =  1.0797

This is not at all theoretical; you can easily run into this when working with images. And yes, this can also happen in Julia, even if the mean function itself does a better job. You simply have to know and understand your tools.
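
A rough Julia sketch of the same effect, assuming a hand-written accumulation loop (`naive_mean` is a hypothetical helper, not a Base function). It compares plain left-to-right Float32 accumulation with Base's `sum`, which uses pairwise summation for arrays, and with accumulating in Float64.

```julia
# Naive left-to-right accumulation in Float32, as a simple loop would do it.
function naive_mean(xs::Vector{Float32})
    acc = 0.0f0
    for x in xs
        acc += x   # once acc is large, each addition is rounded coarsely
    end
    return acc / length(xs)
end

xs = fill(1.2f0, 10_000_000)

println(naive_mean(xs))                  # noticeably below 1.2
println(sum(xs) / length(xs))            # pairwise summation stays close to 1.2
println(sum(Float64, xs) / length(xs))   # accumulating in Float64 also stays close
```

The loop goes wrong because the running total eventually becomes so large that adding 1.2 is rounded to adding 1.0.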

11 Likes

Which answer is correct depends on what you’re trying to do. For example:

julia> 3^39
4052555153018976267
octave:3> 3^39
ans = 4.0526e+18

Which one is more correct? The Julia answer is the exact correct answer. Octave doesn’t support exact integer exponentiation. If you don’t care about that, then sure, that’s fine.

Personally, when I’m using native integers, my mental model is that they do arithmetic modulo 2^N where N is the number of bits of the type. I know that many people don’t have that mental model for integer arithmetic, but many people are also surprised when they do 0.1 + 0.2 and don’t get 0.3 back.
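
A small sketch of that mental model in Julia, alongside the floating-point surprise mentioned above (the variable names are just for illustration):

```julia
# Native integers behave like arithmetic modulo 2^N.
wrapped_signed   = typemax(Int64) + 1        # wraps to the most negative Int64
wrapped_unsigned = typemax(UInt8) + UInt8(1) # 255 + 1 wraps to 0 for N = 8

println(wrapped_signed == typemin(Int64))    # true
println(wrapped_unsigned == 0)               # true

# Floating point has its own surprises.
println(0.1 + 0.2 == 0.3)                    # false
println(0.1 + 0.2)                           # 0.30000000000000004
```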

14 Likes

Here you go, integer overflow in R:

> typeof(2147483647L)
[1] "integer"

> 2147483647L + 1L
[1] NA
Warning message:
In 2147483647L + 1L : NAs produced by integer overflow
4 Likes

Use floating point numbers for computation, and integers for things like indexing or when you specifically want modular arithmetic, like random number generators.
Float64 represents all integers with absolute value <= 2^53 exactly, so it is just as exact for most values you’d be interested in using, with better behavior in the extremes.
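
A quick way to check where exactness ends in Julia, as a minimal sketch (`limit` is just an illustrative name; `maxintfloat` is the Base function that reports this bound):

```julia
limit = 2.0^53

println(limit == maxintfloat(Float64))            # true
println(Float64(Int64(2)^53 - 1) + 1 == limit)    # true: below the limit, integers are exact
println(limit + 1 == limit)                       # true: 2^53 + 1 is not representable
```

Above that bound Float64 still gives the right magnitude, but consecutive integers are no longer distinguishable.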

10 Likes

To R’s credit, from the “lack of surprise” perspective, R does warn you about that happening and returns NA as a poison value. Of course, that makes integer arithmetic excruciatingly slow in R and it also means that if you want the native modulo 2^N integer arithmetic behavior, there’s no way to get it.
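
For comparison, Julia's `Base.Checked` module lets you opt in to overflow detection where you want it. The sketch below mimics R's behavior by returning `missing` on overflow; `add_or_missing` is a hypothetical helper written for this example, not something Base provides.

```julia
using Base.Checked: checked_add, add_with_overflow

# Hypothetical helper: return `missing` on overflow, similar in spirit to R's NA.
function add_or_missing(a::Int32, b::Int32)
    result, overflowed = add_with_overflow(a, b)
    return overflowed ? missing : result
end

println(add_or_missing(typemax(Int32), Int32(1)))   # missing
println(add_or_missing(Int32(1), Int32(2)))         # 3

# checked_add throws an error instead of wrapping.
threw = try
    checked_add(typemax(Int32), Int32(1))
    false
catch err
    err isa OverflowError
end
println(threw)                                      # true
```

The plain `+` operator stays wrapping, so the check is only paid for where it is asked for.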

6 Likes

This is not quite right. Octave (like Matlab) has floating-point literals (even when they look like integers) and saturating integer arithmetic.

octave:1> uint64(10)^1000
ans = 18446744073709551615
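
For readers more familiar with Julia, saturating behavior like this can be imitated with a small sketch built on `Base.Checked.mul_with_overflow`. Both `saturating_mul` and `saturating_pow` are hypothetical helpers written for illustration; this is not how Octave implements it.

```julia
using Base.Checked: mul_with_overflow

# Hypothetical helper: clamp to the maximum value instead of wrapping.
function saturating_mul(a::UInt64, b::UInt64)
    result, overflowed = mul_with_overflow(a, b)
    return overflowed ? typemax(UInt64) : result
end

# Hypothetical helper: power by repeated saturating multiplication.
function saturating_pow(x::UInt64, n::Integer)
    acc = UInt64(1)
    for _ in 1:n
        acc = saturating_mul(acc, x)
    end
    return acc
end

println(saturating_pow(UInt64(10), 1000))   # 18446744073709551615
println(saturating_pow(UInt64(10), 3))      # 1000
```

Note that the saturated result is an ordinary value, so nothing downstream signals that it is wrong.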
2 Likes

I am fine with overflows! That’s something I can fix; wrong results sneaking into my calculations I can’t.

1 Like

Ah, thanks for the correction!

1 Like

My concern is this:

julia> 3^40
-6289078614652622815

octave:1> 3^40
ans = 1.2158e+19

But I don’t really want to make it about Matlab/Octave. I would like to focus on the compliance part, because for research I find Julia to be the best language I know and I love it!

2 Likes

Why are you even using integers instead of floating point?

Are you worried about forgetting the .?

3 Likes

Just a reminder how others do it. .NET Framework:

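unchecked
{
    // Inside an unchecked block the constant expression wraps around
    // instead of producing a compile-time overflow error.
    int int1 = 2147483647 + 10;  // -2147483639
}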
1 Like

I think this is a very key post because it highlights what I was trying to say: you need to understand your tools (numerical types) and use the right one. If you’re using Ints and you’re unhappy about them wrapping around when you take big exponents, then you’re using the wrong tool.

The presumption of this complaint seems to be that people should be able to use Julia without knowing how its numerical types work and get completely mathematically accurate results. There are two issues with that. The first issue is that this simply isn’t the philosophy of the language at all. The philosophy of Julia is to give you all the options you need to solve problems but to assume that you know what you’re doing and will use them appropriately. The second issue is that it’s not actually possible to make a system like that. Mathematica tries and probably comes the closest, but there will always be situations where you need to understand your tools in order to write numerical code that produces correct results, and believing otherwise will eventually get you into hot water. Which is why Julia doesn’t try to do the impossible and instead tries to give the programmer all the tools they might need, assuming that they know what they’re doing.

27 Likes

But correctness of those results is in the eye of the beholder. What if the next thing you do with that value is ask whether it’s odd or not?

octave:9> rem(3^40, 2) == 1
ans = 0
julia> isodd(3^40)
true

The Octave answer is completely wrong whereas the Julia answer is correct (and would be for any integer arguments). Perhaps you work in an area where only the general magnitude of a result is important, in which case, yes, 1.2158e+19 is a better answer. And if that is the case for you then you should use floats and you’ll get as good answers as you would in languages that do everything with floats. But that’s not universally the case. There are people for whom the last bits of an integer computation are just as important as the first bits (they should use BigInts); there are also many situations where only the last bits matter (Ints are great for this).
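
The same comparison can be reproduced entirely in Julia by choosing the type explicitly. This is a minimal sketch; the variable names are illustrative only.

```julia
odd_native = isodd(Int64(3)^40)      # low bits survive wraparound
odd_exact  = isodd(big(3)^40)        # exact value
odd_float  = rem(3.0^40, 2) == 1     # Float64 has no low bits left at this magnitude

println(odd_native)   # true
println(odd_exact)    # true
println(odd_float)    # false

# Differences of nearby large values show the same thing.
println((Int64(3)^40 + 1) - Int64(3)^40)   # 1
println((3.0^40 + 1) - 3.0^40)             # 0.0
```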

17 Likes

And one more cherry-picked example:

julia> (3^40 + 1) - 3^40
1

octave:1> (3^40 + 1) - 3^40
ans = 0
6 Likes

That’s true, and at the end of the day Julia will have a community of users that will decide if the language fits the purpose for their work/research.

That is why I was trying to steer the conversation toward the compliance side for business rather than whether we can trick Octave into a wrong result. If we can, then Octave won’t be accepted either by certain regulatory bodies, which will demand guarantees that correct formulas relevant to their business return correct results.

The concerns and information I shared come from someone who is close to the business side of things. The questions I raised are the ones that will be raised by the business, and not always in front of you when you are giving a presentation.

If these people have the slightest concern that one of their mathematically correct models can return wrong results, they will not risk it. In fact, even if they want to risk it, the slightest comment to the lawyers in the compliance team will prompt an immediate answer that will stop them from doing so.

Now, Julia is still an awesome language for research, and I think you guys have done an amazing job, but some professionals might not be able to enjoy it for the reasons described above, and I thought my experience from the business side was worth sharing.

1 Like