Discussion about integer overflow

But correctness of those results is in the eye of the beholder. What if the next thing you do with that value is ask whether it’s odd or not?

octave:9> rem(3^40, 2) == 1
ans = 0

julia> isodd(3^40)
true

The Octave answer is completely wrong whereas the Julia answer is correct (and would be for any integer arguments). Perhaps you work in an area where only the general magnitude of a result is important, in which case, yes, 1.2158e+19 is a better answer. And if that is the case for you then you should use floats and you’ll get as good answers as you would in languages that do everything with floats. But that’s not universally the case. There are people for whom the last bits of an integer computation are just as important as the first bits (they should use BigInts); there are also many situations where only the last bits matter (Ints are great for this).
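To make the trade-off concrete, here is a minimal sketch that computes 3^40 with each of the three kinds of number mentioned above and asks the parity question again. It assumes a 64-bit Julia session, where `Int` is `Int64`; the variable names are only for illustration.

```julia
# Assumes a 64-bit Julia session, where Int is Int64.
exact   = big(3)^40   # BigInt: arbitrary precision, every bit is correct
wrapped = 3^40        # Int64: wraps modulo 2^64, so only the low bits are correct
approx  = 3.0^40      # Float64: correct magnitude, low bits rounded away

println(exact)                # 12157665459056928801
println(isodd(exact))         # true
println(isodd(wrapped))       # true: wraparound preserves the lowest bit
println(rem(approx, 2) == 1)  # false: the float is a multiple of a large power of two
println(wrapped == exact)     # false: the Int64 value overflowed
```

Which of the three is "right" depends on whether you care about the first bits, the last bits, or all of them.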

17 Likes

And one more cherry-picked example:

julia> (3^40 + 1) - 3^40
1

octave:1> (3^40 + 1) - 3^40
ans = 0

6 Likes

That’s true, and at the end of the day Julia will have a community of users that will decide if the language fits the purpose for their work/research.

That is why I was trying to steer the conversation toward the compliance side for business rather than toward whether we can trick Octave into a wrong result. If we can, then Octave won’t be accepted either by certain regulatory bodies, which will demand guarantees that correct formulas relevant to their business return correct results.

The concerns and information I shared come from someone who is close to the business side of things. The questions I raised are the ones that will be raised by the business, and not always in front of you when you are doing a presentation.

If these people have the slightest concern that one of their mathematically correct models can return wrong results, they will not risk it. In fact, even if they want to risk it, the slightest comment to the lawyers in the compliance team will prompt an immediate answer that will stop them from doing so.

Now, Julia is still an awesome language for research, and I think you guys have done an amazing job, but some professionals might not be able to enjoy it for the reasons described above, and I thought my experience from the business side was worth sharing.

1 Like

It isn’t a “trick”. The Octave result is wrong—in a different way than the Julia integer result, granted, but still wrong. Moreover, it’s impossible to create a system where the answer is always correct, so if that’s the requirement for regulatory compliance, then there will be no compliant languages.

I don’t mean this to come off snarky, but you do know about all the issues that exist with numerical stability of floating-point computations, right? If you think that you can write down a mathematically correct model in Matlab or Octave or R and that it’s guaranteed to give an accurate result, then whoa boy, I’ve got some bad news for you…

13 Likes

Just an FYI: Pumas-based files are now accepted as Julia scripts by the FDA: https://www.fda.gov/media/85816/download
Check out the March 15th update.

18 Likes

But you can make results as robust as technically possible…

I’ll be a bit dramatic now but if somebody dies because a Pharma model advises the wrong dosage due to an overflow error what answer do you want to have for the judge and to the families of the victims?

1 - We used the language which to our best knowledge had better checks.
2 - We used the fastest language knowing the slowest one offered better checks.

Answer number two will land you with a cell mate whose name is probably not Julia…

I think this became too speculative. Is there a real example of that kind of compliance to show?

NASA landed on the moon probably using Fortran with 32-bit arithmetic, and I’m pretty sure they were quite careful about correct results.

16 Likes

Not meant to be offensive, but if a Pharma dosage is entirely dependent on a model and not common sense, then we are all in trouble. Models tend to objectify our common sense based on data. If a model prediction does not agree with data, what would you conclude?

4 Likes

64-bit integers can get quite large before they overflow - roughly 9*10^18.
Most of the time, integers are used for counting and indexing. For these purposes, overflow is not an issue, but performance is. Therefore, overflow safety checks would do more harm than good.

I doubt that there are many use cases where Int64 overflow is even a remote possibility. And if you have such an exotic use case, you should be aware of it and use an appropriate data type.
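For reference, a small sketch of where that limit sits and what "an appropriate data type" can look like. It assumes the default native arithmetic of a 64-bit Julia session; `Int128` and `BigInt` are just two possible choices.

```julia
# Assumes the default native integer arithmetic of a 64-bit Julia session.
println(typemax(Int64))                        # 9223372036854775807, roughly 9.2 * 10^18
println(typemax(Int64) + 1 == typemin(Int64))  # true: the sum wraps around silently

# Opting in to a wider or unbounded type avoids the wraparound.
println(Int128(typemax(Int64)) + 1)            # 9223372036854775808
println(big(typemax(Int64)) + 1)               # 9223372036854775808
```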

As the saying goes, “common sense is the least common of the senses”. That’s why you need models and data combined. A potential error in a dosage does not need to be something spectacular like 3 kg instead of 3 mg; it can be something like 30 mg instead of 3 mg.

When it comes to compliance, a one-in-a-million chance is one too many if there is an alternative.

Sincerely, I think that if you have an actual example of a compliance rule that requires this kind of language characteristic, it would be an interesting topic for discussion, possibly stimulating the development of a package that specifically addresses the requirements of those rules.

5 Likes

I guess I’m a bit puzzled about how this discussion is going. @viraltux you seem to really want to make constructive points and have been respectful throughout. Kudos. At the same time, the central message is this: (1) there are no existing systems that give you what you’re asking for, period; (2) a lot of previous thought by people with very deep understanding of the technical details has already happened; and (3) Julia, almost uniquely among languages, allows you to choose which set of tradeoffs matter to you.

Rather than advocating to change the language (you will never even make headway on that discussion), can I suggest that we morph this into a discussion of which packages fail to gracefully accept SaferInteger inputs and what technical changes need to be made to get them used throughout an entire stack of code? That puts the onus on you, but the point is you care a lot about this and quite a few people have already written the core tools you need in order to do this in the domain you care about.

41 Likes

https://wiki.sei.cmu.edu/confluence/display/c/INT32-C.+Ensure+that+operations+on+signed+integers+do+not+result+in+overflow

1 Like

The question is where there is a risk of one in a million (or larger) for an integer overflow.

  • Loop variables: assuming one loop iteration in 10^-9 s, it would take a computer about 300 years to overflow.
  • Memory: assuming that the integer addresses single bytes, this would correspond to 9 * 10^18 bytes or 9 * 10^6 TB (even when wasting the address space of negative integers). This should be out of reach for quite a while (except maybe for the largest supercomputer).
  • Currency amounts: for accounting, etc., integers may be used to represent cent amounts if rounding errors are not acceptable. 10^18 cents (or Yen, or whatever major currency) are quite a lot compared to the whole world economy. But here a hyperinflation (similar to Germany 1923 or Hungary 1946) may actually give you values where overflow is possible. This could be avoided by choosing an overflow-safe data type. But actually I am quite sure that the IT systems of all banks and financial infrastructure providers would collapse for technical reasons if such a hyperinflation happened again in an industrialized country.
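The first two estimates can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch only: it assumes the Int64 limit, one counter increment per nanosecond, a 365.25-day year, and decimal terabytes (10^12 bytes).

```julia
# Back-of-the-envelope checks for the loop-counter and memory estimates.
limit = Float64(typemax(Int64))            # about 9.22e18

# Loop counter: one increment every 10^-9 s.
seconds_per_year = 365.25 * 24 * 60 * 60
years_to_overflow = limit * 1e-9 / seconds_per_year
println(round(Int, years_to_overflow))     # 292

# Memory: one byte per addressable integer value, in units of 10^6 TB.
terabytes = limit / 1e12
println(round(terabytes / 1e6; digits = 1))  # 9.2
```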

Edit: @viraltux may I ask which industry you are referring to?

1 Like

Thank you Tim.

(1) No, but some systems are better than others and more compliant than others.
(2) Unfortunately, in business, technical people’s opinions might not matter that much… I am still using Windows.
(3) Yeah but my point is not so much technical as it is legal.

I am not advocating to change the language per se; the --safe flag option that I suggested should leave the language as it is for those not using the flag.

I am just bringing attention to potential areas where Julia might not be accepted as a go-to language.

Thank you very much for your input Tim, I appreciate it.

4 Likes

Yes. Throwing an error is a massive slowdown.

There could be a julia --check-overflow option that is much slower but checks for overflows.

2 Likes

Thanks for re-emphasizing that. I think that --safe might be overselling what’s (easily) possible (I like --check-overflow better), but a close approximation seems like it might be feasible, if perhaps a lot of work. Key points:

  • Julia allows you to override include, and in fact does so at several points during its bootstrap (when it is building itself)
  • The newish mapexpr argument to include might allow one to automatically substitute SafeInt for Int pretty much throughout when loading & building package code; the authors of the packages would still write their code using Int but on systems you have built they would be translated into SafeInt.
  • someone might need to alter the parser to allow it to construct SafeInt from integer literals
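A toy sketch of the rewriting idea in the bullets above, not a working solution. `ToySafeInt` and `wrap_literals` are hypothetical names invented here (they are not the SaferIntegers API), only `+` and `*` are covered, and `include_string` is used in place of `include` so the example needs no file on disk; both accept a `mapexpr` function as first argument in Julia 1.5 and later.

```julia
# Toy stand-in for a checked integer type; NOT the real SaferIntegers API.
struct ToySafeInt
    value::Int
end
Base.:+(a::ToySafeInt, b::ToySafeInt) =
    ToySafeInt(Base.Checked.checked_add(a.value, b.value))
Base.:*(a::ToySafeInt, b::ToySafeInt) =
    ToySafeInt(Base.Checked.checked_mul(a.value, b.value))

# Expression transformer: wrap every Int literal, recurse into nested expressions.
wrap_literals(x) = x                                  # symbols, strings, line numbers, ...
wrap_literals(x::Int) = Expr(:call, ToySafeInt, x)    # 2 becomes ToySafeInt(2)
wrap_literals(ex::Expr) = Expr(ex.head, map(wrap_literals, ex.args)...)

# The code string is written with plain Int literals but evaluated with ToySafeInt.
ok = include_string(wrap_literals, @__MODULE__, "2 * 3 + 4")
println(ok.value)    # 10

overflowed = try
    include_string(wrap_literals, @__MODULE__, "9223372036854775807 + 1")
    false
catch err
    # Older Julia versions wrap errors from included code in a LoadError.
    err isa OverflowError || (err isa LoadError && err.error isa OverflowError)
end
println(overflowed)  # true
```

Even this toy shows why the real thing is a lot of work: values produced by `typemax(Int)`, `length`, conversions, and so on never pass through a literal, so literal rewriting alone would not catch them.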

I wouldn’t promise this would do the job, and it would be quite a lot of work, and you need to be aware that there will be major hits to Julia’s legendary performance. (It seems unlikely to erase all of Julia’s advantages, and Julia would likely be faster than any other “safe” system, but it’s not going to perform anything like “vanilla” julia.) But in principle, this seems like something that has a fair amount of promise for constructing the kind of system you can use.

I doubt anyone will do this work for you for free, though. Since you’re from pharma, what about seeing if they can commission someone to do the work?

14 Likes

If you want to practice very defensive programming, you can use the methods from Base.Checked:

julia> Base.Checked |> names
15-element Vector{Symbol}:
 :Checked
 :add_with_overflow
 :checked_abs
 :checked_add
 :checked_cld
 :checked_div
 :checked_fld
 :checked_length
 :checked_mod
 :checked_mul
 :checked_neg
 :checked_rem
 :checked_sub
 :mul_with_overflow
 :sub_with_overflow

Those methods will check for overflow on every operation. This comes with the obvious caveat of not having exp etc. though, so it’s mostly useful for implementing core algorithms providing e.g. checked_exp, not for direct use.
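A short sketch of how these functions behave, plus one way a higher-level checked operation could be built on top of them. `checked_power` is a hypothetical helper name written for illustration (a plain repeated-multiplication loop, not an optimized algorithm); a 64-bit `Int` is assumed.

```julia
using Base.Checked: checked_add, checked_mul, add_with_overflow

# checked_* functions throw OverflowError instead of wrapping around.
threw = try
    checked_add(typemax(Int), 1)
    false
catch err
    err isa OverflowError
end
println(threw)   # true

# *_with_overflow functions return the result plus an overflow flag instead of throwing.
wrapped_sum, did_overflow = add_with_overflow(typemax(Int), 1)
println(did_overflow)   # true

# Hypothetical helper: integer power that checks every multiplication.
function checked_power(base::Integer, n::Integer)
    n >= 0 || throw(DomainError(n, "exponent must be non-negative"))
    result = one(base)
    for _ in 1:n
        result = checked_mul(result, base)
    end
    return result
end

println(checked_power(3, 39))   # 4052555153018976267
# checked_power(3, 40) throws OverflowError, whereas 3^40 wraps silently.
```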

6 Likes

For functions like exp you are returning a floating point answer anyway, so you don’t need to worry about integer overflow.
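A quick illustration of that point, assuming default `Float64` arithmetic: an integer argument is converted to floating point, and a result that is too large becomes `Inf` rather than wrapping around.

```julia
println(exp(1) isa Float64)   # true: integer input, floating-point result
println(exp(710))             # Inf: the result exceeds the Float64 range
println(isfinite(exp(709)))   # true: still representable
```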

3 Likes